Data Use Act 2025 Rewrites Cold Email Personalization Rules

May 4, 2026 Avery Callahan

The Data Scraping War: How AI Middlemen and the Data Use Act 2025 Are Rewriting Cold Email Personalization Rules

What happens when the data you use to personalize cold emails was scraped without permission, and the new Data Use and Access Act puts that practice in legal jeopardy? That's not a hypothetical. It's a live grenade sitting in your CRM right now. The same AI data brokers that publishers call "a hostile takeover funded by their own IP" are selling enrichment datasets to email marketers. You buy a list of "high-intent prospects" and don't ask where the intel came from. But the law is starting to care, and Gmail's spam filters are already ahead of the curve.

The $1 Billion Scraper Economy That Pays Publishers Zero

You've seen the headlines: AI companies scraping the open web to train models. But there's a lucrative side hustle that gets less airtime: third-party data brokers who scrape publisher content and resell it as "audience insights" or "intent data." A recent Mordor Intelligence report pegged this scraper economy at $1 billion. One count identifies 21 vendors in the space, including names like Firecrawl, Exa, Tavily, and Bright Data; another tracker found nearly 40.

One publishing exec described it bluntly: "We’ve got all these 30, 40, 50 startup DSPs for content, but they’re taking a 100% fee." Not a cut—everything. They take the content, pay nothing, and in some cases build competing products that remove the publisher entirely. Chris Dicker, CEO of Candr Media, called it worse than the ad tech tax: "At least with ad tech middlemen publishers got something back. With scrapers, the value extraction is total."

These outfits are now rebranding as "agentic infrastructure" to keep stealing in plain sight; one analyst called out Parallel Web Systems specifically for this. The new label doesn't change the game. Agents will consume the web at a scale that dwarfs human browsing, and until there's a real marketplace to price and govern that consumption, these companies compete on "who can extract the most value from the web the fastest while the question of who gets paid remains unresolved."

How Scraped Data Infects Your Cold Email Campaigns

If you're running cold email campaigns, you're likely buying data from vendors who sell "firmographics," "buying signals," or "contact enrichment." Some of that data was scraped from publisher sites—blogs, news articles, trade publications—without permission. The broker scrapes a mention of a company hiring a new VP of Sales, tags it as an "intent signal," and sells it to you. You then use that signal to personalize a cold email: "Congrats on the new role, Sarah. I see you're scaling your sales team."

There's a problem. Sarah never consented to that data being collected, and the publisher never authorized the scrape. If the scraped site also had a no-crawl directive (for example, a robots.txt Disallow rule) that the broker ignored, which publishers report is common, then you're building personalization on stolen IP.
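The most common machine-readable "no-crawl" signal is a site's robots.txt file. It is advisory (nothing technically stops a scraper from ignoring it), but checking it tells you whether the page behind a vendor's "intent signal" was one the publisher asked bots to skip. Below is a minimal Python sketch using the standard library's `urllib.robotparser`; the robots.txt content, the publisher domain, and the `AcmeEnrichBot` user agent are all hypothetical. Note that this parser applies rules in file order (first match wins) rather than the longest-match rule of RFC 9309, so treat its answer as a first-pass check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary trade publication.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
Allow: /about

User-agent: GPTBot
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://publisher.example.com"  # hypothetical publisher domain
    checks = [
        ("AcmeEnrichBot/1.0", "/articles/new-vp-sales"),  # falls under the * group
        ("AcmeEnrichBot/1.0", "/about"),
        ("GPTBot", "/about"),  # has its own group that disallows everything
    ]
    for agent, path in checks:
        allowed = is_crawl_allowed(SAMPLE_ROBOTS_TXT, agent, base + path)
        print(f"{agent:<18} {path:<24} allowed={allowed}")
```

In a real audit you would load the live file with `RobotFileParser.set_url()` and `read()` instead of parsing a string; the network calls are left out so the sketch runs offline.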

This doesn't just raise legal flags; it can kill deliverability. Gmail and Outlook increasingly use engagement signals to judge sender reputation. If recipients flag your email as spam, or if a poorly sourced list drives up bounces and complaints against your domain, you'll hit the spam folder fast. The irony: you invest in "better personalization" to improve response rates, but the raw material is so ethically tainted that it torpedoes your sender score anyway.

The Data Use and Access Act 2025: The Shift Marketers Missed

Enter the Data Use and Access Act (DUAA) 2025. It's being called the biggest shift since GDPR, but with a different tone. The Act loosens restrictions on automated decision-making (good news for AI-driven personalization), but it also tightens rules on data provenance. You can use automation more freely, but only if your safeguards are solid, and automated decisions based on sensitive data (health, ethnicity) remain tightly restricted.

Here's the kicker for cold email marketers: the DUAA introduces "broad consent" for research, meaning you don't need to re-consent every time for long-term studies. But that's not the same as consent for commercial messaging. The Act also relaxes cookie consent for analytics and performance—but it doesn't relax consent for scraping content without permission. If you're using data scraped from a publisher's site without their explicit authorization, you're operating in a grey zone that's rapidly turning red.

The DUAA makes it clear: data provenance matters. If you can't prove where the data came from and that it was collected lawfully, you're exposed. For email marketers, that means your list-buying habits need an audit. The brokers who slice up publisher content into "intent scores" aren't going to be transparent about their sourcing. They'll call it "agentic infrastructure" and keep selling.
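One way to make "prove where the data came from" concrete is to refuse any enriched record that arrives without provenance fields. The Python sketch below assumes a hypothetical record shape (`EnrichedContact`) and a hypothetical allow-list of collection methods; neither comes from the Act or from any vendor's API. Records with gaps are quarantined instead of mailed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical allow-list: collection methods you are prepared to defend.
ACCEPTED_METHODS = {"opt_in_form", "webinar_registration", "licensed_feed"}


@dataclass
class EnrichedContact:
    """Hypothetical shape for one enriched record; field names are illustrative."""
    email: str
    signal: str                        # e.g. "hiring a VP of Sales"
    source_url: Optional[str]          # where the signal was observed
    collection_method: Optional[str]   # how the vendor says it was collected
    lawful_basis: Optional[str]        # e.g. "consent", "legitimate_interests"
    collected_on: Optional[date]


def provenance_gaps(contact: EnrichedContact) -> list[str]:
    """Return every reason this record cannot show where its data came from."""
    gaps = []
    if not contact.source_url:
        gaps.append("no source URL")
    if contact.collection_method not in ACCEPTED_METHODS:
        gaps.append(f"unaccepted collection method: {contact.collection_method}")
    if not contact.lawful_basis:
        gaps.append("no lawful basis recorded")
    if contact.collected_on is None:
        gaps.append("no collection date")
    return gaps


def split_by_provenance(
    contacts: list[EnrichedContact],
) -> tuple[list[EnrichedContact], list[EnrichedContact]]:
    """Separate mailable records from ones to quarantine until the vendor explains them."""
    usable, quarantined = [], []
    for contact in contacts:
        (quarantined if provenance_gaps(contact) else usable).append(contact)
    return usable, quarantined


if __name__ == "__main__":
    clean = EnrichedContact("sarah@example.com", "registered for our webinar",
                            "https://example.com/webinar", "webinar_registration",
                            "consent", date(2026, 3, 1))
    scraped = EnrichedContact("sam@example.org", "hiring a VP of Sales",
                              None, "scraped", None, None)
    usable, quarantined = split_by_provenance([clean, scraped])
    print(len(usable), "usable;", len(quarantined), "quarantined")
    for reason in provenance_gaps(scraped):
        print(" -", reason)
```

Quarantining rather than deleting keeps the record available if the vendor later produces a credible consent chain.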

Three Things You Can Do Right Now to Protect Your Sender Reputation

Stop expecting the law to police this. Start policing your own data pipeline. Here are three concrete actions you can take today:

Audit your data vendors. Ask every enrichment or list provider: "Where did you get this data? Can you show me the consent chain?" If they can't answer in a sentence, drop them. The scrapers will dodge or lie—that's the active deception publishers already report.
Shift to permission-first sourcing. Instead of buying scraped intent data, build signals from opt-in sources: webinar registrations, content downloads, LinkedIn groups where members explicitly agree to be contacted. Yes, it's slower. Yes, it's more expensive. But it keeps you out of the spam folder and off the legal docket.
Monitor deliverability with a vengeance. Use your email platform's tools (like FiresideSender's deliverability suite) to watch bounce rates, spam complaints, and engagement decay. If you see a sudden drop in inbox placement after using a new data source, that's a red flag. Pull the campaign and investigate. Don't assume the data is clean just because it came from a "trusted" broker.
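The monitoring advice above works best when metrics are split by data source, so a bad vendor shows up as its own line instead of dragging down the whole domain. The sketch below uses the spam-rate guidance Google publishes for bulk senders (stay under 0.1%, never reach 0.3%) and approximates that rate as spam reports divided by delivered messages, which is simpler than how Postmaster Tools measures it. The 2% bounce ceiling and the `SourceStats` fields are assumptions for illustration, not figures from any mailbox provider or from FiresideSender.

```python
from dataclasses import dataclass

# Google's published bulk-sender guidance: keep the user-reported spam rate
# below 0.1% and never reach 0.3%.
SPAM_RATE_WARN = 0.001
SPAM_RATE_MAX = 0.003
# Assumption: an illustrative bounce ceiling, not a published provider limit.
BOUNCE_RATE_MAX = 0.02


@dataclass
class SourceStats:
    source: str        # hypothetical label for a vendor or list
    delivered: int     # messages accepted by the receiving server
    bounced: int       # hard and soft bounces combined
    spam_reports: int  # complaints from feedback loops or postmaster data


def assess(stats: SourceStats) -> str:
    """Return 'pull', 'investigate', or 'ok' for one data source's campaign."""
    attempted = stats.delivered + stats.bounced
    bounce_rate = stats.bounced / attempted if attempted else 0.0
    # Approximation: spam reports over delivered mail.
    spam_rate = stats.spam_reports / stats.delivered if stats.delivered else 0.0
    if spam_rate >= SPAM_RATE_MAX or bounce_rate > BOUNCE_RATE_MAX:
        return "pull"
    if spam_rate >= SPAM_RATE_WARN:
        return "investigate"
    return "ok"


if __name__ == "__main__":
    campaigns = [
        SourceStats("opt-in webinar list", delivered=4900, bounced=20, spam_reports=2),
        SourceStats("legacy purchased list", delivered=5000, bounced=50, spam_reports=8),
        SourceStats("new intent-data vendor", delivered=4500, bounced=500, spam_reports=9),
    ]
    for stats in campaigns:
        print(f"{stats.source:<24} {assess(stats)}")
```

The campaign figures are made up; the point is that a new source with a 10% bounce rate gets flagged on its own even while older lists look healthy.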

Also: stop assuming that "personalization" requires deep scraping. A simple "I saw you at [event]" or "your recent post on [topic] resonated" is often enough—and that data comes from public, consensual interactions, not back-alley crawls.
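That lighter-touch approach can be encoded directly in a template: use a first-party signal when you have one and fall back to a plain greeting when you don't. In this Python sketch, `event` and `post_topic` are hypothetical fields assumed to come from your own records (event check-ins, posts the prospect shared publicly), never from a purchased enrichment feed; the event and topic values in the usage lines are made up.

```python
from typing import Optional


def opener(first_name: str, event: Optional[str] = None,
           post_topic: Optional[str] = None) -> str:
    """Build a first line from first-party, consensual signals only.

    `event` and `post_topic` are hypothetical fields from your own records;
    when neither is present, the line stays generic instead of guessing.
    """
    if event:
        return f"Hi {first_name}, I saw you at {event}."
    if post_topic:
        return f"Hi {first_name}, your recent post on {post_topic} resonated."
    return f"Hi {first_name},"


if __name__ == "__main__":
    print(opener("Sarah", event="RevOps Summit"))
    print(opener("Sarah", post_topic="pipeline hygiene"))
    print(opener("Sarah"))
```

Falling back to a generic line is a deliberate choice: a plain greeting costs a little relevance, while a wrong or creepy detail costs replies and complaints.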

The Unresolved Tension: Who Polices the Scrapers?

Publishers are screaming into a void. Licensing deals are driven less by recognition of value and more by platforms limiting legal exposure. Meanwhile, the brokers rebrand and keep extracting. The DUAA gives regulators more tools, but enforcement is still a question mark. And the email marketers who buy scraped data are the downstream consumers—you're the ones who get hit with fines, spam penalties, and reputational damage.

One publishing exec's comment hung in the air: "If the message is 'no crawl,' then they need to remember that no means no." But until the law matches that sentiment with teeth, the scraper economy will keep feeding your CRM. The question is whether you'll keep eating from that table—and whether your inbox placement will survive the meal.

The data scraping war isn't just a publisher problem. It's your deliverability problem. And the Data Use and Access Act 2025 just turned the heat up. Can email marketers afford to wait for regulation to catch up, or is it time to cut the scraped data cord now?
