Vectors Won't Save You From GDPR. They Just Move The Problem.

Jun 2
Anyone who has worked in the ad industry for a long time knows it has a habit of reinventing the same idea under a new name and hoping regulators won't notice. First it was cookies, then device graphs, then identity resolution, and finally hashed emails dressed up as "privacy-safe IDs." Each cycle came with the same pitch—the new format was different enough from the old one to fall outside the rules that had constrained it.

Now it's embeddings. Sure, the new pitch is more elegant than the ones before it. No emails, no names, no IDs — just math. What's not to like? A user becomes a position in a high-dimensional space, and the system reasons about that position without ever needing to know who the person is. For a moment, it feels like a clean break. Headline: it's not.

In my last two pieces, I argued that the segment was giving way to the vector and that measurement had to follow. Sure, that technical shift is real. But there's a quieter argument running underneath it that the industry has not yet had honestly: If the vector is the new substrate of the system, is it also the new substrate of the privacy problem? The answer, uncomfortably, is yes.

The Architecture that Got Us Here

Give embeddings their due. As a piece of engineering, they are a genuine leap. For those unfamiliar with the term, embeddings are numerical representations of data (text, images, or audio, for example) that capture semantic meaning and relationships in a machine-readable vector format. They allow AI systems to compare, search, and reason over information based on contextual similarity rather than exact matches or keywords. The appeal from a privacy perspective is that embeddings can reduce direct exposure of personally identifiable information (PII) and limit how raw data is shared across systems.

Using embeddings, a user is thus no longer represented by a discrete identifier matched against a discrete segment. They become a dense numerical position — proximity to "high-value auto intender," adjacency to "frequent traveler," drift toward "luxury retail affinity." Decisions are made by similarity rather than by lookup. Models generalize across platforms, while optimization adapts continuously. This is the foundation of the agentic systems now emerging across the buy side, and it is the reason the format is winning.
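
To make the contrast between lookup and similarity concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the three-dimensional vectors, the anchor names, and the helper functions are made up for the example and do not come from any real platform.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical anchor vectors for three audience concepts (made-up numbers).
ANCHORS = {
    "auto_intender": [0.9, 0.1, 0.0],
    "frequent_traveler": [0.1, 0.9, 0.1],
    "luxury_retail": [0.0, 0.2, 0.9],
}

def segment_lookup(user_id, segments):
    """Segment model: a discrete ID is matched against discrete membership lists."""
    return sorted(name for name, members in segments.items() if user_id in members)

def nearest_concepts(user_vector, anchors=ANCHORS):
    """Vector model: rank concepts by similarity; no identifier is consulted."""
    scores = {name: cosine(user_vector, vec) for name, vec in anchors.items()}
    return sorted(scores, key=scores.get, reverse=True)

user_vector = [0.8, 0.3, 0.1]  # hypothetical user position
ranking = nearest_concepts(user_vector)
print(ranking)
```

The second function never sees a user ID, yet it still produces a per-user ranking that a bidder could act on.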

Out of that genuine leap, however, the industry has manufactured a misconception that is now hardening into received wisdom. Namely, abstraction is being confused with anonymization. Sorry to break it to you, but they are not the same thing.

The EU's GDPR (General Data Protection Regulation), a law governing how personal data is collected, stored, processed, and protected, was not written for cookies, or for hashes, or for tensors. It was written to be technology-agnostic by design, and its core test has nothing to do with the data structure in question. The test is whether the data relates to an identifiable individual, directly or indirectly. The way the regulation is written, if a system can single out a user, link behavior across contexts, or infer attributes about that user, it is by definition processing personal data. The format of the data is irrelevant. What matters is what the format makes possible.

Embeddings, at least the way they are used in most real-world AdTech implementations, qualify as personal data, and working with them qualifies as processing it. In other words, they may not look like identity, but they certainly function like it.

Where the New Architecture Goes Blind

Sometimes it's difficult to believe the industry spent a decade solving for identity resolution. In a vector-native world, identity is no longer the only anchor. Meaning is one too. And meaning, it turns out, is just as persistent, just as linkable, and just as actionable as the identifiers it replaced. This is where we run into issues with GDPR and other privacy laws.

Consider that a user's position in semantic space is stable enough to be recognized across sessions, evolving enough to be tracked over time, and specific enough to be acted upon at the impression level. If you can consistently recognize and act on the same individual through their vector representation, you have not removed identity. You have simply changed the coordinate system.
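
A rough sketch of what "recognized across sessions" can mean in practice, using only synthetic data. The profile names, the 32-dimensional vectors, the drift level, and the 0.95 threshold are all assumptions chosen for illustration. Real embeddings drift in less tidy ways, so this shows the mechanism rather than measuring real-world risk.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link_sessions(known, observed, threshold=0.95):
    """Match each observed vector to its most similar known profile, if close enough."""
    links = {}
    for obs_key, obs_vec in observed.items():
        best = max(known, key=lambda k: cosine(known[k], obs_vec))
        if cosine(known[best], obs_vec) >= threshold:
            links[obs_key] = best
    return links

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical profiles stored from an earlier session.
known = {f"profile_{i}": [rng.gauss(0, 1) for _ in range(32)] for i in range(50)}

# A later session: the same users, slightly drifted vectors, no identifiers attached.
observed = {
    f"anon_{i}": [x + rng.gauss(0, 0.05) for x in known[f"profile_{i}"]]
    for i in range(50)
}

links = link_sessions(known, observed)
correct = sum(links.get(f"anon_{i}") == f"profile_{i}" for i in range(50))
print(f"{correct} of 50 anonymous vectors linked back to the right profile")
```

No email, cookie, or device ID appears anywhere in the sketch. The vector alone is enough to single out the same synthetic user twice, and singling out is exactly what the regulation's test asks about.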

This is the same pattern that played out with pseudonymization a decade ago. Hashed emails, device graphs, probabilistic IDs—each was offered up to regulators as not directly identifiable. Each was met with the same response: that didn't matter because the data could ultimately be linked back or used to profile. Embeddings are therefore the next iteration of that same argument, expressed in vectors instead of strings. Under GDPR, pseudonymous data remains fully in scope. Case closed.

Moreover, the substantive downstream activities haven't really changed either. Audience segmentation, lookalike modeling, personalization, bid optimization—these are the things AdTech does, and embeddings make them more efficient, not different. Rigid segments become fluid representations, binary inclusion becomes probabilistic similarity, and static rules become adaptive systems. From a regulatory standpoint, this is still profiling, and in many cases automated decision-making, both of which are explicitly regulated under GDPR regardless of whether traditional identifiers are present.

The Mechanism Has Changed. The Outcome Has Not.

There is a more subtle problem underneath the obvious one. In traditional systems, personal data was explicit, visible, and auditable: you could point to the row in the table. In vector-based systems, it becomes compressed, distributed, and latent. Recent regulatory and academic work has shown that personal information can be recovered from models through membership inference, model inversion, and attribute prediction. Personal data doesn't disappear inside a model. Rather, it becomes harder to detect and harder to govern. For regulators, that's an escalation, not a reduction in risk.
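
Of the three techniques, attribute prediction is the easiest to illustrate. The toy sketch below uses synthetic embeddings in which an undisclosed attribute shifts a few dimensions. The dimensions, the shift size, and the nearest-centroid attacker are all assumptions made for the example. Attacks on real trained models are considerably more involved.

```python
import random

DIM = 16

def make_user(rng, has_attribute):
    """Synthetic embedding; the hidden attribute shifts the first four dimensions."""
    vec = [rng.gauss(0, 0.5) for _ in range(DIM)]
    if has_attribute:
        for i in range(4):
            vec[i] += 1.5
    return vec

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(1)  # fixed seed so the run is repeatable

# A small labeled sample the attacker is assumed to hold.
labeled = [(make_user(rng, flag), flag) for flag in [True, False] * 25]
pos = centroid([v for v, f in labeled if f])
neg = centroid([v for v, f in labeled if not f])

# Users whose attribute was never disclosed; the labels are kept only for scoring.
targets = [(make_user(rng, flag), flag) for flag in [True, False] * 100]
guesses = [sq_dist(v, pos) < sq_dist(v, neg) for v, _ in targets]
accuracy = sum(g == f for g, (_, f) in zip(guesses, targets)) / len(targets)
print(f"attribute guessed correctly for {accuracy:.0%} of users")
```

The attribute is stored nowhere as a field. It exists only as geometry in the vectors, which is why a row-level audit would not find it.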

Importantly, this isn't a theoretical edge case. It's already embedded in the core of the ecosystem—in retail media networks compressing first-party CRM data into targeting layers, in clean rooms using vector similarity to match audiences across environments, in CTV and cross-device systems building household-level embeddings as persistent identity proxies, and in the agentic buying systems now optimizing toward inferred user states in real time. In each case, the pattern is identical: abstraction increases, interpretability decreases, but decisioning remains tied to individuals. The entire system remains squarely within regulatory scope.

Closing the Compliance Gap

The industry's recurring bet is that if it changes how data is represented, it changes how the data is regulated. History keeps disagreeing. Regulation follows capability, not format, and impact, not implementation. Cookies weren't the problem, nor were identifiers. The problem was always the same: the ability to observe, model, and influence individuals at scale. Embeddings do not remove that ability. If anything, they enhance it.

There is a meaningful shift happening underneath this. But it isn't a regulatory escape hatch. Rather, it's a shift in where governance has to operate. The locus of personal data is moving from rows and tables to models and latent space, from explicit identifiers to inferred representations, and from deterministic matching to probabilistic alignment. None of this makes the data less personal—it only makes it more difficult to find.

The appropriate response to this shift is not better marketing of the latest abstraction gambit. I would argue what is needed is a different compliance posture—privacy-preserving techniques like differential privacy applied at the model layer, stricter separation between identity and representation, clearer purpose limitation, and rigorous model-level auditing that can interrogate what a model has learned and what it can recover. You don't get compliance by abstracting the data. You get it by redesigning the system itself.
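
As one example of what differential privacy at the model layer looks like, here is a minimal sketch of the clip-and-noise step used in DP-SGD-style training: each example's gradient is clipped to a maximum norm, the clipped gradients are summed, and Gaussian noise scaled to that norm is added before averaging. The function names and parameter values are assumptions for illustration. The sketch omits privacy accounting entirely, so on its own it says nothing about the guarantee actually achieved.

```python
import math
import random

def clip(vec, max_norm):
    """Scale a vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm  # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]

rng = random.Random(42)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients

# With no noise, the result is simply the mean of the clipped gradients.
no_noise = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise, any single example's influence is bounded and then masked.
with_noise = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(no_noise, with_noise)
```

Clipping bounds how much any one person's data can move the model, and the noise hides what remains. That is a design constraint on training itself, not a property of the data format.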

Embeddings feel like a loophole because they obscure the data. But love it or hate it, GDPR was built to look past the surface. If a system can recognize, evaluate, or influence an individual, even probabilistically, it is by definition processing personal data. Vectors don't eliminate the privacy problem. They simply relocate it into a layer that is more powerful, less transparent, and significantly harder to regulate.

The cookie had its era. The hash had its era. The vector's era is just beginning — and it is arriving with the same regulatory weight the previous two carried, not less. The real question now isn't whether embeddings bypass GDPR. It's whether the industry is ready for the fact that they don't, and never will.

--------------------------------

Evgeny Popov is a New York-based, internationally accomplished global media executive with over 25 years of experience spanning APAC, LATAM, MENA, EMEA, and the Americas. He is known for his strong focus on building teams, products (SAAS · DAAS · IAAS · PAAS), and technologies that unlock growth for businesses ranging from startups to large enterprises. Over his career he has worked across the sell side, the buy side, and media agencies, and with various tech and data vendors worldwide.
