Dark Experiment · 2025-11-12 · 9 min read

Experiment 000: We predicted political affiliation from emoji use alone — 87.5% accuracy

Nobody thinks twice about which emoji they use. That indifference is the point. We trained a neural network exclusively on emoji behaviour — not what people said, not who they followed, not where they live — and predicted UK political affiliation with 87.5% accuracy on a held-out evaluation set of 2,000 profiles. The classifier never read a single word of content. It only watched which symbols people reached for, how often, and in what patterns.

Why emoji

The choice was deliberate. Emoji are low-stakes. People deploy them without the self-censorship that shapes word choice — they are affective shortcuts, not considered statements. They are also consistent: the same person tends to reach for the same symbols across contexts and platforms. And crucially, they are universally logged. Every platform captures them. Every API surfaces them. They are signal hiding in plain sight, treated as noise by almost everyone except advertising infrastructure.

The dataset

10,000 public social media profiles with declared or strongly inferred UK political affiliation — Labour, Conservative, Liberal Democrat, Reform, and Green — drawn from accounts that had publicly engaged with verified party content, used party hashtags, or self-identified in bios. Profiles with fewer than 200 posts were excluded. No private data. No scraping beyond public API limits. 2,000 additional profiles were held back entirely for final evaluation.

From each profile we extracted every emoji used across their post history: the emoji itself, its position in the post, frequency of use overall, repetition within a single post (e.g. 😂😂😂 vs a single 😂), and whether it appeared in isolation or alongside other emoji types. No text. No metadata. Emoji signals only.
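A minimal sketch of that extraction step. The code-point-range emoji check and the field names are illustrative assumptions, not our actual pipeline — a production extractor would use a full emoji library rather than two Unicode blocks:

```python
from collections import Counter
from dataclasses import dataclass

def is_emoji(ch: str) -> bool:
    # Illustrative detection: two common emoji Unicode blocks only.
    cp = ord(ch)
    return (0x1F300 <= cp <= 0x1FAFF) or (0x2600 <= cp <= 0x27BF)

@dataclass
class PostFeatures:
    counts: Counter   # per-emoji frequency within the post
    max_run: int      # longest consecutive repeat of one emoji (😡😡😡 -> 3)
    density: float    # emoji per character of post text

def extract(post: str) -> PostFeatures:
    emojis = [c for c in post if is_emoji(c)]
    counts = Counter(emojis)
    # Longest consecutive run of the same emoji.
    max_run, run, prev = 0, 0, None
    for c in post:
        if is_emoji(c) and c == prev:
            run += 1
        elif is_emoji(c):
            run = 1
        else:
            run = 0
        prev = c
        max_run = max(max_run, run)
    density = len(emojis) / max(len(post), 1)
    return PostFeatures(counts, max_run, density)

f = extract("so angry 😡😡😡 today 😂")
# f.counts["😡"] == 3, f.max_run == 3
```

Per-profile features are then aggregates of these per-post values across the whole post history.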

Scoring system

Every emoji scored on 10 dimensions

Each emoji in the Unicode set was manually scored from 0.0 to 1.0 across ten emotional and behavioural dimensions: humour, anger, silliness, smugness, empathy, contempt, sadness, irony, mockery, and solidarity. Scores were assigned by a panel and cross-validated for consistency. A laughing-crying face scores high on humour and mockery, low on empathy. A raised fist scores high on solidarity and anger, low on humour. The resulting ten-dimensional vector for each emoji forms the model's input vocabulary.
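The scoring table looks like this in sketch form. The numeric values below are invented for illustration — the real values came from the panel and are not reproduced here:

```python
DIMENSIONS = ["humour", "anger", "silliness", "smugness", "empathy",
              "contempt", "sadness", "irony", "mockery", "solidarity"]

# Hypothetical panel scores (0.0-1.0 per dimension), for illustration only.
EMOJI_SCORES = {
    "😂": dict(humour=0.9, anger=0.0, silliness=0.7, smugness=0.2, empathy=0.1,
               contempt=0.1, sadness=0.0, irony=0.4, mockery=0.6, solidarity=0.1),
    "✊": dict(humour=0.0, anger=0.6, silliness=0.0, smugness=0.1, empathy=0.3,
               contempt=0.1, sadness=0.1, irony=0.0, mockery=0.0, solidarity=0.9),
}

def vectorise(emoji: str) -> list[float]:
    """Turn one emoji's score dict into a fixed-order 10-d vector."""
    scores = EMOJI_SCORES[emoji]
    return [scores[d] for d in DIMENSIONS]
```

Fixing the dimension order matters: every emoji maps to the same 10-d coordinate system, so profiles can be compared and aggregated directly.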

The network

A straightforward feed-forward neural network with three hidden layers. Input: a per-profile feature vector encoding aggregate emoji scores across all ten dimensions, plus two structural features — mean emoji repetition rate (how many of the same emoji appear consecutively or within a single post) and overall emoji density (emoji per post, normalised by post length). Output: a probability distribution across five party labels. No attention mechanism. No transformer. Deliberately simple — we wanted to know what a basic classifier could do with clean features, not what a state-of-the-art model could extract from raw sequences.
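The forward pass can be sketched in NumPy. Hidden-layer widths and the ReLU/softmax choices are assumptions — the post above specifies only "three hidden layers", a 12-feature input (ten dimension aggregates plus two structural features), and a five-way output:

```python
import numpy as np

rng = np.random.default_rng(0)

# [input, hidden x3, output]; widths are illustrative guesses.
SIZES = [12, 64, 64, 32, 5]

weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(SIZES, SIZES[1:])]
biases = [np.zeros(b) for b in SIZES[1:]]

def forward(x: np.ndarray) -> np.ndarray:
    """ReLU hidden layers, softmax over the five party labels."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)        # ReLU
    logits = h @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

profile = rng.random(12)   # fake per-profile feature vector
p = forward(profile)       # probability distribution over five parties
```

Untrained weights, obviously — the point is the shape of the model, not its parameters.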

Trained on 10,000 profiles, 80/20 train/validation split during training, then evaluated cold on the held-out 2,000. We ran five independent training runs with different random seeds and averaged results to avoid reporting a lucky outcome.
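The evaluation protocol, sketched with a stand-in for the training run (the `run_fn` signature is our illustration, not the real harness):

```python
import random

def split_train_val(profiles, seed, val_frac=0.2):
    """Reproducible 80/20 train/validation split for one seed."""
    rng = random.Random(seed)
    shuffled = profiles[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_frac))
    return shuffled[:cut], shuffled[cut:]

def mean_heldout_accuracy(run_fn, profiles, heldout, seeds=(0, 1, 2, 3, 4)):
    """Average held-out accuracy over five independent seeds,
    so a single lucky run can't be reported as the result."""
    accs = []
    for seed in seeds:
        train, val = split_train_val(profiles, seed)
        accs.append(run_fn(train, val, heldout, seed))
    return sum(accs) / len(accs)
```

The held-out 2,000 profiles never enter `split_train_val` at all; they are only ever passed to the final evaluation.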

Evaluation result

87.5% accuracy. 2,000 unseen profiles. Emoji only.

Across five runs, mean accuracy on the held-out set was 87.5%, ranging from 86.1% to 88.9%. Random chance across five parties is 20%. A naive majority-class baseline (predicting Labour for every profile) scores around 34% on our dataset. The classifier outperforms that baseline by roughly 53 percentage points using nothing but how people use emoji.

What the network found

The most predictive features were not what we expected going in. Aggregate anger score mattered, but repetition rate was the single strongest individual predictor — more than the emotional content of the emoji themselves. Left-leaning profiles showed significantly higher repetition: the same emoji deployed multiple times within a single post, a pattern that appears to amplify emotional intensity (😡😡😡 rather than 😡). Right-leaning profiles favoured lower overall emoji density — fewer emoji per post on average — and when emoji were used, they skewed toward high-humour, high-mockery, and high-smugness scores.

Key findings by affiliation

Left-leaning profiles

Higher anger dimension scores
Higher solidarity and empathy scores
Significantly higher repetition rate
Higher emoji density per post
Narrower emoji vocabulary (same symbols reused)

Right-leaning profiles

Higher humour and mockery scores
Higher smugness dimension scores
Lower repetition rate overall
Lower emoji density — fewer, more deliberate
Broader emoji vocabulary across posts

Green and Lib Dem profiles showed the highest empathy and sadness scores. Reform profiles were the most distinctive: low emoji density combined with the highest contempt scores of any group, making them the easiest cluster to separate from the others — the classifier was most confident on these labels. Labour and Conservative profiles showed the most overlap and were the hardest to distinguish from each other in borderline cases.

What this signal actually is

Emoji use is a proxy for emotional expression style. Emotional expression style is correlated — not perfectly, but significantly — with psychological traits like openness, agreeableness, and neuroticism. Those traits, in turn, are among the most robust predictors of political orientation in the academic literature (the Big Five / OCEAN model). The classifier isn't doing anything mystical. It is finding the downstream shadow of a well-established psychological relationship, expressed in a signal people never thought to guard.

The implication

You are not being profiled on what you say. You are being profiled on how you say it.

Content moderation, platform algorithms, and your own instincts all focus on the words. The signal extraction that actually matters operates at a layer below that — on the style, the rhythm, the affective texture of communication. Emoji are one example. Sentence length distribution, punctuation patterns, posting time variance, and reply latency are others. None of these feel like personal data. All of them are.

Good actors, bad actors

A classifier like this one has legitimate uses. Detecting coordinated inauthentic behaviour — bot networks trained to mimic human emotional patterns — becomes significantly easier when you can fingerprint the affective signature of genuine human communication at scale. If a thousand accounts claiming to be ordinary voters share identical emoji density and repetition profiles, that is a statistical anomaly worth investigating. Platform trust and safety teams could use exactly this kind of signal.
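A toy version of that trust-and-safety check — flag any group of accounts sharing a near-identical stylistic fingerprint. The grid size and cluster threshold are illustrative; real tuning would differ:

```python
from collections import defaultdict

def fingerprint(density: float, repetition: float, grid=0.01):
    """Quantise an account's (emoji density, repetition rate) onto a coarse grid."""
    return (round(density / grid), round(repetition / grid))

def suspicious_clusters(accounts, min_size=50):
    """Return groups of accounts whose fingerprints collide.
    accounts: iterable of (name, density, repetition) tuples."""
    buckets = defaultdict(list)
    for name, density, repetition in accounts:
        buckets[fingerprint(density, repetition)].append(name)
    return [names for names in buckets.values() if len(names) >= min_size]

# 60 bots with identical style metrics amid organic variation:
accounts = [(f"bot{i}", 0.142, 0.30) for i in range(60)] + \
           [(f"user{i}", 0.1 + i * 0.003, 0.2 + i * 0.005) for i in range(40)]
flagged = suspicious_clusters(accounts)
```

Genuine humans scatter across the grid; coordinated accounts trained on the same template collapse into one cell.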

The bad actor version is less comfortable to describe but more likely to be deployed at scale. Political micro-targeting already uses declared interests, location, and browsing behaviour. Adding a behavioural political classifier derived from emoji — which works without any declared or consciously provided data — extends that targeting to people who have taken active steps to avoid being profiled. People who don't follow political accounts. People who don't engage with political content. People who think they're invisible. They are not.

Adversarial use case

Targeting the politically undecided — without their knowledge

The most valuable voters in any election are the genuinely undecided. Traditional targeting misses them because they produce no clear political signal. A classifier trained on stylistic patterns rather than content finds them anyway — and can identify which emotional levers are most likely to move them, based on their dimensional emoji profile. This is not a future risk. The data pipeline for this costs less than a junior analyst's monthly salary. The compute costs under £200 to run. Anyone with access to a platform's public API and three weeks of development time can build it.

The broader point

We chose emoji because no one takes them seriously. That was the experiment. The finding — 87.5% — is not the interesting part. The interesting part is the principle it demonstrates: any signal that encodes emotional expression at scale is a political profiling input. The more innocuous it feels, the more useful it is to a bad actor, because people don't think to sanitise it.

The signals you consciously curate — what you post, who you follow, what you publicly endorse — are the ones you have some control over. The signals that classify you most accurately are the ones you've never considered as signals at all. Emoji today. Typing cadence tomorrow. Scroll velocity the day after that. The input vector for political profiling is expanding faster than the awareness that it exists.

Defensive posture

You can't sanitise what you don't know is a signal

Individual defence against this class of attack is largely impractical — you cannot consciously randomise your emoji patterns without making all your communication feel alien. The real lever is regulatory: behavioural fingerprinting for political targeting should require the same consent and disclosure as any other political data use. It currently does not. Organisations deploying this — and they are — face no specific restriction on derived political inference from apparently neutral behavioural data. That gap is the vulnerability. Not the emoji.