System1 Star Rating Explained: How to Interpret the Numbers (2026)

May 9, 2026 · Updated May 5, 2026 · 7 min read

In short: System1's Test Your Ad measures the emotional response of 150 consumers to an ad and converts it into a Star Rating (1.0-5.9) that predicts long-term ROI. Ads scoring 5+ stars generate on average 3x ROI vs baseline. Three complementary metrics: Star Rating (long-term sales lift), Spike Rating (short-term sales lift), Fluency Rating (brand recognition). Paradigmatic examples: Aldi's Kevin the Carrot (5.9), Apple's "Mac vs PC", Volkswagen's "Lemon". Limits: less representative for Italy, B2B, and direct response.

What System1 is (and why it differs from other pretests)

System1 is a research company founded in 2000 (originally as BrainJuicer) by John Kearon. With Orlando Wood (co-author of System1's effectiveness research), the company developed Test Your Ad: a method of pretesting ads that measures consumers' non-conscious emotional response, drawing on Daniel Kahneman's System 1 / System 2 theory.

How it differs from traditional pretests (focus groups, recall tests, copy tests): System1 measures the feeling response to the film, the automatic reaction that fires before rational judgment. Traditional metrics capture what the consumer says they think (post-rationalization, often a poor predictor of sales); System1 captures what the consumer feels in the moment, which predicts memorability and sales lift.

Star Rating: 1.0-5.9 scale, right-brain test

The Star Rating is the main metric, on a scale from 1.0 (worst) to 5.9 (best). It is measured on 150 target consumers who watch the ad and complete a feeling test with a smiley-face interface covering seven emotions: very happy, happy, neutral, sad, angry, surprise, contempt.

Each emotion carries a weight, and the distribution of responses produces a normalized score. The output (1.0-5.9 stars) is calibrated against a database of 100,000+ ads and correlates with documented long-term sales lift (12-24 months post-air).
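
The weighting logic described above can be sketched in a few lines. Important caveat: the per-emotion weights and the linear rescaling to 1.0-5.9 below are invented for illustration; System1's real calibration is proprietary and database-driven.

```python
# Illustrative sketch of a System1-style feeling score.
# The EMOTION_WEIGHTS values and the [0,1] -> [1.0, 5.9] rescaling
# are ASSUMPTIONS for the example, not System1's actual model.

EMOTION_WEIGHTS = {      # hypothetical weight per emotion
    "very_happy": 1.0,
    "happy": 0.7,
    "surprise": 0.5,
    "neutral": 0.3,
    "sad": 0.1,
    "contempt": 0.05,
    "angry": 0.0,
}

def feeling_score(responses: dict[str, int]) -> float:
    """Convert a distribution of emotion picks (counts) into a 1.0-5.9 score."""
    total = sum(responses.values())
    if total == 0:
        raise ValueError("no responses")
    # Weighted mean of the panel's responses, in [0, 1].
    raw = sum(EMOTION_WEIGHTS[e] * n for e, n in responses.items()) / total
    # Linear rescale to the 1.0-5.9 range (assumed mapping).
    return round(1.0 + raw * 4.9, 1)

# Example: a mostly positive response from a 150-person panel.
panel = {"very_happy": 40, "happy": 60, "surprise": 15,
         "neutral": 25, "sad": 5, "contempt": 3, "angry": 2}
print(feeling_score(panel))  # prints 4.2
```

An all-positive panel maxes out at 5.9 and an all-negative one bottoms out at 1.0, matching the scale's bounds.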

Star Rating | % of ads | ROI multiplier vs baseline
1.0-1.9     | 35-40%   | 0.5-0.8x
2.0-2.9     | 30-35%   | 0.8-1.2x
3.0-3.9     | 15-20%   | 1.2-1.8x
4.0-4.9     | 8-12%    | 1.8-2.5x
5.0-5.9     | 3-5%     | 2.5-3.5x

The world's best ads (Cannes Lions Effectiveness winners) cluster in the 4.5-5.9 range. Most mass-market ads score 1.5-2.5: barely ROI-positive.
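
For use in a planning spreadsheet or script, the bands in the table can be expressed as a small lookup. The band values are copied from the table; the helper function is our own convenience, not a System1 API.

```python
# Map a Star Rating to the article's ROI-multiplier band.
# Band values come from the table above; the lookup is illustrative.

BANDS = [  # (star low, star high, share of ads, ROI multiplier vs baseline)
    (1.0, 1.9, "35-40%", "0.5-0.8x"),
    (2.0, 2.9, "30-35%", "0.8-1.2x"),
    (3.0, 3.9, "15-20%", "1.2-1.8x"),
    (4.0, 4.9, "8-12%",  "1.8-2.5x"),
    (5.0, 5.9, "3-5%",   "2.5-3.5x"),
]

def roi_band(star: float) -> str:
    """Return the expected ROI multiplier band for a Star Rating."""
    if not 1.0 <= star <= 5.9:
        raise ValueError("Star Rating is defined on 1.0-5.9")
    # Bands align with the integer part of the rating (1.x, 2.x, ... 5.x).
    return BANDS[int(star) - 1][3]

print(roi_band(5.9))  # prints 2.5-3.5x
```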

Spike Rating: short-term lift

Spike Rating measures the potential for immediate sales lift (days 1-14 after air). It is distinct from Star Rating in what drives it:

Ads with a high Spike Rating often rely on a scarcity message, a clear CTA, a time-limited promo, urgency. Ads with a high Star Rating often rely on positive feeling, a character, storytelling, an integrated brand.

The two can coexist, but not always. Cannes Effectiveness winners tend to pair a high Star with a medium Spike; direct-response winners (DR commercials) pair a high Spike with a low Star.

Fluency Rating: brand recognition

Fluency measures the percentage of viewers who correctly recognize the brand shown. It's critical because a brilliant ad without brand recognition produces the "vampire effect": it captures attention but doesn't build equity.

Alert threshold: Fluency < 70%. Below this level, the ad performs but the brand doesn't benefit. Optimal threshold: 80-90%.

Recurring patterns for high fluency: the brand present at the start AND the end, distinctive brand assets (color, sound, character) integrated into the narrative, and a brand voice congruent with the narrative tone.
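
The "fluency as a gate" logic, with the 70% alert and 80%+ optimal thresholds stated above, can be wrapped into a simple pre-air check. The thresholds come from the article; the verdict labels are our own illustration, not System1 terminology.

```python
# A pre-air sanity check: apply Fluency as a gate before reading Star Rating.
# Thresholds (70% alert, 4.0/3.0 star cutoffs) follow the article's bands;
# the returned labels are illustrative.

def creative_verdict(fluency_pct: float, star: float) -> str:
    """Fluency first, Star Rating second."""
    if fluency_pct < 70:
        # Vampire-effect risk: the ad performs but the brand doesn't benefit.
        return "fix branding first"
    if star >= 4.0:
        return "air it"
    if star >= 3.0:
        return "acceptable"
    return "rework creative"

print(creative_verdict(65, 5.5))  # prints fix branding first
```

Note that even a 5.5-star ad fails the check when fluency is below the gate, which is exactly the ordering the article recommends.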

Paradigmatic examples

Aldi UK "Kevin the Carrot" (Christmas 2016-2024). Star Rating 5.9 (maximum). Character (Kevin), positive storytelling, clearly Aldi brand. ROI documented in IPA Effectiveness Awards: market share lift +3.2 percentage points over 8 years.

Apple "Mac vs PC" (2006-2009). Star Rating range 4.5-5.5. Characters (Justin Long, John Hodgman), humorous tone, brand framing. Cemented Apple's "creative pro" positioning for over a decade.

Volkswagen "Lemon" (1959). Print ad, but analyzable retroactively: maximum System1 level for uniqueness + emotional response (humorous twist). Paradigmatic case of the DDB era.

IKEA "Lamp" (Spike Jonze, 2002). Star Rating 5.5+. Minimal storytelling (a lamp thrown away, an ironic turn at the end), positive feeling via subverted humor. Cannes Lions Grand Prix.

Amazon "Moving Day" / "Alexa Loses Voice" (2018-2019). Multiple Super Bowl ads with Star Rating 5.5+. Pattern: character, humor, integrated brand.

Framework limits

(1) Cultural bias. System1 has a primarily UK/US database. For Italy, Germany, Japan, calibrations are less robust. Anglo-Saxon humor ads may not transfer to the Italian market (and vice versa).

(2) Not for B2B. The framework is validated on consumer ads. For B2B SaaS, B2B industrial, professional healthcare, there is no robust benchmark.

(3) Not for direct response. Star Rating measures long-term sales lift. For DR ads (call center, e-commerce CTA), Spike Rating is more relevant: the Star Rating may be low while the DR ad is still ROI-positive.

(4) Cost. Test Your Ad costs €5-15k per ad. For small brands with limited production budgets, it can be an over-investment.

(5) Doesn't replace media planning. Star Rating 5.9 + weak media buy = no result. Star Rating + strategic reach = compound effect.

Low-budget alternatives for SMBs

(1) DIY pretest. Show the ad to 20-30 people in your target and ask: "On a scale of 1-7, how does this make you feel?". Average the scores. It isn't System1-calibrated, but it gives a useful direction when choosing between creative alternatives.
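
The DIY pretest above fits in a few lines of code: collect 1-7 feeling scores per creative alternative and rank by mean. The panel data below is hypothetical; as the article says, this is directional only, not System1-calibrated.

```python
# Minimal DIY pretest: rank creative alternatives by average feeling score.
# Scores are on the article's 1-7 scale; the panel data is made up.

from statistics import mean

def rank_creatives(scores: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Return (name, average score) pairs sorted best-first."""
    return sorted(((name, round(mean(s), 2)) for name, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical responses from a small target panel rating three rough cuts:
panel = {
    "cut_A": [6, 5, 7, 6, 4, 6, 5, 7, 6, 6],
    "cut_B": [4, 3, 5, 4, 4, 3, 5, 4, 4, 4],
    "cut_C": [5, 6, 5, 5, 6, 5, 4, 5, 6, 5],
}
print(rank_creatives(panel))  # best-to-worst
```

The top one or two cuts then move on to production, per the workflow below.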

(2) Facebook Ad Library benchmark. Compare your ad's CTR and completion rate with publicly visible sector benchmarks on Facebook. Above benchmark = the ad is performing.

(3) Google Ads Brand Lift Study. For Google Ads advertisers, a free integrated lift study (subject to a spend eligibility threshold). It measures brand-awareness lift for YouTube ads.

(4) Brain Boost / TestApe / similar low-cost tools. €500-3,000 per ad, with a limited panel but a similar framework. Less robust than System1, but useful for a directional choice.

(5) Agency creative pretest. Many boutique agencies offer internal pretesting (focus groups, qualitative research) as part of their creative pricing.

Integrated workflow for SMBs

  1. Concept development: 3-4 creative alternatives.
  2. DIY pretest: 20-30 target people on feeling response. Select top 1-2.
  3. Production: rough cut or finished depending on budget.
  4. Paid test (System1 or alternative): final validation before air.
  5. Airtime: with brand search, social mention, sales monitoring.
  6. Post-campaign: comparison of predicted Star Rating vs effective sales lift. Internal model calibration.
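
Step 6 of the workflow, comparing predicted scores with observed results, can be kept as a tiny calibration log. The campaign data, the midpoint-based "internal model", and the error metric below are all our own illustrative choices.

```python
# Step-6 sketch: track predicted Star Rating vs observed ROI per campaign
# and measure how far off the internal model is. All data is hypothetical.

campaigns = [  # (predicted Star Rating, observed ROI multiplier)
    (2.1, 0.9),
    (3.4, 1.5),
    (4.8, 2.3),
]

# Naive internal model: the midpoint of each ROI band from the Star Rating
# table earlier in the article (our simplification, not System1's model).
ROI_MIDPOINTS = {1: 0.65, 2: 1.0, 3: 1.5, 4: 2.15, 5: 3.0}

def predict(star: float) -> float:
    """Predict an ROI multiplier from a Star Rating via band midpoints."""
    return ROI_MIDPOINTS[int(star)]

def mean_abs_error(history, model) -> float:
    """Average gap between predicted and observed ROI across campaigns."""
    errors = [abs(model(star) - roi) for star, roi in history]
    return round(sum(errors) / len(errors), 2)

print(mean_abs_error(campaigns, predict))
```

As real post-campaign data accumulates, the midpoints can be nudged toward what the brand actually observes, which is the "internal model calibration" the workflow refers to.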

FAQ

Can I trust System1 for creative budget decisions?

Directionally, yes. Star Rating has a documented correlation with sales lift across a 100k+ ad database. It's not a perfect oracle: 4-star ads can fail and 2-star ads can win on specific factors (media buy, distribution, competitors). But used as one of several decision criteria, it significantly reduces the risk of a "disaster ad".

Does System1 work for social-first short-form ads?

Yes: System1 has an adapted version for TikTok/Reels short-form content, with a smaller database and less consolidated calibration. For social-heavy SMBs it is indicative, not a gold standard.

How to choose between Star Rating and Spike Rating as priority metric?

It depends on the campaign objective. Brand building (long-term): Star Rating priority. Direct response/time-limited promo: Spike Rating. Mixed campaigns: monitor both, target Star > 3 + Spike present.

Does low Star Rating mean guaranteed "fail"?

No, but it signals probable underperformance. Ads with Star 1-2 generate on average 0.5-1x ROI (essentially, the brand spends money without generating lift). The decision: review before air; rework the edit, the music, the ending.

How important is Fluency Rating vs Star Rating?

Fluency is a gate: below 70% Star Rating doesn't translate to brand benefit. Above 70% Fluency, Star Rating becomes the main driver. Pattern: optimize fluency first (brand presence, distinctive assets), then Star Rating (feeling response).

Can I use System1 patterns without doing the test?

Yes, partially. The principles (character presence, positive atmosphere, integrated brand, memorable ending, distinctive music) are replicable in concept phase even without formal testing. The test validates; the principles guide creation.

