What is A/B testing and how does it work?

A/B testing (or split testing) is the method of comparing two versions of an element (page, ad, email, CTA) to determine which performs better. Traffic is split randomly between version A (control) and version B (new variant). The version with the higher conversion rate wins, but only if the difference is statistically significant.
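The random split described above is often implemented by hashing a visitor ID, so that the same visitor always sees the same version. The following is a minimal sketch of that idea, not the method of any particular tool; the function name `assign_variant`, the experiment label, and the visitor IDs are all hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-headline") -> str:
    """Return "A" or "B" for a visitor, stable across repeat visits."""
    # Hash the (hypothetical) experiment name together with the visitor ID,
    # so that separate experiments split the same audience independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "A" if bucket < 50 else "B"  # 50/50 split

# The same visitor always lands in the same variant.
print(assign_variant("visitor-123") == assign_variant("visitor-123"))  # True
```

Hashing instead of drawing a fresh random number on each page view keeps a returning visitor from seeing both versions, which would blur the comparison.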

How long should an A/B test run?

Minimum 2 weeks, ideally 4 weeks. Don't stop a test after 3 days and 100 sessions: at that volume the results cannot be statistically significant. As a rule of thumb, you need at least 100 conversions per variant to draw valid conclusions. If your traffic is low (under 500 sessions/day), A/B tests take much longer and may not be worth running.
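To see how these rules combine, here is a rough sketch that estimates test duration from daily traffic and conversion rate. It assumes an even traffic split, uses the 100-conversions-per-variant rule of thumb and the 2-week minimum from above, and rounds up to whole weeks; the function names and the 2% conversion rate in the example are illustrative assumptions.

```python
import math

def days_needed(daily_sessions, conversion_rate,
                conversions_per_variant=100, variants=2):
    """Days until each variant reaches the target number of conversions."""
    # Assumes traffic is split evenly between the variants.
    daily_conversions_per_variant = daily_sessions / variants * conversion_rate
    return math.ceil(conversions_per_variant / daily_conversions_per_variant)

def recommended_duration(daily_sessions, conversion_rate, minimum_days=14):
    """Apply the 2-week minimum and round up to complete weeks."""
    days = max(days_needed(daily_sessions, conversion_rate), minimum_days)
    return math.ceil(days / 7) * 7

# 500 sessions/day at an assumed 2% conversion rate:
# 5 conversions per variant per day, so 20 days, rounded up to 3 full weeks.
print(recommended_duration(500, 0.02))  # 21
```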

What element should I test first: CTA, headline, or design?

Start with elements that have the highest potential impact: 1) Page headline or main heading — influences first impression. 2) CTA (call-to-action button) — text, color, and position. 3) Lead capture form or checkout flow. Test one element at a time; testing multiple elements simultaneously makes it impossible to identify the cause.

What does statistical significance mean in A/B testing?

Statistical significance (usually set at 95%) means that, if there were no real difference between the variants, a gap as large as the one you observed would show up by chance less than 5% of the time. Free online significance calculators show you when you can declare a winner. Avoid declaring a winner before reaching statistical significance: you might implement a change that doesn't actually work.
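One common way to check significance for conversion rates is a two-proportion z-test. The sketch below shows that calculation using only the Python standard library; it is one standard approach, not necessarily what any given online calculator uses, and the function name and example numbers are made up for illustration.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Same conversion rates (2.0% vs 2.8%), very different sample sizes.
small = two_proportion_p_value(10, 500, 14, 500)
large = two_proportion_p_value(100, 5000, 140, 5000)
print(small < 0.05)  # False: not significant at the 95% level
print(large < 0.05)  # True: significant at the 95% level
```

The identical lift is noise at 500 sessions per variant and a real signal at 5,000, which is why volume matters more than the size of the observed difference.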

Can I do A/B testing on ads in Google Ads or Meta Ads?

Yes. Google Ads has the Experiments feature, which lets you test entire campaigns or ad groups. Meta Ads has an A/B Test feature built into Ads Manager. For landing page tests, dedicated tools such as Optimizely or VWO are the usual choice (Google Optimize was discontinued in September 2023). Ad tests should run simultaneously, not one after the other, to eliminate seasonality effects.

How to Run A/B Tests Correctly: Without Wasting Traffic or Drawing Wrong Conclusions

Why Most A/B Tests Are Useless

80% of tests we see in client reports ran for 3 days with 200 sessions and declared a winner at +2%. That's not A/B testing — it's statistical noise. An incorrect test is worse than no test: it gives false certainty and leads to decisions that hurt conversion rates.

Minimum Requirements for a Valid Test

95% statistical significance: if the variants truly performed the same, a difference this large would appear by chance at most 5% of the time.
Minimum 100 conversions per variant: if your site does 10 conversions/week in total, a two-variant test needs about 20 weeks, not 3 days.
One variable at a time: headline, image, or CTA button. Not all three simultaneously.

Use a sample size calculator before starting. Tools like Evan Miller's calculator estimate how many visitors you need per variant for a given baseline conversion rate and minimum detectable lift. Most Romanian e-commerce sites with under 5,000 sessions/month can only validly test 1–2 elements per quarter.
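For readers who want to see what such a calculator does under the hood, here is a sketch based on the standard sample size formula for comparing two proportions. The defaults (95% significance, 80% power) and the example inputs are assumptions, and the result may differ slightly from any particular online calculator.

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Assumed example: 2% baseline conversion rate, aiming to detect a 20% lift.
print(visitors_per_variant(0.02, 0.20))
```

With these assumed inputs the answer is on the order of 21,000 visitors per variant, which shows why a site with a few thousand sessions per month can only test large changes.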

Types of Tests and When to Use Each

Classic A/B test

50% of traffic sees variant A, 50% sees variant B. Best for major changes: new headline, new offer, completely different page structure. Requires the least traffic of the three test types and gives the clearest signal.

Multivariate testing

Tests combinations of variables simultaneously (for example, 2 headlines × 2 images × 2 CTAs = 8 combinations). Needs 4–8x more traffic than a classic A/B test. Only makes sense for sites with 50,000+ sessions/month.
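The count of 8 comes from two versions of each of three elements. A short sketch, with placeholder variant names, shows how the combinations multiply and how thin the traffic per combination becomes:

```python
from itertools import product

def build_combinations(*element_versions):
    """Return every combination of one version per element."""
    return list(product(*element_versions))

def sessions_per_combination(monthly_sessions, combinations):
    # Traffic is split evenly, so each combination gets an equal share.
    return monthly_sessions // len(combinations)

# Placeholder names: two versions of each element.
headlines = ["headline_1", "headline_2"]
images = ["image_1", "image_2"]
ctas = ["cta_1", "cta_2"]

combos = build_combinations(headlines, images, ctas)
print(len(combos))                               # 8
print(sessions_per_combination(50_000, combos))  # 6250
```

Adding a third version of any one element raises the count from 8 to 12, so the traffic requirement grows quickly.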

Split URL testing

Traffic split between two different URLs, not variants of the same page. Right for completely redesigned pages or new landing pages where you want a clean comparison.

What's Worth Testing

Focus on high-impact elements: hero headline, main offer, primary CTA, contact form length, social proof positioning, price/pricing structure. Don't waste traffic volume testing button colour, font, or spacing — the effect size is too small to measure reliably.

Classic Mistakes

Stopping early — if after 2 days variant B shows 15% more conversions, the temptation is to declare it a winner. With low volume, a 15% difference is statistical noise. Set the test duration before starting and don't stop early.
Running multiple tests simultaneously — each active test consumes the same traffic pool and can interfere with others.
Ignoring weekly cycles and seasonality: a test that runs only Mon–Thu is not valid. Run tests for at least 2 complete weeks so that weekday and weekend behaviour are both covered.
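The cost of stopping early can be illustrated with a small simulation. The sketch below runs A/A tests, in which both variants are identical, and compares how often a "winner" appears if you check the result every day against checking only once at the end. All parameters (runs, days, visitors per variant, conversion rate, seed) are arbitrary assumptions chosen for illustration.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_false_positives(runs=300, days=14, daily_visitors=200,
                             rate=0.05, seed=1):
    """Return (share of runs flagged by daily peeking, share flagged at the end)."""
    rng = random.Random(seed)
    flagged_when_peeking = 0
    flagged_at_end = 0
    for _run in range(runs):
        conv_a = conv_b = n_a = n_b = 0
        any_peek_significant = False
        for _day in range(days):
            # Both variants share the same true rate, so any "winner" is false.
            conv_a += sum(rng.random() < rate for _v in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _v in range(daily_visitors))
            n_a += daily_visitors  # daily_visitors is per variant
            n_b += daily_visitors
            if p_value(conv_a, n_a, conv_b, n_b) < 0.05:
                any_peek_significant = True
        flagged_when_peeking += any_peek_significant
        flagged_at_end += p_value(conv_a, n_a, conv_b, n_b) < 0.05
    return flagged_when_peeking / runs, flagged_at_end / runs

peeking_rate, final_rate = simulate_false_positives()
print(f"false winners with daily peeking: {peeking_rate:.0%}")
print(f"false winners checking once at the end: {final_rate:.0%}")
```

Because the final check is one of the daily checks, the peeking rate can never be lower than the end-of-test rate, and in practice it is usually several times higher. That is the statistical reason for fixing the duration in advance.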

Why A/B Testing Is Harder to Implement Than to Understand

The logic is simple. Execution isn't. To run a valid test you need sufficient traffic, a correctly configured tool, a clear hypothesis, and the discipline not to stop the test before reaching statistical significance. On a site with 3,000 sessions per month you can validate one element per quarter. On a site with 500 sessions per month, you practically can't test anything with certainty.

And if the test turns out negative, the interpretation is just as hard. Did variant B lose because the hypothesis was wrong, because the test ran too short, because seasonality distorted the data, or because traffic segments behaved differently? The answer to that question determines whether you test again or draw the wrong conclusion and revert to variant A without understanding why.

At DAFE Digital we test systematically for you. Well-grounded hypotheses, sufficient volume, clear conclusions without statistical noise.

Incorrect A/B testing produces false conclusions and bad decisions. We know how to build tests so you get real answers about what works — not confirmation of prior beliefs.

Let's talk about testing your campaigns and pages

Adela Mincea

Marketing Economist · Founder of DAFE Digital · ANC Trainer

Adela is a Marketing Economist with over 10 years of paid media experience across Europe, the US and Asia. She founded DAFE Digital for one reason: serious Romanian businesses deserve the same paid media expertise companies get in any other market. That's what DAFE Digital does.

adelamincea.com LinkedIn
