How to Prioritize CRO Tests: ICE, PIE, and PXL Frameworks Compared

CRO Audits Team · 11 min read

You have 47 test ideas on your backlog. Limited traffic. One testing tool. Where do you start?

This is the question that separates disciplined CRO programs from chaotic ones. Without a prioritization framework, teams default to gut instinct, politics, or whoever shouts loudest in the meeting. The result: wasted time on low-impact tests while high-value opportunities sit untouched.

A good prioritization framework gives you a repeatable, defensible way to rank test ideas so you’re always working on what matters most.

Let’s break down the three most widely used frameworks — ICE, PIE, and PXL — and help you pick the right one for your team.

Why Prioritization Matters More Than Ideas

Most CRO programs don’t fail because of bad ideas. They fail because of bad sequencing.

Consider this: if you can run roughly 2-3 tests per month, that’s about 30 tests per year. Your backlog probably has 50+ ideas. Choosing the wrong order means:

  • Lost revenue — high-impact tests sitting in a queue while you test button colors
  • Wasted traffic — every test that runs consumes traffic that could power a better test
  • Stakeholder fatigue — too many inconclusive results erode confidence in the program
  • Opportunity cost — time spent on marginal wins is time not spent on transformational ones

The math is simple: if your best test idea would generate $200K in annual revenue and your worst would generate $5K, running them in the wrong order costs you real money every month you delay.

Framework 1: ICE (Impact, Confidence, Ease)

ICE is the most popular framework for a reason — it’s dead simple.

How It Works

Score each test idea from 1-10 on three dimensions:

  • Impact — How much will this move the needle if it wins?
  • Confidence — How sure are you it will produce a measurable result?
  • Ease — How easy is this to implement and launch?

Multiply the three scores together, then rank by the result.

Example Scoring

| Test Idea | Impact | Confidence | Ease | ICE Score |
|---|---|---|---|---|
| Redesign checkout flow | 9 | 7 | 3 | 189 |
| Add trust badges to cart | 6 | 8 | 9 | 432 |
| Rewrite all product descriptions | 7 | 5 | 2 | 70 |
| Simplify mobile navigation | 8 | 7 | 5 | 280 |

In this example, adding trust badges scores highest — not because it has the biggest potential impact, but because it’s high-confidence and easy to implement. That combination often wins.
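As a quick sketch, the ranking above can be reproduced in a few lines (the idea names and scores come straight from the table; the field names are just illustrative):

```python
# Rank a backlog by ICE score: Impact x Confidence x Ease, each scored 1-10.
backlog = [
    {"idea": "Redesign checkout flow",          "impact": 9, "confidence": 7, "ease": 3},
    {"idea": "Add trust badges to cart",        "impact": 6, "confidence": 8, "ease": 9},
    {"idea": "Rewrite all product descriptions","impact": 7, "confidence": 5, "ease": 2},
    {"idea": "Simplify mobile navigation",      "impact": 8, "confidence": 7, "ease": 5},
]

for item in backlog:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Highest ICE score first.
for item in sorted(backlog, key=lambda i: i["ice"], reverse=True):
    print(f'{item["idea"]}: {item["ice"]}')
```

Because the scores are multiplied rather than averaged, one low dimension (like the checkout redesign's Ease of 3) drags the whole score down sharply.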

When ICE Works Best

  • Small teams that need speed over precision
  • Early-stage CRO programs still building testing culture
  • Quick triage of a large backlog into rough priority tiers
  • Stakeholder alignment — the scoring is intuitive enough for anyone

ICE Limitations

The biggest problem with ICE is subjectivity. Two people scoring the same idea will often produce wildly different numbers. “Impact” especially is vague — does a 7 mean a 7% lift? $7K in revenue? A noticeable but not dramatic improvement?

Without calibration, ICE scores tend to reflect personal bias more than objective analysis.

Framework 2: PIE (Potential, Importance, Ease)

PIE was developed by Chris Goward at WiderFunnel and adds a strategic lens to prioritization.

How It Works

Score each test idea from 1-10 on:

  • Potential — How much improvement can be made on this page/element? (Based on data: analytics, heatmaps, user research)
  • Importance — How valuable is the traffic to this page? (Volume, quality, revenue impact)
  • Ease — How complex is the test to design, build, and run?

Average the three scores to get your PIE score.

What Makes PIE Different

The key distinction is Potential. Instead of asking “how big could the win be?” (which invites speculation), PIE asks “how much room for improvement exists here?”

This shifts the conversation toward data. A page with a 90% bounce rate has more potential than one with a 30% bounce rate. A checkout step where 40% of users drop off has more potential than one where 5% drop off.

Example Scoring

| Test Idea | Potential | Importance | Ease | PIE Score |
|---|---|---|---|---|
| Redesign checkout flow | 8 | 9 | 3 | 6.7 |
| Add trust badges to cart | 5 | 8 | 9 | 7.3 |
| Rewrite product descriptions | 7 | 7 | 4 | 6.0 |
| Simplify mobile navigation | 8 | 8 | 5 | 7.0 |
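The PIE averages above are easy to verify in code. A minimal sketch (function and variable names are illustrative):

```python
# PIE score: the mean of Potential, Importance, and Ease (each 1-10),
# rounded to one decimal place as in the table above.
def pie_score(potential, importance, ease):
    return round((potential + importance + ease) / 3, 1)

# Trust badges: modest potential, but high importance and ease.
assert pie_score(5, 8, 9) == 7.3
# Checkout redesign: big potential, but hard to build, so it averages lower.
assert pie_score(8, 9, 3) == 6.7
```

Note how averaging (PIE) is more forgiving than multiplying (ICE): one weak dimension lowers the score, but it can't collapse it.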

When PIE Works Best

  • Data-driven teams that have analytics and qualitative data to inform Potential scores
  • Page-level prioritization — PIE naturally maps to pages in your funnel
  • Teams with clear traffic data who can objectively score Importance
  • Mid-maturity CRO programs that have moved past pure gut instinct

PIE Limitations

PIE still relies on subjective scoring. The “Potential” dimension is better than “Impact” because it’s grounded in observable data, but it still requires interpretation.

PIE also doesn’t account for the quality of evidence behind each idea. A test inspired by user research and session recordings should rank differently than one inspired by a competitor’s website — but PIE treats them the same.

Framework 3: PXL (Prioritization by Experimentation Length)

PXL was developed by Peep Laja at CXL and takes the most rigorous approach of the three.

How It Works

Instead of subjective 1-10 scales, PXL uses binary (yes/no) and objective criteria:

Binary Questions (Yes = 1, No = 0):

  • Is the change above the fold?
  • Is the change noticeable within 5 seconds?
  • Does it add or remove an element (vs. modifying)?
  • Does it run on high-traffic pages?

Evidence-Based Scoring (0, 1, or 2):

  • Is it supported by user testing? (2 points)
  • Is it supported by qualitative data (surveys, recordings)? (1 point)
  • Is it supported by quantitative data (analytics, heatmaps)? (1 point)
  • Is it supported by best practices or hypothesis only? (0 points)

Ease of Implementation (1-3):

  • 1 = Complex (needs development resources, multiple sprints)
  • 2 = Moderate (can be done in a testing tool with some effort)
  • 3 = Easy (simple change in the testing tool)

Sum all scores to get the PXL priority.

Example Scoring

| Criteria | Checkout Redesign | Trust Badges | Product Copy | Mobile Nav |
|---|---|---|---|---|
| Above the fold? | 1 | 1 | 1 | 1 |
| Noticeable in 5s? | 1 | 1 | 0 | 1 |
| Add/remove element? | 1 | 1 | 0 | 1 |
| High-traffic page? | 1 | 1 | 1 | 1 |
| User testing support | 2 | 0 | 0 | 2 |
| Qualitative support | 1 | 1 | 1 | 1 |
| Quantitative support | 1 | 1 | 1 | 1 |
| Ease | 1 | 3 | 2 | 2 |
| PXL Score | 9 | 9 | 6 | 10 |
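Since PXL is a plain sum, the scoring is trivially mechanical. A sketch, with parameter names of my own choosing to match the criteria above:

```python
# PXL priority: sum of binary criteria (0/1), evidence points
# (user testing 0-2, qualitative 0-1, quantitative 0-1), and ease (1-3).
def pxl_score(above_fold, noticeable_5s, add_remove, high_traffic,
              user_testing, qualitative, quantitative, ease):
    return (above_fold + noticeable_5s + add_remove + high_traffic
            + user_testing + qualitative + quantitative + ease)

# Mobile nav from the table: every binary criterion met, strong
# user-testing support, moderate ease -- the top PXL score.
assert pxl_score(1, 1, 1, 1, 2, 1, 1, 2) == 10
```

The mechanical sum is the point: two people evaluating the same idea against the same research should arrive at the same number.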

When PXL Works Best

  • Mature CRO programs with established research processes
  • Teams that struggle with scoring bias — binary questions reduce subjectivity
  • Organizations that need to justify test selection to stakeholders
  • High-traffic sites where test velocity is high and prioritization precision pays off

PXL Limitations

PXL is heavier to implement. Every test idea needs to be evaluated against research data, which means you need that research in the first place. For teams just starting out, this can feel like overhead.

The binary nature also means you lose nuance. A page that’s “above the fold” gets the same score whether it’s a hero banner or a tiny element near the fold line.

Head-to-Head Comparison

Scoring Method

  • ICE: Subjective 1-10 scales, multiplied
  • PIE: Subjective 1-10 scales, averaged
  • PXL: Mostly binary + objective criteria, summed

Setup Time

  • ICE: Minutes — gather the team and start scoring
  • PIE: 30-60 minutes — need analytics data for Importance and Potential
  • PXL: 1-2 hours — need research artifacts mapped to each idea

Bias Resistance

  • ICE: Low — highly subjective, prone to anchoring and HiPPO influence
  • PIE: Medium — Potential is data-informed but still interpreted
  • PXL: High — binary questions and evidence requirements reduce bias

Best For

  • ICE: Speed, early-stage programs, cross-functional alignment
  • PIE: Balanced approach, page-level prioritization
  • PXL: Rigor, mature programs, stakeholder accountability

How to Choose Your Framework

Start with ICE if:

You’re running fewer than 3 tests per month, your team is new to structured CRO, or you need to get buy-in from stakeholders who aren’t data-savvy. ICE’s simplicity is a feature — it gets people scoring and discussing without friction.

Move to PIE when:

You have Google Analytics data you trust, you’ve started collecting qualitative data (heatmaps, recordings, surveys), and you want prioritization that’s more grounded in evidence. PIE is the natural next step from ICE.

Graduate to PXL when:

You have a dedicated CRO team or analyst, you run user research regularly, you need to defend test selection to leadership, and you have enough test velocity that the precision pays off.

Or Combine Them

Many mature programs use a hybrid. For example:

  1. ICE for quick triage — rapidly sort 50 ideas into “high/medium/low” buckets
  2. PXL for final prioritization — rigorously rank the top 15-20 ideas
  3. PIE for page-level strategy — decide which pages to focus research on
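The first step of that hybrid, ICE triage into priority tiers, can be sketched like this (the cutoff values 200 and 80 are illustrative, not from any framework):

```python
# Rough first-pass triage: sort ideas into high/medium/low buckets
# by their ICE score, using simple (and adjustable) thresholds.
def triage(ideas, high_cutoff=200, low_cutoff=80):
    buckets = {"high": [], "medium": [], "low": []}
    for idea, ice in ideas:
        if ice >= high_cutoff:
            buckets["high"].append(idea)
        elif ice >= low_cutoff:
            buckets["medium"].append(idea)
        else:
            buckets["low"].append(idea)
    return buckets

buckets = triage([("Trust badges", 432), ("Mobile nav", 280),
                  ("Checkout redesign", 189), ("Product copy", 70)])
# Only the "high" bucket then goes through the heavier PXL scoring.
```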

Making Any Framework Work Better

Regardless of which framework you choose, these practices improve the quality of your prioritization:

Calibrate Your Team

Before scoring, align on what the numbers mean. Does “Impact 8” mean an 8% conversion lift? $80K in revenue? Run through 3-4 example ideas together to establish shared understanding.

Score Independently First

Have each team member score ideas independently before discussing. This prevents anchoring bias — where the first person to speak sets the range for everyone else.

Re-Prioritize Monthly

Your backlog isn’t static. New data arrives, business priorities shift, and previous test results inform new hypotheses. Review and re-score your top 20 ideas at least monthly.

Document Your Reasoning

Don’t just record scores — record why. “Confidence: 8 because session recordings show 35% of users struggling with this form field” is infinitely more useful than “Confidence: 8” when you revisit the backlog in two months.

Track Prediction Accuracy

After each test, compare your predicted impact to the actual result. Over time, this feedback loop makes your team better at scoring — and reveals systematic biases (like consistently overrating ease or underrating confidence).
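A minimal sketch of such a feedback loop, assuming you log predicted and observed lift (in percentage points) per test; the test names and numbers here are invented for illustration:

```python
# Compare predicted vs. actual lift to surface systematic scoring bias.
results = [
    {"test": "Trust badges",  "predicted_lift": 5.0, "actual_lift": 1.2},
    {"test": "Checkout flow", "predicted_lift": 8.0, "actual_lift": 6.5},
]

for r in results:
    r["error"] = r["predicted_lift"] - r["actual_lift"]

mean_error = sum(r["error"] for r in results) / len(results)
# A consistently positive mean error means the team overrates impact;
# a negative one means it underrates it.
print(f"Mean prediction error: {mean_error:+.1f} pp")
```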

A Practical Example: Prioritizing 5 Real Test Ideas

Let’s walk through prioritizing a realistic set of e-commerce test ideas using all three frameworks:

The Ideas:

  1. Add a sticky add-to-cart bar on mobile product pages
  2. Replace the homepage hero carousel with a single static image and CTA
  3. Add a progress indicator to the 4-step checkout
  4. Show estimated delivery dates on product pages
  5. Simplify the account creation form from 8 fields to 4

ICE Results: #5 (Simplify form) wins — high confidence from form analytics showing 60% abandonment, and it’s easy to implement.

PIE Results: #3 (Checkout progress bar) wins — the checkout page has the highest importance (all revenue flows through it) and high potential based on drop-off data.

PXL Results: #1 (Sticky add-to-cart) wins — it’s above the fold, noticeable in 5 seconds, supported by session recordings showing scroll-back behavior, and moderately easy to implement.

Three frameworks, three different winners. None of them are wrong — they’re optimizing for different things. ICE favors quick wins. PIE favors strategic importance. PXL favors evidence quality.

Start Somewhere

The worst prioritization framework is no framework at all. Even a rough ICE scoring session beats “let’s just test what the CEO suggested.”

Pick the framework that matches your team’s maturity, apply it consistently, and refine over time. The real value isn’t in the specific scores — it’s in the structured conversation about why certain tests should run before others.

That conversation, repeated monthly, is what turns a random collection of test ideas into a strategic CRO program.


Need help prioritizing your CRO test backlog? Our CRO audit identifies your highest-impact opportunities and ranks them by expected revenue impact — so you know exactly where to start.
