Expert Guide Series

How Do You Test If New Technology Actually Helps Users?

New features fail because teams get excited about building things without checking if anyone actually needs them. I've seen it happen more times than I care to count—a company spends months developing what they think is a brilliant new feature, launches it with high expectations, and then watches as maybe 5% of their users even bother trying it. The rest? They ignore it completely or, worse, find it confusing and start leaving negative reviews. It's a pattern that repeats itself across every type of app I've worked on, from healthcare platforms to fintech products, and the really frustrating bit is that most of these failures could've been avoided with proper testing.

The problem isn't that teams don't test at all—most do some form of testing before launch. But here's where things go wrong: they test with the wrong people, ask the wrong questions, or measure the wrong things. I worked on an e-commerce app where the team was convinced users wanted a new AR feature to visualise products in their homes; they tested it with five people from their office, everyone said it was cool, and they built the whole thing. When it launched, actual users found it slow, battery-draining, and honestly just unnecessary for the products they were buying. The feature got used once by most people, then never again.

Testing technology isn't about validating your own ideas—it's about discovering whether those ideas solve real problems for real people in their actual context

What makes testing new technology tricky is that users often can't articulate what they want until they experience it, and even then, what they say doesn't always match what they do. Someone might tell you in an interview that they'd love a certain feature, but when you track their behaviour in the app, they never use it. Or the opposite happens—they complain about something during testing but their usage data shows they're engaging with it constantly. This disconnect between stated preferences and actual behaviour is something you learn to expect after building apps for different industries over the years.

Why Most Technology Testing Gets It Wrong

I've watched dozens of apps fail not because they had bad ideas, but because they tested them completely wrong. The biggest mistake? Testing in a vacuum. Teams build a feature, show it to a handful of people (usually their mates or colleagues), get some positive feedback, and ship it. Then they're shocked when nobody uses it. It's a pattern I've seen play out over and over, especially with startups who are burning through their funding trying to figure out why users aren't engaging.

The problem starts with who's doing the testing. I worked on a healthcare app where the product team tested a medication reminder feature with five people from their office—young, tech-savvy types who weren't actually taking multiple medications daily. They loved the interface. It was clean, minimal, easy to use. But when we finally got it in front of actual patients (people in their 60s and 70s managing chronic conditions), they couldn't figure out how to set up their first reminder. Font sizes that looked "elegant" on a 25-year-old's phone were genuinely unreadable for someone with presbyopia. We had to rebuild the entire onboarding flow because we'd tested with the wrong people.

The Three Testing Traps I See Constantly

Here's where teams typically go wrong, and honestly I've made these mistakes myself early on:

  • Testing features in isolation instead of within the actual user journey—a checkout flow that works perfectly in testing falls apart when users have been scrolling through products for 20 minutes first
  • Asking leading questions like "what do you think of this new feature?" instead of just watching what people do naturally
  • Testing too late in the development cycle when changing direction means wasting weeks of work and blowing through budget
  • Using friends and family who want to be nice rather than honest about what's confusing or broken
  • Ignoring the context of where and when people actually use your app—that fintech feature that tested brilliantly in a quiet office becomes unusable on a crowded commute

The other massive issue? Most teams don't test the actual problem they're trying to solve. They test the solution. I see this constantly with e-commerce clients who build elaborate product recommendation engines without first validating whether users even want recommendations. Sometimes people just want to search for exactly what they came to buy. We built an entire AI-powered styling feature for a fashion app that took three months to develop, only to discover in testing that users found it patronising—they already knew their style, they just wanted better filtering options. Understanding which user problems are actually worth solving could've saved 12 weeks and £40,000 if we'd tested the assumption before building the feature.

Setting Up Tests That Actually Matter

I've seen so many apps waste weeks testing things that don't actually tell them anything useful. A fintech client once spent three months testing button colours whilst completely ignoring whether users understood their security features—which turned out to be the real reason people weren't completing signups. The issue? They hadn't set up their tests to measure what actually mattered to their business and users.

Before you run any test, you need to define what success looks like. And I mean really define it, not just say "we want better engagement." When I worked on a healthcare app for patient monitoring, we started by asking: what's the one thing that, if it improved, would make this feature worth building? For that project, it was whether nurses could log patient vitals 30% faster than their current paper-based system. That gave us a clear benchmark to test against.

What You Actually Need Before Testing

Here's what I set up for every test now, learned through making plenty of mistakes over the years:

  • A specific hypothesis—"Users will prefer feature X" is rubbish; try "Users will complete checkout 20% faster with one-tap payment"
  • Success metrics that relate to business goals—downloads mean nothing if nobody uses the app twice
  • A control group so you've got something to compare against (I learned this the hard way on an e-commerce project where we couldn't prove our new feature actually improved anything)
  • The minimum sample size you need—testing with five users can spot major UX problems but won't validate conversion rates
  • A timeline that's realistic—rushing tests means bad data, which means bad decisions

One thing people get wrong constantly? Testing too many variables at once. I worked with a startup that changed their onboarding flow, pricing page, and notification system all in the same week. When retention improved, they had no idea which change caused it. Test one thing at a time unless you've got the user numbers to run proper multivariate tests (and most apps don't). Validating whether features will generate revenue requires the same disciplined approach to testing individual variables.
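
To see why multivariate testing is out of reach for most apps, here's a rough back-of-the-envelope check using the standard two-proportion sample size formula. The baseline conversion and the lift you're hoping to detect are numbers I've picked purely for illustration, not figures from any real project.

```python
# Rough sample-size check before committing to an A/B (or multivariate) test.
# Baseline and lift are illustrative assumptions, not figures from a real project.
from scipy.stats import norm

def sample_size_per_variant(baseline, lift, alpha=0.05, power=0.80):
    """Users needed per variant to detect `lift` on top of `baseline` conversion."""
    p1, p2 = baseline, baseline + lift
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# Detecting a 2-point lift on a 10% checkout conversion needs ~3,800 users per
# variant; a 2x2 multivariate test needs that many in each of its four cells.
print(sample_size_per_variant(baseline=0.10, lift=0.02))
```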

Write down your hypothesis and success metrics before building anything. If you can't articulate what you're testing and why, you're not ready to test yet—you're just building features and hoping for the best.
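
If it helps to make that concrete, here's the sort of thing I mean by writing it down, sketched as a small data structure; a shared doc works just as well. The field names and numbers are my own illustration rather than a formal template.

```python
from dataclasses import dataclass

@dataclass
class TestPlan:
    hypothesis: str        # specific and falsifiable, not "users will prefer X"
    primary_metric: str    # the one number that decides the outcome
    target: str            # what success means, agreed before anything is built
    min_sample_size: int   # per variant, worked out up front
    duration_days: int     # long enough to cover weekday and weekend patterns

one_tap_checkout = TestPlan(
    hypothesis="Users will complete checkout 20% faster with one-tap payment",
    primary_metric="median time from basket to order confirmation",
    target="at least a 20% reduction vs the current three-step flow",
    min_sample_size=400,
    duration_days=14,
)
```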

Finding the Right People to Test With

The biggest mistake I see companies make? Testing with whoever's available. I've watched countless startups test their healthcare app with their mates from the office, then wonder why real patients couldn't figure out how to book an appointment. Getting the wrong test participants will give you feedback that's worse than useless—it'll actively mislead you about what needs fixing.

Your test participants need to match your actual users as closely as possible. When we built a fintech app for pension planning, we didn't test it with tech-savvy twenty-somethings; we found people in their fifties and sixties who'd never used investment apps before. Sure, the younger testers flew through the interface... but our real users got stuck on the login screen because we'd assumed everyone knew what two-factor authentication meant. It's a bit mad really, but that one mismatch cost the client three weeks of rework.

You need to think about more than just age though. Consider their technical comfort level, their motivation for using your app, even the devices they typically use. I mean, if you're building an e-commerce app for busy parents, testing with students who have loads of free time won't show you the same pain points. The parent might abandon their cart because checkout takes too long whilst they're watching their toddler; the student won't notice that friction at all. When you're working with international users, researching users across different languages and cultures adds another layer of complexity to participant selection.

Who You Actually Need

Here's what matters when recruiting testers:

  • Match their experience level to your target users—if you're building for beginners, don't test with experts
  • Find people who have the actual problem your app solves, not people who think it sounds interesting
  • Get a mix of device types and operating system versions your real users have
  • Include people who are initially sceptical or resistant; they'll expose issues your fans won't mention
  • Aim for 5-8 participants per test round—more than that and you start seeing diminishing returns (the quick sketch after this list shows the maths behind why)
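
The diminishing returns aren't just a gut feeling. The often-quoted Nielsen/Landauer problem-discovery model gives a rough sense of why, assuming each tester surfaces around 31% of the usability problems (the commonly cited average, not a number from any specific project):

```python
# Diminishing returns from extra testers: the Nielsen/Landauer discovery model.
# Assumes each tester independently surfaces ~31% of the usability problems,
# which is the commonly quoted average rather than a measured figure.
def problems_found(n_testers, detection_rate=0.31):
    return 1 - (1 - detection_rate) ** n_testers

for n in (1, 3, 5, 8, 15):
    print(f"{n:>2} testers -> ~{problems_found(n):.0%} of problems surfaced")
# 5 testers surface roughly 84%, 8 get you to ~95%, and 15 adds very little more.
```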

Where to Find Them

Finding the right people isn't always straightforward. For B2C apps, I've had success with social media groups where your target audience already hangs out, user research platforms like UserTesting or Respondent, and even just approaching people in places where they'd naturally use your app. For a retail app we tested in actual shopping centres (with permission obviously) because we needed to see how people would use it whilst carrying shopping bags and dealing with distractions.

B2B apps are trickier because you need people who actually do the job your app supports. We once built a logistics app and the client wanted to test with their management team... but the people who'd actually use it daily were warehouse staff and delivery drivers. We pushed back and insisted on testing with the real end users. Good thing too, because the managers thought the interface was "intuitive" whilst the warehouse staff couldn't find basic functions because they were wearing gloves and the buttons weren't designed for accessibility. You've got to test with people whose day-to-day reality matches how the app will actually be used.

What to Measure When Users Try Your Features

The hardest thing about measuring feature usage isn't the technical side—it's deciding what actually matters. I've seen teams obsess over metrics that look impressive in board meetings but tell you nothing about whether people find your feature useful. Task completion rate sounds boring compared to "engagement metrics" but I'll tell you what, it's saved me from shipping more broken features than I can count.

When we built a medical appointment booking feature for a healthcare app, the client wanted to track everything. Every tap, every scroll, every hesitation. But here's what really mattered: did people successfully book an appointment? How long did it take them? And (this one's important) how many gave up halfway through? We found that 40% of users were abandoning the flow at the insurance selection step—not because they didn't want to book, but because the field was confusing. That single metric, the drop-off rate at each step, told us exactly where to focus our efforts.
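
Here's roughly what that drop-off analysis looks like once you've pulled step counts out of your analytics. The step names and numbers below are illustrative, but the shape of the calculation is the whole trick: compare each step to the one before it, not to the top of the funnel.

```python
# Step-by-step drop-off from raw funnel counts, the metric that exposed the
# insurance-selection problem. Step names and counts here are illustrative.
booking_funnel = [
    ("opened booking flow",   1000),
    ("chose a clinic",         870),
    ("picked a time slot",     790),
    ("selected insurance",     474),   # the 40% cliff lives at this step
    ("confirmed appointment",  452),
]

for (step, users), (_, prev) in zip(booking_funnel[1:], booking_funnel):
    drop = 1 - users / prev
    print(f"{step:<24} {users:>4} users  ({drop:.0%} dropped since previous step)")
```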

Time-on-task tells you if your feature is efficient; error rates tell you if it's clear; and completion rates tell you if it's actually working

I always measure these three things first: completion rate (did they finish what they started?), time to completion (was it quick enough?), and error frequency (how many mistakes did they make?). Then, depending on the feature, I'll add specific metrics. For a fintech app's payment feature, we tracked how often users needed to retry a transaction; for an e-commerce search function, we measured whether the first three results were relevant enough that users clicked them. The key is measuring outcomes, not just activity—because a user who spends five minutes on your feature might be engaged, or they might just be lost.
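
As a rough sketch, those three core numbers fall out of even the simplest session logs. The records below are made up for illustration; the point is that completion, time and errors are cheap to compute and hard to argue with.

```python
# Completion rate, time to completion and error frequency from basic session logs.
# The sessions below are made up; real ones come straight from analytics events.
sessions = [
    {"completed": True,  "seconds": 48,  "errors": 0},
    {"completed": True,  "seconds": 95,  "errors": 2},
    {"completed": False, "seconds": 210, "errors": 5},
    {"completed": True,  "seconds": 62,  "errors": 1},
]

finished = [s for s in sessions if s["completed"]]
completion_rate = len(finished) / len(sessions)
avg_time_to_complete = sum(s["seconds"] for s in finished) / len(finished)
errors_per_session = sum(s["errors"] for s in sessions) / len(sessions)

print(f"Completion rate:      {completion_rate:.0%}")        # did they finish?
print(f"Avg time (finishers): {avg_time_to_complete:.0f}s")  # was it quick enough?
print(f"Errors per session:   {errors_per_session:.1f}")     # was it clear?
```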

Running Tests Without Influencing the Results

This is probably the hardest part of testing, if I'm being honest. You want to help users when they're stuck, you want to explain things, but the moment you do that you've basically ruined your test results. I learned this the hard way on a fintech project where we were testing a new investment feature—every time someone looked confused I'd jump in and say "oh just tap there" and suddenly we had this beautiful completion rate that meant absolutely nothing because I'd been acting like a human instruction manual the whole time.

The trick is to bite your tongue. Actually bite it. When someone's sitting there clearly struggling with your navigation you need to let them struggle because that's the exact data you need. If users can't figure it out in a test environment with you watching, they definitely won't figure it out at home on their own. What I do now is tell people at the start: "I'm testing the app, not you—there are no wrong answers and if something doesn't work that's the app's fault, not yours." This takes the pressure off them and stops them trying to be polite about your terrible design choices.

What You Should Actually Say During Tests

Your questions matter more than you think. "Do you like this button?" is a rubbish question because people will just say yes to be nice. Instead I ask "what would you expect to happen if you tapped this?" or "where would you look to do [specific task]?" These questions get you actual insights without leading people to the answer you want. On a healthcare app we built, we stopped asking "is this clear?" and started asking "what does this screen tell you about your medication?" The difference in responses was honestly night and day; people told us they had no idea what the dosage warnings meant even though they'd previously said everything was "very clear."

Common Mistakes That Mess Up Your Data

  • Explaining features before users try them—let them discover things naturally or fail trying
  • Jumping in too quickly when someone pauses—silence is data, people need time to think
  • Using leading language like "the easy-to-use menu" which tells them it's supposed to be easy
  • Testing in your office where users feel like they're being judged by a room full of developers
  • Having stakeholders watch live because they can't help defending design decisions
  • Pointing at the screen or hovering your hand near where they should tap next

One thing that really helped me was recording sessions and watching them back later. You notice all the tiny ways you influenced things—a nod here, a "mmhmm" there when they do what you wanted. It's a bit uncomfortable watching yourself basically coaching users through your test, but it makes you much better at staying neutral. For an e-commerce app we were testing, I thought I'd been completely hands-off but when I watched the recordings I was making encouraging noises every time someone found the checkout button. No wonder our completion rates looked good.

The other big thing is your prototype needs to work properly. If things are loading slowly or buttons don't respond, users will ask if they're doing something wrong and you'll end up explaining technical limitations instead of testing actual behaviour. I always make sure test builds are as close to production quality as possible—fake the backend data if you need to but make sure every interaction feels real.

Making Sense of What Users Tell You

The hardest part of user testing isn't collecting feedback—it's working out what it actually means. I've sat through hundreds of testing sessions where users say one thing but their behaviour tells a completely different story. A healthcare app we built had users telling us they loved a new symptom-tracking feature, gave it high scores in surveys, but then we checked the analytics and barely 15% were using it after the first week. What went wrong?

People are rubbish at predicting their own behaviour, honestly. They want to be helpful, they want to give you the answer they think you're looking for, or they genuinely believe they'll use something even when they won't. This is why I always look at three types of feedback together: what users say, what they do during testing, and what the data shows afterwards. If all three align? Great, you've got something real. If they don't? That's when you need to dig deeper. Learning from competitors' user research approaches can help validate your findings across different contexts.

The trick is learning to spot patterns rather than reacting to individual comments. One user struggling with a button placement could be a fluke; five users having the same issue means you've got a problem. I keep a simple scoring system for feedback that helps me prioritise what actually matters (there's a rough sketch of how I tally it just after this list):

  • How many users mentioned it without prompting
  • Did they struggle with it during the actual test
  • Does it affect core functionality or nice-to-have features
  • Can we fix it without rebuilding everything
  • Will it impact our retention metrics
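
Here's a rough, repeatable version of that scoring. The weights are my own choice and worth arguing about; the point is to weight observed struggle and core-flow impact more heavily than the sheer volume of comments.

```python
# A rough, repeatable version of that scoring. The weights are my own choice,
# not a standard; the example issue is illustrative rather than a real finding.
def priority_score(issue):
    return (
        2 * issue["users_mentioned_unprompted"]      # a pattern, not a one-off
        + 3 * issue["users_who_struggled_in_test"]   # observed beats stated
        + (5 if issue["affects_core_flow"] else 1)   # core journey vs nice-to-have
        + (3 if issue["cheap_to_fix"] else 0)        # quick wins float upwards
        + (4 if issue["likely_retention_impact"] else 0)
    )

confusing_insurance_step = {
    "users_mentioned_unprompted": 2,
    "users_who_struggled_in_test": 5,
    "affects_core_flow": True,
    "cheap_to_fix": False,
    "likely_retention_impact": True,
}
print(priority_score(confusing_insurance_step))  # higher score, fix it sooner
```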

But here's the thing—sometimes the most valuable insights come from what users don't say. Watch for hesitation, confusion, or moments where they backtrack. These silent signals often reveal usability issues that users can't articulate. In a fintech project, we noticed testers pausing for 3-4 seconds before tapping a "confirm payment" button even though nobody complained about it. Turns out the button colour was too similar to our cancel button and created subconscious doubt. Changed it, and that hesitation disappeared.

Record every test session if you can. I've lost count of how many times I've gone back to footage weeks later and spotted something I missed the first time round—a facial expression, a comment I didn't think was important, or a pattern that only became clear after seeing multiple users do the same thing.

When Numbers Don't Match What Users Say

You know what happens all the time? Your analytics tell you one thing and your user interviews tell you something completely different. I mean, I've had projects where users swore they loved a feature during testing sessions but the usage data showed they barely touched it after launch. It's a bit mad really, but this disconnect is one of the trickiest parts of mobile app development—and it's something you absolutely need to learn to navigate properly.

Here's the thing; people are terrible at predicting their own behaviour. I worked on a fitness app where users kept telling us they wanted more detailed nutrition tracking with macro breakdowns and meal planning tools. Made sense, right? We built it, spent weeks getting the interface just right, and the qualitative feedback was brilliant. But when we looked at the actual usage data three months after launch... barely 8% of users who said they wanted it actually used it more than once. Meanwhile, a simple step counter we'd added almost as an afterthought had 67% daily engagement. The numbers don't lie, but people's words sometimes do (unintentionally, of course).

Why This Disconnect Happens

Users tell you what they think they should want or what sounds good in theory. They answer your questions based on their ideal version of themselves, not who they actually are when they're rushing through their day with 3% battery left. I've seen this pattern across healthcare apps, e-commerce platforms, you name it. Someone might say they want comprehensive product comparisons with detailed specifications, but the data shows they actually make quick decisions based on photos and star ratings.

How to Handle Conflicting Signals

Start by looking at what people do, not just what they say. If usage data contradicts interview feedback, trust the behaviour first but dig deeper to understand why. Sometimes users want a feature for peace of mind even if they rarely use it—like emergency contacts in a health app. Other times, they're just being polite or don't understand what they actually need. I always run A/B tests for at least two weeks before making big decisions; this gives you real behavioural data under actual usage conditions rather than the artificial environment of a test session.
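
When I say trust the behaviour, I still check that the behavioural difference isn't just noise before acting on it. Here's a minimal sketch of that check, assuming you've logged users and conversions per variant over the two weeks (the counts below are illustrative):

```python
# Checking that a behavioural difference between variants isn't just noise.
# Counts are illustrative; the two-proportion z-test itself is standard.
from math import sqrt
from scipy.stats import norm

def ab_significance(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return p_a, p_b, p_value

p_a, p_b, p_value = ab_significance(conv_a=312, n_a=2600, conv_b=371, n_b=2590)
print(f"Control {p_a:.1%} vs variant {p_b:.1%}, p = {p_value:.3f}")
# A p-value above ~0.05 means the 'improvement' may be noise: keep the test running.
```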

Pay attention to the context too. Users might say they hate getting notifications, but your retention data shows that people who receive weekly summaries are 40% more likely to stay active. The solution isn't to ignore what they say—it's to find the balance between stated preferences and observed behaviour. Maybe they don't want daily notifications but weekly ones work fine? Test different approaches, measure both satisfaction scores and usage metrics, and be willing to make decisions that feel counterintuitive based on what users told you directly. When testing payment features, for instance, consider how processing costs impact user behaviour alongside their stated preferences.

Deciding What to Build Next Based on Test Results

So you've got your test results back and now comes the hard part—actually deciding what to do with them. I mean, this is where theory meets reality and budgets meet roadmaps. Over the years I've seen companies make two big mistakes here; they either ignore data that doesn't match what they wanted to hear, or they chase every single insight without thinking about what's actually feasible. Neither approach works.

The way I handle this is by sorting findings into three buckets: quick wins, strategic investments, and nice-to-haves. Quick wins are things you can fix in a week or two that improve the experience—maybe it's changing button colours or rewording confusing labels. I built a fintech app where test results showed users got stuck on a verification screen simply because the heading said "Security Check" instead of "Verify Your Identity". Changed it in twenty minutes and the drop-off rate fell by 18%. Those are the findings you implement immediately.

The best test results aren't the ones that confirm your assumptions but the ones that challenge them enough to make you rethink your priorities

Strategic investments are bigger, like rebuilding a checkout flow or adding biometric authentication. These need proper planning, resource allocation, and probably a few sprints. For an e-commerce client we found users wanted Apple Pay integration—not exactly a quick fix but the data showed it would reduce cart abandonment by roughly 25%. That became our next quarter priority because the numbers justified the development cost. If you're working on apps that handle sensitive user data, you'll also need to factor in compliance costs when prioritising features.

Nice-to-haves go in the backlog. Users might mention them but they don't solve core problems. And here's the thing—you can't build everything. I've worked with healthcare apps where patients requested dozens of features but only three actually impacted whether they used the app daily. Focus on those three. Test results should guide you toward what matters most, not give you an endless to-do list that dilutes your product vision. Sometimes the most requested features create memorable user experiences that drive organic growth, making them worth the strategic investment.

Conclusion

Testing new technology isn't about running perfect experiments or collecting massive amounts of data—it's about learning whether what you're building actually makes people's lives better. After years of watching apps succeed and fail, I can tell you that the ones that make it are the ones that actually listened to what their testing revealed, even when it wasn't what they wanted to hear.

The hardest part? Accepting that most of your brilliant ideas probably won't work the way you imagined. I've built features for fintech apps that tested beautifully in the lab but completely confused people in the real world; I've watched healthcare apps where the thing we thought was a minor addition became the most-used feature. You just don't know until real people try it in their actual lives, not in some controlled environment where everything is perfect.

What matters most is developing a habit of testing early and often. Start with rough prototypes before you invest months building the wrong thing. Get comfortable watching people struggle with your interface because that struggle tells you exactly what needs fixing. Mix your quantitative data with qualitative feedback so you understand not just what people are doing but why they're doing it. And please, test with people who actually represent your users—not just your colleagues or people who think like you do.

The apps that win aren't necessarily the ones with the most features or the fanciest technology. They're the ones that genuinely understand what their users need and deliver it in a way that feels natural. Testing is how you get there. It's messy, sometimes frustrating, and almost never goes according to plan... but it's the only reliable way I've found to build something people actually want to use.

Frequently Asked Questions

How many people do I need to test my app feature with to get reliable results?

For spotting major usability issues, 5-8 participants per test round gives you the most valuable insights without diminishing returns. However, if you're testing conversion rates or other statistical measures, you'll need much larger sample sizes—I've found that 5 users can tell you your checkout flow is confusing, but you need hundreds to prove it converts 20% better than your old version.

Should I trust what users say in interviews or what the analytics data shows?

Always prioritise behavioural data over stated preferences when they conflict—I've seen countless projects where users claimed they loved a feature but never actually used it after launch. The key is using both together: analytics tell you what's happening, user interviews tell you why, and that combination helps you make decisions that actually improve retention rather than just satisfaction scores.

When should I test my feature—before I build it or after it's finished?

Test early with rough prototypes before investing months building the wrong thing, then test again with working versions before launch. I've saved clients tens of thousands by discovering fundamental problems with paper sketches rather than after three months of development—it's much easier to change direction when you haven't already built everything.

What's the biggest mistake teams make when recruiting test participants?

Testing with the wrong people completely invalidates your results—I've seen healthcare apps tested with young office workers instead of actual patients, then fail miserably when real users couldn't even complete basic tasks. Your participants need to match your actual users' age, technical comfort level, and context of use, not just be people who are available and willing to help.

How do I avoid influencing users during testing sessions?

Bite your tongue and let people struggle—the moment you jump in to help or explain something, you've ruined your data. I learned this the hard way by unconsciously coaching users through confusing interfaces, then wondering why our completion rates looked so good but real users were getting stuck on the same issues.

What metrics actually matter when testing a new feature?

Focus on completion rates, time-to-completion, and error frequency first—these tell you if your feature actually works in practice. Engagement metrics like time spent can be misleading because users might just be lost rather than engaged, which is why I always measure outcomes rather than just activity.

How long should I run tests before making decisions about my feature?

For behavioural data, I typically run A/B tests for at least two weeks to account for different usage patterns and initial novelty effects. Usability testing sessions give you immediate insights about confusion and friction, but you need longer-term data to see if people actually adopt the feature in their daily routines rather than just trying it once.

What should I do when test results contradict what I thought users wanted?

Trust the data over your assumptions—some of my most successful features came from completely changing direction based on unexpected test results. I once spent weeks building an AI styling feature for a fashion app, only to discover users found it patronising and just wanted better filtering options instead, which saved the client from launching something that would have hurt retention.
