238: A/B Testing
00:00:00
Welcome to Under the Radar, a show about independent iOS app development. I'm Marco Arment. And I'm David Smith. Under the Radar is never longer than 30 minutes, so let's get started. Now, should we try two different intros here to see which one is more effective? Sure, I can go first and then you can go. Yeah, right. I can say: welcome to Under the Radar, a show about independent iOS app development. I'm David Smith. And I'm Marco Arment. Oh, this show's never longer than 15 minutes, so let's get started. 30 minutes? Our first test is off to a roaring success. I don't have any practice doing that part of it. It's hard. I did yours fine. Yeah, you nailed mine perfectly.
00:00:40
But yes, so everyone knows how much Marco and I love automated testing, we're going to talk about a particular kind of it: A/B testing, which is honestly the only kind of automated testing I've ever actually used.

Could this even be considered the same thing as testing? Should we even be using the same word? Because when most people say automated testing, they're talking about a code-level thing.

Oh, sure, I know. But this is the kind of testing I can get behind, and it's automated because I built a system to do it, and it happens automatically.

I would maybe call this more like marketing science.

Sure. But the fancy word everyone uses for it is A/B testing. It's something that I looked down on for a while. I don't know if "looked down" is the right word for it, but it always felt like one of those articles that gets passed around every now and then: ooh, Google tested 46,000 shades of blue for the links on their home page, all their design decisions are made by computers, it's lost any sense of soul or artistry, and you pass it around in that spirit. Sometimes it's easy for me to think of it in those terms, but recently this is something I've started using, and, surprise, surprise, people use it because it's useful. It can answer questions that are very difficult to answer otherwise, and it lets you in some ways be more creative, I find, which is something I want to talk about later. I think it's really important, as a developer with limited resources, to consider this as one of the tools available to us, something we can use to make our apps better in ways that are actually meaningfully and measurably better, not just notionally better. But before I dive too much into that, I think we should start with a general overview of what A/B testing is and the mechanics of how it works. Does that make sense?
00:02:40
◼
►
Yeah, I actually would love to hear about this from you, because
00:02:44
◼
►
this is exactly the kind of thing that I wouldn't do
00:02:48
◼
►
until you did it and told me, and kind of convinced me to do it, because
00:02:52
◼
►
you are so much more pragmatic and experimental than I am.
00:02:56
◼
►
When I hear something like A/B testing, which I don't think
00:03:00
◼
►
I've ever actually really done, like, I've done
00:03:04
◼
►
like, you know, notional things like, oh, I'm going to change the wording of this thing in this
00:03:08
◼
►
build, and then see what happens, but then I'm not actually testing at the same time, so
00:03:12
◼
►
I'm like, you know, I'm not controlling the variables, so it's not really A/B testing.
00:03:16
◼
►
You know, where you are very good at pushing the boundaries
00:03:20
◼
►
of what independent developers think we should be doing and not doing,
00:03:24
◼
►
you are more open-minded to say, like, you know what, this thing that, like,
00:03:28
◼
►
other people do, that indies think is not for us, or not appropriate
00:03:32
◼
►
for us, or not something we need to do, you are more willing to try it,
00:03:36
◼
►
and then report back to us, and kind of convince us in
00:03:40
◼
►
more evidence-based and less
00:03:44
◼
►
emotional or less assumption-based reasoning than we would use, like, whether we should
00:03:48
◼
►
actually be looking at this or not. Thanks, that's encouraging to hear, but
00:03:52
Thanks, that's encouraging to hear. I'll just do my overview. A/B testing is a method by which you can evaluate the relative performance of two things such that, at the end of the test, you have reasonable confidence that one is better or worse than the other, or that they're the same, I suppose. It ends up being a comparison function between two things.

In software, what we're typically doing is segmenting our user base, our audience, whatever it is, into two groups. For the purposes of this, I'm going to stick with two groups. You can do A/B testing with more than two, it's just that the math and implementation get more complicated; in my version of this I've done it with multiple variants, and you just have to account for that. But for simplicity, say you take your audience and split it into two groups. You want to make those groups as similar as possible so that you're not creating some kind of bias in your system. For example, you wouldn't want midnight to noon UTC to be one group and noon to midnight to be the other, where you're splitting based on time of day. That would be a bad way to segment, because now you're creating other variables that might affect how people respond to whatever it is you're showing them. You want to segment them essentially randomly. For my A/B testing, I'm doing this completely randomly, just using Swift's random functions, and so far it's been working great. I think it's close enough for my purposes that I'm not trying to do anything fancier; you just want two equally representative groups. And then you're going to show each of those two groups something different.
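As a rough illustration of that random split, a minimal Swift sketch might look like the following; the type and function names here are hypothetical, not anything from Widgetsmith.

```swift
// Hypothetical sketch of a random 50/50 split: every user has an equal
// chance of landing in either group, with no time-of-day or locale bias.
enum PaywallVariant: String, Codable {
    case a, b
}

func assignVariant() -> PaywallVariant {
    Bool.random() ? .a : .b
}
```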
00:05:48
In my case, I started doing this to work on improvements to the paywall in Widgetsmith. I want to make it better: to actually have a higher conversion rate and increase the number of people who start subscriptions. That was my ultimate goal: how can I change my paywall so that that happens? So the first thing I did is take my original paywall and create a new one that was slightly different; for my first A/B test, I changed the buttons at the bottom of the paywall. For one of my groups I showed them one version, and for the other group I showed them the other. Then I instrumented my app so that I can tell essentially how many times A was shown, how many times B was shown, and what the relative conversion rate of A versus B was. I put this in my app, I run it, and as devices report back their data, what I get is the number of times the paywall was shown and the number of times someone started a subscription, for each of these two groups.
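The bookkeeping behind that report can be tiny; as a hedged sketch with made-up names, all each variant really needs is a pair of counters and a derived rate.

```swift
// Hypothetical sketch of the per-variant bookkeeping: one counter for
// paywall presentations, one for subscriptions started.
struct VariantStats {
    var impressions = 0
    var conversions = 0

    // Conversion rate = subscriptions started / times the paywall was shown.
    var conversionRate: Double {
        impressions == 0 ? 0 : Double(conversions) / Double(impressions)
    }
}
```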
00:06:56
In some ways you would think, well, that's all you need. But this is where it starts to get beyond my expertise, into an area I think I can describe but not fully understand: the concept of statistical significance, and whether the differences you're seeing are actually meaningful and should be interpreted as true, or whether they're more likely to be the result of chance. The way I think about it is: say you were trying to see if a coin was fair. You flip a coin, and in theory it should come up heads half the time and tails half the time. So you flip it once and it's heads. Right now you have 100% heads and 0% tails, and if you stopped your test there you'd say, wow, heads is way more likely than tails. But you obviously don't have enough data to draw that conclusion yet. So you flip it again and it's also heads. Now you have two heads and zero tails, and it's like, wow, this paywall is performing amazingly, A is so much better. But if it's a fair coin, the more you flip it, the closer you end up to 50/50 over time. There's statistical noise you can run into, streaks where, in my case, people are just being more generous or more excited or whatever it is; things happen that aren't actually representative of a fundamental difference.

So there are these big formulas that you can plug your data into, and essentially the more trials you have, the easier it is to say that there's a significant difference between one thing and the other. If you have big differences between them, which in a couple of my cases I actually did, where one has a certain performance and the other is twice that, you don't need as many trials to say that one is actually better. But if you're in a case where one is 5% or 10% better, you need a relatively high number of trials before you can be confident about it. The way I've been doing this is there are lots of websites and calculators that you can punch your data into, and they'll tell you how confident you can be that the difference you're seeing is actually meaningful, and not just the result of statistical noise going back and forth.
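He doesn't name the math, but the calculation those significance calculators typically run is a two-proportion z-test. A rough Swift sketch of it, offered as an illustration rather than anything from his actual tooling:

```swift
import Foundation

// Rough sketch of the usual A/B significance math: a two-proportion z-test.
// Inputs are how often each variant was shown and how often it converted.
// For a two-sided test, |z| above about 1.96 corresponds to roughly 95%
// confidence, and above about 3.29 to roughly 99.9%.
func zScore(shownA: Int, convertedA: Int, shownB: Int, convertedB: Int) -> Double {
    let rateA = Double(convertedA) / Double(shownA)
    let rateB = Double(convertedB) / Double(shownB)
    // Pool the two groups to estimate the shared baseline rate.
    let pooled = Double(convertedA + convertedB) / Double(shownA + shownB)
    let standardError = sqrt(pooled * (1 - pooled) * (1 / Double(shownA) + 1 / Double(shownB)))
    return (rateA - rateB) / standardError
}
```

Plugging in small samples or near-identical rates keeps z close to zero, which matches the point above: a 5% or 10% improvement needs far more trials to call than a 2x one.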
00:09:24
The end result is that you end up with some kind of improvement and a confidence score. So for my old paywall versus my new paywall: the new paywall performs better, it performs better by this percentage, and in my case I could be something like 99.9% confident that it was an actual improvement. From there you can continue to iterate. You can say, well, what if I came up with another test to run, or another option to try? You keep adding options, you keep looking at the relative conversion rates and how they compare against each other, and you go from there, with the ultimate goal of finding the best version of this that I can make.

And "best", as long as you have a good definition of best, you're in a good place. For something like a paywall it's relatively easy, because I can say: what I want is for someone to start a trial. That's my goal, and it's pretty straightforward. If instead you're trying to A/B test, say, which color to make the text in your app, and your goal is "I want to improve retention", well, retention is a very hard thing to measure; it takes a very long time and is affected by a lot of different factors. But something like this, where you have a very clear goal, you can segment very easily. Essentially every time someone taps into the paywall, I have an opportunity to segment them, decide which group they're going to be in, show that version to them, and then measure the result.
00:11:00
The only other thing I want to mention is something I found relatively important: when you're doing that segmentation, be slightly sticky with it. Put a user into one of the groups and have them stay there; in my case, they stay there for at least two or three days. Otherwise, if every time they open the paywall they're seeing a different one, then they're gaining exposure to all of the subsequent versions, and what you're actually measuring becomes "if I show the user all three of my paywalls, the third time I show one they'll be successful", and that gets really complicated. So have some stickiness to it. In my case, after three days it resets and does a fresh check: it checks in with the server and asks, are we still running this test, and if they're in a group that's still valid, it will randomly reassign them again.
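A loose sketch of that sticky assignment, reusing the hypothetical names from the earlier snippet; the three-day window, the storage key, and the server check here are stand-ins rather than details of his actual system.

```swift
import Foundation

// Hypothetical sketch of "sticky" segmentation: remember the assigned
// variant and when it was assigned, and only re-roll once the window
// has passed and the server says the test is still running.
struct StickyAssignment: Codable {
    let variant: PaywallVariant
    let assignedAt: Date
}

func currentVariant(store: UserDefaults = .standard,
                    testStillRunning: Bool) -> PaywallVariant {
    let key = "paywallTestAssignment"
    let window: TimeInterval = 3 * 24 * 60 * 60   // three days

    if let data = store.data(forKey: key),
       let saved = try? JSONDecoder().decode(StickyAssignment.self, from: data),
       Date().timeIntervalSince(saved.assignedAt) < window {
        return saved.variant   // still inside the window: keep showing the same paywall
    }

    // Window expired (or nothing stored yet): if the test is still running,
    // assign a fresh random variant and persist it; otherwise fall back to
    // the control variant (an assumption for this sketch).
    let variant = testStillRunning ? assignVariant() : .a
    if let data = try? JSONEncoder().encode(StickyAssignment(variant: variant, assignedAt: Date())) {
        store.set(data, forKey: key)
    }
    return variant
}
```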
00:11:56
But anyway, hopefully that made sense; that's the general process. The end result is that, in my case, I've been able to have a better-performing paywall because I tried a lot of different things, things I wasn't sure about, and I can validate the end result with actual data: yep, these are better, more people are starting trials than before, and the app is more sustainable as a result.
00:12:18
We are brought to you by Supporter. In past episodes you've heard us talk about how to handle customer support in your apps. Well, you can add a native, great-looking support section to your app in only three minutes with Supporter. Bring your support section to the next level with step-by-step instruction pages, frequently asked question toggles, image and video support, release notes, contact options, Markdown support, great accessibility, and more, and it's only 62 kilobytes. You can use Supporter with local data, or you can load remote JSON files for easy updating without having to update your whole app, and you can localize your support section based on the language of your user. You can use the optional count-based analytics to keep track of which support sections get visited the most; that way you know exactly which parts of your app are unclear. This, Dave, sounds a lot like the way you did yours manually. With your one-time purchase of Supporter, you will receive the full SwiftUI source code, so you can customize every little detail, clear instruction videos on how to build a great support section, and full documentation with example JSON. This is obviously right up the alley of a lot of our listeners, and possibly the two of our hosts here. So head to Supporter.goodsnooze.com, "snooze" like sleeping, Supporter.goodsnooze.com. Use code "under the radar" for 20% off to get Supporter for just $20. Add a complete support section to your app in only three minutes with Supporter. Once again, that URL is Supporter.goodsnooze.com, code "under the radar" for 20% off to get it for just $20. Our thanks to Supporter for their support of this show. Thank you so much to Supporter.
00:13:55
So I'm actually really curious about this world of A/B testing. You brought up a few things I would not have thought of, like the stickiness thing: make sure somebody's being shown the same thing on subsequent attempts within a time interval. That actually makes a lot of sense, and if I were doing a little naive implementation of this, I probably would not have thought of it.

The big dilemma, and the big nuance point here for a lot of indies, is similar between A/B testing and analytics. We've talked about analytics before, and we both perform some level of very basic first-party analytics in our apps. We don't integrate big analytics packages, but we both have custom stuff running on our own servers to say, "Oh, this percentage of the user base uses this feature of the app," or whatever. My big challenge with analytics over time has been figuring out the right balance: what analytics can I collect that are actually actionable? That to me is always the key gotcha factor. Am I collecting this data because I'm just curious and I want as much data as possible, which is not often a great thing, especially in a privacy-conscious world? Or am I collecting this data for some kind of decision to be made in the future? Is this actually actionable data?

With A/B testing you have a similar issue, plus a couple of new ones. The similar issue is: are you testing something that really needs to be tested? Can you just figure out which of these text labels is clearer on this button, or do you really need to test it? And if you test it, are you actually going to get a really significant difference between the two options, or are you most likely to get a very small, not super significant difference? There are going to be different areas of your app where you have different choices to make here. Obviously things like paywalls, or whatever your business goal for the app is, seem like the most important place to put something like this. If you have a login or account creation process, you want to know what kind of tweaks to your first-launch experience can make more people proceed and create the account, or set up the thing, or whatever they need to do. And then obviously, when it comes to how you make your money, if there's an in-app purchase or whatever, you want to instrument that as well as you can so it can be as optimized as it can be, within reason. And I think that's the key part.
00:16:35
◼
►
People made fun of the Google 40,000 shades of blue thing because it seemed ridiculous
00:16:39
◼
►
and it did seem more of like a micro-optimization.
00:16:43
◼
►
And I think you can definitely get bogged down a lot in
00:16:47
◼
►
that sort of thing of like, you think you're optimizing the heck out of something
00:16:51
◼
►
but you're actually spending a huge amount of time and waste
00:16:55
◼
►
and privacy and data to possibly only get something
00:16:59
◼
►
within a few percentage points of what it was before. And so I think this is one of those areas
00:17:03
◼
►
where a little goes a long way, just like analytics. You kind of want coarse-grained
00:17:07
◼
►
analytics to know like, do people really use this feature or do I really need to be doing this thing?
00:17:11
◼
►
But to get too fine-grained with it, I think is
00:17:15
◼
►
a possible trap of infinite work
00:17:19
◼
►
that you might be creating for yourself for very small gains.
00:17:23
Yeah, and one thing I like about A/B testing, related to that tension around analytics and whether you're collecting too much, whether it's useful, whether it's private, and so on, is that in some ways an A/B test has a defined duration. The purpose is to try this versus that, and once you have an answer, once you've collected enough data that it's statistically valid, you can stop the test, stop collecting data, and it goes away, in a way that a lot of analytics and data collection is intended to go on forever. In a weird way, the one small nice thing about this kind of data collection is that it's very time-limited. That doesn't mean you should be any less cautious with the data; I'm very thoughtful in the way I build my system, in that I don't know anything about these people. All I know is that someone opened a paywall and someone started a membership. I don't make any connection to the actual membership itself; I'm not trying to collect more than that or be creepy in that way. But there is something nice about it being time-limited, both in terms of my time and my energy. My goal is to narrow in, as quickly as I can, on something within a few percentage points of the most optimal paywall I can make, find that, and then move on to other things: improving the app in other ways, adding features, doing the main business of my app, rather than getting too stuck in. I have yet to hit that point, and I'm continuing to run experiments and trials on improving my paywall, but eventually I'll hit the point where I try another improvement or another idea and realize this is the best I can do, this is what it is until I can come up with another whole concept or something more fundamentally changes. At that point I'll move on, I can close this down and stop collecting this data. So that is certainly one nice thing about this: it's time-limited in a way that a lot of data collection isn't.
00:19:35
◼
►
I think the other possible major pitfall that you could run into here
00:19:39
◼
►
is people treat data
00:19:43
◼
►
like whatever result you get from an A/B test, you're like "Oh, we have data on this now"
00:19:47
◼
►
and people treat that as gospel, and it's like
00:19:51
◼
►
this data, you cannot argue with the data, and the data
00:19:55
◼
►
means if we do this, we will make more money and therefore we are doing better, so therefore
00:19:59
◼
►
we must do this because the data supports that, and I think you really have to be careful
00:20:03
◼
►
with what you're measuring,
00:20:07
◼
►
how you are evaluating the score
00:20:11
◼
►
of the different options that you're testing, like what exactly are you evaluating
00:20:15
◼
►
when you're saying that A is better than B or vice versa, and
00:20:19
◼
►
that doesn't necessarily result in a
00:20:23
◼
►
broad picture, or long term success
00:20:27
◼
►
necessarily, and it doesn't reflect larger scale
00:20:31
◼
►
factors in your app, so for instance, we hear a lot about
00:20:35
◼
►
"Oh, the company X tested this thing that seems like people
00:20:39
◼
►
wouldn't like it, but it turns out it performs better and it makes them more money"
00:20:43
◼
►
and it's like, well yes, maybe, but
00:20:47
◼
►
if it makes people hate the app more, or if it makes the app seem
00:20:51
◼
►
overall crappier or more user hostile,
00:20:55
◼
►
or people just kind of get a little bit annoyed by it,
00:20:59
◼
►
then that might have long term negative effects on your customers and on
00:21:03
◼
►
their willingness to use your app or their impression of you, or how often they'll
00:21:07
◼
►
come back to it, like if people do the thing you want them to do,
00:21:11
◼
►
but are encountering friction or annoyance while
00:21:15
◼
►
doing it, then that's very likely to make them
00:21:19
◼
►
come back less often, and if the time comes that somebody
00:21:23
◼
►
else comes around offering the same thing that you do, maybe they'd be more willing to switch away from you
00:21:27
◼
►
because they actually have a kind of ambivalent relationship
00:21:31
◼
►
towards your app, whereas it seems
00:21:35
◼
►
by the numbers that you are measuring from your testing, it might seem like you are succeeding
00:21:39
◼
►
by making certain decisions, whereas you actually might be
00:21:43
◼
►
irritating your customers or just giving them little paper cuts, and so you really
00:21:47
◼
►
have to be very, very careful how you are judging
00:21:51
◼
►
how well you're doing here, because what you are measuring with an A/B test is
00:21:55
◼
►
a very small thing, usually.
00:21:59
◼
►
It's literally the definition of the forest of the trees, it's like you are
00:22:03
◼
►
looking at one tree, you are missing the entire forest around you, and so
00:22:07
◼
►
you really have to be very
00:22:11
◼
►
inquisitive about, like, if I do this thing, if I
00:22:15
◼
►
increase the success of this thing by this metric,
00:22:19
◼
►
am I going to possibly cause other negative effects that I'm not measuring?
00:22:23
◼
►
Because all you're optimizing for in that case is the thing you're measuring.
00:22:27
◼
►
But that doesn't necessarily mean you're making a better overall app or a better overall business.
00:22:31
◼
►
Yes, no, and I think that's an excellent point to make.
00:22:35
◼
►
And the obvious version of that would be
00:22:39
◼
►
say I started lying in my paywall, and say, like, you sign up
00:22:43
◼
►
and I'll send you a pony, and then, like,
00:22:47
◼
►
my conversion rate might go up. Yay! It's like, okay, but I'm not actually sending people
00:22:51
◼
►
ponies, and so that's going to make people really upset, and ultimately
00:22:55
◼
►
lead to lots of consequences. And so it is, you have to be very careful
00:22:59
◼
►
that what the changes you're making are, are, it's like
00:23:03
◼
►
ideally it's your choosing between two good choices, and it's like which of these
00:23:07
◼
►
is the most good, rather than trying to
00:23:11
◼
►
be putting yourself in a place that, yes, you could be
00:23:15
◼
►
optimizing yourself into a worse app that is going to perform
00:23:19
◼
►
worse on more sort of like macro measurements, that it's like, it's going to
00:23:23
◼
►
people are going to be less likely to recommend it to their friends, people are going to be less likely
00:23:27
◼
►
to continue using it, that they might sign up for a subscription, but they'll only do it
00:23:31
◼
►
once, and they'll churn away, and then you're actually, like, what you really,
00:23:35
◼
►
you know, it's like, it'd be much better overall for your business, almost certainly, to have
00:23:39
◼
►
a smaller number of people who sign up who keep
00:23:43
◼
►
their subscription every single month going forward, rather than have this very sort of
00:23:47
◼
►
churn heavy, lots of people signing up and then immediately canceling, signing up and then
00:23:51
◼
►
immediately canceling, like, that seems like the version
00:23:55
◼
►
where they stick around is going to be much more sustainable and better for your business, and so
00:23:59
◼
►
being careful and thoughtful about this, and I think
00:24:03
◼
►
that's just, in some ways, it's just another form of design, it's another kind of way
00:24:07
◼
►
of understanding that you don't want, this is a tool that
00:24:11
◼
►
you can use, but if you overuse it, or if you use it in ways that
00:24:15
◼
►
you know, if you, like, never create an option that you're not
00:24:19
◼
►
comfortable with, or you're not excited about, or you don't think is good,
00:24:23
◼
►
but instead, it's trying to understand, I think in some ways, for me it was
00:24:27
◼
►
I realized that I was designing a lot of things with assumptions
00:24:31
◼
►
or things that were built for me, and people who think like me, and
00:24:35
◼
►
people who are developers, or people who are, it's like, I was designing
00:24:39
◼
►
a paywall screen that made sense to me and looked good, but in, what I
00:24:43
◼
►
was sort of increasingly finding is that my paywall screen that I had made initially
00:24:47
◼
►
was confusing, and what I've been able to do with my refinements is to
00:24:51
◼
►
make it less confusing, and that's the kind of improvement that's like, that's great,
00:24:55
◼
►
this is exactly what I want, I want to make it clear, I want to make sure that people who are signing up
00:24:59
◼
►
sort of know what they're doing, and the way that I saw that is
00:25:03
The way that I saw that is that I also measure the number of people who hit the start-subscription button but then don't actually complete the purchase. Just because they hit buy and Apple's purchase screen takes over doesn't mean they finish, and I keep track of that. I was able to get my cancellation rate down, which tells me that the proportion of people who hit the start-subscription button and then actually start a subscription went up. That's perfect; that means people were not confused. Whereas in my old version, people were hitting that button and then going, wait, what? You're going to start charging me money? Or the pricing wasn't clear, or whatever it was; for whatever reason, they said cancel, that's not what I want. Getting that number down told me, yep, I'm going in the right direction.
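As a rough sketch of that kind of funnel measurement, again with hypothetical names, using StoreKit 2's purchase result to separate button taps from completed purchases; his actual implementation may differ.

```swift
import StoreKit

// Hypothetical sketch: count taps on the subscribe button separately from
// purchases that actually complete, per variant, so a falling
// "tapped but cancelled" rate shows the paywall is getting less confusing.
func buySubscription(_ product: Product, variant: PaywallVariant) async throws {
    logEvent("subscribe_tapped", variant: variant)         // user hit the button

    switch try await product.purchase() {
    case .success:
        logEvent("subscribe_completed", variant: variant)  // purchase went through
    case .userCancelled:
        logEvent("subscribe_cancelled", variant: variant)  // backed out of Apple's sheet
    case .pending:
        break                                               // e.g. Ask to Buy; no outcome yet
    @unknown default:
        break
    }
}

// Stand-in for whatever first-party analytics call records these counts.
func logEvent(_ name: String, variant: PaywallVariant) { /* report to your own server */ }
```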
00:25:51
But yeah, absolutely, it's important to make sure that if you're choosing between two options, you like both of them before you start down this road. Otherwise you can optimize yourself into a really bad place.
00:26:07
Yeah: common sense first, before you even create the test, and keep common sense in mind throughout. The reason design by committee is considered so bad, and tends to produce bad results, is that it reduces the chances for an authoritative human decision of "wait, this is not good." Even though we're all trying to build consensus, one person can say, wait, this is not good, we should just do this. And data is a way to have, basically, the ultimate design by committee. That's not often a good thing. Ideally, you have an opinionated human with high standards and decent taste whose decisions are informed by data, and that's a very different thing from making every decision with as much data as possible. I think that's a much better balance: you have to have somebody providing the human filter above everything.
00:27:11
A very common thing we've been talking about recently in these circles is the discussion around streaming app quality that John Siracusa started on his Hypercritical blog, about all the different TV streaming apps. One of the observations is that the most common thing people want to do is resume the show they were already watching, but everyone learned through data that if you move that below the fold of the launch screen, you'll make more money, because people will basically accidentally look at more content on their way to the thing they actually want to do, even though everyone hates it. All the customers hate this, but technically the data says it performed better on certain metrics. This is such a great example of there not being enough humans involved in the decision. Data overwhelmed it to the point where now we have worse products, people are less happy, and they're actively being annoyed by all these streaming apps. And you know what, if they're constantly being annoyed by the streaming apps and then some other service comes along that they're less annoyed by, they're going to want to spend more time on that one. So it's complicated, but I think the answer is: let data help inform you, as the human with good taste who respects your customers, to make a better product overall, rather than letting data optimize all of the humanity and user satisfaction out of your product.
00:28:35
◼
►
Yeah, and I think using it as, as a tool
00:28:39
◼
►
to be more creative, like, and I think that's where I'll leave this with, is the thing that I've
00:28:43
◼
►
really enjoyed about this is looking at these, some of these screens in my apps
00:28:47
◼
►
and saying, okay, how could I design this in a
00:28:51
◼
►
way that I think is great, I think is awesome, but is different, and coming at it,
00:28:55
◼
►
coming at it from, like, a totally different perspective and saying, like, if I was designing this from
00:28:59
◼
►
scratch today, what would I do? And trying to come up with different versions that I
00:29:03
◼
►
think are all good, I think are all humane and coming from the
00:29:07
◼
►
design aesthetic and the way I want to structure my business and the kind of business I want to have,
00:29:11
◼
►
but having the, in some ways, there's a, there's a freedom in rather than me having
00:29:15
◼
►
to only design one and that everything has to be
00:29:19
◼
►
that. I can design five, and I can, each of them are different
00:29:23
◼
►
in different ways, and then I can explore which of those are actually
00:29:27
◼
►
more understandable, more communicative, getting across my point better
00:29:31
◼
►
to my customers, and I found that to be incredibly satisfying creatively
00:29:35
◼
►
rather than feeling like I have to just magically, from my own way, kind of
00:29:39
◼
►
design the right thing, come down from my ivory tower and show it to my customers.
00:29:43
◼
►
And so instead, this is a way to design lots of things and to make that process in a weird
00:29:47
◼
►
way slightly more collaborative, which has been really enjoyable.
00:29:51
◼
►
Thanks for listening everybody, and we'll talk to you in two weeks. Bye.