00:00:20 ◼ ► where the whole story goes, there was a Core Motion bug that is still, I believe, in iOS,
00:00:52 ◼ ► And from almost every, you know, the vast majority of my users, it seems to work perfectly.
00:01:21 ◼ ► And then in my fix for that, I ended up with another bug that ended up over-reporting steps,
00:01:30 ◼ ► But in both cases, and this is, I think, something that I think we all kind of have to deal with at a certain point,
00:01:59 ◼ ► is probably, it's one of those, like, advanced developer skills that you develop over time just by necessity.
00:02:14 ◼ ► Like, there is a user who is having this bug, and every time they open the app, it doubles all their step counts.
00:02:27 ◼ ► Like, there is no, like, this doesn't, the way this code is written, this should never happen.
00:02:33 ◼ ► Like, it is just taking a number from the system framework and putting it into the database.
00:02:37 ◼ ► It is, like, how is that number becoming, like, multiplied exponentially in the middle?
00:02:44 ◼ ► But I have a list, and this is kind of what I think will be an interesting thing to unpack over the course of this episode.
00:02:53 ◼ ► Because at a certain point, like, you can, you know, and they become more esoteric and, like, eccentric as they go.
00:03:01 ◼ ► But you've, you know, you're kind of in there in this place that you have, if you, if it isn't obvious how to reproduce,
00:03:07 ◼ ► like, there's this first phase of all the strategies you can do to try and reproduce it.
00:03:12 ◼ ► And then you get into the phase where it's like you are becoming a detective, and it's like you accept,
00:03:18 ◼ ► like, you just sort of, you pass into the, you're on the step where you've reached acceptance,
00:03:27 ◼ ► Like, what can I do, even if I can never actually reproduce this, because I know what the, like, the input is, the code I wrote,
00:03:34 ◼ ► and I know what the output is, the code over here that I'm clearly seeing someone using out in the world,
00:03:42 ◼ ► So, that's where I find myself, and hopefully this will be an interesting just kind of exercise to, like, walk through.
00:03:47 ◼ ► Because as I was going through this crazy debugging thing, I just kept a list of every time I tried some crazy new approach,
00:03:56 ◼ ► Oh, yeah. I mean, I have had so many crazy bug reports or crash reports that I just have not been able to figure out.
00:04:06 ◼ ► Like, because I'll, you know, I'll just go through the same steps as you. Like, I'll look at the code, and I'll be like,
00:04:09 ◼ ► "That shouldn't be possible, but it's happening." So, you know, and there has never been a time when a bug that I thought was impossible
00:04:19 ◼ ► was actually, you know, some other reason, like, "Oh, there was a bug in the crash reporter," or, "There was a bug, like, in the OS."
00:04:25 ◼ ► Like, that almost never happens. I've heard programmers blame everything from, like, "Oh, it must be a bug in the compiler."
00:04:31 ◼ ► Spoiler alert, there's never a bug in the compiler as far as we're concerned. Like, if you're a compiler author, maybe.
00:04:42 ◼ ► And so, like, there's all sorts of, like, potential, like, you know, voodoo that we hope it is, that we blame,
00:04:47 ◼ ► because our logic is telling us, "It can't possibly be my bug." But, newsflash, it's usually our bug.
00:04:53 ◼ ► And so, I, too, have had a lot of situations where it's just completely not obvious. I mean, and there's lots of different reasons that this could be.
00:05:02 ◼ ► You know, part of it might just be, like, the crash log that you got, if you got one, isn't very useful.
00:05:08 ◼ ► You know, Apple's crash logs have a wide variety of usefulness. The good ones, the ones that are actually useful, are, like,
00:05:15 ◼ ► this line of this file threw this exception. And that's great. If you can see that, you can almost always figure out,
00:05:23 ◼ ► at least, like, what caused the bug, or, like, at least, like, how you can maybe put in some safeguards to avoid it.
00:05:31 ◼ ► Maybe, like, in that function, check for the nil value, and just, you know, don't operate if you have one, or, you know, something like that.
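The "check for the nil value and don't operate" safeguard mentioned here can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python rather than the speaker's actual code; the function name `record_step_count` and the list-based storage are hypothetical.

```python
from typing import List, Optional


def record_step_count(count: Optional[int], totals: List[int]) -> bool:
    """Append a step count to totals, skipping missing or negative values.

    Returns True when the value was stored and False when it was ignored.
    """
    # Guard: the upstream source handed us nothing, so do not operate on it.
    if count is None:
        return False
    # Guard: a negative count is not meaningful here, so refuse it as well.
    if count < 0:
        return False
    totals.append(count)
    return True


totals: List[int] = []
record_step_count(120, totals)   # stored
record_step_count(None, totals)  # ignored instead of crashing later
print(totals)
```

A guard like this does not explain why the bad value arrived, but it keeps one unexpected input from taking the whole process down.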
00:05:37 ◼ ► But, a lot of times, you'll get crash logs that are not, you know, not, they don't symbolicate, or they don't actually include your code in the call trace.
00:05:48 ◼ ► Or, it's, you know, it's all system frameworks, and maybe you dispatched something out to a queue, and it lost track of where it was dispatched from.
00:05:56 ◼ ► Maybe it's some other kind of bug, like, being a high resource termination, like we talked about last week,
00:06:02 ◼ ► where it's not actually terminating you, like, at the point of the problem, it's just terminating your process in general,
00:06:08 ◼ ► and the log is basically whatever it was doing at that moment, but that doesn't necessarily mean that was the problem.
00:06:14 ◼ ► So, there's all sorts of, like, variability in crash logs, and so sometimes you just have to kind of try to reproduce it, try to figure it out,
00:06:22 ◼ ► and a lot of the conditions are, like, you know, this appears to happen on a, you know, with a device combination that you don't have,
00:06:32 ◼ ► in connectivity that you might not have, and that maybe Network Link Conditioner can or can't simulate for you,
00:06:38 ◼ ► using data that you don't have, because it's their data, in, you know, operating on a phone that has a low battery, you know,
00:06:46 ◼ ► wear level, so it's being throttled, and you can't reproduce that either, and maybe their phone has, like, a bad RAM chip or something,
00:06:53 ◼ ► and so you're occasionally getting RAM errors that aren't being detected, like, there's so many variables,
00:06:58 ◼ ► and if it's not something you can easily reproduce on your end, where you can actually make it crash, like, while running from Xcode in the debugger,
00:07:05 ◼ ► like, if you can't do that, a lot of times you're just out of luck, because, you know, a lot of these problems, you know,
00:07:11 ◼ ► users report them as, you know, intermittent issues, like, well, sometimes when I do this, it crashes, or, you know,
00:07:17 ◼ ► sometimes it loses my progress in this thing, and it's like, well, if it doesn't happen every time, it's, and you can't figure out why it might be happening,
00:07:33 ◼ ► And so, like, the first thing I always do when I hit the situation where it's, like, a hard to reproduce bug, or at least it's not obviously reproducible,
00:07:40 ◼ ► but it's something that, like, in my normal daily use, I can do it too. So, like, the first thing I tend to do is I go as wide as I can with testing devices.
00:07:49 ◼ ► So, like, on my desk right now, I am absolutely surrounded by old iPhones and Apple Watches. Like, they are, like, sprawling over the edges of my desk.
00:07:58 ◼ ► Like, it's, and this is why, like, every time I have upgraded my iPhone, I always keep the old ones around.
00:08:05 ◼ ► If my wife, you know, is done with the phone, I always hold onto it. Like, maybe it's, like, in some ways, you know, like, the value of these is certainly higher earlier,
00:08:14 ◼ ► but at this point, like, the value to me is that I have all kinds of random devices running all kinds of OSs that I can try things on.
00:08:23 ◼ ► And so the first thing I do is I just do that. And I think sometimes you'll be lucky and you'll just cause it, because often these kinds of bugs, the hard to reproduce bugs,
00:08:31 ◼ ► the reality is, I think, like, probably, like, half of them, in my experience, are some form of timing issue.
00:08:38 ◼ ► That there is, the reason it is intermittent, the reason it doesn't happen all the time is that, you know, there is some kind of race condition.
00:08:44 ◼ ► There's some kind of issue where if this happens before this, then it's fine, but if, you know, every now and then it happens in the wrong order and things all fall apart.
00:08:53 ◼ ► And so testing on a variety of devices is a really useful way to do that, just because sometimes old devices or new devices are more susceptible to a particular timing issue.
00:09:05 ◼ ► And it's useful, you know, one thing I will say is in all of my, like, in my customer support form, at the bottom of it, I always add a footer that includes a bit of debug information,
00:09:15 ◼ ► that includes the device type that the user has, which is something that you can just sort of pull from the device information of the device.
00:09:24 ◼ ► And so I can say, so I can get a sense of, you know, sometimes they'll be isolated to a particular device or a particular class of devices, you know, so it's like, it seems to be happening on A9 processors.
00:09:35 ◼ ► Like, the particular bugs that I've been dealing with weren't isolated to anything, but I always put that information in there, just so that I kind of have a starting place where to go.
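The support-form footer described here might look something like the following. This is a rough sketch under stated assumptions: it uses Python's standard `platform` module as a stand-in for whatever device information the real app reads, and `build_debug_footer`, the field labels, and the version string are all hypothetical.

```python
import platform


def build_debug_footer(app_version: str) -> str:
    """Return a plain-text footer to append to an outgoing support message."""
    fields = {
        "App version": app_version,         # supplied by the caller (hypothetical value below)
        "OS": platform.system(),            # e.g. the operating system name
        "OS version": platform.release(),   # the OS release string
        "Hardware": platform.machine(),     # the machine / hardware identifier
    }
    lines = ["--- Debug info ---"]
    # Fall back to "unknown" when the platform cannot report a value.
    lines += [f"{name}: {value or 'unknown'}" for name, value in fields.items()]
    return "\n".join(lines)


print(build_debug_footer("1.2.3"))
```

Because the footer travels with every support email, patterns such as "only on one class of device" can show up without asking the user anything.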
00:09:43 ◼ ► But, like, step one, go as wide as you can on testing devices. And if you don't have access to those, then this is where you can, it's like you just sort of try to ask your friends, ask anybody you can.
00:09:55 ◼ ► Like, I asked, I've, you know, I called up my dad, had him run his phone and my mom's phone, like anybody you can think of who can try it, because what you're hoping for is that you'll, you know, you'll catch this bug in the wild,
00:10:07 ◼ ► where one of these devices will start exhibiting this behavior, and then you can, it's just like, you can capture that, you can, you know, wrap it in bubble wrap, bring it into your lab and analyze it.
00:10:21 ◼ ► Yeah, I have found trying to capture info about the user's device and setup and everything, I honestly have very rarely found that a bug was specific to that device or that version of the OS or anything. Usually my bugs are so outrageous that they crash on everybody's devices, you know, it's some problem.
00:10:43 ◼ ► Yeah, yeah, yeah, I'm an equal opportunity crasher, I crash on everything. So, I found that that hasn't helped me. Like, I used to keep all my old devices and, you know, I would say it was for testing, you know, I must, I need to have this for testing, and the reality is, I think the total number of times I've actually needed to run something on an old device and actually gathered useful info from that, in the, you know, whatever it is, 11 years that I've been an app developer, I think might be something like three times.
00:11:14 ◼ ► And I think what I find is it isn't necessarily that it ends up being unique to one device. It's like all I'm trying to do is increase the odds that some weird circumstance is going to happen.
00:11:27 ◼ ► And the best in like the, I'm going wide on devices, mostly just so that there's a higher probability that I'm going to see it rather than necessarily that it's unique to a particular device. Like sometimes it is not in the sense that it only happens on one device, but in that it's easiest to reproduce on one device, maybe.
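The "increase the odds" reasoning can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each run or device is an independent trial with the same small chance of showing the bug; real timing bugs are rarely that tidy, and the function names and the 1-in-1,000 figure are hypothetical.

```python
import math


def chance_of_seeing(p_per_run: float, runs: int) -> float:
    """Probability that a bug with per-run probability p shows up at least once in `runs` independent runs."""
    return 1.0 - (1.0 - p_per_run) ** runs


def runs_needed(p_per_run: float, confidence: float) -> int:
    """Smallest number of independent runs that gives at least `confidence` chance of one sighting."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_run))


p = 0.001  # assumed: the bug appears in roughly 1 of every 1,000 runs
print(f"{chance_of_seeing(p, 1_000):.2f}")  # chance after 1,000 runs
print(runs_needed(p, 0.95))                 # runs needed for a 95% chance
```

Under these assumptions, running as many trials as the bug's "one in N" rate gives only about a 63% chance of ever seeing it, and a 95% chance takes roughly three times as many, which is why more devices and more attempts help even when the bug is not device-specific.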
00:11:45 ◼ ► Yeah. And like, and you know, depending on what your app does, like, you know, it might be more or less relevant. Like, if you have a game, and you're seeing some kind of like graphical thing, or you're running out of like, you know, graphical resources, obviously, like every phone has different GPUs.
00:11:57 ◼ ► And so, it might, you know, it might be more useful for you to have those old devices. If you're doing performance testing, and you know that it's more useful there too.
00:12:05 ◼ ► In my case, like making a podcast app, I'm not usually pushing the bounds of performance on any of the hardware that I'm running on. And I also am usually not like, the only thing that would really matter to me is like, if the audio output hardware is being handled differently on different devices.
00:12:21 ◼ ► And there was an era where that was happening. Like in the very early iPhone era, like some of the hardware would have hardware decoder, some of it wouldn't. Some of it would, you know, do different kinds of conversions, some of it wouldn't like on the output stage.
00:12:35 ◼ ► But these days, I don't think that changes very much. It actually does change significantly if you use AirPods. But otherwise, you know, you can do that on any phone. But otherwise, like, you know, the hardware doesn't really affect my app very much.
00:12:47 ◼ ► So the next thing I do, once I've tried a bunch of different devices is I will try to separate and isolate the code that seems to be responsible for the bug. And I often will end up putting that into a separate project.
00:13:03 ◼ ► And sometimes this is useful if eventually you're going to need to, like, submit a bug to Apple, then they will almost always ask for an example project. And so sometimes it's useful to just have one. But often I find that this is a good next step to try and isolate the areas where the bug could be happening.
00:13:22 ◼ ► And it lets you much more dynamically and much more quickly try different scenarios for that app, so that you don't have to, like, have the whole app get spun up while it's doing a whole bunch of other stuff. Like, you can just, you know, in this case, I have a, you know, testing app, whose purpose is just, you know, to do all the step analysis stuff that I do in my app.
00:13:44 ◼ ► And I have a little test app where I pulled out all that. Like, for me, there's only like one or two classes that are primarily associated with that. And I pull them out. And then I just have like this ridiculously simple interface that has a bunch of different buttons that lets it try different things.
00:13:58 ◼ ► And it displays a whole bunch of debugging information. And then I can use that as a way to test it. Sometimes this will help just, like, the act of pulling it out will help you identify an issue that was a logical issue.
00:14:14 ◼ ► And it's not necessarily, like, you know, it's that, in the process of extracting it, you realize there was some dependency that you hadn't characterized, that you weren't expecting, or those types of situations. Like, sometimes I've even just found that to be useful, or even it just lets you look at the problem in a slightly different way.
00:14:30 ◼ ► And, as I'll get into next, it also just lets you try crazy approaches of testing your app without necessarily, like, dealing with it inside of the scope of your whole app or being worried about, like, breaking things in a horrible way.
00:14:44 ◼ ► We are brought to you this week by Linode. With Linode you get access to a suite of powerful hosting options with prices starting at just $5 a month. You can be up and running with your own virtual server in the Linode cloud in under a minute. Linode offers industry leading performance with native SSD storage, a 40 gigabit network, and Intel Xeon E5 processors.
00:15:04 ◼ ► They also now have 10 data centers around the world so you can serve your customers even faster than before from pretty much wherever you want. They also have an API that allows you to easily automate tasks or develop custom applications in the cloud, and everything is manageable via their own awesome web interface or the command line.
00:15:20 ◼ ► All Linode's pricing tiers feature hourly billing with a monthly cap on all plans and add-on services like backups and NodeBalancers. Linode is great for so many different hosting tasks. You can host database servers, mail servers, VPNs, Docker containers, Git servers, and so much more.
00:15:36 ◼ ► You can even host your entire web backend like Dave and I do. I host all of Overcast there. I have 24 Linode instances running most of the time and it's just a fantastic host. I've been there since 2011 and very, very happy with it. And they're also hiring right now.
00:15:51 ◼ ► So if you're looking into that, maybe go to Linode.com/careers. Otherwise they have fantastic pricing options available. Plans start at 1 gig of RAM for just $5 a month and they offer lots of other plans that go higher than that including high memory plans if you need anything like that.
00:16:06 ◼ ► Listeners can sign up at Linode.com/radar. That will support us and get you $20 towards any Linode plan. And that could be 4 free months in that 1 gig of RAM plan. And with a 7-day money-back guarantee, there's nothing to lose. So go to Linode.com/radar to learn more and sign up and take advantage of that $20 credit or use promo code radar2018 at checkout.
00:16:33 ◼ ► So once I've extracted the problematic code or the code that I think is problematic, and sometimes that helps and sometimes it doesn't. If it doesn't, this is where the fun starts. And this is where things get a little bit creative.
00:16:46 ◼ ► The next approach I tend to do is, if I can't reproduce a bug, I'll just start. This is where it's like the chaos monkey approach where I'll just start sending horrendously malformed data into the system.
00:17:03 ◼ ► I'll try and just send actual random. I can just set up this testing app so that it's just going to try and generate and save 100 million random samples and just see what happens.
00:17:19 ◼ ► Is it going to happen sometime in there? If a bug only happens out of every 100,000 saves, maybe I need to generate 100,000 saves. And you just start inventing things that could go wrong or problems that might happen and just throwing them at the app.
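One way to picture the "throw random and malformed samples at it" approach is a small fuzz harness that checks the code under test against a simple reference model. This is only a sketch, not the speaker's test app: `StepStore`, its one-count-per-hour design, the 100,000-step ceiling, and the mix of bad inputs are all hypothetical.

```python
import random
from typing import Dict, Optional


class StepStore:
    """Hypothetical stand-in for the code under test: one step count per hour."""

    def __init__(self) -> None:
        self._by_hour: Dict[int, int] = {}

    def save(self, hour: int, steps: object) -> bool:
        # Reject anything that is not a sane, non-negative integer.
        if isinstance(steps, bool) or not isinstance(steps, int):
            return False
        if steps < 0 or steps > 100_000:
            return False
        self._by_hour[hour] = steps  # overwrite the hour, never accumulate
        return True

    def total(self) -> int:
        return sum(self._by_hour.values())


def random_sample(rng: random.Random) -> object:
    """Produce mostly plausible values with deliberately malformed ones mixed in."""
    roll = rng.random()
    if roll < 0.70:
        return rng.randint(0, 5_000)            # plausible
    if roll < 0.80:
        return -rng.randint(1, 5_000)           # negative
    if roll < 0.90:
        return rng.randint(100_001, 10**12)     # absurdly large
    return rng.choice([None, float("nan"), "123", 12.5, True])  # wrong type


def fuzz(iterations: int, seed: int) -> Optional[str]:
    """Return a description of the first invariant violation, or None if none was found."""
    rng = random.Random(seed)  # fixed seed so a failing run can be replayed exactly
    store = StepStore()
    expected: Dict[int, int] = {}  # independent reference model
    for i in range(iterations):
        hour = rng.randint(0, 23)
        sample = random_sample(rng)
        valid = type(sample) is int and 0 <= sample <= 100_000
        accepted = store.save(hour, sample)
        if valid:
            expected[hour] = sample
        if accepted != valid:
            return f"iteration {i}: save({hour}, {sample!r}) returned {accepted}"
        if store.total() != sum(expected.values()):
            return f"iteration {i}: total {store.total()} != {sum(expected.values())}"
    return None


print(fuzz(iterations=100_000, seed=42))  # None means no violation was found
```

Seeding the generator matters for exactly the problem under discussion: if iteration 73,512 ever trips the invariant, the same seed reproduces it on demand.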
00:17:40 ◼ ► Similarly, I also will often start to do things where I start to intentionally break the app in a way that is potentially useful.
00:17:54 ◼ ► There are sometimes where I will have, if you do any kind of locking or queue-based things where you have a bunch of operations that happen, there's two operations that should never happen concurrently because they were executed on a serial queue, for example.
00:18:11 ◼ ► It's like, "What happens if I make that queue concurrent? Does the bug happen?" And obviously these aren't fixes, these aren't things that you're ever expecting to ship.
00:18:25 ◼ ► But these are things that you can try that sometimes will at least lead you in the direction of the bug that you're having such a hard time reproducing because the nature of it being hard to reproduce is that you don't know what's causing it.
00:18:41 ◼ ► So you kind of have to guess. In my past experience, these are the kinds of things that have caused trouble for me. So why don't I just, rather than the app now has all these fixes and all these hedges against these situations that should never be possible, what if I make them possible? Does it happen?
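The "what if I make that queue concurrent?" experiment can be sketched with a switch that removes the serialization on purpose. This is an illustrative Python analogue, not the app's code: a `threading.Lock` stands in for the serial queue, and `run_increments`, the worker counts, and the artificial sleep are all hypothetical.

```python
import threading
import time
from contextlib import nullcontext


def run_increments(serialize: bool, workers: int = 8, per_worker: int = 25) -> int:
    """Run read-modify-write increments from several threads and return the final count."""
    state = {"count": 0}
    # With serialize=False the guard does nothing, deliberately allowing overlap.
    guard = threading.Lock() if serialize else nullcontext()

    def work() -> None:
        for _ in range(per_worker):
            with guard:
                current = state["count"]      # read
                time.sleep(0.0005)            # widen the window between read and write
                state["count"] = current + 1  # write back, possibly clobbering another thread

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


expected = 8 * 25
print("serialized:  ", run_increments(serialize=True), "of", expected)
print("unserialized:", run_increments(serialize=False), "of", expected)
```

The serialized run always reaches the expected total, while the unserialized run typically falls short because updates get lost. If deliberately breaking the guarantee produces the same symptom users report, that points toward an ordering problem; if it produces a different failure, that direction can probably be ruled out.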
00:18:59 ◼ ► Half the time, all that you're doing is just breaking the app in a new way, not in the way you're trying to look at, in a way that is obviously broken. But sometimes maybe it helps. And the reality is, in this process, that the side effect that is sometimes useful is that sometimes you'll find other bugs.
00:19:17 ◼ ► Which, as I was going through this process recently, I ran into a whole bunch of these little kind of weird edge case bugs that I had no idea were in the app that weren't the bug I was looking for, but were very useful to know and easy to fix.
00:19:31 ◼ ► And it was definitely a nice side effect. And maybe sometimes the bugs you find will lead you in the direction of this bug or they're related, but in ways that you can't understand. Because I think there is something profoundly humbling about doing this kind of work, though, where you realize that as much as you created your app, you wrote it, you sat down in Xcode and you typed out all those characters.
00:19:55 ◼ ► And you sometimes don't know what's happening. It's like you're building this multi-dimensional puzzle and you can only kind of reason in a few of the dimensions at once, but there's all these kind of complicated interactions and things that are happening that, as much as we'd like to think that we were in control, we really aren't.
00:20:13 ◼ ► I mean, that's kind of the story of my life. Whenever I get some kind of weird bug, I realize, "Well, I have no idea." You kind of throw your hands up, you're like, "I have no clue what's going on here. I can't figure it out."
00:20:27 ◼ ► But the wonderful thing about being an indie is that no one else is going to figure it out for you. So you just kind of have to, it's like, "Well, either I figure this out or this bug doesn't get fixed." Simple as that. Or maybe by, as you mentioned, maybe by going and changing other things in the future, maybe you just kind of avoid that bug or you remove the conditions that were bringing it to the forefront.
00:20:53 ◼ ► There's so much about modern app development where it's so complicated. You're going to have bugs, as we talked about last week. You're going to have bugs no matter what. And some of them are just real head scratchers.
00:21:03 ◼ ► And sometimes you can use techniques like this, as you said, to figure them out. Sometimes you can't. And sometimes you just have to deal with, "Well, for some reason there's a bug here." And you can try to rewrite that whole module or change to a whole different API or something. There's some kind of scorched-earth things you could try to do, but it's not really fixing the bug. But sometimes it's your only option.
00:21:29 ◼ ► And yes, there have definitely been these moments in this last process, last few days, where I've had this sense of, it's like the despair. It's like, "Man, what if I just burn the app down and start again?" And obviously it's like, "No." It's like a moment later, "No, Dave, it'll be fine. This is affecting 0.1% of people. 99.9% of people are happy. The people you're talking to are the people who are upset. It'll be okay."
00:21:59 ◼ ► So speaking of people, that is also actually a tool that I found very helpful. If you have somebody for whom the situation is fairly reproducible, engaging that person, talking to them, emailing them, collecting as much information from them as you can, especially if you can get a sense of how technical they are, can be very useful.
00:22:20 ◼ ► Actually, in my situation, I ran into somebody who was actually an Under the Radar listener. So listener Ollie was amazing in providing a whole bunch of information that was much more detailed and technical than you would normally be able to get from a typical user.
00:22:35 ◼ ► So if you can find someone like that, without being too pushy, mine that relationship, squeeze every ounce out of it that you can, because sometimes you'll get this one little clue. And I had a similar bug recently, it was a different situation, but I had one user who, the way they described the problem, totally reframed it in my mind.
00:22:58 ◼ ► And I was suddenly able to actually know what the solution was, because they were describing the problem in a different way than all the other bug reports that were coming in.
00:23:08 ◼ ► And so it's definitely a useful tool to listen to the customers who are reaching out. Keep in mind that they are the people who are upset, they are the people who are hitting the issue, and that they are not representative of your user base at large, which I just mentioned just from a sanity perspective,
00:23:24 ◼ ► because I think it's so easy to get stuck on, "This bug is so awful, it's affecting everybody, no one loves my app, it's terrible, everything's on fire," when the reality is, it's often a very narrow group.
00:23:36 ◼ ► In my case, it was happening rarely, and it was happening only for people who have Apple Watches, which is a very, it's like 10 or 15% of my users overall, and within this group, it's only for people who have the Watch app installed, which is even smaller,
00:23:49 ◼ ► and it's reassuring when you start to think through those things, but helpful users are amazing, and if you are in a situation where you can provide that feedback to a developer, which I guess I just mention in terms of, I imagine the audience of the show is very technical,
00:24:05 ◼ ► if you can provide that to somebody, you are doing them a tremendous service for tracking down a bug and getting it fixed, because as a technical user, if you need to look up logs, or you're able to generate files, or even just, if I ask somebody for a description of their setup,
00:24:23 ◼ ► you'll think of all the different things, like what's my locale, what my language is set to, what's my iOS version, what's my watchOS version, you can get into much more technical detail than maybe the typical user would say, "I have an iPhone," and that isn't quite as helpful.
00:24:38 ◼ ► Oh yeah, I mean, just to reiterate, if you can give a developer reproducible steps to cause a bug to happen, because that's one of the problems I had with Overcast this past build, is I had this weird streaming bug that I could not figure out the steps to reproduce it,
00:24:56 ◼ ► and so I couldn't get it to happen, and my customers kept reporting it, I could not get it to happen, and so I put out the call on Twitter from the company account, I'm like, "Look, please, if anybody has reproduction steps, please let me know," and fortunately, a couple people did.
00:25:10 ◼ ► Finally, all I had to do was ask, but a couple people did, and I was able to finally cause it to happen on my phone while running in the Xcode debugger, and so I was able to figure out and fix that crash,
00:25:24 ◼ ► and it was only because I had those steps to reproduce it. If I continued not being able to figure that out, that bug would just still be there, and those customers would just still be hitting it, and there would be nothing to do about it.
00:25:37 ◼ ► Yeah, good users are an amazing resource. And I guess the last of my list of things to do if you're trying to reproduce something is always remember that the users are potentially running a different version of the app than the one that you have, and by that I mean they are running whatever the latest App Store version is, which may or may not be the version.
00:26:04 ◼ ► Sure. So there are all kinds of issues that can come from that, and so in the same way that at the beginning I said I could use lots of different devices, it's like I go to the App Store, install the version from the App Store onto each of those devices, build a user setup based on that App Store version, and then see if I can recreate it, because I find certain issues, too, are sometimes hard to reproduce if you're connected to the debugger, which is like,
00:26:33 ◼ ► mind-bending when you think about it, that some bugs only exist when they're not connected to the debugger, but sometimes that can happen, because apps behave differently and are in different states when they're connected to the debugger, like in terms of the, I'm not sure at the low level, but it seems like sometimes things are different when you're connected to the debugger versus when you aren't, and so sometimes just trying to use the App Store version on a fresh install, build a bit of data, and then try and do the steps that seem to be maybe reproducing it can sometimes be helpful.
00:27:02 ◼ ► But at the end of this, sometimes you just can't, and this is ultimately where I found myself for this particular bug, is I ended up with a list of five things that I think that as I was auditing the code and doing all this testing that I thought might help, I was never able to reproduce it, but I took, you know, I've implemented all these fixes, they were all fairly safe and not problematic fixes, and I just, like, I shipped the update, and so far, it seems promising,
00:27:31 ◼ ► like, we'll see, but ultimately, that's the best I can do, and I tried in the update to, like, not be too clever, not be too cute, like, the point of this update is to just do simple, safe changes that are, seem hopefully in the right direction, and then my hope is that I'm pushing, like, I don't think I fixed it, but I probably hopefully pushed out by, like, one order of magnitude the likelihood of it happening to someone.
00:27:55 ◼ ► So rather, you know, if it was happening to one in 1,000 users, maybe now it happens to one in every 10,000 users or something like that. And sometimes that's the best you can do until you hit a point that you can reproduce it, because even after all those crazy steps and all the crazy things I tried, I've still never seen it.