Developing Perspective

#151: A tree falls in the Woods.


00:00:00   Hello and welcome to Developing Perspective.

00:00:02   Developing Perspective is a podcast discussing news of note in iOS development, Apple and

00:00:06   the like.

00:00:07   I'm your host, David Smith.

00:00:08   I'm an independent iOS developer based in Herndon, Virginia.

00:00:10   This is show number 151 and today is Wednesday, November 13th.

00:00:14   Developing Perspective is never longer than 15 minutes, so let's get started.

00:00:18   All right.

00:00:20   Today I'm going to be unpacking a rather, hopefully, unique event that happened to me and to

00:00:27   Feed Wrangler and the other services I run this past weekend, and hopefully

00:00:31   talk about some of the lessons I can learn from it. It was just kind of

00:00:36   a singular event that I thought would be interesting to talk about.

00:00:39   All right, so this past weekend I was down at a little cabin my family has. It's

00:00:44   out in the woods in central Virginia, just outside of Charlottesville, if you know

00:00:48   the area. I went down there with some family, and it's a lovely place.

00:00:54   It's a wonderful little cabin in the woods that has very limited connections to the outside

00:00:58   world.

00:00:59   It's one of the few places I go on a regular basis where I'm basically disconnected,

00:01:04   other than maybe an airplane, and even those have Wi-Fi now.

00:01:07   There's a telephone.

00:01:08   But other than that, there are really no other connections to the outside world.

00:01:12   And it's a lovely place to go and just sort of decompress, or whatever you want to call

00:01:18   it. It's kind of fascinating: I don't have my iPhone in

00:01:23   my pocket, where it normally is, because, well, the iPhone isn't really doing anything.

00:01:29   It's not connected to anything. And so it's a lovely place to go and to relax and to spend

00:01:33   time with my family and to connect with them in a more direct way. And I went down there

00:01:38   this last weekend. And basically, while I'm there, I'm largely disconnected from the world.

00:01:46   And so it's always a little bit tricky for me because I run a variety of web services.

00:01:50   I run a variety of businesses that have an ongoing presence, you

00:01:54   know, 24 hours, seven days a week.

00:01:56   And so there's always a bit of tension and a bit of nervousness that I have when I go

00:01:59   on one of these trips, because I'm not available.

00:02:03   On the flip side, it's unrealistic for me to always be available.

00:02:09   Some of this is just the reality of being a one-man shop: some of these risks

00:02:12   naturally come with it.

00:02:15   But anyway, so I was down there for the weekend.

00:02:19   On Saturday, we left the cabin and went to the local library, where they were having

00:02:24   a book sale, so we went to that.

00:02:26   And while I was there, since they have Wi-Fi at the library, I checked on everything,

00:02:29   made sure everything was up and happy and all systems were go. I did all my checks, and everything

00:02:35   was fine.

00:02:36   That put my mind at ease, and we got back in the car and drove back to the

00:02:39   cabin and had a lovely next 24 hours.

00:02:43   Turns out, about five to 10 minutes after we got back to the cabin, after I lost reception

00:02:47   and after I was sort of disconnected then

00:02:50   for the next 24 hours, if I had had signal,

00:02:55   I would have received a notification

00:02:57   that Feed Wrangler was completely down.

00:03:00   And as a result, none of the Feed Wrangler or Pod Wrangler

00:03:05   servers were working.

00:03:06   Everything was just down.

00:03:09   I'll briefly touch on what I believe happened.

00:03:11   It looks like one of my feed scrapers

00:03:14   had somehow gone haywire and started

00:03:19   going wild creating connections to the database.

00:03:21   And eventually, this number of connections

00:03:23   just overflowed its connection pool,

00:03:25   and all kinds of bad things happened.

00:03:27   The bug itself was relatively easy to fix,

00:03:30   because I just restarted everything,

00:03:32   and so far everything's fine.

00:03:33   And I'm just kind of gradually investigating

00:03:34   exactly what happened to make sure it doesn't happen again.

00:03:37   But generally, that's what happened.
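The episode doesn't say what stack Feed Wrangler runs on or how the scraper bug was fixed. As a hedged illustration only, one common guard against a runaway scraper opening ever more database connections is a fixed-size pool that makes extra callers wait briefly and then fail fast. In this Python sketch, `BoundedPool`, `PoolExhausted`, and the `factory` callable are hypothetical names, and `object` stands in for a real database connection.

```python
import queue
import threading
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection slot frees up before the timeout."""


class BoundedPool:
    """Fixed-size connection pool (hypothetical sketch, not Feed Wrangler's code).

    Never opens more than `max_size` connections. Extra callers wait up to
    `timeout` seconds, then fail fast instead of opening more connections.
    """

    def __init__(self, factory, max_size, timeout=1.0):
        self._factory = factory                   # opens one new connection
        self._idle = queue.LifoQueue()            # connections ready for reuse
        self._slots = threading.BoundedSemaphore(max_size)
        self._timeout = timeout
        self._lock = threading.Lock()
        self.created = 0                          # connections opened so far

    @contextmanager
    def connection(self):
        # Wait at most `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("connection pool exhausted")
        try:
            try:
                conn = self._idle.get_nowait()    # reuse an idle connection
            except queue.Empty:
                conn = self._factory()            # or open a new one
                with self._lock:
                    self.created += 1
            try:
                yield conn
            finally:
                self._idle.put(conn)              # hand it back for reuse
        finally:
            self._slots.release()                 # free the slot
```

With this design, a misbehaving caller gets an exception it can log, rather than exhausting the database's connection limit for every other process.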

00:03:39   That means, however, I was unaware of this

00:03:41   for the next 24 hours.

00:03:43   I was blissfully unaware, having a great weekend with my family, while all this time Feed Wrangler was

00:03:51   down and no one had any idea what was going on.

00:03:54   And it was interesting. Then on Sunday, when we head back, I get in the car and we start

00:03:59   leaving, and as you drive away from the cabin, you briefly come in and out of cell signal.

00:04:08   It's usually very poor signal.

00:04:10   It's like you have one bar and barely 3G, sometimes EDGE,

00:04:14   that kind of thing.

00:04:15   And I was driving at the time, so I

00:04:18   couldn't look at my phone.

00:04:19   But I start hearing

00:04:22   all these notifications going off.

00:04:25   And when the first one came in, I was like, oh, it's fine.

00:04:28   There's usually something that happens within the 24 hours

00:04:31   that I'm away from signal.

00:04:33   There's usually some notifications,

00:04:34   some text message, something that comes up.

00:04:37   And then it just kept happening.

00:04:38   And they just kept happening.

00:04:39   And they just kept happening.

00:04:41   And so eventually I asked my dad to look at my phone.

00:04:44   And he says, oh, you know, there's a situation.

00:04:47   I'm starting to get some messages.

00:04:48   People are concerned about me.

00:04:49   Am I OK?

00:04:50   What's going on?

00:04:51   And so rather than driving home, we sort of

00:04:54   pull off at the library, which is on the way,

00:04:56   because it's the nearest sort of strong internet connection.

00:04:59   And I sit down, and I realize Feed Wrangler's down.

00:05:01   This is kind of unfortunate.

00:05:04   I log into the servers, reset everything,

00:05:06   get everything back up and running.

00:05:08   But it had been 24 hours since the service had gone down.

00:05:11   And the ironic part of this

00:05:14   is that I don't think I would have been as cavalier

00:05:18   about going into the forest without a connection

00:05:20   if I'd been planning this trip back in June, when

00:05:24   Feed Wrangler was having a lot of scaling problems,

00:05:26   when I was still adapting and growing the service.

00:05:30   Back then, it being down briefly was a more common occurrence.

00:05:33   Because it's been so stable recently, though,

00:05:36   I didn't really think too much of it when I left.

00:05:37   I kind of had the vague thought that, "Hmm, it makes me a little nervous," and you know,

00:05:41   that's why I checked in on Saturday and everything was fine.

00:05:43   But ironically, because things had been so good, I didn't think a situation like this

00:05:48   was likely.

00:05:49   It turns out Murphy's Law came into play here in full force, and so things were down

00:05:56   for a long time.

00:05:57   A little note before I get into some of the lessons I've learned. Something that was kind

00:06:01   of cool was that as I got back and started working through my

00:06:06   email and my Twitter queue and all these kinds of things, I saw a very interesting

00:06:10   five-stage process that everyone had gone through while I was gone.

00:06:15   Starting at around midday on Saturday, when things went

00:06:19   offline, I started getting mentions like, hey Dave, just want to let you

00:06:23   know, in case you don't already, it looks like it's having some trouble. Just letting you

00:06:26   know. I'm sure you're on it. And then as things go on, a few hours later, it becomes:

00:06:30   things are still down.

00:06:32   It's like, what's going on?

00:06:34   Just want to make sure you're working on it.

00:06:36   And then they turn into, I guess,

00:06:38   the anger phase, where it's like,

00:06:40   this is kind of getting unacceptable.

00:06:41   What's going on?

00:06:42   This is crazy.

00:06:43   This is a paid service.

00:06:43   I can't believe it's been down for so long.

00:06:45   There's no status updates.

00:06:46   This is unacceptable.

00:06:49   And then that very quickly turned into, oh no.

00:06:52   Has anyone seen Dave?

00:06:53   Does anybody know if he's OK?

00:06:56   Is he fine?

00:06:58   So it changed very quickly into concern for my physical well-being.

00:07:00   People were worried that, you know, something had gone badly wrong.

00:07:04   And then the fifth stage was: it sounds like somebody needs to go find him.

00:07:09   I had some friends who were, you know, concerned and started

00:07:13   calling me.

00:07:14   They were making plans to come over to my house and see if I was all right, because

00:07:18   it's certainly an unusual thing.

00:07:20   And it's a weird thing, being as connected to the world as we are now. Typically,

00:07:26   I'm always available or connected, on Twitter or email or something,

00:07:32   at least in some way, 24 hours a day, seven days a week.

00:07:38   And so it's a rare thing when someone just disappears.

00:07:40   And I appreciate the concern.

00:07:41   It was very kind of touching.

00:07:42   And a couple of people noticed this too. It was kind of cool that

00:07:46   after that early phase, it wasn't so much that people

00:07:52   were upset that Feed Wrangler was down.

00:07:54   They could do without their RSS.

00:07:55   Rather, they became genuinely worried that something had happened to me. Of course, it makes me

00:07:59   laugh a little bit even more that this was immediately after I had posted my five-year

00:08:04   retrospective podcast and blog post. And so it almost had the appearance in some ways,

00:08:10   I guess you could say, of I just did this huge reminiscence that talks about my last

00:08:14   five years and here's what I've done. And then I just sort of dropped the mic and walked

00:08:18   away. Which wasn't the case, but it was kind of an amusing sort of coincidence that right

00:08:23   after I do all this retrospective work and kind of talk about the things I've done, then

00:08:27   I just disappear.

00:08:28   So overall, things are all right.

00:08:35   I mean, the server was down for 24 hours, which is unfortunate. It's something that

00:08:40   I would certainly like to avoid. The crazy reality, though, of servers and things

00:08:46   is that being down for that long is still a relatively short period of time, sort of

00:08:53   if you annualize it or however you want to look at it.

00:08:55   You know, the whole thing where they talk about how many nines of uptime you have.

00:08:58   Being down for an entire day in a year, your uptime would still be about 99.73%.

00:09:06   And so I'm not going to get too crazy about it.
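To put the "nines" in concrete terms: assuming a 365-day year, one full day of downtime leaves roughly 99.73% uptime, which sits between two and three nines. This small Python sketch does the arithmetic in both directions; the function names are just illustrative, and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignores leap years


def uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Share of the period the service was up, as a percentage."""
    return 100.0 * (1 - downtime_hours / period_hours)


def downtime_budget_hours(nines, period_hours=HOURS_PER_YEAR):
    """Downtime allowed by an availability target of `nines` nines (3 -> 99.9%)."""
    return period_hours * 10 ** -nines


print(round(uptime_percent(24), 2))          # 99.73: one full day down in a year
print(round(downtime_budget_hours(3), 2))    # 8.76 hours a year at 99.9%
print(round(downtime_budget_hours(4) * 60))  # about 53 minutes a year at 99.99%
```

Seen this way, a single 24-hour outage already uses up nearly three years' worth of a "three nines" budget.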

00:09:08   I'm still working out whether there may be some kind of little extra extension

00:09:13   or something I do on people's accounts as a result,

00:09:15   like adding a day or a week or something like that

00:09:17   to compensate for it.

00:09:18   But the reality is these are just

00:09:20   some of the things that happen, especially

00:09:23   when you're a one-man team working on a project.

00:09:26   That said,

00:09:29   there are some benefits to that.

00:09:30   For example, it allows me

00:09:33   to be a relatively low-cost service,

00:09:35   since I'm not trying to support a large team or a huge payroll.

00:09:38   As a result, I can charge a relatively modest amount.

00:09:41   I believe I'm one of the cheapest, if not the cheapest,

00:09:45   non-free services out there for RSS syncing.

00:09:49   But on the flip side, there are these kinds of problems,

00:09:51   where I don't have an ops team.

00:09:52   There's not a group of people with pagers

00:09:54   sitting there on call on a rotating schedule

00:09:58   to be available whenever anything goes wrong.

00:10:01   That said, I'll probably be making some changes going

00:10:03   forward, so that when I'm in a situation like this, where

00:10:06   I know I'm going to be disconnected,

00:10:08   I'll probably give the SSH keys to some trusted friends

00:10:11   so they can go in and do at least basic server maintenance,

00:10:13   even if it's just restarting all the servers

00:10:15   and hoping it comes back up and works, that kind of thing.

00:10:17   Which I probably should have done this time in retrospect,

00:10:20   obviously.

00:10:20   Like I said, things had been so stable that it really

00:10:23   wasn't a thought that I had.

00:10:25   But now it's a lesson learned.

00:10:26   And it's something that I can hopefully learn from and grow from

00:10:31   going forward.

00:10:32   But it was an interesting experience to go through.

00:10:36   And I think I mentioned this in my five-year

00:10:39   retrospective last week:

00:10:43   So much, I think, of being good at what you do,

00:10:47   of being successful overall, is that you have to make mistakes.

00:10:52   Because no matter how often you are told about something

00:10:58   or you read about something, it's

00:11:00   very hard to take those lessons and apply them

00:11:02   to yourself in a consistent way just

00:11:06   from hearing someone else's mistakes

00:11:08   or hearing about someone else's experience.

00:11:10   For me personally, maybe this is a character flaw.

00:11:12   I often have to have made that mistake myself the first time

00:11:15   before I will be able to learn from it and really apply it

00:11:18   because I don't take it seriously.

00:11:19   Or I diminish it or I say, oh, I'll be fine

00:11:21   or those types of things.

00:11:23   And so, you know,

00:11:25   having gone through this experience,

00:11:26   I've now

00:11:27   learned some interesting lessons. And, you know,

00:11:30   I also have to make some personal decisions

00:11:32   about what kind of uptime commitment

00:11:36   or expectation I want to set,

00:11:38   what I want to treat as reasonable for people.

00:11:41   Because the reality is I can't let

00:11:47   the constant uptime of Feed Wrangler

00:11:49   take precedence over my life in a pervasive way,

00:11:53   where I'm constantly worried about the servers

00:11:57   and the service and so on,

00:11:58   with this constant worry every time

00:11:59   I go to bed: is something going

00:12:01   to happen in the middle of the night that's

00:12:03   going to cause the service to go down?

00:12:04   Because ultimately, the quality of life

00:12:06   that that would engender just sort of isn't for sale, maybe

00:12:09   is the best way to say it.

00:12:10   It's not something that I would choose to take on.

00:12:14   Because the reality is I'd rather be available and present

00:12:18   for my family and my kids, and I don't want to have something

00:12:22   hanging over me.

00:12:23   But at the same time, I want to run a good service.

00:12:25   I want to run a service that works well for people

00:12:29   most of the time.

00:12:30   And a lot of that is probably going to come down to

00:12:32   some changes I'll need to make, I think,

00:12:34   technically, on the back end: things

00:12:36   that I can do to potentially mitigate this

00:12:38   or make it a bit more self-healing

00:12:39   in those types of scenarios.

00:12:42   But overall, that's kind of where

00:12:44   I find myself.
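The episode doesn't describe how Feed Wrangler would become "self-healing." As one hedged illustration, the manual fix described earlier (restart everything and see if it comes back) can be automated with a watchdog: restart after a few consecutive failed health checks, but stop and escalate to a human if restarts keep recurring. In this Python sketch, `Watchdog`, `check`, `restart`, and both thresholds are hypothetical; in practice `check` might hit a status URL and `restart` might call a process supervisor.

```python
from __future__ import annotations

import time
from typing import Callable


class Watchdog:
    """Restart a service after `threshold` consecutive failed health checks,
    but give up (so a human can be paged) if restarts keep piling up.
    Hypothetical sketch; `check` and `restart` are injected hooks."""

    def __init__(self, check: Callable[[], bool], restart: Callable[[], None],
                 threshold: int = 3, max_restarts_per_hour: int = 4):
        self.check = check
        self.restart = restart
        self.threshold = threshold
        self.max_restarts_per_hour = max_restarts_per_hour
        self.failures = 0                       # consecutive failed checks
        self.restart_times: list[float] = []    # when recent restarts happened

    def tick(self, now: float | None = None) -> str:
        """Run one health check; return 'ok', 'failing', 'restarted' or 'gave_up'."""
        now = time.monotonic() if now is None else now
        if self.check():
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures < self.threshold:
            return "failing"                    # tolerate a brief blip
        # Only count restarts from the last hour, so a crash loop escalates.
        self.restart_times = [t for t in self.restart_times if now - t < 3600]
        if len(self.restart_times) >= self.max_restarts_per_hour:
            return "gave_up"                    # restarts aren't helping
        self.restart()
        self.restart_times.append(now)
        self.failures = 0
        return "restarted"
```

A real deployment would call `tick()` on a timer, say once a minute, and alert someone on `"gave_up"`, since a restart loop usually means the underlying bug needs a person to look at it.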

00:12:46   And it was an interesting experience to go through

00:12:47   because now I have to think about those things

00:12:48   in a way that I don't think I necessarily

00:12:50   had thought all the way through before:

00:12:52   what is a reasonable expectation for uptime?

00:12:55   And the reality is that different

00:12:56   customers are going to feel differently about that.

00:12:58   I've had some people who came back to me

00:13:00   once everything was back up

00:13:02   and they knew that I was okay,

00:13:04   and they were like, I'm glad you had a great weekend.

00:13:06   Don't worry too much.

00:13:07   We can get our RSS later.

00:13:08   It's not a big deal.

00:13:09   And I had some people who emailed,

00:13:11   and they were very upset.

00:13:12   I can't believe this is a paid service.

00:13:14   I gave you good money for this service,

00:13:15   and I can't believe it's been down for as long as it is.

00:13:17   I want a refund.

00:13:17   But the reality is I can't please everybody.

00:13:20   No matter what I did,

00:13:23   some people would react this way. 24 hours is a long time,

00:13:26   but if it had been down for two hours,

00:13:27   some people would have reacted the same way.

00:13:29   It could have been down for two minutes

00:13:31   and some would have had the same reaction.

00:13:34   And that's valid.

00:13:35   That's fair.

00:13:36   They're entitled to their opinion.

00:13:38   But the reality is, ultimately, their opinion

00:13:40   can't drive what I do.

00:13:41   It's sort of like if you tried to build

00:13:42   an app that reacted to all the App Store comments

00:13:47   that you received.

00:13:48   The app you would end up with, I don't think would be good.

00:13:51   Nor would you enjoy building it, probably.

00:13:53   Ultimately, you have to make some decisions about the kind

00:13:55   of business you are, the kind of way

00:13:57   you want to run your business, and just go forward from there.

00:14:01   All right, so that's it for today's show.

00:14:02   Just kind of an interesting story that I wanted to share.

00:14:04   Hopefully that was interesting and useful.

00:14:05   And if you're one of the people who was affected by it,

00:14:08   I'm sorry.

00:14:09   I wish it hadn't happened.

00:14:10   I've learned a few things, and I don't

00:14:11   think this particular problem will happen again.

00:14:14   Though there's always the worry with these types of things

00:14:17   that you end up fighting the last war

00:14:19   rather than being able to predict whatever

00:14:20   the next battle is going to be.

00:14:22   But anyway, as always, if you have questions, comments,

00:14:24   concerns, or complaints, I'm on Twitter @_davidsmith,

00:14:27   or email me at david@developingperspective.com.

00:14:29   I hope you have a great rest of your week

00:14:30   coding, and I'll talk to you later. Bye.