#143: Bare Metal.
00:00:00
Hello and welcome to Developing Perspective.
00:00:03
Developing Perspective is a podcast discussing news of note in iOS development, Apple and the like.
00:00:07
I'm your host, David Smith.
00:00:08
I'm an independent iOS and Mac developer based in Herndon, Virginia.
00:00:11
This is show number 143.
00:00:13
Today is Tuesday, September 17th.
00:00:15
Developing Perspective is never longer than 15 minutes, so let's get started.
00:00:19
All right, so today I'm going to be unpacking, taking a break from, I guess, the iOS 7 stuff,
00:00:25
new iPhone stuff, all the things I've been talking about recently, and go back a little
00:00:29
bit to something that I talked about for a while before, which is Feed Wrangler and talking
00:00:34
about some of the more web service side of things there. And specifically I'm going to
00:00:39
be unpacking my experience of the last few days and some of the lessons I've learned.
00:00:44
So just a quick bit of background: Feed Wrangler, soon to be Pod Wrangler, is a system that
00:00:50
I built that is an RSS aggregating and syncing service. So it's a replacement for Google Reader
00:00:55
in a lot of ways that hopefully adds a bit more value
00:00:58
and does things in another way beyond that.
00:01:01
And this is a service that I launched,
00:01:02
I think it was back in May-ish,
00:01:04
it was about something like that.
00:01:06
And it's been doing pretty well.
00:01:08
It's definitely been meeting my expectations
00:01:09
and doing well, sort of doing well
00:01:12
and being a significant interesting part of my business.
00:01:16
And as a result of, sort of,
00:01:19
Feed Wrangler at its core is a sync service.
00:01:23
It's something whose purpose is to take
00:01:26
rather remarkably large amounts of data,
00:01:29
essentially from RSS feeds all over the internet,
00:01:32
aggregate them together, and then let people browse them
00:01:34
in an easy interface, providing a good API for third party
00:01:38
clients, and so on.
00:01:39
And so it's something that I built initially
00:01:43
on using virtual private servers,
00:01:45
specifically at Linode.
00:01:47
And it's something whose performance characteristics
00:01:51
have always been complicated.
00:01:53
I think by the nature of what it's doing,
00:01:58
there are certain problems that are just part of it
00:02:01
that are inescapable, that no matter what you do,
00:02:03
it's just a hard problem to solve.
00:02:06
There's a lot of data.
00:02:07
There's a lot of, just the sheer bandwidth of items
00:02:08
that I'm trying to manage and sort and deal with
00:02:12
is pretty remarkable.
00:02:15
I think that the simplest version is that right now,
00:02:17
I think the database that sort of stores all the articles
00:02:21
manages them I think is up to about 200 gigabytes or so.
00:02:25
So about 200 gigabytes of data in just about six months or something of work.
00:02:29
And so it's quite a lot to keep track of and to manage.
00:02:33
And I built it using virtual private servers at Linode, which was something that I
00:02:37
was very familiar with and it worked well for me in the past.
00:02:41
And it had, however, what I recently found,
00:02:45
and this is the main story that I'm going to get into now,
00:02:49
that it was never quite performant enough.
00:02:52
It was never quite as fast as I would have liked it to be.
00:02:55
It was never quite as stable as I would have liked it to be.
00:02:58
And so I kept trying and working out ways to get around that.
00:03:03
I kept adding different types of caching,
00:03:06
taking the database and splitting it in two
00:03:08
and having a read slave and a master,
00:03:10
where all the writes happen, and all these types of tricks
00:03:13
that I was kind of heading towards.
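As a rough illustration of the master and read slave split mentioned here, this is a minimal sketch of routing queries by type. The server labels and the `choose_server` helper are hypothetical names made up for this example, not anything from Feed Wrangler itself.

```python
# Hypothetical sketch: send writes to the master, reads to the read slave.
# The labels "master" and "read_slave" stand in for real connections.

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def choose_server(sql):
    """Return which server a SQL statement should be sent to."""
    words = sql.strip().split(None, 1)
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    # All writes go to the master; everything else can be served by the slave.
    return "master" if verb in WRITE_VERBS else "read_slave"


if __name__ == "__main__":
    print(choose_server("SELECT * FROM articles WHERE feed_id = 1"))
    print(choose_server("INSERT INTO articles (title) VALUES ('hello')"))
```

A split like this spreads read load, but the read slave can lag behind the master, which is part of the extra complexity it brings.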
00:03:15
What I found, though, is that that introduced
00:03:18
all kinds of other problems with it.
00:03:20
And so here comes the story part of this.
00:03:22
So last week, Reeder 2 launched, which I highly recommend.
00:03:27
It's an excellent app.
00:03:28
I was beta testing it for a long time,
00:03:30
and Silvio did a really good job of building it.
00:03:33
And it feels right at home on iOS 7.
00:03:34
It's really sort of slick and has some cool, nice touches.
00:03:38
And I really like it.
00:03:40
And I was actually really excited when it launched,
00:03:43
because it has a lot of features that people
00:03:46
have been asking for.
00:03:47
It has full support for Feed Wrangler,
00:03:48
it has smart stream support.
00:03:50
It does a lot of things that I was really excited about.
00:03:52
One thing I did not anticipate, however,
00:03:54
which is perhaps a little bit of foolishness on my side,
00:03:57
is that it was launched as a new app.
00:03:59
And so everybody who uses it, which
00:04:02
is a pretty high proportion of my user base,
00:04:05
bought it and then proceeded to resync their entire article
00:04:10
lists into it, which meant that my already kind of--
00:04:16
a little bit on the edge, not-a-lot-of-headroom server infrastructure was completely crushed
00:04:20
and destroyed.
00:04:21
And so last Wednesday, I believe it was when it launched, all of a sudden things just went
00:04:26
from bad to worse.
00:04:27
And the things that I had been kind of scraping by with and my patches and sort of intermediate
00:04:33
solutions had been working, but were not working well enough at this point, with that amount
00:04:40
of traffic, where I think I was getting at least two or three times the normal traffic
00:04:45
load, and especially the kind of traffic that it was, was very--
00:04:49
was much more difficult to deal with than normal traffic.
00:04:52
Because normal traffic, most applications
00:04:54
are only asking for recent things.
00:04:56
They're asking for, what are the new articles
00:04:58
since a particular time?
00:05:00
Whereas now, when you do a full sync, you kind of go back
00:05:04
and go, OK, what are the articles they have,
00:05:07
from sort of a week ago or a month ago, depending on the client.
00:05:10
And you're kind of doing this much further back,
00:05:13
this deep, backward search.
00:05:14
And so that's very hard from a caching perspective to deal with, because rather than dealing
00:05:18
with the normal working set, you're dealing with essentially just an approximation of
00:05:23
the entire working set of the database, which is 200 gigabytes, which is too large to reasonably cache.
00:05:29
And so everything just kind of fell apart.
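To make the caching point concrete, here is a small simulation sketch, with made-up sizes, of a least-recently-used cache under the two traffic patterns described: clients asking only for recent articles versus clients sweeping back through the whole history. The cache size, article counts, and names are all assumptions chosen for illustration; they are not Feed Wrangler's real numbers or code.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache that only tracks hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(requests, capacity=100):
    """Replay a list of article ids through a fresh cache."""
    cache = LRUCache(capacity)
    for article_id in requests:
        cache.get(article_id)
    return cache.hit_rate()


TOTAL_ARTICLES = 1000  # assumed size of the full article history

# Normal traffic: everyone keeps asking for the newest 50 articles.
recent_traffic = [TOTAL_ARTICLES - 1 - (i % 50) for i in range(5000)]
# Full resync: requests sweep through the entire history over and over.
full_sync_traffic = [i % TOTAL_ARTICLES for i in range(5000)]

recent_rate = simulate(recent_traffic)
full_sync_rate = simulate(full_sync_traffic)

if __name__ == "__main__":
    print(f"recent-only hit rate: {recent_rate:.2f}")
    print(f"full-sync hit rate:   {full_sync_rate:.2f}")
```

In this toy setup the recent-only pattern is served almost entirely from the cache, while the full sweep never finds anything cached, because each article is evicted long before it is requested again.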
00:05:32
And so I arrived at the place of then having to try and deal with, well, what do I do with this?
00:05:41
What can I do?
00:05:42
Do I keep trying to solve this the way that I have,
00:05:47
with introducing new levels of caching,
00:05:50
new levels of complexity,
00:05:52
splitting traffic in different and interesting ways,
00:05:54
or should I really just sort of do what I probably
00:05:57
should have done from the start,
00:05:58
and that is kind of throw money at the problem,
00:06:00
is one way to say it.
00:06:02
And I'm gonna kind of get into it a little bit later,
00:06:03
but the main lesson that I've been,
00:06:06
as I've been trying to boil down my experience
00:06:07
over the last couple of days,
00:06:09
is into sort of like an aphorism,
00:06:11
I was going to try and put it concisely, is that the lesson I think I've learned is that
00:06:15
I should avoid solving with cleverness that which I could solve with money.
00:06:21
And by that I mean I've recently gone through the process of migrating the entire Feed Wrangler
00:06:26
back end to dedicated bare metal, ridiculously fast machines that are hosted, I host them
00:06:34
with SoftLayer, but basically I'm literally buying a server that's incredibly beefy, you
00:06:40
know, super fast, big SSD drives in a RAID configuration, the dual processing, big Xeon
00:06:48
Like, it is a beefy, beefy machine.
00:06:50
And it certainly cost more than what I was paying at Linode, but it means that my performance
00:06:54
now is being-- essentially, my performance problem is being solved by just spending a
00:07:00
lot more on the server.
00:07:02
And the experience of going through this process has been kind of interesting, and I'll talk
00:07:07
about that now.
00:07:08
And it's also just--
00:07:09
I really think I've learned a lot of things
00:07:11
by going through this.
00:07:12
So the process was essentially--
00:07:14
so on Wednesday, the entire system just
00:07:17
started grinding to a halt. Like,
00:07:18
my error rates were getting really high,
00:07:20
and just things were going crazy.
00:07:22
If you're familiar with the load parameter,
00:07:25
if you type top on any Unix or Linux machine,
00:07:28
you can get a load parameter.
00:07:29
And every now and then, my databases
00:07:31
were spiking to loads of like 20 or 30,
00:07:34
which is completely insane and definitely not what you want.
00:07:36
You want the load to probably be between 0 and the number of cores
00:07:41
on your machine at most.
00:07:42
So you want it nowhere higher than, say, 8.
00:07:45
8's still pretty high, but you don't want
00:07:47
to be anywhere up here, at 20 or 30.
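The rule of thumb above can be sketched in a few lines of Python: compare the load average that `top` reports against the number of cores. This is only an illustrative check, assuming a Unix-like machine (`os.getloadavg` is not available on Windows); the helper names are made up for this example.

```python
import os


def load_per_core(load_average, core_count):
    """Load average divided by cores; at or below 1.0 is the comfortable range."""
    return load_average / core_count


def is_overloaded(load_average, core_count):
    """True when more work is queued than there are cores to run it."""
    return load_average > core_count


def current_status():
    """Read the 1-minute load average and core count from this machine."""
    one_minute_load, _, _ = os.getloadavg()  # Unix-only call
    cores = os.cpu_count() or 1
    return one_minute_load, cores, is_overloaded(one_minute_load, cores)


if __name__ == "__main__":
    load, cores, overloaded = current_status()
    print(f"load {load:.2f} on {cores} cores, overloaded: {overloaded}")
```

By this measure a load of 20 or 30 on an 8-core machine means several times more runnable work than the machine can handle at once.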
00:07:50
And so my I/O was just really crushed.
00:07:54
It was essentially running at 100% utilization the entire time.
00:07:57
And so I started looking at my options.
00:07:59
The first thing I thought about is, OK, is it just
00:08:01
because Linode doesn't have SSDs?
00:08:04
So I started looking at DigitalOcean, which is another VPS provider, which does have the SSD.
00:08:09
And so I took my database, replicated it over there, tried it out.
00:08:13
And the reality was it was a little bit faster, but it was a marginal improvement.
00:08:17
And at this point I just started to think about, "Well, you know what I need to do?
00:08:22
I think I just need to try it. What is the most performant thing that I could possibly do?"
00:08:25
And that sort of led me to just deciding that, "You know what I need to do?
00:08:30
I'm just going to put this on a dedicated host."
00:08:32
And so I've since, in the last few days,
00:08:35
over the course of, I think it was three sort of sleepless
00:08:38
nights doing this all between 2 and 5 in the morning,
00:08:42
so that it has the least impact on customers.
00:08:45
The reality is the thing was barely holding together
00:08:48
as it was, so I suppose I could have done it
00:08:50
in the middle of the day in some ways.
00:08:51
But through a lot of sleepless nights,
00:08:53
I was able to take the entire infrastructure that
00:08:56
was currently on Linode and has now
00:08:58
been migrated onto bare metal servers at SoftLayer.
00:09:03
And that process, I will say, is a little bit insane,
00:09:06
and it's something that I hope to not have to do again.
00:09:08
It's kind of, the best analogy I can think for it
00:09:11
is it's like changing the wheel on your car
00:09:14
while you're driving it.
00:09:15
'Cause you're trying to move these things
00:09:18
and migrate them over while still dealing
00:09:20
with all the incoming traffic.
00:09:21
And so it was a little bit crazy and a little bit hectic,
00:09:24
but thankfully it's done now.
00:09:25
This is, I think, Tuesday, and so as of yesterday,
00:09:27
about Monday morning, I did the last transitions,
00:09:32
and other than some lingering traffic
00:09:35
that still hasn't updated its DNS,
00:09:37
everything's now hitting the SoftLayer stuff,
00:09:41
and it's blazing.
00:09:43
It's kind of remarkable.
00:09:44
And at first, I wasn't even really sure what to expect
00:09:46
when I went to this new server,
00:09:49
but the reality is it's been incredibly faster.
00:09:51
Not just a little bit, not like, "Oh, yeah,
00:09:53
it's a 20%, 30% improvement."
00:09:53
Everything is at least two to five times faster.
00:09:56
And many operations are 10 times faster than they were.
00:09:59
And they're much more consistent, too,
00:10:01
which is even probably the more important thing.
00:10:03
Because even if it was the same speed as it had been before,
00:10:06
but the variance between different operations
00:10:11
was better, I would have been happy.
00:10:13
But it's both faster and more consistent.
00:10:15
And I love being able to see this objectively, where
00:10:18
I was looking at my server stats for this.
00:10:21
And I was able to see that my typical response times before,
00:10:25
like my overall average response time for all requests,
00:10:27
was somewhere around 750 milliseconds to one second.
00:10:33
And as of right now, it's consistently
00:10:36
around 200 milliseconds, and often even as low
00:10:39
as 150 milliseconds.
00:10:41
So that's about a five times improvement
00:10:44
or so in response time.
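The "about five times" figure can be checked with quick arithmetic on the numbers quoted here: 750 milliseconds to one second before, 150 to 200 milliseconds after. The sketch below just computes the two extremes of that range; the variable names are made up for the example.

```python
# Average response times quoted in the episode, in milliseconds.
before_ms = (750, 1000)  # old range on the virtual private servers
after_ms = (150, 200)    # new range on the bare metal servers


def speedup(before, after):
    """How many times faster the new response time is than the old one."""
    return before / after


# Least favorable comparison: fastest old time against slowest new time.
conservative = speedup(before_ms[0], after_ms[1])
# Most favorable comparison: slowest old time against fastest new time.
optimistic = speedup(before_ms[1], after_ms[0])

if __name__ == "__main__":
    print(f"between {conservative:.2f}x and {optimistic:.2f}x faster")
```

The range works out to a bit under four times at the low end and a bit under seven times at the high end, so "about five times" is a fair summary.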
00:10:46
And that's average, so certainly some things are slower
00:10:48
and some things are faster.
00:10:49
That really means that a lot of things are much, much faster than they were before.
00:10:53
And so that's the experience that I've had.
00:10:55
And I think it's interesting as a broader lesson to understand that if there are things
00:11:00
that you can do, it's very easy, I think, in the heat of the moment to be sort of penny wise
00:11:06
and pound foolish, where I was looking at hosting at Linode as kind of a good option
00:11:12
because it sort of did what I wanted, and it had close enough performance characteristics
00:11:18
that it was possible. But the reality was the amount of time that I've probably spent
00:11:23
in the last three or four months working on performance issues and tuning and caching
00:11:27
and dealing with all these kinds of issues that would have completely gone away if I
00:11:31
had just decided to start off investing the money and the resources into just taking,
00:11:39
sort of letting the hardware solve the problem for me, I think the product would have been
00:11:43
better. And that's sort of the tragedy of this. And that's why I wanted to kind of have
00:11:47
an episode talking about this experience, is that making sure that when you're spending
00:11:53
time on something, understanding whether that time could be better spent doing something
00:11:57
else, and if you could outsource that need to something that you can solve with money,
00:12:05
or with resources and whatever that looks like. And it's a funny thing to say, in some
00:12:08
ways, to say that, "Well, what if you can't afford it?" Well, the reality is if you're
00:12:13
spending your time on something that's sort of the-- if spending time, your own time,
00:12:19
is the alternative, then the real question is, you know, can you afford that time? Because time is
00:12:23
even more valuable than-- or much more constrained and limited, often, than money is. And so looking
00:12:30
into ways to avoid spending time when you can solve it with money is very important. And it's
00:12:35
like I really-- I also fell into the trap, I think, as an engineer, of thinking that the-- in some
00:12:42
ways I'd be cheating, or it's like, oh, I'd be copping out
00:12:47
or whatever if I didn't solve these problems with technology.
00:12:52
If I wasn't being super clever, and I kept being like, oh, what if I did this?
00:12:57
What if I did that? And I kept viewing it as this engineering problem.
00:13:02
When the reality is there was a solution and a way to avoid it, to address this problem that didn't require cleverness.
00:13:08
that wasn't about how smart I was,
00:13:13
and it wasn't this problem that I needed to solve.
00:13:15
I could just say, "You know what?
00:13:17
I know it's like I can just move this to something
00:13:19
where it'll be much, much, much better."
00:13:23
And that's just something-- exactly what that means,
00:13:25
and the implications that might have for yourself
00:13:28
and your own applications are certainly going to be different.
00:13:30
It's, are you solving, would it be better to outsource
00:13:34
a particular part of it to an expert in the field?
00:13:34
Is there a part of your business that you
00:13:35
can outsource in that way?
00:13:37
Are there things that you're spending a lot of time on that
00:13:40
could be solved in another way?
00:13:43
And so the lesson that I've learned from this,
00:13:45
and the thing that I've taken away,
00:13:46
is that, A, it's hard to move servers.
00:13:49
Minor lesson.
00:13:51
But the more important thing is, don't solve with cleverness
00:13:53
that which you can solve with money, with resources.
00:13:57
And so that's kind of the experience that I've had.
00:13:59
And it's really exciting now.
00:14:01
When I look at Feed Wrangler, I'm
00:14:02
really excited about some of the things
00:14:03
that I can do with it and with PodWrangler,
00:14:06
and being ready for PodWrangler's launch
00:14:08
in a different way, that I'm-- I've got tremendous amounts
00:14:10
of headroom on my servers now that I can grow and play with
00:14:13
and use that I just didn't have before.
00:14:15
And so I'm excited about that.
00:14:17
And I think that's ultimately the biggest plus,
00:14:19
is that I can focus on my apps.
00:14:20
I can focus on my user experience.
00:14:22
I can focus on things that people will notice and enjoy
00:14:26
in a way that I couldn't when I spent so much time and energy
00:14:29
focused on making things just run in the first place.
00:14:32
And so that's my experience.
00:14:34
That's been my last couple of days.
00:14:35
It's been a little rough, but I got through it.
00:14:37
And I think I've learned some things from it, which I'll
00:14:39
hopefully be able to implement in the future.
00:14:40
And anyway, that's it for today's show.
00:14:42
As always, if you have questions, comments,
00:14:43
concerns, complaints, I'm on Twitter
00:14:45
@_davidsmith, or david@developingperspective.com.
00:14:47
Otherwise, I hope you have a great week, enjoy your new iPhones,
00:14:50
enjoy your new iOSes.
00:14:51
And I'll talk to you guys next week.