158: Bundled Data
00:00:00
◼
►
- Welcome to Under the Radar,
00:00:01
◼
►
a show about independent iOS app development.
00:00:04
◼
►
I'm Marco Arment.
00:00:05
◼
►
- And I'm David Smith.
00:00:06
◼
►
Under the Radar is never longer than 30 minutes,
00:00:08
◼
►
so let's get started.
00:00:10
◼
►
So today we wanted to talk a little bit about
00:00:13
◼
►
what I kind of think about as like sidecar data,
00:00:17
◼
►
things that kind of, data that you need to often ship
00:00:21
◼
►
with your app or provide to your app
00:00:23
◼
►
that supports its function, but isn't necessarily
00:00:26
◼
►
like the main content of the app.
00:00:29
◼
►
And what's interesting about this,
00:00:31
◼
►
and we have a couple of recent things that both you and I
00:00:34
◼
►
have been working through with this, Marco,
00:00:35
◼
►
where there's lots and lots of different ways
00:00:38
◼
►
that you can actually package up data like this
00:00:41
◼
►
to include it in your app.
00:00:43
◼
►
You can include it all the way from the extreme
00:00:45
◼
►
of like including it in code to shipping configuration files
00:00:50
◼
►
or shipping a database with it,
00:00:53
◼
►
preloading like a Core Data database.
00:00:55
◼
►
You can download it from the internet as JSON or plist,
00:00:59
◼
►
you can download it from the internet as a database file.
00:01:02
◼
►
You can bundle it with, I think Apple even has
00:01:05
◼
►
a distribution system where you can add assets
00:01:08
◼
►
that are downloaded on demand.
00:01:09
◼
►
Like there is a tremendous variety of things
00:01:11
◼
►
that you can do with this.
00:01:12
◼
►
- I forgot about that entire system.
00:01:14
◼
►
- I think it's primarily used for games, I think.
00:01:17
◼
►
I think mostly it's used for situations like that.
00:01:19
◼
►
But anyway, we'll get into why you might want to use that.
00:01:22
◼
►
But it's a situation that I think happens often,
00:01:26
◼
►
and there's all these weird trade-offs that you have
00:01:27
◼
►
with speed, performance, first run experience,
00:01:31
◼
►
time spent downloading, but it happens more often
00:01:33
◼
►
than you'd think.
00:01:35
◼
►
And I think this first came to mind, I think,
00:01:37
◼
►
is something that is probably worth you explaining
00:01:39
◼
►
for how you implemented instant search
00:01:42
◼
►
in Overcast, in the latest update,
00:01:45
◼
►
which I think was a really clever system
00:01:47
◼
►
and approach to taking, sort of solving this kind
00:01:49
◼
►
of a problem.
00:01:50
◼
►
- Yeah, sure.
00:01:51
◼
►
And I went into it in detail on ATP,
00:01:53
◼
►
so I won't go into too much detail here,
00:01:54
◼
►
but basically Overcast has a new instant search feature
00:01:59
◼
►
which downloads about once a week.
00:02:01
◼
►
It downloads a search index from my servers
00:02:04
◼
►
and stores it locally, and then when you perform a search,
00:02:07
◼
►
it first hits that before it gets the results back
00:02:10
◼
►
from the server, and then it puts the server results
00:02:12
◼
►
below those results.
00:02:13
◼
►
So you can start typing, you can immediately get results
00:02:16
◼
►
from that local index while you're waiting
00:02:18
◼
►
for the network request to the server for the other results.
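The "local results first, server results below" behavior described here can be sketched roughly as follows. This is only an illustration in Python, not Overcast's actual code: the entry shape (`id`, `title`), the prefix-matching rule, and the function names are all assumptions.

```python
def search_local(index, query):
    """Naive case-insensitive prefix match over an in-memory list of entries."""
    q = query.lower()
    return [entry for entry in index if entry["title"].lower().startswith(q)]


def merge_results(local_hits, server_hits):
    """Keep local hits on top, then append server hits not already shown."""
    seen = set()
    merged = []
    for hit in list(local_hits) + list(server_hits):
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged


# Hypothetical local index, as if downloaded earlier and stored on the device.
local_index = [
    {"id": 1, "title": "Under the Radar"},
    {"id": 2, "title": "Upgrade"},
]

# Hypothetical server response that arrives after the local hits are on screen.
server_hits = [
    {"id": 1, "title": "Under the Radar"},
    {"id": 3, "title": "Undersea Stories"},
]

shown = merge_results(search_local(local_index, "und"), server_hits)
print([entry["title"] for entry in shown])
```

The local pass can run on every keystroke because it never touches the network; the merge step only has to avoid showing the same entry twice.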
00:02:21
◼
►
And I thought of so many of these decisions
00:02:25
◼
►
and trade-offs with this, because one thing I considered
00:02:28
◼
►
was obviously it has to be very small.
00:02:31
◼
►
The index can't be like 100 megs.
00:02:33
◼
►
That's a bit much to force people to download
00:02:35
◼
►
in the background once a week without even them knowing
00:02:38
◼
►
about it or without asking them or without providing
00:02:39
◼
►
any controls for it.
00:02:40
◼
►
And it didn't seem like an important enough feature
00:02:42
◼
►
to have a preference to turn it off or have it off
00:02:44
◼
►
by default or anything like that.
00:02:46
◼
►
But, so the file's small, it's about three and a half megs
00:02:51
◼
►
right now, and I said once a week downloads.
00:02:54
◼
►
But I also thought, should I bundle one with the app?
00:02:58
◼
►
Should I have in the app bundle your first search index
00:03:01
◼
►
and then whenever the app gets a chance,
00:03:03
◼
►
it downloads whatever's current from the internet.
00:03:05
◼
►
And I thought, that's not great, because then I'm forcing
00:03:09
◼
►
everyone for the next month or month and a half
00:03:13
◼
►
or two months until I do my next app update,
00:03:15
◼
►
I'm forcing them to download this data that will,
00:03:18
◼
►
one week in, be out of date.
00:03:20
◼
►
And that seemed like a bad idea to put in my app bundle.
00:03:24
◼
►
But if it's something that didn't change very often,
00:03:27
◼
►
I absolutely would have put it in there.
00:03:29
◼
►
No question I would have put it in the app bundle.
00:03:31
◼
►
And I might in the future build in a very small default one
00:03:34
◼
►
just to have some kind of results there for the most
00:03:38
◼
►
popular searches or something like that.
00:03:40
◼
►
But that was one consideration right there of like,
00:03:43
◼
►
do you bundle it into the app bundle directly
00:03:46
◼
►
or do you download it after installation?
00:03:48
◼
►
And I think that one is a very easy one for me.
00:03:52
◼
►
I think you can bundle it in only if it's either
00:03:57
◼
►
super critical to the app's functionality,
00:03:59
◼
►
so that that way if someone launches the app
00:04:01
◼
►
and it hasn't downloaded it yet, they can still achieve
00:04:04
◼
►
what they need to achieve in the app.
00:04:05
◼
►
So if it's super critical functionality,
00:04:07
◼
►
it needs to be in the app bundle.
00:04:09
◼
►
Or if it's something that very rarely ever changes,
00:04:12
◼
►
that you can just update it when you issue app updates
00:04:15
◼
►
and it's no big deal to have that kind of frequency
00:04:17
◼
►
of updates, then it makes sense.
00:04:19
◼
►
Build them into the app bundle and avoid
00:04:21
◼
►
all the other concerns.
00:04:23
◼
►
Like I think if it's a feature that very few people use,
00:04:27
◼
►
as long as it isn't like a massive file, still bundle it in.
00:04:30
◼
►
But if it's something that needs to be updated
00:04:31
◼
►
on a regular basis, that's when you look at downloads.
00:04:34
◼
►
- Yeah and I think too with that,
00:04:36
◼
►
I've in a variety of different apps,
00:04:38
◼
►
I've done different versions of this.
00:04:40
◼
►
And a lot of times what I find too,
00:04:43
◼
►
there's something nice about the app
00:04:46
◼
►
being self-contained that you can avoid a situation
00:04:51
◼
►
where someone downloads the app when they have connectivity
00:04:55
◼
►
and then their first launch,
00:04:57
◼
►
if their first launch of the app requires connectivity
00:04:59
◼
►
to do something, it's kind of a bad experience.
00:05:03
◼
►
And so too I think it's always,
00:05:05
◼
►
even if that initial cache may be something
00:05:08
◼
►
that will get invalidated or will be immediately updated
00:05:11
◼
►
when they launch the app, it's something
00:05:13
◼
►
that I've done a couple times where I'm trying
00:05:15
◼
►
to make sure that there's almost this basic usefulness
00:05:20
◼
►
that the app can have right out of that initial download.
00:05:24
◼
►
That you don't assume that well,
00:05:25
◼
►
if they had, obviously they had internet connectivity
00:05:28
◼
►
when they downloaded the app,
00:05:29
◼
►
of course they're gonna have it
00:05:30
◼
►
when the first time they launch it.
00:05:31
◼
►
But I don't think you can necessarily rely on that.
00:05:33
◼
►
And obviously depending on what the app is,
00:05:35
◼
►
that's more or less relevant.
00:05:37
◼
►
If it's a, like my first app that I ever actually shipped
00:05:40
◼
►
into the App Store, this is over 10 years ago now,
00:05:43
◼
►
was a reference app that showed you
00:05:45
◼
►
like cost of living per diem stuff for travelers.
00:05:50
◼
►
And for that one, it's like the app is completely useless
00:05:52
◼
►
if it doesn't have its database.
00:05:53
◼
►
And that first version I shipped it,
00:05:55
◼
►
just had like a plist file as its database,
00:05:57
◼
►
which subsequently got upgraded to a SQLite database
00:06:00
◼
►
that was shipped with the bundle.
00:06:01
◼
►
But it was one of those things where I wanted to make sure
00:06:04
◼
►
that as long as you got all of the app, it worked.
00:06:08
◼
►
And that was all you had to,
00:06:10
◼
►
all the connectivity that was required.
00:06:12
◼
►
And also I think it's something that always sticks
00:06:15
◼
►
in the back of my mind, is that one nice thing
00:06:18
◼
►
about shipping it with the bundle is that you are not,
00:06:23
◼
►
it doesn't require you to be maintaining something else
00:06:26
◼
►
in order for the app to continue to function.
00:06:29
◼
►
So as long as they have access to that app bundle,
00:06:33
◼
►
they will be able to use the app in some ways.
00:06:35
◼
►
If at some point you lose interest in the app,
00:06:38
◼
►
it kind of starts to fall away
00:06:39
◼
►
and you turn off the web server
00:06:41
◼
►
where all of the content was being served from
00:06:44
◼
►
or that cost becomes unsustainable,
00:06:47
◼
►
it's kind of nice in almost like
00:06:50
◼
►
a software preservation perspective or whatever,
00:06:52
◼
►
that even several years later,
00:06:54
◼
►
as long as Apple is still a thing
00:06:57
◼
►
that is serving assets from the App Store,
00:06:59
◼
►
someone could download the app or re-download it
00:07:01
◼
►
from their purchased history and it could still work.
00:07:05
◼
►
So that's just another thing that's kind of nice
00:07:06
◼
►
about bundling it directly into the app itself.
00:07:09
◼
►
- Yeah, exactly.
00:07:10
◼
►
Because one of the reasons why I did instant search
00:07:13
◼
►
was that when I have server issues like I did
00:07:17
◼
►
over the holidays, which I've talked about,
00:07:20
◼
►
if I have server issues, search is critical to a podcast app
00:07:23
◼
►
because it's the first thing people do to add podcasts
00:07:27
◼
►
and it's a very frequent thing to search for new ones
00:07:30
◼
►
and I wanted to make sure that was always gonna be fast
00:07:32
◼
►
and even though if my servers are totally down,
00:07:35
◼
►
you can't subscribe to a new podcast,
00:07:37
◼
►
so I'm not totally immune here,
00:07:39
◼
►
but I at least have a level of protection here.
00:07:42
◼
►
If there's spotty connectivity or server issues here,
00:07:47
◼
►
I'm giving them a better experience
00:07:48
◼
►
than I otherwise would have.
00:07:50
◼
►
- Yeah, and I think in general,
00:07:52
◼
►
that's a great use for this type of,
00:07:54
◼
►
it's essentially what a lot of this kind of data becomes
00:07:59
◼
►
is essentially it's advanced caching.
00:08:02
◼
►
It's different, it's a more robust version of caching
00:08:07
◼
►
that you might do ahead of time,
00:08:09
◼
►
so you can call it pre-warmed caches,
00:08:12
◼
►
but it's things that you could eventually download
00:08:16
◼
►
from the internet potentially
00:08:17
◼
►
or that may be part of the normal use of the app
00:08:19
◼
►
is that in your case, with search,
00:08:21
◼
►
most of their results eventually are gonna be coming
00:08:23
◼
►
from the web server, but as fast as that might be,
00:08:28
◼
►
as optimized as you might make it,
00:08:30
◼
►
it's still never gonna be as fast as in memory or on device.
00:08:35
◼
►
Those are situations that are always gonna be better
00:08:38
◼
►
and it's a dramatic difference in user experience
00:08:41
◼
►
to have it locally there by pre-warming those caches.
00:08:44
◼
►
I mean, you can sort of get around this sometimes
00:08:46
◼
►
where as soon as the app launches,
00:08:49
◼
►
it goes off and starts pre-warming its caches or things
00:08:52
◼
►
so that by the time someone goes to the search screen
00:08:54
◼
►
and you may have already gotten some of this and so on,
00:08:57
◼
►
but it's never gonna be,
00:08:59
◼
►
it's always a tricky balance to know the timing of that
00:09:03
◼
►
and the resources of that and having that be ready
00:09:06
◼
►
exactly when you want it to be there.
00:09:08
◼
►
We are sponsored this week by Linode.
00:09:10
◼
►
With Linode, you can instantly deploy and manage
00:09:13
◼
►
an SSD server in the Linode cloud
00:09:15
◼
►
and you can get a server running in just seconds
00:09:17
◼
►
with your choice of Linux distro,
00:09:19
◼
►
resources and node location.
00:09:22
◼
►
Linode serves their customers with the help of 10 data
00:09:25
◼
►
centers around the globe and they're about to add more.
00:09:28
◼
►
Mumbai, India and Toronto, Canada will both have
00:09:31
◼
►
data centers by 2020.
00:09:32
◼
►
Linode also features native SSD storage
00:09:35
◼
►
on all of their servers.
00:09:36
◼
►
They have a 40 gigabit network behind it all
00:09:39
◼
►
and they use Intel Xeon E5 CPUs.
00:09:41
◼
►
This means you have amazing, fast hardware and networking
00:09:45
◼
►
to serve your customers as fast as possible.
00:09:48
◼
►
And you don't have to worry about overspending
00:09:50
◼
►
because Linode has designed their pricing tiers
00:09:52
◼
►
to feature hourly billing with the added bonus
00:09:54
◼
►
of a monthly cap on all plans and add-on services
00:09:58
◼
►
including backup services and node balancers.
00:10:00
◼
►
Linode has fantastic pricing options to suit everyone.
00:10:04
◼
►
The plans start at one gig of RAM for just $5 a month
00:10:07
◼
►
and then for high memory plans starting with 16 gigs of RAM.
00:10:10
◼
►
And Linode has a special offer for our listeners.
00:10:12
◼
►
Listeners of the show can go to linode.com/radar
00:10:15
◼
►
and use promo code radar2019 to get $20
00:10:19
◼
►
towards any Linode plan.
00:10:21
◼
►
Once again, it's linode.com/radar,
00:10:23
◼
►
promo code radar2019 to get $20 towards a plan.
00:10:27
◼
►
So on the one gig of RAM plan,
00:10:28
◼
►
that could be four months for free.
00:10:30
◼
►
And with a seven day money back guarantee,
00:10:33
◼
►
you have nothing to lose.
00:10:33
◼
►
So give Linode a try today.
00:10:35
◼
►
Linode.com/radar, promo code radar2019.
00:10:38
◼
►
Thank you so much to Linode for hosting everything
00:10:41
◼
►
I run on the internet and for sponsoring
00:10:44
◼
►
our show and Relay FM.
00:10:45
◼
►
- So what I think would be interesting
00:10:48
◼
►
to kind of walk through now is the spectrum of ways
00:10:53
◼
►
that you can ship code or ship data with your app.
00:10:57
◼
►
And this is, there's a tremendous variety
00:10:59
◼
►
and each of them has different trade-offs
00:11:01
◼
►
that I think are interesting to talk about.
00:11:02
◼
►
And I'll start with the most like, I don't even know,
00:11:05
◼
►
like bare metal silly version,
00:11:07
◼
►
which is you can ship data in code.
00:11:11
◼
►
And this is something that you almost certainly
00:11:13
◼
►
you are doing to some degree, like some amount,
00:11:15
◼
►
obviously there's a certain amount of data
00:11:17
◼
►
in your application, irrespective of almost anything
00:11:21
◼
►
in terms of even you could think about
00:11:22
◼
►
like localizable strings is a form of data
00:11:25
◼
►
that is being shipped with your app.
00:11:27
◼
►
But even if it isn't like quite that straightforward
00:11:30
◼
►
degree, like you could just have an array of static strings
00:11:34
◼
►
that are like a standard list that is often shown
00:11:37
◼
►
in your application that you have to choose from.
00:11:38
◼
►
Like for example, in my workout app, Workouts++,
00:11:41
◼
►
I have, you know, there's a bunch of data
00:11:44
◼
►
that is associated with each of the different workout types.
00:11:47
◼
►
You know, you walking, running, doing yoga, whatever it is.
00:11:51
◼
►
And a lot of the information about those,
00:11:53
◼
►
I just store in code.
00:11:55
◼
►
Like I have, you know, sort of lookup tables and things
00:11:58
◼
►
that are just dictionaries that are just static values
00:12:01
◼
►
that I ship in the application.
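A static lookup table of this kind might look something like the sketch below. It is written in Python for brevity, and the workout fields (`display_name`, `uses_distance`) are invented for illustration; they are not taken from Workouts++.

```python
# Data shipped "in code": a small, fixed lookup table keyed by workout type.
WORKOUT_INFO = {
    "walking": {"display_name": "Walking", "uses_distance": True},
    "running": {"display_name": "Running", "uses_distance": True},
    "yoga": {"display_name": "Yoga", "uses_distance": False},
}


def display_name(workout_type):
    """Look up the display name, falling back to a title-cased key."""
    info = WORKOUT_INFO.get(workout_type)
    return info["display_name"] if info else workout_type.title()


print(display_name("yoga"))
print(display_name("rowing"))
```

Changing any value here means shipping a new build, which is the versioning trade-off discussed next.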
00:12:02
◼
►
And this approach sort of works well in some ways
00:12:06
◼
►
that it's very straightforward.
00:12:10
◼
►
You know, it doesn't scale well.
00:12:11
◼
►
It doesn't work if you, it works fine
00:12:13
◼
►
for like 20 or 30 values.
00:12:15
◼
►
It doesn't scale well if you have 20 or 30,000 values.
00:12:18
◼
►
You know, it's not something that you would necessarily
00:12:20
◼
►
wanna put a huge amount of things into,
00:12:23
◼
►
but can be really straightforward.
00:12:24
◼
►
And it certainly is something that I think we all do
00:12:27
◼
►
at some point is there's a certain amount of data
00:12:30
◼
►
that is just in the code.
00:12:32
◼
►
The downside of course is that it's in the code.
00:12:34
◼
►
It's not something that you can update
00:12:36
◼
►
unless you update the application.
00:12:38
◼
►
You're tying your data versioning to your code versioning.
00:12:41
◼
►
So like in terms of from a version management perspective,
00:12:43
◼
►
you're tying those two things together,
00:12:45
◼
►
which can not be a great situation.
00:12:49
◼
►
If you are doing this, you know,
00:12:52
◼
►
if you're pushing the limits of this,
00:12:53
◼
►
like Xcode will start to get really grumpy
00:12:55
◼
►
if you are actually like, you know, having, you know,
00:12:58
◼
►
the source code files that have, you know,
00:13:01
◼
►
several thousand lines that are all just string literals.
00:13:03
◼
►
Like you could do that, but it probably is gonna make
00:13:06
◼
►
your life uncomfortable even just from a day-to-day use
00:13:10
◼
►
and you know, compile times and opening the file perspective.
00:13:13
◼
►
Like it's, or the thing that often will, you know,
00:13:16
◼
►
bring source editors to their knees
00:13:17
◼
►
is like trying to do syntax coloring
00:13:19
◼
►
on these big massive string files.
00:13:22
◼
►
Like you'll just see, like you'll actually watch
00:13:24
◼
►
the coloring move down the page sometimes.
00:13:27
◼
►
Like if you ever see that, you've gone too far on this.
00:13:30
◼
►
Like it's appropriate at the small level
00:13:33
◼
►
and it's super helpful because it's super performant.
00:13:35
◼
►
Like these are strings directly in like the code space
00:13:38
◼
►
of the application, like it's right there.
00:13:41
◼
►
Don't go too crazy with that,
00:13:42
◼
►
but it's certainly something that is available to you.
00:13:48
◼
►
And I think like the next one up from that,
00:13:50
◼
►
I think is probably where you start to get into
00:13:53
◼
►
serialized data structures that are not shipped in your code
00:13:58
◼
►
but are functionally like loaded into memory.
00:14:01
◼
►
So this is where I start to think of like
00:14:03
◼
►
if you have a plist or a JSON blob or something like that
00:14:07
◼
►
that you are storing, you know,
00:14:08
◼
►
either shipping with the app like we were talking about
00:14:11
◼
►
or downloading from the internet, you could go either way.
00:14:13
◼
►
But you know, you have a data structure
00:14:15
◼
►
that you're functionally gonna be like, you know,
00:14:18
◼
►
NSArray, you know, init with contents of
00:14:23
◼
►
URL kind of a thing.
00:14:24
◼
►
And you're going to just immediately slurp it up into memory
00:14:27
◼
►
and you'll have an in-memory dictionary
00:14:29
◼
►
or an in-memory array.
00:14:31
◼
►
And that's how you're gonna access that data.
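As a rough sketch of that "slurp the whole file into memory" pattern, the following Python uses the standard `json` and `plistlib` modules as stand-ins for the Foundation calls mentioned above. The file names and the city records are made up for the example.

```python
import json
import plistlib
import tempfile
from pathlib import Path


def load_resource(path):
    """Read an entire plist or JSON file and return plain in-memory lists/dicts."""
    path = Path(path)
    if path.suffix == ".plist":
        with path.open("rb") as f:
            return plistlib.load(f)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Write two small resource files so the example is self-contained.
workdir = Path(tempfile.mkdtemp())
cities = [
    {"name": "Toronto", "tz": "America/Toronto"},
    {"name": "Mumbai", "tz": "Asia/Kolkata"},
]
(workdir / "cities.json").write_text(json.dumps(cities), encoding="utf-8")
with (workdir / "cities.plist").open("wb") as f:
    plistlib.dump(cities, f)

from_json = load_resource(workdir / "cities.json")
from_plist = load_resource(workdir / "cities.plist")
print(from_json[0]["name"], from_plist[1]["tz"])
```

Either way, the whole file ends up as one in-memory structure, which is what makes this approach simple for small and medium data sets.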
00:14:34
◼
►
And this I find is something that I probably use most often.
00:14:39
◼
►
It's very flexible in terms of it's nice that you can,
00:14:44
◼
►
the data is external to your code.
00:14:46
◼
►
So you're not creating that dependency.
00:14:48
◼
►
It's, you know, Xcode is perfectly happy
00:14:50
◼
►
to have a resource file in its bundle
00:14:51
◼
►
that could be bigger than you'd actually want to edit there.
00:14:55
◼
►
It's reasonably performant.
00:14:57
◼
►
It works pretty well for a medium size,
00:14:59
◼
►
like small to medium sized data sets.
00:15:02
◼
►
And like I'm actually in the process right now.
00:15:05
◼
►
I mean, this is relevant for me
00:15:06
◼
►
because I'm working on Pedometer++
00:15:08
◼
►
on a time zone feature.
00:15:09
◼
►
And I need to have a list of, you know,
00:15:12
◼
►
popular cities and their time zones.
00:15:14
◼
►
So you can choose time zones from cities.
00:15:17
◼
►
And it's one of those things where it falls
00:15:19
◼
►
into that kind of middle ground
00:15:20
◼
►
where ultimately I'm gonna have maybe like 1,800 cities
00:15:25
◼
►
or so, like not so many that I feel like I need to go full,
00:15:28
◼
►
like have, you know, have a database
00:15:31
◼
►
that I can, you know, make SQL queries against.
00:15:33
◼
►
It's not that many, but it's big enough
00:15:36
◼
►
that I need to externalize the data somehow.
00:15:38
◼
►
And so right now, what I've been doing
00:15:40
◼
►
that seems to work pretty well
00:15:41
◼
►
is I'm just putting it into a plist file.
00:15:43
◼
►
And I just load that plist file in whenever the user
00:15:47
◼
►
happens to want to do this search.
00:15:49
◼
►
You know, it's slightly big in memory, but not crazy.
00:15:51
◼
►
And it's an operation that doesn't happen very often.
00:15:54
◼
►
So it isn't something that I'm super worried about,
00:15:57
◼
►
but you can just take that, you know,
00:15:59
◼
►
it's you're functionally just taking a data structure
00:16:01
◼
►
that you would normally have in memory
00:16:03
◼
►
and you can write it to disk and you can read it from disk.
00:16:05
◼
►
And, you know, you can do the same thing.
00:16:06
◼
►
You can tell a dictionary or an array and say like,
00:16:08
◼
►
you know, save, you know, write to file
00:16:11
◼
►
and it'll, you know, essentially put out a binary plist
00:16:13
◼
►
that you can then really, you know, reload back in.
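The write-to-disk and read-back round trip can be illustrated with Python's `plistlib`, which can emit the binary plist format. This is a sketch under assumptions: the dictionary contents and file name are invented, and on iOS the equivalent would go through Foundation rather than this module.

```python
import plistlib
import tempfile
from pathlib import Path

# An ordinary in-memory dictionary of city name to time zone identifier.
time_zones = {"Mumbai": "Asia/Kolkata", "Toronto": "America/Toronto"}

# Write it out as a binary plist.
plist_path = Path(tempfile.mkdtemp()) / "time_zones.plist"
with plist_path.open("wb") as f:
    plistlib.dump(time_zones, f, fmt=plistlib.FMT_BINARY)

# Read it back; the format is detected from the file header.
with plist_path.open("rb") as f:
    restored = plistlib.load(f)

header = plist_path.read_bytes()[:8]
print(header, restored == time_zones)
```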
00:16:18
◼
►
Or you can go even more extreme and you can use like
00:16:20
◼
►
the archiving APIs in, you know, in iOS
00:16:24
◼
►
where you can take an arbitrary data structure
00:16:27
◼
►
and be, you know, writing that to disk
00:16:30
◼
►
and reading that in and then it starts to get
00:16:31
◼
►
a little bit more advanced and a bit more complicated
00:16:33
◼
►
and you should start to think,
00:16:35
◼
►
maybe I want to do this in a database
00:16:36
◼
►
where I'm not the one having to like manage
00:16:39
◼
►
all this really sort of nuanced data management stuff.
00:16:42
◼
►
But that certainly is the most extreme version of this
00:16:45
◼
►
is when you start using like the keyed archiving
00:16:47
◼
►
and secure archiving and all of those types of things.
00:16:51
◼
►
- Yeah, and there's a number of trade-offs
00:16:52
◼
►
you have to consider when you're picking between these.
00:16:54
◼
►
So like one of the biggest ones that I think
00:16:57
◼
►
would bite a lot of people kind of unexpectedly
00:16:59
◼
►
is the true memory footprint of having
00:17:03
◼
►
one of these simpler solutions, like a plist file
00:17:05
◼
►
or a JSON file because not only do you have to consider
00:17:09
◼
►
that like the entire, like usually if you're parsing
00:17:12
◼
►
a plist or a JSON file, usually that means
00:17:15
◼
►
the entire contents of that file
00:17:17
◼
►
are going to be loaded into memory.
00:17:19
◼
►
It also usually means that they're going to be decoded
00:17:22
◼
►
into different data structures that will end up using
00:17:26
◼
►
more memory in actual like usage RAM
00:17:30
◼
►
in their native structures like NSDictionary,
00:17:31
◼
►
NSArray and everything.
00:17:33
◼
►
That'll use more memory than the size of the file.
00:17:36
◼
►
So you really have to be careful if you're doing this
00:17:38
◼
►
to test, like to actually monitor, like load up your app,
00:17:42
◼
►
watch the memory usage meter in whatever developer tool
00:17:45
◼
►
that you're running it in, you know, Instruments
00:17:47
◼
►
or the Xcode window like in the little like now running
00:17:49
◼
►
screen that shows the CPU memory and stuff.
00:17:51
◼
►
Watch what happens to the memory usage
00:17:52
◼
►
when you load that data in.
00:17:54
◼
►
It will probably go up by more than you think it will.
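The same effect is easy to observe outside of Xcode. The sketch below uses Python's `tracemalloc` to compare the size of a JSON document with the memory its decoded form occupies. It measures Python objects rather than NSDictionary or NSArray, so treat it as an analogy only; the record shape is invented.

```python
import json
import tracemalloc

# Build a JSON document with a few thousand small records.
records = [{"name": f"City {i}", "tz": f"Region/Zone{i % 40}"} for i in range(5000)]
encoded = json.dumps(records)
file_bytes = len(encoded.encode("utf-8"))

# Measure how much memory the decoded structure takes up.
tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
decoded = json.loads(encoded)
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

decoded_bytes = after - before
print(f"on disk: {file_bytes} bytes, decoded: {decoded_bytes} bytes")
```

The exact ratio depends on the runtime and the data, which is why measuring on a real device matters more than any rule of thumb.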
00:17:57
◼
►
And you know, so if it's like you know, one meg,
00:18:02
◼
►
who cares, right?
00:18:03
◼
►
If it's 20 megs or 50 megs or 100 megs,
00:18:06
◼
►
you should start caring.
00:18:07
◼
►
Like at that point, you have a much bigger footprint
00:18:10
◼
►
than you probably should and that can cause problems
00:18:13
◼
►
in things like getting your app kicked out of memory,
00:18:16
◼
►
having it maybe crash on older phones that don't want
00:18:18
◼
►
to give you that much memory or can't give you
00:18:19
◼
►
that much memory.
00:18:20
◼
►
You could start having potential just battery usage issues
00:18:23
◼
►
of just having lots of RAM thrashing in and out
00:18:26
◼
►
and everything so it's important to make sure
00:18:29
◼
►
that you're not gonna blow your memory budget
00:18:32
◼
►
by having just one big file.
00:18:34
◼
►
You can start doing a little more custom things
00:18:36
◼
►
like this is one of the things I hate about JSON
00:18:39
◼
►
is that there doesn't seem to be what in XML
00:18:43
◼
►
people used to call a SAX parser,
00:18:45
◼
►
which was basically a streaming parser that would like
00:18:48
◼
►
call your callbacks on begin tags, end tags,
00:18:51
◼
►
got tag contents, et cetera.
00:18:53
◼
►
And so you could basically stream an XML file
00:18:55
◼
►
through memory without ever loading the whole thing
00:18:57
◼
►
into memory.
00:18:58
◼
►
I don't know of any JSON streaming parsers,
00:19:01
◼
►
but anyway, you can kind of fake it,
00:19:03
◼
►
you can like do some custom work where like you have
00:19:05
◼
►
a file that you write in certain blocks and you have
00:19:08
◼
►
like custom ways of reading it so that each block
00:19:10
◼
►
is itself a smaller piece of JSON or whatever,
00:19:12
◼
►
that's pretty complicated.
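One simple version of the "file written in blocks" idea is to put one small JSON document on each line and decode them one at a time. The sketch below assumes that layout; it is one possible convention for illustration, not something described in the episode.

```python
import io
import json


def iter_records(stream):
    """Yield one decoded record per non-empty line, never the whole file at once."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a file on disk: each line is its own small piece of JSON.
data = io.StringIO('{"title": "A"}\n{"title": "B"}\n\n{"title": "C"}\n')
titles = [record["title"] for record in iter_records(data)]
print(titles)
```

Only one record is decoded at a time, but anything beyond a linear scan (lookups, updates) still has to be built by hand, which is the argument for moving to a database.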
00:19:13
◼
►
I think once you're getting to those kind of levels,
00:19:16
◼
►
you should be looking at just using a database.
00:19:18
◼
►
And this is how I do my search index.
00:19:20
◼
►
I mentioned earlier my search index,
00:19:21
◼
►
my offline search index is about three and a half megs
00:19:24
◼
►
most of the time and so it is literally a SQLite file.
00:19:29
◼
►
And one of the great things about this as opposed
00:19:32
◼
►
to things like binary plists and everything,
00:19:34
◼
►
not only do you not have to load the whole thing
00:19:35
◼
►
into memory and everything, but it's also,
00:19:37
◼
►
it's easier to have server side tools that generate it
00:19:40
◼
►
because if you're not using an Apple platform,
00:19:44
◼
►
the tools for reading and writing Apple's plist format
00:19:46
◼
►
are weak at best and so if you can have something
00:19:50
◼
►
like SQLite or JSON, you can write server side,
00:19:54
◼
►
that's great.
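To illustrate the server-side angle, here is a minimal sketch of a script that builds a SQLite file that could then be served for download. The table name, columns, and file name are assumptions for the example, not Overcast's schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_index(db_path, entries):
    """Create a fresh SQLite file containing (id, title) rows."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE podcasts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO podcasts (id, title) VALUES (?, ?)", entries
            )
    finally:
        conn.close()


db_path = Path(tempfile.mkdtemp()) / "search_index.sqlite"
build_index(db_path, [(1, "Under the Radar"), (2, "Accidental Tech Podcast")])

# A client can open the same file and fetch one row without loading the rest.
conn = sqlite3.connect(db_path)
row = conn.execute("SELECT title FROM podcasts WHERE id = ?", (2,)).fetchone()
conn.close()
print(row)
```

Because the file format is the same on every platform, whatever generates it on the server does not need any Apple tooling.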
00:19:55
◼
►
What's great about SQLite is that it is incredibly
00:19:58
◼
►
well optimized, like it's kind of shocking how fast
00:20:03
◼
►
SQLite is, like it shouldn't be as amazing as it is.
00:20:07
◼
►
There's a reason why it is very widespread.
00:20:10
◼
►
Even Apple uses it for a lot of OS stuff.
00:20:12
◼
►
A lot of stuff built into macOS and iOS is based
00:20:15
◼
►
on SQLite, including Core Data.
00:20:17
◼
►
There are very good reasons for that.
00:20:19
◼
►
Anyway, so just shipping around a SQLite file,
00:20:22
◼
►
I think is a very, very good solution.
00:20:24
◼
►
The only major downside to it is it is kind of overkill
00:20:28
◼
►
if you are dealing with a data set that's pretty small.
00:20:32
◼
►
So if you have like under 1,000 items in your data set,
00:20:36
◼
►
you probably don't need SQLite for that.
00:20:39
◼
►
And the other thing I would say is it does involve
00:20:41
◼
►
some code overhead that you have to write,
00:20:45
◼
►
you have to link to the SQLite dylib and you have to
00:20:49
◼
►
actually have code calling SQLite functions and you might,
00:20:52
◼
►
you could use a wrapper library.
00:20:54
◼
►
My preferred one for that is FMDB by our friend Gus Mueller
00:20:58
◼
►
and I use it in Overcast, I use it as much as I can,
00:21:01
◼
►
it's wonderful.
00:21:02
◼
►
You can also use the direct C API but it's a little
00:21:05
◼
►
cumbersome so it's not necessarily the best option
00:21:08
◼
►
for most people but that's about the only downside
00:21:11
◼
►
is like yeah, you have to code in support for SQLite.
00:21:14
◼
►
Other than that, I strongly recommend it because one
00:21:16
◼
►
of the other things, like we're talking about memory costs
00:21:19
◼
►
and size considerations but another thing to consider
00:21:22
◼
►
is lookup speed and if your data set is small,
00:21:25
◼
►
you know computers are fast, it's not gonna matter,
00:21:27
◼
►
you can scan through all of them just by doing like a dumb
00:21:30
◼
►
string match, is this the right title?
00:21:31
◼
►
No, is this the right title?
00:21:32
◼
►
No, just scan through the whole set.
00:21:35
◼
►
But when your set becomes large enough where that could
00:21:39
◼
►
be a problem, you actually will see significant gains
00:21:42
◼
►
by having some kind of index structure like having
00:21:45
◼
►
a binary tree lookup or something, that's what databases
00:21:48
◼
►
do so if you create an index on like a title column,
00:21:52
◼
►
you can have way faster lookups of potentially massive
00:21:56
◼
►
data sets and you can look them up basically instantly.
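A small sketch of an indexed title lookup follows. The schema, the synthetic rows, and the prefix-as-range trick are illustrative assumptions; the episode does not say how Overcast structures its queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE podcasts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO podcasts (title) VALUES (?)",
    [(f"show {i:05d}",) for i in range(20000)]
    + [("under the radar",), ("undersea",)],
)
# The index lets SQLite jump to matching titles instead of scanning every row.
conn.execute("CREATE INDEX idx_podcasts_title ON podcasts (title)")


def prefix_search(conn, prefix, limit=10):
    """Find titles starting with `prefix` by expressing the prefix as a range."""
    upper = prefix + "\uffff"  # sorts after any ordinary continuation of the prefix
    rows = conn.execute(
        "SELECT title FROM podcasts WHERE title >= ? AND title < ? "
        "ORDER BY title LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]


matches = prefix_search(conn, "under")
print(matches)
```

Writing the prefix as a range comparison is what allows the index on `title` to be used; the titles here are stored lowercase to keep the comparison simple.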
00:21:59
◼
►
Like I'm, again, like look at Overcast Instant Search,
00:22:02
◼
►
I am shocked how quickly it's able to pull up results
00:22:06
◼
►
from a, you know, three and a half meg database that has
00:22:10
◼
►
tens of thousands of entries in it and it's able to pull up
00:22:13
◼
►
entries from a single keystroke basically immediately.
00:22:16
◼
►
Like it basically takes no time at all.
00:22:18
◼
►
And so if you have a big enough data set where that matters,
00:22:23
◼
►
you should really just go straight to a database
00:22:25
◼
►
at that point.
00:22:26
◼
►
- Yeah, and I think too there's two other things that I
00:22:28
◼
►
think come to mind with databases that'll make them
00:22:30
◼
►
really nice and I think the first one too is just to point
00:22:33
◼
►
out that you can use Core Data to create the database
00:22:38
◼
►
that you're going to ultimately ship with the app too
00:22:44
◼
►
and when you set up your Core Data context, you can just
00:22:49
◼
►
point that to a file.
00:22:50
◼
►
Like I think you need to copy it out of your bundle
00:22:52
◼
►
so it's in your, you know, so it's in a read write place
00:22:55
◼
►
in the data, in user space but that's a trick that I've used
00:22:59
◼
►
a few times where I have a version of the app that has
00:23:04
◼
►
a build that builds the, essentially in my case it was
00:23:07
◼
►
pre-caching a bunch of data about the various audio books
00:23:11
◼
►
in the audio book catalog for my audio book app.
00:23:14
◼
►
And it builds this Core Data database and then I just take
00:23:18
◼
►
the SQLite file that is underlying Core Data, put that in
00:23:22
◼
►
my bundle, when you first load up I copy that SQLite file
00:23:26
◼
►
into user space but then I just point Core Data to it
00:23:29
◼
►
and I get all of the affordances and the niceties that
00:23:32
◼
►
Core Data gives you.
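The copy-on-first-launch step can be sketched independently of Core Data. The example below uses plain SQLite from Python; the directory names, file name, and `books` table are hypothetical, and a real Core Data store has its own schema that this sketch does not try to imitate.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def ensure_working_copy(bundled_db, writable_dir):
    """Copy the read-only seed database to a writable location the first time only."""
    writable_dir = Path(writable_dir)
    writable_dir.mkdir(parents=True, exist_ok=True)
    target = writable_dir / Path(bundled_db).name
    if not target.exists():
        shutil.copyfile(bundled_db, target)
    return target


# Stand-in for the app bundle: a seed database built ahead of time.
root = Path(tempfile.mkdtemp())
bundle_dir = root / "bundle"
bundle_dir.mkdir()
seed = bundle_dir / "catalog.sqlite"
conn = sqlite3.connect(seed)
with conn:
    conn.execute("CREATE TABLE books (title TEXT)")
    conn.execute("INSERT INTO books VALUES ('Seeded Book')")
conn.close()

# First launch: copy the seed, then write to the copy.
working = ensure_working_copy(seed, root / "appdata")
conn = sqlite3.connect(working)
with conn:
    conn.execute("INSERT INTO books VALUES ('Added Later')")
count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
conn.close()

# Later launches reuse the existing copy instead of overwriting it.
again = ensure_working_copy(seed, root / "appdata")
print(count, again == working)
```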
00:23:33
◼
►
You don't have to necessarily go down to the low level
00:23:36
◼
►
of using actual SQLite calls for managing it.
00:23:40
◼
►
It can be all kind of managed from that top level
00:23:42
◼
►
in the same way that you might say if you had a cache
00:23:46
◼
►
that you were going to be storing data in with Core Data,
00:23:48
◼
►
if that was something that you wanted to do, you would be
00:23:51
◼
►
doing that through your application.
00:23:53
◼
►
You can kind of still do that and have that file
00:23:56
◼
►
underlying it that's ultimately all Core Data is.
00:23:58
◼
►
So that's just something to keep in mind.
00:24:00
◼
►
And I think too, the other big advantage that you get
00:24:03
◼
►
whenever you have this kind of sidecar data in a database
00:24:07
◼
►
as opposed to in a file, it becomes much easier and more
00:24:11
◼
►
of a robust situation if you want to make changes to it.
00:24:15
◼
►
So that if you make it non-static.
00:24:16
◼
►
So if for example this is a cache where you are caching
00:24:21
◼
►
results or prefetching results and you're making
00:24:26
◼
►
it so that when the user does something it's instantaneous
00:24:28
◼
►
but then it's going to fetch some more results later,
00:24:31
◼
►
you could potentially just start adding those into this
00:24:33
◼
►
database file and maybe it starts off being a small file
00:24:37
◼
►
but over time it grows.
00:24:39
◼
►
And that's what this does well with.
00:24:42
◼
►
A database is great for that kind of thing.
00:24:44
◼
►
You just insert more data into it and it sort of scales nicely
00:24:48
◼
►
and with a controlled usage pattern and performant
00:24:51
◼
►
sort of expectation there.
00:24:52
◼
►
Whereas if you were doing something with a,
00:24:55
◼
►
you know, with a plist say, you could, you know,
00:24:59
◼
►
add in, you read the plist into memory,
00:25:01
◼
►
you add some items to it, you could write it back out
00:25:04
◼
►
to disk but that very quickly starts to get
00:25:07
◼
►
really cumbersome and awkward and you have to deal
00:25:09
◼
►
with all kinds of consistency issues.
00:25:11
◼
►
What happens if, you know, the app was killed
00:25:13
◼
►
halfway through a write and you can write it atomically
00:25:16
◼
►
but hopefully it works and like a lot of these things
00:25:18
◼
►
are completely handled by SQLite.
00:25:23
◼
►
That it's doing all the clever checkpointing
00:25:25
◼
►
and management and consistency stuff that it's nice
00:25:29
◼
►
to not have to think about.
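
[Editor's sketch] To make the contrast concrete, here is a rough side-by-side of growing a cache in a plist versus in SQLite, using Python's standard `plistlib` and `sqlite3` modules. The `results` table, the item shape, and all function names are assumptions for illustration. The plist version has to read and rewrite the whole file for every addition and arrange its own atomic swap; the SQLite version inserts one row inside a transaction.

```python
import os
import plistlib
import sqlite3
import tempfile
from pathlib import Path


def append_to_plist(path: Path, item: dict) -> None:
    """Read the whole file, add one item, and rewrite the whole file."""
    items = plistlib.loads(path.read_bytes()) if path.exists() else []
    items.append(item)
    # Write to a temporary file in the same directory, then swap it in, so a
    # crash halfway through the write leaves the previous version in place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "wb") as tmp:
        plistlib.dump(items, tmp)
    os.replace(tmp_name, path)


def open_cache(path: Path) -> sqlite3.Connection:
    """Open (or create) the cache database with a hypothetical results table."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results (query TEXT NOT NULL, answer TEXT NOT NULL)"
        )
    return conn


def append_to_database(conn: sqlite3.Connection, item: dict) -> None:
    """Insert one row; the transaction either fully commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO results (query, answer) VALUES (?, ?)",
            (item["query"], item["answer"]),
        )
```

The cost of `append_to_plist` grows with the size of the file, since every call rewrites all of it, while `append_to_database` only touches the new row.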
00:25:30
◼
►
So those are two things that I definitely want to mention
00:25:32
◼
►
with databases where they can be super flexible
00:25:35
◼
►
and also probably third is just to keep in mind
00:25:37
◼
►
that it's entirely reasonable and not even like,
00:25:40
◼
►
there's not a tremendous overhead for this
00:25:42
◼
►
to access multiple databases from your application
00:25:45
◼
►
concurrently which I think is I guess what you do
00:25:47
◼
►
in Overcast, right, where you have a search database
00:25:49
◼
►
that is doing this stuff and you also have
00:25:51
◼
►
the actual app database that is actually storing,
00:25:55
◼
►
you know, the user's information.
00:25:56
◼
►
And it's perfectly reasonable to do that
00:25:58
◼
►
and you could have as many of them as made sense
00:26:00
◼
►
and that could really give you cleanliness
00:26:03
◼
►
in terms of both your code and the actual creation
00:26:06
◼
►
of this data that they could be coming
00:26:07
◼
►
from different systems.
00:26:08
◼
►
You don't have to sort of throw it all into one
00:26:10
◼
►
giant database that you then have to manage.
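
[Editor's sketch] As a rough illustration of keeping sidecar data and user data in separate database files, the following uses Python's standard `sqlite3` module and SQLite's `ATTACH DATABASE` to query across both. The schema, the sample rows, and the function names are invented for the example; this is not a description of how Overcast actually structures its databases.

```python
import sqlite3
from pathlib import Path


def create_search_db(path: Path) -> None:
    """Sidecar database: generated elsewhere and shipped or downloaded as-is."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE podcasts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO podcasts (id, title) VALUES (?, ?)",
            [(1, "Under the Radar"), (2, "Example Show")],
        )
    conn.close()


def create_user_db(path: Path) -> None:
    """App database: holds only the user's own information."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE subscriptions (podcast_id INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO subscriptions (podcast_id) VALUES (1)")
    conn.close()


def subscribed_titles(user_db: Path, search_db: Path) -> list:
    """Open the user database, attach the sidecar file, and join across both."""
    conn = sqlite3.connect(user_db)
    conn.execute("ATTACH DATABASE ? AS search", (str(search_db),))
    rows = conn.execute(
        "SELECT p.title FROM subscriptions s "
        "JOIN search.podcasts p ON p.id = s.podcast_id "
        "ORDER BY p.title"
    ).fetchall()
    conn.close()
    return [title for (title,) in rows]
```

Because the two files are independent, the sidecar database can be regenerated or replaced wholesale without touching the file that holds the user's data.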
00:26:12
◼
►
- Oh yeah, exactly.
00:26:14
◼
►
One other thing I wanted to touch on too is
00:26:15
◼
►
if you're building in like some kind of source of data,
00:26:20
◼
►
go back and listen to the episode that I talked about
00:26:23
◼
►
building my original search feature in Overcast
00:26:25
◼
►
and like how to build a search engine
00:26:26
◼
►
'cause I wanted to emphasize a point that I went over there
00:26:29
◼
►
which is like make sure you're sourcing your data
00:26:32
◼
►
in a responsible way.
00:26:33
◼
►
You have to see like if you're getting, for example,
00:26:36
◼
►
you know, you're talking about getting a list of cities
00:26:38
◼
►
and what time zone they're in.
00:26:40
◼
►
You have to make sure if you get this list,
00:26:41
◼
►
first of all, like are you legally allowed to copy this
00:26:44
◼
►
and put it into your app or is it copyrighted
00:26:46
◼
►
and you know, you could get yourself into trouble.
00:26:48
◼
►
Make sure that you have the legal right
00:26:50
◼
►
to embed this in your app.
00:26:52
◼
►
Make sure that it is from a good source,
00:26:55
◼
►
that you probably have like a good and complete
00:26:58
◼
►
and correct data set and make sure you have some way
00:27:01
◼
►
to update it easily in the future if it's the kind of,
00:27:04
◼
►
if it's data where like it has the nature
00:27:06
◼
►
where it will change over time.
00:27:08
◼
►
So like, you know, a list of major cities
00:27:10
◼
►
probably doesn't need to change that frequently
00:27:11
◼
►
but you probably should update it like at least once a year
00:27:15
◼
►
and so like, you know, it's worth like considering
00:27:17
◼
►
how often will this data need to be updated.
00:27:19
◼
►
Can I build some kind of automated system for that?
00:27:21
◼
►
Do I have any kind of reliable source
00:27:24
◼
►
for where I can get this list?
00:27:25
◼
►
And as much as possible, automate that
00:27:28
◼
►
and build that in up front so you don't have to like
00:27:31
◼
►
get caught off guard later.
00:27:33
◼
►
- Yeah, and I think too along those lines
00:27:34
◼
►
is also just making sure that, yeah, that you are,
00:27:38
◼
►
you understand where the data comes from.
00:27:39
◼
►
I think with certain types of things,
00:27:41
◼
►
it's easy to kind of just, oh, I found this thing on the,
00:27:44
◼
►
I found this thing on the internet and I downloaded it
00:27:46
◼
►
and it seems like it has the right data in it.
00:27:48
◼
►
And it's like, both from a copyright perspective
00:27:51
◼
►
that could be problematic but even just from a,
00:27:54
◼
►
in six months if you need to update it
00:27:56
◼
►
or people start reporting problems or whatever it is,
00:27:59
◼
►
like having a sense of the origin of the data
00:28:01
◼
►
and where it's coming from, if it's, you know,
00:28:03
◼
►
a lot of these like say you're working with something
00:28:05
◼
►
that's using like OpenStreetMap's data set
00:28:08
◼
►
or something like that, like at least you know,
00:28:10
◼
►
okay, where it's coming from and hopefully you have
00:28:12
◼
►
some kind of script that can download the data
00:28:14
◼
►
that you're interested in and process it
00:28:16
◼
►
and then generate the thing that you need
00:28:17
◼
►
for your application.
00:28:19
◼
►
Like having an end-to-end solution there
00:28:22
◼
►
where it isn't just like you found somebody who did
00:28:25
◼
►
have a script that pulled in the data and did something
00:28:28
◼
►
and then you're bundling that, like you're just setting
00:28:31
◼
►
yourself up for pain in the future where something's
00:28:34
◼
►
gonna change, some requirement's gonna change
00:28:36
◼
►
or you're gonna want to add a field to that
00:28:38
◼
►
and then suddenly it's, rather than being
00:28:40
◼
►
a relatively straightforward thing,
00:28:41
◼
►
it's this big massive project.
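
[Editor's sketch] A repeatable generation step of the kind described above might look something like this. It is a minimal sketch assuming the source data arrives as CSV text with `city` and `time_zone` columns; those column names, the `cities` and `metadata` tables, and the `source_label` parameter are all assumptions for the example. The download step is left out, and the script records where the data came from so the origin is still known six months later.

```python
import csv
import io
import sqlite3
from pathlib import Path


def build_city_database(csv_text: str, out_path: Path, source_label: str) -> int:
    """Regenerate the bundled database from scratch and return the row count."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        name = record["city"].strip()
        zone = record["time_zone"].strip()
        if name and zone:  # skip incomplete records instead of shipping them
            rows.append((name, zone))

    if out_path.exists():
        out_path.unlink()  # always rebuild, never patch an old file by hand

    conn = sqlite3.connect(out_path)
    with conn:
        conn.execute("CREATE TABLE cities (name TEXT NOT NULL, time_zone TEXT NOT NULL)")
        conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
        conn.executemany("INSERT INTO cities (name, time_zone) VALUES (?, ?)", rows)
        # Record the origin of the data alongside the data itself.
        conn.execute(
            "INSERT INTO metadata (key, value) VALUES ('source', ?)", (source_label,)
        )
    conn.close()
    return len(rows)
```

Adding a field later then means changing one function and rerunning it, rather than reverse-engineering a file someone else produced.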
00:28:43
◼
►
So yeah, it's like be very thoughtful about
00:28:45
◼
►
where you're getting this kind of data
00:28:48
◼
►
and just being aware that it should be something
00:28:51
◼
►
that you have good understanding of,
00:28:54
◼
►
that in general you want to be sure about
00:28:55
◼
►
any data that you're including
00:28:58
◼
►
in your application, like you're taking responsibility
00:29:01
◼
►
for it, so if there is anything in it
00:29:03
◼
►
that is potentially controversial or questionable,
00:29:06
◼
►
you have to be aware of it, don't just like take it
00:29:08
◼
►
and then blindly use it.
00:29:09
◼
►
- Yeah, and make sure what you're using is legal.
00:29:13
◼
►
- All right, well thank you everybody for listening
00:29:15
◼
►
and we will talk to you next week.