Under the Radar

158: Bundled Data


00:00:00   - Welcome to Under the Radar,

00:00:01   a show about independent iOS app development.

00:00:04   I'm Marco Arment.

00:00:05   - And I'm David Smith.

00:00:06   Under the Radar is never longer than 30 minutes,

00:00:08   so let's get started.

00:00:10   So today we wanted to talk a little bit about

00:00:13   what I kind of think about as like sidecar data,

00:00:17   things that kind of, data that you need to often ship

00:00:21   with your app or provide to your app

00:00:23   that supports its function, but isn't necessarily

00:00:26   like the main content of the app.

00:00:29   And what's interesting about this,

00:00:31   and we have a couple of recent things that both you and I

00:00:34   have been working through with this, Marco,

00:00:35   where there's lots and lots of different ways

00:00:38   that you can actually package up data like this

00:00:41   to include it in your app.

00:00:43   You can include it all the way from the extreme

00:00:45   of like including it in code to shipping configuration files

00:00:50   or shipping a database with it,

00:00:53   preloading like a Core Data database.

00:00:55   You can download it from the internet as JSON or plist,

00:00:59   you can download it from the internet as a database file.

00:01:02   You can bundle it with, I think Apple even has

00:01:05   a distribution system where you can add assets

00:01:08   that are downloaded on demand.

00:01:09   Like there is a tremendous variety of things

00:01:11   that you can do with this.

00:01:12   - I forgot about that entire system.

00:01:14   - I think it's primarily used for games, I think.

00:01:17   I think mostly it's used for situations like that.

00:01:19   But anyway, we'll get into why you might want to use that.

00:01:22   But it's a situation that I think happens often,

00:01:26   and there's all these weird trade-offs that you have

00:01:27   with speed, performance, first run experience,

00:01:31   time and downloading, but it happens more often

00:01:33   than you'd think.

00:01:35   And I think this first came to mind, I think,

00:01:37   is something that is probably worth you explaining

00:01:39   for how you implemented instant search

00:01:42   in Overcast, in the latest update,

00:01:45   which I think was a really clever system

00:01:47   and approach to taking, sort of solving this kind

00:01:49   of a problem.

00:01:50   - Yeah, sure.

00:01:51   And I went into it in detail on ATP,

00:01:53   so I won't go into too much detail here,

00:01:54   but basically Overcast has a new instant search feature

00:01:59   which downloads about once a week.

00:02:01   It downloads a search index from my servers

00:02:04   and stores it locally, and then when you perform a search,

00:02:07   it first hits that before it gets the results back

00:02:10   from the server, and then it puts the server results

00:02:12   below those results.

00:02:13   So you can start typing, you can immediately get results

00:02:16   from that local index while you're waiting

00:02:18   for the network request of the server for the other results.
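
A minimal sketch of the local-first search flow described above, in Python for brevity. The function names, the prefix-matching rule, and the de-duplication of server results are assumptions for illustration; the episode does not say how Overcast matches or merges results.

```python
def search_local(index, query):
    """Case-insensitive prefix match against an in-memory list of titles."""
    q = query.lower()
    return [title for title in index if title.lower().startswith(q)]


def merge_results(local_hits, server_hits):
    """Show local hits first, then server hits that are not already shown."""
    seen = set(local_hits)
    return list(local_hits) + [hit for hit in server_hits if hit not in seen]


# Hypothetical data standing in for the downloaded index and a server reply.
local_index = ["Under the Radar", "Upgrade", "Accidental Tech Podcast"]
local_hits = search_local(local_index, "u")      # available immediately
server_hits = ["Upgrade", "Unexplainable"]       # arrives after the network call
combined = merge_results(local_hits, server_hits)
```

The local hits can be displayed as soon as the user types, and the merged list replaces them once the server responds.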

00:02:21   And I thought through many of these decisions

00:02:25   and trade-offs with this, because one thing I considered

00:02:28   was obviously it has to be very small.

00:02:31   The index can't be like 100 megs.

00:02:33   That's a bit much to force people to download

00:02:35   in the background once a week without even them knowing

00:02:38   about it or without asking them or without providing

00:02:39   any controls for it.

00:02:40   And it didn't seem like an important enough feature

00:02:42   to have a preference to turn it off or have it off

00:02:44   by default or anything like that.

00:02:46   But, so the file's small, it's about three and a half megs

00:02:51   right now, and I said once a week downloads.

00:02:54   But I also thought, should I bundle one with the app?

00:02:58   Should I have in the app bundle your first search index

00:03:01   and then whenever the app gets a chance,

00:03:03   it downloads whatever's current from the internet.

00:03:05   And I thought, that's not great, because then I'm forcing

00:03:09   everyone for the next month or month and a half

00:03:13   or two months until I do my next app update,

00:03:15   I'm forcing them to download this data that will,

00:03:18   one week in, be out of date.

00:03:20   And that seemed like a bad idea to put in my app bundle.

00:03:24   But if it's something that didn't change very often,

00:03:27   I absolutely would have put it in there.

00:03:29   No question I would have put it in the app bundle.

00:03:31   And I might in the future build in a very small default one

00:03:34   just to have some kind of results there for the most

00:03:38   popular searches or something like that.

00:03:40   But that was one consideration right there of like,

00:03:43   do you bundle it into the app bundle directly

00:03:46   or do you download it after installation?

00:03:48   And I think that one is a very easy one for me.

00:03:52   I think you can bundle it in only if it's either

00:03:57   super critical to the app's functionality,

00:03:59   so that that way if someone launches the app

00:04:01   and it hasn't downloaded it yet, they can still achieve

00:04:04   what they need to achieve in the app.

00:04:05   So if it's super critical functionality,

00:04:07   it needs to be in the app bundle.

00:04:09   Or if it's something that very rarely ever changes,

00:04:12   that you can just update it when you issue app updates

00:04:15   and it's no big deal to have that kind of frequency

00:04:17   of updates, then it makes sense.

00:04:19   Build them into the app bundle and avoid

00:04:21   all the other concerns.

00:04:23   Like I think if it's a feature that very few people use,

00:04:26   who cares?

00:04:27   As long as it isn't like a massive file, still bundle it in.

00:04:30   But if it's something that needs to be updated

00:04:31   on a regular basis, that's when you look at downloads.

00:04:34   - Yeah and I think too with that,

00:04:36   I've in a variety of different apps,

00:04:38   I've done different versions of this.

00:04:40   And a lot of times what I find too,

00:04:43   there's something nice about the app

00:04:46   being self-contained that you can avoid a situation

00:04:51   where someone downloads the app when they have connectivity

00:04:55   and then their first launch,

00:04:57   if their first launch of the app requires connectivity

00:04:59   to do something, it's kind of a bad experience.

00:05:03   And so too I think it's always,

00:05:05   even if that initial cache may be something

00:05:08   that will get invalidated or will be immediately updated

00:05:11   when they launch the app, it's something

00:05:13   that I've done a couple times where I'm trying

00:05:15   to make sure that there's almost this basic usefulness

00:05:20   that the app can have right out of that initial download.

00:05:24   That you don't assume that well,

00:05:25   if they had, obviously they had internet connectivity

00:05:28   when they downloaded the app,

00:05:29   of course they're gonna have it

00:05:30   when the first time they launch it.

00:05:31   But I don't think you can necessarily rely on that.
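
One way to get that baseline usefulness is to ship a seed copy in the bundle and prefer a downloaded copy once one exists. The sketch below assumes a hypothetical file layout (`bundle/index.json` and `cache/index.json`) created in a temporary directory; it is not taken from either host's app.

```python
import tempfile
from pathlib import Path


def current_data_file(downloaded: Path, bundled: Path) -> Path:
    """Prefer the downloaded copy; fall back to the one shipped with the app."""
    return downloaded if downloaded.exists() else bundled


# Hypothetical file layout, created in a temp directory for illustration.
root = Path(tempfile.mkdtemp())
bundled = root / "bundle" / "index.json"
downloaded = root / "cache" / "index.json"
bundled.parent.mkdir()
bundled.write_text("[]")

first_launch = current_data_file(downloaded, bundled)    # no network yet

downloaded.parent.mkdir()
downloaded.write_text("[]")
after_refresh = current_data_file(downloaded, bundled)   # fresh copy arrived
```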

00:05:33   And obviously depending on what the app is,

00:05:35   that's more or less relevant.

00:05:37   If it's a, like my first app that I ever actually shipped

00:05:40   into the App Store, this is over 10 years ago now,

00:05:43   was a reference app that showed you

00:05:45   like cost of living per diem stuff for travelers.

00:05:50   And for that one, it's like the app is completely useless

00:05:52   if it doesn't have its database.

00:05:53   And that first version I shipped it,

00:05:55   just had like a plist file as its database,

00:05:57   which subsequently got upgraded to a SQLite database

00:06:00   that was shipped with the bundle.

00:06:01   But it was one of those things where I wanted to make sure

00:06:04   that as long as you got all of the app, it worked.

00:06:08   And that was all you had to,

00:06:10   all the connectivity that was required.

00:06:12   And also I think it's something that always sticks

00:06:15   in the back of my mind, is that one nice thing

00:06:18   about shipping it with the bundle is that you are not,

00:06:23   it doesn't require you to be maintaining something else

00:06:26   in order for the app to continue to function.

00:06:29   So as long as they have access to that app bundle,

00:06:33   they will be able to use the app in some ways.

00:06:35   If at some point you lose interest in the app,

00:06:38   it kind of starts to fall away

00:06:39   and you turn off the web server

00:06:41   where all of the content was being served from

00:06:44   or that cost becomes unsustainable,

00:06:47   it's kind of nice in almost like

00:06:50   a software preservation perspective or whatever,

00:06:52   that even several years later,

00:06:54   as long as Apple is still a thing

00:06:57   that is serving assets from the App Store,

00:06:59   someone could download the app or re-download it

00:07:01   from their purchased history and it could still work.

00:07:05   So that's just another thing that's kind of nice

00:07:06   about bundling it directly into the app itself.

00:07:09   - Yeah, exactly.

00:07:10   Because one of the reasons why I did instant search

00:07:13   was that when I have server issues like I did

00:07:17   over the holidays, which I talked about,

00:07:20   if I have server issues, search is critical to a podcast app

00:07:23   because it's the first thing people do to add podcasts

00:07:27   and it's a very frequent thing to search for new ones

00:07:30   and I wanted to make sure that was always gonna be fast

00:07:32   and even though if my servers are totally down,

00:07:35   you can't subscribe to a new podcast,

00:07:37   so I'm not totally immune here,

00:07:39   but I at least have a level of protection here.

00:07:42   If there's spotty connectivity or server issues here,

00:07:47   I'm giving them a better experience

00:07:48   than I otherwise would have.

00:07:50   - Yeah, and I think in general,

00:07:52   that's a great use for this type of,

00:07:54   it's essentially what a lot of this kind of data becomes

00:07:59   is essentially it's advanced caching.

00:08:02   It's different, it's a more robust version of caching

00:08:07   that you might do ahead of time,

00:08:09   so you can call it pre-warmed caches,

00:08:12   but it's things that you could eventually download

00:08:16   from the internet potentially

00:08:17   or that may be part of the normal use of the app

00:08:19   is that in your case, with search,

00:08:21   most of their results eventually are gonna be coming

00:08:23   from the web server, but as fast as that might be,

00:08:28   as optimized as you might make it,

00:08:30   it's still never gonna be as fast as in memory or on device.

00:08:35   Those are situations that are always gonna be better

00:08:38   and it's a dramatic difference in user experience

00:08:41   to have it locally there by pre-warming those caches.

00:08:44   I mean, you can sort of get around this sometimes

00:08:46   where as soon as the app launches,

00:08:49   it goes off and starts pre-warming its caches or things

00:08:52   so that by the time someone goes to the search screen

00:08:54   and you may have already gotten some of this and so on,

00:08:57   but it's never gonna be,

00:08:59   it's always a tricky balance to know the timing of that

00:09:03   and the resources of that and having that be ready

00:09:06   exactly when you want it to be there.

00:09:08   We are sponsored this week by Linode.

00:09:10   With Linode, you can instantly deploy and manage

00:09:13   an SSD server in the Linode cloud

00:09:15   and you can get a server running in just seconds

00:09:17   with your choice of Linux distro,

00:09:19   resources and node location.

00:09:22   Linode serves their customers with the help of 10 data

00:09:25   centers around the globe and they're about to add more.

00:09:28   Mumbai, India and Toronto, Canada will both have

00:09:31   data centers by 2020.

00:09:32   Linode also features native SSD storage

00:09:35   on all of their servers.

00:09:36   They have a 40 gigabit network behind it all

00:09:39   and they use Intel Xeon E5 CPUs.

00:09:41   This means you have amazing, fast hardware and networking

00:09:45   to serve your customers as fast as possible.

00:09:48   And you don't have to worry about overspending

00:09:50   because Linode has designed their pricing tiers

00:09:52   to feature hourly billing with the added bonus

00:09:54   of a monthly cap on all plans and add-on services

00:09:58   including backup services and node balancers.

00:10:00   Linode has fantastic pricing options to suit everyone.

00:10:04   The plans start at one gig of RAM for just $5 a month

00:10:07   and then for high memory plans starting with 16 gigs of RAM.

00:10:10   And Linode has a special offer for our listeners.

00:10:12   Listeners of the show can go to linode.com/radar

00:10:15   and use promo code radar2019 to get $20

00:10:19   towards any Linode plan.

00:10:21   Once again, it's linode.com/radar,

00:10:23   promo code radar2019 to get $20 towards a plan.

00:10:27   So on the one gig of RAM plan,

00:10:28   that could be four months for free.

00:10:30   And with a seven day money back guarantee,

00:10:33   you have nothing to lose.

00:10:33   So give Linode a try today.

00:10:35   Linode.com/radar, promo code radar2019.

00:10:38   Thank you so much to Linode for hosting everything

00:10:41   I run on the internet and for sponsoring

00:10:44   our show and Relay FM.

00:10:45   - So what I think would be interesting

00:10:48   to kind of walk through now is the spectrum of ways

00:10:53   that you can ship code or ship data with your app.

00:10:57   And this is, there's a tremendous variety

00:10:59   and each of them has different trade-offs

00:11:01   that I think are interesting to talk about.

00:11:02   And I'll start with the most like, I don't even know,

00:11:05   like bare metal silly version,

00:11:07   which is you can ship data in code.

00:11:11   And this is something that you almost certainly

00:11:13   you are doing to some degree, like some amount,

00:11:15   obviously there's a certain amount of data

00:11:17   in your application, irrespective of almost anything

00:11:21   in terms of even you could think about

00:11:22   like localizable strings is a form of data

00:11:25   that is being shipped with your app.

00:11:27   But even if it isn't like quite that straightforward

00:11:30   degree, like you could just have an array of static strings

00:11:34   that are like a standard list that is often shown

00:11:37   in your application that you have to choose from.

00:11:38   Like for example, in my workout app, Workouts++,

00:11:41   I have, you know, there's a bunch of data

00:11:44   that is associated with each of the different workout types.

00:11:47   You know, you walking, running, doing yoga, whatever it is.

00:11:51   And a lot of the information about those,

00:11:53   I just store in code.

00:11:55   Like I have, you know, sort of lookup tables and things

00:11:58   that are just dictionaries that are just static values

00:12:01   that I ship in the application.
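
As a rough illustration of data that lives directly in code, here is a small static lookup table. The keys and fields are hypothetical and are not the actual Workouts++ data.

```python
# Hypothetical workout metadata; the keys and fields are illustrative only.
WORKOUT_TYPES = {
    "walking": {"display_name": "Walking", "tracks_distance": True},
    "running": {"display_name": "Running", "tracks_distance": True},
    "yoga": {"display_name": "Yoga", "tracks_distance": False},
}


def display_name(workout_key: str) -> str:
    """Look up a label from the static table, with a generic fallback."""
    return WORKOUT_TYPES.get(workout_key, {}).get("display_name", "Other")
```

Changing any of these values means shipping a new build, which is the versioning trade-off discussed next.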

00:12:02   And this approach sort of works well in some ways

00:12:06   that it's very straightforward.

00:12:10   You know, it doesn't scale well.

00:12:11   It doesn't work if you, it works fine

00:12:13   for like 20 or 30 values.

00:12:15   It doesn't scale well if you have 20 or 30,000 values.

00:12:18   You know, it's not something that you would necessarily

00:12:20   wanna put a huge amount of things into,

00:12:23   but can be really straightforward.

00:12:24   And it certainly is something that I think we all do

00:12:27   at some point is there's a certain amount of data

00:12:30   that is just in the code.

00:12:32   The downside of course is that it's in the code.

00:12:34   It's not something that you can update

00:12:36   unless you update the application.

00:12:38   You're tying your data versioning to your code versioning.

00:12:41   So like in terms of from a version management perspective,

00:12:43   you're tying those two things together,

00:12:45   which can not be a great situation.

00:12:49   If you are doing this, you know,

00:12:52   if you're pushing the limits of this,

00:12:53   like Xcode will start to get really grumpy

00:12:55   if you are actually like, you know, having, you know,

00:12:58   the source code files that have, you know,

00:13:01   several thousand lines that are all just string literals.

00:13:03   Like you could do that, but it probably is gonna make

00:13:06   your life uncomfortable even just from a day-to-day use

00:13:10   and you know, compile times and opening the file perspective.

00:13:13   Like it's, or the thing that often will, you know,

00:13:16   bring source editors to their knees

00:13:17   is like trying to do syntax coloring

00:13:19   on these big massive string files.

00:13:22   Like you'll just see, like you'll actually watch

00:13:24   the coloring move down the page sometimes.

00:13:26   (laughing)

00:13:27   Like if you ever see that, you've gone too far on this.

00:13:30   Like it's appropriate at the small level

00:13:33   and it's super helpful because it's super performant.

00:13:35   Like these are strings directly in like the code space

00:13:38   of the application, like it's right there.

00:13:41   Don't go too crazy with that,

00:13:42   but it's certainly something that is available to you

00:13:46   as a tool.

00:13:48   And I think like the next one up from that,

00:13:50   I think is probably where you start to get into

00:13:53   serialized data structures that are not shipped in your code

00:13:58   but are functionally like loaded into memory.

00:14:01   So this is where I start to think of like

00:14:03   if you have a plist or a JSON blob or something like that

00:14:07   that you are storing, you know,

00:14:08   either shipping with the app like we were talking about

00:14:11   or downloading from the internet, you could go either way.

00:14:13   But you know, you have a data structure

00:14:15   that you're functionally gonna be like, you know,

00:14:18   NSArray, you know, init with contents of

00:14:23   URL kind of a thing.

00:14:24   And you're going to just immediately slurp it up into memory

00:14:27   and you'll have an in-memory dictionary

00:14:29   or an in-memory array.

00:14:31   And that's how you're gonna access that data.

00:14:34   And this I find is something that I probably use most often.

00:14:39   It's very flexible in terms of it's nice that you can,

00:14:44   the data is external to your code.

00:14:46   So you're not creating that dependency.

00:14:48   It's, you know, Xcode is perfectly happy

00:14:50   to have a resource file in its bundle

00:14:51   that could be bigger than you'd actually want to edit there.

00:14:55   It's reasonably performant.

00:14:57   It works pretty well for a medium size,

00:14:59   like small to medium sized data sets.

00:15:02   And like I'm actually in the process right now.

00:15:05   I mean, this is relevant for me

00:15:06   because I'm working on Pedometer++

00:15:08   on a time zone feature.

00:15:09   And I need to have a list of, you know,

00:15:12   popular cities and their time zones.

00:15:14   So you can choose time zones from cities.

00:15:17   And it's one of those things where it falls

00:15:19   into that kind of middle ground

00:15:20   where ultimately I'm gonna have maybe like 1,800 cities

00:15:25   or so, like not so many that I feel like I need to go full,

00:15:28   like have, you know, have a database

00:15:31   that I can, you know, make SQL queries against.

00:15:33   It's not that many, but it's big enough

00:15:36   that I need to externalize the data somehow.

00:15:38   And so right now, what I've been doing

00:15:40   that seems to work pretty well

00:15:41   is I'm just putting it into a plist file.

00:15:43   And I just load that plist file in whenever the user

00:15:47   happens to want to do this search.
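
A sketch of that middle-ground approach: parse the whole plist when the search is requested, then scan it linearly. The record shape (`city`, `tz`) and the sample entries are assumptions; a real app would read the bytes from a bundled file rather than build them inline.

```python
import plistlib

# Hypothetical plist content; a real app would read this from a bundled file.
CITIES_PLIST = plistlib.dumps([
    {"city": "London", "tz": "Europe/London"},
    {"city": "Los Angeles", "tz": "America/Los_Angeles"},
    {"city": "Tokyo", "tz": "Asia/Tokyo"},
])


def search_cities(plist_bytes: bytes, query: str) -> list:
    """Parse the whole plist into memory, then scan it linearly."""
    cities = plistlib.loads(plist_bytes)
    q = query.lower()
    return [c["tz"] for c in cities if c["city"].lower().startswith(q)]


matches = search_cities(CITIES_PLIST, "lo")
```

With a couple of thousand entries a linear scan like this is usually fast enough that no index is needed.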

00:15:49   You know, it's slightly big in memory, but not crazy.

00:15:51   And it's an operation that doesn't happen very often.

00:15:54   So it isn't something that I'm super worried about,

00:15:57   but you can just take that, you know,

00:15:59   it's you're functionally just taking a data structure

00:16:01   that you would normally have in memory

00:16:03   and you can write it to disk and you can read it from disk.

00:16:05   And, you know, you can do the same thing.

00:16:06   You can tell a dictionary or an array and say like,

00:16:08   you know, save, you know, write to file

00:16:11   and it'll, you know, essentially put out a binary plist

00:16:13   that you can then really, you know, reload back in.
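
The write-then-reload round trip can be sketched with Python's `plistlib`, which reads and writes the same binary plist format. The dictionary contents and file name are made up for the example.

```python
import plistlib
import tempfile
from pathlib import Path

# Hypothetical in-memory structure to persist.
settings = {"units": "metric", "recent_cities": ["Tokyo", "London"]}

path = Path(tempfile.mkdtemp()) / "state.plist"

# Write the in-memory structure to disk as a binary plist.
with path.open("wb") as f:
    plistlib.dump(settings, f, fmt=plistlib.FMT_BINARY)

# Read it back; the format is detected automatically.
with path.open("rb") as f:
    restored = plistlib.load(f)

# Binary plists start with the magic bytes "bplist00".
is_binary = path.read_bytes().startswith(b"bplist00")
```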

00:16:18   Or you can go even more extreme and you can use like

00:16:20   the archiving APIs in, you know, in iOS

00:16:24   where you can take an arbitrary data structure

00:16:27   and be, you know, be rewriting that to disk

00:16:30   and reading that in and then it starts to get

00:16:31   a little bit more advanced and a bit more complicated

00:16:33   and you should start to think,

00:16:35   maybe I want to do this in a database

00:16:36   where I'm not the one having to like manage

00:16:39   all this really sort of nuanced data management stuff.

00:16:42   But that certainly is the most extreme version of this

00:16:45   is when you start using like the keyed archiving

00:16:47   and secure archiving and all of those types of things.

00:16:51   - Yeah, and there's a number of trade-offs

00:16:52   you have to consider when you're picking between these.

00:16:54   So like one of the biggest ones that I think

00:16:57   would bite a lot of people kind of unexpectedly

00:16:59   is the true memory footprint of having

00:17:03   one of these simpler solutions, like a plist file

00:17:05   or a JSON file because not only do you have to consider

00:17:09   that like the entire, like usually if you're parsing

00:17:12   a plist or a JSON file, usually that means

00:17:15   the entire contents of that file

00:17:17   are going to be loaded into memory.

00:17:19   It also usually means that they're going to be decoded

00:17:22   into different data structures that will end up using

00:17:26   more memory in actual like usage RAM

00:17:30   in their native structures like NSDictionary,

00:17:31   NSArray and everything.

00:17:33   That'll use more memory than the size of the file.

00:17:36   So you really have to be careful if you're doing this

00:17:38   to test, like to actually monitor, like load up your app,

00:17:42   watch the memory usage meter in whatever developer tool

00:17:45   that you're running it in, you know, instruments

00:17:47   or the Xcode window like in the little like now running

00:17:49   screen that shows the CPU memory and stuff.

00:17:51   Watch what happens to the memory usage

00:17:52   when you load that data in.

00:17:54   It will probably go up by more than you think it will.
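
The gap between file size and decoded size is easy to measure. The sketch below uses Python's `tracemalloc` on a made-up data set of 10,000 small records; the exact numbers are specific to Python's object model and will differ for `NSDictionary` and `NSArray`, but the direction of the effect is the same.

```python
import json
import tracemalloc

# Hypothetical data set: 10,000 small records serialized as one JSON document.
records = [{"id": i, "name": f"item {i}"} for i in range(10_000)]
payload = json.dumps(records)
file_bytes = len(payload.encode("utf-8"))
del records

# Measure only the allocations made while decoding.
tracemalloc.start()
decoded = json.loads(payload)
decoded_bytes, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"file: {file_bytes} bytes, decoded: {decoded_bytes} bytes")
```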

00:17:57   And you know, so if it's like you know, one meg,

00:18:02   who cares, right?

00:18:03   If it's 20 megs or 50 megs or 100 megs,

00:18:06   you should start caring.

00:18:07   Like at that point, you have a much bigger footprint

00:18:10   than you probably should and that can cause problems

00:18:13   in things like getting your app kicked out of memory,

00:18:16   having it maybe crash on older phones that don't want

00:18:18   to give you that much memory or can't give you

00:18:19   that much memory.

00:18:20   You could start having potential just battery usage issues

00:18:23   of just having lots of RAM thrashing in and out

00:18:26   and everything so it's important to make sure

00:18:29   that you're not gonna blow your memory budget

00:18:32   by having just one big file.

00:18:34   You can start doing a little more custom things

00:18:36   like this is one of the things I hate about JSON

00:18:39   is that there doesn't seem to be what in XML

00:18:43   people used to call a SAX parser,

00:18:45   which was basically a streaming parser that would like

00:18:48   call your callbacks on begin tags, end tags,

00:18:51   got tag contents, et cetera.

00:18:53   And so you could basically stream an XML file

00:18:55   through memory without ever loading the whole thing

00:18:57   into memory.

00:18:58   I don't know of any JSON streaming parsers,

00:19:01   but anyway, you can kind of fake it,

00:19:03   you can like do some custom work where like you have

00:19:05   a file that you write in certain blocks and you have

00:19:08   like custom ways of reading it so that each block

00:19:10   is itself a smaller piece of JSON or whatever,

00:19:12   that's pretty complicated.

00:19:13   I think once you're getting to those kind of levels,

00:19:16   you should be looking at just a database.
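
The block-by-block workaround mentioned above can be approximated with newline-delimited JSON, where each line is a small, independent JSON document. This is one possible layout, assumed for illustration; it is not the format either host describes using.

```python
import io
import json


def iter_records(stream):
    """Yield one decoded record at a time from newline-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical file contents: one small JSON object per line.
data = io.StringIO('{"title": "A"}\n{"title": "B"}\n{"title": "C"}\n')
titles = [record["title"] for record in iter_records(data)]
```

Only one record is decoded at a time, so peak memory depends on the largest record rather than the whole file.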

00:19:18   And this is how I do my search index.

00:19:20   I mentioned earlier my search index,

00:19:21   my offline search index is about three and a half megs

00:19:24   most of the time and so it is literally a SQLite file.

00:19:29   That's all.

00:19:29   And one of the great things about this as opposed

00:19:32   to things like binary plist and everything,

00:19:34   not only do you not have to load the whole thing

00:19:35   into memory and everything, but it's also,

00:19:37   it's easier to have server side tools that generate it

00:19:40   because if you're not using an Apple platform,

00:19:44   the tools for reading and writing Apple's plist format

00:19:46   are weak at best and so if you can have something

00:19:50   like SQLite or JSON, you can write server side,

00:19:54   that's great.

00:19:55   What's great about SQLite is that it is incredibly

00:19:58   well optimized, like it's kind of shocking how fast

00:20:03   SQLite is, like it shouldn't be as amazing as it is.

00:20:07   There's a reason why it is very widespread.

00:20:10   Even Apple uses it for a lot of OS stuff.

00:20:12   A lot of stuff built into macOS and iOS is based

00:20:15   on SQLite, including Core Data.

00:20:17   There's very much reasons for that.

00:20:19   Anyway, so just shipping around a SQLite file,

00:20:22   I think is a very, very good solution.

00:20:24   The only major downside to it is it is kind of overkill

00:20:28   if you are dealing with a data set that's pretty small.

00:20:32   So if you have like under 1,000 items in your data set,

00:20:36   you probably don't need SQLite for that.

00:20:39   And the other thing I would say is it does involve

00:20:41   some code overhead that you have to write,

00:20:45   you have to link to the SQLite dylib and you have to

00:20:49   actually have SQLite calling functions and you might,

00:20:52   you could use a wrapper library.

00:20:54   My preferred one for that is FMDB by our friend Gus Mueller

00:20:58   and I use it in Overcast, I use it as much as I can,

00:21:01   it's wonderful.

00:21:02   You can also use the direct C API but it's a little

00:21:05   cumbersome so it's not necessarily the best option

00:21:08   for most people but that's about the only downside

00:21:11   is like yeah, you have to code in support for SQLite.

00:21:14   Other than that, I strongly recommend it because one

00:21:16   of the other things, like we're talking about memory costs

00:21:19   and size considerations but another thing to consider

00:21:22   is lookup speed and if your data set is small,

00:21:25   you know computers are fast, it's not gonna matter,

00:21:27   you can scan through all of them just by doing like a dumb

00:21:30   string match, is this the right title?

00:21:31   No, is this the right title?

00:21:32   No, just scan through the whole set.

00:21:35   But when your set becomes large enough where that could

00:21:39   be a problem, you actually will see significant gains

00:21:42   by having some kind of index structure like having

00:21:45   a binary tree lookup or something, that's what databases

00:21:48   do so if you create an index on like a title column,

00:21:52   you can have way faster lookups of potentially massive

00:21:56   data sets and you can look them up basically instantly.

00:21:59   Like I'm, again, like look at Overcast Instant Search,

00:22:02   I am shocked how quickly it's able to pull up results

00:22:06   from a, you know, three and a half meg database that has

00:22:10   tens of thousands of entries in it and it's able to pull up

00:22:13   entries from a single keystroke basically immediately.

00:22:16   Like it basically takes no time at all.

00:22:18   And so if you have a big enough data set where that matters,

00:22:23   you should really just go straight to a database

00:22:25   at that point.
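
A small sketch of an indexed title lookup in SQLite, using Python's built-in `sqlite3` module. The table name, index name, and generated titles are hypothetical, and the prefix search is written as a range query (which assumes a non-empty prefix) so that SQLite can use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE podcasts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO podcasts (title) VALUES (?)",
    [(f"Show {i:05d}",) for i in range(20_000)],
)
# The index lets SQLite seek into sorted titles instead of scanning every row.
conn.execute("CREATE INDEX idx_podcasts_title ON podcasts (title)")


def prefix_search(prefix: str, limit: int = 5) -> list:
    """Range query on the indexed column: prefix <= title < next prefix.

    Assumes a non-empty prefix.
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = conn.execute(
        "SELECT title FROM podcasts WHERE title >= ? AND title < ? "
        "ORDER BY title LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]


results = prefix_search("Show 0001")

# Ask SQLite how it would run the query; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM podcasts "
    "WHERE title >= ? AND title < ?",
    ("Show 0001", "Show 0002"),
).fetchall()
```

The query plan should mention `idx_podcasts_title`, which confirms the lookup goes through the index rather than a full table scan.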

00:22:26   - Yeah, and I think too there's two other things that I

00:22:28   think come to mind with databases that'll make them

00:22:30   really nice and I think the first one too is just to point

00:22:33   out that you can use Core Data to create the database

00:22:38   that you're going to ultimately ship with the app, too,

00:22:44   and when you set up your Core Data context, you can just

00:22:49   point that to a file.

00:22:50   Like I think you need to copy it out of your bundle

00:22:52   so it's in your, you know, so it's in a read write place

00:22:55   in the data, in user space but that's a trick that I've used

00:22:59   a few times where I have a version of the app that has

00:23:04   a build that builds the, essentially in my case it was

00:23:07   pre-caching a bunch of data about the various audio books

00:23:11   in the audio book catalog for my audio book app.

00:23:14   And it builds this Core Data database and then I just take

00:23:18   the SQLite file that is underlying Core Data, put that in

00:23:22   my bundle, when you first load up I copy that SQLite file

00:23:26   into user space but then I just point Core Data to it

00:23:29   and I get all of the affordances and the niceties that

00:23:32   Core Data gives you.

00:23:33   You don't have to necessarily go down to the low level

00:23:36   of using actual SQLite calls for managing it.

00:23:40   It can be all kind of managed from that top level

00:23:42   in the same way that you might say if you had a cache

00:23:46   that you were going to be storing data in with Core Data,

00:23:48   if that was something that you wanted to do, you would be

00:23:51   doing that through your application.

00:23:53   You can kind of still do that and have that file

00:23:56   underlying it that's ultimately all Core Data is.

00:23:58   So that's just something to keep in mind.
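
The copy-on-first-launch step can be sketched as follows. Core Data is not available outside Apple platforms, so this uses plain SQLite through Python's `sqlite3` module; the directory names, table, and sample rows are hypothetical stand-ins for the app bundle and the user-writable data directory.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
bundled_db = root / "bundle" / "catalog.sqlite"   # read-only in a real app
writable_db = root / "data" / "catalog.sqlite"    # user-writable location

# Build step: produce the seed database that ships inside the bundle.
bundled_db.parent.mkdir()
seed = sqlite3.connect(bundled_db)
seed.execute("CREATE TABLE books (title TEXT)")
seed.execute("INSERT INTO books VALUES ('Moby-Dick')")
seed.commit()
seed.close()


def open_catalog() -> sqlite3.Connection:
    """On first launch, copy the seed out of the bundle, then open the copy."""
    if not writable_db.exists():
        writable_db.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(bundled_db, writable_db)
    return sqlite3.connect(writable_db)


db = open_catalog()
db.execute("INSERT INTO books VALUES ('Walden')")  # the copy is writable
db.commit()
count = db.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

The bundled file is never modified; later launches find the writable copy already in place and skip the copy step.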

00:24:00   And I think too, the other big advantage that you get

00:24:03   whenever you have this kind of sidecar data in a database

00:24:07   as opposed to in a file, it becomes much easier and more

00:24:11   of a robust situation if you want to make changes to it.

00:24:15   So that if you make it non-static.

00:24:16   So if for example this is a cache where you are caching

00:24:21   results or prefetching results and you're making

00:24:26   it so that when the user does something it's instantaneous

00:24:28   but then it's going to fetch some more results later,

00:24:31   you could potentially just start adding those into this

00:24:33   database file and maybe it starts off being a small file

00:24:37   but over time it grows.

00:24:39   And that's what this does well with.

00:24:42   A database is great for that kind of thing.

00:24:44   You just insert more data into it and it sort of scales nicely

00:24:48   with a controlled usage pattern and a predictable

00:24:51   sort of performance expectation there.
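
[Editor's sketch: a cache that starts small and grows as results arrive, again using Python's sqlite3 as a stand-in. The `result_cache` table and the `remember` and `lookup` helpers are assumptions made for illustration.]

```python
import sqlite3

db = sqlite3.connect(":memory:")  # a file path in a real app
db.execute(
    "CREATE TABLE IF NOT EXISTS result_cache ("
    " query TEXT PRIMARY KEY, payload TEXT NOT NULL)"
)


def remember(query: str, payload: str) -> None:
    """Add or refresh one cached result; the file grows row by row."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO result_cache (query, payload) VALUES (?, ?)",
            (query, payload),
        )


def lookup(query: str):
    """Return the cached payload for a query, or None if it is not cached."""
    row = db.execute(
        "SELECT payload FROM result_cache WHERE query = ?", (query,)
    ).fetchone()
    return row[0] if row else None


remember("swift", '["prefetched"]')      # entry that shipped or was prefetched
remember("sqlite", '["fetched later"]')  # entry added after a later fetch
remember("swift", '["refreshed"]')       # same key: replaced, not duplicated
cached = lookup("swift")
missing = lookup("core data")
count = db.execute("SELECT COUNT(*) FROM result_cache").fetchone()[0]
```

[Each addition touches only the rows involved, so nothing has to be read into memory and rewritten as a whole.]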

00:24:52   Whereas if you were doing something with a,

00:24:55   you know, with a plist say, you could, you know,

00:24:59   add in, you read the plist into memory,

00:25:01   you add some items to it, you could write it back out

00:25:04   to disk, but that very quickly starts to get

00:25:07   really cumbersome and awkward and you have to deal

00:25:09   with all kinds of consistency issues.

00:25:11   What happens if, you know, the app was killed

00:25:13   halfway through a write and you can write it atomically

00:25:16   but hopefully it works and like a lot of these things

00:25:18   are completely handled by SQLite.

00:25:23   That it's doing all the clever checkpointing

00:25:25   and management and consistency stuff that it's nice

00:25:29   to not have to think about.
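
[Editor's sketch: what "completely handled by SQLite" looks like in the simplest case, using Python's sqlite3. A write that is interrupted inside a transaction leaves no partial rows behind. The `item` table is hypothetical, and a raised exception only approximates an app being killed; for a real process kill, SQLite's journal performs the equivalent cleanup the next time the file is opened.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (name TEXT NOT NULL)")
db.execute("INSERT INTO item VALUES ('existing')")
db.commit()

try:
    with db:  # one transaction: every row lands, or none do
        db.execute("INSERT INTO item VALUES ('first half')")
        raise RuntimeError("simulated interruption halfway through")
except RuntimeError:
    pass  # the context manager has already rolled the transaction back

names = [row[0] for row in db.execute("SELECT name FROM item")]
```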

00:25:30   So those are two things that I definitely want to mention

00:25:32   with databases where they can be super flexible

00:25:35   and also probably third is just to keep in mind

00:25:37   that it's entirely reasonable and not even like,

00:25:40   there's not a tremendous overhead for this

00:25:42   to access multiple databases from your application

00:25:45   concurrently which I think is I guess what you do

00:25:47   in Overcast, right, where you have a search database

00:25:49   that is doing this stuff and you also have

00:25:51   the actual app database that is actually storing,

00:25:55   you know, the user's information.

00:25:56   And it's perfectly reasonable to do that

00:25:58   and you could have as many of them as made sense

00:26:00   and that could really give you cleanliness

00:26:03   in terms of both your code and the actual creation

00:26:06   of this data that they could be coming

00:26:07   from different systems.

00:26:08   You don't have to sort of throw it all into one

00:26:10   giant database that you then have to manage.
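
[Editor's sketch: two separate database files used side by side, one for generated search data and one for user data, with Python's sqlite3. The file names, tables, and rows are invented for illustration and are not Overcast's actual schema. ATTACH is one way to read both from a single connection; simply opening two connections works as well.]

```python
import sqlite3
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
search_path = root / "search.sqlite"  # hypothetical generated index
user_path = root / "user.sqlite"      # hypothetical user data

# Build the search file on its own, as a separate system might.
with sqlite3.connect(search_path) as search:
    search.execute("CREATE TABLE podcast (id INTEGER PRIMARY KEY, title TEXT)")
    search.executemany(
        "INSERT INTO podcast VALUES (?, ?)",
        [(1, "Under the Radar"), (2, "Another Show")],
    )
search.close()

user = sqlite3.connect(user_path)
user.execute("CREATE TABLE subscription (podcast_id INTEGER PRIMARY KEY)")
user.execute("INSERT INTO subscription VALUES (1)")
user.commit()

# Attach the search file so one query can join across both databases.
user.execute("ATTACH DATABASE ? AS search", (str(search_path),))
subscribed = [
    row[0]
    for row in user.execute(
        "SELECT p.title FROM subscription s "
        "JOIN search.podcast p ON p.id = s.podcast_id"
    )
]
```

[Because the search file is self-contained, it can be regenerated or replaced without touching the user's data.]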

00:26:12   - Oh yeah, exactly.

00:26:14   One other thing I wanted to touch on too is

00:26:15   if you're building in like some kind of source of data,

00:26:20   go back and listen to the episode that I talked about

00:26:23   building my original search feature in Overcast

00:26:25   and like how to build a search engine

00:26:26   'cause I wanted to emphasize a point that I went over there

00:26:29   which is like make sure you're sourcing your data

00:26:32   in a responsible way.

00:26:33   You have to see like if you're getting, for example,

00:26:36   you know, you're talking about getting a list of cities

00:26:38   and what time zone they're in.

00:26:40   You have to make sure if you get this list,

00:26:41   first of all, like are you legally allowed to copy this

00:26:44   and put it into your app or is it copyrighted

00:26:46   and you know, you could get yourself into trouble.

00:26:48   Make sure that you have the legal right

00:26:50   to embed this in your app.

00:26:52   Make sure that it is from a good source,

00:26:55   that you probably have like a good and complete

00:26:58   and correct data set and make sure you have some way

00:27:01   to update it easily in the future if it's the kind of,

00:27:04   if it's data where like it has the nature

00:27:06   where it will change over time.

00:27:08   So like, you know, a list of major cities

00:27:10   probably doesn't need to change that frequently

00:27:11   but you probably should update it like at least once a year

00:27:15   and so like, you know, it's worth like considering

00:27:17   how often will this data need to be updated.

00:27:19   Can I build some kind of automated system for that?

00:27:21   Do I have any kind of reliable source

00:27:24   for where I can get this list?

00:27:25   And as much as possible, automate that

00:27:28   and build that in up front so you don't have to like

00:27:31   get caught off guard later.

00:27:33   - Yeah, and I think too along those lines

00:27:34   is also just making sure that, yeah, that you are,

00:27:38   you understand where the data comes from.

00:27:39   I think it's with certain types of things,

00:27:41   it's easy to kind of just, oh, I found this thing on the,

00:27:44   I found this thing on the internet and I downloaded it

00:27:46   and it seems like it has the right data in it.

00:27:48   And it's like, if both from a copyright perspective

00:27:51   that could be problematic but even just from a,

00:27:54   in six months if you need to update it

00:27:56   or people start reporting problems or whatever it is,

00:27:59   like having a sense of the origin of the data

00:28:01   and where it's coming from, if it's, you know,

00:28:03   a lot of these like say you're working with something

00:28:05   that's using like OpenStreetMap's data set

00:28:08   or something like that, like at least you know,

00:28:10   okay, where it's coming from and hopefully you have

00:28:12   some kind of script that can download the data

00:28:14   that you're interested in and process it

00:28:16   and then generate the thing that you need

00:28:17   for your application.
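
[Editor's sketch: a small repeatable build script of the kind described, in Python. The inline CSV stands in for a downloaded file, and the source name, column names, tables, and the `build_city_database` helper are all hypothetical. Recording where the data came from inside the generated file is an added suggestion, not something stated in the episode.]

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded source file; a real script would fetch it from
# a source you are licensed to redistribute. Rows here are illustrative.
SOURCE_NAME = "example-cities.csv (hypothetical)"
SOURCE_CSV = """city,timezone
London,Europe/London
Tokyo,Asia/Tokyo
,Europe/Paris
"""


def build_city_database(csv_text: str, source: str, out_path: str) -> sqlite3.Connection:
    """Process the raw source and generate the database the app will ship."""
    db = sqlite3.connect(out_path)
    db.execute("CREATE TABLE city (name TEXT PRIMARY KEY, timezone TEXT NOT NULL)")
    db.execute("CREATE TABLE provenance (key TEXT PRIMARY KEY, value TEXT)")
    rows = [
        (r["city"].strip(), r["timezone"].strip())
        for r in csv.DictReader(io.StringIO(csv_text))
        if r["city"].strip() and r["timezone"].strip()  # drop incomplete rows
    ]
    with db:  # write the data and its provenance in one transaction
        db.executemany("INSERT INTO city VALUES (?, ?)", rows)
        db.execute("INSERT INTO provenance VALUES ('source', ?)", (source,))
        db.execute("INSERT INTO provenance VALUES ('row_count', ?)", (str(len(rows)),))
    return db


db = build_city_database(SOURCE_CSV, SOURCE_NAME, ":memory:")
tokyo = db.execute("SELECT timezone FROM city WHERE name = 'Tokyo'").fetchone()[0]
```

[Rerunning the script against a fresh download regenerates the whole file, so adding a field later means changing one function rather than hand-editing data.]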

00:28:19   Like having an end-to-end solution there

00:28:22   where it isn't just like you found somebody who did

00:28:25   have a script that pulled in the data and did something

00:28:28   and then you're bundling that, like you're just setting

00:28:31   yourself up for pain in the future where something's

00:28:34   gonna change, some requirement's gonna change

00:28:36   or you're gonna want to add a field to that

00:28:38   and then suddenly it's, rather than being

00:28:40   a relatively straightforward thing,

00:28:41   it's this big massive project.

00:28:43   So yeah, it's like be very thoughtful about

00:28:45   where you're getting this kind of data

00:28:48   and just being aware that it should be something

00:28:51   that you have good understanding of,

00:28:54   that in general you want to be sure about

00:28:55   that you're sending any data that you're including

00:28:58   in your application, like you're taking responsibility

00:29:01   for it, so if there is anything in it

00:29:03   that is potentially controversial or questionable,

00:29:06   you have to be aware of it, don't just like take it

00:29:08   and then blindly use it.

00:29:09   - Yeah, and make sure what you're using is legal.

00:29:12   - Yes.

00:29:13   - All right, well thank you everybody for listening

00:29:15   and we will talk to you next week.

00:29:17   - Bye.
