00:00:00
◼►
I'm happy to report that we have not yet been classified by the US government as dangerously capable. Therefore, you're still able to listen to our show.
00:00:09
◼►
Do you think that Apple has classified us as dangerously capable and that's why we aren't getting invites?
00:00:29
◼►
I mean, we should probably talk about the idea of should we really be doing video.
00:00:34
◼►
I don't think any of us are super into that idea, but if we want to continue to have that kind of visibility to the rest of the world, that might be a good idea.
00:00:48
◼►
But it would so complicate the production of the show and the editing of the show that I don't think it's enough motivation for us.
00:01:00
◼►
But I don't know. How do you guys feel about that?
00:01:54
◼►
The flip side of that is, I think it's clear that that's where attention is, certainly for Apple, though honestly, I don't really care if Apple pays that much attention to us.
00:02:03
◼►
It would be lovely to be able to go to WWDC, but it's really not that big a deal.
00:02:13
◼►
But they seem to ignore everything that is not video these days, or print for that matter.
00:02:17
◼►
I mean, and to be clear, like we were never like super reliably getting press access, but it has really turned to zero in recent years, like the last couple of years.
00:02:46
◼►
I know when I have had times in my career where I've had press access with Apple, I have worried about losing it.
00:02:53
◼►
I don't think that ever made me do anything majorly different from how I would have otherwise done it.
00:02:59
◼►
But I'm sure it had to have like a minor impact or some kind of subconscious biasing, where maybe I would soften things or not go near certain things because I was afraid of losing that access.
00:03:11
◼►
Whereas when you have nothing to lose, it does give a certain degree of freedom.
00:03:18
◼►
And going back to the video thing for a second, maybe a good guiding principle on this for us to keep in mind is, I don't think any of our audience has asked us for that.
00:03:58
◼►
What podcast wouldn't like more listeners or more of an audience?
00:04:03
◼►
And so I think that's why people are doing it.
00:04:05
◼►
But if we are happy doing what we are doing, and our audience is happy with us doing it this way, and we're all happy with the numbers and how everything is going, I don't think we should feel compelled to push into an area that none of us seem like we actually want to go into.
00:04:23
◼►
Just for the idea of possibly, basically trying to become YouTubers in a way that it seems like none of us really, really have that in us.
00:17:26
◼►
I'm pretty sure, and I don't know this for a fact, but having used eWorld when
00:17:30
◼►
it was new, I'm pretty sure the company that made the Mac app for AOL, which you probably
00:17:36
◼►
aren't familiar with, but anyway, there was a Mac app for AOL.
00:17:38
◼►
And I think that company that made that app essentially white labeled it to Apple and said, here, you can have the app.
00:17:43
◼►
And then Apple just obviously changed all the strings and stuff and then changed all of the graphics.
00:17:49
◼►
eWorld's whole idea was that there was this, I don't know how to describe the art style, but sort of an impressionistic, cartoony art style of a little village with buildings and stuff.
00:17:58
◼►
Back when everyone thought the internet was going to be a giant GIF of a town with little people in it.
00:18:17
◼►
No, I've only seen magazine pictures of it.
00:18:18
◼►
I used it once. I was traveling to New York when I was a kid, and a friend who wasn't Casey, I actually have multiple friends who I would meet in New York.
00:18:28
◼►
Um, but I, a friend had it on his computer and I used it for like a couple of days on this trip.
00:18:34
◼►
Like, so this was the thing where, you know, again, this was like probably early to mid nineties back when there was still a lot of experimentation in like, what should computer UIs be?
00:18:46
◼►
Uh, and, and what should be like the, the metaphor that structures all your applications and documents and everything together.
00:18:53
◼►
And Microsoft Bob was an experiment that Microsoft did somewhere in the mid nineties.
00:18:58
◼►
The idea was, what if we just arranged things like in a house, and you could go to different rooms in the house?
00:19:05
◼►
Very much like the early generation of nerds had probably, you know, done a little bit of acid one weekend and came up with this idea in somebody's hot tub.
00:19:16
◼►
And it's like, okay, well, that's a cool idea.
00:19:20
◼►
Um, but I believe that's where Clippy came from.
00:19:23
◼►
I think Clippy was like an offshoot of Bob, or Clippy escaped from Bob.
00:19:26
◼►
But yeah, look, this was back in the day when there was still at least some experimentation, before we had all settled on basically the desktop with applications and files.
00:19:37
◼►
Like, there were other ideas. You know, it wasn't that long after that that the Palm Pilot happened, and the Palm Pilot also had a totally different idea of how your data and applications should be structured.
00:19:50
◼►
Which actually was far closer to what we have on iOS today than, you know, PCs were. But yeah, eWorld was a little bit after that, I believe.
00:19:59
◼►
But it was, that was back in the day when like nobody was quite sure what the internet would end up being for consumers yet.
00:20:05
◼►
And there were things like AOL and CompuServe and everybody wanted their own online service.
00:20:09
◼►
And that's, I guess, where eWorld came in.
00:20:12
◼►
I mean, it was just a hundred percent like the AOL Mac client, except that they had that graphic where you'd click on the house and then it would just, I mean, it was very,
00:20:20
◼►
It wasn't like Bob where there was like a, you know, there was very, very few graphics and things fit on a floppy disk, but, uh, yeah, you can find screenshots of it online.
00:20:28
◼►
It had a pretty cool art style, but it did not last long.
00:20:32
◼►
I want to also say that this diversion into Microsoft Bob should not be counted towards my time, but I will take full credit for the derailment. Reclaiming my time.
00:20:53
◼►
I'm like, well, there must be some reason for it.
00:20:55
◼►
But if you had like a Mac with Touch ID and you needed to do something like, oh, I want to drag a file out of the /Library folder.
00:21:01
◼►
And you'd do it in the Finder, and it would be like, well, you can't do that because it's not owned by you.
00:21:04
◼►
But, uh, if you enter your password, cause you have an admin account, if you enter your password, we'll, we'll, you know, do it for you with administrative privileges or whatever.
00:21:12
◼►
And it would always make you enter your password.
00:21:21
◼►
But every time you needed to elevate your privileges, essentially do the equivalent of sudo, or however you pronounce that word, it would always throw up a dialog box.
00:21:55
◼►
It doesn't show you the wait list thing.
00:21:57
◼►
I was, I was going to say that like, it's fine for us to put this out there because just our super nerdy listeners, which are in the grand scheme of things, few in number, will know about this, but it has since leaked out into the media.
00:22:13
◼►
And it's only for the Mac though, to be clear. It only works on macOS Golden Gate.
00:22:16
◼►
I mean, where are you going to type a sudo defaults write command on your phone?
00:22:20
◼►
That's the, yeah, I was, I would love to, to skip it on my test phone for iOS 27, but, uh, that there seems to be no way to skip that yet, except I guess be Joanna Stern.
00:22:30
◼►
And speaking of Steve Troughton-Smith, he also pointed out that there is a change in Apple's documentation.
00:22:36
◼►
The key UIDesignRequiresCompatibility.
00:22:39
◼►
Uh, well, let me just read what it says in the documentation.
00:22:41
◼►
The system ignores this key when you build for iOS 27 or later iPad OS 20.
00:22:46
◼►
Well, basically all the 27 or later OSes.
00:22:48
◼►
So this is the thing that said, don't use Liquid Glass for my app, please.
00:22:52
◼►
And Apple saying tough noogs, you're going to have to use it.
00:22:55
◼►
If you compile against iOS 27, or any of the 27 OSes.
00:22:58
◼►
I think they might've said that last year too.
00:23:21
◼►
That being said, all of the problems of Liquid Glass from 26, if you were kicking the can down the road and you didn't want to deal with them with your app, you still have to deal with them, because even if Golden Gate fixes some of your biggest, you know, nitpicks or issues, which honestly, in that kind of context,
00:23:42
◼►
of like app developers maintaining things, it's not really going to help that much.
00:23:46
◼►
Um, it's just nicer, you know, in certain ways, but all of the things about like, you know, having different metrics for controls than iOS 18 and different behaviors and things, you still have to deal with that as long as you support iOS 18 or iOS 26.
00:24:01
◼►
And for most apps, that's going to be a little bit like you're probably still dealing with supporting all these things for at least another year.
00:24:09
◼►
So, like, for me with Overcast, I'm thinking this is probably a good time.
00:24:14
◼►
Like, once I launch my 27 version this fall, I can probably drop support for iOS 18 then, but even that's pretty aggressive given where the numbers actually are for most apps these days.
00:24:26
◼►
So, uh, I understand why people still want this key to work, um, because then they can keep shipping one UI for their app instead of now kind of two and a half is what they'll have to do.
00:24:39
◼►
But, uh, it's, it's going to be a bumpy time.
00:24:41
◼►
The support for 26 and Tahoe is going to be kind of a thorn in app developers' sides for probably at least another two years.
00:24:58
◼►
During onboarding, and I think this is coming from one John Siracusa, during onboarding for Golden Gate, the Liquid Glass slider, where you decide how transparent you want it to be.
00:25:07
◼►
Well, that's part of the onboarding process, which is pretty interesting.
00:25:11
◼►
What that means is, you know, I was just saying, oh, everyone will just leave it on the default.
00:25:16
◼►
But now I wonder, I mean, maybe most people probably still leave it on the default because people just hit continue, continue, continue.
00:25:23
◼►
But yeah, it's part of, it's part of like the, the onboarding after you install or update to the OS.
00:25:29
◼►
So everyone's going to at least see that slider.
00:25:30
◼►
And I'm sure Apple will be monitoring where those numbers, I say, I'm sure, but honestly, I don't know what kind of metrics Apple gathers about this stuff.
00:25:38
◼►
Apple tends not to gather much info, but we do know that they have some numbers.
00:25:57
◼►
And then some people put all the way to the left because they're young and they think it's cool.
00:26:00
◼►
This slider still baffles me.
00:26:06
◼►
Like it still just feels like we didn't feel like deciding.
00:26:10
◼►
So we're going to give you this zero to one floating point setting that changes one thing about Liquid Glass that honestly isn't even that much.
00:26:20
◼►
First of all, it isn't that impactful of a change.
00:26:21
◼►
And second of all, the range between zero and one of like how they look is not that different.
00:26:29
◼►
But most importantly, going all the way to the right does not make it opaque.
00:26:32
◼►
There's reduced transparency for that if you want it.
00:26:34
◼►
But just to be clear, the slider makes it less transparent.
00:26:38
◼►
And all the way to the left is not fully clear either.
00:26:40
◼►
So like whatever you want, this is kind of a weird, milquetoast, middle-of-the-road kind of, well, we can give you a little bit of control.
00:26:49
◼►
But it feels like a very precise control and it just isn't.
00:26:53
◼►
And by the way, they give you like a picture of Apple Park with some crap floating on top of it, you know, liquid glass style.
00:27:07
◼►
Before you move on from that, as Marco pointed out, the underlying setting, which is called NS glass tint amount, does go from zero to one.
00:27:15
◼►
If you set it to a value higher than one, it is ignored.
00:27:19
◼►
Unfortunately, the first thing I did, I'm like, oh, the slider goes from zero to one, 15.
00:27:49
◼►
So you can see this adorable little puppy sitting on one of their pet beds.
00:27:52
◼►
But they do mattresses for all sorts of different sizes and shapes of people, including children, including their plus mattress, which has a little bit extra support and cushioning right where it counts.
00:28:04
◼►
They also have all sorts of different materials that you can choose from to make sure that you're doing what you think is best for your body.
00:28:10
◼►
They offered to send a mattress to me so I could give it a shot and make sure that it's worth talking to you folks about.
00:28:17
◼►
And so we needed a new mattress for Michaela, actually.
00:28:20
◼►
And we ordered a natural hybrid mattress for Michaela.
00:28:23
◼►
But I slept on it for a couple of nights before Michaela did, so I could make sure I'd done my due diligence.
00:28:27
◼►
And let me tell you, I am a mattress snob.
00:28:54
◼►
So, if you're interested in this, go to Leesa.com for 25% off select mattresses, plus get an extra $50 off if you use the promo code ATP, which is exclusive for our listeners.
00:29:05
◼►
That's L-E-E-S-A.com, promo code ATP, for 25% off select mattresses and an extra $50 off.
00:29:13
◼►
Support our show and let them know we sent you after checkout.
00:29:19
◼►
Thank you to Leesa for sponsoring the show.
00:29:24
◼►
All right, this might get me in a lot of trouble, but there was a lot of consternation with regard to the corner radius of macOS windows.
00:29:35
◼►
And, John, would you like to just briefly recap why perhaps you, or if not you, others were upset about this in macOS 26, please?
00:29:44
◼►
Yeah, they made the corner radius very large.
00:29:47
◼►
It means the corners were very, very rounded, which means there are a larger number of pixels that you don't get to see in the corners of a window.
00:29:54
◼►
And this is especially important in Tahoe because the way the UI works is that if, like, if you're looking at a picture, like an image or something, the image goes edge to edge.
00:30:04
◼►
So you're losing all four corners of that image if that image is in a window because all four corners are curved.
01:19:29
◼►
And back when Amazon did their Super Bowl commercial, I could not do it.
01:19:34
◼►
No matter what I tried, I could not get the recording of my voice to not trigger the speakers.
01:19:42
◼►
So what did I do when all this came out this week about Siri?
01:19:46
◼►
I tried the exact same thing and I went in and I was able to
01:19:48
◼►
compare, because now I had the real audio from Apple.
01:19:51
◼►
So, you know, so I opened that real audio up and I saw exactly how they stripped out those
01:19:56
◼►
frequencies and exactly which ones were stripped out with what kind of response pattern and I
01:20:01
◼►
figured out exactly how to replicate like as close as possible to that process.
01:20:06
◼►
And I applied it to a test file and I still couldn't replicate the effect.
01:20:11
◼►
It would still wake up all my devices whenever I would play my voice coming over the speakers.
01:20:16
◼►
Now, what some people have theorized is that maybe by recognizing my voice, maybe I'm giving it such a strong signal because it's me saying it that my devices would still respond to it.
01:20:27
◼►
But maybe like somebody else wouldn't be able to activate my devices as easily.
01:20:33
◼►
But also with everybody saying that all of their devices were being woken up by the keynote anyway, it seems like this is not necessarily a very strong filter that basically the recognition of the term Siri by all the modern Apple devices out there seems like it actually might be sophisticated enough and sensitive enough that maybe this trick of muting these few frequencies doesn't even defeat it necessarily.
01:21:00
◼►
Or it doesn't defeat it enough if the voice is close enough to you.
01:21:05
◼►
Or maybe if you haven't done like the personal voice training with the Siri setup process, maybe yours is more sensitive or less sensitive.
01:21:35
◼►
The frequency response of speakers is not perfect.
01:21:37
◼►
So, yeah, you could slice out these narrow slices.
01:21:40
◼►
But by the time that makes it through your audio chain to your speakers, to the physical world, through the air to your ears, how much of that slicing out survives?
01:21:49
◼►
The finer you slice it, the more chance there is that just the quality of the amplification and the speaker cone wiggling around end up mushing some other frequencies into the range that you notched out.
01:22:03
◼►
Now there's stuff there, and it's really complicated.
01:22:05
◼►
I mean, to be clear, like this particular slice out would be very difficult to do in any kind of analog way.
01:22:13
◼►
If you try to do it like with a parametric EQ, you won't be able to.
01:22:18
◼►
But it's so narrow that, like, you think, OK, well, now that I've cut that out, there'll be nothing in that frequency band when the sound hits my ear.
01:22:24
◼►
And it's like, well, is the signal chain from that audio file to your ears so perfect that this tiny, narrow slice of frequency is going to have no energy in it whatsoever because your speakers are that perfect?
01:22:37
◼►
I, you know, especially if you're playing over a phone speaker or something, I'm not sure it's going to work.
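The concern above can be sketched in a few lines of Python. This is a toy illustration, not Apple's actual processing: the 16 kHz sample rate, the 800 Hz notch, the Q of 30, and the cubic "speaker" distortion are all assumptions chosen to make the effect visible. The notch uses the standard biquad notch formula, and the imperfect speaker is modeled as a slightly nonlinear playback stage that creates intermodulation products from the 400 Hz and 600 Hz tones that survive the filter.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def notch_coefficients(center_hz, q, sample_rate=SAMPLE_RATE):
    """Normalized biquad notch coefficients (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    cos_w0 = math.cos(w0)
    return (1 / a0, -2 * cos_w0 / a0, 1 / a0, -2 * cos_w0 / a0, (1 - alpha) / a0)


def apply_biquad(samples, coeffs):
    """Run a direct-form I biquad filter over the samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def imperfect_speaker(samples, distortion=0.1):
    """Hypothetical playback chain: a small cubic nonlinearity."""
    return [s + distortion * s ** 3 for s in samples]


def tone_amplitude(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of one frequency, found by correlating with a complex sinusoid."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, s in enumerate(samples)
    )
    return 2 * abs(acc) / len(samples)


# One second of three equal tones; 800 Hz stands in for the band to remove.
original = [
    sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in (400, 600, 800))
    for i in range(SAMPLE_RATE)
]
notched = apply_biquad(original, notch_coefficients(800, q=30))
played = imperfect_speaker(notched)

half = SAMPLE_RATE // 2  # measure the second half, after the filter settles
before = tone_amplitude(original[half:], 800)
after_notch = tone_amplitude(notched[half:], 800)
after_speaker = tone_amplitude(played[half:], 800)
print(f"800 Hz before notch:  {before:.4f}")
print(f"800 Hz after notch:   {after_notch:.6f}")
print(f"800 Hz after speaker: {after_speaker:.4f}")
```

In this sketch the notch removes essentially all of the 800 Hz tone in the file, but the distorted playback puts a small amount of energy back at 800 Hz, because 2 x 600 - 400 = 800. Real speakers and microphones behave differently from this toy model, so it only shows that the mechanism is plausible.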
01:22:46
◼►
So coming back around, let's talk about what Amar had said during the same presentation.
01:22:53
◼►
Again, they are the vice president of AI.
01:22:55
◼►
We're super excited about our third generation of Apple Foundation Models, or AFM, in partnership with Google.
01:23:00
◼►
We've built a family of models spanning on device to the cloud.
01:23:03
◼►
AFM Core, Core Advanced Cloud, and Cloud Image are custom builds for Apple Silicon, trained using proprietary data and refined using techniques.
01:23:11
◼►
This was a potential transcription thing.
01:23:33
◼►
But what Stephen said was that he thinks refined is another way of saying that they distilled the Gemini models to train Apple Foundation Model models.
01:23:42
◼►
And what that means is basically you have some finished model that someone's already trained, and you don't want to have to go through all the effort of training your thing the same way.
01:23:52
◼►
So instead, you ask questions of that other finished model and use the answers to train your model.
01:24:02
◼►
You know, if your competitor comes out with a better model than you, you ask that model questions and use the answers to train your model so your model is as good as theirs.
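The ask-questions-and-train-on-the-answers idea can be shown with a deliberately tiny sketch. Everything here is hypothetical: the "teacher" is a stand-in function rather than a language model, and the "student" is a two-parameter line fitted by gradient descent. Real distillation of language models works on text or token probabilities and is far more involved.

```python
def teacher(x):
    """Stand-in for a finished model that we can only query (hypothetical)."""
    return 3.0 * x - 1.0


def distill(teacher_fn, questions, epochs=500, learning_rate=0.05):
    """Fit a student y = w * x + b using only the teacher's answers."""
    answers = [teacher_fn(q) for q in questions]  # "ask the other model"
    w, b = 0.0, 0.0
    n = len(questions)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for q, a in zip(questions, answers):
            error = (w * q + b) - a
            grad_w += 2 * error * q / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


questions = [i / 10 for i in range(-20, 21)]  # inputs from -2.0 to 2.0
student_w, student_b = distill(teacher, questions)
print(f"student learned w={student_w:.3f}, b={student_b:.3f}")
```

The student never sees how the teacher was built. It only sees question and answer pairs, and it ends up reproducing the teacher's behavior on those inputs.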
01:24:10
◼►
It's not clear to me when they talk about all these models.
01:24:13
◼►
First, they give them all Apple names.
01:24:14
◼►
It's, you know, AFM, Apple Foundation Model, Core, Core Advanced, Core Advanced Cloud.
01:24:19
◼►
The names sound like old Intel processors, honestly.
01:28:01
◼►
So that's where, you know, for each token that's going through, all 70 billion parameters don't participate in the processing of that token.
01:28:08
◼►
Only the ones that another network decides are relevant.
01:28:11
◼►
So you don't have to activate all of them at once.
01:28:13
◼►
So that's dense model versus sparse model.
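A minimal sketch of that dense versus sparse distinction, assuming a standard top-k mixture-of-experts layout. The expert and router weights below are made-up numbers, and each "expert" is just a dot product, so this illustrates the routing idea rather than any real model.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def moe_forward(token, experts, router, top_k):
    """Run only the top_k experts that the router scores highest for this token."""
    scores = [dot(row, token) for row in router]
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    active = ranked[:top_k]
    # A softmax over the selected scores gives the mixing weights.
    exps = [math.exp(scores[i]) for i in active]
    total = sum(exps)
    output = sum((e / total) * dot(experts[i], token) for e, i in zip(exps, active))
    return output, sorted(active)


# Four toy experts and a router with one scoring row per expert (made-up values).
experts = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 1.0]]
router = [[2.0, 0.5], [1.0, -1.0], [0.0, 3.0], [-1.0, 1.0]]

_, active_a = moe_forward([1.0, 0.0], experts, router, top_k=2)
_, active_b = moe_forward([0.0, 1.0], experts, router, top_k=2)
_, active_dense = moe_forward([1.0, 0.0], experts, router, top_k=len(experts))
print("sparse, token A uses experts:", active_a)
print("sparse, token B uses experts:", active_b)
print("dense uses experts:", active_dense)
```

With `top_k=2`, each token touches half of the experts, and different tokens can touch different halves. Setting `top_k` to the number of experts turns the same code into the dense case, where every parameter participates every time.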
01:28:15
◼►
And you'll hear those terms in this big Apple press release about their third generation foundation models.
01:28:21
◼►
All right, so with that in mind, Apple's announcement of its third generation foundation models happened over at machinelearning.apple.com with, you know, we'll put the link in the show notes.
01:28:30
◼►
With regard to the on-device models, AFM Core is a 3 billion parameter dense model.
01:28:36
◼►
So that's the one where if you ask it anything, it's using all 3 billion parameters.
01:28:40
◼►
AFM Core Advanced is a 20 billion parameter sparse model and our most powerful on-device model.
01:28:47
◼►
It's natively multimodal, enabling helpful features like expressive voices and higher accuracy dictation.
01:28:53
◼►
Built on cutting-edge Apple research, this model activates just 1 to 4 billion parameters at a time, depending on the request.
01:28:59
◼►
AFM 3 Core Advanced is unlocked and optimized for our most capable Apple Silicon systems.
01:29:05
◼►
So this is the model that you only get to run on whatever it is.
01:30:10
◼►
So Apple, if you look at their talk and all the description of this, like, they don't come out and say this.
01:30:16
◼►
But every part of their architecture, and I imagine most things like this, is trying as hard as it can to use the smallest possible model it can get away with.
01:30:28
◼►
This is as opposed to, for example, me, whenever I use any of these products.
01:30:31
◼►
I'm using, like, you know, Codex from the command line or something.
01:30:34
◼►
And it has, like, a slash models thing where you can pick which model you want.
01:30:37
◼►
Whatever the biggest is, I always pick biggest, highest effort, most thinking.
01:30:44
◼►
Because I'm like, I want the smartest one.
01:30:57
◼►
That is not what Apple wants billions of iPhone users to do.
01:31:01
◼►
So what they're going to do is you make a request and it's going to be like, if we can get away with doing this with AFM Core, our 3 billion parameter dense model, we're going to.
01:31:09
◼►
Even though if we gave it to AFM Cloud Pro, it would do a way better job.
01:31:14
◼►
We're not going to do that because computing is a scarce resource.
01:31:21
◼►
And even though it probably would give a better result, although that is somewhat debatable because sometimes they say, well, if you give it to one of these big thinking models, not only does it take longer, but actually gives a worse result because it overthinks it and blah, blah, blah.
01:31:31
◼►
But setting that aside, my experience has been the bigger, more powerful models do better no matter what you ask them.
01:31:38
◼►
I'm sure there are counterexamples, but my experience with code stuff is they do better.
01:31:43
◼►
But that's not what Apple software wants to do.
01:31:45
◼►
So it's going to figure out what is the wimpiest model that I can send this to.
01:31:49
◼►
If none of the on-device ones will do it, then I have to send it to the cloud.
01:31:52
◼►
But even when I send it to the cloud, do we really want to send it to the one, the AFM Cloud Pro, the one we said that is comparable to Google stuff?
01:32:02
◼►
But when I'm sitting there talking to like, you know, you know, ChatGPT or whatever in a web browser, I'm at least picking the pop-up menus and say, yeah, go to ChatGPT, super hard thinking, maximum, whatever, all the time.
01:32:16
◼►
No matter what I'm asking it, Apple does not give you that choice.
01:32:19
◼►
They only have one model, AFM Cloud Pro, that is on that level.
01:32:23
◼►
And I bet it gets very few requests by design.
01:32:26
◼►
The only other thing is like if you do anything with images, they just have one image model.
01:32:29
◼►
So everything there is going to AFM Cloud Image, but everything else, they have so many choices before they get down to the quote-unquote good model underneath it all.
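The routing behavior described here, as a guess at its general shape, looks like a cascade that tries the cheapest model first. The model names come from the discussion above. The difficulty score, the thresholds, and the hardware flag are hypothetical, since Apple has not described how the decision is actually made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    on_device: bool
    max_difficulty: float  # hypothetical threshold, not an Apple number


# Ordered from cheapest to most expensive.
TIERS = [
    Tier("AFM Core", on_device=True, max_difficulty=0.3),
    Tier("AFM Core Advanced", on_device=True, max_difficulty=0.6),
    Tier("AFM Cloud Pro", on_device=False, max_difficulty=1.0),
]


def route(difficulty, is_image=False, has_advanced_hardware=True):
    """Pick the smallest tier whose assumed capability covers the request."""
    if is_image:
        return "AFM Cloud Image"  # the single image model mentioned above
    for tier in TIERS:
        if tier.name == "AFM Core Advanced" and not has_advanced_hardware:
            continue  # only unlocked on the most capable hardware
        if difficulty <= tier.max_difficulty:
            return tier.name
    return TIERS[-1].name


print(route(0.1))
print(route(0.5))
print(route(0.5, has_advanced_hardware=False))
print(route(0.9))
```

The user never picks a tier in this sketch. The router does, and it reaches the biggest model only when nothing smaller qualifies.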
01:32:38
◼►
And I think that may come up when we get to another Dan Morin example.
01:32:42
◼►
Maybe we skipped over it or maybe I removed it from the notes.
01:32:45
◼►
But sometimes, yeah, maybe it got clipped out of the notes here.
01:32:49
◼►
Sometimes when you ask Siri AI in the current betas a question and it gives you an answer.
01:32:55
◼►
Actually, that's from my experience, not Dan Morin, sorry.
01:33:03
◼►
I don't know where it went in the notes, though.
01:33:05
◼►
Anyway, sometimes when you ask a question and you get an answer, as I did with Siri AI in macOS, you'd be like, oh, that's a pretty good answer.
01:33:13
◼►
And then when you ask the same question later to, for example, demonstrate it to a family member, it gives you a totally different, worse answer.
01:33:20
◼►
And when I see that, I'm like, did you go to AFM Cloud Pro the first time?
01:33:24
◼►
But then the second time I ran it, you did like a local on-device model and gave me a crap answer because that first answer was so much better.
01:33:31
◼►
So, yeah, Apple doesn't seem to give you control over that routing.
01:33:35
◼►
I don't think there's a pop-up menu in Siri where you can pick the model that you want.
01:33:39
◼►
And sometimes that difference shows through.
01:33:41
◼►
Maybe if you're not, like, experimenting like I am and asking the same question multiple times with the exact same wording, you won't notice that.
01:33:50
◼►
Traditional large language models, whether dense or sparsely activated, require all weights to reside in active memory or DRAM.
01:33:56
◼►
To break this barrier, AFM Core Advanced introduces a novel sparsely activated architecture built on instruction-following pruning, or IFP, a technique developed by Apple researchers.
01:34:07
◼►
And, in fact, there's Instruction-Following Pruning for Large Language Models, from June 2025.
01:34:14
◼►
That's also at machinelearning.apple.com.
01:34:16
◼►
I forget if we ever talked about that paper, that Apple said it had a bunch of papers on this topic.
01:34:30
◼►
Then there's also LLM in a Flash, Efficient Large Language Model Inference with Limited Memory, again, on machinelearning.apple.com.
01:34:38
◼►
Filippone Stefano wrote an article about it, which we'll also link from Filippone.
01:34:43
◼►
Finally, the LLM in a Flash paper addresses challenges and solutions for running LLMs on devices with limited DRAM capacity.
01:34:49
◼►
It presents an approach for efficiently executing LLMs that exceed available DRAM capacity by storing model parameters in flash memory and loading them into DRAM on demand.
01:34:58
◼►
Yeah, I think we did also mention that paper ages ago.
01:35:01
◼►
Apple explicitly calls out the instruction following pruning for large language models sparse things.
01:35:06
◼►
So they're definitely doing that because that was reading from a direct quote from their VP of AI or whatever.
01:35:17
◼►
And there's the other paper about flash memory, about like, oh, you don't have enough RAM, but you want to use a bigger model.
01:35:23
◼►
You can store part of it in flash and, like, swap it in or whatever.
01:35:25
◼►
That makes me wonder if one of the reasons Apple Intelligence is disabled on external drives is that Apple just doesn't want to deal with having to guess at the performance characteristics of external drives.
01:35:36
◼►
If they are, in fact, doing this technique, this LLM in a Flash thing where some of the model is in RAM and some of it is on flash.
01:35:44
◼►
Apple knows the speed characteristics of all of its internal SSDs.
01:35:47
◼►
And so maybe when it's an external drive,
01:35:51
◼►
They're like, how about we just disable entirely?
01:35:52
◼►
Because I'm sure plenty of the drives are fast enough.
01:35:54
◼►
But if they're not, it's a terrible experience, or like performance falls off a cliff.
01:36:15
◼►
With that in mind, coming back to Apple's Foundation Models announcement, instead of forcing the entire model into DRAM, the full model is stored in flash memory or NAND.
01:36:23
◼►
Because NAND to DRAM bandwidth is too slow to swap weights token by token as standard MoE, or mixture of experts, models require, AFM3 Core Advanced makes routing decisions per prompt.
01:36:34
◼►
A lightweight, dense block selects a fixed set of experts during initial processing, periodically reselecting them during generation.
01:36:40
◼►
To minimize data movement, the model relies on a high percentage of always active, shared experts, alongside input-dependent, routed experts, swapped into DRAM only when needed.
01:36:50
◼►
This design also introduces crucial inference time elasticity.
01:36:55
◼►
Rather than using a single model for all tasks or managing an ensemble of smaller models, AFM Core Advanced uses a predetermined number of active parameters tailored to each specific use case.
01:37:05
◼►
This allows weights to be loaded incrementally across requests of varying difficulty, scaling the model size far beyond traditional DRAM limits while minimizing latency.
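Why per-prompt routing matters for flash reads can be sketched with a small cache simulation. This is an assumption-laden toy: the expert IDs, the two always-resident shared experts, the DRAM capacity of two routed experts, and the least-recently-used eviction policy are all invented for illustration and are not taken from Apple's description.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy DRAM cache in front of a slower flash store of expert weights."""

    def __init__(self, flash, shared_ids, capacity):
        self.flash = flash
        # Shared experts are loaded once and stay resident.
        self.shared = {i: flash[i] for i in shared_ids}
        self.routed = OrderedDict()  # least recently used entry comes first
        self.capacity = capacity
        self.flash_reads = 0

    def get(self, expert_id):
        if expert_id in self.shared:
            return self.shared[expert_id]
        if expert_id in self.routed:
            self.routed.move_to_end(expert_id)
            return self.routed[expert_id]
        weights = self.flash[expert_id]  # the slow path: read from flash
        self.flash_reads += 1
        self.routed[expert_id] = weights
        if len(self.routed) > self.capacity:
            self.routed.popitem(last=False)  # evict the least recently used
        return weights


def run(routing_per_token, tokens=10):
    """Count flash reads for a given list of expert choices, one entry per token."""
    flash = {i: f"weights-{i}" for i in range(8)}
    cache = ExpertCache(flash, shared_ids=(0, 1), capacity=2)
    for step in range(tokens):
        for expert_id in (0, 1, *routing_per_token[step]):
            cache.get(expert_id)
    return cache.flash_reads


# Per-prompt routing: one fixed set of routed experts for every token.
per_prompt_reads = run([(4, 5)] * 10)
# Per-token routing: the routed experts change on every token.
cycle = [(2, 3), (4, 5), (6, 7)]
per_token_reads = run([cycle[i % 3] for i in range(10)])
print("flash reads, per-prompt routing:", per_prompt_reads)
print("flash reads, per-token routing: ", per_token_reads)
```

With a fixed expert set per prompt, the flash is read once per routed expert and then everything is served from DRAM. When the choice changes on every token and the cache is small, nearly every access goes back to flash, which is the bandwidth problem the quoted passage describes.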
01:37:15
◼►
So this is them saying that this specific model, AFM3 Core Advanced, does use the things from the flash paper, it sounds like.
01:37:22
◼►
But of course, Apple Intelligence hasn't been able to run when booted from an external drive since its introduction in 2024.
01:37:28
◼►
So this is, and also the AFM Core Advanced only runs on the highest-end hardware.
01:37:33
◼►
So that can't be the root reason why they do this, but I wonder if it contributes to it.
01:37:37
◼►
But anyway, all of that is to say, if you think Apple just, like, bought models from Google, stuck them on servers, and sent requests to them, that's not how this works at all.
01:37:45
◼►
Like, I feel like the biggest point of this tech talk, with all those things on the diagram and Craig Federighi pointing out, like, you know, Google's participation: they were an important part of this.
01:37:55
◼►
And we'll get to some even more important parts of them in a second.
01:37:57
◼►
But Apple is the one writing the software that's on top of it.
01:38:01
◼►
And Apple, in fact, has made many advances, all those AI people that Apple hired that, you know, eventually left to go to other AI companies, they actually did interesting novel work.
01:38:10
◼►
I'm not sure if this stuff is, like, the equivalent things are happening at all the cutting-edge AI companies, but I do want to give Apple credit for, like, they're not just, like, we failed, we can't do anything, let's just use a third-party product and slap a Siri face on it.
01:38:24
◼►
Like, these papers and the fact that they're using them and the description of how they use them and how it gets them to be able to run models that otherwise wouldn't run on device.
01:38:33
◼►
And essentially how it makes it possible for Apple to ship the 27 OSes to literally billions of Apple devices and not destroy all their servers.
01:38:43
◼►
Like, they can't do the thing that everyone else is doing, which is, like, every request goes to a server, don't run anything on device.
01:38:48
◼►
They're trying so hard to run everything on device that they possibly can, using lots of interesting techniques to make models that shouldn't fit on your phone actually fit on your phone, and shuffling between them.
01:38:59
◼►
And, like, it's fascinating, and it shows that they are not, I mean, they're behind, like, because they're not the cutting edge like their competitors are.
01:39:06
◼►
But they're using what they're good at, basically writing client-side software, to innovate in that area on top of the underlying technologies that makes the models themselves and trains them and stuff like that.
01:39:18
◼►
Then coming back to Amar, to bring this model to production, we work with both Google and NVIDIA to extend our private cloud compute infrastructure to NVIDIA GPUs in Google's cloud while maintaining Apple's unmatched privacy guarantees.
01:39:33
◼►
Andrew Cunningham over at Ars Technica writes,
01:39:35
◼►
To do this while still making the same privacy promises, Apple's new iteration of private cloud compute is using NVIDIA's confidential computing, Intel's trust domain extensions, and Google's Titan security chip to provide layers of protection similar to what Apple provides for its own servers.
01:39:49
◼►
To provide additional protection, Apple keeps a cryptographically verifiable append-only ledger of all Google Cloud hardware that is part of the PCC fleet.
01:39:57
◼►
And Apple's devices will only trust hardware, excuse me, software on these servers that is signed by Apple.
01:40:03
◼►
The Google Cloud servers don't yet support all the same protections as Apple's own private cloud compute servers, but Apple says it will gradually be ramping towards the complete set of protections throughout the summer preview period.
01:40:15
◼►
So, what does a cryptographically verifiable append-only ledger sound like to you?
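As a rough illustration of what "cryptographically verifiable" and "append-only" can mean in general, here is a minimal hash-chain sketch. It is not Apple's design, which the quoted article does not describe; the record strings and function names are hypothetical. Each entry's hash covers the previous entry's hash, so changing or removing an earlier record breaks every hash after it.

```python
import hashlib

# Placeholder "previous hash" for the very first entry (an assumption of this sketch).
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    return hashlib.sha256((prev_hash + record).encode("utf-8")).hexdigest()


def append(ledger: list, record: str) -> None:
    """Add a record to the end of the ledger, chaining it to the last entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "hash": entry_hash(prev_hash, record)})


def verify(ledger: list) -> bool:
    """Recompute the whole chain and report whether every stored hash matches."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical hardware identifiers, purely for illustration.
ledger = []
for node in ("node-a", "node-b", "node-c"):
    append(ledger, node)
```

Anyone holding a copy of the ledger can call `verify` themselves, and anyone who remembers the latest hash can tell if history was rewritten later. Production transparency logs typically use Merkle trees rather than a plain chain, so that proofs stay small.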
01:40:30
◼►
I mean, all this stuff, as has been pointed out when we talked about private cloud compute, like, Apple is doing everything that it possibly can to mathematically show and prove and guarantee that you are talking to servers that behave in the way that Apple says.
01:40:48
◼►
And that way is, Apple can't see it because it's all end-to-end encrypted, so Apple can't see it.
01:40:53
◼►
And obviously, the server that receives a request has to, of course, decrypt it because how could it, you know, it has to know what you said to do the work.
01:41:00
◼►
But the other guarantee that Apple gives is the software that runs on those servers will never save any of that.
01:41:09
◼►
Like, it doesn't, like, no data that comes into that server that is decrypted in memory and then processed and chucked back out is saved in any way.
01:41:16
◼►
And so, that's private cloud computing.
01:41:18
◼►
It was 2024 where they talked about that.
01:41:19
◼►
They gave a big paper on it and everything.
01:41:20
◼►
And that's part of the promise is we, our software behaves in this way.
01:42:00
◼►
They're just calling it private cloud compute.
01:42:02
◼►
And what they mean by that is those M2 Ultra Apple Silicon servers that Apple made.
01:42:06
◼►
And also, NVIDIA GPUs running in Google's cloud.
01:42:11
◼►
Because as we saw, Google has a similar architecture to PCC.
01:42:15
◼►
And those things that Casey listed there, if you go to the Andrew Cunningham article at Ars Technica that will be in the show notes, those are links.
01:42:23
◼►
So if you want to learn what is NVIDIA's confidential computing, what is Intel's trust domain extensions, what is Google's Titan security chip?
01:42:30
◼►
Those are the pieces of the puzzle that are building up towards making Google able to run servers that Apple calls private cloud computing, even though they share essentially nothing with Apple's private cloud computing other than the promises that they fulfill.
01:42:46
◼►
And as this article says, that they're not quite up to the standards of Apple's private cloud compute.
01:42:53
◼►
But the idea is they will be before the 27 OSs ship.
01:42:58
◼►
Like, again, Apple gives these images to security researchers so they can mathematically prove that when you run a request, it's really running against this image.
01:43:26
◼►
You can decompile it and you security researchers should be able to prove, see, we're not saving it anywhere.
01:43:31
◼►
And then also what you should be able to prove is that thing that we gave you, that's the thing that's running on the server.
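The "that thing we gave you is the thing running on the server" check can be pictured as comparing measurements. This is a simplified sketch under stated assumptions, not Apple's actual attestation protocol: it assumes the server reports a SHA-256 digest of its software image, that a set of published digests is available to the client, and it ignores the hardware-signed attestation that would be needed to trust the reported value. All names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def measure(image_bytes: bytes) -> str:
    """Compute the measurement (here, a SHA-256 hex digest) of a software image."""
    return hashlib.sha256(image_bytes).hexdigest()


def is_published_image(reported_measurement: str, published: set) -> bool:
    """Return True only if the server's reported measurement is one that was published."""
    return any(hmac.compare_digest(reported_measurement, m) for m in published)


# Hypothetical image contents standing in for a released server image.
published_image = b"example server software image"
published = {measure(published_image)}
```

A client following this idea would refuse to send a request unless `is_published_image` returns `True` for the measurement the server presents. As the discussion notes, this still bottoms out in some root of trust: something has to vouch that the reported measurement is honest.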
01:43:36
◼►
And all the stories I've said about this is that in the end, you have to, there is a root of trust in believing that Apple is not lying to you.
01:43:42
◼►
That's true of every piece of software in the world.
01:43:45
◼►
Lots of times people send in Ask ATP questions.
01:43:47
◼►
Like they say, Apple says my message is encrypted, but do we have to just take their word for it?
01:43:53
◼►
Like, you know, like they write the software.
01:43:56
◼►
They could lie and say it's end to end encrypted and just be lying through their teeth.
01:43:59
◼►
But security researchers would discover that they were lying in most cases.
01:44:04
◼►
In the end, there is some root of trust.
01:44:06
◼►
You know, if you keep digging down and like, what is that paper that we've linked to a few times?
01:44:13
◼►
Musing on trust or whatever, something like that, about talking about if you had a compiler that was corrupted in some way that you couldn't trust anything because the compiler builds all your other programs.
01:44:23
◼►
I wish I could remember what that one was.
01:44:27
◼►
So, yeah, but you do have to trust it when they say we want some software to do this.
01:44:31
◼►
And also you have to trust that it doesn't have bugs, which sometimes it does.
01:44:33
◼►
It's supposed to do this, but it doesn't actually do this because there's a bug in it somewhere.
01:44:38
◼►
But, yeah, it seems like, I mean, the question is, those Apple Silicon servers with the M2 Ultras in them, how long will Apple keep doing that?
01:44:46
◼►
Have they already given up on that effort and said, well, it's a cool thing.
01:44:49
◼►
We tried it, but that was under the old regime.
01:44:50
◼►
And now what we're going to do is what everybody else does, which is we're going to run NVIDIA GPUs, which is interesting because Apple hates NVIDIA's guts and hasn't done anything with NVIDIA for ages.
01:44:59
◼►
But apparently they'll use their GPUs and servers.
01:45:01
◼►
And it's also interesting because Google, which is running these servers, Google has their own like TPUs.
01:45:07
◼►
Those are tensor processing units that are really good at running Google's models.
01:45:11
◼►
Google makes its own silicon, these TPUs, we've talked about a long time ago on the show, that are different than NVIDIA GPUs.
01:45:21
◼►
Well, if Apple uses NVIDIA GPUs, they're not tied to Google.
01:45:26
◼►
So if and when the deal ends with Google or they don't like Google hosting their stuff, they could go to anybody and say, hey, do you have a data center filled with NVIDIA GPUs that we can run our stuff on?
01:45:49
◼►
But anyway, seems like Apple might be giving up on the idea that they are going to run Apple silicon servers in their own data centers and build their own hardware and whatever that factory was in Arizona or in Texas.
01:46:03
◼►
And instead, they're just going to do what everyone else does and just pay some hosting provider like AWS or Google Cloud to rent racks full of NVIDIA GPUs.
01:46:11
◼►
And at this point, Google did a deal with xAI, Grok, whatever.
01:46:17
◼►
They built out data centers that they aren't using because no one wants to use Grok because it sucks and everyone hates Elon Musk.
01:46:24
◼►
And so xAI is renting whole data centers to Google.
01:46:29
◼►
So it could be that when you talk to Siri AI, it ends up going to an xAI data center that is being rented by Google where it's running NVIDIA GPUs.
01:46:41
◼►
But yeah, this seems like a change in Apple's stance.
01:46:46
◼►
And again, notably the first time that I'm aware of since back when NVIDIA had a bum GPU in like an iBook or something that like the relationship between Apple and NVIDIA soured.
01:48:04
◼►
The only things that were Google were like, you know, tinted shaded stuff in the cloud thing.
01:48:09
◼►
So they weren't forthcoming with exactly, like, is it a Google model that we changed, or is it an Apple model that we distilled? Like, they didn't go into that level of detail.
01:48:16
◼►
But their big emphasis here was most of the stuff you see in Siri AI is software Apple wrote.
01:48:23
◼►
And they didn't say this, but like it's clear that you kind of need the part that Google provided or otherwise none of this works.
01:48:30
◼►
But Apple was really emphasizing we did a lot of work for this.
01:48:34
◼►
And I mean, I guess it's like also when it falls on its face and doesn't work right or you aren't happy with how it goes.
01:48:39
◼►
Don't blame Google because that wasn't up to them.
01:50:09
◼►
By the way, if you're not an ATP member and you're like, oh, you're going to put State of the Union in Overtime?
01:50:20
◼►
Non-members hate when we put stuff in Overtime because they don't get to hear Overtime.
01:50:23
◼►
And like, I get it, you know, but like, here's the thing.
01:50:26
◼►
I'm pretty sure in multiple past years, possibly also including last year, we didn't cover State of the Union at all.
01:50:33
◼►
Because what happens after WWDC is there's tons of follow-up and tons of news and things happen and State of the Union just gets pushed off and pushed off.
01:50:39
◼►
And then we look up and it's a month later and it's like, I guess we'll just delete State of the Union from the notes because, like, it's old news now.
01:50:45
◼►
Like, we can't really, like, go back to it.
01:50:47
◼►
It's just, it's already done and gone.
01:51:10
◼►
I wanted to spend just a couple of minutes talking about trip results.
01:51:14
◼►
So I just went to Cape Charles for the last week.
01:51:16
◼►
This is our happy place on the eastern shore of Virginia.
01:51:20
◼►
And I brought a truly asinine amount of equipment and computing-related things.
01:51:27
◼►
In no small part because I had to record this very show, but also because I'm me.
01:51:32
◼►
And I wanted to talk about the UniFi Travel Router, which we talked about at some point in the past.
01:51:39
◼►
But to refresh your memory, the standard operating procedure for travel routers is a GL.iNet,
01:51:45
◼►
which I don't know how to verbally describe how big a GL.iNet router is.
01:51:50
◼►
But they're small but not tiny by any stretch.
01:51:53
◼►
And a few months ago, Ubiquiti came out with a UniFi Travel Router, which is exceedingly small.
01:52:00
◼►
Imagine, like, five or six credit cards stacked on top of each other.
01:52:03
◼►
That's probably not exactly right, but that's kind of what I'm talking about.
01:52:06
◼►
And what a travel router does is if you have a single internet connection, which presumably you do,
01:52:13
◼►
but you would like to broadcast that to your entire family's constellation of devices,
01:52:19
◼►
then what you can do is you can use a travel router, be that a GL.iNet or the UniFi Travel Router,
01:52:24
◼►
to log into or connect to whatever the internet source is.
01:52:28
◼►
So in a best-case scenario, you plug Ethernet right into this little baby travel router.
01:52:31
◼►
But more realistically, you are in a hotel or something like that,
01:52:35
◼►
and you have the travel router log into the hotel Wi-Fi and go through that whole painful dance.
01:52:43
◼►
But then the travel router broadcasts its own Wi-Fi that your phone and iPad and Mac and your spouse's phone and Mac and iPad
01:52:54
◼►
and your children's iPads and Switches and so on and so forth, they all connect to the UTR, the UniFi Travel Router.
01:53:00
◼►
And that is figuring out how to get you to the internet.
01:53:03
◼►
Now, the pro move, in my personal opinion, which you can do either by hand or the UniFi Travel Router does automatically,
01:53:09
◼►
is to set your Wi-Fi that this portable router is broadcasting to be the exact same SSID and password as your home Wi-Fi.
01:53:19
◼►
So this way, everything just jumps on the nearby Wi-Fi, thinking effectively that it's at home.
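One way to see why the matching SSID and password trick works: under WPA2-Personal, the pre-shared key a device uses is derived from just the passphrase and the SSID (PBKDF2 with HMAC-SHA1, 4096 iterations, 32 bytes). The sketch below assumes WPA2-Personal; the show doesn't say which security mode is in use, and WPA3 derives keys differently. The network names and passphrase are made up.

```python
import hashlib


def wpa2_pmk(ssid: str, passphrase: str) -> bytes:
    """Derive the WPA2-Personal pairwise master key from an SSID and passphrase."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),  # the SSID acts as the salt
        4096,                  # iteration count used by WPA2-Personal
        32,                    # 256-bit key
    )


# Hypothetical names and passphrase, purely for illustration.
home = wpa2_pmk("HomeNetwork", "correct horse battery")
travel = wpa2_pmk("HomeNetwork", "correct horse battery")  # travel router cloning home
other = wpa2_pmk("HotelGuest", "correct horse battery")    # same password, different SSID
```

Here `home` and `travel` come out identical while `other` differs, which is why a device that already knows the home network treats the travel router's network as one it can join without any new setup.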
01:53:23
◼►
It's particularly critical if you're a dork like me and bring one or maybe two Sonos speakers with you when you are on a long trip like this,
01:55:15
◼►
And I brought a GL.iNet with me as well, because I assumed that the UTR just wouldn't have the oomph
01:55:23
◼►
to broadcast Wi-Fi throughout the entire house.
01:55:26
◼►
Now, this was, the GL.iNet was set up for success because it so happens that the cable modem and the router for the house were fairly centrally located, which was excellent.
01:55:39
◼►
But I have to say, the UTR worked great.
01:55:43
◼►
I, in fact, didn't want to say anything to Marco about this, or John, for that matter.
01:55:47
◼►
But when I was talking on ATP, I was connecting via Ethernet.
01:55:53
◼►
I think I did talk about this last week.
01:55:54
◼►
via a really janky, like, 50-foot Ethernet cable running through the kitchen of the house over to where the UTR was, which actually had a Ubiquiti Flex Mini, I think it is, hanging off of it.
01:56:09
◼►
So I could connect more than one thing via Ethernet, because, again, I'm a dork.
01:56:15
◼►
So the laptop was connected Ethernet to a little tiny five-port switch, which was connected to the UTR, which was, in turn, connected as a client to the house's router.
01:57:16
◼►
Now it mostly works, but occasionally it'll just kind of forget to be connected to Teleport, which isn't the end of the earth.
01:57:23
◼►
But I prefer all of our traffic to be encrypted through whatever the house's router is and emerge,
01:57:28
◼►
or egress, onto the internet from our house, like our literal home in Richmond.
01:57:34
◼►
But all in all, it actually worked really well.
01:57:36
◼►
And what's interesting about the fact that the UTR can be on Teleport is that if I wanted to, I could actually bring one of the UniFi security cameras and connect it to the UTR.
01:57:50
◼►
I guess it would have to be a Wi-Fi camera or I would have to use, like, a PoE injector or whatever.
01:58:15
◼►
But you should really consider the UniFi Travel Router if you're traveling.
01:58:19
◼►
And I find that even if I didn't use it at the house, I would consider bringing this anyway because it is so darn small.
01:58:27
◼►
And when you're in the car, if you set it up, you know, hook it to your phone to tether off your phone.
01:58:32
◼►
Or if you do like me and borrow a hotspot from the library or what have you, it is a really excellent way to get the kids online quickly and easily in the car.
01:58:42
◼►
And what I have is, you know, a little Anker-like battery pack connected to the UTR, which is in turn connected to the hotspot.
01:58:51
◼►
So this is probably something you should not want in your life because you are really in a bad state if this is something that excites you as much as it excites me.
01:58:59
◼►
But I got to tell you, I freaking love this thing.