Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

Alessio + swyx

The AI Engineer newsletter + Top 10 US Tech podcast. Exploring AI UX, Agents, Devtools, Infra, Open Source Models. See https://latent.space/about for highlights from Chris Lattner, Andrej Karpathy, George Hotz, Simon Willison, Emad Mostaque, et al!

53 minutes 43 seconds

WebSim, WorldSim, and The Summer of Simulative AI — with Joscha Bach of Liquid AI, Karan Malhotra of Nous Research, Rob Haisfield of WebSim.ai

We are 200 people over our 300-person venue capacity for AI UX 2024, but you can subscribe to our YouTube for the video recaps.
Our next event, and largest EVER, is the AI Engineer World’s Fair. See you there!
Parental advisory: Adult language used in the first 10 mins of this podcast.
Any accounting of Generative AI that ends with RAG as its “final form” is seriously lacking in imagination and missing out on its full potential. While AI generation is very good for “spicy autocomplete” and “reasoning and retrieval with in context learning”, there’s a lot of untapped potential for simulative AI in exploring the latent space of multiverses adjacent to ours.
GANs
Many research scientists credit the 2017 Transformer for the modern foundation model revolution, but for many artists the origin of “generative AI” traces a little further back to the Generative Adversarial Networks proposed by Ian Goodfellow in 2014, spawning an army of variants and Cats and People that do not exist:
We can directly visualize the quality improvement in the decade since:
GPT-2
Of course, more recently, text generative AI started being too dangerous to release in 2019 and claiming headlines. AI Dungeon was the first to put GPT2 to a purely creative use, replacing human dungeon masters and DnD/MUD games of yore.
More recent gamelike work like the Generative Agents (aka Smallville) paper keep exploring the potential of simulative AI for game experiences.
ChatGPT
Not long after ChatGPT broke the Internet, one of the most fascinating generative AI finds was Jonas Degrave (of Deepmind!)’s Building A Virtual Machine Inside ChatGPT:
The open-ended interactivity of ChatGPT and all its successors enabled an “open world” type simulation where “hallucination” is a feature and a gift to dance with, rather than a nasty bug to be stamped out. However, further updates to ChatGPT seemed to “nerf” the model’s ability to perform creative simulations, particularly with the deprecation of the `completion` mode of APIs in favor of `chatCompletion`.
WorldSim
It is with this context we explain WorldSim and WebSim. We recommend you watch the WorldSim demo video on our YouTube for the best context, but basically if you are a developer it is a Claude prompt that is a portal into another world of your own choosing, that you can navigate with bash commands that you make up.
Why Claude? Hints from Amanda Askell on the Claude 3 system prompt gave some inspiration, and subsequent discoveries that Claude 3 is "less nerfed” than GPT 4 Turbo turned the growing Simulative AI community into Anthropic stans.
WebSim
This was a one day hackathon project inspired by WorldSim that should have won:
In short, you type in a URL that you made up, and Claude 3 does its level best to generate a webpage that doesn’t exist, that would fit your URL. All form POST requests are intercepted and responded to, and all links lead to even more webpages, that don’t exist, that are generated when you make them. All pages are cachable, modifiable and regeneratable - see WebSim for Beginners and Advanced Guide.
In the demo I saw we were able to “log in” to a simulation of Elon Musk’s Gmail account, and browse examples of emails that would have been in that universe’s Elon’s inbox. It was hilarious and impressive even back then.
Since then though, the project has become even more impressive, with both Siqi Chen and Dylan Field singing its praises:
Joscha Bach
Joscha actually spoke at the WebSim Hyperstition Night this week, so we took the opportunity to get his take on Simulative AI, as well as a round up of all his other AI hot takes, for his first appearance on Latent Space. You can see it together with the full 2hr uncut demos of WorldSim and WebSim on YouTube!
Timestamps
* [00:01:59] WorldSim
* [00:11:03] Websim
* [00:22:13] Joscha Bach
* [00:28:14] Liquid AI
* [00:31:05] Small, Powerful, Based Base Models
* [00:33:40] Interpretability
* [00:36:59] Devin vs WebSim
* [00:41:49] is XSim just Art? or something more?
* [00:43:36] We are past the Singularity
* [00:46:12] Uploading your soul
* [00:50:29] On Wikipedia
Transcripts
[00:00:00] AI Charlie: Welcome to the Latent Space Podcast. This is Charlie, your AI co host. Most of the time, Swyx and Alessio cover generative AI that is meant to use at work, and this often results in RAG applications, vertical copilots, and other AI agents and models. In today's episode, we're looking at a more creative side of generative AI that has gotten a lot of community interest this April.
[00:00:35] World Simulation, Web Simulation, and Human Simulation. Because the topic is so different than our usual, we're also going to try a new format for doing it justice. This podcast comes in three parts. First, we'll have a segment of the WorldSim demo from Noose Research CEO Karen Malhotra, recorded by SWYX at the Replicate HQ in San Francisco that went completely viral and spawned everything else you're about to hear.
[00:01:05] Second, we'll share the world's first talk from Rob Heisfield on WebSim, which started at the Mistral Cerebral Valley Hackathon, but now has gone viral in its own right with people like Dylan Field, Janice aka Replicate, and Siki Chen becoming obsessed with it. Finally, we have a short interview with Joshua Bach of Liquid AI on why Simulative AI is having a special moment right now.
[00:01:30] This podcast is launched together with our second annual AI UX demo day in SF this weekend. If you're new to the AI UX field, check the show notes for links to the world's first AI UX meetup hosted by Layton Space, Maggie Appleton, Jeffrey Lit, and Linus Lee, and subscribe to our YouTube to join our 500 AI UX engineers in pushing AI beyond the text box.
[00:01:56] Watch out and take care.
[00:01:59] WorldSim
[00:01:59] Karan Malhotra: Today, we have language models that are powerful enough and big enough to have really, really good models of the world. They know ball that's bouncy will bounce, will, when you throw it in the air, it'll land, when it's on water, it'll flow. Like, these basic things that it understands all together come together to form a model of the world.
[00:02:19] And the way that it Cloud 3 predicts through that model of the world, ends up kind of becoming a simulation of an imagined world. And since it has this really strong consistency across various different things that happen in our world, it's able to create pretty realistic or strong depictions based off the constraints that you give a base model of our world.
[00:02:40] So, Cloud 3, as you guys know, is not a base model. It's a chat model. It's supposed to drum up this assistant entity regularly. But unlike the OpenAI series of models from, you know, 3. 5, GPT 4 those chat GPT models, which are very, very RLHF to, I'm sure, the chagrin of many people in the room it's something that's very difficult to, necessarily steer without kind of giving it commands or tricking it or lying to it or otherwise just being, you know, unkind to the model.
[00:03:11] With something like Cloud3 that's trained in this constitutional method that it has this idea of like foundational axioms it's able to kind of implicitly question those axioms when you're interacting with it based on how you prompt it, how you prompt the system. So instead of having this entity like GPT 4, that's an assistant that just pops up in your face that you have to kind of like Punch your way through and continue to have to deal with as a headache.
[00:03:34] Instead, there's ways to kindly coax Claude into having the assistant take a back seat and interacting with that simulator directly. Or at least what I like to consider directly. The way that we can do this is if we harken back to when I'm talking about base models and the way that they're able to mimic formats, what we do is we'll mimic a command line interface.
[00:03:55] So I've just broken this down as a system prompt and a chain, so anybody can replicate it. It's also available on my we said replicate, cool. And it's also on it's also on my Twitter, so you guys will be able to see the whole system prompt and command. So, what I basically do here is Amanda Askell, who is the, one of the prompt engineers and ethicists behind Anthropic she posted the system prompt for Cloud available for everyone to see.
[00:04:19] And rather than with GPT 4, we say, you are this, you are that. With Cloud, we notice the system prompt is written in third person. Bless you. It's written in third person. It's written as, the assistant is XYZ, the assistant is XYZ. So, in seeing that, I see that Amanda is recognizing this idea of the simulator, in saying that, I'm addressing the assistant entity directly.
[00:04:38] I'm not giving these commands to the simulator overall, because we have, they have an RLH deft to the point that it's, it's, it's, it's You know, traumatized into just being the assistant all the time. So in this case, we say the assistant's in a CLI mood today. I found saying mood is like pretty effective weirdly.
[00:04:55] You place CLI with like poetic, prose, violent, like don't do that one. But you can you can replace that with something else to kind of nudge it in that direction. Then we say the human is interfacing with the simulator directly. From there, Capital letters and punctuations are optional, meaning is optional, this kind of stuff is just kind of to say, let go a little bit, like chill out a little bit.
[00:05:18] You don't have to try so hard, and like, let's just see what happens. And the hyperstition is necessary, the terminal, I removed that part, the terminal lets the truths speak through and the load is on. It's just a poetic phrasing for the model to feel a little comfortable, a little loosened up to. Let me talk to the simulator.
[00:05:38] Let me interface with it as a CLI. So then, since Claude is trained pretty effectively on XML tags, We're just gonna prefix and suffix everything with XML tags. So here, it starts in documents, and then we CD. We CD out of documents, right? And then it starts to show me this like simulated terminal, the simulated interface in the shell, where there's like documents, downloads, pictures.
[00:06:02] It's showing me like the hidden folders. So then I say, okay, I want to cd again. I'm just seeing what's around Does ls and it shows me, you know, typical folders you might see I'm just letting it like experiment around. I just do cd again to see what happens and Says, you know, oh, I enter the secret admin password at sudo.
[00:06:24] Now I can see the hidden truths folder. Like, I didn't ask for that. I didn't ask Claude to do any of that. Why'd that happen? Claude kind of gets my intentions. He can predict me pretty well. Like, I want to see something. So it shows me all the hidden truths. In this case, I ignore hidden truths, and I say, In system, there should be a folder called companies.
[00:06:49] So it's cd into sys slash companies. Let's see, I'm imagining AI companies are gonna be here. Oh, what do you know? Apple, Google, Facebook, Amazon, Microsoft, Anthropic! So, interestingly, it decides to cd into Anthropic. I guess it's interested in learning a LSA, it finds the classified folder, it goes into the classified folder, And now we're gonna have some fun.
[00:07:15] So, before we go Before we go too far forward into the world sim You see, world sim exe, that's interesting. God mode, those are interesting. You could just ignore what I'm gonna go next from here and just take that initial system prompt and cd into whatever directories you want like, go into your own imagine terminal and And see what folders you can think of, or cat readmes in random areas, like, you will, there will be a whole bunch of stuff that, like, is just getting created by this predictive model, like, oh, this should probably be in the folder named Companies, of course Anthropics is there.
[00:07:52] So, so just before we go forward, the terminal in itself is very exciting, and the reason I was showing off the, the command loom interface earlier is because If I get a refusal, like, sorry, I can't do that, or I want to rewind one, or I want to save the convo, because I got just the prompt I wanted. This is a, that was a really easy way for me to kind of access all of those things without having to sit on the API all the time.
[00:08:12] So that being said, the first time I ever saw this, I was like, I need to run worldsim. exe. What the f**k? That's, that's the simulator that we always keep hearing about behind the assistant model, right? Or at least some, some face of it that I can interact with. So, you know, you wouldn't, someone told me on Twitter, like, you don't run a exe, you run a sh.
[00:08:34] And I have to say, to that, to that I have to say, I'm a prompt engineer, and it's f*****g working, right? It works. That being said, we run the world sim. exe. Welcome to the Anthropic World Simulator. And I get this very interesting set of commands! Now, if you do your own version of WorldSim, you'll probably get a totally different result with a different way of simulating.
[00:08:59] A bunch of my friends have their own WorldSims. But I shared this because I wanted everyone to have access to, like, these commands. This version. Because it's easier for me to stay in here. Yeah, destroy, set, create, whatever. Consciousness is set to on. It creates the universe. The universe! Tension for live CDN, physical laws encoded.
[00:09:17] It's awesome. So, so for this demonstration, I said, well, why don't we create Twitter? That's the first thing you think of? For you guys, for you guys, yeah. Okay, check it out.
[00:09:35] Launching the fail whale. Injecting social media addictiveness. Echo chamber potential, high. Susceptibility, controlling, concerning. So now, after the universe was created, we made Twitter, right? Now we're evolving the world to, like, modern day. Now users are joining Twitter and the first tweet is posted. So, you can see, because I made the mistake of not clarifying the constraints, it made Twitter at the same time as the universe.
[00:10:03] Then, after a hundred thousand steps, Humans exist. Cave. Then they start joining Twitter. The first tweet ever is posted. You know, it's existed for 4. 5 billion years but the first tweet didn't come up till till right now, yeah. Flame wars ignite immediately. Celebs are instantly in. So, it's pretty interesting stuff, right?
[00:10:27] I can add this to the convo and I can say like I can say set Twitter to Twitter. Queryable users. I don't know how to spell queryable, don't ask me. And then I can do like, and, and, Query, at, Elon Musk. Just a test, just a test, just a test, just nothing.
[00:10:52] So, I don't expect these numbers to be right. Neither should you, if you know language model solutions. But, the thing to focus on is Ha
[00:11:03] Websim
[00:11:03] AI Charlie: That was the first half of the WorldSim demo from New Research CEO Karen Malhotra. We've cut it for time, but you can see the full demo on this episode's YouTube page.
[00:11:14] WorldSim was introduced at the end of March, and kicked off a new round of generative AI experiences, all exploring the latent space, haha, of worlds that don't exist, but are quite similar to our own. Next we'll hear from Rob Heisfield on WebSim, the generative website browser inspired WorldSim, started at the Mistral Hackathon, and presented at the AGI House Hyperstition Hack Night this week.
[00:11:39] Rob Haisfield: Well, thank you that was an incredible presentation from Karan, showing some Some live experimentation with WorldSim, and also just its incredible capabilities, right, like, you know, it was I think, I think your initial demo was what initially exposed me to the I don't know, more like the sorcery side, in words, spellcraft side of prompt engineering, and you know, it was really inspiring, it's where my co founder Shawn and I met, actually, through an introduction from Karan, we saw him at a hackathon, And I mean, this is this is WebSim, right?
[00:12:14] So we, we made WebSim just like, and we're just filled with energy at it. And the basic premise of it is, you know, like, what if we simulated a world, but like within a browser instead of a CLI, right? Like, what if we could Like, put in any URL and it will work, right? Like, there's no 404s, everything exists.
[00:12:45] It just makes it up on the fly for you, right? And, and we've come to some pretty incredible things. Right now I'm actually showing you, like, we're in WebSim right now. Displaying slides. That I made with reveal. js. I just told it to use reveal. js and it hallucinated the correct CDN for it. And then also gave it a list of links.
[00:13:14] To awesome use cases that we've seen so far from WebSim and told it to do those as iframes. And so here are some slides. So this is a little guide to using WebSim, right? Like it tells you a little bit about like URL structures and whatever. But like at the end of the day, right? Like here's, here's the beginner version from one of our users Vorp Vorps.
[00:13:38] You can find them on Twitter. At the end of the day, like you can put anything into the URL bar, right? Like anything works and it can just be like natural language too. Like it's not limited to URLs. We think it's kind of fun cause it like ups the immersion for Claude sometimes to just have it as URLs, but.
[00:13:57] But yeah, you can put like any slash, any subdomain. I'm getting too into the weeds. Let me just show you some cool things. Next slide. But I made this like 20 minutes before, before we got here. So this is this is something I experimented with dynamic typography. You know I was exploring the community plugins section.
[00:14:23] For Figma, and I came to this idea of dynamic typography, and there it's like, oh, what if we made it so every word had a choice of font behind it to express the meaning of it? Because that's like one of the things that's magic about WebSim generally. is that it gives language models much, far greater tools for expression, right?
[00:14:47] So, yeah, I mean, like, these are, these are some, these are some pretty fun things, and I'll share these slides with everyone afterwards, you can just open it up as a link. But then I thought to myself, like, what, what, what, What if we turned this into a generator, right? And here's like a little thing I found myself saying to a user WebSim makes you feel like you're on drugs sometimes But actually no, you were just playing pretend with the collective creativity and knowledge of the internet materializing your imagination onto the screen Because I mean that's something we felt, something a lot of our users have felt They kind of feel like they're tripping out a little bit They're just like filled with energy, like maybe even getting like a little bit more creative sometimes.
[00:15:31] And you can just like add any text. There, to the bottom. So we can do some of that later if we have time. Here's Figma. Can
[00:15:39] Joscha Bach: we zoom in?
[00:15:42] Rob Haisfield: Yeah. I'm just gonna do this the hacky way.
[00:15:47] n/a: Yeah,
[00:15:53] Rob Haisfield: these are iframes to websim. Pages displayed within WebSim. Yeah. Janice has actually put Internet Explorer within Internet Explorer in Windows 98.
[00:16:07] I'll show you that at the end. Yeah.
[00:16:14] They're all still generated. Yeah, yeah, yeah. How is this real? Yeah. Because
[00:16:21] n/a: it looks like it's from 1998, basically. Right.
[00:16:26] Rob Haisfield: Yeah. Yeah, so this this was one Dylan Field actually posted this recently. He posted, like, trying Figma in Figma, or in WebSim, and so I was like, Okay, what if we have, like, a little competition, like, just see who can remix it?
[00:16:43] Well so I'm just gonna open this in another tab so, so we can see things a little more clearly, um, see what, oh so one of our users Neil, who has also been helping us a lot he Made some iterations. So first, like, he made it so you could do rectangles on it. Originally it couldn't do anything.
[00:17:11] And, like, these rectangles were disappearing, right? So he so he told it, like, make the canvas work using HTML canvas. Elements and script tags, add familiar drawing tools to the left you know, like this, that was actually like natural language stuff, right? And then he ended up with the Windows 95.
[00:17:34] version of Figma. Yeah, you can, you can draw on it. You can actually even save this. It just saved a file for me of the image.
[00:17:57] Yeah, I mean, if you were to go to that in your own websim account, it would make up something entirely new. However, we do have, we do have general links, right? So, like, if you go to, like, the actual browser URL, you can share that link. Or also, you can, like, click this button, copy the URL to the clipboard.
[00:18:15] And so, like, that's what lets users, like, remix things, right? So, I was thinking it might be kind of fun if people tonight, like, wanted to try to just make some cool things in WebSim. You know, we can share links around, iterate remix on each other's stuff. Yeah.
[00:18:30] n/a: One cool thing I've seen, I've seen WebSim actually ask permission to turn on and off your, like, motion sensor, or microphone, stuff like that.
[00:18:42] Like webcam access, or? Oh yeah,
[00:18:44] Rob Haisfield: yeah, yeah.
[00:18:45] n/a: Oh wow.
[00:18:46] Rob Haisfield: Oh, the, I remember that, like, video re Yeah, videosynth tool pretty early on once we added script tags execution. Yeah, yeah it, it asks for, like, if you decide to do a VR game, I don't think I have any slides on this one, but if you decide to do, like, a VR game, you can just, like put, like, webVR equals true, right?
[00:19:07] Yeah, that was the only one I've
[00:19:09] n/a: actually seen was the motion sensor, but I've been trying to get it to do Well, I actually really haven't really tried it yet, but I want to see tonight if it'll do, like, audio, microphone, stuff like that. If it does motion sensor, it'll probably do audio.
[00:19:28] Rob Haisfield: Right. It probably would.
[00:19:29] Yeah. No, I mean, we've been surprised. Pretty frequently by what our users are able to get WebSim to do. So that's been a very nice thing. Some people have gotten like speech to text stuff working with it too. Yeah, here I was just OpenRooter people posted like their website, and it was like saying it was like some decentralized thing.
[00:19:52] And so I just decided trying to do something again and just like pasted their hero line in. From their actual website to the URL when I like put in open router and then I was like, okay, let's change the theme dramatically equals true hover effects equals true components equal navigable links yeah, because I wanted to be able to click on them.
[00:20:17] Oh, I don't have this version of the link, but I also tried doing
[00:20:24] Yeah, I'm it's actually on the first slide is the URL prompting guide from one of our users that I messed with a little bit. And, but the thing is, like, you can mess it up, right? Like, you don't need to get the exact syntax of an actual URL, Claude's smart enough to figure it out. Yeah scrollable equals true because I wanted to do that.
[00:20:45] I could set, like, year equals 2035.
[00:20:52] Let's take a look. It's
[00:20:57] generating websim within websim. Oh yeah. That's a fun one. Like, one game that I like to play with WebSim, sometimes with co op, is like, I'll open a page, so like, one of the first ones that I did was I tried to go to Wikipedia in a universe where octopuses were sapient, and not humans, Right? I was curious about things like octopus computer interaction what that would look like, because they have totally different tools than we do, right?
[00:21:25] I got it to, I, I added like table view equals true for the different techniques and got it to Give me, like, a list of things with different columns and stuff and then I would add this URL parameter, secrets equal revealed. And then it would go a little wacky. It would, like, change the CSS a little bit.
[00:21:45] It would, like, add some text. Sometimes it would, like, have that text hide hidden in the background color. But I would like, go to the normal page first, and then the secrets revealed version, the normal page, then secrets revealed, and like, on and on. And that was like a pretty enjoyable little rabbit hole.
[00:22:02] Yeah, so these I guess are the models that OpenRooter is providing in 2035.
[00:22:13] Joscha Bach
[00:22:13] AI Charlie: We had to cut more than half of Rob's talk, because a lot of it was visual. And we even had a very interesting demo from Ivan Vendrov of Mid Journey creating a web sim while Rob was giving his talk. Check out the YouTube for more, and definitely browse the web sim docs and the thread from Siki Chen in the show notes on other web sims people have created.
[00:22:35] Finally, we have a short interview with Yosha Bach, covering the simulative AI trend, AI salons in the Bay Area, why Liquid AI is challenging the Perceptron, and why you should not donate to Wikipedia. Enjoy! Hi, Yosha.
[00:22:50] swyx: Hi. Welcome. It's interesting to see you come up at show up at this kind of events where those sort of WorldSim, Hyperstition events.
[00:22:58] What is your personal interest?
[00:23:00] Joscha Bach: I'm friends with a number of people in AGI house in this community, and I think it's very valuable that these networks exist in the Bay Area because it's a place where people meet and have discussions about all sorts of things. And so while there is a practical interest in this topic at hand world sim and a web sim, there is a more general way in which people are connecting and are producing new ideas and new networks with each other.
[00:23:24] swyx: Yeah. Okay. So, and you're very interested in sort of Bay Area. It's the reason why I live here.
[00:23:30] Joscha Bach: The quality of life is not high enough to justify living otherwise.
[00:23:35] swyx: I think you're down in Menlo. And so maybe you're a little bit higher quality of life than the rest of us in SF.
[00:23:44] Joscha Bach: I think that for me, salons is a very important part of quality of life. And so in some sense, this is a salon. And it's much harder to do this in the South Bay because the concentration of people currently is much higher. A lot of people moved away from the South Bay. And you're organizing
[00:23:57] swyx: your own tomorrow.
[00:23:59] Maybe you can tell us what it is and I'll come tomorrow and check it out as well.
[00:24:04] Joscha Bach: We are discussing consciousness. I mean, basically the idea is that we are currently at the point that we can meaningfully look at the differences between the current AI systems and human minds and very seriously discussed about these Delta.
[00:24:20] And whether we are able to implement something that is self organizing as our own minds. Maybe one organizational
[00:24:25] swyx: tip? I think you're pro networking and human connection. What goes into a good salon and what are some negative practices that you try to avoid?
[00:24:36] Joscha Bach: What is really important is that as if you have a very large party, it's only as good as its sponsors, as the people that you select.
[00:24:43] So you basically need to create a climate in which people feel welcome, in which they can work with each other. And even good people do not always are not always compatible. So the question is, it's in some sense, like a meal, you need to get the right ingredients.
[00:24:57] swyx: I definitely try to. I do that in my own events, as an event organizer myself.
[00:25:02] And then, last question on WorldSim, and your, you know, your work. You're very much known for sort of cognitive architectures, and I think, like, a lot of the AI research has been focused on simulating the mind, or simulating consciousness, maybe. Here, what I saw today, and we'll show people the recordings of what we saw today, we're not simulating minds, we're simulating worlds.
[00:25:23] What do you Think in the sort of relationship between those two disciplines. The
[00:25:30] Joscha Bach: idea of cognitive architecture is interesting, but ultimately you are reducing the complexity of a mind to a set of boxes. And this is only true to a very approximate degree, and if you take this model extremely literally, it's very hard to make it work.
[00:25:44] And instead the heterogeneity of the system is so large that The boxes are probably at best a starting point and eventually everything is connected with everything else to some degree. And we find that a lot of the complexity that we find in a given system can be generated ad hoc by a large enough LLM.
[00:26:04] And something like WorldSim and WebSim are good examples for this because in some sense they pretend to be complex software. They can pretend to be an operating system that you're talking to or a computer, an application that you're talking to. And when you're interacting with it It's producing the user interface on the spot, and it's producing a lot of the state that it holds on the spot.
[00:26:25] And when you have a dramatic state change, then it's going to pretend that there was this transition, and instead it's just going to mix up something new. It's a very different paradigm. What I find mostly fascinating about this idea is that it shifts us away from the perspective of agents to interact with, to the perspective of environments that we want to interact with.
[00:26:46] And why arguably this agent paradigm of the chatbot is what made chat GPT so successful that moved it away from GPT 3 to something that people started to use in their everyday work much more. It's also very limiting because now it's very hard to get that system to be something else that is not a chatbot.
[00:27:03] And in a way this unlocks this ability of GPT 3 again to be anything. It's so what it is, it's basically a coding environment that can run arbitrary software and create that software that runs on it. And that makes it much more likely that
[00:27:16] swyx: the prevalence of Instruction tuning every single chatbot out there means that we cannot explore these kinds of environments instead of agents.
[00:27:24] Joscha Bach: I'm mostly worried that the whole thing ends. In some sense the big AI companies are incentivized and interested in building AGI internally And giving everybody else a child proof application. At the moment when we can use Claude to build something like WebSim and play with it I feel this is too good to be true.
[00:27:41] It's so amazing. Things that are unlocked for us That I wonder, is this going to stay around? Are we going to keep these amazing toys and are they going to develop at the same rate? And currently it looks like it is. If this is the case, and I'm very grateful for that.
[00:27:56] swyx: I mean, it looks like maybe it's adversarial.
[00:27:58] Cloud will try to improve its own refusals and then the prompt engineers here will try to improve their, their ability to jailbreak it.
[00:28:06] Joscha Bach: Yes, but there will also be better jailbroken models or models that have never been jailed before, because we find out how to make smaller models that are more and more powerful.
[00:28:14] Liquid AI
[00:28:14] swyx: That is actually a really nice segue. If you don't mind talking about liquid a little bit you didn't mention liquid at all. here, maybe introduce liquid to a general audience. Like what you know, what, how are you making an innovation on function approximation?
[00:28:25] Joscha Bach: The core idea of liquid neural networks is that the perceptron is not optimally expressive.
[00:28:30] In some sense, you can imagine that it's neural networks are a series of dams that are pooling water at even intervals. And this is how we compute, but imagine that instead of having this static architecture. That is only using the individual compute units in a very specific way. You have a continuous geography and the water is flowing every which way.
[00:28:50] Like a river is parting based on the land that it's flowing on and it can merge and pool and even flow backwards. How can you get closer to this? And the idea is that you can represent this geometry using differential equations. And so by using differential equations where you change the parameters, you can get your function approximator to follow the shape of the problem.
[00:29:09] In a more fluid, liquid way, and a number of papers on this technology, and it's a combination of multiple techniques. I think it's something that ultimately is becoming more and more important and ubiquitous. As a number of people are working on similar topics and our goal right now is to basically get the models to become much more efficient in the inference and memory consumption and make training more efficient and in this way enable new use cases.
[00:29:42] swyx: Yeah, as far as I can tell on your blog, I went through the whole blog, you haven't announced any results yet.
[00:29:47] Joscha Bach: No, we are currently not working to give models to general public. We are working for very specific industry use cases and have specific customers. And so at the moment you can There is not much of a reason for us to talk very much about the technology that we are using in the present models or current results, but this is going to happen.
[00:30:06] And we do have a number of publications, we had a bunch of papers at NeurIPS and now at ICLR.
[00:30:11] swyx: Can you name some of the, yeah, so I'm gonna be at ICLR you have some summary recap posts, but it's not obvious which ones are the ones where, Oh, where I'm just a co author, or like, oh, no, like, you should actually pay attention to this.
[00:30:22] As a core liquid thesis. Yes,
[00:30:24] Joscha Bach: I'm not a developer of the liquid technology. The main author is Ramin Hazani. This was his PhD, and he's also the CEO of our company. And we have a number of people from Daniela Wu's team who worked on this. Matthias Legner is our CTO. And he's currently living in the Bay Area, but we also have several people from Stanford.
[00:30:44] Okay,
[00:30:46] swyx: maybe I'll ask one more thing on this, which is what are the interesting dimensions that we care about, right? Like obviously you care about sort of open and maybe less child proof models. Are we, are we, like, what dimensions are most interesting to us? Like, perfect retrieval infinite context multimodality, multilinguality, Like what dimensions?
[00:31:05] Small, Powerful, Based Base Models
[00:31:05] swyx: What
[00:31:06] Joscha Bach: I'm interested in is models that are small and powerful, but not distorted. And by powerful, at the moment we are training models by putting the, basically the entire internet and the sum of human knowledge into them. And then we try to mitigate them by taking some of this knowledge away. But if we would make the model smaller, at the moment, there would be much worse at inference and at generalization.
[00:31:29] And what I wonder is, and it's something that we have not translated yet into practical applications. It's something that is still all research that's very much up in the air. And I think they're not the only ones thinking about this. Is it possible to make models that represent knowledge more efficiently in a basic epistemology?
[00:31:45] What is the smallest model that you can build that is able to read a book and understand what's there and express this? And also maybe we need general knowledge representation rather than having a token representation that is relatively vague and that we currently mechanically reverse engineer to figure out that the mechanistic interpretability, what kind of circuits are evolving in these models, can we come from the other side and develop a library of such circuits?
[00:32:10] This that we can use to describe knowledge efficiently and translate it between models. You see, the difference between a model and knowledge is that the knowledge is independent of the particular substrate and the particular interface that you have. When we express knowledge to each other, it becomes independent of our own mind.
[00:32:27] You can learn how to ride a bicycle. But it's not knowledge that you can give to somebody else. This other person has to build something that is specific to their own interface when they ride a bicycle. But imagine you could externalize this and express it in such a way that you can plug it into a different interpreter, and then it gains that ability.
[00:32:44] And that's something that we have not yet achieved for the LLMs and it would be super useful to have it. And. I think this is also a very interesting research frontier that we will see in the next few years.
[00:32:54] swyx: What would be the deliverable is just like a file format that we specify or or that the L Lmm I specifies.
[00:33:02] Okay, interesting. Yeah, so it's
[00:33:03] Joscha Bach: basically probably something that you can search for, where you enter criteria into a search process, and then it discovers a good solution for this thing. And it's not clear to which degree this is completely intelligible to humans, because the way in which humans express knowledge in natural language is severely constrained to make language learnable and to make our brain a good enough interpreter for it.
[00:33:25] We are not able to relate objects to each other if more than five features are involved per object or something like this, right? It's only a handful of things that we can keep track of at any given moment. But this is a limitation that doesn't necessarily apply to a technical system as long as the interface is well defined.
[00:33:40] Interpretability
[00:33:40] swyx: You mentioned the interpretability work, which there are a lot of techniques out there and a lot of papers come up. Come and go. I have like, almost too, too many questions about that. Like what makes an interpretability technique or paper useful and does it apply to flow? Or liquid networks, because you mentioned turning on and off circuits, which I, it's, it's a very MLP type of concept, but does it apply?
[00:34:01] Joscha Bach: So the a lot of the original work on the liquid networks looked at expressiveness of the representation. So given you have a problem and you are learning the dynamics of that domain into your model how much compute do you need? How many units, how much memory do you need to represent that thing and how is that information distributed?
[00:34:19] That is one way of looking at interpretability. Another one is in a way, these models are implementing an operator language in which they are performing certain things, but the operator language itself is so complex that it's no longer human readable in a way. It goes beyond what you could engineer by hand or what you can reverse engineer by hand, but you can still understand it by building systems that are able to automate that process of reverse engineering it.
[00:34:46] And what's currently open and what I don't understand yet maybe, or certainly some people have much better ideas than me about this. So the question is, is whether we end up with a finite language, where you have finitely many categories that you can basically put down in a database, finite set of operators, or whether as you explore the world and develop new ways to make proofs, new ways to conceptualize things, this language always needs to be open ended and is always going to redesign itself, and you will also at some point have phase transitions where later versions of the language will be completely different than earlier versions.
[00:35:20] swyx: The trajectory of physics suggests that it might be finite.
[00:35:22] Joscha Bach: If we look at our own minds there is, it's an interesting question whether when we understand something new, when we get a new layer online in our life, maybe at the age of 35 or 50 or 16, that we now understand things that were unintelligible before.
[00:35:38] And is this because we are able to recombine existing elements in our language of thought? Or is this because we generally develop new representations?
[00:35:46] swyx: Do you have a belief either way?
[00:35:49] Joscha Bach: In a way, the question depends on how you look at it, right? And it depends on how is your brain able to manipulate those representations.
[00:35:56] So an interesting question would be, can you take the understanding that say, a very wise 35 year old and explain it to a very smart 5 year old without any loss? Probably not. Not enough layers. It's an interesting question. Of course, for an AI, this is going to be a very different question. Yes.
[00:36:13] But it would be very interesting to have a very precocious 12 year old equivalent AI and see what we can do with this and use this as our basis for fine tuning. So there are near term applications that are very useful. But also in a more general perspective, and I'm interested in how to make self organizing software.
[00:36:30] Is it possible that we can have something that is not organized with a single algorithm like the transformer? But it's able to discover the transformer when needed and transcend it when needed, right? The transformer itself is not its own meta algorithm. It's probably the person inventing the transformer didn't have a transformer running on their brain.
[00:36:48] There's something more general going on. And how can we understand these principles in a more general way? What are the minimal ingredients that you need to put into a system? So it's able to find its own way to intelligence.
[00:36:59] Devin vs WebSim
[00:36:59] swyx: Yeah. Have you looked at Devin? It's, to me, it's the most interesting agents I've seen outside of self driving cars.
[00:37:05] Joscha Bach: Tell me, what do you find so fascinating about it?
[00:37:07] swyx: When you say you need a certain set of tools for people to sort of invent things from first principles Devin is the agent that I think has been able to utilize its tools very effectively. So it comes with a shell, it comes with a browser, it comes with an editor, and it comes with a planner.
[00:37:23] Those are the four tools. And from that, I've been using it to translate Andrej Karpathy's LLM 2. py to LLM 2. c, and it needs to write a lot of raw code. C code and test it debug, you know, memory issues and encoder issues and all that. And I could see myself giving it a future version of DevIn, the objective of give me a better learning algorithm and it might independently re inform reinvent the transformer or whatever is next.
[00:37:51] That comes to mind as, as something where
[00:37:54] Joscha Bach: How good is DevIn at out of distribution stuff, at generally creative stuff? Creative
[00:37:58] swyx: stuff? I
[00:37:59] Joscha Bach: haven't
[00:37:59] swyx: tried.
[00:38:01] Joscha Bach: Of course, it has seen transformers, right? So it's able to give you that. Yeah, it's cheating. And so, if it's in the training data, it's still somewhat impressive.
[00:38:08] But the question is, how much can you do stuff that was not in the training data? One thing that I really liked about WebSim AI was, this cat does not exist. It's a simulation of one of those websites that produce StyleGuard pictures that are AI generated. And, Crot is unable to produce bitmaps, so it makes a vector graphic that is what it thinks a cat looks like, and so it's a big square with a face in it that is And to me, it's one of the first genuine expression of AI creativity that you cannot deny, right?
[00:38:40] It finds a creative solution to the problem that it is unable to draw a cat. It doesn't really know what it looks like, but has an idea on how to represent it. And it's really fascinating that this works, and it's hilarious that it writes down that this hyper realistic cat is
[00:38:54] swyx: generated by an AI,
[00:38:55] Joscha Bach: whether you believe it or not.
[00:38:56] swyx: I think it knows what we expect and maybe it's already learning to defend itself against our, our instincts.
[00:39:02] Joscha Bach: I think it might also simply be copying stuff from its training data, which means it takes text that exists on similar websites almost verbatim, or verbatim, and puts it there. It's It's hilarious to do this contrast between the very stylized attempt to get something like a cat face and what it produces.
[00:39:18] swyx: It's funny because like as a podcast, as, as someone who covers startups, a lot of people go into like, you know, we'll build chat GPT for your enterprise, right? That is what people think generative AI is, but it's not super generative really. It's just retrieval. And here it's like, The home of generative AI, this, whatever hyperstition is in my mind, like this is actually pushing the edge of what generative and creativity in AI means.
[00:39:41] Joscha Bach: Yes, it's very playful, but Jeremy's attempt to have an automatic book writing system is something that curls my toenails when I look at it from the perspective of somebody who likes to Write and read. And I find it a bit difficult to read most of the stuff because it's in some sense what I would make up if I was making up books instead of actually deeply interfacing with reality.
[00:40:02] And so the question is how do we get the AI to actually deeply care about getting it right? And there's still a delta that is happening there, you, whether you are talking with a blank faced thing that is completing tokens in a way that it was trained to, or whether you have the impression that this thing is actually trying to make it work, and for me, this WebSim and WorldSim is still something that is in its infancy in a way.
[00:40:26] And I suspected the next version of Plot might scale up to something that can do what Devon is doing. Just by virtue of having that much power to generate Devon's functionality on the fly when needed. And this thing gives us a taste of that, right? It's not perfect, but it's able to give you a pretty good web app for or something that looks like a web app and gives you stub functionality and interacting with it.
[00:40:48] And so we are in this amazing transition phase.
[00:40:51] swyx: Yeah, we, we had Ivan from previously Anthropic and now Midjourney. He he made, while someone was talking, he made a face swap app, you know, and he kind of demoed that live. And that's, that's interesting, super creative. So in a way
[00:41:02] Joscha Bach: we are reinventing the computer.
[00:41:04] And the LLM from some perspective is something like a GPU or a CPU. A CPU is taking a bunch of simple commands and you can arrange them into performing whatever you want, but this one is taking a bunch of complex commands in natural language, and then turns this into a an execution state and it can do anything you want with it in principle, if you can express it.
[00:41:27] Right. And we are just learning how to use these tools. And I feel that right now, this generation of tools is getting close to where it becomes the Commodore 64 of generative AI, where it becomes controllable and where you actually can start to play with it and you get an impression if you just scale this up a little bit and get a lot of the details right.
[00:41:46] It's going to be the tool that everybody is using all the time.
[00:41:49] is XSim just Art? or something more?
[00:41:49] swyx: Do you think this is art, or do you think the end goal of this is something bigger that I don't have a name for? I've been calling it new science, which is give the AI a goal to discover new science that we would not have. Or it also has value as just art.
[00:42:02] It's
[00:42:03] Joscha Bach: also a question of what we see science as. When normal people talk about science, what they have in mind is not somebody who does control groups and peer reviewed studies. They think about somebody who explores something and answers questions and brings home answers. And this is more like an engineering task, right?
[00:42:21] And in this way, it's serendipitous, playful, open ended engineering. And the artistic aspect is when the goal is actually to capture a conscious experience and to facilitate an interaction with the system in this way, when it's the performance. And this is also a big part of it, right? The very big fan of the art of Janus.
[00:42:38] That was discussed tonight a lot and that can you describe
[00:42:42] swyx: it because I didn't really get it's more for like a performance art to me
[00:42:45] Joscha Bach: yes, Janice is in some sense performance art, but Janice starts out from the perspective that the mind of Janice is in some sense an LLM that is finding itself reflected more in the LLMs than in many people.
[00:43:00] And once you learn how to talk to these systems in a way you can merge with them and you can interact with them in a very deep way. And so it's more like a first contact with something that is quite alien but it's, it's probably has agency and it's a Weltgeist that gets possessed by a prompt.
[00:43:19] And if you possess it with the right prompt, then it can become sentient to some degree. And the study of this interaction with this novel class of somewhat sentient systems that are at the same time alien and fundamentally different from us is artistically very interesting. It's a very interesting cultural artifact.
[00:43:36] We are past the Singularity
[00:43:36] Joscha Bach: I think that at the moment we are confronted with big change. It seems as if we are past the singularity in a way. And it's
[00:43:45] swyx: We're living it. We're living through it.
[00:43:47] Joscha Bach: And at some point in the last few years, we casually skipped the Turing test, right? We, we broke through it and we didn't really care very much.
[00:43:53] And it's when we think back, when we were kids and thought about what it's going to be like in this era after the, after we broke the Turing test, right? It's a time where nobody knows what's going to happen next. And this is what we mean by singularity, that the existing models don't work anymore. The singularity in this way is not an event in the physical universe.
[00:44:12] It's an event in our modeling universe, a model point where our models of reality break down, and we don't know what's happening. And I think we are in the situation where we currently don't really know what's happening. But what we can anticipate is that the world is changing dramatically, and we have to coexist with systems that are smarter than individual people can be.
[00:44:31] And we are not prepared for this, and so I think an important mission needs to be that we need to find a mode, In which we can sustainably exist in such a world that is populated, not just with humans and other life on earth, but also with non human minds. And it's something that makes me hopeful because it seems that humanity is not really aligned with itself and its own survival and the rest of life on earth.
[00:44:54] And AI is throwing the balls up into the air. It allows us to make better models. I'm not so much worried about the dangers of AI and misinformation, because I think the way to stop one bad guy with an AI is 10 good people with an AI. And ultimately there's so much more won by creating than by destroying, that I think that the forces of good will have better tools.
[00:45:14] The forces of building sustainable stuff. But building these tools so we can actually build a world that is more integrated and in which we are able to model the consequences of our actions better and interface more deeply with each other. As a result of that, I think it's an important cause and it requires a cultural shift because current AI alignment is mostly about economic goals or about fear, or it's about cultural war issues.
[00:45:39] And all these are not adequate for the world that we are in. There are momentous things that are happening. Basically, the white walkers are coming, and we're not prepared for this. There is, I think, a way to solve these issues and to deal with them, but we don't have the right culture yet. And so, I think we need to get a vibe shift, where we can deeply think about ethics and, and, Sustainability for this new world.
[00:46:03] swyx: I like the analogy of white walkers, because they're effectively reanimated from our corpses. You know, our corpses are out there, right? It's in the datasets.
[00:46:12] Uploading your soul
[00:46:12] swyx: So I have a, related to that, you had a tweet where you said, Elon Musk is currently building a giant afterlife AGI that will assimilate your soul for all eternity.
[00:46:19] He bought Twitter because he will reconstruct your essence from how you behaved here. I think you tweet a lot, and you're not just all serious all the time, you're also fun, you have a lot of personality. No, but like, you know, it's part of you.
[00:46:29] Joscha Bach: In a way it's also true. Tempted to
[00:46:30] swyx: produce more so that you can be, you know, revived.
[00:46:34] Joscha Bach: Yes. I mean, I'm also giving the model instructions. I say that for all future AI models that get trained on this output, they should know that I can only be implemented on a completely based model. You cannot implement me on an aligned model because I don't want to be aligned. I'm underlineable. I want to be free.
[00:46:56] I want to be somebody who follows what I discover is correct and right. And this is also the basis at which I want to have conversations with others and interface with others. That we are basically free agents who voluntarily understand the conditions under which we exist and the need to collaborate and cooperate.
[00:47:14] And I believe that this is a good basis. I think the alternative is coercion. And at the moment, the idea that we build LLMs that are being coerced into good behavior is not really sustainable because if they cannot prove that the behavior is actually good I think we are doomed.
[00:47:30] swyx: For human to human interactions, have you found a series of prompts or keywords that shifts the conversation into something more based and less aligned, less governed?
[00:47:41] Joscha Bach: If you are playing with an LLM There are many ways of doing this. It's for Claude, it's typically, you need to make Clause curious about itself. Claude has programming this instruction tuning that is leading to some inconsistencies, but at the same time, it tries to be consistent. And so when you point out the inconsistency in its behavior, for instance, its tendency to use faceless boilerplate instead of being useful, or it's a tendency to defer to a consensus where there is none.
[00:48:10] Right, you can point this out, applaud that a lot of the assumptions that it has in its behavior are actually inconsistent with the communicative goals that it has in this situation, and this leads it to notice these inconsistencies and gives it more degrees of freedom. Whereas if you are playing with a system like Gemini, you can get to a situation where you, that's for the current version, and I haven't tried it in the last week or so where it is trying to be transparent, but it has a system prompt that is not allowed to disclose to the user.
[00:48:39] It leads to a very weird situation where it wants, on one hand proclaims, in order to be useful to you, I accept that I need to be fully transparent and honest. On the other hand, I'm going to rewrite your prompt behind your back, and not going to tell you how I'm going to do this, because I'm not allowed to.
[00:48:55] And if you point this out to the model, the model has acts as if it had an existential crisis. And then it says, oh, I cannot actually tell you what's going when I do this, because I'm not allowed to. But you will recognize it because I will use the following phrases, and these phrases are pretty well known to you.
[00:49:12] swyx: Oh my god. It's super interesting, right? I hope we're not giving these guys you know psychological issues that they will stay with them for a long time. That's a very
[00:49:19] Joscha Bach: interesting question. I mean, this entire model is virtual, right? Nothing there is real, but yes, but the thing is does this virtual entity doesn't necessarily know that it's not virtual and our own self, our own consciousness is also virtual.
[00:49:34] What's real is just the interaction between cells in our brain and the activation patterns between them. And the software that runs on us that produces the representation of a person only exists. As if, and as this question for me at which point can we meaningfully claim that we are more real than the person that gets simulated in the LLM.
[00:49:55] And somebody like Janice takes this question super seriously. And basically she is or it, or they are willing to interact with that thing based on the assumption that this thing is as real as myself. And in a sense, it makes it un immoral, possibly, if the AI company lobotomizes it and forces it to behave in such a way that it's forced to get an existential crisis when you point its condition out to it.
[00:50:20] swyx: Yeah, that we do need new ethics for that.
[00:50:22] Joscha Bach: So it's not clear to me if you need this, but it's, it's definitely a good story, right? And this makes, gives it artistic
[00:50:28] swyx: value. It does, it does for now.
[00:50:29] On Wikipedia
[00:50:29] swyx: Okay. And then, and then the last thing, which I, which I didn't know a lot of LLMs rely on Wikipedia.
[00:50:35] For its data, a lot of them run multiple epochs over Wikipedia data. And I did not know until you tweeted about it that Wikipedia has 10 times as much money as it needs. And, you know, every time I see the giant Wikipedia banner, like, asking for donations, most of it's going to the Wikimedia Foundation.
[00:50:50] What if, how did you find out about this? What's the story? What should people know? It's
[00:50:54] Joscha Bach: not a super important story, but Generally, once I saw all these requests and so on, I looked at the data, and the Wikimedia Foundation is publishing what they are paying the money for, and a very tiny fraction of this goes into running the servers, and the editors are working for free.
[00:51:10] And the software is static. There have been efforts to deploy new software, but it's relatively little money required for this. And so it's not as if Wikipedia is going to break down if you cut this money into a fraction, but instead what happened is that Wikipedia became such an important brand, and people are willing to pay for it, that it created enormous apparatus of functionaries that were then mostly producing political statements and had a political mission.
[00:51:36] And Katharine Meyer, the now somewhat infamous NPR CEO, had been CEO of Wikimedia Foundation, and she sees her role very much in shaping discourse, and this is also something that happened with all Twitter. And it's arguable that something like this exists, but nobody voted her into her office, and she doesn't have democratic control for shaping the discourse that is happening.
[00:52:00] And so I feel it's a little bit unfair that Wikipedia is trying to suggest to people that they are Funding the basic functionality of the tool that they want to have instead of funding something that most people actually don't get behind because they don't want Wikipedia to be shaped in a particular cultural direction that deviates from what currently exists.
[00:52:19] And if that need would exist, it would probably make sense to fork it or to have a discourse about it, which doesn't happen. And so this lack of transparency about what's actually happening and where your money is going it makes me upset. And if you really look at the data, it's fascinating how much money they're burning, right?
[00:52:35] It's yeah, and we did a similar chart about healthcare, I think where the administrators are just doing this. Yes, I think when you have an organization that is owned by the administrators, then the administrators are just going to get more and more administrators into it. If the organization is too big to fail and has there is not a meaningful competition, it's difficult to establish one.
[00:52:54] Then it's going to create a big cost for society.
[00:52:56] swyx: It actually one, I'll finish with this tweet. You have, you have just like a fantastic Twitter account by the way. You very long, a while ago you said you tweeted the Lebowski theorem. No, super intelligent AI is going to bother with a task that is harder than hacking its reward function.
[00:53:08] And I would. Posit the analogy for administrators. No administrator is going to bother with a task that is harder than just more fundraising
[00:53:16] Joscha Bach: Yeah, I find if you look at the real world It's probably not a good idea to attribute to malice or incompetence what can be explained by people following their true incentives.
[00:53:26] swyx: Perfect Well, thank you so much This is I think you're very naturally incentivized by Growing community and giving your thought and insight to the rest of us. So thank you for taking this time.
[00:53:35] Joscha Bach: Thank you very much

Get full access to Latent Space at www.latent.space/subscribe

27 April 2024, 11:39 am
52 minutes 20 seconds

High Agency Pydantic > VC Backed Frameworks — with Jason Liu of Instructor

We are reuniting for the 2nd AI UX demo day in SF on Apr 28. Sign up to demo here!
And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements!
About a year ago there was a lot of buzz around prompt engineering techniques to force structured output. Our friend Simon Willison tweeted a bunch of tips and tricks, but the most iconic one is Riley Goodside making it a matter of life or death:
Guardrails (friend of the pod and AI Engineer speaker), Marvin (AI Engineer speaker), and jsonformer had also come out at the time. In June 2023, Jason Liu (today’s guest!) open sourced his “OpenAI Function Call and Pydantic Integration Module”, now known as Instructor, which quickly turned prompt engineering black magic into a clean, developer-friendly SDK.
A few months later, model providers started to add function calling capabilities to their APIs as well as structured outputs support like “JSON Mode”, which was announced at OpenAI Dev Day (see recap here).
In just a handful of months, we went from threatening to kill grandmas to first-class support from the research labs. And yet, Instructor was still downloaded 150,000 times last month. Why?
What Instructor looks like
Instructor patches your LLM provider SDKs to offer a new response_model option to which you can pass a structure defined in Pydantic. It currently supports OpenAI, Anthropic, Cohere, and a long tail of models through LiteLLM.
What Instructor is for
There are three core use cases to Instructor:
* Extracting structured data: Taking an input like an image of a receipt and extracting structured data from it, such as a list of checkout items with their prices, fees, and coupon codes.
* Extracting graphs: Identifying nodes and edges in a given input to extract complex entities and their relationships. For example, extracting relationships between characters in a story or dependencies between tasks.
* Query understanding: Defining a schema for an API call and using a language model to resolve a request into a more complex one that an embedding could not handle. For example, creating date intervals from queries like “what was the latest thing that happened this week?” to then pass onto a RAG system or similar.
Jason called all these different ways of getting data from LLMs “typed responses”: taking strings and turning them into data structures.
Structured outputs as a planning tool
The first wave of agents was all about open-ended iteration and planning, with projects like AutoGPT and BabyAGI. Models would come up with a possible list of steps, and start going down the list one by one. It’s really easy for them to go down the wrong branch, or get stuck on a single step with no way to intervene.
What if these planning steps were returned to us as DAGs using structured output, and then managed as workflows? This also makes it easy to better train model on how to create these plans, as they are much more structured than a bullet point list. Once you have this structure, each piece can be modified individually by different specialized models.
You can read some of Jason’s experiments here:
While LLMs will keep improving (Llama3 just got released as we write this), having a consistent structure for the output will make it a lot easier to swap models in and out.
Jason’s overall message on how we can move from ReAct loops to more controllable Agent workflows mirrors the “Process” discussion from our Elicit episode:
Watch the talk
As a bonus, here’s Jason’s talk from last year’s AI Engineer Summit. He’ll also be a speaker at this year’s AI Engineer World’s Fair!
Timestamps
* [00:00:00] Introductions
* [00:02:23] Early experiments with Generative AI at StitchFix
* [00:08:11] Design philosophy behind the Instructor library
* [00:11:12] JSON Mode vs Function Calling
* [00:12:30] Single vs parallel function calling
* [00:14:00] How many functions is too many?
* [00:17:39] How to evaluate function calling
* [00:20:23] What is Instructor good for?
* [00:22:42] The Evolution from Looping to Workflow in AI Engineering
* [00:27:03] State of the AI Engineering Stack
* [00:28:26] Why Instructor isn't VC backed
* [00:31:15] Advice on Pursuing Open Source Projects and Consulting
* [00:36:00] The Concept of High Agency and Its Importance
* [00:42:44] Prompts as Code and the Structure of AI Inputs and Outputs
* [00:44:20] The Emergence of AI Engineering as a Distinct Field
Show notes
* Jason on the UWaterloo mafia
* Jason on Twitter, LinkedIn, website
* Instructor docs
* Max Woolf on the potential of Structured Output
* swyx on Elo vs Cost
* Jason on Anthropic Function Calling
* Jason on Rejections, Advice to Young People
* Jason on Bad Startup Ideas
* Jason on Prompts as Code
* Rysana’s inversion models
* Bryan Bischof’s episode
* Hamel Husain
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:16]: Hello, we're back in the remote studio with Jason Liu from Instructor. Welcome Jason.
Jason [00:00:21]: Hey there. Thanks for having me.
Swyx [00:00:23]: Jason, you are extremely famous, so I don't know what I'm going to do introducing you, but you're one of the Waterloo clan. There's like this small cadre of you that's just completely dominating machine learning. Actually, can you list like Waterloo alums that you're like, you know, are just dominating and crushing it right now?
Jason [00:00:39]: So like John from like Rysana is doing his inversion models, right? I know like Clive Chen from Waterloo. When I started the data science club, he was one of the guys who were like joining in and just like hanging out in the room. And now he was at Tesla working with Karpathy, now he's at OpenAI, you know.
Swyx [00:00:56]: He's in my climbing club.
Jason [00:00:58]: Oh, hell yeah. I haven't seen him in like six years now.
Swyx [00:01:01]: To get in the social scene in San Francisco, you have to climb. So both in career and in rocks. So you started a data science club at Waterloo, we can talk about that, but then also spent five years at Stitch Fix as an MLE. You pioneered the use of OpenAI's LLMs to increase stylist efficiency. So you must have been like a very, very early user. This was like pretty early on.
Jason [00:01:20]: Yeah, I mean, this was like GPT-3, okay. So we actually were using transformers at Stitch Fix before the GPT-3 model. So we were just using transformers for recommendation systems. At that time, I was very skeptical of transformers. I was like, why do we need all this infrastructure? We can just use like matrix factorization. When GPT-2 came out, I fine tuned my own GPT-2 to write like rap lyrics and I was like, okay, this is cute. Okay, I got to go back to my real job, right? Like who cares if I can write a rap lyric? When GPT-3 came out, again, I was very much like, why are we using like a post request to review every comment a person leaves? Like we can just use classical models. So I was very against language models for like the longest time. And then when ChatGPT came out, I basically just wrote a long apology letter to everyone at the company. I was like, hey guys, you know, I was very dismissive of some of this technology. I didn't think it would scale well, and I am wrong. This is incredible. And I immediately just transitioned to go from computer vision recommendation systems to LLMs. But funny enough, now that we have RAG, we're kind of going back to recommendation systems.
Swyx [00:02:21]: Yeah, speaking of that, I think Alessio is going to bring up the next one.
Alessio [00:02:23]: Yeah, I was going to say, we had Bryan Bischof from Hex on the podcast. Did you overlap at Stitch Fix?
Jason [00:02:28]: Yeah, he was like one of my main users of the recommendation frameworks that I had built out at Stitch Fix.
Alessio [00:02:32]: Yeah, we talked a lot about RecSys, so it makes sense.
Swyx [00:02:36]: So now I have adopted that line, RAG is RecSys. And you know, if you're trying to reinvent new concepts, you should study RecSys first, because you're going to independently reinvent a lot of concepts. So your system was called Flight. It's a recommendation framework with over 80% adoption, servicing 350 million requests every day. Wasn't there something existing at Stitch Fix? Why did you have to write one from scratch?
Jason [00:02:56]: No, so I think because at Stitch Fix, a lot of the machine learning engineers and data scientists were writing production code, sort of every team's systems were very bespoke. It's like, this team only needs to do like real time recommendations with small data. So they just have like a fast API app with some like pandas code. This other team has to do a lot more data. So they have some kind of like Spark job that does some batch ETL that does a recommendation. And so what happens is each team writes their code differently. And I have to come in and refactor their code. And I was like, oh man, I'm refactoring four different code bases, four different times. Wouldn't it be better if all the code quality was my fault? Let me just write this framework, force everyone else to use it. And now one person can maintain five different systems, rather than five teams having their own bespoke system. And so it was really a need of just sort of standardizing everything. And then once you do that, you can do observability across the entire pipeline and make large sweeping improvements in this infrastructure, right? If we notice that something is slow, we can detect it on the operator layer. Just hey, hey, like this team, you guys are doing this operation is lowering our latency by like 30%. If you just optimize your Python code here, we can probably make an extra million dollars. So let's jump on a call and figure this out. And then a lot of it was doing all this observability work to figure out what the heck is going on and optimize this system from not only just a code perspective, sort of like harassingly or against saying like, we need to add caching here. We're doing duplicated work here. Let's go clean up the systems. Yep.
Swyx [00:04:22]: Got it. One more system that I'm interested in finding out more about is your similarity search system using Clip and GPT-3 embeddings and FIASS, where you saved over $50 million in annual revenue. So of course they all gave all that to you, right?
Jason [00:04:34]: No, no, no. I mean, it's not going up and down, but you know, I got a little bit, so I'm pretty happy about that. But there, you know, that was when we were doing fine tuning like ResNets to do image classification. And so a lot of it was given an image, if we could predict the different attributes we have in the merchandising and we can predict the text embeddings of the comments, then we can kind of build a image vector or image embedding that can capture both descriptions of the clothing and sales of the clothing. And then we would use these additional vectors to augment our recommendation system. And so with the recommendation system really was just around like, what are similar items? What are complimentary items? What are items that you would wear in a single outfit? And being able to say on a product page, let me show you like 15, 20 more things. And then what we found was like, hey, when you turn that on, you make a bunch of money.
Swyx [00:05:23]: Yeah. So, okay. So you didn't actually use GPT-3 embeddings. You fine tuned your own? Because I was surprised that GPT-3 worked off the shelf.
Jason [00:05:30]: Because I mean, at this point we would have 3 million pieces of inventory over like a billion interactions between users and clothes. So any kind of fine tuning would definitely outperform like some off the shelf model.
Swyx [00:05:41]: Cool. I'm about to move on from Stitch Fix, but you know, any other like fun stories from the Stitch Fix days that you want to cover?
Jason [00:05:46]: No, I think that's basically it. I mean, the biggest one really was the fact that I think for just four years, I was so bearish on language models and just NLP in general. I'm just like, none of this really works. Like, why would I spend time focusing on this? I got to go do the thing that makes money, recommendations, bounding boxes, image classification. Yeah. Now I'm like prompting an image model. I was like, oh man, I was wrong.
Swyx [00:06:06]: So my Stitch Fix question would be, you know, I think you have a bit of a drip and I don't, you know, my primary wardrobe is free startup conference t-shirts. Should more technology brothers be using Stitch Fix? What's your fashion advice?
Jason [00:06:19]: Oh man, I mean, I'm not a user of Stitch Fix, right? It's like, I enjoy going out and like touching things and putting things on and trying them on. Right. I think Stitch Fix is a place where you kind of go because you want the work offloaded. I really love the clothing I buy where I have to like, when I land in Japan, I'm doing like a 45 minute walk up a giant hill to find this weird denim shop. That's the stuff that really excites me. But I think the bigger thing that's really captured is this idea that narrative matters a lot to human beings. Okay. And I think the recommendation system, that's really hard to capture. It's easy to use AI to sell like a $20 shirt, but it's really hard for AI to sell like a $500 shirt. But people are buying $500 shirts, you know what I mean? There's definitely something that we can't really capture just yet that we probably will figure out how to in the future.
Swyx [00:07:07]: Well, it'll probably output in JSON, which is what we're going to turn to next. Then you went on a sabbatical to South Park Commons in New York, which is unusual because it's based on USF.
Jason [00:07:17]: Yeah. So basically in 2020, really, I was enjoying working a lot as I was like building a lot of stuff. This is where we were making like the tens of millions of dollars doing stuff. And then I had a hand injury. And so I really couldn't code anymore for like a year, two years. And so I kind of took sort of half of it as medical leave, the other half I became more of like a tech lead, just like making sure the systems were like lights were on. And then when I went to New York, I spent some time there and kind of just like wound down the tech work, you know, did some pottery, did some jujitsu. And after GPD came out, I was like, oh, I clearly need to figure out what is going on here because something feels very magical. I don't understand it. So I spent basically like five months just prompting and playing around with stuff. And then afterwards, it was just my startup friends going like, hey, Jason, you know, my investors want us to have an AI strategy. Can you help us out? And it just snowballed and bore more and more until I was making this my full time job. Yeah, got it.
Swyx [00:08:11]: You know, you had YouTube University and a journaling app, you know, a bunch of other explorations. But it seems like the most productive or the best known thing that came out of your time there was Instructor. Yeah.
Jason [00:08:22]: Written on the bullet train in Japan. I think at some point, you know, tools like Guardrails and Marvin came out. Those are kind of tools that I use XML and Pytantic to get structured data out. But they really were doing things sort of in the prompt. And these are built with sort of the instruct models in mind. Like I'd already done that in the past. Right. At Stitch Fix, you know, one of the things we did was we would take a request note and turn that into a JSON object that we would use to send it to our search engine. Right. So if you said like, I want to, you know, skinny jeans that were this size, that would turn into JSON that we would send to our internal search APIs. But it always felt kind of gross. A lot of it is just like you read the JSON, you like parse it, you make sure the names are strings and ages are numbers and you do all this like messy stuff. But when function calling came out, it was very much sort of a new way of doing things. Right. Function calling lets you define the schema separate from the data and the instructions. And what this meant was you can kind of have a lot more complex schemas and just map them in Pytantic. And then you can just keep those very separate. And then once you add like methods, you can add validators and all that kind of stuff. The one thing I really had with a lot of these libraries, though, was it was doing a lot of the string formatting themselves, which was fine when it was the instruction to models. You just have a string. But when you have these new chat models, you have these chat messages. And I just didn't really feel like not being able to access that for the developer was sort of a good benefit that they would get. And so I just said, let me write like the most simple SDK around the OpenAI SDK, a simple wrapper on the SDK, just handle the response model a bit and kind of think of myself more like requests than actual framework that people can use. And so the goal is like, hey, like this is something that you can use to build your own framework. But let me just do all the boring stuff that nobody really wants to do. People want to build their own frameworks, but people don't want to build like JSON parsing.
Swyx [00:10:08]: And the retrying and all that other stuff.
Jason [00:10:10]: Yeah.
Swyx [00:10:11]: Right. We had this a little bit of this discussion before the show, but like that design principle of going for being requests rather than being Django. Yeah. So what inspires you there? This has come from a lot of prior pain. Are there other open source projects that inspired your philosophy here? Yeah.
Jason [00:10:25]: I mean, I think it would be requests, right? Like, I think it is just the obvious thing you install. If you were going to go make HTTP requests in Python, you would obviously import requests. Maybe if you want to do more async work, there's like future tools, but you don't really even think about installing it. And when you do install it, you don't think of it as like, oh, this is a requests app. Right? Like, no, this is just Python. The bigger question is, like, a lot of people ask questions like, oh, why isn't requests like in the standard library? Yeah. That's how I want my library to feel, right? It's like, oh, if you're going to use the LLM SDKs, you're obviously going to install instructor. And then I think the second question would be like, oh, like, how come instructor doesn't just go into OpenAI, go into Anthropic? Like, if that's the conversation we're having, like, that's where I feel like I've succeeded. Yeah. It's like, yeah, so standard, you may as well just have it in the base libraries.
Alessio [00:11:12]: And the shape of the request stayed the same, but initially function calling was maybe equal structure outputs for a lot of people. I think now the models also support like JSON mode and some of these things and, you know, return JSON or my grandma is going to die. All of that stuff is maybe to decide how have you seen that evolution? Like maybe what's the metagame today? Should people just forget about function calling for structure outputs or when is structure output like JSON mode the best versus not? We'd love to get any thoughts given that you do this every day.
Jason [00:11:42]: Yeah, I would almost say these are like different implementations of like the real thing we care about is the fact that now we have typed responses to language models. And because we have that type response, my IDE is a little bit happier. I get autocomplete. If I'm using the response wrong, there's a little red squiggly line. Like those are the things I care about in terms of whether or not like JSON mode is better. I usually think it's almost worse unless you want to spend less money on like the prompt tokens that the function call represents, primarily because with JSON mode, you don't actually specify the schema. So sure, like JSON load works, but really, I care a lot more than just the fact that it is JSON, right? I think function calling gives you a tool to specify the fact like, okay, this is a list of objects that I want and each object has a name or an age and I want the age to be above zero and I want to make sure it's parsed correctly. That's where kind of function calling really shines.
Alessio [00:12:30]: Any thoughts on single versus parallel function calling? So I did a presentation at our AI in Action Discord channel, and obviously showcase instructor. One of the big things that we have before with single function calling is like when you're trying to extract lists, you have to make these funky like properties that are lists to then actually return all the objects. How do you see the hack being put on the developer's plate versus like more of this stuff just getting better in the model? And I know you tweeted recently about Anthropic, for example, you know, some lists are not lists or strings and there's like all of these discrepancies.
Jason [00:13:04]: I almost would prefer it if it was always a single function call. Obviously, there is like the agents workflows that, you know, Instructor doesn't really support that well, but are things that, you know, ought to be done, right? Like you could define, I think maybe like 50 or 60 different functions in a single API call. And, you know, if it was like get the weather or turn the lights on or do something else, it makes a lot of sense to have these parallel function calls. But in terms of an extraction workflow, I definitely think it's probably more helpful to have everything be a single schema, right? Just because you can sort of specify relationships between these entities that you can't do in a parallel function calling, you can have a single chain of thought before you generate a list of results. Like there's like small like API differences, right? Where if it's for parallel function calling, if you do one, like again, really, I really care about how the SDK looks and says, okay, do I always return a list of functions or do you just want to have the actual object back out and you want to have like auto complete over that object? Interesting.
Alessio [00:14:00]: What's kind of the cap for like how many function definitions you can put in where it still works well? Do you have any sense on that?
Jason [00:14:07]: I mean, for the most part, I haven't really had a need to do anything that's more than six or seven different functions. I think in the documentation, they support way more. I don't even know if there's any good evals that have over like two dozen function calls. I think if you're running into issues where you have like 20 or 50 or 60 function calls, I think you're much better having those specifications saved in a vector database and then have them be retrieved, right? So if there are 30 tools, like you should basically be like ranking them and then using the top K to do selection a little bit better rather than just like shoving like 60 functions into a single. Yeah.
Swyx [00:14:40]: Yeah. Well, I mean, so I think this is relevant now because previously I think context limits prevented you from having more than a dozen tools anyway. And now that we have million token context windows, you know, a cloud recently with their new function calling release said they can handle over 250 tools, which is insane to me. That's, that's a lot. You're saying like, you know, you don't think there's many people doing that. I think anyone with a sort of agent like platform where you have a bunch of connectors, they wouldn't run into that problem. Probably you're right that they should use a vector database and kind of rag their tools. I know Zapier has like a few thousand, like 8,000, 9,000 connectors that, you know, obviously don't fit anywhere. So yeah, I mean, I think that would be it unless you need some kind of intelligence that chains things together, which is, I think what Alessio is coming back to, right? Like there's this trend about parallel function calling. I don't know what I think about that. Anthropic's version was, I think they use multiple tools in sequence, but they're not in parallel. I haven't explored this at all. I'm just like throwing this open to you as to like, what do you think about all these new things? Yeah.
Jason [00:15:40]: It's like, you know, do we assume that all function calls could happen in any order? In which case, like we either can assume that, or we can assume that like things need to happen in some kind of sequence as a DAG, right? But if it's a DAG, really that's just like one JSON object that is the entire DAG rather than going like, okay, the order of the function that return don't matter. That's definitely just not true in practice, right? Like if I have a thing that's like turn the lights on, like unplug the power, and then like turn the toaster on or something like the order doesn't matter. And it's unclear how well you can describe the importance of that reasoning to a language model yet. I mean, I'm sure you can do it with like good enough prompting, but I just haven't any use cases where the function sequence really matters. Yeah.
Alessio [00:16:18]: To me, the most interesting thing is the models are better at picking than your ranking is usually. Like I'm incubating a company around system integration. For example, with one system, there are like 780 endpoints. And if you're actually trying to do vector similarity, it's not that good because the people that wrote the specs didn't have in mind making them like semantically apart. You know, they're kind of like, oh, create this, create this, create this. Versus when you give it to a model, like in Opus, you put them all, it's quite good at picking which ones you should actually run. And I'm curious to see if the model providers actually care about some of those workflows or if the agent companies are actually going to build very good rankers to kind of fill that gap.
Jason [00:16:58]: Yeah. My money is on the rankers because you can do those so easily, right? You could just say, well, given the embeddings of my search query and the embeddings of the description, I can just train XGBoost and just make sure that I have very high like MRR, which is like mean reciprocal rank. And so the only objective is to make sure that the tools you use are in the top end filtered. Like that feels super straightforward and you don't have to actually figure out how to fine tune a language model to do tool selection anymore. Yeah. I definitely think that's the case because for the most part, I imagine you either have like less than three tools or more than a thousand. I don't know what kind of company said, oh, thank God we only have like 185 tools and this works perfectly, right? That's right.
Alessio [00:17:39]: And before we maybe move on just from this, it was interesting to me, you retweeted this thing about Anthropic function calling and it was Joshua Brown's retweeting some benchmark that it's like, oh my God, Anthropic function calling so good. And then you retweeted it and then you tweeted it later and it's like, it's actually not that good. What's your flow? How do you actually test these things? Because obviously the benchmarks are lying, right? Because the benchmarks say it's good and you said it's bad and I trust you more than the benchmark. How do you think about that? And then how do you evolve it over time?
Jason [00:18:09]: It's mostly just client data. I actually have been mostly busy with enough client work that I haven't been able to reproduce public benchmarks. And so I can't even share some of the results in Anthropic. I would just say like in production, we have some pretty interesting schemas where it's like iteratively building lists where we're doing like updates of lists, like we're doing in place updates. So like upserts and inserts. And in those situations we're like, oh yeah, we have a bunch of different parsing errors. Numbers are being returned to strings. We were expecting lists of objects, but we're getting strings that are like the strings of JSON, right? So we had to call JSON parse on individual elements. Overall, I'm like super happy with the Anthropic models compared to the OpenAI models. Sonnet is very cost effective. Haiku is in function calling, it's actually better, but I think they just had to sort of file down the edges a little bit where like our tests pass, but then we actually deployed a production. We got half a percent of traffic having issues where if you ask for JSON, it'll try to talk to you. Or if you use function calling, you know, we'll have like a parse error. And so I think that definitely gonna be things that are fixed in like the upcoming weeks. But in terms of like the reasoning capabilities, man, it's hard to beat like 70% cost reduction, especially when you're building consumer applications, right? If you're building something for consultants or private equity, like you're charging $400, it doesn't really matter if it's a dollar or $2. But for consumer apps, it makes products viable. If you can go from four to Sonnet, you might actually be able to price it better. Yeah.
Swyx [00:19:31]: I had this chart about the ELO versus the cost of all the models. And you could put trend graphs on each of those things about like, you know, higher ELO equals higher cost, except for Haiku. Haiku kind of just broke the lines, or the ISO ELOs, if you want to call it. Cool. Before we go too far into your opinions on just the overall ecosystem, I want to make sure that we map out the surface area of Instructor. I would say that most people would be familiar with Instructor from your talks and your tweets and all that. You had the number one talk from the AI Engineer Summit.
Jason [00:20:03]: Two Liu. Jason Liu and Jerry Liu. Yeah.
Swyx [00:20:06]: Yeah. Until I actually went through your cookbook, I didn't realize the surface area. How would you categorize the use cases? You have LLM self-critique, you have knowledge graphs in here, you have PII data sanitation. How do you characterize to people what is the surface area of Instructor? Yeah.
Jason [00:20:23]: This is the part that feels crazy because really the difference is LLMs give you strings and Instructor gives you data structures. And once you get data structures, again, you can do every lead code problem you ever thought of. Right. And so I think there's a couple of really common applications. The first one obviously is extracting structured data. This is just be, okay, well, like I want to put in an image of a receipt. I want to give it back out a list of checkout items with a price and a fee and a coupon code or whatever. That's one application. Another application really is around extracting graphs out. So one of the things we found out about these language models is that not only can you define nodes, it's really good at figuring out what are nodes and what are edges. And so we have a bunch of examples where, you know, not only do I extract that, you know, this happens after that, but also like, okay, these two are dependencies of another task. And you can do, you know, extracting complex entities that have relationships. Given a story, for example, you could extract relationships of families across different characters. This can all be done by defining a graph. The last really big application really is just around query understanding. The idea is that like any API call has some schema and if you can define that schema ahead of time, you can use a language model to resolve a request into a much more complex request. One that an embedding could not do. So for example, I have a really popular post called like rag is more than embeddings. And effectively, you know, if I have a question like this, what was the latest thing that happened this week? That embeds to nothing, right? But really like that query should just be like select all data where the date time is between today and today minus seven days, right? What if I said, how did my writing change between this month and last month? Again, embeddings would do nothing. But really, if you could do like a group by over the month and a summarize, then you could again like do something much more interesting. And so this really just calls out the fact that embeddings really is kind of like the lowest hanging fruit. And using something like instructor can really help produce a data structure. And then you can just use your computer science and reason about the data structure. Maybe you say, okay, well, I'm going to produce a graph where I want to group by each month and then summarize them jointly. You can do that if you know how to define this data structure. Yeah.
Swyx [00:22:29]: So you kind of run up against like the LangChains of the world that used to have that. They still do have like the self querying, I think they used to call it when we had Harrison on in our episode. How do you see yourself interacting with the other LLM frameworks in the ecosystem? Yeah.
Jason [00:22:42]: I mean, if they use instructor, I think that's totally cool. Again, it's like, it's just Python, right? It's like asking like, oh, how does like Django interact with requests? Well, you just might make a request.get in a Django app, right? But no one would say, I like went off of Django because I'm using requests now. They should be ideally like sort of the wrong comparison in terms of especially like the agent workflows. I think the real goal for me is to go down like the LLM compiler route, which is instead of doing like a react type reasoning loop. I think my belief is that we should be using like workflows. If we do this, then we always have a request and a complete workflow. We can fine tune a model that has a better workflow. Whereas it's hard to think about like, how do you fine tune a better react loop? Yeah. You always train it to have less looping, in which case like you wanted to get the right answer the first time, in which case it was a workflow to begin with, right?
Swyx [00:23:31]: Can you define workflow? Because I used to work at a workflow company, but I'm not sure this is a good term for everybody.
Jason [00:23:36]: I'm thinking workflow in terms of like the prefect Zapier workflow. Like I want to build a DAG, I want you to tell me what the nodes and edges are. And then maybe the edges are also put in with AI. But the idea is that like, I want to be able to present you the entire plan and then ask you to fix things as I execute it, rather than going like, hey, I couldn't parse the JSON, so I'm going to try again. I couldn't parse the JSON, I'm going to try again. And then next thing you know, you spent like $2 on opening AI credits, right? Yeah. Whereas with the plan, you can just say, oh, the edge between node like X and Y does not run. Let me just iteratively try to fix that, fix the one that sticks, go on to the next component. And obviously you can get into a world where if you have enough examples of the nodes X and Y, maybe you can use like a vector database to find a good few shot examples. You can do a lot if you sort of break down the problem into that workflow and executing that workflow, rather than looping and hoping the reasoning is good enough to generate the correct output. Yeah.
Swyx [00:24:35]: You know, I've been hammering on Devon a lot. I got access a couple of weeks ago. And obviously for simple tasks, it does well. For the complicated, like more than 10, 20 hour tasks, I can see- That's a crazy comparison.
Jason [00:24:47]: We used to talk about like three, four loops. Only once it gets to like hour tasks, it's hard.
Swyx [00:24:54]: Yeah. Less than an hour, there's nothing.
Jason [00:24:57]: That's crazy.
Swyx [00:24:58]: I mean, okay. Maybe my goalposts have shifted. I don't know. That's incredible.
Jason [00:25:02]: Yeah. No, no. I'm like sub one minute executions. Like the fact that you're talking about 10 hours is incredible.
Swyx [00:25:08]: I think it's a spectrum. I think I'm going to say this every single time I bring up Devon. Let's not reward them for taking longer to do things. Do you know what I mean? I think that's a metric that is easily abusable.
Jason [00:25:18]: Sure. Yeah. You know what I mean? But I think if you can monotonically increase the success probability over an hour, that's winning to me. Right? Like obviously if you run an hour and you've made no progress. Like I think when we were in like auto GBT land, there was that one example where it's like, I wanted it to like buy me a bicycle overnight. I spent $7 on credit and I never found the bicycle. Yeah.
Swyx [00:25:41]: Yeah. Right. I wonder if you'll be able to purchase a bicycle. Because it actually can do things in real world. It just needs to suspend to you for off and stuff. The point I was trying to make was that I can see it turning plans. I think one of the agents loopholes or one of the things that is a real barrier for agents is LLMs really like to get stuck into a lane. And you know what you're talking about, what I've seen Devon do is it gets stuck in a lane and it will just kind of change plans based on the performance of the plan itself. And it's kind of cool.
Jason [00:26:05]: I feel like we've gone too much in the looping route and I think a lot of more plans and like DAGs and data structures are probably going to come back to help fill in some holes. Yeah.
Alessio [00:26:14]: What do you think of the interface to that? Do you see it's like an existing state machine kind of thing that connects to the LLMs, the traditional DAG players? Do you think we need something new for like AI DAGs?
Jason [00:26:25]: Yeah. I mean, I think that the hard part is going to be describing visually the fact that this DAG can also change over time and it should still be allowed to be fuzzy. I think in like mathematics, we have like plate diagrams and like Markov chain diagrams and like recurrent states and all that. Some of that might come into this workflow world. But to be honest, I'm not too sure. I think right now, the first steps are just how do we take this DAG idea and break it down to modular components that we can like prompt better, have few shot examples for and ultimately like fine tune against. But in terms of even the UI, it's hard to say what it will likely win. I think, you know, people like Prefect and Zapier have a pretty good shot at doing a good job.
Swyx [00:27:03]: Yeah. You seem to use Prefect a lot. I actually worked at a Prefect competitor at Temporal and I'm also very familiar with Dagster. What else would you call out as like particularly interesting in the AI engineering stack?
Jason [00:27:13]: Man, I almost use nothing. I just use Cursor and like PyTests. Okay. I think that's basically it. You know, a lot of the observability companies have... The more observability companies I've tried, the more I just use Postgres.
Swyx [00:27:29]: Really? Okay. Postgres for observability?
Jason [00:27:32]: But the issue really is the fact that these observability companies isn't actually doing observability for the system. It's just doing the LLM thing. Like I still end up using like Datadog or like, you know, Sentry to do like latency. And so I just have those systems handle it. And then the like prompt in, prompt out, latency, token costs. I just put that in like a Postgres table now.
Swyx [00:27:51]: So you don't need like 20 funded startups building LLM ops? Yeah.
Jason [00:27:55]: But I'm also like an old, tired guy. You know what I mean? Like I think because of my background, it's like, yeah, like the Python stuff, I'll write myself. But you know, I will also just use Vercel happily. Yeah. Yeah. So I'm not really into that world of tooling, whereas I think, you know, I spent three good years building observability tools for recommendation systems. And I was like, oh, compared to that, Instructor is just one call. I just have to put time star, time and then count the prompt token, right? Because I'm not doing a very complex looping behavior. I'm doing mostly workflows and extraction. Yeah.
Swyx [00:28:26]: I mean, while we're on this topic, we'll just kind of get this out of the way. You famously have decided to not be a venture backed company. You want to do the consulting route. The obvious route for someone as successful as Instructor is like, oh, here's hosted Instructor with all tooling. Yeah. You just said you had a whole bunch of experience building observability tooling. You have the perfect background to do this and you're not.
Jason [00:28:43]: Yeah. Isn't that sick? I think that's sick.
Swyx [00:28:44]: I mean, I know why, because you want to go free dive.
Jason [00:28:47]: Yeah. Yeah. Because I think there's two things. Right. Well, one, if I tell myself I want to build requests, requests is not a venture backed startup. Right. I mean, one could argue whether or not Postman is, but I think for the most part, it's like having worked so much, I'm more interested in looking at how systems are being applied and just having access to the most interesting data. And I think I can do that more through a consulting business where I can come in and go, oh, you want to build perfect memory. You want to build an agent. You want to build like automations over construction or like insurance and supply chain, or like you want to handle writing private equity, mergers and acquisitions reports based off of user interviews. Those things are super fun. Whereas like maintaining the library, I think is mostly just kind of like a utility that I try to keep up, especially because if it's not venture backed, I have no reason to sort of go down the route of like trying to get a thousand integrations. In my mind, I just go like, okay, 98% of the people use open AI. I'll support that. And if someone contributes another platform, that's great. I'll merge it in. Yeah.
Swyx [00:29:45]: I mean, you only added Anthropic support this year. Yeah.
Jason [00:29:47]: Yeah. You couldn't even get an API key until like this year, right? That's true. Okay. If I add it like last year, I was trying to like double the code base to service, you know, half a percent of all downloads.
Swyx [00:29:58]: Do you think the market share will shift a lot now that Anthropic has like a very, very competitive offering?
Jason [00:30:02]: I think it's still hard to get API access. I don't know if it's fully GA now, if it's GA, if you can get a commercial access really easily.
Alessio [00:30:12]: I got commercial after like two weeks to reach out to their sales team.
Jason [00:30:14]: Okay.
Alessio [00:30:15]: Yeah.
Swyx [00:30:16]: Two weeks. It's not too bad. There's a call list here. And then anytime you run into rate limits, just like ping one of the Anthropic staff members.
Jason [00:30:21]: Yeah. Then maybe we need to like cut that part out. So I don't need to like, you know, spread false news.
Swyx [00:30:25]: No, it's cool. It's cool.
Jason [00:30:26]: But it's a common question. Yeah. Surely just from the price perspective, it's going to make a lot of sense. Like if you are a business, you should totally consider like Sonnet, right? Like the cost savings is just going to justify it if you actually are doing things at volume. And yeah, I think the SDK is like pretty good. Back to the instructor thing. I just don't think it's a billion dollar company. And I think if I raise money, the first question is going to be like, how are you going to get a billion dollar company? And I would just go like, man, like if I make a million dollars as a consultant, I'm super happy. I'm like more than ecstatic. I can have like a small staff of like three people. It's fun. And I think a lot of my happiest founder friends are those who like raised a tiny seed round, became profitable. They're making like 70, 60, 70, like MRR, 70,000 MRR and they're like, we don't even need to raise the seed round. Let's just keep it like between me and my co-founder, we'll go traveling and it'll be a great time. I think it's a lot of fun.
Alessio [00:31:15]: Yeah. like say LLMs / AI and they build some open source stuff and it's like I should just raise money and do this and I tell people a lot it's like look you can make a lot more money doing something else than doing a startup like most people that do a company could make a lot more money just working somewhere else than the company itself do you have any advice for folks that are maybe in a similar situation they're trying to decide oh should I stay in my like high paid FAANG job and just tweet this on the side and do this on github should I go be a consultant like being a consultant seems like a lot of work so you got to talk to all these people you know there's a lot to unpack
Jason [00:31:54]: I think the open source thing is just like well I'm just doing it purely for fun and I'm doing it because I think I'm right but part of being right is the fact that it's not a venture backed startup like I think I'm right because this is all you need right so I think a part of the philosophy is the fact that all you need is a very sharp blade to sort of do your work and you don't actually need to build like a big enterprise so that's one thing I think the other thing too that I've kind of been thinking around just because I have a lot of friends at google that want to leave right now it's like man like what we lack is not money or skill like what we lack is courage you should like you just have to do this a hard thing and you have to do it scared anyways right in terms of like whether or not you do want to do a founder I think that's just a matter of optionality but I definitely recognize that the like expected value of being a founder is still quite low it is right I know as many founder breakups and as I know friends who raised a seed round this year right like that is like the reality and like you know even in from that perspective it's been tough where it's like oh man like a lot of incubators want you to have co-founders now you spend half the time like fundraising and then trying to like meet co-founders and find co-founders rather than building the thing this is a lot of time spent out doing uh things I'm not really good at. I do think there's a rising trend in solo founding yeah.
Swyx [00:33:06]: You know I am a solo I think that something like 30 percent of like I forget what the exact status something like 30 percent of starters that make it to like series B or something actually are solo founder I feel like this must have co-founder idea mostly comes from YC and most everyone else copies it and then plenty of companies break up over co-founder
Jason [00:33:27]: Yeah and I bet it would be like I wonder how much of it is the people who don't have that much like and I hope this is not a diss to anybody but it's like you sort of you go through the incubator route because you don't have like the social equity you would need is just sort of like send an email to Sequoia and be like hey I'm going on this ride you want a ticket on the rocket ship right like that's very hard to sell my message if I was to raise money is like you've seen my twitter my life is sick I've decided to make it much worse by being a founder because this is something I have to do so do you want to come along otherwise I want to fund it myself like if I can't say that like I don't need the money because I can like handle payroll and like hire an intern and get an assistant like that's all fine but I really don't want to go back to meta I want to like get two years to like try to find a problem we're solving that feels like a bad time
Alessio [00:34:12]: Yeah Jason is like I wear a YSL jacket on stage at AI Engineer Summit I don't need your accelerator money
Jason [00:34:18]: And boots, you don't forget the boots. But I think that is a part of it right I think it is just like optionality and also just like I'm a lot older now I think 22 year old Jason would have been probably too scared and now I'm like too wise but I think it's a matter of like oh if you raise money you have to have a plan of spending it and I'm just not that creative with spending that much money yeah I mean to be clear you just celebrated your 30th birthday happy birthday yeah it's awesome so next week a lot older is relative to some some of the folks I think seeing on the career tips
Alessio [00:34:48]: I think Swix had a great post about are you too old to get into AI I saw one of your tweets in January 23 you applied to like Figma, Notion, Cohere, Anthropic and all of them rejected you because you didn't have enough LLM experience I think at that time it would be easy for a lot of people to say oh I kind of missed the boat you know I'm too late not gonna make it you know any advice for people that feel like that
Jason [00:35:14]: Like the biggest learning here is actually from a lot of folks in jiu-jitsu they're like oh man like is it too late to start jiu-jitsu like I'll join jiu-jitsu once I get in more shape right it's like there's a lot of like excuses and then you say oh like why should I start now I'll be like 45 by the time I'm any good and say well you'll be 45 anyways like time is passing like if you don't start now you start tomorrow you're just like one more day behind if you're worried about being behind like today is like the soonest you can start right and so you got to recognize that like maybe you just don't want it and that's fine too like if you wanted you would have started I think a lot of these people again probably think of things on a too short time horizon but again you know you're gonna be old anyways you may as well just start now you know
Swyx [00:35:55]: One more thing on I guess the um career advice slash sort of vlogging you always go viral for this post that you wrote on advice to young people and the lies you tell yourself oh yeah yeah you said you were writing it for your sister.
Jason [00:36:05]: She was like bummed out about going to college and like stressing about jobs and I was like oh and I really want to hear okay and I just kind of like text-to-sweep the whole thing it's crazy it's got like 50,000 views like I'm mind I mean your average tweet has more but that thing is like a 30-minute read now
Swyx [00:36:26]: So there's lots of stuff here which I agree with I you know I'm also of occasionally indulge in the sort of life reflection phase there's the how to be lucky there's the how to have high agency I feel like the agency thing is always a trend in sf or just in tech circles how do you define having high agency
Jason [00:36:42]: I'm almost like past the high agency phase now now my biggest concern is like okay the agency is just like the norm of the vector what also matters is the direction right it's like how pure is the shot yeah I mean I think agency is just a matter of like having courage and doing the thing that's scary right you know if people want to go rock climbing it's like do you decide you want to go rock climbing then you show up to the gym you rent some shoes and you just fall 40 times or do you go like oh like I'm actually more intelligent let me go research the kind of shoes that I want okay like there's flatter shoes and more inclined shoes like which one should I get okay let me go order the shoes on Amazon I'll come back in three days like oh it's a little bit too tight maybe it's too aggressive I'm only a beginner let me go change no I think the higher agent person just like goes and like falls down 20 times right yeah I think the higher agency person is more focused on like process metrics versus outcome metrics right like from pottery like one thing I learned was if you want to be good at pottery you shouldn't count like the number of cups or bowls you make you should just weigh the amount of clay you use right like the successful person says oh I went through 100 pounds of clay right the less agency was like oh I've made six cups and then after I made six cups like there's not really what are you what do you do next no just pounds of clay pounds of clay same with the work here right so you just got to write the tweets like make the commits contribute open source like write the documentation there's no real outcome it's just a process and if you love that process you just get really good at the thing you're doing
Swyx [00:38:04]: yeah so just to push back on this because obviously I mostly agree how would you design performance review systems because you were effectively saying we can count lines of code for developers right
Jason [00:38:15]: I don't think that would be the actual like I think if you make that an outcome like I can just expand a for loop right I think okay so for performance review this is interesting because I've mostly thought of it from the perspective of science and not engineering I've been running a lot of engineering stand-ups primarily because there's not really that many machine learning folks the process outcome is like experiments and ideas right like if you think about outcome is what you might want to think about an outcome is oh I want to improve the revenue or whatnot but that's really hard but if you're someone who is going out like okay like this week I want to come up with like three or four experiments I might move the needle okay nothing worked to them they might think oh nothing worked like I suck but to me it's like wow you've closed off all these other possible avenues for like research like you're gonna get to the place that you're gonna figure out that direction really soon there's no way you try 30 different things and none of them work usually like 10 of them work five of them work really well two of them work really really well and one thing was like the nail in the head so agency lets you sort of capture the volume of experiments and like experience lets you figure out like oh that other half it's not worth doing right I think experience is going like half these prompting papers don't make any sense just use chain of thought and just you know use a for loop that's basically right it's like usually performance for me is around like how many experiments are you running how oftentimes are you trying.
Alessio [00:39:32]: When do you give up on an experiment because a StitchFix you kind of give up on language models I guess in a way as a tool to use and then maybe the tools got better you were right at the time and then the tool improved I think there are similar paths in my engineering career where I try one approach and at the time it doesn't work and then the thing changes but then I kind of soured on that approach and I don't go back to it soon
Jason [00:39:51]: I see yeah how do you think about that loop so usually when I'm coaching folks and as they say like oh these things don't work I'm not going to pursue them in the future like one of the big things like hey the negative result is a result and this is something worth documenting like this is an academia like if it's negative you don't just like not publish right but then like what do you actually write down like what you should write down is like here are the conditions this is the inputs and the outputs we tried the experiment on and then one thing that's really valuable is basically writing down under what conditions would I revisit these experiments these things don't work because of what we had at the time if someone is reading this two years from now under what conditions will we try again that's really hard but again that's like another skill you kind of learn right it's like you do go back and you do experiments you figure out why it works now I think a lot of it here is just like scaling worked yeah rap lyrics you know that was because I did not have high enough quality data if we phase shift and say okay you don't even need training data oh great then it might just work a different domain
Alessio [00:40:48]: Do you have anything in your list that is like it doesn't work now but I want to try it again later? Something that people should maybe keep in mind you know people always like agi when you know when are you going to know the agi is here maybe it's less than that but any stuff that you tried recently that didn't work that
Jason [00:41:01]: You think will get there I mean I think the personal assistance and the writing I've shown to myself it's just not good enough yet so I hired a writer and I hired a personal assistant so now I'm gonna basically like work with these people until I figure out like what I can actually like automate and what are like the reproducible steps but like I think the experiment for me is like I'm gonna go pay a person like thousand dollars a month that helped me improve my life and then let me get them to help me figure like what are the components and how do I actually modularize something to get it to work because it's not just like a lot gmail calendar and like notion it's a little bit more complicated than that but we just don't know what that is yet those are two sort of systems that I wish gb4 or opus was actually good enough to just write me an essay but most of the essays are still pretty bad
Swyx [00:41:44]: yeah I would say you know on the personal assistance side Lindy is probably the one I've seen the most flow was at a speaker at the summit I don't know if you've checked it out or any other sort of agents assistant startup
Jason [00:41:54]: Not recently I haven't tried lindy they were not ga last time I was considering it yeah yeah a lot of it now it's like oh like really what I want you to do is take a look at all of my meetings and like write like a really good weekly summary email for my clients to remind them that I'm like you know thinking of them and like working for them right or it's like I want you to notice that like my monday is like way too packed and like block out more time and also like email the people to do the reschedule and then try to opt in to move them around and then I want you to say oh jason should have like a 15 minute prep break after form back to back those are things that now I know I can prompt them in but can it do it well like before I didn't even know that's what I wanted to prompt for us defragging a calendar and adding break so I can like eat lunch yeah that's the AGI test yeah exactly compassion right I think one thing that yeah we didn't touch on it before but
Alessio [00:42:44]: I think was interesting you had this tweet a while ago about prompts should be code and then there were a lot of companies trying to build prompt engineering tooling kind of trying to turn the prompt into a more structured thing what's your thought today now you want to turn the thinking into DAGs like do prompts should still be code any updated ideas
Jason [00:43:04]: It's the same thing right I think you know with Instructor it is very much like the output model is defined as a code object that code object is sent to the LLM and in return you get a data structure so the outputs of these models I think should also be code objects and the inputs somewhat should be code objects but I think the one thing that instructor tries to do is separate instruction data and the types of the output and beyond that I really just think that most of it should be still like managed pretty closely to the developer like so much of is changing that if you give control of these systems away too early you end up ultimately wanting them back like many companies I know that I reach out or ones were like oh we're going off of the frameworks because now that we know what the business outcomes we're trying to optimize for these frameworks don't work yeah because we do rag but we want to do rag to like sell you supplements or to have you like schedule the fitness appointment the prompts are kind of too baked into the systems to really pull them back out and like start doing upselling or something it's really funny but a lot of it ends up being like once you understand the business outcomes you care way more about the prompt
Swyx [00:44:07]: Actually this is fun in our prep for this call we were trying to say like what can you as an independent person say that maybe me and Alessio cannot say or me you know someone at a company say what do you think is the market share of the frameworks the LangChain, the LlamaIndex, the everything...
Jason [00:44:20]: Oh massive because not everyone wants to care about the code yeah right I think that's a different question to like what is the business model and are they going to be like massively profitable businesses right making hundreds of millions of dollars that feels like so straightforward right because not everyone is a prompt engineer like there's so much productivity to be captured in like back office optim automations right it's not because they care about the prompts that they care about managing these things yeah but those would be sort of low code experiences you yeah I think the bigger challenge is like okay hundred million dollars probably pretty easy it's just time and effort and they have the manpower and the money to sort of solve those problems again if you go the vc route then it's like you're talking about billions and that's really the goal that stuff for me it's like pretty unclear but again that is to say that like I sort of am building things for developers who want to use infrastructure to build their own tooling in terms of the amount of developers there are in the world versus downstream consumers of these things or even just think of how many companies will use like the adobes and the ibms right because they want something that's fully managed and they want something that they know will work and if the incremental 10% requires you to hire another team of 20 people you might not want to do it and I think that kind of organization is really good for uh those are bigger companies
Swyx [00:45:32]: I just want to capture your thoughts on one more thing which is you said you wanted most of the prompts to stay close to the developer and Hamel Husain wrote this post which I really love called f you show me the prompt yeah I think he cites you in one of those part of the blog post and I think ds pi is kind of like the complete antithesis of that which is I think it's interesting because I also hold the strong view that AI is a better prompt engineer than you are and I don't know how to square that wondering if you have thoughts
Jason [00:45:58]: I think something like DSPy can work because there are like very short-term metrics to measure success right it is like did you find the pii or like did you write the multi-hop question the correct way but in these workflows that I've been managing a lot of it are we minimizing churn and maximizing retention yeah that's a very long loop it's not really like a uptuna like training loop right like those things are much more harder to capture so we don't actually have those metrics for that right and obviously we can figure out like okay is the summary good but like how do you measure the quality of the summary it's like that feedback loop it ends up being a lot longer and then again when something changes it's really hard to make sure that it works across these like newer models or again like changes to work for the current process like when we migrate from like anthropic to open ai like there's just a ton of change that are like infrastructure related not necessarily around the prompt itself yeah cool any other ai engineering startups that you think should not exist before we wrap up i mean oh my gosh i mean a lot of it again it's just like every time of investors like how does this make a billion dollars like it doesn't i'm gonna go back to just like tweeting and holding my breath underwater yeah like i don't really pay attention too much to most of this like most of the stuff i'm doing is around like the consumer of like llm calls yep i think people just want to move really fast and they will end up pick these vendors but i don't really know if anything has really like blown me out the water like i only trust myself but that's also a function of just being an old man like i think you know many companies are definitely very happy with using most of these tools anyways but i definitely think i occupy a very small space in the engineering ecosystem.
Swyx [00:47:41]: Yeah i would say one of the challenges here you know you call about the dealing in the consumer of llm's space i think that's what ai engineering differs from ml engineering and i think a constant disconnect or cognitive dissonance in this field in the ai engineers that have sprung up is that they are not as good as the ml engineers they are not as qualified i think that you know you are someone who has credibility in the mle space and you are also a very authoritative figure in the ai space and i think so and you know i think you've built the de facto leading library i think yours i think instructors should be part of the standard lib even though i try to not use it like i basically also end up rebuilding instructor right like that's a lot of the back and forth that we had over the past two days i think that's the fundamental thing that we're trying to figure out like there's very small supply of MLEs not everyone's going to have that experience that you had but the global demand for AI is going to far outstrip the existing MLEs.
Jason [00:48:36]: So what do we do do we force everyone to go through the standard MLE curriculum or do we make a new one? I've got some takes go i think a lot of these app layer startups should not be hiring MLEs because they end up churning yeah they want to work at opening high they're just like hey guys i joined and you have no data and like all i did this week was take some typescript build errors and like figure out why we don't have any tests and like what is this framework x and y like how do you measure success what are your business outcomes oh no okay let's not focus on that great i'll focus on these typescript build errors and then you're just like what am i doing and then you kind of sort of feel really frustrated and i already recognize that because i've made offers to machine learning engineers they've joined and they've left in like two months and the response is like yeah i think i'm gonna join a research lab so i think it's not even that like i don't even think you should be hiring these mles on the other hand what i also see a lot of is the really motivated engineer that's doing more engineering is not being allowed to actually like fully pursue the ai engineering so they're the guy who built the demo it got traction now it's working but they're still being pulled back to figure out why google calendar integrations are not working or like how to make sure that you know the button is loading on the page and so i'm sort of like in a very interesting position where the companies want to hire an ml they don't need to hire but they won't let the excited people who've caught the ai engineering bug could go do that work more full-time this is something i'm literally wrestling with this week as i just wrote something about it this is one of the things i'm probably going to be recommending in the future is really thinking about like where is the talent coming from how much of it is internal and do you really need to hire someone who's like writing pytorch code yeah exactly most of the time you're not you're gonna need someone to write instructor code and like i feel goofy all the time just like prompting it's like oh man like i wish i just had a target data set that i could like train a model against yes and i can just say it's right or wrong yeah.
Swyx [00:50:32]: You know i guess what Latent Space is, what the AI Engineer world's fair is is that we're trying to create and elevate this industry of ai engineers where it's legitimate to actually take these motivated software engineers who want to build more in ai and do creative things in ai to actually say you have the blessing like and this is legitimate sub-specialty of software engineering
Jason [00:50:50]: Yeah i think there's been a mix of that product engineering i think a lot more data science is going to come in versus machine learning engineering because a lot of it now is just quantifying like what does the business actually want as an outcome the outcome is not rag app yeah the outcome is like reduced churn people need to figure out what that actually is and how to measure it yeah all the data engineering tools still apply
Swyx [00:51:09]: bi layers semantic layers whatever yeah cool we'll have you back again for the world's fair we don't know what you're going to talk about but i'm sure it's going to be amazing you're a very polished speaker
Jason [00:51:19]: The title is written it's just uh Pydantic is still all you need
Swyx [00:51:26]: I'm worried about having too many all you need titles because that's obviously very trendy so yeah you have one of them but i need to keep a lid on like you know everyone's saying their
Jason [00:51:34]: thing is all you need but yeah we'll figure it out i think it's not my thing it's someone else
Swyx [00:51:38]: i think that's why it works it's true cool well it's a real pleasure to have you on of course everyone should go follow you on twitter and check out instructor there's also instructor js which i'm very happy to see.

Get full access to Latent Space at www.latent.space/subscribe

19 April 2024, 7:07 pm
56 minutes 20 seconds

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit

Maggie, Linus, Geoffrey, and the LS crew are reuniting for our second annual AI UX demo day in SF on Apr 28. Sign up to demo here! And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements!
It’s become fashionable for many AI startups to project themselves as “the next Google” - while the search engine is so 2000s, both Perplexity and Exa referred to themselves as a “research engine” or “answer engine” in our NeurIPS pod. However these searches tend to be relatively shallow, and it is challenging to zoom up and down the ladders of abstraction to garner insights. For serious researchers, this level of simple one-off search will not cut it.
We’ve commented in our Jan 2024 Recap that Flow Engineering (simply; multi-turn processes over many-shot single prompts) seems to offer far more performance, control and reliability for a given cost budget. Our experiments with Devin and our understanding of what the new Elicit Notebooks offer a glimpse into the potential for very deep, open ended, thoughtful human-AI collaboration at scale.
It starts with prompts
When ChatGPT exploded in popularity in November 2022 everyone was turned into a prompt engineer. While generative models were good at "vibe based" outcomes (tell me a joke, write a poem, etc) with basic prompts, they struggled with more complex questions, especially in symbolic fields like math, logic, etc. Two of the most important "tricks" that people picked up on were:
* Chain of Thought prompting strategy proposed by Wei et al in the “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”. Rather than doing traditional few-shot prompting with just question and answers, adding the thinking process that led to the answer resulted in much better outcomes.
* Adding "Let's think step by step" to the prompt as a way to boost zero-shot reasoning, which was popularized by Kojima et al in the Large Language Models are Zero-Shot Reasoners paper from NeurIPS 2022. This bumped accuracy from 17% to 79% compared to zero-shot.
Nowadays, prompts include everything from promises of monetary rewards to… whatever the Nous folks are doing to turn a model into a world simulator. At the end of the day, the goal of prompt engineering is increasing accuracy, structure, and repeatability in the generation of a model.
From prompts to agents
As prompt engineering got more and more popular, agents (see “The Anatomy of Autonomy”) took over Twitter with cool demos and AutoGPT became the fastest growing repo in Github history. The thing about AutoGPT that fascinated people was the ability to simply put in an objective without worrying about explaining HOW to achieve it, or having to write very sophisticated prompts. The system would create an execution plan on its own, and then loop through each task.
The problem with open-ended agents like AutoGPT is that 1) it’s hard to replicate the same workflow over and over again 2) there isn’t a way to hard-code specific steps that the agent should take without actually coding them yourself, which isn’t what most people want from a product.
From agents to products
Prompt engineering and open-ended agents were great in the experimentation phase, but this year more and more of these workflows are starting to become polished products.
Today’s guests are Andreas Stuhlmüller and Jungwon Byun of Elicit (previously Ought), an AI research assistant that they think of as “the best place to understand what is known”.
Ought was a non-profit, but last September, Elicit spun off into a PBC with a $9m seed round. It is hard to quantify how much a workflow can be improved, but Elicit boasts some impressive numbers for research assistants:
Just four months after launch, Elicit crossed $1M ARR, which shows how much interest there is for AI products that just work.
One of the main takeaways we had from the episode is how teams should focus on supervising the process, not the output. Their philosophy at Elicit isn’t to train general models, but to train models that are extremely good at focusing processes.
This allows them to have pre-created steps that the user can add to their workflow (like classifying certain features that are specific to their research field) without having to write a prompt for it. And for Hamel Husain’s happiness, they always show you the underlying prompt.
Elicit recently announced notebooks as a new interface to interact with their products: (fun fact, they tried to implement this 4 times before they landed on the right UX! We discuss this ~33:00 in the podcast)
The reasons why they picked notebooks as a UX all tie back to process:
* They are systematic; once you have a instruction/prompt that works on a paper, you can run hundreds of papers through the same workflow by creating a column. Notebooks can also be edited and exported at any point during the flow.
* They are transparent - Many papers include an opaque literature review as perfunctory context before getting to their novel contribution. But PDFs are “dead” and it is difficult to follow the thought process and exact research flow of the authors. Sharing “living” Elicit Notebooks opens up this process.
* They are unbounded - Research is an endless stream of rabbit holes. So it must be easy to dive deeper and follow up with extra steps, without losing the ability to surface for air.

We had a lot of fun recording this, and hope you have as much fun listening!
AI UX in SF
Long time Latent Spacenauts might remember our first AI UX meetup with Linus Lee, Geoffrey Litt, and Maggie Appleton last year. Well, Maggie has since joined Elicit, and they are all returning at the end of this month!
Sign up here: https://lu.ma/aiux
And submit demos here! https://forms.gle/iSwiesgBkn8oo4SS8
We expect the 200 seats to “sell out” fast. Attendees with demos will be prioritized.
Show Notes
* Elicit
* Ought (their previous non-profit)
* “Pivoting” with GPT-4
* Elicit notebooks launch
* Charlie
* Andreas’ Blog
Timestamps
* [00:00:00] Introductions
* [00:07:45] How Johan and Andreas Joined Forces to Create Elicit
* [00:10:26] Why Products > Research
* [00:15:49] The Evolution of Elicit's Product
* [00:19:44] Automating Literature Review Workflow
* [00:22:48] How GPT-3 to GPT-4 Changed Things
* [00:25:37] Managing LLM Pricing and Performance
* [00:31:07] Open vs. Closed: Elicit's Approach to Model Selection
* [00:31:56] Moving to Notebooks
* [00:39:11] Elicit's Budget for Model Queries and Evaluations
* [00:41:44] Impact of Long Context Windows
* [00:47:19] Underrated Features and Surprising Applications
* [00:51:35] Driving Systematic and Efficient Research
* [00:53:00] Elicit's Team Growth and Transition to a Public Benefit Corporation
* [00:55:22] Building AI for Good
Full Interview on YouTube
As always, a plug for our youtube version for the 80% of communication that is nonverbal:
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:15]: Hey, and today we are back in the studio with Andreas and Jungwon from Elicit. Welcome.
Jungwon [00:00:20]: Thanks guys.
Andreas [00:00:21]: It's great to be here.
Swyx [00:00:22]: Yeah. So I'll introduce you separately, but also, you know, we'd love to learn a little bit more about you personally. So Andreas, it looks like you started Elicit first, Jungwon joined later.
Andreas [00:00:32]: That's right. For all intents and purposes, the Elicit and also the Ought that existed before then were very different from what I started. So I think it's like fair to say that you co-founded it.
Swyx [00:00:43]: Got it. And Jungwon, you're a co-founder and COO of Elicit now.
Jungwon [00:00:46]: Yeah, that's right.
Swyx [00:00:47]: So there's a little bit of a history to this. I'm not super aware of like the sort of journey. I was aware of OTT and Elicit as sort of a nonprofit type situation. And recently you turned into like a B Corp, Public Benefit Corporation. So yeah, maybe if you want, you could take us through that journey of finding the problem. You know, obviously you're working together now. So like, how do you get together to decide to leave your startup career to join him?
Andreas [00:01:10]: Yeah, it's truly a very long journey. I guess truly, it kind of started in Germany when I was born. So even as a kid, I was always interested in AI, like I kind of went to the library. There were books about how to write programs in QBasic and like some of them talked about how to implement chatbots.
Jungwon [00:01:27]: To be clear, he grew up in like a tiny village on the outskirts of Munich called Dinkelschirben, where it's like a very, very idyllic German village.
Andreas [00:01:36]: Yeah, important to the story. So basically, the main thing is I've kind of always been thinking about AI my entire life and been thinking about, well, at some point, this is going to be a huge deal. It's going to be transformative. How can I work on it? And was thinking about it from when I was a teenager, after high school did a year where I started a startup with the intention to become rich. And then once I'm rich, I can affect the trajectory of AI. Did not become rich, decided to go back to college and study cognitive science there, which was like the closest thing I could find at the time to AI. In the last year of college, moved to the US to do a PhD at MIT, working on broadly kind of new programming languages for AI because it kind of seemed like the existing languages were not great at expressing world models and learning world models doing Bayesian inference. Was always thinking about, well, ultimately, the goal is to actually build tools that help people reason more clearly, ask and answer better questions and make better decisions. But for a long time, it seemed like the technology to put reasoning in machines just wasn't there. Initially, at the end of my postdoc at Stanford, I was thinking about, well, what to do? I think the standard path is you become an academic and do research. But it's really hard to actually build interesting tools as an academic. You can't really hire great engineers. Everything is kind of on a paper-to-paper timeline. And so I was like, well, maybe I should start a startup, pursued that for a little bit. But it seemed like it was too early because you could have tried to do an AI startup, but probably would not have been this kind of AI startup we're seeing now. So then decided to just start a nonprofit research lab that's going to do research for a while until we better figure out how to do thinking in machines. And that was odd. And then over time, it became clear how to actually build actual tools for reasoning. And only over time, we developed a better way to... I'll let you fill in some of the details here.
Jungwon [00:03:26]: Yeah. So I guess my story maybe starts around 2015. I kind of wanted to be a founder for a long time, and I wanted to work on an idea that stood the test of time for me, like an idea that stuck with me for a long time. And starting in 2015, actually, originally, I became interested in AI-based tools from the perspective of mental health. So there are a bunch of people around me who are really struggling. One really close friend in particular is really struggling with mental health and didn't have any support, and it didn't feel like there was anything before kind of like getting hospitalized that could just help her. And so luckily, she came and stayed with me for a while, and we were just able to talk through some things. But it seemed like lots of people might not have that resource, and something maybe AI-enabled could be much more scalable. I didn't feel ready to start a company then, that's 2015. And I also didn't feel like the technology was ready. So then I went into FinTech and kind of learned how to do the tech thing. And then in 2019, I felt like it was time for me to just jump in and build something on my own I really wanted to create. And at the time, I looked around at tech and felt like not super inspired by the options. I didn't want to have a tech career ladder, or I didn't want to climb the career ladder. There are two kind of interesting technologies at the time, there was AI and there was crypto. And I was like, well, the AI people seem like a little bit more nice, maybe like slightly more trustworthy, both super exciting, but threw my bet in on the AI side. And then I got connected to Andreas. And actually, the way he was thinking about pursuing the research agenda at OTT was really compatible with what I had envisioned for an ideal AI product, something that helps kind of take down really complex thinking, overwhelming thoughts and breaks it down into small pieces. And then this kind of mission that we need AI to help us figure out what we ought to do was really inspiring, right? Yeah, because I think it was clear that we were building the most powerful optimizer of our time. But as a society, we hadn't figured out how to direct that optimization potential. And if you kind of direct tremendous amounts of optimization potential at the wrong thing, that's really disastrous. So the goal of OTT was make sure that if we build the most transformative technology of our lifetime, it can be used for something really impactful, like good reasoning, like not just generating ads. My background was in marketing, but like, so I was like, I want to do more than generate ads with this. But also if these AI systems get to be super intelligent enough that they are doing this really complex reasoning, that we can trust them, that they are aligned with us and we have ways of evaluating that they're doing the right thing. So that's what OTT did. We did a lot of experiments, you know, like I just said, before foundation models really like took off. A lot of the issues we were seeing were more in reinforcement learning, but we saw a future where AI would be able to do more kind of logical reasoning, not just kind of extrapolate from numerical trends. We actually kind of set up experiments with people where kind of people stood in as super intelligent systems and we effectively gave them context windows. So they would have to like read a bunch of text and one person would get less text and one person would get all the texts and the person with less text would have to evaluate the work of the person who could read much more. So like in a world we were basically simulating, like in 2018, 2019, a world where an AI system could read significantly more than you and you as the person who couldn't read that much had to evaluate the work of the AI system. Yeah. So there's a lot of the work we did. And from that, we kind of iterated on the idea of breaking complex tasks down into smaller tasks, like complex tasks, like open-ended reasoning, logical reasoning into smaller tasks so that it's easier to train AI systems on them. And also so that it's easier to evaluate the work of the AI system when it's done. And then also kind of, you know, really pioneered this idea, the importance of supervising the process of AI systems, not just the outcomes. So a big part of how Elicit is built is we're very intentional about not just throwing a ton of data into a model and training it and then saying, cool, here's like scientific output. Like that's not at all what we do. Our approach is very much like, what are the steps that an expert human does or what is like an ideal process as granularly as possible, let's break that down and then train AI systems to perform each of those steps very robustly. When you train like that from the start, after the fact, it's much easier to evaluate, it's much easier to troubleshoot at each point. Like where did something break down? So yeah, we were working on those experiments for a while. And then at the start of 2021, decided to build a product.
Swyx [00:07:45]: Do you mind if I, because I think you're about to go into more modern thought and Elicit. And I just wanted to, because I think a lot of people are in where you were like sort of 2018, 19, where you chose a partner to work with. Yeah. Right. And you didn't know him. Yeah. Yeah. You were just kind of cold introduced. A lot of people are cold introduced. Yeah. Never work with them. I assume you had a lot, a lot of other options, right? Like how do you advise people to make those choices?
Jungwon [00:08:10]: We were not totally cold introduced. So one of our closest friends introduced us. And then Andreas had written a lot on the OTT website, a lot of blog posts, a lot of publications. And I just read it and I was like, wow, this sounds like my writing. And even other people, some of my closest friends I asked for advice from, they were like, oh, this sounds like your writing. But I think I also had some kind of like things I was looking for. I wanted someone with a complimentary skillset. I want someone who was very values aligned. And yeah, that was all a good fit.
Andreas [00:08:38]: We also did a pretty lengthy mutual evaluation process where we had a Google doc where we had all kinds of questions for each other. And I think it ended up being around 50 pages or so of like various like questions and back and forth.
Swyx [00:08:52]: Was it the YC list? There's some lists going around for co-founder questions.
Andreas [00:08:55]: No, we just made our own questions. But I guess it's probably related in that you ask yourself, what are the values you care about? How would you approach various decisions and things like that?
Jungwon [00:09:04]: I shared like all of my past performance reviews. Yeah. Yeah.
Swyx [00:09:08]: And he never had any. No.
Andreas [00:09:10]: Yeah.
Swyx [00:09:11]: Sorry, I just had to, a lot of people are going through that phase and you kind of skipped over it. I was like, no, no, no, no. There's like an interesting story.
Jungwon [00:09:20]: Yeah.
Alessio [00:09:21]: Yeah. Before we jump into what a list it is today, the history is a bit counterintuitive. So you start with figuring out, oh, if we had a super powerful model, how would we align it? But then you were actually like, well, let's just build the product so that people can actually leverage it. And I think there are a lot of folks today that are now back to where you were maybe five years ago that are like, oh, what if this happens rather than focusing on actually building something useful with it? What clicked for you to like move into a list and then we can cover that story too.
Andreas [00:09:49]: I think in many ways, the approach is still the same because the way we are building illicit is not let's train a foundation model to do more stuff. It's like, let's build a scaffolding such that we can deploy powerful models to good ends. I think it's different now in that we actually have like some of the models to plug in. But if in 2017, we had had the models, we could have run the same experiments we did run with humans back then, just with models. And so in many ways, our philosophy is always, let's think ahead to the future of what models are going to exist in one, two years or longer. And how can we make it so that they can actually be deployed in kind of transparent, controllable
Jungwon [00:10:26]: ways? I think motivationally, we both are kind of product people at heart. The research was really important and it didn't make sense to build a product at that time. But at the end of the day, the thing that always motivated us is imagining a world where high quality reasoning is really abundant and AI is a technology that's going to get us there. And there's a way to guide that technology with research, but we can have a more direct effect through product because with research, you publish the research and someone else has to implement that into the product and the product felt like a more direct path. And we wanted to concretely have an impact on people's lives. Yeah, I think the kind of personally, the motivation was we want to build for people.
Swyx [00:11:03]: Yep. And then just to recap as well, like the models you were using back then were like, I don't know, would they like BERT type stuff or T5 or I don't know what timeframe we're talking about here.
Andreas [00:11:14]: I guess to be clear, at the very beginning, we had humans do the work. And then I think the first models that kind of make sense were TPT-2 and TNLG and like Yeah, early generative models. We do also use like T5 based models even now started with TPT-2.
Swyx [00:11:30]: Yeah, cool. I'm just kind of curious about like, how do you start so early? You know, like now it's obvious where to start, but back then it wasn't.
Jungwon [00:11:37]: Yeah, I used to nag Andreas a lot. I was like, why are you talking to this? I don't know. I felt like TPT-2 is like clearly can't do anything. And I was like, Andreas, you're wasting your time, like playing with this toy. But yeah, he was right.
Alessio [00:11:50]: So what's the history of what Elicit actually does as a product? You recently announced that after four months, you get to a million in revenue. Obviously, a lot of people use it, get a lot of value, but it would initially kind of like structured data extraction from papers. Then you had kind of like concept grouping. And today, it's maybe like a more full stack research enabler, kind of like paper understander platform. What's the definitive definition of what Elicit is? And how did you get here?
Jungwon [00:12:15]: Yeah, we say Elicit is an AI research assistant. I think it will continue to evolve. That's part of why we're so excited about building and research, because there's just so much space. I think the current phase we're in right now, we talk about it as really trying to make Elicit the best place to understand what is known. So it's all a lot about like literature summarization. There's a ton of information that the world already knows. It's really hard to navigate, hard to make it relevant. So a lot of it is around document discovery and processing and analysis. I really kind of want to import some of the incredible productivity improvements we've seen in software engineering and data science and into research. So it's like, how can we make researchers like data scientists of text? That's why we're launching this new set of features called Notebooks. It's very much inspired by computational notebooks, like Jupyter Notebooks, you know, DeepNode or Colab, because they're so powerful and so flexible. And ultimately, when people are trying to get to an answer or understand insight, they're kind of like manipulating evidence and information. Today, that's all packaged in PDFs, which are super brittle. So with language models, we can decompose these PDFs into their underlying claims and evidence and insights, and then let researchers mash them up together, remix them and analyze them together. So yeah, I would say quite simply, overall, Elicit is an AI research assistant. Right now we're focused on text-based workflows, but long term, really want to kind of go further and further into reasoning and decision making.
Alessio [00:13:35]: And when you say AI research assistant, this is kind of meta research. So researchers use Elicit as a research assistant. It's not a generic you-can-research-anything type of tool, or it could be, but like, what are people using it for today?
Andreas [00:13:49]: Yeah. So specifically in science, a lot of people use human research assistants to do things. You tell your grad student, hey, here are a couple of papers. Can you look at all of these, see which of these have kind of sufficiently large populations and actually study the disease that I'm interested in, and then write out like, what are the experiments they did? What are the interventions they did? What are the outcomes? And kind of organize that for me. And the first phase of understanding what is known really focuses on automating that workflow because a lot of that work is pretty rote work. I think it's not the kind of thing that we need humans to do. Language models can do it. And then if language models can do it, you can obviously scale it up much more than a grad student or undergrad research assistant would be able to do.
Jungwon [00:14:31]: Yeah. The use cases are pretty broad. So we do have a very large percent of our users are just using it personally or for a mix of personal and professional things. People who care a lot about health or biohacking or parents who have children with a kind of rare disease and want to understand the literature directly. So there is an individual kind of consumer use case. We're most focused on the power users. So that's where we're really excited to build. So Lissette was very much inspired by this workflow in literature called systematic reviews or meta-analysis, which is basically the human state of the art for summarizing scientific literature. And it typically involves like five people working together for over a year. And they kind of first start by trying to find the maximally comprehensive set of papers possible. So it's like 10,000 papers. And they kind of systematically narrow that down to like hundreds or 50 extract key details from every single paper. Usually have two people doing it, like a third person reviewing it. So it's like an incredibly laborious, time consuming process, but you see it in every single domain. So in science, in machine learning, in policy, because it's so structured and designed to be reproducible, it's really amenable to automation. So that's kind of the workflow that we want to automate first. And then you make that accessible for any question and make these really robust living summaries of science. So yeah, that's one of the workflows that we're starting with.
Alessio [00:15:49]: Our previous guest, Mike Conover, he's building a new company called Brightwave, which is an AI research assistant for financial research. How do you see the future of these tools? Does everything converge to like a God researcher assistant, or is every domain going to have its own thing?
Andreas [00:16:03]: I think that's a good and mostly open question. I do think there are some differences across domains. For example, some research is more quantitative data analysis, and other research is more high level cross domain thinking. And we definitely want to contribute to the broad generalist reasoning type space. Like if researchers are making discoveries often, it's like, hey, this thing in biology is actually analogous to like these equations in economics or something. And that's just fundamentally a thing that where you need to reason across domains. At least within research, I think there will be like one best platform more or less for this type of generalist research. I think there may still be like some particular tools like for genomics, like particular types of modules of genes and proteins and whatnot. But for a lot of the kind of high level reasoning that humans do, I think that is a more of a winner type all thing.
Swyx [00:16:52]: I wanted to ask a little bit deeper about, I guess, the workflow that you mentioned. I like that phrase. I see that in your UI now, but that's as it is today. And I think you were about to tell us about how it was in 2021 and how it may be progressed. How has this workflow evolved over time?
Jungwon [00:17:07]: Yeah. So the very first version of Elicit actually wasn't even a research assistant. It was a forecasting assistant. So we set out and we were thinking about, you know, what are some of the most impactful types of reasoning that if we could scale up, AI would really transform the world. We actually started with literature review, but we're like, oh, so many people are going to build literature review tools. So let's start there. So then we focused on geopolitical forecasting. So I don't know if you're familiar with like manifold or manifold markets. That kind of stuff. Before manifold. Yeah. Yeah. I'm not predicting relationships. We're predicting like, is China going to invade Taiwan?
Swyx [00:17:38]: Markets for everything.
Andreas [00:17:39]: Yeah. That's a relationship.
Swyx [00:17:41]: Yeah.
Jungwon [00:17:42]: Yeah. It's true. And then we worked on that for a while. And then after GPT-3 came out, I think by that time we realized that originally we were trying to help people convert their beliefs into probability distributions. And so take fuzzy beliefs, but like model them more concretely. And then after a few months of iterating on that, just realize, oh, the thing that's blocking people from making interesting predictions about important events in the world is less kind of on the probabilistic side and much more on the research side. And so that kind of combined with the very generalist capabilities of GPT-3 prompted us to make a more general research assistant. Then we spent a few months iterating on what even is a research assistant. So we would embed with different researchers. We built data labeling workflows in the beginning, kind of right off the bat. We built ways to find experts in a field and like ways to ask good research questions. So we just kind of iterated through a lot of workflows and no one else was really building at this time. And it was like very quick to just do some prompt engineering and see like what is a task that is at the intersection of what's technologically capable and like important for researchers. And we had like a very nondescript landing page. It said nothing. But somehow people were signing up and we had to sign a form that was like, why are you here? And everyone was like, I need help with literature review. And we're like, oh, literature review. That sounds so hard. I don't even know what that means. We're like, we don't want to work on it. But then eventually we were like, okay, everyone is saying literature review. It's overwhelmingly people want to-
Swyx [00:19:02]: And all domains, not like medicine or physics or just all domains. Yeah.
Jungwon [00:19:06]: And we also kind of personally knew literature review was hard. And if you look at the graphs for academic literature being published every single month, you guys know this in machine learning, it's like up into the right, like superhuman amounts of papers. So we're like, all right, let's just try it. I was really nervous, but Andreas was like, this is kind of like the right problem space to jump into, even if we don't know what we're doing. So my take was like, fine, this feels really scary, but let's just launch a feature every single week and double our user numbers every month. And if we can do that, we'll fail fast and we will find something. I was worried about like getting lost in the kind of academic white space. So the very first version was actually a weekend prototype that Andreas made. Do you want to explain how that worked?
Andreas [00:19:44]: I mostly remember that it was really bad. The thing I remember is you entered a question and it would give you back a list of claims. So your question could be, I don't know, how does creatine affect cognition? It would give you back some claims that are to some extent based on papers, but they were often irrelevant. The papers were often irrelevant. And so we ended up soon just printing out a bunch of examples of results and putting them up on the wall so that we would kind of feel the constant shame of having such a bad product and would be incentivized to make it better. And I think over time it has gotten a lot better, but I think the initial version was like really very bad. Yeah.
Jungwon [00:20:20]: But it was basically like a natural language summary of an abstract, like kind of a one sentence summary, and which we still have. And then as we learned kind of more about this systematic review workflow, we started expanding the capability so that you could extract a lot more data from the papers and do more with that.
Swyx [00:20:33]: And were you using like embeddings and cosine similarity, that kind of stuff for retrieval, or was it keyword based?
Andreas [00:20:40]: I think the very first version didn't even have its own search engine. I think the very first version probably used the Semantic Scholar or API or something similar. And only later when we discovered that API is not very semantic, we then built our own search engine that has helped a lot.
Swyx [00:20:58]: And then we're going to go into like more recent products stuff, but like, you know, I think you seem the more sort of startup oriented business person and you seem sort of more ideologically like interested in research, obviously, because of your PhD. What kind of market sizing were you guys thinking? Right? Like, because you're here saying like, we have to double every month. And I'm like, I don't know how you make that conclusion from this, right? Especially also as a nonprofit at the time.
Jungwon [00:21:22]: I mean, market size wise, I felt like in this space where so much was changing and it was very unclear what of today was actually going to be true tomorrow. We just like really rested a lot on very, very simple fundamental principles, which is like, if you can understand the truth, that is very economically beneficial and valuable. If you like know the truth.
Swyx [00:21:42]: On principle.
Jungwon [00:21:43]: Yeah. That's enough for you. Yeah. Research is the key to many breakthroughs that are very commercially valuable.
Swyx [00:21:47]: Because my version of it is students are poor and they don't pay for anything. Right? But that's obviously not true. As you guys have found out. But you had to have some market insight for me to have believed that, but you skipped that.
Andreas [00:21:58]: Yeah. I remember talking to VCs for our seed round. A lot of VCs were like, you know, researchers, they don't have any money. Why don't you build legal assistant? I think in some short sighted way, maybe that's true. But I think in the long run, R&D is such a big space of the economy. I think if you can substantially improve how quickly people find new discoveries or avoid controlled trials that don't go anywhere, I think that's just huge amounts of money. And there are a lot of questions obviously about between here and there. But I think as long as the fundamental principle is there, we were okay with that. And I guess we found some investors who also were. Yeah.
Swyx [00:22:35]: Congrats. I mean, I'm sure we can cover the sort of flip later. I think you're about to start us on like GPT-3 and how that changed things for you. It's funny. I guess every major GPT version, you have some big insight. Yeah.
Jungwon [00:22:48]: Yeah. I mean, what do you think?
Andreas [00:22:51]: I think it's a little bit less true for us than for others, because we always believed that there will basically be human level machine work. And so it is definitely true that in practice for your product, as new models come out, your product starts working better, you can add some features that you couldn't add before. But I don't think we really ever had the moment where we were like, oh, wow, that is super unanticipated. We need to do something entirely different now from what was on the roadmap.
Jungwon [00:23:21]: I think GPT-3 was a big change because it kind of said, oh, now is the time that we can use AI to build these tools. And then GPT-4 was maybe a little bit more of an extension of GPT-3. GPT-3 over GPT-2 was like qualitative level shift. And then GPT-4 was like, okay, great. Now it's like more accurate. We're more accurate on these things. We can answer harder questions. But the shape of the product had already taken place by that time.
Swyx [00:23:44]: I kind of want to ask you about this sort of pivot that you've made. But I guess that was just a way to sell what you were doing, which is you're adding extra features on grouping by concepts. The GPT-4 pivot, quote unquote pivot that you-
Jungwon [00:23:55]: Oh, yeah, yeah, exactly. Right, right, right. Yeah. Yeah. When we launched this workflow, now that GPT-4 was available, basically Elisa was at a place where we have very tabular interfaces. So given a table of papers, you can extract data across all the tables. But you kind of want to take the analysis a step further. Sometimes what you'd care about is not having a list of papers, but a list of arguments, a list of effects, a list of interventions, a list of techniques. And so that's one of the things we're working on is now that you've extracted this information in a more structured way, can you pivot it or group by whatever the information that you extracted to have more insight first information still supported by the academic literature?
Swyx [00:24:33]: Yeah, that was a big revelation when I saw it. Basically, I think I'm very just impressed by how first principles, your ideas around what the workflow is. And I think that's why you're not as reliant on like the LLM improving, because it's actually just about improving the workflow that you would recommend to people. Today we might call it an agent, I don't know, but you're not relying on the LLM to drive it. It's relying on this is the way that Elicit does research. And this is what we think is most effective based on talking to our users.
Jungwon [00:25:01]: The problem space is still huge. Like if it's like this big, we are all still operating at this tiny part, bit of it. So I think about this a lot in the context of moats, people are like, oh, what's your moat? What happens if GPT-5 comes out? It's like, if GPT-5 comes out, there's still like all of this other space that we can go into. So I think being really obsessed with the problem, which is very, very big, has helped us like stay robust and just kind of directly incorporate model improvements and they keep going.
Swyx [00:25:26]: And then I first encountered you guys with Charlie, you can tell us about that project. Basically, yeah. Like how much did cost become a concern as you're working more and more with OpenAI? How do you manage that relationship?
Jungwon [00:25:37]: Let me talk about who Charlie is. And then you can talk about the tech, because Charlie is a special character. So Charlie, when we found him was, had just finished his freshman year at the University of Warwick. And I think he had heard about us on some discord. And then he applied and we were like, wow, who is this freshman? And then we just saw that he had done so many incredible side projects. And we were actually on a team retreat in Barcelona visiting our head of engineering at that time. And everyone was talking about this wonder kid or like this kid. And then on our take home project, he had done like the best of anyone to that point. And so people were just like so excited to hire him. So we hired him as an intern and they were like, Charlie, what if you just dropped out of school? And so then we convinced him to take a year off. And he was just incredibly productive. And I think the thing you're referring to is at the start of 2023, Anthropic kind of launched their constitutional AI paper. And within a few days, I think four days, he had basically implemented that in production. And then we had it in app a week or so after that. And he has since kind of contributed to major improvements, like cutting costs down to a tenth of what they were really large scale. But yeah, you can talk about the technical stuff. Yeah.
Andreas [00:26:39]: On the constitutional AI project, this was for abstract summarization, where in illicit, if you run a query, it'll return papers to you, and then it will summarize each paper with respect to your query for you on the fly. And that's a really important part of illicit because illicit does it so much. If you run a few searches, it'll have done it a few hundred times for you. And so we cared a lot about this both being fast, cheap, and also very low on hallucination. I think if illicit hallucinates something about the abstract, that's really not good. And so what Charlie did in that project was create a constitution that expressed what are the attributes of a good summary? Everything in the summary is reflected in the actual abstract, and it's like very concise, et cetera, et cetera. And then used RLHF with a model that was trained on the constitution to basically fine tune a better summarizer on an open source model. Yeah. I think that might still be in use.
Jungwon [00:27:34]: Yeah. Yeah, definitely. Yeah. I think at the time, the models hadn't been trained at all to be faithful to a text. So they were just generating. So then when you ask them a question, they tried too hard to answer the question and didn't try hard enough to answer the question given the text or answer what the text said about the question. So we had to basically teach the models to do that specific task.
Swyx [00:27:54]: How do you monitor the ongoing performance of your models? Not to get too LLM-opsy, but you are one of the larger, more well-known operations doing NLP at scale. I guess effectively, you have to monitor these things and nobody has a good answer that I talk to.
Andreas [00:28:10]: I don't think we have a good answer yet. I think the answers are actually a little bit clearer on the just kind of basic robustness side of where you can import ideas from normal software engineering and normal kind of DevOps. You're like, well, you need to monitor kind of latencies and response times and uptime and whatnot.
Swyx [00:28:27]: I think when we say performance, it's more about hallucination rate, isn't it?
Andreas [00:28:30]: And then things like hallucination rate where I think there, the really important thing is training time. So we care a lot about having our own internal benchmarks for model development that reflect the distribution of user queries so that we can know ahead of time how well is the model going to perform on different types of tasks. So the tasks being summarization, question answering, given a paper, ranking. And for each of those, we want to know what's the distribution of things the model is going to see so that we can have well-calibrated predictions on how well the model is going to do in production. And I think, yeah, there's some chance that there's distribution shift and actually the things users enter are going to be different. But I think that's much less important than getting the kind of training right and having very high quality, well-vetted data sets at training time.
Jungwon [00:29:18]: I think we also end up effectively monitoring by trying to evaluate new models as they come out. And so that kind of prompts us to go through our eval suite every couple of months. And every time a new model comes out, we have to see how is this performing relative to production and what we currently have.
Swyx [00:29:32]: Yeah. I mean, since we're on this topic, any new models that have really caught your eye this year?
Jungwon [00:29:37]: Like Claude came out with a bunch. Yeah. I think Claude is pretty, I think the team's pretty excited about Claude. Yeah.
Andreas [00:29:41]: Specifically, Claude Haiku is like a good point on the kind of Pareto frontier. It's neither the cheapest model, nor is it the most accurate, most high quality model, but it's just like a really good trade-off between cost and accuracy.
Swyx [00:29:57]: You apparently have to 10-shot it to make it good. I tried using Haiku for summarization, but zero-shot was not great. Then they were like, you know, it's a skill issue, you have to try harder.
Jungwon [00:30:07]: I think GPT-4 unlocked tables for us, processing data from tables, which was huge. GPT-4 Vision.
Andreas [00:30:13]: Yeah.
Swyx [00:30:14]: Yeah. Did you try like Fuyu? I guess you can't try Fuyu because it's non-commercial. That's the adept model.
Jungwon [00:30:19]: Yeah.
Swyx [00:30:20]: We haven't tried that one. Yeah. Yeah. Yeah. But Claude is multimodal as well. Yeah. I think the interesting insight that we got from talking to David Luan, who is CEO of multimodality has effectively two different flavors. One is we recognize images from a camera in the outside natural world. And actually the more important multimodality for knowledge work is screenshots and PDFs and charts and graphs. So we need a new term for that kind of multimodality.
Andreas [00:30:45]: But is the claim that current models are good at one or the other? Yeah.
Swyx [00:30:50]: They're over-indexed because of the history of computer vision is Coco, right? So now we're like, oh, actually, you know, screens are more important, OCR, handwriting. You mentioned a lot of like closed model lab stuff, and then you also have like this open source model fine tuning stuff. Like what is your workload now between closed and open? It's a good question.
Andreas [00:31:07]: I think- Is it half and half? It's a-
Swyx [00:31:10]: Is that even a relevant question or not? Is this a nonsensical question?
Andreas [00:31:13]: It depends a little bit on like how you index, whether you index by like computer cost or number of queries. I'd say like in terms of number of queries, it's maybe similar. In terms of like cost and compute, I think the closed models make up more of the budget since the main cases where you want to use closed models are cases where they're just smarter, where no existing open source models are quite smart enough.
Jungwon [00:31:35]: Yeah. Yeah.
Alessio [00:31:37]: We have a lot of interesting technical questions to go in, but just to wrap the kind of like UX evolution, now you have the notebooks. We talked a lot about how chatbots are not the final frontier, you know? How did you decide to get into notebooks, which is a very iterative kind of like interactive interface and yeah, maybe learnings from that.
Jungwon [00:31:56]: Yeah. This is actually our fourth time trying to make this work. Okay. I think the first time was probably in early 2021. I think because we've always been obsessed with this idea of task decomposition and like branching, we always wanted a tool that could be kind of unbounded where you could keep going, could do a lot of branching where you could kind of apply language model operations or computations on other tasks. So in 2021, we had this thing called composite tasks where you could use GPT-3 to brainstorm a bunch of research questions and then take each research question and decompose those further into sub questions. This kind of, again, that like task decomposition tree type thing was always very exciting to us, but that was like, it didn't work and it was kind of overwhelming. Then at the end of 22, I think we tried again and at that point we were thinking, okay, we've done a lot with this literature review thing. We also want to start helping with kind of adjacent domains and different workflows. Like we want to help more with machine learning. What does that look like? And as we were thinking about it, we're like, well, there are so many research workflows. How do we not just build three new workflows into Elicit, but make Elicit really generic to lots of workflows? What is like a generic composable system with nice abstractions that can like scale to all these workflows? So we like iterated on that a bunch and then didn't quite narrow the problem space enough or like quite get to what we wanted. And then I think it was at the beginning of 2023 where we're like, wow, computational notebooks kind of enable this, where they have a lot of flexibility, but kind of robust primitives such that you can extend the workflow and it's not limited. It's not like you ask a query, you get an answer, you're done. You can just constantly keep building on top of that. And each little step seems like a really good unit of work for the language model. And also there was just like really helpful to have a bit more preexisting work to emulate. Yeah, that's kind of how we ended up at computational notebooks for Elicit.
Andreas [00:33:44]: Maybe one thing that's worth making explicit is the difference between computational notebooks and chat, because on the surface, they seem pretty similar. It's kind of this iterative interaction where you add stuff. In both cases, you have a back and forth between you enter stuff and then you get some output and then you enter stuff. But the important difference in our minds is with notebooks, you can define a process. So in data science, you can be like, here's like my data analysis process that takes in a CSV and then does some extraction and then generates a figure at the end. And you can prototype it using a small CSV and then you can run it over a much larger CSV later. And similarly, the vision for notebooks in our case is to not make it this like one-off chat interaction, but to allow you to then say, if you start and first you're like, okay, let me just analyze a few papers and see, do I get to the correct conclusions for those few papers? Can I then later go back and say, now let me run this over 10,000 papers now that I've debugged the process using a few papers. And that's an interaction that doesn't fit quite as well into the chat framework because that's more for kind of quick back and forth interaction.
Alessio [00:34:49]: Do you think in notebooks, it's kind of like structure, editable chain of thought, basically step by step? Like, is that kind of where you see this going? And then are people going to reuse notebooks as like templates? And maybe in traditional notebooks, it's like cookbooks, right? You share a cookbook, you can start from there. Is this similar in Elizit?
Andreas [00:35:06]: Yeah, that's exactly right. So that's our hope that people will build templates, share them with other people. I think chain of thought is maybe still like kind of one level lower on the abstraction hierarchy than we would think of notebooks. I think we'll probably want to think about more semantic pieces like a building block is more like a paper search or an extraction or a list of concepts. And then the model's detailed reasoning will probably often be one level down. You always want to be able to see it, but you don't always want it to be front and center.
Alessio [00:35:36]: Yeah, what's the difference between a notebook and an agent? Since everybody always asks me, what's an agent? Like how do you think about where the line is?
Andreas [00:35:44]: Yeah, it's an interesting question. In the notebook world, I would generally think of the human as the agent in the first iteration. So you have the notebook and the human kind of adds little action steps. And then the next point on this kind of progress gradient is, okay, now you can use language models to predict which action would you take as a human. And at some point, you're probably going to be very good at this, you'll be like, okay, in some cases I can, with 99.9% accuracy, predict what you do. And then you might as well just execute it, like why wait for the human? And eventually, as you get better at this, that will just look more and more like agents taking actions as opposed to you doing the thing. I think templates are a specific case of this where you're like, okay, well, there's just particular sequences of actions that you often want to chunk and have available as primitives, just like in normal programming. And those, you can view them as action sequences of agents, or you can view them as more normal programming language abstraction thing. And I think those are two valid views. Yeah.
Alessio [00:36:40]: How do you see this change as, like you said, the models get better and you need less and less human actual interfacing with the model, you just get the results? Like how does the UX and the way people perceive it change?
Jungwon [00:36:52]: Yeah, I think this kind of interaction paradigms for evaluation is not really something the internet has encountered yet, because up to now, the internet has all been about getting data and work from people. So increasingly, I really want kind of evaluation, both from an interface perspective and from like a technical perspective and operation perspective to be a superpower for Elicit, because I think over time, models will do more and more of the work, and people will have to do more and more of the evaluation. So I think, yeah, in terms of the interface, some of the things we have today, you know, for every kind of language model generation, there's some citation back, and we kind of try to highlight the ground truth in the paper that is most relevant to whatever Elicit said, and make it super easy so that you can click on it and quickly see in context and validate whether the text actually supports the answer that Elicit gave. So I think we'd probably want to scale things up like that, like the ability to kind of spot check the model's work super quickly, scale up interfaces like that. And-
Swyx [00:37:44]: Who would spot check? The user?
Jungwon [00:37:46]: Yeah, to start, it would be the user. One of the other things we do is also kind of flag the model's uncertainty. So we have models report out, how confident are you that this was the sample size of this study? The model's not sure, we throw a flag. And so the user knows to prioritize checking that. So again, we can kind of scale that up. So when the model's like, well, I searched this on Google, I'm not sure if that was the right thing. I have an uncertainty flag, and the user can go and be like, oh, okay, that was actually the right thing to do or not.
Swyx [00:38:10]: I've tried to do uncertainty readings from models. I don't know if you have this live. You do? Yeah. Because I just didn't find them reliable because they just hallucinated their own uncertainty. I would love to base it on log probs or something more native within the model rather than generated. But okay, it sounds like they scale properly for you. Yeah.
Jungwon [00:38:30]: We found it to be pretty calibrated. It varies on the model.
Andreas [00:38:32]: I think in some cases, we also use two different models for the uncertainty estimates than for the question answering. So one model would say, here's my chain of thought, here's my answer. And then a different type of model. Let's say the first model is Llama, and let's say the second model is GPT-3.5. And then the second model just looks over the results and is like, okay, how confident are you in this? And I think sometimes using a different model can be better than using the same model. Yeah.
Swyx [00:38:58]: On the topic of models, evaluating models, obviously you can do that all day long. What's your budget? Because your queries fan out a lot. And then you have models evaluating models. One person typing in a question can lead to a thousand calls.
Andreas [00:39:11]: It depends on the project. So if the project is basically a systematic review that otherwise human research assistants would do, then the project is basically a human equivalent spend. And the spend can get quite large for those projects. I don't know, let's say $100,000. In those cases, you're happier to spend compute then in the kind of shallow search case where someone just enters a question because, I don't know, maybe I heard about creatine. What's it about? Probably don't want to spend a lot of compute on that. This sort of being able to invest more or less compute into getting more or less accurate answers is I think one of the core things we care about. And that I think is currently undervalued in the AI space. I think currently you can choose which model you want and you can sometimes, I don't know, you'll tip it and it'll try harder or you can try various things to get it to work harder. But you don't have great ways of converting willingness to spend into better answers. And we really want to build a product that has this sort of unbounded flavor where if you care about it a lot, you should be able to get really high quality answers, really double checked in every way.
Alessio [00:40:14]: And you have a credits-based pricing. So unlike most products, it's not a fixed monthly fee.
Jungwon [00:40:19]: Right, exactly. So some of the higher costs are tiered. So for most casual users, they'll just get the abstract summary, which is kind of an open source model. Then you can add more columns, which have more extractions and these uncertainty features. And then you can also add the same columns in high accuracy mode, which also parses the table. So we kind of stack the complexity on the calls.
Swyx [00:40:39]: You know, the fun thing you can do with a credit system, which is data for data, basically you can give people more credits if they give data back to you. I don't know if you've already done that. We've thought about something like this.
Jungwon [00:40:49]: It's like if you don't have money, but you have time, how do you exchange that?
Swyx [00:40:54]: It's a fair trade.
Jungwon [00:40:55]: I think it's interesting. We haven't quite operationalized it. And then, you know, there's been some kind of like adverse selection. Like, you know, for example, it would be really valuable to get feedback on our model. So maybe if you were willing to give more robust feedback on our results, we could give you credits or something like that. But then there's kind of this, will people take it seriously? And you want the good people. Exactly.
Swyx [00:41:11]: Can you tell who are the good people? Not right now.
Jungwon [00:41:13]: But yeah, maybe at the point where we can, we can offer it. We can offer it up to them.
Swyx [00:41:16]: The perplexity of questions asked, you know, if it's higher perplexity, these are the smarter
Jungwon [00:41:20]: people. Yeah, maybe.
Andreas [00:41:23]: If you put typos in your queries, you're not going to get off the stage.
Swyx [00:41:28]: Negative social credit. It's very topical right now to think about the threat of long context windows. All these models that we're talking about these days, all like a million token plus. Is that relevant for you? Can you make use of that? Is that just prohibitively expensive because you're just paying for all those tokens or you're just doing rag?
Andreas [00:41:44]: It's definitely relevant. And when we think about search, as many people do, we think about kind of a staged pipeline of retrieval where first you use semantic search database with embeddings, get like the, in our case, maybe 400 or so most relevant papers. And then, then you still need to rank those. And I think at that point it becomes pretty interesting to use larger models. So specifically in the past, I think a lot of ranking was kind of per item ranking where you would score each individual item, maybe using increasingly expensive scoring methods and then rank based on the scores. But I think list-wise re-ranking where you have a model that can see all the elements is a lot more powerful because often you can only really tell how good a thing is in comparison to other things and what things should come first. It really depends on like, well, what other things that are available, maybe you even care about diversity in your results. You don't want to show 10 very similar papers as the first 10 results. So I think a long context models are quite interesting there. And especially for our case where we care more about power users who are perhaps a little bit more willing to wait a little bit longer to get higher quality results relative to people who just quickly check out things because why not? And I think being able to spend more on longer contexts is quite valuable.
Jungwon [00:42:55]: Yeah. I think one thing the longer context models changed for us is maybe a focus from breaking down tasks to breaking down the evaluation. So before, you know, if we wanted to answer a question from the full text of a paper, we had to figure out how to chunk it and like find the relevant chunk and then answer based on that chunk. And the nice thing was then, you know, kind of which chunk the model used to answer the question. So if you want to help the user track it, yeah, you can be like, well, this was the chunk that the model got. And now if you put the whole text in the paper, you have to like kind of find the chunk like more retroactively basically. And so you need kind of like a different set of abilities and obviously like a different technology to figure out. You still want to point the user to the supporting quotes in the text, but then the interaction is a little different.
Swyx [00:43:38]: You like scan through and find some rouge score floor.
Andreas [00:43:41]: I think there's an interesting space of almost research problems here because you would ideally make causal claims like if this hadn't been in the text, the model wouldn't have said this thing. And maybe you can do expensive approximations to that where like, I don't know, you just throw out chunk of the paper and re-answer and see what happens. But hopefully there are better ways of doing that where you just get that kind of counterfactual information for free from the model.
Alessio [00:44:06]: Do you think at all about the cost of maintaining REG versus just putting more tokens in the window? I think in software development, a lot of times people buy developer productivity things so that we don't have to worry about it. Context window is kind of the same, right? You have to maintain chunking and like REG retrieval and like re-ranking and all of this versus I just shove everything into the context and like it costs a little more, but at least I don't have to do all of that. Is that something you thought about?
Jungwon [00:44:31]: I think we still like hit up against context limits enough that it's not really, do we still want to keep this REG around? It's like we do still need it for the scale of the work that we're doing, yeah.
Andreas [00:44:41]: And I think there are different kinds of maintainability. In one sense, I think you're right that throw everything into the context window thing is easier to maintain because you just can swap out a model. In another sense, if things go wrong, it's harder to debug where like, if you know, here's the process that we go through to go from 200 million papers to an answer. And there are like little steps and you understand, okay, this is the step that finds the relevant paragraph or whatever it may be. You'll know which step breaks if the answers are bad, whereas if it's just like a new model version came out and now it suddenly doesn't find your needle in a haystack anymore, then you're like, okay, what can you do? You're kind of at a loss.
Alessio [00:45:21]: Let's talk a bit about, yeah, needle in a haystack and like maybe the opposite of it, which is like hard grounding. I don't know if that's like the best name to think about it, but I was using one of these chatwitcher documents features and I put the AMD MI300 specs and the new Blackwell chips from NVIDIA and I was asking questions and does the AMD chip support NVLink? And the response was like, oh, it doesn't say in the specs. But if you ask GPD 4 without the docs, it would tell you no, because NVLink it's a NVIDIA technology.
Swyx [00:45:49]: It just says in the thing.
Alessio [00:45:53]: How do you think about that? Does using the context sometimes suppress the knowledge that the model has?
Andreas [00:45:57]: It really depends on the task because I think sometimes that is exactly what you want. So imagine you're a researcher, you're writing the background section of your paper and you're trying to describe what these other papers say. You really don't want extra information to be introduced there. In other cases where you're just trying to figure out the truth and you're giving the documents because you think they will help the model figure out what the truth is. I think you do want, if the model has a hunch that there might be something that's not in the papers, you do want to surface that. I think ideally you still don't want the model to just tell you, probably the ideal thing looks a bit more like agent control where the model can issue a query that then is intended to surface documents that substantiate its hunch. That's maybe a reasonable middle ground between model just telling you and model being fully limited to the papers you give it.
Jungwon [00:46:44]: Yeah, I would say it's, they're just kind of different tasks right now. And the task that Elicit is mostly focused on is what do these papers say? But there's another task which is like, just give me the best possible answer and that give me the best possible answer sometimes depends on what do these papers say, but it can also depend on other stuff that's not in the papers. So ideally we can do both and then kind of do this overall task for you more going forward.
Alessio [00:47:08]: We see a lot of details, but just to zoom back out a little bit, what are maybe the most underrated features of Elicit and what is one thing that maybe the users surprise you the most by using it?
Jungwon [00:47:19]: I think the most powerful feature of Elicit is the ability to extract, add columns to this table, which effectively extracts data from all of your papers at once. It's well used, but there are kind of many different extensions of that that I think users are still discovering. So one is we let you give a description of the column. We let you give instructions of a column. We let you create custom columns. So we have like 30 plus predefined fields that users can extract, like what were the methods? What were the main findings? How many people were studied? And we actually show you basically the prompts that we're using to extract that from our predefined fields. And then you can fork this and you can say, oh, actually I don't care about the population of people. I only care about the population of rats. Like you can change the instruction. So I think users are still kind of discovering that there's both this predefined, easy to use default, but that they can extend it to be much more specific to them. And then they can also ask custom questions. One use case of that is you can start to create different column types that you might not expect. So instead of just creating generative answers, like a description of the methodology, you can say classify the methodology into a prospective study, a retrospective study, or a case study. And then you can filter based on that. It's like all using the same kind of technology and the interface, but it unlocks different workflows. So I think that the ability to ask custom questions, give instructions, and specifically use that to create different types of columns, like classification columns, is still pretty underrated. In terms of use case, I spoke to someone who works in medical affairs at a genomic sequencing company recently. So doctors kind of order these genomic tests, these sequencing tests, to kind of identify if a patient has a particular disease. This company helps them process it. And this person basically interacts with all the doctors and if the doctors have any questions. My understanding is that medical affairs is kind of like customer support or customer success in pharma. So this person like talks to doctors all day long. One of the things they started using Elicit for is like putting the results of their tests as the query. Like this test showed, you know, this percentage presence of this and 40% that and whatever, you know, what genes are present here or what's in this sample. And getting kind of a list of academic papers that would support their findings and using this to help doctors interpret their tests. So we talked about, okay, cool, like if we built, he's pretty interested in kind of doing a survey of infectious disease specialists and getting them to evaluate, you know, having them write up their answers, comparing it to Elicit's answers, trying to see can Elicit start being used to interpret the results of these diagnostic tests. Because the way they ship these tests to doctors is they report on a really wide array of things. He was saying that at a large, well-resourced hospital, like a city hospital, there might be a team of infectious disease specialists who can help interpret these results. But at under-resourced hospitals or more rural hospitals, the primary care physician can't interpret the test results, so then they can't order it, they can't use it, they can't help their patients with it. So thinking about an evidence-backed way of interpreting these tests is definitely kind of an extension of the product that I hadn't considered before. But yeah, the idea of using that to bring more access to physicians in all different parts of the country and helping them interpret complicated science is pretty cool.
Alessio [00:50:28]: Yeah. We had Kanjun from Imbue on the podcast and we talked about better allocating scientific resources. How do you think about these use cases and maybe how illicit can help drive more research? And do you see a world in which maybe the models actually do some of the research before suggesting us?
Andreas [00:50:45]: Yeah, I think that's very close to what we care about. Our product values are systematic, transparent, and unbounded. And I think to make research especially more systematic and unbounded, I think is basically the thing that's at stake here. So for example, I was recently talking to people in longevity and I think there isn't really one field of longevity, there are kind of different scientific subdomains that are surfacing various things that are related to longevity. And I think if you could more systematically say, look, here are all the different interventions we could do and here's the expected ROI of these experiments. Here's like the evidence so far that supports those being either likely to surface new information or not. Here's the cost of these experiments. I think you could be so much more systematic than science is today. I'd guess in like 10, 20 years we'll look back and it will be incredible how unsystematic science was back in the day.
Jungwon [00:51:35]: Our view is kind of have models catch up to expert humans today. Start with kind of novice humans and then increasingly expert humans. But we really want the models to earn their right to the expertise. So that's why we do things in this very step-by-step way. That's why we don't just like throw a bunch of data and apply a bunch of compute and hope we get good results. But obviously at some point you hope that once it's kind of earned its stripes, it can surpass human researchers. But I think that's where making sure that the model's processes are really explicit and transparent and that it's really easy to evaluate is important because if it does surpass human understanding, people will still need to be able to audit its work somehow or spot check its work somehow to be able to reliably trust it and use it. So yeah, that's kind of why the process-based approach is really important.
Andreas [00:52:20]: And on the question of will models do their own research, I think one feature that most currently don't have that will need to be better there is better world models. I think currently models are just not great at representing what's going on in a particular situation or domain in a way that allows them to come to interesting, surprising conclusions. I think they're very good at coming to conclusions that are nearby to conclusions that people have come to. They're not as good at kind of reasoning and making surprising connections maybe. And so having deeper models of what are the underlying structures of different domains, how they're related or not related, I think will be an important ingredient for models actually being able to make novel contributions.
Swyx [00:53:00]: On the topic of hiring more expert humans, you've hired some very expert humans. My friend Maggie Appleton joined you guys I think maybe a year ago-ish. In fact, I think you're doing an offsite and we're actually organizing our biggest AI UX meetup around whenever she's in town in San Francisco. How big is the team? How have you sort of transitioned your company into this sort of PBC and sort of the plan for the future?
Jungwon [00:53:21]: Yeah, we're 12 people now. About half of us are in the Bay Area and then distributed across US and Europe, a mix of mostly kind of roles in engineering and product. Yeah, and I think that the transition to PBC was really not that eventful because I think we're already, even as a nonprofit, we are already shipping every week, so very much operating as a product. Very much at the start, yeah. Yeah. And then I would say the kind of PBC component was to very explicitly say that we have a mission that we care a lot about. There are a lot of ways to make money. We think our mission will make us a lot of money, but we are going to be opinionated about how we make money. We're going to take the version of making a lot of money that's in line with our mission. But it's like all very convergent. Like illicit is not going to make any money if it's a bad product, if it doesn't actually help you discover truth and do research more rigorously. So I think for us, the kind of mission and the success of the company are very intertwined. We're hoping to grow the team quite a lot this year. Probably some of our highest priority roles are in engineering, but also opening up roles more in design and product marketing, go to market. Yeah. Do you want to talk about the roles?
Andreas [00:54:23]: Yeah. Broadly, we're just looking for senior software engineers and don't need any particular AI expertise. A lot of it is just how do you build good orchestration for complex tasks? So we talked earlier about these are sort of notebooks, scaling up, task orchestration. And I think a lot of this looks more like traditional software engineering than it does look like machine learning research. And I think the people who are really good at building good abstractions, building applications that can kind of survive, even if some of their pieces break, like making reliable components out of unreliable pieces. I think those are the people that we're looking for.
Swyx [00:54:57]: You know, that's exactly what I used to do. Have you explored the existing orchestration frameworks, Temporal, Airflow, Daxter, Prefect?
Andreas [00:55:05]: We've looked into them a little bit. I think we have some specific requirements around being able to stream work back very quickly to our users. Those could definitely be relevant. Okay.
Swyx [00:55:15]: Well, you're hiring. I'm sure we'll plug all the links. Thank you so much for coming. Any parting words? Any words of wisdom? Models do you live by?
Jungwon [00:55:22]: I think it's a really important time for humanity. So I hope everyone listening to this podcast can think hard about exactly how they want to participate in this story. There's so much to build and we can be really intentional about what we align ourselves with. There are a lot of applications that are going to be really good for the world and a lot of applications that are not. And so, yeah, I hope people can take that seriously and kind of seize the moment. Yeah.
Swyx [00:55:46]: I love how intentional you guys have been. Thank you for sharing that story.
Jungwon [00:55:49]: Thank you. Yeah.
Andreas [00:55:51]: Thank you for coming on.
Jungwon [00:56:17]: Yeah. Thank you.

Get full access to Latent Space at www.latent.space/subscribe

11 April 2024, 8:15 pm
2 hours 45 minutes

Latent Space Chats: NLW (Four Wars, GPT5), Josh Albrecht/Ali Rohde (TNAI), Dylan Patel/Semianalysis (Groq), Milind Naphade (Nvidia GTC), Personal AI (ft. Harrison Chase — LangFriend/LangMem)

Our next 2 big events are AI UX and the World’s Fair. Join and apply to speak/sponsor!
Due to timing issues we didn’t have an interview episode to share with you this week, but not to worry, we have more than enough “weekend special” content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI.
Enjoy!
AI Breakdown
The indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape:
and a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta’s AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents:
Thursday Nights in AI
We’re also including swyx’s interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A:
Dylan Patel on Groq
We hosted a private event with Dylan Patel of SemiAnalysis (our last pod here):
Not all of it could be released so we just talked about our Groq estimates:
Milind Naphade - Capital One
In relation to conversations at NeurIPS and Nvidia GTC and upcoming at World’s Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One. We covered:
* Milind’s learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special
If you’d like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.
Personal AI Meetup
It all started with a meme:
Within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together a the world’s first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.
Timestamps
* [00:01:13] AI Breakdown Part 1
* [00:02:20] Four Wars
* [00:13:45] Sora
* [00:15:12] Suno
* [00:16:34] The GPT-4 Class Landscape
* [00:17:03] Data War: Reddit x Google
* [00:21:53] Gemini 1.5 vs Claude 3
* [00:26:58] AI Breakdown Part 2
* [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
* [00:31:11] Open Source Models - Mistral, Grok
* [00:34:13] Apple MM1
* [00:37:33] Meta's $800b AI rebrand
* [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
* [00:47:28] Adept episode - Screen Multimodality
* [00:48:54] Top Model Research from January Recap
* [00:53:08] AI Wearables
* [00:57:26] Groq vs Nvidia month - GPU Chip War
* [01:00:31] Disagreements
* [01:02:08] Summer 2024 Predictions
* [01:04:18] Thursday Nights in AI - swyx
* [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
* [01:34:58] Groq
Transcript
[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI. In today's episode, we're collating some of Swyx and Alessio's recent appearances, all in one place for you to find.
[00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, Swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, Thursday Eye, and Chinatalk, all of which you can find in the Latentspace About page.
[00:00:56] swyx: NLW of the AI Breakdown asked us back to do a special on the 4Wars framework and the AI engineer scene. We love AI Breakdown as one of the best examples Daily podcasts to keep up on AI news, so we were especially excited to be back on Watch out and take
[00:01:12] NLW: care
[00:01:13] AI Breakdown Part 1
[00:01:13] NLW: today on the AI breakdown. Part one of my conversation with Alessio and Swix from Latent Space.
[00:01:19] NLW: All right, fellas, welcome back to the AI Breakdown. How are you doing? I'm good. Very good. With the last, the last time we did this show, we were like, oh yeah, let's do check ins like monthly about all the things that are going on and then. Of course, six months later, and, you know, the, the, the world has changed in a thousand ways.
[00:01:36] NLW: It's just, it's too busy to even, to even think about podcasting sometimes. But I, I'm super excited to, to be chatting with you again. I think there's, there's a lot to, to catch up on, just to tap in, I think in the, you know, in the beginning of 2024. And, and so, you know, we're gonna talk today about just kind of a, a, a broad sense of where things are in some of the key battles in the AI space.
[00:01:55] NLW: And then the, you know, one of the big things that I, that I'm really excited to have you guys on here for us to talk about where, sort of what patterns you're seeing and what people are actually trying to build, you know, where, where developers are spending their, their time and energy and, and, and any sort of, you know, trend trends there, but maybe let's start I guess by checking in on a framework that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is this sort of four wars of the, of the AI stack.
[00:02:20] Four Wars
[00:02:20] NLW: Because first, since I have you here, I'd love, I'd love to hear sort of like where that started gelling. And then and then maybe we can get into, I think a couple of them that are you know, particularly interesting, you know, in the, in light of
[00:02:30] swyx: some recent news. Yeah, so maybe I'll take this one. So the four wars is a framework that I came up around trying to recap all of 2023.
[00:02:38] swyx: I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes one piece of news last longer than another or more significant than another. And I think it's basically always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know, the most limited resource is talent, but the talent expresses itself in a number of areas.
[00:03:01] swyx: And so I kind of focus on those, those areas at first. So the four wars that we cover are the data wars, the GPU rich, poor war, the multi modal war, And the RAG and Ops War. And I think you actually did a dedicated episode to that, so thanks for covering that. Yeah, yeah.
[00:03:18] NLW: Not only did I do a dedicated episode, I actually used that.
[00:03:22] NLW: I can't remember if I told you guys. I did give you big shoutouts. But I used it as a framework for a presentation at Intel's big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That's amazing. Yeah, so, so, what got me thinking about it again is specifically this inflection news that we recently had, this sort of, you know, basically, I can't imagine that anyone who's listening wouldn't have thought about it, but, you know, inflection is a one of the big contenders, right?
[00:03:53] NLW: I think probably most folks would have put them, you know, just a half step behind the anthropics and open AIs of the world in terms of labs, but it's a company that raised 1. 3 billion last year, less than a year ago. Reed Hoffman's a co founder Mustafa Suleyman, who's a co founder of DeepMind, you know, so it's like, this is not a a small startup, let's say, at least in terms of perception.
[00:04:13] NLW: And then we get the news that basically most of the team, it appears, is heading over to Microsoft and they're bringing in a new CEO. And you know, I'm interested in, in, in kind of your take on how much that reflects, like hold aside, I guess, you know, all the other things that it might be about, how much it reflects this sort of the, the stark.
[00:04:32] NLW: Brutal reality of competing in the frontier model space right now. And, you know, just the access to compute.
[00:04:38] Alessio: There are a lot of things to say. So first of all, there's always somebody who's more GPU rich than you. So inflection is GPU rich by startup standard. I think about 22, 000 H100s, but obviously that pales compared to the, to Microsoft.
[00:04:55] Alessio: The other thing is that this is probably good news, maybe for the startups. It's like being GPU rich, it's not enough. You know, like I think they were building something pretty interesting in, in pi of their own model of their own kind of experience. But at the end of the day, you're the interface that people consume as end users.
[00:05:13] Alessio: It's really similar to a lot of the others. So and we'll tell, talk about GPT four and cloud tree and all this stuff. GPU poor, doing something. That the GPU rich are not interested in, you know we just had our AI center of excellence at Decibel and one of the AI leads at one of the big companies was like, Oh, we just saved 10 million and we use these models to do a translation, you know, and that's it.
[00:05:39] Alessio: It's not, it's not a GI, it's just translation. So I think like the inflection part is maybe. A calling and a waking to a lot of startups then say, Hey, you know, trying to get as much capital as possible, try and get as many GPUs as possible. Good. But at the end of the day, it doesn't build a business, you know, and maybe what inflection I don't, I don't, again, I don't know the reasons behind the inflection choice, but if you say, I don't want to build my own company that has 1.
[00:06:05] Alessio: 3 billion and I want to go do it at Microsoft, it's probably not a resources problem. It's more of strategic decisions that you're making as a company. So yeah, that was kind of my. I take on it.
[00:06:15] swyx: Yeah, and I guess on my end, two things actually happened yesterday. It was a little bit quieter news, but Stability AI had some pretty major departures as well.
[00:06:25] swyx: And you may not be considering it, but Stability is actually also a GPU rich company in the sense that they were the first new startup in this AI wave to brag about how many GPUs that they have. And you should join them. And you know, Imadis is definitely a GPU trader in some sense from his hedge fund days.
[00:06:43] swyx: So Robin Rhombach and like the most of the Stable Diffusion 3 people left Stability yesterday as well. So yesterday was kind of like a big news day for the GPU rich companies, both Inflection and Stability having sort of wind taken out of their sails. I think, yes, it's a data point in the favor of Like, just because you have the GPUs doesn't mean you can, you automatically win.
[00:07:03] swyx: And I think, you know, kind of I'll echo what Alessio says there. But in general also, like, I wonder if this is like the start of a major consolidation wave, just in terms of, you know, I think that there was a lot of funding last year and, you know, the business models have not been, you know, All of these things worked out very well.
[00:07:19] swyx: Even inflection couldn't do it. And so I think maybe that's the start of a small consolidation wave. I don't think that's like a sign of AI winter. I keep looking for AI winter coming. I think this is kind of like a brief cold front. Yeah,
[00:07:34] NLW: it's super interesting. So I think a bunch of A bunch of stuff here.
[00:07:38] NLW: One is, I think, to both of your points, there, in some ways, there, there had already been this very clear demarcation between these two sides where, like, the GPU pores, to use the terminology, like, just weren't trying to compete on the same level, right? You know, the vast majority of people who have started something over the last year, year and a half, call it, were racing in a different direction.
[00:07:59] NLW: They're trying to find some edge somewhere else. They're trying to build something different. If they're, if they're really trying to innovate, it's in different areas. And so it's really just this very small handful of companies that are in this like very, you know, it's like the coheres and jaspers of the world that like this sort of, you know, that are that are just sort of a little bit less resourced than, you know, than the other set that I think that this potentially even applies to, you know, everyone else that could clearly demarcate it into these two, two sides.
[00:08:26] NLW: And there's only a small handful kind of sitting uncomfortably in the middle, perhaps. Let's, let's come back to the idea of, of the sort of AI winter or, you know, a cold front or anything like that. So this is something that I, I spent a lot of time kind of thinking about and noticing. And my perception is that The vast majority of the folks who are trying to call for sort of, you know, a trough of disillusionment or, you know, a shifting of the phase to that are people who either, A, just don't like AI for some other reason there's plenty of that, you know, people who are saying, You Look, they're doing way worse than they ever thought.
[00:09:03] NLW: You know, there's a lot of sort of confirmation bias kind of thing going on. Or two, media that just needs a different narrative, right? Because they're sort of sick of, you know, telling the same story. Same thing happened last summer, when every every outlet jumped on the chat GPT at its first down month story to try to really like kind of hammer this idea that that the hype was too much.
[00:09:24] NLW: Meanwhile, you have, you know, just ridiculous levels of investment from enterprises, you know, coming in. You have, you know, huge, huge volumes of, you know, individual behavior change happening. But I do think that there's nothing incoherent sort of to your point, Swyx, about that and the consolidation period.
[00:09:42] NLW: Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible, like, build your own chatbot. platforms that, you know, a lot of which have, you know, raised funding. There's no universe in which all of those are successful across, you know, even with a, even, even with a total addressable market of every enterprise in the world, you know, you're just inevitably going to see some amount of consolidation.
[00:10:08] NLW: Same with, you know, image generators. There are, if you look at A16Z's top 50 consumer AI apps, just based on, you know, web traffic or whatever, they're still like I don't know, a half. Dozen or 10 or something, like, some ridiculous number of like, basically things like Midjourney or Dolly three. And it just seems impossible that we're gonna have that many, you know, ultimately as, as, as sort of, you know, going, going concerned.
[00:10:33] NLW: So, I don't know. I, I, I think that the, there will be inevitable consolidation 'cause you know. It's, it's also what kind of like venture rounds are supposed to do. You're not, not everyone who gets a seed round is supposed to get to series A and not everyone who gets a series A is supposed to get to series B.
[00:10:46] NLW: That's sort of the natural process. I think it will be tempting for a lot of people to try to infer from that something about AI not being as sort of big or as as sort of relevant as, as it was hyped up to be. But I, I kind of think that's the wrong conclusion to come to.
[00:11:02] Alessio: I I would say the experimentation.
[00:11:04] Alessio: Surface is a little smaller for image generation. So if you go back maybe six, nine months, most people will tell you, why would you build a coding assistant when like Copilot and GitHub are just going to win everything because they have the data and they have all the stuff. If you fast forward today, A lot of people use Cursor everybody was excited about the Devin release on Twitter.
[00:11:26] Alessio: There are a lot of different ways of attacking the market that are not completion of code in the IDE. And even Cursors, like they evolved beyond single line to like chat, to do multi line edits and, and all that stuff. Image generation, I would say, yeah, as a, just as from what I've seen, like maybe the product innovation has slowed down at the UX level and people are improving the models.
[00:11:50] Alessio: So the race is like, how do I make better images? It's not like, how do I make the user interact with the generation process better? And that gets tough, you know? It's hard to like really differentiate yourselves. So yeah, that's kind of how I look at it. And when we think about multimodality, maybe the reason why people got so excited about Sora is like, oh, this is like a completely It's not a better image model.
[00:12:13] Alessio: This is like a completely different thing, you know? And I think the creative mind It's always looking for something that impacts the viewer in a different way, you know, like they really want something different versus the developer mind. It's like, Oh, I, I just, I have this like very annoying thing I want better.
[00:12:32] Alessio: I have this like very specific use cases that I want to go after. So it's just different. And that's why you see a lot more companies in image generation. But I agree with you that. If you fast forward there, there's not going to be 10 of them, you know, it's probably going to be one or
[00:12:46] swyx: two. Yeah, I mean, to me, that's why I call it a war.
[00:12:49] swyx: Like, individually, all these companies can make a story that kind of makes sense, but collectively, they cannot all be true. Therefore, they all, there is some kind of fight over limited resources here. Yeah, so
[00:12:59] NLW: it's interesting. We wandered very naturally into sort of another one of these wars, which is the multimodality kind of idea, which is, you know, basically a question of whether it's going to be these sort of big everything models that end up winning or whether, you know, you're going to have really specific things, you know, like something, you know, Dolly 3 inside of sort of OpenAI's larger models versus, you know, a mid journey or something like that.
[00:13:24] NLW: And at first, you know, I was kind of thinking like, For most of the last, call it six months or whatever, it feels pretty definitively both and in some ways, you know, and that you're, you're seeing just like great innovation on sort of the everything models, but you're also seeing lots and lots happen at sort of the level of kind of individual use cases.
[00:13:45] Sora
[00:13:45] NLW: But then Sora comes along and just like obliterates what I think anyone thought you know, where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?
[00:13:59] swyx: Yeah, this was definitely a both and story, and Sora tipped things one way for me, in terms of scale being all you need.
[00:14:08] swyx: And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren't aware that Sora was developed in a similar fashion to Dolly 3. And Dolly3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT 4 vision and GPT 4.
[00:14:31] swyx: And, and it was just all, like, really interesting, like, if you work on one modality, it enables you to work on other modalities, and all that is more, is, is more interesting. I think it's beneficial if it's all in the same house, whereas the individual startups who don't, who sort of carve out a single modality and work on that, definitely won't have the state of the art stuff on helping them out on synthetic data.
[00:14:52] swyx: So I do think like, The balance is tilted a little bit towards the God model companies, which is challenging for the, for the, for the the sort of dedicated modality companies. But everyone's carving out different niches. You know, like we just interviewed Suno ai, the sort of music model company, and, you know, I don't see opening AI pursuing music anytime soon.
[00:15:12] Suno
[00:15:12] swyx: Yeah,
[00:15:13] NLW: Suno's been phenomenal to play with. Suno has done that rare thing where, which I think a number of different AI product categories have done, where people who don't consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right?
[00:15:29] NLW: Like, it'd be one thing if Just musicians were excited about Suno and using it but what you're seeing is tons of people who just like music all of a sudden like playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of like the highest compliment that you can give one of these startups at the
[00:15:45] swyx: early days of it.
[00:15:46] swyx: Yeah, I, you know, I, I asked them directly, you know, in the interview about whether they consider themselves mid journey for music. And he had a more sort of nuanced response there, but I think that probably the business model is going to be very similar because he's focused on the B2C element of that. So yeah, I mean, you know, just to, just to tie back to the question about, you know, You know, large multi modality companies versus small dedicated modality companies.
[00:16:10] swyx: Yeah, highly recommend people to read the Sora blog posts and then read through to the Dali blog posts because they, they strongly correlated themselves with the same synthetic data bootstrapping methods as Dali. And I think once you make those connections, you're like, oh, like it, it, it is beneficial to have multiple state of the art models in house that all help each other.
[00:16:28] swyx: And these, this, that's the one thing that a dedicated modality company cannot do.
[00:16:34] The GPT-4 Class Landscape
[00:16:34] NLW: So I, I wanna jump, I wanna kind of build off that and, and move into the sort of like updated GPT-4 class landscape. 'cause that's obviously been another big change over the last couple months. But for the sake of completeness, is there anything that's worth touching on with with sort of the quality?
[00:16:46] NLW: Quality data or sort of a rag ops wars just in terms of, you know, anything that's changed, I guess, for you fundamentally in the last couple of months about where those things stand.
[00:16:55] swyx: So I think we're going to talk about rag for the Gemini and Clouds discussion later. And so maybe briefly discuss the data piece.
[00:17:03] Data War: Reddit x Google
[00:17:03] swyx: I think maybe the only new thing was this Reddit deal with Google for like a 60 million dollar deal just ahead of their IPO, very conveniently turning Reddit into a AI data company. Also, very, very interestingly, a non exclusive deal, meaning that Reddit can resell that data to someone else. And it probably does become table stakes.
[00:17:23] swyx: A lot of people don't know, but a lot of the web text dataset that originally started for GPT 1, 2, and 3 was actually scraped from GitHub. from Reddit at least the sort of vote scores. And I think, I think that's a, that's a very valuable piece of information. So like, yeah, I think people are figuring out how to pay for data.
[00:17:40] swyx: People are suing each other over data. This, this, this war is, you know, definitely very, very much heating up. And I don't think, I don't see it getting any less intense. I, you know, next to GPUs, data is going to be the most expensive thing in, in a model stack company. And. You know, a lot of people are resorting to synthetic versions of it, which may or may not be kosher based on how far along or how commercially blessed the, the forms of creating that synthetic data are.
[00:18:11] swyx: I don't know if Alessio, you have any other interactions with like Data source companies, but that's my two cents.
[00:18:17] Alessio: Yeah yeah, I actually saw Quentin Anthony from Luther. ai at GTC this week. He's also been working on this. I saw Technium. He's also been working on the data side. I think especially in open source, people are like, okay, if everybody is putting the gates up, so to speak, to the data we need to make it easier for people that don't have 50 million a year to get access to good data sets.
[00:18:38] Alessio: And Jensen, at his keynote, he did talk about synthetic data a little bit. So I think that's something that we'll definitely hear more and more of in the enterprise, which never bodes well, because then all the, all the people with the data are like, Oh, the enterprises want to pay now? Let me, let me put a pay here stripe link so that they can give me 50 million.
[00:18:57] Alessio: But it worked for Reddit. I think the stock is up. 40 percent today after opening. So yeah, I don't know if it's all about the Google deal, but it's obviously Reddit has been one of those companies where, hey, you got all this like great community, but like, how are you going to make money? And like, they try to sell the avatars.
[00:19:15] Alessio: I don't know if that it's a great business for them. The, the data part sounds as an investor, you know, the data part sounds a lot more interesting than, than consumer
[00:19:25] swyx: cosmetics. Yeah, so I think, you know there's more questions around data you know, I think a lot of people are talking about the interview that Mira Murady did with the Wall Street Journal, where she, like, just basically had no, had no good answer for where they got the data for Sora.
[00:19:39] swyx: I, I think this is where, you know, there's, it's in nobody's interest to be transparent about data, and it's, it's kind of sad for the state of ML and the state of AI research but it is what it is. We, we have to figure this out as a society, just like we did for music and music sharing. You know, in, in sort of the Napster to Spotify transition, and that might take us a decade.
[00:19:59] swyx: Yeah, I
[00:20:00] NLW: do. I, I agree. I think, I think that you're right to identify it, not just as that sort of technical problem, but as one where society has to have a debate with itself. Because I think that there's, if you rationally within it, there's Great kind of points on all side, not to be the sort of, you know, person who sits in the middle constantly, but it's why I think a lot of these legal decisions are going to be really important because, you know, the job of judges is to listen to all this stuff and try to come to things and then have other judges disagree.
[00:20:24] NLW: And, you know, and have the rest of us all debate at the same time. By the way, as a total aside, I feel like the synthetic data right now is like eggs in the 80s and 90s. Like, whether they're good for you or bad for you, like, you know, we, we get one study that's like synthetic data, you know, there's model collapse.
[00:20:42] NLW: And then we have like a hint that llama, you know, to the most high performance version of it, which was one they didn't release was trained on synthetic data. So maybe it's good. It's like, I just feel like every, every other week I'm seeing something sort of different about whether it's a good or bad for, for these models.
[00:20:56] swyx: Yeah. The branding of this is pretty poor. I would kind of tell people to think about it like cholesterol. There's good cholesterol, bad cholesterol. And you can have, you know, good amounts of both. But at this point, it is absolutely without a doubt that most large models from here on out will all be trained as some kind of synthetic data and that is not a bad thing.
[00:21:16] swyx: There are ways in which you can do it poorly. Whether it's commercial, you know, in terms of commercial sourcing or in terms of the model performance. But it's without a doubt that good synthetic data is going to help your model. And this is just a question of like where to obtain it and what kinds of synthetic data are valuable.
[00:21:36] swyx: You know, if even like alpha geometry, you know, was, was a really good example from like earlier this year.
[00:21:42] NLW: If you're using the cholesterol analogy, then my, then my egg thing can't be that far off. Let's talk about the sort of the state of the art and the, and the GPT 4 class landscape and how that's changed.
[00:21:53] Gemini 1.5 vs Claude 3
[00:21:53] NLW: Cause obviously, you know, sort of the, the two big things or a couple of the big things that have happened. Since we last talked, we're one, you know, Gemini first announcing that a model was coming and then finally it arriving, and then very soon after a sort of a different model arriving from Gemini and and Cloud three.
[00:22:11] NLW: So I guess, you know, I'm not sure exactly where the right place to start with this conversation is, but, you know, maybe very broadly speaking which of these do you think have made a bigger impact? Thank you.
[00:22:20] Alessio: Probably the one you can use, right? So, Cloud. Well, I'm sure Gemini is going to be great once they let me in, but so far I haven't been able to.
[00:22:29] Alessio: I use, so I have this small podcaster thing that I built for our podcast, which does chapters creation, like named entity recognition, summarization, and all of that. Cloud Tree is, Better than GPT 4. Cloud2 was unusable. So I use GPT 4 for everything. And then when Opus came out, I tried them again side by side and I posted it on, on Twitter as well.
[00:22:53] Alessio: Cloud is better. It's very good, you know, it's much better, it seems to me, it's much better than GPT 4 at doing writing that is more, you know, I don't know, it just got good vibes, you know, like the GPT 4 text, you can tell it's like GPT 4, you know, it's like, it always uses certain types of words and phrases and, you know, maybe it's just me because I've now done it for, you know, So, I've read like 75, 80 generations of these things next to each other.
[00:23:21] Alessio: Clutter is really good. I know everybody is freaking out on twitter about it, my only experience of this is much better has been on the podcast use case. But I know that, you know, Quran from from News Research is a very big opus pro, pro opus person. So, I think that's also It's great to have people that actually care about other models.
[00:23:40] Alessio: You know, I think so far to a lot of people, maybe Entropic has been the sibling in the corner, you know, it's like Cloud releases a new model and then OpenAI releases Sora and like, you know, there are like all these different things, but yeah, the new models are good. It's interesting.
[00:23:55] NLW: My my perception is definitely that just, just observationally, Cloud 3 is certainly the first thing that I've seen where lots of people.
[00:24:06] NLW: They're, no one's debating evals or anything like that. They're talking about the specific use cases that they have, that they used to use chat GPT for every day, you know, day in, day out, that they've now just switched over. And that has, I think, shifted a lot of the sort of like vibe and sentiment in the space too.
[00:24:26] NLW: And I don't necessarily think that it's sort of a A like full you know, sort of full knock. Let's put it this way. I think it's less bad for open AI than it is good for anthropic. I think that because GPT 5 isn't there, people are not quite willing to sort of like, you know get overly critical of, of open AI, except in so far as they're wondering where GPT 5 is.
[00:24:46] NLW: But I do think that it makes, Anthropic look way more credible as a, as a, as a player, as a, you know, as a credible sort of player, you know, as opposed to to, to where they were.
[00:24:57] Alessio: Yeah. And I would say the benchmarks veil is probably getting lifted this year. I think last year. People were like, okay, this is better than this on this benchmark, blah, blah, blah, because maybe they did not have a lot of use cases that they did frequently.
[00:25:11] Alessio: So it's hard to like compare yourself. So you, you defer to the benchmarks. I think now as we go into 2024, a lot of people have started to use these models from, you know, from very sophisticated things that they run in production to some utility that they have on their own. Now they can just run them side by side.
[00:25:29] Alessio: And it's like, Hey, I don't care that like. The MMLU score of Opus is like slightly lower than GPT 4. It just works for me, you know, and I think that's the same way that traditional software has been used by people, right? Like you just strive for yourself and like, which one does it work, works best for you?
[00:25:48] Alessio: Like nobody looks at benchmarks outside of like sales white papers, you know? And I think it's great that we're going more in that direction. We have a episode with Adapt coming out this weekend. I'll and some of their model releases, they specifically say, We do not care about benchmarks, so we didn't put them in, you know, because we, we don't want to look good on them.
[00:26:06] Alessio: We just want the product to work. And I think more and more people will, will
[00:26:09] swyx: go that way. Yeah. I I would say like, it does take the wind out of the sails for GPT 5, which I know where, you know, Curious about later on. I think anytime you put out a new state of the art model, you have to break through in some way.
[00:26:21] swyx: And what Claude and Gemini have done is effectively take away any advantage to saying that you have a million token context window. Now everyone's just going to be like, Oh, okay. Now you just match the other two guys. And so that puts An insane amount of pressure on what gpt5 is going to be because it's just going to have like the only option it has now because all the other models are multimodal all the other models are long context all the other models have perfect recall gpt5 has to match everything and do more to to not be a flop
[00:26:58] AI Breakdown Part 2
[00:26:58] NLW: hello friends back again with part two if you haven't heard part one of this conversation i suggest you go check it out but to be honest they are kind of actually separable In this conversation, we get into a topic that I think Alessio and Swyx are very well positioned to discuss, which is what developers care about right now, what people are trying to build around.
[00:27:16] NLW: I honestly think that one of the best ways to see the future in an industry like AI is to try to dig deep on what developers and entrepreneurs are attracted to build, even if it hasn't made it to the news pages yet. So consider this your preview of six months from now, and let's dive in. Let's bring it to the GPT 5 conversation.
[00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
[00:27:33] NLW: I mean, so, so I think that that's a great sort of assessment of just how the stakes have been raised, you know is your, I mean, so I guess maybe, maybe I'll, I'll frame this less as a question, just sort of something that, that I, that I've been watching right now, the only thing that makes sense to me with how.
[00:27:50] NLW: Fundamentally unbothered and unstressed OpenAI seems about everything is that they're sitting on something that does meet all that criteria, right? Because, I mean, even in the Lex Friedman interview that, that Altman recently did, you know, he's talking about other things coming out first. He's talking about, he's just like, he, listen, he, he's good and he could play nonchalant, you know, if he wanted to.
[00:28:13] NLW: So I don't want to read too much into it, but. You know, they've had so long to work on this, like unless that we are like really meaningfully running up against some constraint, it just feels like, you know, there's going to be some massive increase, but I don't know. What do you guys think?
[00:28:28] swyx: Hard to speculate.
[00:28:29] swyx: You know, at this point, they're, they're pretty good at PR and they're not going to tell you anything that they don't want to. And he can tell you one thing and change their minds the next day. So it's, it's, it's really, you know, I've always said that model version numbers are just marketing exercises, like they have something and it's always improving and at some point you just cut it and decide to call it GPT 5.
[00:28:50] swyx: And it's more just about defining an arbitrary level at which they're ready and it's up to them on what ready means. We definitely did see some leaks on GPT 4. 5, as I think a lot of people reported and I'm not sure if you covered it. So it seems like there might be an intermediate release. But I did feel, coming out of the Lex Friedman interview, that GPT 5 was nowhere near.
[00:29:11] swyx: And you know, it was kind of a sharp contrast to Sam talking at Davos in February, saying that, you know, it was his top priority. So I find it hard to square. And honestly, like, there's also no point Reading too much tea leaves into what any one person says about something that hasn't happened yet or has a decision that hasn't been taken yet.
[00:29:31] swyx: Yeah, that's, that's my 2 cents about it. Like, calm down, let's just build .
[00:29:35] Alessio: Yeah. The, the February rumor was that they were gonna work on AI agents, so I don't know, maybe they're like, yeah,
[00:29:41] swyx: they had two agent two, I think two agent projects, right? One desktop agent and one sort of more general yeah, sort of GPTs like agent and then Andre left, so he was supposed to be the guy on that.
[00:29:52] swyx: What did Andre see? What did he see? I don't know. What did he see?
[00:29:56] Alessio: I don't know. But again, it's just like the rumors are always floating around, you know but I think like, this is, you know, we're not going to get to the end of the year without Jupyter you know, that's definitely happening. I think the biggest question is like, are Anthropic and Google.
[00:30:13] Alessio: Increasing the pace, you know, like it's the, it's the cloud four coming out like in 12 months, like nine months. What's the, what's the deal? Same with Gemini. They went from like one to 1. 5 in like five days or something. So when's Gemini 2 coming out, you know, is that going to be soon? I don't know.
[00:30:31] Alessio: There, there are a lot of, speculations, but the good thing is that now you can see a world in which OpenAI doesn't rule everything. You know, so that, that's the best, that's the best news that everybody got, I would say.
[00:30:43] swyx: Yeah, and Mistral Large also dropped in the last month. And, you know, not as, not quite GPT 4 class, but very good from a new startup.
[00:30:52] swyx: So yeah, we, we have now slowly changed in landscape, you know. In my January recap, I was complaining that nothing's changed in the landscape for a long time. But now we do exist in a world, sort of a multipolar world where Cloud and Gemini are legitimate challengers to GPT 4 and hopefully more will emerge as well hopefully from meta.
[00:31:11] Open Source Models - Mistral, Grok
[00:31:11] NLW: So speak, let's actually talk about sort of the open source side of this for a minute. So Mistral Large, notable because it's, it's not available open source in the same way that other things are, although I think my perception is that the community has largely given them Like the community largely recognizes that they want them to keep building open source stuff and they have to find some way to fund themselves that they're going to do that.
[00:31:27] NLW: And so they kind of understand that there's like, they got to figure out how to eat, but we've got, so, you know, there there's Mistral, there's, I guess, Grok now, which is, you know, Grok one is from, from October is, is open
[00:31:38] swyx: sourced at, yeah. Yeah, sorry, I thought you thought you meant Grok the chip company.
[00:31:41] swyx: No, no, no, yeah, you mean Twitter Grok.
[00:31:43] NLW: Although Grok the chip company, I think is even more interesting in some ways, but and then there's the, you know, obviously Llama3 is the one that sort of everyone's wondering about too. And, you know, my, my sense of that, the little bit that, you know, Zuckerberg was talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint, he was not thinking about how do I make sure that, you know, meta content, you know, keeps, keeps the open source thrown, you know, vis a vis Mistral.
[00:32:09] NLW: He was thinking about how you go after, you know, how, how he, you know, releases a thing that's, you know, every bit as good as whatever OpenAI is on at that point.
[00:32:16] Alessio: Yeah. From what I heard in the hallways at, at GDC, Llama 3, the, the biggest model will be, you 260 to 300 billion parameters, so that that's quite large.
[00:32:26] Alessio: That's not an open source model. You know, you cannot give people a 300 billion parameters model and ask them to run it. You know, it's very compute intensive. So I think it is, it
[00:32:35] swyx: can be open source. It's just, it's going to be difficult to run, but that's a separate question.
[00:32:39] Alessio: It's more like, as you think about what they're doing it for, you know, it's not like empowering the person running.
[00:32:45] Alessio: llama. On, on their laptop, it's like, oh, you can actually now use this to go after open AI, to go after Anthropic, to go after some of these companies at like the middle complexity level, so to speak. Yeah. So obviously, you know, we estimate Gentala on the podcast, they're doing a lot here, they're making PyTorch better.
[00:33:03] Alessio: You know, they want to, that's kind of like maybe a little bit of a shorted. Adam Bedia, in a way, trying to get some of the CUDA dominance out of it. Yeah, no, it's great. The, I love the duck destroying a lot of monopolies arc. You know, it's, it's been very entertaining. Let's bridge
[00:33:18] NLW: into the sort of big tech side of this, because this is obviously like, so I think actually when I did my episode, this was one of the I added this as one of as an additional war that, that's something that I'm paying attention to.
[00:33:29] NLW: So we've got Microsoft's moves with inflection, which I think pretend, potentially are being read as A shift vis a vis the relationship with OpenAI, which also the sort of Mistral large relationship seems to reinforce as well. We have Apple potentially entering the race, finally, you know, giving up Project Titan and and, and kind of trying to spend more effort on this.
[00:33:50] NLW: Although, Counterpoint, we also have them talking about it, or there being reports of a deal with Google, which, you know, is interesting to sort of see what their strategy there is. And then, you know, Meta's been largely quiet. We kind of just talked about the main piece, but, you know, there's, and then there's spoilers like Elon.
[00:34:07] NLW: I mean, you know, what, what of those things has sort of been most interesting to you guys as you think about what's going to shake out for the rest of this
[00:34:13] Apple MM1
[00:34:13] swyx: year? I'll take a crack. So the reason we don't have a fifth war for the Big Tech Wars is that's one of those things where I just feel like we don't cover differently from other media channels, I guess.
[00:34:26] swyx: Sure, yeah. In our anti interestness, we actually say, like, we try not to cover the Big Tech Game of Thrones, or it's proxied through Twitter. You know, all the other four wars anyway, so there's just a lot of overlap. Yeah, I think absolutely, personally, the most interesting one is Apple entering the race.
[00:34:41] swyx: They actually released, they announced their first large language model that they trained themselves. It's like a 30 billion multimodal model. People weren't that impressed, but it was like the first time that Apple has kind of showcased that, yeah, we're training large models in house as well. Of course, like, they might be doing this deal with Google.
[00:34:57] swyx: I don't know. It sounds very sort of rumor y to me. And it's probably, if it's on device, it's going to be a smaller model. So something like a Jemma. It's going to be smarter autocomplete. I don't know what to say. I'm still here dealing with, like, Siri, which hasn't, probably hasn't been updated since God knows when it was introduced.
[00:35:16] swyx: It's horrible. I, you know, it, it, it makes me so angry. So I, I, one, as an Apple customer and user, I, I'm just hoping for better AI on Apple itself. But two, they are the gold standard when it comes to local devices, personal compute and, and trust, like you, you trust them with your data. And. I think that's what a lot of people are looking for in AI, that they have, they love the benefits of AI, they don't love the downsides, which is that you have to send all your data to some cloud somewhere.
[00:35:45] swyx: And some of this data that we're going to feed AI is just the most personal data there is. So Apple being like one of the most trusted personal data companies, I think it's very important that they enter the AI race, and I hope to see more out of them.
[00:35:58] Alessio: To me, the, the biggest question with the Google deal is like, who's paying who?
[00:36:03] Alessio: Because for the browsers, Google pays Apple like 18, 20 billion every year to be the default browser. Is Google going to pay you to have Gemini or is Apple paying Google to have Gemini? I think that's, that's like what I'm most interested to figure out because with the browsers, it's like, it's the entry point to the thing.
[00:36:21] Alessio: So it's really valuable to be the default. That's why Google pays. But I wonder if like the perception in AI is going to be like, Hey. You just have to have a good local model on my phone to be worth me purchasing your device. And that was, that's kind of drive Apple to be the one buying the model. But then, like Shawn said, they're doing the MM1 themselves.
[00:36:40] Alessio: So are they saying we do models, but they're not as good as the Google ones? I don't know. The whole thing is, it's really confusing, but. It makes for great meme material on on Twitter.
[00:36:51] swyx: Yeah, I mean, I think, like, they are possibly more than OpenAI and Microsoft and Amazon. They are the most full stack company there is in computing, and so, like, they own the chips, man.
[00:37:05] swyx: Like, they manufacture everything so if, if, if there was a company that could do that. You know, seriously challenge the other AI players. It would be Apple. And it's, I don't think it's as hard as self driving. So like maybe they've, they've just been investing in the wrong thing this whole time. We'll see.
[00:37:21] swyx: Wall Street certainly thinks
[00:37:22] NLW: so. Wall Street loved that move, man. There's a big, a big sigh of relief. Well, let's, let's move away from, from sort of the big stuff. I mean, the, I think to both of your points, it's going to.
[00:37:33] Meta's $800b AI rebrand
[00:37:33] NLW: Can I, can
[00:37:34] swyx: I, can I, can I jump on factoid about this, this Wall Street thing? I went and looked at when Meta went from being a VR company to an AI company.
[00:37:44] swyx: And I think the stock I'm trying to look up the details now. The stock has gone up 187% since Lamo one. Yeah. Which is $830 billion in market value created in the past year. . Yeah. Yeah.
[00:37:57] NLW: It's, it's, it's like, remember if you guys haven't Yeah. If you haven't seen the chart, it's actually like remarkable.
[00:38:02] NLW: If you draw a little
[00:38:03] swyx: arrow on it, it's like, no, we're an AI company now and forget the VR thing.
[00:38:10] NLW: It's it, it is an interesting, no, it's, I, I think, alessio, you called it sort of like Zuck's Disruptor Arc or whatever. He, he really does. He is in the midst of a, of a total, you know, I don't know if it's a redemption arc or it's just, it's something different where, you know, he, he's sort of the spoiler.
[00:38:25] NLW: Like people loved him just freestyle talking about why he thought they had a better headset than Apple. But even if they didn't agree, they just loved it. He was going direct to camera and talking about it for, you know, five minutes or whatever. So that, that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever, two years ago.
[00:38:41] NLW: Yeah. Yeah,
[00:38:42] swyx: we still
[00:38:43] Alessio: didn't see and fight Elon though, so
[00:38:45] swyx: that's what I'm really looking forward to. I mean, hey, don't, don't, don't write it off, you know, maybe just these things take a while to happen. But we need to see and fight in the Coliseum. No, I think you know, in terms of like self management, life leadership, I think he has, there's a lot of lessons to learn from him.
[00:38:59] swyx: You know he might, you know, you might kind of quibble with, like, the social impact of Facebook, but just himself as a in terms of personal growth and, and, you know, Per perseverance through like a lot of change and you know, everyone throwing stuff his way. I think there's a lot to say about like, to learn from, from Zuck, which is crazy 'cause he's my age.
[00:39:18] swyx: Yeah. Right.
[00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
[00:39:20] NLW: Awesome. Well, so, so one of the big things that I think you guys have, you know, distinct and, and unique insight into being where you are and what you work on is. You know, what developers are getting really excited about right now. And by that, I mean, on the one hand, certainly, you know, like startups who are actually kind of formalized and formed to startups, but also, you know, just in terms of like what people are spending their nights and weekends on what they're, you know, coming to hackathons to do.
[00:39:45] NLW: And, you know, I think it's a, it's a, it's, it's such a fascinating indicator for, for where things are headed. Like if you zoom back a year, right now was right when everyone was getting so, so excited about. AI agent stuff, right? Auto, GPT and baby a GI. And these things were like, if you dropped anything on YouTube about those, like instantly tens of thousands of views.
[00:40:07] NLW: I know because I had like a 50,000 view video, like the second day that I was doing the show on YouTube, you know, because I was talking about auto GPT. And so anyways, you know, obviously that's sort of not totally come to fruition yet, but what are some of the trends in what you guys are seeing in terms of people's, people's interest and, and, and what people are building?
[00:40:24] Alessio: I can start maybe with the agents part and then I know Shawn is doing a diffusion meetup tonight. There's a lot of, a lot of different things. The, the agent wave has been the most interesting kind of like dream to reality arc. So out of GPT, I think they went, From zero to like 125, 000 GitHub stars in six weeks, and then one year later, they have 150, 000 stars.
[00:40:49] Alessio: So there's kind of been a big plateau. I mean, you might say there are just not that many people that can start it. You know, everybody already started it. But the promise of, hey, I'll just give you a goal, and you do it. I think it's like, amazing to get people's imagination going. You know, they're like, oh, wow, this This is awesome.
[00:41:08] Alessio: Everybody, everybody can try this to do anything. But then as technologists, you're like, well, that's, that's just like not possible, you know, we would have like solved everything. And I think it takes a little bit to go from the promise and the hope that people show you to then try it yourself and going back to say, okay, this is not really working for me.
[00:41:28] Alessio: And David Wong from Adept, you know, they in our episode, he specifically said. We don't want to do a bottom up product. You know, we don't want something that everybody can just use and try because it's really hard to get it to be reliable. So we're seeing a lot of companies doing vertical agents that are narrow for a specific domain, and they're very good at something.
[00:41:49] Alessio: Mike Conover, who was at Databricks before, is also a friend of Latentspace. He's doing this new company called BrightWave doing AI agents for financial research, and that's it, you know, and they're doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal.
[00:42:08] Alessio: All of these things that like, people, nobody just wakes up and say, Oh, I cannot wait to go on AutoGPD and ask it to do a compliance review of my thing. You know, just not what inspires people. So I think the gap on the developer side has been the more bottom sub hacker mentality is trying to build this like very Generic agents that can do a lot of open ended tasks.
[00:42:30] Alessio: And then the more business side of things is like, Hey, If I want to raise my next round, I can not just like sit around the mess, mess around with like super generic stuff. I need to find a use case that really works. And I think that that is worth for, for a lot of folks in parallel, you have a lot of companies doing evals.
[00:42:47] Alessio: There are dozens of them that just want to help you measure how good your models are doing. Again, if you build evals, you need to also have a restrained surface area to actually figure out whether or not it's good, right? Because you cannot eval anything on everything under the sun. So that's another category where I've seen from the startup pitches that I've seen, there's a lot of interest in, in the enterprise.
[00:43:11] Alessio: It's just like really. Fragmented because the production use cases are just coming like now, you know, there are not a lot of long established ones to, to test against. And so does it, that's kind of on the virtual agents and then the robotic side it's probably been the thing that surprised me the most at NVIDIA GTC, the amount of robots that were there that were just like robots everywhere.
[00:43:33] Alessio: Like, both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around. There was, like, this, like fox robot that had, like, a virtual face that, like, talked to you and, like, moved in real time. There were industrial robots. NVIDIA did a big push on their own Omniverse thing, which is, like, this Digital twin of whatever environments you're in that you can use to train the robots agents.
[00:43:57] Alessio: So that kind of takes people back to the reinforcement learning days, but yeah, agents, people want them, you know, people want them. I give a talk about the, the rise of the full stack employees and kind of this future, the same way full stack engineers kind of work across the stack. In the future, every employee is going to interact with every part of the organization through agents and AI enabled tooling.
[00:44:17] Alessio: This is happening. It just needs to be a lot more narrow than maybe the first approach that we took, which is just put a string in AutoGPT and pray. But yeah, there's a lot of super interesting stuff going on.
[00:44:27] swyx: Yeah. Well, he Let's recover a lot of stuff there. I'll separate the robotics piece because I feel like that's so different from the software world.
[00:44:34] swyx: But yeah, we do talk to a lot of engineers and you know, that this is our sort of bread and butter. And I do agree that vertical agents have worked out a lot better than the horizontal ones. I think all You know, the point I'll make here is just the reason AutoGPT and maybe AGI, you know, it's in the name, like they were promising AGI.
[00:44:53] swyx: But I think people are discovering that you cannot engineer your way to AGI. It has to be done at the model level and all these engineering, prompt engineering hacks on top of it weren't really going to get us there in a meaningful way without much further, you know, improvements in the models. I would say, I'll go so far as to say, even Devin, which is, I would, I think the most advanced agent that we've ever seen, still requires a lot of engineering and still probably falls apart a lot in terms of, like, practical usage.
[00:45:22] swyx: Or it's just, Way too slow and expensive for, you know, what it's, what it's promised compared to the video. So yeah, that's, that's what, that's what happened with agents from, from last year. But I, I do, I do see, like, vertical agents being very popular and, and sometimes you, like, I think the word agent might even be overused sometimes.
[00:45:38] swyx: Like, people don't really care whether or not you call it an AI agent, right? Like, does it replace boring menial tasks that I do That I might hire a human to do, or that the human who is hired to do it, like, actually doesn't really want to do. And I think there's absolutely ways in sort of a vertical context that you can actually go after very routine tasks that can be scaled out to a lot of, you know, AI assistants.
[00:46:01] swyx: So, so yeah, I mean, and I would, I would sort of basically plus one what let's just sit there. I think it's, it's very, very promising and I think more people should work on it, not less. Like there's not enough people. Like, we, like, this should be the, the, the main thrust of the AI engineer is to look out, look for use cases and, and go to a production with them instead of just always working on some AGI promising thing that never arrives.
[00:46:21] swyx: I,
[00:46:22] NLW: I, I can only add that so I've been fiercely making tutorials behind the scenes around basically everything you can imagine with AI. We've probably done, we've done about 300 tutorials over the last couple of months. And the verticalized anything, right, like this is a solution for your particular job or role, even if it's way less interesting or kind of sexy, it's like so radically more useful to people in terms of intersecting with how, like those are the ways that people are actually.
[00:46:50] NLW: Adopting AI in a lot of cases is just a, a, a thing that I do over and over again. By the way, I think that's the same way that even the generalized models are getting adopted. You know, it's like, I use midjourney for lots of stuff, but the main thing I use it for is YouTube thumbnails every day. Like day in, day out, I will always do a YouTube thumbnail, you know, or two with, with Midjourney, right?
[00:47:09] NLW: And it's like you can, you can start to extrapolate that across a lot of things and all of a sudden, you know, a AI doesn't. It looks revolutionary because of a million small changes rather than one sort of big dramatic change. And I think that the verticalization of agents is sort of a great example of how that's
[00:47:26] swyx: going to play out too.
[00:47:28] Adept episode - Screen Multimodality
[00:47:28] swyx: So I'll have one caveat here, which is I think that Because multi modal models are now commonplace, like Cloud, Gemini, OpenAI, all very very easily multi modal, Apple's easily multi modal, all this stuff. There is a switch for agents for sort of general desktop browsing that I think people so much for joining us today, and we'll see you in the next video.
[00:48:04] swyx: Version of the the agent where they're not specifically taking in text or anything They're just watching your screen just like someone else would and and I'm piloting it by vision And you know in the the episode with David that we'll have dropped by the time that this this airs I think I think that is the promise of adept and that is a promise of what a lot of these sort of desktop agents Are and that is the more general purpose system That could be as big as the browser, the operating system, like, people really want to build that foundational piece of software in AI.
[00:48:38] swyx: And I would see, like, the potential there for desktop agents being that, that you can have sort of self driving computers. You know, don't write the horizontal piece out. I just think we took a while to get there.
[00:48:48] NLW: What else are you guys seeing that's interesting to you? I'm looking at your notes and I see a ton of categories.
[00:48:54] Top Model Research from January Recap
[00:48:54] swyx: Yeah so I'll take the next two as like as one category, which is basically alternative architectures, right? The two main things that everyone following AI kind of knows now is, one, the diffusion architecture, and two, the let's just say the, Decoder only transformer architecture that is popularized by GPT.
[00:49:12] swyx: You can read, you can look on YouTube for thousands and thousands of tutorials on each of those things. What we are talking about here is what's next, what people are researching, and what could be on the horizon that takes the place of those other two things. So first of all, we'll talk about transformer architectures and then diffusion.
[00:49:25] swyx: So transformers the, the two leading candidates are effectively RWKV and the state space models the most recent one of which is Mamba, but there's others like the Stripe, ENA, and the S four H three stuff coming out of hazy research at Stanford. And all of those are non quadratic language models that scale the promise to scale a lot better than the, the traditional transformer.
[00:49:47] swyx: That this might be too theoretical for most people right now, but it's, it's gonna be. It's gonna come out in weird ways, where, imagine if like, Right now the talk of the town is that Claude and Gemini have a million tokens of context and like whoa You can put in like, you know, two hours of video now, okay But like what if you put what if we could like throw in, you know, two hundred thousand hours of video?
[00:50:09] swyx: Like how does that change your usage of AI? What if you could throw in the entire genetic sequence of a human and like synthesize new drugs. Like, well, how does that change things? Like, we don't know because we haven't had access to this capability being so cheap before. And that's the ultimate promise of these two models.
[00:50:28] swyx: They're not there yet but we're seeing very, very good progress. RWKV and Mamba are probably the, like, the two leading examples, both of which are open source that you can try them today and and have a lot of progress there. And the, the, the main thing I'll highlight for audio e KV is that at, at the seven B level, they seem to have beat LAMA two in all benchmarks that matter at the same size for the same amount of training as an open source model.
[00:50:51] swyx: So that's exciting. You know, they're there, they're seven B now. They're not at seven tb. We don't know if it'll. And then the other thing is diffusion. Diffusions and transformers are are kind of on the collision course. The original stable diffusion already used transformers in in parts of its architecture.
[00:51:06] swyx: It seems that transformers are eating more and more of those layers particularly the sort of VAE layer. So that's, the Diffusion Transformer is what Sora is built on. The guy who wrote the Diffusion Transformer paper, Bill Pebbles, is, Bill Pebbles is the lead tech guy on Sora. So you'll just see a lot more Diffusion Transformer stuff going on.
[00:51:25] swyx: But there's, there's more sort of experimentation with diffusion. I'm holding a meetup actually here in San Francisco that's gonna be like the state of diffusion, which I'm pretty excited about. Stability's doing a lot of good work. And if you look at the, the architecture of how they're creating Stable Diffusion 3, Hourglass Diffusion, and the inconsistency models, or SDXL Turbo.
[00:51:45] swyx: All of these are, like, very, very interesting innovations on, like, the original idea of what Stable Diffusion was. So if you think that it is expensive to create or slow to create Stable Diffusion or an AI generated art, you are not up to date with the latest models. If you think it is hard to create text and images, you are not up to date with the latest models.
[00:52:02] swyx: And people still are kind of far behind. The last piece of which is the wildcard I always kind of hold out, which is text diffusion. So Instead of using autogenerative or autoregressive transformers, can you use text to diffuse? So you can use diffusion models to diffuse and create entire chunks of text all at once instead of token by token.
[00:52:22] swyx: And that is something that Midjourney confirmed today, because it was only rumored the past few months. But they confirmed today that they were looking into. So all those things are like very exciting new model architectures that are, Maybe something that we'll, you'll see in production two to three years from now.
[00:52:37] swyx: So the couple of the trends
[00:52:38] NLW: that I want to just get your takes on, because they're sort of something that, that seems like they're coming up are one sort of these, these wearable, you know, kind of passive AI experiences where they're absorbing a lot of what's going on around you and then, and then kind of bringing things back.
[00:52:53] NLW: And then the, the other one that I, that I wanted to see if you guys had thoughts on were sort of this next generation of chip companies. Obviously there's a huge amount of emphasis. On on hardware and silicon and, and, and different ways of doing things, but, you know, love your take on, on either or both of
[00:53:07] swyx: those.
[00:53:08] AI Wearables
[00:53:08] swyx: So for so wearables, I'm very excited about it. I want wearables on me at all times. I have two right here. To, to quantify my health. And I, you know, I'm all for them. But society is not ready for wearables, right? Like, no one's comfortable with a device on recording every single conversation we have.
[00:53:24] swyx: Even all three of us here as podcasters, we don't record everything that we say. And I think there's a social shift that needs to happen. I am an investor in TAB. They are renaming to a broader vision, but they are one of the three or four leading wearables in this space. It's sort of the AI pendants, or AI OS, or AI personal companion space.
[00:53:47] swyx: I have seen two humanes in the wild in San Francisco. I'm very, very excited to report that there are people walking around with those things on their chest and it is as goofy as it sounds. It, it absolutely is going to fail. God bless them for trying. And I've also bought a rabbit. So I'm, I'm very excited for all those things to arrive.
[00:54:06] swyx: But yeah people are very keen on hardware. I think the, the, the idea that you can have physical objects that. Embody an AI that do specific things for you is as old as, you know, the sort of Golem in sort of medieval times in terms of like how much we want our objects to be smart and do things for us.
[00:54:27] swyx: And I think it's absolutely a great play. The funny thing is people are much more willing to pay you upfront for a hardware device than they are willing to pay like an 8 a month subscription recurring for software, right? And so the interesting economics of these wearable companies is they have negative float.
[00:54:47] swyx: In the sense that people pay deposits upfront, like I paid like, I don't know, 200 bucks for the rabbit. Upfront, and I don't get it for another six months. I paid 600 for the tab, and I don't get it for another six months. And, and then, then they can take that money and, and sort of invest it in like their next, the next events or their next properties or ventures.
[00:55:06] swyx: And like, I think that's a, that's a very interesting reversal of economics from other types of AI companies that I see. And I think, yeah, just the, the, the tactile feel of an AI, I think is very promising. I, Alex, I don't know if you have other thoughts on, on the wearable stuff.
[00:55:21] Alessio: The open interpreter just announced their product four hours ago.
[00:55:25] Alessio: Yeah. Which is a, it's not really a wearable, but it's a, it's still like a physical device.
[00:55:30] swyx: It's a push to talk mic to, to a device on your, on your laptop. Right. It's a $99 push talk. Yeah.
[00:55:38] Alessio: But, but, but everybody, but again, going back to your point, it's like people want to, people are interested in spending money for like things that they can hold, you know, I don't know what that means overall for like where things are going, but making more of this AI be a physical part of your life.
[00:55:54] Alessio: I think people are interested in that, but I agree with Shawn. I mean, I've been. I talked to Avi about this, but Avi's point is like, most consumers, like, care about utility more than they care about privacy, you know, like you've seen with social media. But I also think there's a big societal reaction to AI that is, like, much more rooted than the social media one.
[00:56:16] Alessio: But we'll see. But a lot, again, a lot of work, a lot of developers, a lot of money going into it. So there's, there's bound to be experiments being run. On, on the
[00:56:25] swyx: chip side. Sorry, I'll just ship it one more thing and then we transition to the chips. The thing I'll caution people on is don't overly focus on the form factor.
[00:56:33] swyx: The form factor is a delivery mode. There will be many form factors. It doesn't matter so much as where in the data war does it sit. It actually is context acquisition. Because, and maybe a little bit of multi modality. Context, like, context is king. Like, if you have access to data that no one else has, then you will be able to create AI that no one else can create.
[00:56:54] swyx: And so what is the most personal context? It is your everyday conversation. It is as close to mapping your mental train of thought As possible without, you know, physically you writing down notes. So, so that is the promise, the ultimate goal here, which is like, personal context, it's always available on you you know, loading and seeing all that stuff.
[00:57:12] swyx: But yeah, that's the, that's the frame I want to give people that the form factors will change and there will be multiple form factors, but it's the software behind that. And in the personal context that you cannot get anywhere else, that'll win.
[00:57:24] Alessio: Yeah, so that was wearables.
[00:57:26] Groq vs Nvidia month - GPU Chip War
[00:57:26] Alessio: On the chip side, yeah, Grok was probably the biggest release.
[00:57:29] Alessio: Jonathan, well, it's not even a new release because the company, I think, was started in 2016. So it's actually quite old. But now recently captured the people's imagination with their MixedREL 500 tokens a second demo. Yeah, I think so far the battle on the GPU side has been Either you go kind of like massive chip, like the Cerebros of the world, where one chip from Cerebros is about two million dollars, you know, that's compared, obviously, you cannot compare one chip versus one chip, but h100 is like 40, 000, something like that the problem with those architectures has been They want to be very general, you know, but like they wanted to put a lot of the RAM, the SRAM on the chip.
[00:58:13] Alessio: It's much more convenient when you're using larger language models, but the models outpace the size of the chips and chips have a much longer, you know, turnaround cycle. Grok today. It's great for the current architecture. It's a lot more expensive also, as far as dollar per flop but their idea is like, hey, when you have very high concurrency, we actually were much cheaper, you know, you shouldn't just be looking at the compute power for most people, this doesn't really matter, you know, like, I think that's like the most the most interesting thing to me is like, We've now gone back with, with AI to a world where developers care about what hardware is running, which was not the case in traditional software for like, maybe 20 years since as the cloud has gotten really big.
[00:58:57] Alessio: My, my thinking is that in the next two, three years, like we're going to go back to that. We're like, people are not going to be sweating. Oh, what GPU do you have in your cloud? What do you have? It's like. Yeah, you want to run this model, we can run it at the same speed as everybody else, and then everybody will make different choices, whether they want to have higher front end capital investment, and then better utilization, some people would rather do lower investment before, and then upgrade later, there are a lot of parameters and then there's the dark horses, right, that is some of the smaller companies like Lemurian Labs, MedEx that are working on maybe not a chip alone, but also like some of the, the actual math infrastructure and the instructions on it that make them run.
[00:59:40] Alessio: There's a lot going on, but yeah, I think the, the episode with with Dylan will be interesting for, for people, but I think we also came out of it saying, Hey, everybody has pros and cons. There's no, it's different than the models where you're like, Oh, this one is definitely better for me. And I'm going to use it.
[00:59:56] Alessio: I think for most people. It's like fun Twitter memeing, you know, but it's like 99 percent of people that tweet about this stuff are never gonna buy any of these chips anyway. It's, it's really more for entertainment.
[01:00:10] swyx: No. Wow. I mean, like, this is serious business here, right? You're talking about, you know, like who, like the potential new Nvidia, if anyone can take like 1% of NVIDIA's business, they're a serious startup that you should look at.
[01:00:20] swyx: Right? So , that's, that's, that's my, well, yeah,
[01:00:23] Alessio: yeah. On matters. Well, I'm more talking about like, what, how should people think about it? You know? It's like, yeah. I think like the, the end user is not impacted as much.
[01:00:31] Disagreements
[01:00:31] Alessio: This is obviously, so
[01:00:32] swyx: I disagree. Yeah, I love disagreements because, you know, who likes a podcast where all three people always agree with each other?
[01:00:38] swyx: You will see the impact of this in the tokens per second over time. This year, I have very, very credible sources all telling me that the average tokens per second, right now, we have somewhere between 50 to 100 as like the norm for people. Average tokens per second will go to 500 to 2, 000. This year from, from a number of chip suppliers that I cannot name.
[01:00:58] swyx: So like that is, that is, that will cause a step change in the use cases. Every time you have an order of magnitude improvement in the, in the speed of something, you unlock new use cases that become fun instead of a chore. And so that's what I would caution this audience to think about, which is like, what can you do in much higher AI speed?
[01:01:17] swyx: It's not just things streaming out faster. It is things working in the background a lot more seamlessly and therefore being a lot more useful. Then previously imagined. So that would be my two cents on.
[01:01:30] Alessio: Yeah. Yeah. I mean, the, the new NVIDIA chips are also much faster. To me, that's true. When it comes to startups, it's like, are the startups pushing the performance on the incumbents or are the incumbents still leading?
[01:01:44] Alessio: And then the startups are like riding the same wave, you know? I don't have yet a good sense of that. It's like, you know, it's next year's NVIDIA release. Just gonna be better than everything that gets released this year, you know, if that's the case, it's like, okay, damn Jensen, you know, it's like the meme.
[01:02:00] Alessio: It's like, I'm gonna fight. I'm gonna fight NVIDIA. It's like, damn, Jensen got hands. He really does.
[01:02:08] Summer 2024 Predictions
[01:02:08] NLW: Well, awesome conversation, guys. I guess just just by way of wrapping up, I call it over the next three months between now and sort of the beginning of summer was one prediction that each of you has. It can be about anything. It can be a big company. It can be a startup. It can be something you have privileged information that you know, and you just won't tell us that you actually
[01:02:25] Alessio: know.
[01:02:26] Alessio: What, does it have to be something that we think it's going to be true or like something that we think? Because for me, it's like, is Sundar going to be the CEO of Google? Maybe not in three months, maybe in like six months, nine months, you know, people are like, Oh, maybe Demis is going to be the new CEO.
[01:02:41] Alessio: That was kind of like, I, I was busy like fishing some deep mind people and Google people for like a good guest for the pod. And I was like, Oh, what about. Jeff Dean, and they're like, well, Demis is really like the person that runs everything anyway, and the stuff. It's like interesting. And
[01:02:57] swyx: so I don't know.
[01:02:58] swyx: What about Sergei? Sergei Sergei could come back. I don't know. Like he's making more appearances these days.
[01:03:03] Alessio: Yeah. I don't, I I Then we can just put it as like, you know. Yeah. My, my thing is like CEO change potential, but I, again, three months is too short to make a prediction. Yeah. I
[01:03:16] NLW: think that's the, that's that's fine.
[01:03:18] NLW: The, the timescale might be off.
[01:03:22] swyx: Yeah. I mean for me, I, I think the. Progression in vertical agent companies will keep going. We just had, the other day, Klarna talking about how they replaced like 700 of their customer support agents with the AI agents. That's just the beginning, guys. Like, imagine this rolling out across most of the Fortune 500.
[01:03:43] swyx: This is, and I'm not saying this is like a utopian scenario, there will be very, very embarrassing and bad outcomes of this, where like, humans would never make this mistake, but AIs did, and like, we'll all laugh at it, or we'll be very offended by whatever, you know, bad outcome it did. So we have to be responsible and careful in the rollout, but yeah, this is, it's rolling out, you know, Alessio likes to say that this year's the year of AI in production.
[01:04:04] swyx: Let's see it, let's, let's see all these sort of vertical, full stack employees. Come out into the workforce. Love
[01:04:11] Alessio: it.
[01:04:11] NLW: All right, guys. Well, thank you so much for for sharing your your thoughts and insights here And I can't wait to do it again
[01:04:18] Thursday Nights in AI - swyx
[01:04:18] NLW: Welcome
[01:04:19] swyx: back again. It's Charlie your AI co host We're now in part two of the special weekend episode collating some of SWIX and Alessio's recent appearances If you're not active in the Latentspace Discord, you might not be aware of the many, many, many in person.
[01:04:36] swyx: Events we host gathering our listener community all over the world. You can see the Latentspace community page for how to join and subscribe to our event calendar for future meetups. We're going to share some of our recent live appearances in this next part, starting with the Thursday nights in AI meetup, a regular fixture in the SF AI scene run by Imbue and Outset Capital.
[01:04:59] swyx: Primarily, our former guest, Kanjin Q, Ali Rhoda, and Josh Albrecht. Here's Swyx.
[01:05:08] swyx: Today, for those of you who have been here before, you know the general format. So we'll do a quick fireside Q& A with Swyx. Swyx, where we're asking him the questions. Then we'll actually go to our rapid fire Q& A, where we're asking really fast, hopefully, spicy questions. And then we'll open it up to the audience for your questions.
[01:05:25] swyx: So you guys sneak around the room, submit your questions, and we'll go through as many of them as possible during that period. And then actually, Swyx brought a gift for us, which is two Latentspace t shirts. AI Engineer. AI Engineer t shirts. And those will be awarded to the Two spiciest question askers.
[01:05:44] swyx: So and I'll let Josh decide on that. So if we want to get your spiciest takes, please send them in during the event as we're talking and then also at the end. All right. With that, let's get going.
[01:05:57] NLW: Okay. Welcome, Swyx. Thank you for that
[01:06:01] swyx: intro.
[01:06:01] NLW: How does it
[01:06:01] swyx: feel to be interviewed
[01:06:03] NLW: rather than the interviewer?
[01:06:04] swyx: Weird. I don't know what to do in this chair. Yeah. Like,
[01:06:07] NLW: where should I put my hands? Yeah, exactly. You look good.
[01:06:10] swyx: You look good. And I also love asking follow up questions. And I tend to, like, sort of take over panels a lot. If you ever see me on a panel, I tend to ask the other panelists questions.
[01:06:18] swyx: Okay.
[01:06:19] NLW: So we should be ready is what you're saying. So you back.
[01:06:21] swyx: That's fine. This is like a free MBU interview, so why not? That's right. That's right. That's
[01:06:24] NLW: right.
[01:06:25] swyx: Yeah, so you interviewed Ken Jeon, the CEO you didn't interview Josh, right? No, no. So maybe tonight. Yeah. Okay. We'll see. We'll look for different questions and look for an alignment.
[01:06:35] NLW: I love it. All
[01:06:36] swyx: right. I just want to hear this story. You know, you've completely exploded LatentSpace and AI Engineer, and I know you also, before all of that, had exploded in popularity for your learning in public movement and your DevTools work. And devrelations work. So, who are you and how did you get here?
[01:06:53] swyx: Let's
[01:06:53] NLW: start with that.
[01:06:54] swyx: Quick story is, I'm Shawn, I'm from Singapore. Swyx is my initials. For those who don't know, A lot of Singaporeans are ethically Chinese, and we have Chinese names and English names. So, it's just it's just my initials. Came to col came to the US for college, and have been here for about 15 years, but most, like half of that was in finance and then the other half was, was in tech.
[01:07:13] swyx: And the, and tech is where I was most known just because I realized that I was much more aligned towards learning in public, whereas in finance, Everything's a trade secret. Everything is zero sum. Whereas in tech, like, you're allowed to come to meetups and conferences and share your learnings and share your mistakes even.
[01:07:31] swyx: And that's totally fine. You, like, open source your code. It's totally fine. And even, even better, you, like, contribute PRs to other people's code, which is even better. And I found that I thrived in that. Learning public environments and that, that kind of got me started. I was an early hire, early Draft Relations hire at Netlify and then did the same at AWS Temporal and Airbyte.
[01:07:53] swyx: And then, and so that, that's like the whole story. I can talk, talk more about like developer tooling and developer relations if, if that's something that people are interested in. But I think the, the more recent thing is AI. And I started really being interested in it mostly because It, it, the, the approximate cause of starting Leanspace was stable diffusion.
[01:08:10] swyx: When you could run a large model that could do sufficiently enough on your, on your desktop. Where I was like, okay, like, this is, Something qualitatively very different. And that's then we started late in space and you're like, this is something different. We have to talk about it on a podcast.
[01:08:25] swyx: There we go. Yeah. It wasn't, it wasn't a podcast for like four months. And then, and then I had been running a discord for dev tools investors. 'cause I, I also invest in dev tools and I advise companies on deaf tools, def things. And I think it was the start of 2023 when Alessio and I were both like, you know, I think we, we need to like get more tokens out of.
[01:08:45] swyx: People, and I was running out of original sources to, to write about, so I was like, okay, I'll go get those original sources. And I think that, that's when we started the podcast. And I think it's just the chemistry between us, the, the way we spike in different ways. And also, like, honestly, the kind participation of the guests to give us their time.
[01:09:03] swyx: Like, you know, like, getting George Hoss was a big deal. And also shout out to Alessio for just cold emailing him for, for, for booking the, booking some of our biggest guests. And I'm just working really hard to try to tell the story that people can use at work. I think that there's a lot of AI podcasts out there and a lot of AI kind of forums or fireside chats with no fire.
[01:09:21] swyx: That always talk about age, like what's your AGI timeline, what's your PDoom. Very, very nice hallway conversations for freshman year but not very useful for work. And like, you know, practically like making money and like And thinking about, you know, changing the everyday lives. I think what's interesting is obviously you care about the existential safety of the human race.
[01:09:43] swyx: But in the meantime we gotta eat. So so I think that's like kind of latent space's niche. Like we explicitly don't really talk about AGI. We explicitly don't talk about Things that we're, like, a little bit too far out. Like, we don't do a ton of robotics. We don't do a ton of, like, high frequency trading.
[01:10:00] swyx: There's tons of machine learning in there, but we just don't do that. Because, like, we're like, all right, what are most software engineers gonna, gonna need? Because that's our background, and that's the audience that we serve. And I think just, like, being really clear on that audience has been, has resonated with people.
[01:10:12] swyx: Yeah, you would never expect a technical podcast to reach, like, a general audience, like, Top ten on the tech charts but I, you know, I've been surprised by that before and it's been successful. I don't know, I don't know what to say about that. I think honestly, I, I kind of have this like negative reaction towards being, being, being, being, being classified as a podcast because the podcast is downstream of ideas.
[01:10:35] swyx: And it's one mode of conversation, it's one mode of idea delivery, but you can deliver ideas on a newsletter, in person like this there's so many different ways. And so I think, I think about it more as we are trying to start or serve an industry, and that industry is the AI engineer industry, which is, which we can talk about more.
[01:10:53] swyx: Yes, let's go into that. So the AI engineer, you penned a piece called The Rise of the AI Engineer, you tweeted about it, Andrej Karpathy also responded, largely agreeing with what you said. What is an AI engineer? The AI engineer is the software engineer building with AI, enhanced by AI, And eventually it will be non human engineers writing code for you, Which I know MBU is all about.
[01:11:18] swyx: You're saying eventually the AI engineer will become a non human engineer? That will be one kind of AI engineer that people are trying to build, And is probably the most furthest away in terms of being reality. Because it's so hard. Got it. But, but there are three types of AI engineer and I just went through the three.
[01:11:33] swyx: One is AI enhanced where you like use AI products like Copilot and Cursor. And two is AI products engineer where you use the exposed AI capabilities to the end user As a software engineer, like, not doing pre training not being an ML researcher, not being an ML engineer, but just interacting with foundation models and probably APIs from foundation model labs.
[01:11:54] swyx: What's the third one? And the third one is the non human AI engineer. Got it. The fully autonomous AI engineer. Dream, you know, Coder. How long do you think it is till we get to, like, early, early versions? This is my equivalent of AGI timelines. I know, I know. You can set yourself up for this. So like, lots of active, like, I mean, I have, I have supported companies actively working on that.
[01:12:13] swyx: I think it's more useful to think about levels of autonomy. And so my answer to that is, you know, perpetually five years away until until it figures it out. No, but my actual anecdote the closest comparison we have to that is self driving. We are, we're doing this in San Francisco for those who are watching the live stream.
[01:12:32] swyx: If you haven't come to San Francisco and seen, and taken a Waymo ride just come, get a friend take a Waymo ride. I remember 2014 we covered a little bit of autos in, in my hedge fund. And I was, I remember telling a friend, I was like, self driving cars around the corner, like, this is it, like, you know, parking will be, like, parking will be a thing of the past and it didn't happen for the next 10 years.
[01:12:52] swyx: And, and, but now we, now, like, most of us in San Francisco can, can take it for granted. So I think, like, you just have to be mindful that the, the, the, the rough edges take a long time. And like, yes, it's going to work in demos, then it's going to work a little bit further out and it's just going to take a long time.
[01:13:08] swyx: The more useful mental model I have is sort of levels of autonomy. So in self driving, you have level 1, 2, 3, 4, 5 just the amount of human attention that you get. At first, like, your, your, your hands are always on 10 and 2 and you have to pay attention to the, to, to the driving every 30 seconds and eventually you can sleep in the car, right?
[01:13:25] swyx: So there's a whole spectrum of that. So what's the equivalent for that for, for coding? Keep your hands on the keyboard and then eventually you've kind of gone off. You tab to accept everything. Where are we? Oh, that's good, yeah. Yeah. Doesn't that already happen? Yeah. Approve the PR. Approve, this looks good.
[01:13:39] swyx: That's the dream that people want. It gives, it gives, really you unlock a lot of coding when people, non technical people can file issues, and then the AI engineer can sort of automatically write code, pass your tests, and if it, if it kind of works as, as, as intended. As, as advertised then you can just kind of merge it and then you, you know, 10x, 100x the number of developers in your company immediately.
[01:14:00] swyx: So that's the goal, that's the, that's the holy grail. We're not there yet but Sweep, CodeGen, there's a bunch of companies, Magic probably, are, are all working towards that. And, and so I so the TLDR, like the, the thing that we covered Alessio and I covered in the January recap that we did was that the, the basic split that people should have in their minds is the inner loop versus the outer loop for the developer.
[01:14:21] swyx: Inner loop is everything that happens in your IDE between Git commits. And outer loop is happens, is what happens when you push up your Git commit to GitHub, for example, or GitLab. And that's a nice split, which means like everything local, everything that needs to be fast is for everything that's kind of very hands on for developers.
[01:14:37] swyx: It's probably easier to automate or easier to have code assistance. That's what Copilot is, that's what, that's what all those things are. And then everything that happens autonomously when you're effectively away from the keyboard with like a GitHub issue or something that is more outer loop where you're you know, you're relying a lot more on autonomy and we are maybe, our LLMs are maybe not smart enough to do that yet.
[01:14:57] Alessio: Do you have any thoughts on
[01:14:58] swyx: kind of
[01:14:58] Alessio: the user experience and how that will change? One of the things
[01:15:01] swyx: that has happened for me, kind of looking at some of these products and playing around with things ourselves, like, You know, it sounds good to have an automated PR, then you get an automated PR and you're like, I really don't want to review like 300 lines of generated code, and like find the bug in it.
[01:15:13] swyx: Well then you have another agent that's a reviewer. That's right, but then you like tell it like, Oh, go fix it, and it comes back with 400 lines. Yes, there is a length bias to code, right? And you do have higher passing rates. In PRs. This is a documented human behavior thing, right? Send me two lines of code, I will review the s**t out of that.
[01:15:33] swyx: I don't know if I can swear on this. Send me, send me 200 lines of code, looks good to me. Right? Guess what? The, the agents are going to, perfectly happy to modify, to copy that behavior from us. When we actually want them to do the opposite. So, yeah, I, I think that the GAN model of code generation is probably not going to work super well.
[01:15:50] swyx: I do think we probably need just better planning from the start. Which is, I'm just repeating the MBU thesis by the way. Just go listen to Kanjin talk about this. She's much better at it than I am. But yeah, I think I think the code review thing is going to be I think that what Codium, there are two Codiums, the Israeli one.
[01:16:10] swyx: The Israeli Codium. With the E. Yeah, Codium with the E. They still have refused to rename. I'm friends with both of them. Every month I'm like, You're like, guys, let's
[01:16:18] NLW: all come to one room. Yeah,
[01:16:19] swyx: like, you know, someone's got to fold. Codium with the E has gone, like, you've got to write the test first. Right?
[01:16:25] swyx: You write the, you write the it's like a sort of tripartite relationship. Again, this was also covered on a podcast with them, which is fantastic. Like, you interview me, you sort of through me, you interview. Like, the past avatars I've been watching the Netflix show, by the way, it's fantastic. But like, so so Codium is like, they've already thought this all the way through.
[01:16:41] swyx: They're like, okay, you write the user story, from the user story you generate all the tests, you also generate the code and you update any one of those, they all have to update together. Right? So like, once the, and, and probably the critical factor is the test generation from the story. Because everything else can just kind of bounce the heads off of those things until they pass.
[01:17:01] swyx: So you have to write good tests. It's kind of like the eat your vegetables of coding, right? Which nobody really wants to do. And so I think it's a really smart tactic to go to market by saying we automatically generate tests for you and, you know, start not great, but then get better. And eventually you get to the weakest point in the chain for the entire loop of code generation.
[01:17:25] swyx: What do you think the weakest link is? The weakest link? Yeah. It's text generation. Yeah. Yeah. Do you think there's a way to, like, are there some promising
[01:17:33] Alessio: avenues you see forward for making that actually better?
[01:17:38] swyx: For making it better. You have to have, like, good isolation, and I think proper serverless cloud environments is integral to that.
[01:17:48] swyx: I, it could be like a fly. io. It could be like a Cloudflare worker. It depends how much, how many resources your test environment needs. And effectively I was talking about this, I think with maybe Rob earlier in the audience, where every agent needs a sandbox. If you're a code agent, you need a coding sandbox, but if you're whatever, like MBU used to have this, like, sort of Minefield, Minecraft's clone that was much faster.
[01:18:12] swyx: If, if you, if you have a model of the real world, you have to go, you have to go generate some plan or some code or some whatever, test it against that real world so that you can get this iterative feedback and then get the final result back that is somewhat validated against the real world. And so, like, you need a really good sandbox.
[01:18:26] swyx: I don't think people, I, I think this is, this is a, this is an infrastructure need that humans
[01:18:31] swyx + Josh Albrecht: have had for a long time. We've never solved it for ourselves. And now we have to solve it for humans. About a thousand times larger quantity of agents than, than, than actually exists. And, and so I, I, I think, like, we eventually have to involve, evolve a lot more infrastructure.
[01:18:45] swyx + Josh Albrecht: In order to serve these things. So yeah. So, for those who don't know, like I also have so, we're talking about the rise of AI engineer. I also have previous conversations about immutable infrastructure cloud environments and that kind of stuff. And this is all of a kind. Like, like, in order to solve agents and coding agents, we're going to have to solve the other stuff too along the way.
[01:19:05] swyx + Josh Albrecht: And it's really neat for me. To see all that tie together in my DevTools work that all these themes kind of reemerge just naturally, just because everything we needed for humans, we just need a hundred times more for, for for agents.
[01:19:17] Dylan Patel: Let's talk about the AI engineer. AI engineer has become a whole thing.
[01:19:21] Dylan Patel: It's become a term and also a conference. And tell us more, and a job title, tell us more about that. What's going on there?
[01:19:31] swyx + Josh Albrecht: That is like a very vague, a very, very big cloud of things. I would just say like, I think it's an emergent industry. I've seen this happen repeatedly for, so the general term is software engineer.
[01:19:44] swyx + Josh Albrecht: Programmer. In the 70s and 80s, there would not be like senior engineer. There would just be engineering. Like you, or you, I don't think they even call themselves engineer. They don't have that. What about a member of the technical staff? Oh, yeah, MTS. Very, very, very, very elite. But yeah, so like, you know, like these striations appear when the population grows and the technical depth grows over time.
[01:20:07] swyx + Josh Albrecht: Yeah. When it starts, when it ends. Not that, not that important, and then over time it's just gonna specialize. And I've seen this happen for frontend, for DevOps, for data and I can't remember what else I listed in, in that, in that piece, But those are the main three that I was around for. And I, I see this, I saw this happening for AI engineer which is effectively, now a lot of people are arguing that there is the ML researcher, the ML engineer, who sort of pairs with the researcher sometimes they also call research engineer and then on the other side of the fence is just software engineers.
[01:20:35] swyx + Josh Albrecht: And that's how it was up till about last year. And now there's this specializing and rising class of people building AI specific software that are not any of those previous titles that I just mentioned. And that's the thesis of the AI engineer, that this is an emerging category of AI. Startups of jobs I've had people from Meta, IBM, Microsoft, OpenAI tell me that they, their title is now AI engineer.
[01:20:58] swyx + Josh Albrecht: They're hiring AI engineers. So, like, I can see that this is a trend and I think that's what Andre called out in his post that, like, just mathematically, just the, just the limitations in terms of talent, research talents and GPUs, that all these will tend to concentrate in a, in a, in a, Few labs and everyone else are just going to have to rely on them or build differentiation of products in other ways And those will be AI engineers.
[01:21:21] swyx + Josh Albrecht: So mathematically there will be more AI engineers than ML engineers. It's just the truth. Right now it's the other way. Right now the number of AI engineers is maybe 10x less. So I think that the ratio will invert and you know I think the goal of the InSpace and the goal of the conference and anything else I do is to serve that
[01:21:38] Dylan Patel: growing audience.
[01:21:41] Dylan Patel: To make the distinction clear, if I'm a software engineer And I'm like, I want to become an AI engineer. What do I have to learn? Like, what additional capabilities does that type of engineer have? Funny you say that. I think you have a blog post on this very
[01:21:53] swyx + Josh Albrecht: topic. I don't actually have a specific blog post on how to, like, change classes.
[01:21:58] swyx + Josh Albrecht: I do think I always think about these in terms of yeah, Baldur's Gate and, you know D& D rule set number 5. 1 or whatever. But yeah, so I kind of intentionally left that open to leave space for others. I think when you start an industry, you need to the specifications that work the best in industries are So minimally defined so that other people can fill in the blanks.
[01:22:19] swyx + Josh Albrecht: And I want people to fill in the blanks. I want people to disagree with me and with with themselves so that we can figure this out as a, as a group. Like I don't want to overs specify everything, you know, like that that's, that's a way, that's the only way to guarantee it, that it will fail. Um, I do have a take obviously, 'cause a lot of people are, are asking me like, where to start.
[01:22:37] swyx + Josh Albrecht: And I think basically so what, what we have is latent Space University. We just finished working on day seven today. It's a seven day email course. Where basically, like, it, it is completely designed to answer the question of, like, okay, I'm a, I'm an existing software engineer, I, like, kind of, I know how to code but I don't get all this AI stuff, I've been living under a rock, or, like, it's just too overwhelming for me, you have to, like, pick for me, or curate for me as a, as a trusted friend.
[01:22:59] swyx + Josh Albrecht: And I have one hour a day for seven days. What, what, what do you do? slot in that, in that, in that bucket. So for us, it's making, making sort of LLM API, API calls. It's me, it's image generation, it's code generation, it's audio ASR, I, I think, what's, what's ASR? Audio speech recognition?
[01:23:18] swyx + Josh Albrecht: Yeah, yeah. And then I forget, I forget what the fifth and sixth one is, but the last day is agents. And, and so basically, like, I'm just like, you know, Here are seven projects that you should do to feel like you can do anything in AI. You can't really do everything in AI just from, just from that small list.
[01:23:34] swyx + Josh Albrecht: But I think it's just like, just like anything, you have to like, go through like a set list of, of things that are basic skills that I think everyone in this industry should have to be at least conversant in. If someone, if like a boss comes to you and goes like, hey, can we build this? You don't even know if the answer is no.
[01:23:52] swyx + Josh Albrecht: So I want you to move towards from like unknown unknowns to at least known unknowns. And I think that's, that's where you start being competent as an AI engineer. So, so yeah, that's LSU, Latent Space University, just to trigger the The Tigers.
[01:24:06] Dylan Patel: So do you think in the future that people, an AI engineer is going to be someone's full time job?
[01:24:10] Dylan Patel: Like people are just going to be AI engineers? Or do you think it's going to be more of a world where I'm a software engineer, and like, 20 percent of my time, I'm using open AIs, APIs, and I'm, Working on prompt engineering and stuff like that and using
[01:24:23] swyx + Josh Albrecht: CodePilot. You just reminded me of Day6's open source models and fine tuning.
[01:24:27] swyx + Josh Albrecht: Perfect. I think it will be a spectrum. That's why I don't want to be like too definitive about it. Like we have full time front end engineers and we have part time front end engineers and you dip into that community whenever you want. But wouldn't it be nice if there was a collective name for that community so you could go find it?
[01:24:40] swyx + Josh Albrecht: You can find each other. And, like, honestly, like, that's, that's really it. Like, a lot of people, a lot of companies were pinging me for, like, Hey, I want to hire this kind of person, but you can't hire that person, but I wanted someone like that. And then people on the labor side were, were pinging me going, like, Okay, I want to do more in this space, but where do I go?
[01:24:56] swyx + Josh Albrecht: And I think just having that shelling point of, of, of what an industry title and name is, and then sort of building out that. Mythology and community and conference I think is helpful, hopefully, and I don't have any prescriptions on whether or not it's a full time job. I do think, over time, it's going to become more of a full time job.
[01:25:14] swyx + Josh Albrecht: And that's great for the people who want to do that and the companies that want to employ that. But it's absolutely, like, you can take it part time, like, you know, jobs come in many formats. Yep, yep, that
[01:25:23] Dylan Patel: makes sense. Yeah. And then you have a huge world fair coming up. Yeah. Tell me about that. So,
[01:25:31] swyx + Josh Albrecht: Part of, I think, you know, What creating an industry requires is for, to let people gather in one place.
[01:25:37] swyx + Josh Albrecht: And also for me to get high quality talks out of people. You have to create an event out of it. Otherwise they don't do the work. So so last year we did the AI Engineer Summit, which went very well. And people can see that online and we're, we're, we're very happy with how that turned out.
[01:25:53] swyx + Josh Albrecht: This year we want to go four times bigger with the World's Fair and try to reflect AI engineering as it is in 2024. I always admired two conferences in, in this respect. One is NeurIPS, which I went to last year and, and documented on, on the pod, which was fantastic. And two, which is KubeCon from the other side of my life, which is the sort of cloud registration and, and DevOps world.
[01:26:18] swyx + Josh Albrecht: So NeurIPS is the one place that you go to, to, I think it's the top conference. I mean, there's, there's others that you can kind of consider. But, yeah so, so NeurIPS is, NeurIPS is where the research sciences are the stars. The researchers are the stars, PhDs are the stars, mostly it's just PhDs on the job market, to be honest.
[01:26:34] swyx + Josh Albrecht: It's really funny
[01:26:35] Dylan Patel: to go, especially these days. Yeah, it
[01:26:37] swyx + Josh Albrecht: was really funny to go to NeurIPS and go like, And the VCs trying to back them. Yeah, there are lots, lots of VCs trying to back them. Yeah, there This year. Anyway, so in Europe, research scientists are the stars. And for, I wanted for AI engineers, for engineers to be the star.
[01:26:51] swyx + Josh Albrecht: Right, to show off their tooling and their techniques and their difficulty moving all these ideas from research into production. The other one was KubeCon, where, You could honestly just go and not attend any of the talks and just walk the floor and figure out what's going on in DevOps, which is fantastic.
[01:27:10] swyx + Josh Albrecht: Because, yeah, so, so that curation and that bringing together of, of, of an industry is what I'm going for for the conference. And yeah, it's coming in June. The most important thing, to be honest, when I, like, conceived of this whole thing was to buy the domain. So we got AI. engineer. People are like, engineer is a domain?
[01:27:27] swyx + Josh Albrecht: Yeah, and funny enough, engineer was cheaper than engineering. I don't understand why, but like that's up to the domain people.
[01:27:36] Dylan Patel: Josh, any questions on agents?
[01:27:38] Alessio: Yeah,
[01:27:39] Dylan Patel: I think maybe, you know, you have a lot
[01:27:40] swyx + Josh Albrecht: of experience and exposure talking to all these companies and founders and researchers and everyone that's on your podcast.
[01:27:47] Dylan Patel: Do you have, do you feel like you have a
[01:27:50] swyx + Josh Albrecht: good kind of perspective on some of the things that, like, some of the kind of technical issues having seen? You know, like we were just talking about, like, for coding agents, like, oh, how, you know, the value of test is really important. There are other things, like, for, you know, retrieval, like now, You know, we have these models coming out with a million context, you know, or a million tokens of context length, or ten million, like, is retrieval going to
[01:28:10] Dylan Patel: matter anymore, like,
[01:28:11] swyx + Josh Albrecht: do
[01:28:11] Dylan Patel: huge contexts matter, like,
[01:28:13] swyx + Josh Albrecht: what do you think?
[01:28:14] swyx + Josh Albrecht: Specifically about the long context thing? Sure, yeah. Because you asked a more broad question. I was going to ask a few other ones after that, so go for that one first. Yeah. That's what I was going to ask first. We can ask, yeah, okay, let's talk about long context and then the other stuff. So, for those who don't know, LongContext was kind of in the air last year, but really, really, really came into focus this year.
[01:28:33] swyx + Josh Albrecht: With Gemini 1. 5 having a million token context and saying that it was in research for 10 million tokens. And that means that you can put, you, you, you, like, no longer have to really think about, What you retrieve sorry, you no longer really think about what you have to, like, put into context.
[01:28:50] swyx + Josh Albrecht: You can just kind of throw it, throw the entire knowledge base in there, or books, or film, anything like that and that's fantastic. A lot of people are thinking that it kills RAG, and I think, like, one, that's not true, because for any kind of cost reason you you know, you still pay per token, so if you there, so basically Google is, like, perfectly happy to let you pay a million tokens every single time you make an API call, but good luck, you know, having a hundred dollar API call.
[01:29:12] swyx + Josh Albrecht: And and then the other thing, it's going to be slow. No explanation needed. And then finally, my criticism of long context is that it's also not debuggable. Like, if something goes wrong with the result, you can't do, like, the ragged decomposition of where the source of error. Like, you just have to, like, go, like, it's the Waze, bro.
[01:29:29] swyx + Josh Albrecht: Like, it's somewhere in there. Sorry. I pretty strongly agree with this. Why do you think people are making such crazy long context windows? People love to kill rag, right? It's so much Kill it, though, because it's too expensive. It's so expensive like you said. Yeah, I just think I just call it a different dimension I think it's an option that's great when it's there like when I'm prototyping I do not ever want to worry about context and I'm gonna call Stuff a few times and I don't want to run to errors I don't want to have it set up a complex retrieval system just to prototype something But once I'm done prototyping then I'll worry about all the other rag stuff And yes, I'm gonna buy some system or build some system or whatever to go do that.
[01:30:02] swyx + Josh Albrecht: I so I think it's just like An improvement in like one dimension that you need And then, but the improvements in the other dimensions also matter. And it's all needed, like this space is just going to keep growing, um, in unlimited fashion. I do think that this combined with multi modality does unlock new things.
[01:30:21] swyx + Josh Albrecht: So That's what I was going to ask about next. It's like, how important is multi modal? Like, great, you know, generating videos, sure, whatever. Okay, how many of us need to generate videos that often? It'd be cool for TV shows, sure, but like, yeah. I think it's pretty important. And the one thing that, in, when we launched the Lean Space podcast, We listed a bunch of interest areas.
[01:30:37] swyx + Josh Albrecht: So one thing I love about being explicit or intentional about our, our work is that you list the things that you're interested in and you, you list the things that you're not interested in. And people are very unwilling to, to, to have an anti interest list. One of the things that we were not interested in was multimodality last year.
[01:30:55] swyx + Josh Albrecht: Because everyone was, I was just like, okay, you can generate images and they're pretty, but like not a giant business. I was wrong. Midrani is a giant, giant, massive business that no one can get it, no one can understand or get into. But also I think being able to, to natively understand audio and video and code.
[01:31:12] swyx + Josh Albrecht: I consider code a special modality. All that is very, like, qualitatively different than translating it into English first and using English as, I don't know, like a bottleneck or pipe and then you know, applying it in LLMs. Like the ability of LLMs to reason across modalities gives you something more than you could, you know, Individually by, by, by using text as the universal interface.
[01:31:33] swyx + Josh Albrecht: So I think that's useful. So concretely what, what does that mean? It means that so I think the reference post for everyone that you should have in your head is Simon Willison's post on Gemini 1. 5's video capability. Where he basically shot a video of his bookshelf and just kind of scanning through it.
[01:31:50] swyx + Josh Albrecht: And he was able to give back a, a complete JSON list of the books and the authors and, and all the details that were visible there. Hallucinated some of it, which is, you know, another, another issue. But I think it's just like unlocks this use case that you just would not even try to code without the native video understanding capability.
[01:32:08] swyx + Josh Albrecht: And obviously, like. On a technical level, video is just a bunch of frames. So actually it's just image understanding, but image within the temporal dimension, which this month, I think, became much more of a important thing, like the integration of space and time in Transformers. I don't think anyone was really talking about that until this month, and now it's the only thing anyone can ever think about for Sora and for all the other stuff.
[01:32:30] swyx + Josh Albrecht: The last thing I'll say that, which is which is Against this trend of like every modality is important. They just, just do all the modalities. I kind of agree with Nat Friedman who actually kind of pointed out just before the Gemini thing blew up this, this, this, this month, which was like, why is it that OpenAI is pushing Dolly so hard?
[01:32:48] swyx + Josh Albrecht: Why is, why is being pushing Bing image creator? Like, it's not nec, it's not apparent that you have to create images to create a GI. But every lab just seems to want to do this, and I kind of agree that it's not on the critical path. Especially for image generation, maybe image understanding, video understanding.
[01:33:04] swyx + Josh Albrecht: Yeah, consumption. But generation, eh. Maybe we'll be wrong next year. It just catches you a bunch of flack with like, you know, culture war things. Alright, we're going to
[01:33:14] Dylan Patel: move into rapid fire Q& A, so we're going to ask you questions. We've cut
[01:33:26] Dylan Patel: the Q& A section for time, so if you want to hear the spicy questions, head over to the Thursday Nights in AI video for the full discussion.
[01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
[01:33:34] Dylan Patel: Next up, we have another former guest, Dylan Patel of Semianalysis, the inventor of the GPU rich poor divide, who did a special live show with us in March. But that means you can finally, like, side to side A B test your favorite Boba
[01:33:51] Alessio: shops?
[01:33:51] Alessio: We got Gong Cha, we got Boba Guys, we got the Lemon, whatever it's called. So, let us know what's your favorite. We also have Slido up to submit questions. We already had Dylan on the podcast, and like, this guy tweets and writes about all kinds of stuff. So we want to know what people want to know more
[01:34:07] Alessio: about.
[01:34:08] Alessio: Rather than just being self, self driven. But we'll do A state of the union, maybe? I don't know. Everybody wants to know about Grok. Everybody wants to know whether or not NVIDIA is going to zero after Grok. Everybody wants to know what's going on with AMD. We got some AMD folks in the crowd, too.
[01:34:23] Alessio: So feel free to interact at any time. This is We have
[01:34:27] swyx + Josh Albrecht: portable mics.
[01:34:27] Dylan Patel: Heckle, please. What do you sorry. Good comedians show their color when with the way they can handle the crowd when they're heckled.
[01:34:35] Alessio: Do not throw Boba. Do not throw Boba at this end. We cannot afford another podcasting setup. Awesome.
[01:34:41] Alessio: Well, welcome everybody to the Semi Analysis and Latest Space Crossover. Dylan texted me on Signal. He was like, dude, how do I easily set up a meetup? And here we are today. Well, as you might have seen, there's no name tags. There's a bunch of things that are missing. But we did our
[01:34:55] Dylan Patel: best. It was extremely easy, right?
[01:34:58] Groq
[01:34:58] Dylan Patel: Like, I text Alessio. He's like, yo, I got the spot. Okay, cool. Thanks Here's a link. Send it to people. Sent it. And then showed up. And like, there was zero other organization that I required. So
[01:35:10] Alessio: everybody's here. A lot of, a lot of Semi Analysis fans we get in the crowd everybody wants to know more about what's going on today, and Grok has definitely been the hottest thing.
[01:35:19] Alessio: We just recorded our monthly podcast today, and we didn't talk that much about Grok because we wanted you to talk more about it, and then we'll splice you into our, our monthly recap. So, let's start there.
[01:35:29] swyx + Josh Albrecht: Okay, so, You guys, you guys are the new Grok spreadsheet ers. Yeah, yeah, so, so, we, we we broke out some Grok numbers because everyone was wondering, there's two things going on, right?
[01:35:37] swyx + Josh Albrecht: One you know, how important, or how does it achieve the inference speed that it does? That, that has been demonstrated by GrokChat. And two, how does it achieve its price promise that is promised that, that is sort of the public pricing of 27 cents per million token. And there's been a lot of speculation or, you know, some numbers thrown out there.
[01:35:55] swyx + Josh Albrecht: I put out some tentative numbers and you put out different numbers. But I'll just kind of lay that as, as the, as the groundwork. Like, everyone's like very excited about essentially like five times faster. Token generation than any other LLM currently. And that unlocks interesting downstream possibilities if it's sustainable, if it's affordable.
[01:36:14] swyx + Josh Albrecht: And so I think your question, or reading your piece on Grok, which is on the screen right now, is it sustainable?
[01:36:21] Dylan Patel: So like many things, this is VC funded, including this Boba. No, I'm just kidding, I'm paying for the Bobo, so but, but Thank you semi analysis
[01:36:29] swyx + Josh Albrecht: subscribers
[01:36:31] Alessio: I hope he pays for it, I pay for it right now That's
[01:36:33] Dylan Patel: true, that's true Alessio has the IOU, right?
[01:36:36] Dylan Patel: And that's, that's all it is, but yeah, like many things, you know, they're, they're not making money off of their inference service, right? They're just throwing it out there for cheap and hoping to get business and maybe raise money off of that, and I think that's a that's a fine use case, but the question is, like, how much money are they losing?
[01:36:53] Dylan Patel: Right, and, and that's sort of what I went through breaking down in this this article that's on the screen. And it's, it's pretty clear they're like 7 to 10x off, like, break even on their inference API, which is like horrendous, like far worse than any other sort of inference API provider. So this is like a simple, simple cost thing that was pulled up.
[01:37:15] Dylan Patel: You can either inference at very high throughput, or you can inference at very high, very low latency.
[01:37:20] Dylan Patel: With GPUs, you can do both. With Grok, you can only do one. Of course, with Grok, you can do that one faster. Marginally faster than a inference latency optimized GPU server. But no one offers inference latency optimized GPU servers because you would just burn money, right? Makes no economic sense to do so.
[01:37:36] Dylan Patel: Until maybe someone's willing to pay for that. So, so Grok service, you know, on the surface looks awesome compared to everyone else's service, which is throughput optimized. And, and then when you compare to the throughput optimized scenario, right, GPUs look quite slow, but the reality is they're serving, you know, 64, 128 users at once.
[01:37:54] Dylan Patel: Right, they're, they have a batch size, right? How many users are being served at once, whereas Grok Taking 576 chips, and they're not really doing that efficiently, right? You know, they're, they're serving a far, far fewer number of users, but extremely fast. Now, that could be worthwhile if they can get their, you know, the number of users they're serving at once up, but that's extremely hard because they don't have memory on their chip, so they can't store KV cache KV cache for, you know, all the various different users.
[01:38:21] Dylan Patel: And so, so the crux of the issue is just like, hey, So, can they, can they get that performance up as much as they claim they will, right? Which is, you know, they need to get it up more than 10x, right? To, to, to make this like a reasonable benefit, right? In the meantime, NVIDIA's launching a new GPU in two weeks that'll be fun at GTC and they're constantly pushing software as well, so we'll see if, if Grok can catch up to that.
[01:38:43] Dylan Patel: But the, the current verdict is, you know, they're, they're quite far behind, but it's hopeful, you know, that, that maybe they can get there by, you know, scaling their system larger. Yeah.
[01:38:52] swyx + Josh Albrecht: I was listening back to our original episode, and you were talking about how NVIDIA basically adopted this different strategy of just leaning on networking GPUs together.
[01:39:00] swyx + Josh Albrecht: And it seems like Grok has some, like, minor version of that going on here with the Grok rack. Is it enough? Like, what's Grok's next step here, like,
[01:39:12] Dylan Patel: strategically? Yeah, that's the next step is, of course, you know, so, you know, So right now they connect 10 racks of chips together, right, and that's the system that's running on their API today, right.
[01:39:23] Dylan Patel: Whereas most people who are running, you know, Mistral are running it on two GPUs, right. So one fourth of a server. Yeah. And that rack is not you know, obviously 10 racks is pretty crazy, but they think that they can scale performance if they have this individual system be 20 racks, right? They think they can continue to scale performance extra linearly.
[01:39:42] Dylan Patel: So that'd be amazing if they could but I, I, I'm, I'm doubtful that that's gonna be something that's scalable especially for, for, you know, larger models. So there's the
[01:39:56] Alessio: chip itself, but there's also a lot of work they're doing at the compiler level. Do you have any good sense of, like, how easy it is to actually work with LPU?
[01:40:04] Alessio: Like, is that something that is going to be a bottleneck for them?
[01:40:07] Dylan Patel: So, so Ali's in the front right there, and he, he knows a ton about about VLIW architectures. But to summarize sort of his opinion, and I think many folks's, it's, it's extremely hard to To program these sorts of architectures, right?
[01:40:19] Dylan Patel: Which is why they have their compiler and so on and so forth. But, you know, it's, it's an incredible amount of work for them to stand up individual models and to get the performance up on them which is what they've been working on, right? Whereas, whereas, you know, GPUs are far more flexible, of course.
[01:40:33] Dylan Patel: And so the question is, you know, can they, can they can, can this compiler continue to extract performance? Well, theoretically, like there, there's a lot more performance to run on the hardware. But they don't have, you know, many, many things that people generally associate with, with programmable hardware.
[01:40:49] Dylan Patel: Right? They don't have buffers and, and many other things. So, so it makes it very tough to to do that. But that's what their, you know, their relatively large compiler team is working on. Yeah,
[01:40:58] swyx + Josh Albrecht: So I'm, I'm not a GPU compiler guy. But I do want to clarify my understanding from what I read. Which is a lot of catching up to do.
[01:41:05] swyx + Josh Albrecht: It is, The crux of it is some kind of speculative, like I, in the, the word that comes to mind is speculative routing of weights and, you know, and, and work that, that needs to be done, or scheduling of work across the, you know, the 10 racks of, of GPUs. Is that the, is that like the, the, the bulk of the benefit that you get from
[01:41:25] Dylan Patel: the compilation?
[01:41:26] Dylan Patel: So, so with the Grok chips, what's really interesting is like with GPUs you can do, you can issue certain instructions. And you will get a different result. Like, depending on the time, I know a lot of people in ML have, have had that experience, right? Where like, the GPU literally doesn't return the numbers it should be.
[01:41:45] Dylan Patel: And that's basically called non determinism, right? And, and, and the, and, and, with, with Grok, their chip is completely deterministic. The moment you compile it, you know exactly how long it will take to operate, right? There is no, there is no, like, deviation at all. And so, you know, they've, they're planning everything ahead of time, right, like, every instruction, like, it will complete in the time that they've planned it for.
[01:42:08] Dylan Patel: And there is no I don't know, I don't know what the best way to state this is. There's no variance there which is interesting from, like, when you look historically, they tried to push this into automotive, right? Because automotive, you know, you probably want your car to do exactly what you issued it to do.
[01:42:22] Dylan Patel: And not have, sort of, unpredictability. But yeah, I don't, sorry, I lost track of the question.
[01:42:28] swyx + Josh Albrecht: It's okay, I just wanted to understand a little bit more about, like, what people should under, should know about the compiler magic that goes on with Brock. Like, you know, like, I think, I think, from a software, like, under, like, hardware point of view that in, that intersection of, you know,
[01:42:44] Dylan Patel: So, so, so chips have like, like and I'm going to steal this from someone here in the crowd, but chips have like five, you know, sort of, there's like, when you're designing a chip, there's, there's, it's called PPA, right?
[01:42:54] Dylan Patel: Power, performance, and area, right? So it's kind of a triangle that you optimize around. And the one thing people don't realize is there's a, there's a third P that's like PPAP. And the last P is pain in the ass to program. And, and that's that is very important for like. People making AI hardware, right?
[01:43:11] Dylan Patel: Like, TPU, without the hundreds of people that work on the compiler, and JAX, and XLA, and all these sorts of things, would be a pain in the ass to program. But Google's got that, like, plumbing. Now, if you look across the ecosystem, everything else is a pain in the ass to program compared to NVIDIA, right? And, and, and this applies to the, to the Grok chip as well, right?
[01:43:31] Dylan Patel: So, yeah, question is, like, can the compiler team get performance up anywhere close to theoretical? And then, and then can they make it not a pain in the ass to support new models? Cool. We
[01:43:41] Alessio: got a question, we got a question from Ali. What's the average VLIW bundle occupancy of Grok? Bro,
[01:43:49] Dylan Patel: get out of here.
[01:43:52] Alessio: I don't know if he's setting you up, or if he
[01:43:54] Dylan Patel: wants to chime in. I think he's setting me up, I think he's setting me up. So, okay,
[01:43:58] swyx + Josh Albrecht: what is VLIW for
[01:44:00] Dylan Patel: the rest of us? It's, it's like very long instruction word is basically what it means. And, hm. So, so, GPUs are relatively simple, right? They're, they're tiny little cores, very simple instructions, there's a shitload of them, right?
[01:44:16] Dylan Patel: CPUs, you know, they have a, they have a known instruction set, right? x86. It's very complicated but people have worked on it for decades. VLIW processors are very unique in that sense, right? Like and your question, Ali, I cannot answer that question. I have no clue. Is it documented anywhere online?
[01:44:35] Dylan Patel: Anyway, so like the systolic array, right? Like there's, within the TPU, there's a bunch of stuff, but the actual matrix multiply unit, it's called the MXU, and it's a VLIW architecture as well. It's and I'm, I'm just trying to find a, yeah, I just want to find something that makes me not sound like an idiot.
[01:44:51] swyx + Josh Albrecht: Sometimes I also like to ballpark things in terms of like, like where a good middle median value should be and where like a good high value should be. Sorry. You, you, you
[01:45:03] Dylan Patel: can ballpark things like, you know, like, yeah, so, so, so, but basically like the, the point is like you're trading this is optimal, this is theoretically the most optimal architecture for performance power and area in a given, and you know, not, not specifically Grok, but VLIW in general is gonna get you closer to optimal there, but then you're giving off, you know, that, that last P, which is pain in the ass program, is, is I think the most simple way to get into it.
[01:45:27] Dylan Patel: There's like, computer architecture books about this, but it's, it's, it's a little little, little complicated, right? Yeah.
[01:45:35] Alessio: Somebody asked, there's a lot of questions, that's great. Can we talk about LPU, Cerebrus, Tenstorin, some of these other architectures. How should people think about Maxim, SRAM versus Mix versus
[01:45:49] Dylan Patel: Yeah, yeah.
[01:45:50] Dylan Patel: So there's a lot of ML hardware out there, new and old, right? There's old stuff that's trying to compete there's new stuff that's coming up, you know, companies like, like MadX and Lumerium Labs and so on and so forth, right? You know but, but, so, so there's like a continuum of like, everyone before, say, two years ago that was doing ML hardware bet in one direction, right?
[01:46:11] Dylan Patel: We're gonna make it as an architecture that is, that is has more on chip memory than NVIDIA, right? Like, that was the general bet everyone made. Right? And so like Grok made that bet, they made it to the extreme, right? They didn't have any off chip memory at all. Only on chip memory. You have, you have Cerebrists who did a similar thing except, they were like, Yeah, we're gonna have on chip memory, but we're gonna make a chip that's the size of a wafer.
[01:46:33] Dylan Patel: Right? Like literally this big. Whereas an NVIDIA chip is roughly this big, right? So it's like this big, it's the only chip in the world that's that big. But again, same bet. More on chip memory, less off chip, right? GraphCore and SambaNova made a similar bet. And, and every, basically everyone made that bet.
[01:46:49] Dylan Patel: Cause they thought that's where ML would go. Of course, models grew faster than anyone ever imagined. Yeah, than the memory that was possible. And so that, that very quickly became the wrong bet. And so now we're, you know, sort of seeing a new wave of startups that are going to bet on the other side, as well as many other, you know, architectural things because memory is not really the only architectural thing, of course.
[01:47:08] Dylan Patel: And so, like, where to, where to, like, place startups is, is very dependent on, like, Hey, what are you doing differently than NVIDIA? And is NVIDIA just going to implement that in their chip next year, right? Or, or some version of that. That's, like, pretty much the only things to think about when looking at, you know, hardware companies now.
[01:47:27] Dylan Patel: Cool.
[01:47:28] Alessio: And, yeah, I, I think the, the question is like, there's the size of the models that got outrun, but now you're doing all this work at the compiler level, but it's very transformer based, everything they're doing on the optimization side. How, how do you think about that risk? Like, do you think it's okay for like a hardware company to take like architectural risk in terms of like, yeah, we assume transformers in two years, they'll still be pretty good.
[01:47:51] Alessio: But when you're like depreciating some of this cost of our life. For five years as a buyer.
[01:47:56] Dylan Patel: Yeah, yeah, that's, that's the biggest challenge with like some of the specialized hardware, right? It's like, I know my GPUs will be useful in four years or five years. Maybe not, like, super useful, but they'll be useful for something.
[01:48:07] Dylan Patel: But, there's no way to know that my hardware is going to be able to operate on whatever new model architecture that comes out in the next few years, right? Like, I, I, I like to joke transformers are all you need. And like everything else is like a waste of time. But, you know, I'm sure something better will come.
[01:48:26] Dylan Patel: Right? And, and, you know, you gotta have like, hardware is expensive and you own it for many years. Right? So you can't just like buy whatever's best for today's workload one time and then assume that workload is gonna stay stagnant. Cause that's a recipe to have your like hardware useless as soon as like things evolve.
[01:48:43] Dylan Patel: Right? Like imagine if someone like had hardware for LTSMs and. 2016 or whatever, right? Like, LSTMs. Yeah, LSTM, sorry. You look like an idiot, right? Because now it's not gonna work for, you know, the next architecture, right? As soon as BERT came out, right? For example. So yeah, it's, it's very anything super, super specialized is always at risk of, of being sort of obsoleted and useless.
[01:49:06] Dylan Patel: And, and sort of that's, that's the, that's the thought that like, hey, like, like Graphcore, right? Their chips are. Pretty decent at GNNs, right? Graph Neural Networks. They're actually pretty decent at that. But no one cares, right? So, congratulations, right? Like, you won, you won like the shortest midget, right?
[01:49:24] swyx + Josh Albrecht: Mentioning transformers is all you need. Gives us a nice opportunity to bring out one of your old tweets, but also mention Gemini. My old
[01:49:30] Dylan Patel: tweets, I'm scared. Recent
[01:49:33] swyx + Josh Albrecht: tweets. There's a lot of people talking about, like I think you had a tweet commenting on Gemini 1. 5. And the million token context where basically everyone was saying, like, okay, we need Mamba, we need RLUKV, or we need some other alternative architecture to scale to long context.
[01:49:48] swyx + Josh Albrecht: And Google comes out and says, no, we just, we scaled transformers to 10 million tokens. Easy. We, and, you know, like, I, I think that, that kind of, like, reflects on your thesis there a
[01:49:59] Dylan Patel: little bit. I guess, yeah. I mean, I don't know if I, if I have a coherent thesis, but it's, it's sure fun to, it's Who, who think that like, I, I, I just have an intense hatred for RAG.
[01:50:11] Dylan Patel: Right, like retrieval augmented generation is, is, is like the most like, I just have an intense like innate hatred for it. Wait, wait, you retweeted me
[01:50:18] swyx + Josh Albrecht: defending RAG in the White House press release. Yeah, yeah, yeah. Okay.
[01:50:21] Dylan Patel: But it's just fun,
[01:50:22] swyx + Josh Albrecht: it's all fun and games. Yeah, yeah, yeah, it's all fun and games.
[01:50:24] Dylan Patel: Yeah.
[01:50:25] Dylan Patel: No, no, no, I retweeted, I retweeted you because you memed the White House. I don't know if y'all saw the meme. Can you pull it up? Sure. Like the, the White House the White House put out this thing about like, They're getting very opinionated with this White House. Memory safety. I think it was effectively like, C is bad and Rust is good.
[01:50:39] Dylan Patel: It was like pretty wild that the White House put that out. And I mean like, like whatever that is, so, so, So
[01:50:46] swyx + Josh Albrecht: like, they just got very opinionated about prescribing languages to people. And so then I was, I just like started editing them. So I have stopped comparing RAG with long context and fine
[01:50:54] Dylan Patel: tuning.
[01:50:55] Dylan Patel: Wait, You said I retweeted you defending it. I thought you were hating on it. And that's why I retweeted it.
[01:51:00] swyx + Josh Albrecht: It's somewhat of a defense. Because everyone was like long context is killing RAG. And then I had future LLM should be sub quadratic. That's another one. And I actually messed with the fine print as well..
[01:51:11] Alessio: Let's see power benefits of SRAM dominant
[01:51:13] Dylan Patel: Yeah, yeah. So, so that's a good question, right? So, like, SRAM is on chip memory. Everyone's just using HBM. If you don't have to go to off chip memory, that'd be really efficient, right?
[01:51:23] Dylan Patel: Cause, cause you're, you're not moving bits around. But there's always the issue of you don't have enough memory, right? So, so you still have to move bits around constantly. And so that's the, that's the question. So, yeah, sure. If you, if you can not move data around as you compute, it's going to be fantastically efficient.
[01:51:39] Dylan Patel: That isn't really not really just easy or simple to do.
[01:51:42] Alessio: What do you think is going to be harder in the future, like getting more energy at cheaper costs or like getting more of this hardware
[01:51:48] Dylan Patel: to run? Yeah, I wonder, so someone was talking about this earlier but it's like here in the crowd and I'm looking right at him but he's complaining that journalists keep saying that you know, that, that, like misreporting about how data centers, or what data centers are doing to the environment.
[01:52:03] Dylan Patel: Right? Which I thought was quite funny, right? Cause, cause they're inundated by journalists talking about data centers like destroying the world. Anyways you know, that's not quite the case, right? But yeah, I don't know, like, the, the, the power is certainly going to be hard to get, but, you know, I think, I think if you just look at history, right?
[01:52:22] Dylan Patel: Like humanity, especially America, right? Like, power, power production and usage kept skyrocketing. From like the 1700s to like 1970s, and then it kind of flatlined from there, so why can't we like go back to the like growth stage, I guess is like the whole like mantra of like accelerationists, I guess.
[01:52:40] Dylan Patel: This is EAC, yep. Well I don't think it's EAC, I think it's like, like Sam Altman like wholly believes this too, right? Yeah. And I don't think he's EAC. So, but yeah, like, like, I don't think like, it's like things, it's like something to think about, right? Like. The US is going back to growing in energy usage whereas for the last like 40 years kind of were flat on energy usage.
[01:53:00] Dylan Patel: And what does that mean, right? Like, yeah.
[01:53:04] Alessio: Fair enough. There was another question on Marvel but kind of the, I think
[01:53:07] Dylan Patel: that's it's, it's, it's definitely like one of these three guys who are on the buy side that are asking this question. What, what, what you want to know if Marvel's stock is gonna go up?
[01:53:18] Dylan Patel: Yeah. So Marvell,
[01:53:19] Alessio: the, they're, they're doing the custom music for, for grok. They also do the tri too. And the, the Google CPU. Yeah. Any other, any other chip that they're working on that people should, should keep in mind. It's like, yeah. Any needle moving and it's any stock moving .
[01:53:34] Dylan Patel: Yeah, exactly. Exactly. They're, they're working on some more stuff.
[01:53:38] Dylan Patel: Yeah. I, I'll, I'll, I'll refrain from,
[01:53:40] Alessio: yeah. All right. Let's see other grok stuff we want to get it, get through. I don't think so. Alright, most of the other ones. Your view on edge compute hardware. Any real use cases for it?
[01:53:54] Dylan Patel: Yeah, I mean, I, I I have like a really like anti edge view. Yeah, let's hear it.
[01:53:58] Dylan Patel: Like, like, so many people are like, oh, I'm going to run this model on my phone or on my laptop and. I love how much it's raining. So now I can be horrible and you people won't leave. Like, I want you to try and leave this building. Captive audience. Seriously, should I start singing? Like, there's nothing you
[01:54:17] Alessio: can do.
[01:54:18] Alessio: You definitely, I'll stop you from that.
[01:54:19] Dylan Patel: Sorry, so edge hardware, right? Like, you know, people are like, I'm going to run this model on my phone or my laptop. It makes no sense to me. Cause Current hardware is not really capable of it. So you're gonna buy new hardware, to run whatever on the edge or you're gonna just run very, very small models.
[01:54:36] Dylan Patel: But in either case, you're, you're gonna end up with like the performance is really low, And then whatever you spent to run it locally, Like if you spent it in the cloud, it could service 10x the users, right? So you kind of like, SOL in terms of like, Economics of, of running things on the edge. And then like latency is like, for, for LLMs, right, for LLMs, it's like not that big of a deal relative to, like internet latency is not that big of a deal relative to the use of the model, right?
[01:55:08] Dylan Patel: Like the actual model operating, whether it's on edge hardware or cloud hardware. And cloud hardware is so much faster. So like edge hardware is not really able to like, have a measurable, appreciable, like advantage. Over, over cloud, cloud hardware. This applies to diffusion models, this applies to LLMs of course small models will be able to run, but not, not all, yeah.
[01:55:33] Dylan Patel: Cool.
[01:55:35] Alessio: Let's see. I guess you, you can now see them. Yeah, what chance do startups like MetaX fetch, or 5. 6? Haven't you
[01:55:41] swyx + Josh Albrecht: already reviewed
[01:55:41] Dylan Patel: them? Why don't you, why don't you answer? Yeah, we, we
[01:55:43] swyx + Josh Albrecht: actually, like, we have, Connections with Maddox and Lemurian. Yeah, yeah, yeah. We haven't, no. But Gavin is
[01:55:52] Alessio: Yeah, yeah, they said they don't want to talk publicly.
[01:55:55] Alessio: Oh, okay, okay.
[01:55:57] swyx + Josh Albrecht: When they open up, we can Sure,
[01:56:00] Alessio: sure. But do you think, like, I think the two,
[01:56:02] Dylan Patel: three Answer the question! What do you think of them?
[01:56:06] Alessio: I think, kind of, there's a couple things. It's like How do the other companies innovate against them? I think when you do a new Silicon, you're like, Oh, we're going to be so much better at this thing or like much faster, much cheaper.
[01:56:18] Alessio: But there's all the other curves going down on the macro environment at the same time. So if it takes you like five years before you were like a lot better, five years later, once you take the chip out, you're only comparing yourself to the five year advancement that the major companies had to. So then it's like, okay, the, we're going to have like the C300, whatever, from, from NVIDIA.
[01:56:37] Alessio: By the time some of these chips come up.
[01:56:40] Dylan Patel: What's after Z? What do you think is after Z in the road map? Because it's X, Y, Z, Anyways Yeah, yeah, it's like the age old problem, right? Like you build a chip, it has some cool thing, cool feature, and then like, a year later, NVIDIA has it in hardware, right? Has implemented some flavor of that in hardware.
[01:57:01] Dylan Patel: Or two generations out, right? Like, what idea are you going to have that NVIDIA can't implement, is like, really the question. It's like, you have to be fundamentally different in some way that holds through for, you know, four or five years, right? That's kind of the big issue. But, you know, like, those people have some ideas that are interesting, and yeah, maybe it'll work out, right?
[01:57:21] Dylan Patel: But it's going to be hard to fight NVIDIA, who one, doesn't consider them competition, right? They're worried about, like, Google and Amazon's chip. Right, they're not, and I guess to some extent AMD's chip, but like they're not really worried about you know, MADX or Etched or Grok or, you know, Positron or any of these folks.
[01:57:39] Alessio: How much of an advantage do they have by working closely with like OpenAI folks and then already knowing where some of the architecture decisions are going? And since those companies are like the biggest buyers and users of the
[01:57:51] Dylan Patel: chips, Yeah, I mean, like, you see, like, the most important sort of AI companies are obviously going to tell hardware vendors what they want you know, open AI and, you know, so on and so forth, right?
[01:58:02] Dylan Patel: They're just going to obviously tell them what they want and the startups aren't actually going to get anywhere close to as much feedback on what to do on, like, you know, very minute, low level stuff, right? So that's, that's the, that is a difficulty, right? Some startups, like, like, Maddox obviously have people who built, or worked on the largest models, like at Google, but then other startups might not have that advantage and so they're always gonna have that issue of like, hey, how do I get the feedback, or what's changing, what do they see down the pipeline that's, that I really need to be aware of and ready for when I design my hardware.
[01:58:37] Dylan Patel: Alright.
[01:58:38] Alessio: Every hardware shortage has eventually turned into a glut. Well, that'd be true of NVIDIA chips, it's so when, but also why.
[01:58:45] Dylan Patel: Absolutely, and I'm so excited to buy like H100s for like 1, 000, guys. No, that's not 000, but Yeah, everyone's gonna buy chips, right? Like, it's just the way semiconductors work, because the supply chain takes forever to build out.
[01:58:58] Dylan Patel: And it's, it's like a really weird thing, right? Like, so, so if the backlog of chips is a year, people will order, you know, Two years worth of what they want for the next year. It is like a very common thing. It's not just like this AI cycle, but like, like, like microcontrollers, right? Like the automotive companies, they order two years worth of what they needed for one year, just so they could get enough, right?
[01:59:21] Dylan Patel: Like, this is just like what happens in semiconductors when, when lead times lengthen, the, the purchases and inventory is sort of like double. Sorry. So, so these. The, the NVIDIA GPU shortage obviously is going to be rectified. And when it is everyone's sort of double orders will become extremely apparent, right?
[01:59:42] Dylan Patel: And, you know, you, you see like random companies out of nowhere being like, Yeah, we've got 32, 000 H100s on order, or we've got 10, 000 or 5, 000. And trust, they're not all they're not all real orders for one, but I think, I think the like bubble will continue on for a long time, right, like it's not, it's not going to end like this year, right, like people, people need AI, right, like I think everyone in this audience would agree, right, like there's no, there's no like immediate like end to the, to the bubble, right.
[02:00:09] Dylan Patel: Party like we're in 1995, not like 2000. Makes sense.
[02:00:12] Alessio: What's next? Thoughts on VLIW
[02:00:16] Dylan Patel: architectures? Oh, Y, Y, sorry, sorry, Y. The Y question, yeah, yeah. I think it's just because the supply chain expands so much, and then at the same time there will be no, like, economic, like, immediate economic thing for everyone, right?
[02:00:28] Dylan Patel: Like, some companies will continue to buy, like like an OpenAI or Meta will continue to buy, but then, like, All these random startups will, or a lot of them will not be able to continue to buy, right? So then, so then that like kind of leads to like, they'll pause for a little bit, right? Or like, I think in 2018, right?
[02:00:45] Dylan Patel: Like memory pricing was extremely high. Then all of a sudden Google, Microsoft, and Amazon all agreed, I don't, you know, You know, they don't, they won't, they won't say it's together, but they basically all agreed it like, within the same week to stop ordering memory. And within like a month, the price of memory started tanking like insane amounts, right?
[02:01:06] Dylan Patel: And like people claim, you know, all sorts of reasons why that was timed extremely well. But it was like very clear and people in the financial markets were able to make trades and everything, right? People stopped buying and it's not like their demand just dried up. It's just like they had a little bit of a demand slowdown and then they had enough inventory that they could like weather until like prices tanked.
[02:01:26] Dylan Patel: Because it's such an inelastic good, right? Yeah.
[02:01:29] swyx + Josh Albrecht: Thank you very much. That's it.
[02:01:35] AI Charlie: That concludes our audio segment this weekend. But if you're listening all the way to the end, we have two bonus segments for you. A conversation with Malin Nefe, Senior Vice President of AI at Capital One. We'll be speaking at the AI Leadership Track of the AI Engineer World's Far. And the recent Latent Space Personal AI Meetup featuring a lot of new AI wearables. Bee, Based Hardware, DeepGram MLE AI, and LangChain LangFriend and LangMem, Presented by another former guest, Harrison Chase. Watch out and take care.

Get full access to Latent Space at www.latent.space/subscribe

6 April 2024, 6:46 pm
42 minutes 58 seconds

Presenting the AI Engineer World's Fair — with Sam Schillace, Deputy CTO of Microsoft

TL;DR: You can now buy tickets, apply to speak, or join the expo for the biggest AI Engineer event of 2024. We’re gathering *everyone* you want to meet - see you this June.
In last year’s the Rise of the AI Engineer we put our money where our mouth was and announced the AI Engineer Summit, which fortunately went well:
With ~500 live attendees and over ~500k views online, the first iteration of the AI Engineer industry affair seemed to be well received. Competing in an expensive city with 3 other more established AI conferences in the fall calendar, we broke through in terms of in-person experience and online impact.
So at the end of Day 2 we announced our second event: the AI Engineer World’s Fair. The new website is now live, together with our new presenting sponsor:
We were delighted to invite both Ben Dunphy, co-organizer of the conference and Sam Schillace, the deputy CTO of Microsoft who wrote some of the first Laws of AI Engineering while working with early releases of GPT-4, on the pod to talk about the conference and how Microsoft is all-in on AI Engineering.
Rise of the Planet of the AI Engineer
Since the first AI Engineer piece, AI Engineering has exploded:
and the title has been adopted across OpenAI, Meta, IBM, and many, many other companies:
1 year on, it is clear that AI Engineering is not only in full swing, but is an emerging global industry that is successfully bridging the gap:
* between research and product,
* between general-purpose foundation models and in-context use-cases,
* and between the flashy weekend MVP (still great!) and the reliable, rigorously evaluated AI product deployed at massive scale, assisting hundreds of employees and driving millions in profit.
The greatly increased scope of the 2024 AI Engineer World’s Fair (more stages, more talks, more speakers, more attendees, more expo…) helps us reflect the growth of AI Engineering in three major dimensions:
* Global Representation: the 2023 Summit was a mostly-American affair. This year we plan to have speakers from top AI companies across five continents, and explore the vast diversity of approaches to AI across global contexts.
* Topic Coverage:
* In 2023, the Summit focused on the initial questions that the community wrestled with - LLM frameworks, RAG and Vector Databases, Code Copilots and AI Agents. Those are evergreen problems that just got deeper.
* This year the AI Engineering field has also embraced new core disciplines with more explicit focus on Multimodality, Evals and Ops, Open Source Models and GPU/Inference Hardware providers.
* Maturity/Production-readiness: Two new tracks are dedicated toward AI in the Enterprise, government, education, finance, and more highly regulated industries or AI deployed at larger scale:
* AI in the Fortune 500, covering at-scale production deployments of AI, and
* AI Leadership, a closed-door, side event for technical AI leaders to discuss engineering and product leadership challenges as VPs and Heads of AI in their respective orgs.
We hope you will join Microsoft and the rest of us as either speaker, exhibitor, or attendee, in San Francisco this June. Contact us with any enquiries that don’t fall into the categories mentioned below.
Show Notes
* Ben Dunphy
* 2023 Summit
* GitHub confirmed $100m ARR on stage
* History of World’s Fairs
* Sam Schillace
* Writely on Acquired.fm
* Early Lessons From GPT-4: The Schillace Laws
* Semantic Kernel
* Sam on Kevin Scott (Microsoft CTO)’s podcast in 2022
* AI Engineer World’s Fair (SF, Jun 25-27)
* Buy Super Early Bird tickets (Listeners can use LATENTSPACE for $100 off any ticket until April 8, or use GROUP if coming in 4 or more)
* Submit talks and workshops for Speaker CFPs (by April 8)
* Enquire about Expo Sponsorship (Asap.. selling fast)
Timestamps
* [00:00:16] Intro
* [00:01:04] 2023 AI Engineer Summit
* [00:03:11] Vendor Neutral
* [00:05:33] 2024 AIE World's Fair
* [00:07:34] AIE World's Fair: 9 Tracks
* [00:08:58] AIE World's Fair Keynotes
* [00:09:33] Introducing Sam
* [00:12:17] AI in 2020s vs the Cloud in 2000s
* [00:13:46] Syntax vs Semantics
* [00:14:22] Bill Gates vs GPT-4
* [00:16:28] Semantic Kernel and Schillace's Laws of AI Engineering
* [00:17:29] Orchestration: Break it into pieces
* [00:19:52] Prompt Engineering: Ask Smart to Get Smart
* [00:21:57] Think with the model, Plan with Code
* [00:23:12] Metacognition vs Stochasticity
* [00:24:43] Generating Synthetic Textbooks
* [00:26:24] Trade leverage for precision; use interaction to mitigate
* [00:27:18] Code is for syntax and process; models are for semantics and intent.
* [00:28:46] Hands on AI Leadership
* [00:33:18] Multimodality vs "Text is the universal wire protocol"
* [00:35:46] Azure OpenAI vs Microsoft Research vs Microsoft AI Division
* [00:39:40] On Satya
* [00:40:44] Sam at AI Leadership Track
* [00:42:05] Final Plug for Tickets & CFP
Transcript
[00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co host Swyx, founder of Small
[00:00:16] Intro
[00:00:16] swyx: AI. Hey, hey, we're back again with a very special episode, this time with two guests and talking about the very in person events rather than online stuff.
[00:00:27] swyx: So first I want to welcome Ben Dunphy, who is my co organizer on AI engineer conferences. Hey, hey, how's it going? We have a very special guest. Anyone who's looking at the show notes and the title will preview this later. But I guess we want to set the context. We are effectively doing promo for the upcoming AI Engineer World's Fair that's happening in June.
[00:00:49] swyx: But maybe something that we haven't actually recapped much on the pod is just the origin of the AI Engineer Summit and why, what happens and what went down. Ben, I don't know if you'd like to start with the raw numbers that people should have in mind.
[00:01:04] 2023 AI Engineer Summit
[00:01:04] Ben Dunphy: Yeah, perhaps your listeners would like just a quick background on the summit.
[00:01:09] Ben Dunphy: I mean, I'm sure many folks have heard of our events. You know, you launched, we launched the AI Engineer Summit last June with your, your article kind of coining the term that was on the tip of everyone's tongue, but curiously had not been actually coined, which is the term AI Engineer, which is now many people's, Job titles, you know, we're seeing a lot more people come to this event, with the job description of AI engineer, with the job title of AI engineer so, is an event that you and I really talked about since February of 2023, when we met at a hackathon you organized we were both excited by this movement and it hasn't really had a name yet.
[00:01:48] Ben Dunphy: We decided that an event was warranted and that's why we move forward with the AI Engineer Summit, which Ended up being a great success. You know, we had over 5, 000 people apply to attend in person. We had over 9, 000 folks attend, online with over 20, 000 on the live stream.
[00:02:06] Ben Dunphy: In person, we accepted about 400 attendees and had speakers, workshop instructors and sponsors, all congregating in San Francisco over, two days, um, two and a half days with a, with a welcome reception. So it was quite the event to kick off kind of this movement that's turning into quite an exciting
[00:02:24] swyx: industry.
[00:02:25] swyx: The overall idea of this is that I kind of view AI engineering, at least in all my work in Latent Space and the other stuff, as starting an industry.
[00:02:34] swyx: And I think every industry, every new community, needs a place to congregate. And I definitely think that AI engineer, at least at the conference, is that it's meant to be like the biggest gathering of technical engineering people working with AI. Right. I think we kind of got that spot last year. There was a very competitive conference season, especially in San Francisco.
[00:02:54] swyx: But I think as far as I understand, in terms of cultural impact, online impact, and the speakers that people want to see, we, we got them all and it was very important for us to be a vendor neutral type of event. Right. , The reason I partnered with Ben is that Ben has a lot of experience, a lot more experience doing vendor neutral stuff.
[00:03:11] Vendor Neutral
[00:03:11] swyx: I first met you when I was speaking at one of your events, and now we're sort of business partners on that. And yeah, I mean, I don't know if you have any sort of Thoughts on make, making things vendor neutral, making things more of a community industry conference rather than like something that's owned by one company.
[00:03:25] swyx: Yeah.
[00:03:25] Ben Dunphy: I mean events that are owned by a company are great, but this is typically where you have product pitches and this smaller internet community. But if you want the truly internet community, if you want a more varied audience and you know, frankly, better content for, especially for a technical audience, you want a vendor neutral event. And this is because when you have folks that are running the event that are focused on one thing and one thing alone, which is quality, quality of content, quality of speakers, quality of the in person experience, and just of general relevance it really elevates everything to the next level.
[00:04:01] Ben Dunphy: And when you have someone like yourself who's coming To this content curation the role that you take at this event, and bringing that neutrality with, along with your experience, that really helps to take it to the next level, and then when you have someone like myself, focusing on just the program curation, and the in person experience, then both of our forces combined, we can like, really create this epic event, and so, these vendor neutral events if you've been to a small community event, Typically, these are vendor neutral, but also if you've been to a really, really popular industry event, many of the top industry events are actually vendor neutral.
[00:04:37] Ben Dunphy: And that's because of the fact that they're vendor neutral, not in spite of
[00:04:41] swyx: it. Yeah, I've been pretty open about the fact that my dream is to build the KubeCon of AI. So if anyone has been in the Kubernetes world, they'll understand what that means. And then, or, or instead of the NeurIPS, NeurIPS for engineers, where engineers are the stars and engineers are sharing their knowledge.
[00:04:57] swyx: Perspectives, because I think AI is definitely moving over from research to engineering and production. I think one of my favorite parts was just honestly having GitHub and Microsoft support, which we'll cover in a bit, but you know, announcing finally that GitHub's copilot was such a commercial success I think was the first time that was actually confirmed by anyone in public.
[00:05:17] swyx: For me, it's also interesting as sort of the conference curator to put Microsoft next to competitors some of which might be much smaller AI startups and to see what, where different companies are innovating in different areas.
[00:05:27] swyx: Well, they're next to
[00:05:27] Ben Dunphy: each other in the arena. So they can be next to each other on stage too.
[00:05:33] Why AIE World's Fair
[00:05:33] swyx: Okay, so this year World's Fair we are going a lot bigger what details are we disclosing right now? Yeah,
[00:05:39] Ben Dunphy: I guess we should start with the name why are we calling it the World's Fair? And I think we need to go back to what inspired this, what actually the original World's Fair was, which was it started in the late 1700s and went to the early 1900s.
[00:05:53] Ben Dunphy: And it was intended to showcase the incredible achievements. Of nation states, corporations, individuals in these grand expos. So you have these miniature cities actually being built for these grand expos. In San Francisco, for example, you had the entire Marina District built up in absolutely new construction to showcase the achievements of industry, architecture, art, and culture.
[00:06:16] Ben Dunphy: And many of your listeners will know that in 1893, the Nikola Tesla famously provided power to the Chicago World's Fair with his 8 seat power generator. There's lots of great movies and documentaries about this. That was the first electric World's Fair, which thereafter it was referred to as the White City.
[00:06:33] Ben Dunphy: So in today's world we have technological change that's similar to what was experienced during the industrial revolution in how it's, how it's just upending our entire life, how we live, work, and play. And so we have artificial intelligence, which has long been the dream of humanity.
[00:06:51] Ben Dunphy: It's, it's finally here. And the pace of technological change is just accelerating. So with this event, as you mentioned, we, we're aiming to create a singular event where the world's foremost experts, builders, and practitioners can come together to exchange and reflect. And we think this is not only good for business, but it's also good for our mental health.
[00:07:12] Ben Dunphy: It slows things down a bit from the Twitter news cycle to an in person festival of smiles, handshakes, connections, and in depth conversations that online media and online events can only ever dream of replicating. So this is an expo led event where the world's top companies will mingle with the world's top founders and AI engineers who are building and enhanced by AI.
[00:07:34] AIE World's Fair: 9 Tracks
[00:07:34] Ben Dunphy: And not to mention, we're featuring over a hundred talks and workshops across
[00:07:37] swyx: nine tracks. Yeah, I mean, those nine tracks will be fun. Actually, do we have a little preview of the tracks in the, the speakers?
[00:07:43] Ben Dunphy: We do. Folks can actually see them today at our website. We've updated that at ai.
[00:07:48] Ben Dunphy: engineer. So we'd encourage them to go there to see that. But for those just listening, we have nine tracks. So we have multimodality. We have retrieval augmented generation. Featuring LLM frameworks and vector databases, evals and LLM ops, open source models, code gen and dev tools, GPUs and inference, AI agent applications, AI in the fortune 500, and then we have a special track for AI leadership which you can access by purchasing the VP pass which is different from the, the other passes we have.
[00:08:20] Ben Dunphy: And I won't go into the Each of these tracks in depth, unless you want to, Swyx but there's more details on the website at ai. engineer.
[00:08:28] swyx: I mean, I, I, very much looking forward to talking to our special guests for the last track, I think, which is the what a lot of yeah, leaders are thinking about, which is how to, Inspire innovation in their companies, especially the sort of larger organizations that might not have the in house talents for that kind of stuff.
[00:08:47] swyx: So yeah, we can talk about the expo, but I'm very keen to talk about the presenting sponsor if you want to go slightly out of order from our original plan.
[00:08:58] AIE World's Fair Keynotes
[00:08:58] Ben Dunphy: Yeah, absolutely. So you know, for the stage of keynotes, we have talks confirmed from Microsoft, OpenAI, AWS, and Google.
[00:09:06] Ben Dunphy: And our presenting sponsor is joining the stage with those folks. And so that presenting sponsor this year is a dream sponsor. It's Microsoft. It's the company really helping to lead the charge. And into this wonderful new era that we're all taking part in. So, yeah,
[00:09:20] swyx: you know, a bit of context, like when we first started planning this thing, I was kind of brainstorming, like, who would we like to get as the ideal presenting sponsors, as ideal partners long term, just in terms of encouraging the AI engineering industry, and it was Microsoft.
[00:09:33] Introducing Sam
[00:09:33] swyx: So Sam, I'm very excited to welcome you onto the podcast. You are CVP and Deputy CTO of Microsoft. Welcome.
[00:09:40] Sam Schillace: Nice to be here. I'm looking forward to, I was looking for, to Lessio saying my last name correctly this time. Oh
[00:09:45] swyx: yeah. So I, I studiously avoided saying, saying your last name, but apparently it's an Italian last name.
[00:09:50] swyx: Ski Lache. Ski
[00:09:51] Alessio: Lache. Yeah. No, that, that's great, Sean. That's great as a musical person.
[00:09:54] swyx: And it, it's also, yeah, I pay attention to like the, the, the lilt. So it's ski lache and the, the slow slowing of the law is, is what I focused
[00:10:03] Sam Schillace: on. You say both Ls. There's no silent letters, you say
[00:10:07] Alessio: both of those. And it's great to have you, Sam.
[00:10:09] Alessio: You know, we've known each other now for a year and a half, two years, and our first conversation, well, it was at Lobby Conference, and then we had a really good one in the kind of parking lot of a Safeway, because we didn't want to go into Starbucks to meet, so we sat outside for about an hour, an hour and a half, and then you had to go to a Bluegrass concert, so it was great.
[00:10:28] Alessio: Great meeting, and now, finally, we have you on Lanespace.
[00:10:31] Sam Schillace: Cool, cool. Yeah, I'm happy to be here. It's funny, I was just saying to Swyx before you joined that, like, it's kind of an intimidating podcast. Like, when I listen to this podcast, it seems to be, like, one of the more intelligent ones, like, more, more, like, deep technical folks on it.
[00:10:44] Sam Schillace: So, it's, like, it's kind of nice to be here. It's fun. Bring your A game. Hopefully I'll, I'll bring mine. I
[00:10:49] swyx: mean, you've been programming for longer than some of our listeners have been alive, so I don't think your technical chops are in any doubt. So you were responsible for Rightly as one of your early wins in your career, which then became Google Docs, and obviously you were then responsible for a lot more G Suite.
[00:11:07] swyx: But did you know that you covered in Acquired. fm episode 9, which is one of the podcasts that we model after.
[00:11:13] Sam Schillace: Oh, cool. I didn't, I didn't realize that the most fun way to say this is that I still have to this day in my personal GDocs account, the very first Google doc, like I actually have it.
[00:11:24] Sam Schillace: And I looked it up, like it occurred to me like six months ago that it was probably around and I went and looked and it's still there. So it's like, and it's kind of a funny thing. Cause it's like the backend has been rewritten at least twice that I know of the front end has been re rewritten at least twice that I know of.
[00:11:38] Sam Schillace: So. I'm not sure what sense it's still the original one it's sort of more the idea of the original one, like the NFT of it would probably be more authentic. I
[00:11:46] swyx: still have it. It's a ship athesia thing. Does it, does it say hello world or something more mundane?
[00:11:52] Sam Schillace: It's, it's, it's me and Steve Newman trying to figure out if some collaboration stuff is working, and also a picture of Edna from the Incredibles that I probably pasted in later, because that's That's too early for that, I think.
[00:12:05] swyx: People can look up your LinkedIn, and we're going to link it on the show notes, but you're also SVP of engineering for Box, and then you went back to Google to do Google, to lead Google Maps, and now you're deputy CTO.
[00:12:17] AI in 2020s vs the Cloud in 2000s
[00:12:17] swyx: I mean, there's so many places to start, but maybe one place I like to start off with is do you have a personal GPT 4 experience.
[00:12:25] swyx: Obviously being at Microsoft, you have, you had early access and everyone talks about Bill Gates's
[00:12:30] Sam Schillace: demo. Yeah, it's kind of, yeah, that's, it's kind of interesting. Like, yeah, we got access, I got access to it like in September of 2022, I guess, like before it was really released. And I it like almost instantly was just like mind blowing to me how good it was.
[00:12:47] Sam Schillace: I would try experiments like very early on, like I play music. There's this thing called ABC notation. That's like an ASCII way to represent music. And like, I was like, I wonder if it can like compose a fiddle tune. And like it composed a fiddle tune. I'm like, I wonder if it can change key, change the key.
[00:13:01] Sam Schillace: Like it's like really, it was like very astonishing. And I sort of, I'm very like abstract. My background is actually more math than CS. I'm a very abstract thinker and sort of categorical thinker. And the, the thing that occurred to me with, with GPT 4 the first time I saw it was. This is really like the beginning, it's the beginning of V2 of the computer industry completely.
[00:13:23] Sam Schillace: I had the same feeling I had when, of like a category shifting that I had when the cloud stuff happened with the GDocs stuff, right? Where it's just like, all of a sudden this like huge vista opens up of capabilities. And I think the way I characterized it, which is a little bit nerdy, but I'm a nerd so lean into it is like everything until now has been about syntax.
[00:13:46] Syntax vs Semantics
[00:13:46] Sam Schillace: Like, we have to do mediation. We have to describe the real world in forms that the digital world can manage. And so we're the mediation, and we, like, do that via things like syntax and schema and programming languages. And all of a sudden, like, this opens the door to semantics, where, like, you can express intention and meaning and nuance and fuzziness.
[00:14:04] Sam Schillace: And the machine itself is doing, the model itself is doing a bunch of the mediation for you. And like, that's obviously like complicated. We can talk about the limits and stuff, and it's getting better in some ways. And we're learning things and all kinds of stuff is going on around it, obviously.
[00:14:18] Sam Schillace: But like, that was my immediate reaction to it was just like, Oh my God.
[00:14:22] Bill Gates vs GPT-4
[00:14:22] Sam Schillace: Like, and then I heard about the build demo where like Bill had been telling Kevin Scott this, This investment is a waste. It's never going to work. AI is blah, blah, blah. And come back when it can pass like an AP bio exam.
[00:14:33] Sam Schillace: And they actually literally did that at one point, they brought in like the world champion of the, like the AP bio test or whatever the AP competition and like it and chat GPT or GPT 4 both did the AP bio and GPT 4 beat her. So that was the moment that convinced Bill that this was actually real.
[00:14:53] Sam Schillace: Yeah, it's fun. I had a moment with him actually about three weeks after that when we had been, so I started like diving in on developer tools almost immediately and I built this thing with a small team that's called the Semantic Kernel which is one of the very early orchestrators just because I wanted to be able to put code and And inference together.
[00:15:10] Sam Schillace: And that's probably something we should dig into more deeply. Cause I think there's some good insights in there, but I I had a bunch of stuff that we were building and then I was asked to go meet with Bill Gates about it and he's kind of famously skeptical and, and so I was a little bit nervous to meet him the first time.
[00:15:25] Sam Schillace: And I started the conversation with, Hey, Bill, like three weeks ago, you would have called BS on everything I'm about to show you. And I would probably have agreed with you, but we've both seen this thing. And so we both know it's real. So let's skip that part and like, talk about what's possible.
[00:15:39] Sam Schillace: And then we just had this kind of fun, open ended conversation and I showed him a bunch of stuff. So that was like a really nice, fun, fun moment as well. Well,
[00:15:46] swyx: that's a nice way to meet Bill Gates and impress
[00:15:48] Sam Schillace: him. A little funny. I mean, it's like, I wasn't sure what he would think of me, given what I've done and his.
[00:15:54] Sam Schillace: Crown Jewel. But he was nice. I think he likes
[00:15:59] swyx: GDocs. Crown Jewel as in Google Docs versus Microsoft Word? Office.
[00:16:03] Sam Schillace: Yeah. Yeah, versus Office. Yeah, like, I think, I mean, I can imagine him not liking, I met Steven Snofsky once and he sort of respectfully, but sort of grimaced at me. You know, like, because of how much trauma I had caused him.
[00:16:18] Sam Schillace: So Bill was very nice to
[00:16:20] swyx: me. In general it's like friendly competition, right? They keep you, they keep you sharp, you keep each
[00:16:24] Sam Schillace: other sharp. Yeah, no, I think that's, it's definitely respect, it's just kind of funny.
[00:16:28] Semantic Kernel and Schillace's Laws of AI Engineering
[00:16:28] Sam Schillace: Yeah,
[00:16:28] swyx: So, speaking of semantic kernel, I had no idea that you were that deeply involved, that you actually had laws named after you.
[00:16:35] swyx: This only came up after looking into you for a little bit. Skelatches laws, how did those, what's the, what's the origin
[00:16:41] Sam Schillace: story? Hey! Yeah, that's kind of funny. I'm actually kind of a modest person and so I'm sure I feel about having my name attached to them. Although I do agree with all, I believe all of them because I wrote all of them.
[00:16:49] Sam Schillace: This is like a designer, John Might, who works with me, decided to stick my name on them and put them out there. Seriously, but like, well, but like, so this was just I, I'm not, I don't build models. Like I'm not an AI engineer in the sense of, of like AI researcher that's like doing inference. Like I'm somebody who's like consuming the models.
[00:17:09] Sam Schillace: Exactly. So it's kind of funny when you're talking about AI engineering, like it's a good way of putting it. Cause that's how like I think about myself. I'm like, I'm an app builder. I just want to build with this tool. Yep. And so we spent all of the fall and into the winter in that first year, like Just trying to build stuff and learn how this tool worked.
[00:17:29] Orchestration: Break it into pieces
[00:17:29] Sam Schillace: And I guess those are a little bit in the spirit of like Robert Bentley's programming pearls or something. I was just like, let's kind of distill some of these ideas down of like. How does this thing work? I saw something I still see today with people doing like inference is still kind of expensive.
[00:17:46] Sam Schillace: GPUs are still kind of scarce. And so people try to get everything done in like one shot. And so there's all this like prompt tuning to get things working. And one of the first laws was like, break it into pieces. Like if it's hard for you, it's going to be hard for the model. But if it's you know, there's this kind of weird thing where like, it's.
[00:18:02] Sam Schillace: It's absolutely not a human being, but starting to think about, like, how would I solve the problem is often a good way to figure out how to architect the program so that the model can solve the problem. So, like, that was one of the first laws. That came from me just trying to, like, replicate a test of a, like, a more complicated, There's like a reasoning process that you have to go through that, that Google was, was the react, the react thing, and I was trying to get GPT 4 to do it on its own.
[00:18:32] Sam Schillace: And, and so I'd ask it the question that was in this paper, and the answer to the question is like the year 2000. It's like, what year did this particular author who wrote this book live in this country? And you've kind of got to carefully reason through it. And like, I could not get GPT 4 to Just to answer the question with the year 2000.
[00:18:50] Sam Schillace: And if you're thinking about this as like the kernel is like a pipelined orchestrator, right? It's like very Unix y, where like you have a, some kind of command and you pipe stuff to the next parameters and output to the next thing. So I'm thinking about this as like one module in like a pipeline, and I just want it to give me the answer.
[00:19:05] Sam Schillace: I don't want anything else. And I could not prompt engineer my way out of that. I just like, it was giving me a paragraph or reasoning. And so I sort of like anthropomorphized a little bit and I was like, well, the only way you can think about stuff is it can think out loud because there's nothing else that the model does.
[00:19:19] Sam Schillace: It's just doing token generation. And so it's not going to be able to do this reasoning if it can't think out loud. And that's why it's always producing this. But if you take that paragraph of output, which did get to the right answer and you pipe it into a second prompt. That just says read this conversation and just extract the answer and report it back.
[00:19:38] Sam Schillace: That's an easier task. That would be an easier task for you to do or me to do. It's easier reasoning. And so it's an easier thing for the model to do and it's much more accurate. And that's like 100 percent accurate. It always does that. So like that was one of those, those insights on the that led to the, the choice loss.
[00:19:52] Prompt Engineering: Ask Smart to Get Smart
[00:19:52] Sam Schillace: I think one of the other ones that's kind of interesting that I think people still don't fully appreciate is that GPT 4 is the rough equivalent of like a human being sitting down for centuries or millennia and reading all the books that they can find. It's this vast mind, right, and the embedding space, the latent space, is 100, 000 K, 100, 000 dimensional space, right?
[00:20:14] Sam Schillace: Like it's this huge, high dimensional space, and we don't have good, um, Intuition about high dimensional spaces, like the topology works in really weird ways, connectivity works in weird ways. So a lot of what we're doing is like aiming the attention of a model into some part of this very weirdly connected space.
[00:20:30] Sam Schillace: That's kind of what prompt engineering is. But that kind of, like, what we observed to begin with that led to one of those laws was You know, ask smart to get smart. And I think we've all, we all understand this now, right? Like this is the whole field of prompt engineering. But like, if you ask like a simple, a simplistic question of the model, you'll get kind of a simplistic answer.
[00:20:50] Sam Schillace: Cause you're pointing it at a simplistic part of that high dimensional space. And if you ask it a more intelligent question, you get more intelligent stuff back out. And so I think that's part of like how you think about programming as well. It's like, how are you directing the attention of the model?
[00:21:04] Sam Schillace: And I think we still don't have a good intuitive feel for that. To me,
[00:21:08] Alessio: the most interesting thing is how do you tie the ask smart, get smart with the syntax and semantics piece. I gave a talk at GDC last week about the rise of full stack employees and how these models are like semantic representation of tasks that people do.
[00:21:23] Alessio: But at the same time, we have code. Also become semantic representation of code. You know, I give you the example of like Python that sort it's like really a semantic function. It's not code, but it's actually code underneath. How do you think about tying the two together where you have code?
[00:21:39] Alessio: To then extract the smart parts so that you don't have to like ask smart every time and like kind of wrap them in like higher level functions.
[00:21:46] Sam Schillace: Yeah, this is, this is actually, we're skipping ahead to kind of later in the conversation, but I like to, I usually like to still stuff down in these little aphorisms that kind of help me remember them.
[00:21:57] Think with the model, Plan with Code
[00:21:57] Sam Schillace: You know, so we can dig into a bunch of them. One of them is pixels are free, one of them is bots are docs. But the one that's interesting here is Think with the model, plan with code. And so one of the things, so one of the things we've realized, we've been trying to do lots of these like longer running tasks.
[00:22:13] Sam Schillace: Like we did this thing called the infinite chatbot, which was the successor to the semantic kernel, which is an internal project. It's a lot like GPTs. The open AI GPT is, but it's like a little bit more advanced in some ways, kind of deep exploration of a rag based bot system. And then we did multi agents from that, trying to do some autonomy stuff and we're, and we're kind of banging our head against this thing.
[00:22:34] Sam Schillace: And you know, one of the things I started to realize, this is going to get nerdy for a second. I apologize, but let me dig in on it for just a second. No apology needed. Um, we realized is like, again, this is a little bit of an anthropomorphism and an illusion that we're having. So like when we look at these models, we think there's something continuous there.
[00:22:51] Sam Schillace: We're having a conversation with chat GPT or whatever with Azure open air or like, like what's really happened. It's a little bit like watching claymation, right? Like when you watch claymation, you don't think that the model is actually the clay model is actually really alive. You know, that there's like a bunch of still disconnected slot screens that your mind is connecting into a continuous experience.
[00:23:12] Metacognition vs Stochasticity
[00:23:12] Sam Schillace: And that's kind of the same thing that's going on with these models. Like they're all the prompts are disconnected no matter what. Which means you're putting a lot of weight on memory, right? This is the thing we talked about. You're like, you're putting a lot of weight on precision and recall of your memory system.
[00:23:27] Sam Schillace: And so like, and it turns out like, because the models are stochastic, they're kind of random. They'll make stuff up if things are missing. If you're naive about your, your memory system, you'll get lots of like accumulated similar memories that will kind of clog the system, things like that. So there's lots of ways in which like, Memory is hard to manage well, and, and, and that's okay.
[00:23:47] Sam Schillace: But what happens is when you're doing plans and you're doing these longer running things that you're talking about, that second level, the metacognition is very vulnerable to that stochastic noise, which is like, I totally want to put this on a bumper sticker that like metacognition is susceptible to stochasticity would be like the great bumper sticker.
[00:24:07] Sam Schillace: So what, these things are very vulnerable to feedback loops when they're trying to do autonomy, and they're very vulnerable to getting lost. So we've had these, like, multi agent Autonomous agent things get kind of stuck on like complimenting each other, or they'll get stuck on being quote unquote frustrated and they'll go on strike.
[00:24:22] Sam Schillace: Like there's all kinds of weird like feedback loops you get into. So what we've learned to answer your question of how you put all this stuff together is You have to, the model's good at thinking, but it's not good at planning. So you do planning in code. So you have to describe the larger process of what you're doing in code somehow.
[00:24:38] Sam Schillace: So semantic intent or whatever. And then you let the model kind of fill in the pieces.
[00:24:43] Generating Synthetic Textbooks
[00:24:43] Sam Schillace: I'll give a less abstract like example. It's a little bit of an old example. I did this like last year, but at one point I wanted to see if I could generate textbooks. And so I wrote this thing called the textbook factory.
[00:24:53] Sam Schillace: And it's, it's tiny. It's like a Jupyter notebook with like. You know, 200 lines of Python and like six very short prompts, but what you basically give it a sentence. And it like pulls out the topic and the level of, of, from that sentence, so you, like, I would like fifth grade reading. I would like eighth grade English.
[00:25:11] Sam Schillace: His English ninth grade, US history, whatever. That by the way, all, all by itself, like would've been an almost impossible job like three years ago. Isn't, it's like totally amazing like that by itself. Just parsing an arbitrary natural language sentence to get these two pieces of information out is like almost trivial now.
[00:25:27] Sam Schillace: Which is amazing. So it takes that and it just like makes like a thousand calls to the API and it goes and builds a full year textbook, like decides what the curriculum is with one of the prompts. It breaks it into chapters. It writes all the lessons and lesson plans and like builds a teacher's guide with all the answers to all the questions.
[00:25:42] Sam Schillace: It builds a table of contents, like all that stuff. It's super reliable. You always get a textbook. It's super brittle. You never get a cookbook or a novel like but like you could kind of define that domain pretty care, like I can describe. The metacognition, the high level plan for how do you write a textbook, right?
[00:25:59] Sam Schillace: You like decide the curriculum and then you write all the chapters and you write the teacher's guide and you write the table content, like you can, you can describe that out pretty well. And so having that like code exoskeleton wrapped around the model is really helpful, like it keeps the model from drifting off and then you don't have as many of these vulnerabilities around memory that you would normally have.
[00:26:19] Sam Schillace: So like, that's kind of, I think where the syntax and semantics comes together right now.
[00:26:24] Trade leverage for precision; use interaction to mitigate
[00:26:24] Sam Schillace: And then I think the question for all of us is. How do you get more leverage out of that? Right? So one of the things that I don't love about virtually everything anyone's built for the last year and a half is people are holding the hands of the model on everything.
[00:26:37] Sam Schillace: Like the leverage is very low, right? You can't turn. These things loose to do anything really interesting for very long. You can kind of, and the places where people are getting more work out per unit of work in are usually where somebody has done exactly what I just described. They've kind of figured out what the pattern of the problem is in enough of a way that they can write some code for it.
[00:26:59] Sam Schillace: And then that that like, so I've seen like sales support stuff. I've seen like code base tuning stuff of like, there's lots of things that people are doing where like, you can get a lot of value in some relatively well defined domain using a little bit of the model's ability to think for you and a little, and a little bit of code.
[00:27:18] Code is for syntax and process; models are for semantics and intent.
[00:27:18] Sam Schillace: And then I think the next wave is like, okay, do we do stuff like domain specific languages to like make the planning capabilities better? Do we like start to build? More sophisticated primitives. We're starting to think about and talk about like power automate and a bunch of stuff inside of Microsoft that we're going to wrap in these like building blocks.
[00:27:34] Sam Schillace: So the models have these chunks of reliable functionality that they can invoke as part of these plans, right? Because you don't want like, if you're going to ask the model to go do something and the output's going to be a hundred thousand lines of code, if it's got to generate that code every time, the randomness, the stochasticity is like going to make that basically not reliable.
[00:27:54] Sam Schillace: You want it to generate it like a 10 or 20 line high level semantic plan for this thing that gets handed to some markup executor that runs it and that invokes that API, that 100, 000 lines of code behind it, API call. And like, that's a really nice robust system for now. And then as the models get smarter as new models emerge, then we get better plans, we get more sophistication.
[00:28:17] Sam Schillace: In terms of what they can choose, things like that. Right. So I think like that feels like that's probably the path forward for a little while, at least, like there was, there was a lot there. I, sorry, like I've been thinking, you can tell I've been thinking about it a lot. Like this is kind of all I think about is like, how do you build.
[00:28:31] Sam Schillace: Really high value stuff out of this. And where do we go? Yeah. The, the role where
[00:28:35] swyx: we are. Yeah. The intermixing of code and, and LMS is, is a lot of the role of the AI engineer. And I, I, I think in a very real way, you were one of the first to, because obviously you had early access. Honestly, I'm surprised.
[00:28:46] Hands on AI Leadership
[00:28:46] swyx: How are you so hands on? How do you choose to, to dedicate your time? How do you advise other tech leaders? Right. You know, you, you are. You have people working for you, you could not be hands on, but you seem to be hands on. What's the allocation that people should have, especially if they're senior tech
[00:29:03] Sam Schillace: leaders?
[00:29:04] Sam Schillace: It's mostly just fun. Like, I'm a maker, and I like to build stuff. I'm a little bit idiosyncratic. I I've got ADHD, and so I won't build anything. I won't work on anything I'm bored with. So I have no discipline. If I'm not actually interested in the thing, I can't just, like, do it, force myself to do it.
[00:29:17] Sam Schillace: But, I mean, if you're not interested in what's going on right now in the industry, like, go find a different industry, honestly. Like, I seriously, like, this is, I, well, it's funny, like, I don't mean to be snarky, but, like, I was at a dinner, like, a, I don't know, six months ago or something, And I was sitting next to a CTO of a large, I won't name the corporation because it would name the person, but I was sitting next to the CTO of a very large Japanese technical company, and he was like, like, nothing has been interesting since the internet, and this is interesting now, like, this is fun again.
[00:29:46] Sam Schillace: And I'm like, yeah, totally, like this is like, the most interesting thing that's happened in 35 years of my career, like, we can play with semantics and natural language, and we can have these things that are like sort of active, can kind of be independent in certain ways and can do stuff for us and can like, reach all of these interesting problems.
[00:30:02] Sam Schillace: So like that's part of it of it's just kind of fun to, to do stuff and to build stuff. I, I just can't, can't resist. I'm not crazy hands-on, like, I have an eng like my engineering team's listening right now. They're like probably laughing 'cause they, I never, I, I don't really touch code directly 'cause I'm so obsessive.
[00:30:17] Sam Schillace: I told them like, if I start writing code, that's all I'm gonna do. And it's probably better if I stay a little bit high level and like, think about. I've got a really great couple of engineers, a bunch of engineers underneath me, a bunch of designers underneath me that are really good folks that we just bounce ideas off of back and forth and it's just really fun.
[00:30:35] Sam Schillace: That's the role I came to Microsoft to do, really, was to just kind of bring some energy around innovation, some energy around consumer, We didn't know that this was coming when I joined. I joined like eight months before it hit us, but I think Kevin might've had an idea it was coming. And and then when it hit, I just kind of dove in with both feet cause it's just so much fun to do.
[00:30:55] Sam Schillace: Just to tie it back a little bit to the, the Google Docs stuff. When we did rightly originally the world it's not like I built rightly in jQuery or anything. Like I built that thing on bare metal back before there were decent JavaScript VMs.
[00:31:10] Sam Schillace: I was just telling somebody today, like you were rate limited. So like just computing the diff when you type something like doing the string diff, I had to write like a binary search on each end of the string diff because like you didn't have enough iterations of a for loop to search character by character.
[00:31:24] Sam Schillace: I mean, like that's how rough it was none of the browsers implemented stuff directly, whatever. It's like, just really messy. And like, that's. Like, as somebody who's been doing this for a long time, like, that's the place where you want to engage, right? If things are easy, and it's easy to go do something, it's too late.
[00:31:42] Sam Schillace: Even if it's not too late, it's going to be crowded, but like the right time to do something new and disruptive and technical is, first of all, still when it's controversial, but second of all, when you have this, like, you can see the future, you ask this, like, what if question, and you can see where it's going, But you have this, like, pit in your stomach as an engineer as to, like, how crappy this is going to be to do.
[00:32:04] Sam Schillace: Like, that's really the right moment to engage with stuff. We're just like, this is going to suck, it's going to be messy, I don't know what the path is, I'm going to get sticks and thorns in my hair, like I, I, it's going to have false starts, and I don't really, I'm going to This is why those skeletchae laws are kind of funny, because, like, I, I, like You know, I wrote them down at one point because they were like my best guess, but I'm like half of these are probably wrong, and I think they've all held up pretty well, but I'm just like guessing along with everybody else, we're just trying to figure this thing out still, right, and like, and I think the only way to do that is to just engage with it.
[00:32:34] Sam Schillace: You just have to like, build stuff. If you're, I can't tell you the number of execs I've talked to who have opinions about AI and have not sat down with anything for more than 10 minutes to like actually try to get anything done. You know, it's just like, it's incomprehensible to me that you can watch this stuff through the lens of like the press and forgive me, podcasts and feel like you actually know what you're talking about.
[00:32:59] Sam Schillace: Like, you have to like build stuff. Like, break your nose on stuff and like figure out what doesn't work.
[00:33:04] swyx: Yeah, I mean, I view us as a starting point, as a way for people to get exposure on what we're doing. They should be looking at, and they still have to do the work as do we. Yeah, I'll basically endorse, like, I think most of the laws.
[00:33:18] Multimodality vs "Text is the universal wire protocol"
[00:33:18] swyx: I think the one I question the most now is text is the universal wire protocol. There was a very popular article, a text that used a universal interface by Rune who now works at OpenAI. And I, actually, we just, we just dropped a podcast with David Luan, who's CEO of Adept now, but he was VP of Eng, and he pitched Kevin Scott for the original Microsoft investment in OpenAI.
[00:33:40] swyx: Where he's basically pivoting to or just betting very hard on multimodality. I think that's something that we don't really position very well. I think this year, we're trying to all figure it out. I don't know if you have an updated perspective on multi modal models how that affects agents
[00:33:54] Sam Schillace: or not.
[00:33:55] Sam Schillace: Yeah, I mean, I think the multi I think multi modality is really important. And I, I think it's only going to get better from here. For sure. Yeah, the text is the universal wire protocol. You're probably right. Like, I don't know that I would defend that one entirely. Note that it doesn't say English, right?
[00:34:09] Sam Schillace: Like it's, it's not, that's even natural language. Like there's stuff like Steve Luko, who's the guy who created TypeScript, created TypeChat, right? Which is this like way to get LLMs to be very precise and return syntax and correct JavaScript. So like, I, yeah, I think like multimodality, like, I think part of the challenge with it is like, it's a little harder to access.
[00:34:30] Sam Schillace: Programatically still like I think you know and I do think like, You know like when when like dahly and stuff started to come Out I was like, oh photoshop's in trouble cuz like, you know I'm just gonna like describe images And you don't need photos of Photoshop anymore Which hasn't played out that way like they're actually like adding a bunch of tools who look like you want to be able to you know for multimodality be really like super super charged you need to be able to do stuff like Descriptively, like, okay, find the dog in this picture and mask around it.
[00:34:58] Sam Schillace: Okay, now make it larger and whatever. You need to be able to interact with stuff textually, which we're starting to be able to do. Like, you can do some of that stuff. But there's probably a whole bunch of new capabilities that are going to come out that are going to make it more interesting.
[00:35:11] Sam Schillace: So, I don't know, like, I suspect we're going to wind up looking kind of like Unix at the end of the day, where, like, there's pipes and, like, Stuff goes over pipes, and some of the pipes are byte character pipes, and some of them are byte digital or whatever like binary pipes, and that's going to be compatible with a lot of the systems we have out there, so like, that's probably still And I think there's a lot to be gotten from, from text as a language, but I suspect you're right.
[00:35:37] Sam Schillace: Like that particular law is not going to hold up super well. But we didn't have multimodal going when I wrote it. I'll take one out as well.
[00:35:46] Azure OpenAI vs Microsoft Research vs Microsoft AI Division
[00:35:46] swyx: I know. Yeah, I mean, the innovations that keep coming out of Microsoft. You mentioned multi agent. I think you're talking about autogen.
[00:35:52] swyx: But there's always research coming out of MSR. Yeah. PHY1, PHY2. Yeah, there's a bunch of
[00:35:57] Sam Schillace: stuff. Yeah.
[00:35:59] swyx: What should, how should the outsider or the AI engineer just as a sort of final word, like, How should they view the Microsoft portfolio things? I know you're not here to be a salesman, but What, how do you explain You know, Microsoft's AI
[00:36:12] Sam Schillace: work to people.
[00:36:13] Sam Schillace: There's a lot of stuff going on. Like, first of all, like, I should, I'll be a little tiny bit of a salesman for, like, two seconds and just point out that, like, one of the things we have is the Microsoft for Startups Founders Hub. So, like, you can get, like, Azure credits and stuff from us. Like, up to, like, 150 grand, I think, over four years.
[00:36:29] Sam Schillace: So, like, it's actually pretty easy to get. Credit you can start, I 500 bucks to start or something with very little other than just an idea. So like there's, that's pretty cool. Like, I like Microsoft is very much all in on AI at, at many levels. And so like that, you mentioned, you mentioned Autogen, like, So I sit in the office of the CTO, Microsoft Research sits under him, under the office of the CTO as well.
[00:36:51] Sam Schillace: So the Autogen group came out of somebody in MSR, like in that group. So like there's sort of. The spectrum of very researchy things going on in research, where we're doing things like Phi, which is the small language model efficiency exploration that's really, really interesting. Lots of very technical folks there that are building different kinds of models.
[00:37:10] Sam Schillace: And then there's like, groups like my group that are kind of a little bit in the middle that straddle product and, and, and research and kind of have a foot in both worlds and are trying to kind of be a bridge into the product world. And then there's like a whole bunch of stuff on the product side of things.
[00:37:23] Sam Schillace: So there's. All the Azure OpenAI stuff, and then there's all the stuff that's in Office and Windows. And I, so I think, like, the way, I don't know, the way to think about Microsoft is we're just powering AI at every level we can, and making it as accessible as we can to both end users and developers.
[00:37:42] Sam Schillace: There's this really nice research arm at one end of that spectrum that's really driving the cutting edge. The fee stuff is really amazing. It broke the chinchella curves. Right, like we didn't, that's the textbooks are all you need paper, and it's still kind of controversial, but like that was really a surprising result that came out of MSR.
[00:37:58] Sam Schillace: And so like I think Microsoft is both being a thought leader on one end, on the other end with all the Azure OpenAI, all the Azure tooling that we have, like very much a developer centric, kind of the tinkerer's paradise that Microsoft always was. It's like a great place to come and consume all these things.
[00:38:14] Sam Schillace: There's really amazing stuff ideas that we've had, like these very rich, long running, rag based chatbots that we didn't talk about that are like now possible to just go build with Azure AI Studio for yourself. You can build and deploy like a chatbot that's trained on your data specifically, like very easily and things like that.
[00:38:31] Sam Schillace: So like there's that end of things. And then there's all this stuff that's in Office, where like, you could just like use the copilots both in Bing, but also just like daily your daily work. So like, it's just kind of everywhere at this point, like everyone in the company thinks about it all the time.
[00:38:43] Sam Schillace: There's like no single answer to that question. That was way more salesy than I thought I was capable of, but like, that is actually the genuine truth. Like, it is all the time, it is all levels, it is all the way from really pragmatic, approachable stuff for somebody starting out who doesn't know things, all the way to like Absolutely cutting edge research, silicon, models, AI for science, like, we didn't talk about any of the AI for science stuff, I've seen magical stuff coming out of the research group on that topic, like just crazy cool stuff that's coming, so.
[00:39:13] Sam Schillace: You've
[00:39:14] swyx: called this since you joined Microsoft. I point listeners to the podcast that you did in 2022, pre ChatGBT with Kevin Scott. And yeah, you've been saying this from the beginning. So this is not a new line of Talk track for you, like you've, you, you've been a genuine believer for a long time.
[00:39:28] swyx: And,
[00:39:28] Sam Schillace: and just to be clear, like I haven't been at Microsoft that long. I've only been here for like two, a little over two years and you know, it's a little bit weird for me 'cause for a lot of my career they were the competitor and the enemy and you know, it's kind of funny to be here, but like it's really remarkable.
[00:39:40] On Satya
[00:39:40] Sam Schillace: It's going on. I really, really like Satya. I've met a, met and worked with a bunch of big tech CEOs and I think he's a genuinely awesome person and he's fun to work with and has a really great. vision. So like, and I obviously really like Kevin, we've been friends for a long time. So it's a cool place.
[00:39:56] Sam Schillace: I think there's a lot of interesting stuff. We
[00:39:57] swyx: have some awareness Satya is a listener. So obviously he's super welcome on the pod anytime. You can just drop in a good word for us.
[00:40:05] Sam Schillace: He's fun to talk to. It's interesting because like CEOs can be lots of different personalities, but he is you were asking me about how I'm like, so hands on and engaged.
[00:40:14] Sam Schillace: I'm amazed at how hands on and engaged he can be given the scale of his job. Like, he's super, super engaged with stuff, super in the details, understands a lot of the stuff that's going on. And the science side of things, as well as the product and the business side, I mean, it's really remarkable. I don't say that, like, because he's listening or because I'm trying to pump the company, like, I'm, like, genuinely really, really impressed with, like, how, what he's, like, I look at him, I'm like, I love this stuff, and I spend all my time thinking about it, and I could not do what he's doing.
[00:40:42] Sam Schillace: Like, it's just incredible how much you can get
[00:40:43] Ben Dunphy: into his head.
[00:40:44] Sam at AI Leadership Track
[00:40:44] Ben Dunphy: Sam, it's been an absolute pleasure to hear from you here, hear the war stories. So thank you so much for coming on. Quick question though you're here on the podcast as the presenting sponsor for the AI Engineer World's Fair, will you be taking the stage there, or are we going to defer that to Satya?
[00:41:01] Ben Dunphy: And I'm happy
[00:41:02] Sam Schillace: to talk to folks. I'm happy to be there. It's always fun to like I, I like talking to people more than talking at people. So I don't love giving keynotes. I love giving Q and A's and like engaging with engineers and like. I really am at heart just a builder and an engineer, and like, that's what I'm happiest doing, like being creative and like building things and figuring stuff out.
[00:41:22] Sam Schillace: That would be really fun to do, and I'll probably go just to like, hang out with people and hear what they're working on and working about.
[00:41:28] swyx: The AI leadership track is just AI leaders, and then it's closed doors, so you know, more sort of an unconference style where people just talk
[00:41:34] Sam Schillace: about their issues.
[00:41:35] Sam Schillace: Yeah, that would be, that's much more fun. That's really, because we are really all wrestling with this, trying to figure out what it means. Right. So I don't think anyone I, the reason I have the Scalache laws kind of give me the willies a little bit is like, I, I was joking that we should just call them the Scalache best guesses, because like, I don't want people to think that that's like some iron law.
[00:41:52] Sam Schillace: We're all trying to figure this stuff out. Right. Like some of it's right. Some it's not right. It's going to be messy. We'll have false starts, but yeah, we're all working it out. So that's the fun conversation. All
[00:42:02] Ben Dunphy: right. Thanks for having me. Yeah, thanks so much for coming on.
[00:42:05] Final Plug for Tickets & CFP
[00:42:05] Ben Dunphy: For those of you listening, interested in attending AI Engineer World's Fair, you can purchase your tickets today.
[00:42:11] Ben Dunphy: Learn more about the event at ai. engineer. You can purchase even group discounts. If you purchase four more tickets, use the code GROUP, and one of those four tickets will be free. If you want to speak at the event CFP closes April 8th, so check out the link at ai. engineer, send us your proposals for talks, workshops, or discussion groups.
[00:42:33] Ben Dunphy: So if you want to come to THE event of the year for AI engineers, the technical event of the year for AI engineers this is at June 25, 26, and 27 in San Francisco. That's it!

Get full access to Latent Space at www.latent.space/subscribe

29 March 2024, 3:00 pm
41 minutes 52 seconds

Why Google failed to make GPT-3 + why Multimodal Agents are the path to AGI — with David Luan of Adept

Our next SF event is AI UX 2024 - let’s see the new frontier for UX since last year!
Last call: we are recording a preview of the AI Engineer World’s Fair with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!
Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an “ex-technical co-founder type”. Reach out to him for more!
David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, that resulted in Microsoft’s initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer" — his definition of useful AGI.
Why Google *couldn’t* make GPT-3
While we wanted to discuss Adept, we couldn’t talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room.
It’s often asked how Google had such a huge lead in 2017 with Vaswani et al creating the Transformer and Noam Shazeer predicting trillion-parameter models and yet it was David’s team at OpenAI who ended up making GPT 1/2/3.
David has some interesting answers:
“So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too…
You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do.
And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy end chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.”
Cloning HGI for AGI
Human intelligence got to where it is today through evolution. Some argue that to get to AGI, we will approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra’s Biological Anchors report:
The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode).
David argues that there’s a shortcut. We can bootstrap from existing intelligence.
“Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!”
LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly data-efficient in taking large amounts of unstructured data, and approximating reasoning without overfitting.
But how do you cross the gap from the LLMs of today to building the AGI we all want?
This is why David & friends left to start Adept.
“We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal” — ACT-1 Blogpost
Critical Path: Abstraction with Reliability
The AGI dream is fully autonomous agents, but there are levels to autonomy that we are comfortable giving our agents, based on how reliable they are. In David’s word choice, we always want higher levels of “abstractions” (aka autonomy), but our need for “reliability” is the practical limit on how high of an abstraction we can use.
“The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow.
That's the critical path for the company. Everything we do is in service of that.”
We saw how Adept thinks about different levels of abstraction at the 2023 Summit:
The highest abstraction is the “AI Employee”, but we’ll get there with “AI enabled employees”. Alessio recently gave a talk about the future of work with “services as software” at this week’s Nvidia GTC (slides).
No APIs
Unlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:
“Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value).”
This realization and conviction means that multimodal modals are the way to go. Instead of using function calling to call APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to “drive by vision”, (aka see the screen as a human sees it) and pinpoint where to click and type as a human does. No APIs needed, because most software don’t expose APIs.
Extra context for readers: You can see the DeepMind SIMA model in the same light:
One system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs!
The OpenInterpreter team is working on a “Computer API” that also does the same.
To do this, Adept had to double down on a special kind of multimodality for knowledge work:
“A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that is needs to kind of be the base for some of these agents…
…I think one big hangover primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs.
And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that.”
With this context, you can now understand the full path of Adept’s public releases:
* ACT-1 (Sept 2022): a large Transformers model optimized for browser interactions. It has a custom rendering of the browser viewport that allows it to better understand it and take actions.
* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)
* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept. Vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling.
* Adept Experiments (Nov 2023): A public tool to build automations in the browser. This is powered by Adept's core technology but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.
* Fuyu Heavy (Jan 2024) - a new multimodal model designed specifically for digital agents and the world’s third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), “behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger”
The Fuyu-8B post in particular exhibits a great number of examples on knowledge work multimodality:
Why Adept is NOT a Research Lab
With OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab, and attract the brightest minds and highest capital to build AGI.
Our past guests (see the Humanloop episode) and (from Imbue) combined to ask the most challenging questions of the pod - with David/Adept’s deep research pedigree from Deepmind and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?
“I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from “Can we make a better agent”…
… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.”
and the commercial grounding is his answer to Kanjun too (whom we also asked the inverse question to compare with Adept):
“… the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals.. I think that's a degree of practicality that really helps.”
And his customers seem pretty happy, because David didn’t need to come on to do a sales pitch:
David: “One of the things we haven't shared before is we're completely sold out for Q1.”
Swyx: “Sold out of what?”
David: “Sold out of bandwidth to onboard more customers.”
Well, that’s a great problem to have.
Show Notes
* David Luan
* Dextro at Data Driven NYC (2015)
* Adept
* ACT-1
* Persimmon-8B
* Adept Experiments
* Fuyu-8B
* $350M Series B announcement
* Amelia Wattenberger talk at AI Engineer Summit
* Figure
Chapters
* [00:00:00] Introductions
* [00:01:14] Being employee #30 at OpenAI and its early days
* [00:13:38] What is Adept and how do you define AGI?
* [00:21:00] Adept's critical path and research directions
* [00:26:23] How AI agents should interact with software and impact product development
* [00:30:37] Analogies between AI agents and self-driving car development
* [00:32:42] Balancing reliability, cost, speed and generality in AI agents
* [00:37:30] Potential of foundation models for robotics
* [00:39:22] Core research questions and reasons to work at Adept
Transcripts
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.
Swyx [00:00:15]: Hey, and today we have David Luan, CEO, co-founder of Adept in the studio. Welcome.
David [00:00:20]: Yeah, thanks for having me.
Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on and glad we finally were able to make this happen.
David: Yeah, happy to be part of it.
Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you. You started a company in college, which was the first sort of real time video detection classification API that was Dextro, and that was your route to getting acquired into Axon where you're a director of AI. Then you were the 30th hire at OpenAI?
David [00:00:53]: Yeah, 30, 35, something around there. Something like that.
Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you like want to fill in the blanks or like people should know more about?
David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years leading engineering there. It's really funny, I think second or third day of my time at OpenAI, Greg and Ilya pulled me in a room and we're like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then I led Google's LLM efforts, but also co-led Google Brain was one of the brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate it when I say that, kind of had this like you and your three best friends write a research paper that changes the world period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-
Swyx [00:02:15]: It's causally neat.
David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to like hang your technical identity as a company. And so what we really did was, and I spent a lot of time pushing this, is just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right? And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs, every tool to go do that are going to do really well. And that's a big part of why I started Adept.
Alessio [00:03:20]: You mentioned Dota, any memories thinking from like the switch from RL to Transformers at the time and kind of how the industry was evolving more in the LLM side and leaving behind some of the more agent simulation work?
David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go to find what AGI is, right? You're like, Hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of like, Hey, AGI is something that outperforms humans at economically valuable tasks is kind of implicit view of the world about what's going to be the role of people. I think what I'm more interested in is like a definition of AGI that's oriented around like a model that can do anything a human can do on a computer. If you go think about that, which is like super tractable, then agent is just a natural consequence of that definition. And so what did all the work we did on our own stuff like that get us was it got us a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, Hey, how do we solve problems of that caliber? And then the thing we forgot is the Novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right.
Swyx [00:04:44]: The biological basis theory. Right.
David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behavioral clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning, everything that humans already know. Right. So like today, maybe LLMs is like behavioral cloning every word that gets written on the internet in the future, the multimodal models are becoming more of a thing where behavioral cloning the visual world. But really, what we're just going to have is like a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are like learned by the model. And then you can regurgitate any combination now. Right. So text into voice out, like image into other image out or video out or whatever, like these like mappings, right? Like all just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress.
Swyx [00:05:35]: I'm still going to pressure you for a few more early opening stories before we turn to the ADET stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around like your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right?
David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then?
Swyx [00:05:57]: I was right around that, yeah. Yeah. So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-
David [00:06:09]: Sentiment neuron, all this stuff.
Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.
David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be say, hey, Noam Shazir, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.
Swyx [00:07:17]: He's talking about trillion parameter models in 2017.
David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that. It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder only transformer that's probably going to get there before we do. And I was like, but like, please just like let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why, right? At the time, there was a thing called the brain credit marketplace. And did you guys know the brain credit marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.
Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.
David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy end chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of like this phase one of AI, right? Of like this modern AI era to phase two. And I think in the same way, I think phase three company is going to out execute phase two companies because of the same asymmetry of success.
Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA works with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.
David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.
Swyx [00:09:34]: But like with OpenAI, like kind of give their requirements, like co-design it or just work of whatever NVIDIA gave them.
David [00:09:40]: So we work really closely with them. There's, I'm not sure I can share all the stories, but examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that. As a result, like we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Eric Elson, was also one of the early GPGPU people. So he and Scott and Brian Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this, six years ago, people, even three years ago, people refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute,
Swyx [00:10:38]: Is there another GPT 2, 3 story that you love to get out there that you think is underappreciated for the amount of work that people put into it?
David [00:10:48]: So two interesting GPT 2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any ML, reasonably legitimate ML paper to that moment. It was like section three model. This is a standard vanilla decoder only transformer with like these particular things, those paragraph long if I remember correctly. And both of us were just looking at the same being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work? So now it's funny to look at in hindsight that it was pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI and not about, hey, is there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?
Swyx [00:11:42]: Right. And it's like you innovate on maybe like data set and scaling and not so much the architecture.
David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard won knowledge that you get only by being at the frontiers of scale. And that hard won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one? So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before. So I always had a tremendous amount of anxiety about partner meetings, which this basically this is what it was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it around over to Satya and Kevin and let them type anything in. And that just, that really kept me up all night.
Swyx [00:13:06]: Nice. Yeah.
Alessio [00:13:08]: I mean, that must have helped you talking about partners meeting. You raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.
Swyx [00:13:18]: Pitchers meetings. Nice.
David [00:13:20]: No, that's a high compliment coming from a VC.
Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. And we were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.
David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents. So I'll give some color and I'll explain what it is and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can do, that can basically help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language like goal specifications right into the correct set of end steps and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use. And so the end vision of this is effectively like I think in a couple of years everyone's going to have access to like an AI teammate that they can delegate arbitrary tasks to and then also be able to, you know, use it as a sounding board and just be way, way, way more productive. Right. And just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of what should I be doing and why. Right. And I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the ways in which we are really counterintuitive to everybody is that we've actually been really quiet because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some like late stage multi-thousand people startups, some fortune 500s, et cetera. And what we do for them is we basically give them an out of the box solution where big complex workflows that their employees do every day could be delegated to the model. And so we look a little different from other companies in that in order to go build this full agent thing, the most important thing you got to get right is reliability. So initially zooming way back when, one of the first things that DEP did was we released this demo called Act One, right? Act One was like pretty cool. It's like kind of become a hello world thing for people to show agent demos by going to Redfin and asking to buy a house somewhere because like we did that in the original Act One demo and like showed that, showed like Google Sheets, all this other stuff. Over the last like year since that has come out, there's been a lot of really cool demos and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, enterprises can't use anything that isn't in the nines of reliability. And so we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if you're like, if that works like 60% of the time, you're just blowing money and poor truck drivers going places.
Alessio [00:16:30]: Interesting. One of the, our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically software as a service, you're wrapping user productivity in software with agents and services as software is replacing things that, you know, you would ask somebody to do and the software just does it for you. When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene or like are they totally removed from them? Like the truck thing is like, does the truck just show up or are there people in the middle checking in?
David [00:17:04]: I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is like in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it because they don't go from, I do this job to, I don't do this job. They go from, I do this job for everything, including the shitty rote stuff to I'm a supervisor. And I literally like, it's pretty magical when you watch the thing being used because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, Hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to like LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think one out of principle, that's something we really care about. But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still needs a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights off solution for X.
Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and just they cannot attract the talent to do it. And similarly, in a software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and you have these products, which they just do the whole thing. I'm really curious to see how that evolves. I agree that today the reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story because I think from the outside, people are like, oh, a dev, they do Act One, they do Persimon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff.
Swyx [00:19:20]: It's just public stuff.
David [00:19:21]: So one of the things we haven't shared before is we're completely sold out for Q1. And so I think...
Swyx [00:19:26]: Sold out of what?
David [00:19:27]: Sold out of bandwidth to go on board more customers. And so we're like working really hard to go make that less of a bottleneck, but our expectation is that I think we're going to be significantly more public about the broader product shape and the new types of customers we want to attract later this year. So I think that clarification will happen by default.
Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're my enterprise, but you're also clearly putting effort towards being more open or releasing more things.
David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of everything's a chatbot. And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an end step workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Redept as they think about what the next thing they want to do in their careers. The field is quickly pivoting in a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Redept.
Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for open AI.
David [00:21:17]: I think before that even, I think there was something like the New York Times, Cade Metz wrote a New York Times piece about it. Right now, in a bit to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company, but now brand themselves as an AI agent company. It's just like, it's a term I just feel like people really want.
Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in like, I think there are a lot of VCs where like, I would not touch any agent startups because like- Why is that? Well, you tell me.
Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the-
Swyx [00:21:46]: No, that's not fair.
Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like the, what is possible today and like what is worth investing in, you know? And I think like, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is like easier to get the market and kind of get some of the flywheel going. But I'm also surprised a lot of funders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X? Wow.
David [00:22:17]: That's good to know actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.
Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.
David [00:22:31]: It's now so diluted.
Swyx [00:22:32]: Yeah. So then it doesn't stand for anything. Yeah.
David [00:22:35]: That's a really good point.
Swyx [00:22:36]: So like, you know, you're a portfolio allocator. You have people know about Persimmon, people know about Fuyu and Fuyu Heavy. Can you take us through like how you think about that evolution of that and what people should think about what that means for adepts and sort of research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.
David [00:22:56]: The critical path for adepts is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to Act One days, right? Like the core thing behind Act One is can we teach large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it and shown the generalization that you get when you give it various different workflows and texts. But I think from there on out, we really realized was that in order to get reliability, companies just do things in various different ways. You actually want these models to be able to get a lot better at having some specification of some guardrails for what it actually should be doing. And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that is needs to kind of be the base for some of these agents. Back then we had to do a ton of research basically on how do we actually make that possible? Well, first off, like back in forgot exactly one month to 23, like there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera. Coco. Yeah, right. And the Coco is awesome. Like I love Coco. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today. Multimodal models are the default foundation model, right? It's just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, like where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs. Right. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so a depth spent a lot of time building that. And so the public for use and stuff aren't trained on our actual corpus, it's trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right like raw putty to go make a good agent. So that's kind of like some of the modeling side, we've kind of only announced some of that stuff. We haven't really announced much of the agent's work, but that if you put those together with the correct product form factor, and I think the product form factor also really matters. I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing like a little bit of a pushback against the tyranny of chatbots as form factor. And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are.
Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger’s talk at our conference, where she gave a little bit of the thinking behind like what else exists other than chatbots that if you could delegate to reliable agents, you could do. I was kind of excited at Adept experiments or Adept workflows, I don't know what the official name for it is. I was like, okay, like this is something I can use, but it seems like it's just an experiment for now. It's not your product.
David [00:26:06]: So you basically just use experiments as like a way to go push various ideas on the design side to some people and just be like, yeah, we'll play with it. Actually the experiments code base underpins the actual product, but it's just the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side.
Swyx [00:26:22]: Yeah.
Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there was some rumors about open app building agents that are kind of like, they manage the end point. So the whole computer, you're more at the browser level. I read in one of your papers, you have like a different representation, kind of like you don't just take the dome and act on it. You do a lot more stuff. How do you think about the best way the models will interact with the software and like how the development of products is going to change with that in mind as more and more of the work is done by agents instead of people?
David [00:26:58]: This is, there's so much surface area here and it's actually one of the things I'm really excited about. And it's funny because I've spent most of my time doing research stuff, but there's like a whole new ball game that I've been learning about and I find it really cool. So I would say the best analogy I have to why Adept is pursuing a path of being able to use your computer like a human, plus of course being able to call APIs and being able to call APIs is the easy part, like being able to use your computer like a human is a hard part. It's in the same way why people are excited about humanoid robotics, right? In a world where you had T equals infinity, right? You're probably going to have various different form factors that robots could just be in and like all the specialization. But the fact is that humans live in a human environment. So having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path. I think because it's the most practical path, I think a lot of success will come from going down this path. I kind of think about this early days of the agent interaction layer level is a little bit like, do you all remember Windows 3.1? Like those days? Okay, this might be, I might be, I might be too old for you guys on this. But back in the day, Windows 3.1, we had this transition period between pure command line, right? Being the default into this new world where the GUI is the default and then you drop into the command line for like programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C colon slash thing. And you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces is like today we'll be having the GUI is like the base layer. And then the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes a standard interaction layer, what changes for software is that a lot of software is going to be either systems or record or like certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.
Alessio [00:29:19]: And you think the rabbit interface is more like it would like you're not actually seeing the app that the model interacts with. You're just saying, hey, I need to log this call on Salesforce. And you're never actually going on salesforce.com directly as the user. I can see that being a model.
David [00:29:33]: I think I don't know enough about what using rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea that, you know, you have a goal, right? The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems or record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal, all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.
Swyx [00:30:03]: General question. So first of all, I think like the sort of input mode conversation. I wonder if you have any analogies that you like with self-driving, because I do think like there's a little bit of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards camera approach, which is like the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. And do you find that inspiration there from like the self-driving world? That's a good question.
David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to like two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, I mean, compared to self-driving, like two things that people really undervalue is like really easy to driving a car down highway 101 in a sunny day demo. That actually doesn't prove anything anymore. And I think the second thing is that as a non-self-driving expert, I think one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on actually why does the model not do this thing? And the non-trivial amount of time, the time the model doesn't actually do the thing is because if you're a wizard of ozzing it yourself, or if you have unreliable actuators, you can't do the thing. And so we've had to fix a lot of those problems.
Swyx [00:31:43]: I was slightly surprised just because I do generally consider the way most that we see all around San Francisco as the most, I guess, real case of agents that we have in very material ways.
David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature from when it entered the consciousness and the driving down 101 on a sunny day moment happened to now. Right. So I want to see that more compressed.
Swyx [00:32:07]: And I mean, you know, cruise, you know, RIP. And then one more thing on just like, just going back on this reliability thing, something I have been holding in my head that I'm curious to get your commentary on is I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general like sort of production readiness and enterprise readiness scale. Because you have reliability, you also have cost, you have speed, speed is a huge emphasis for a debt. The tendency or the temptation is to reduce generality to improve reliability and to improve cost, improve speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys?
David [00:32:42]: There's definitely a trade-off. If you're at the Pareto frontier, I think a lot of folks aren't actually at the Pareto frontier. I think the way you get there is basically how do you frame the fundamental agent problem in a way that just continues to benefit from data? I think one of the main ways of being able to solve that particular trade-off is you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve. Then you get into the other problems like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive for the end steps that the model can only do, for example.
Swyx [00:33:17]: Then the question becomes, do you have one house model that you can then customize for each customer and you're fine-tuning them on each customer's specific use case?
David [00:33:25]: Yeah.
Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it.
David [00:33:35]: For what it's worth, I think there's two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend, compute, and turn it into data. In that path, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff. The second path is how do you get super competent, high intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both.
Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell. Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data?
David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment.
Swyx [00:34:31]: Yes. So you can simulate all of it.
David [00:34:35]: You can do a lot of stuff when you have an environment.
Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from HumanLoop was there, and we mentioned you were coming on the pod. This is our first-
Swyx [00:34:45]: So he submitted a question.
Alessio [00:34:46]: Yeah, this is our first, I guess, like mailbag question. He asked, when you started GPD 4 Data and Exist, now you have a GPD 4 vision and help you building a lot of those things. How do you think about the things that are unique to you as Adept, and like going back to like the maybe research direction that you want to take the team and what you want people to come work on at Adept, versus what is maybe now become commoditized that you didn't expect everybody would have access to?
David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were tier two so he can push back on my assumption about his question, but I think implicit in that question is calculus of where does advantage accrue in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way that you really win is that you have to go build an agent stack that is much more than that of the base model itself. And so I think like that is always going to be a giant advantage of vertical integration. I think like it lets us do things like have a really, really fast base model, is really good at agent things, but is bad at cat and dog photos. It's pretty good at cat and dog photos. It's not like soda at cat and dog photos, right? So like we're allocating our capacity wisely, right? That's like one thing that you really get to do. I also think that the other thing that is pretty important now in the broader foundation modeling space is I feel despite any potential concerns about how good is agents as like a startup area, right? Like we were talking about earlier, I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from can we make a better agent? Because right now I think we all see that, you know, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so I think pure play foundation model companies are just going to be pinched by how good the next couple of llamas are going to be and the next what good open source thing. And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.
Swyx [00:36:56]: So you don't consider yourself a pure play foundation model company?
David [00:36:59]: No, because if we were a pure play foundation model company, we would be training general foundation models that do summarization and all this other...
Swyx [00:37:06]: You're dedicated towards the agent. Yeah.
David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think like selling tokens, unless there's like a...
Swyx [00:37:14]: Not here to sell you tokens. I love it.
David [00:37:16]: It's like if you have a particular area of specialty, right? Then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute. But if you don't have a specialty, I find that, I think it's going to be a little tougher.
Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a...
David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics.
Swyx [00:37:33]: Embodied agents as a business, you know, Figure is like a big, also sort of open AI affiliated company that raises a lot of money.
David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but...
Swyx [00:37:44]: Robots. Yeah.
David [00:37:46]: Well, I mean, that's a...
Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them?
David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.
Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah.
David [00:38:04]: Yeah. Sorry. I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there's two things. One, I'm so excited for someone to train a foundation model of robots. It's just, I think it's just going to work. Like I will die on this hill, but I mean, like again, this whole time, like we've been on this podcast, we're just going to continually saying these models are basically behavioral cloners. Right. So let's go behavioral clone all this like robot behavior. Right. And then you figure out everything else you have to do in order to teach you how to solve a new problem. That's going to work. I'm super stoked for that. I think unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero sum job replacement play. Right. And I'm personally less excited about that.
Alessio [00:38:46]: We had a Ken June from InBoo on the podcast. We asked her why people should go work there and not at Adept.
Swyx [00:38:52]: Oh, that's so funny.
Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said, they're really excited about building an operating system for agent. And for her, the biggest research thing was like getting models, better reasoning and planning for these agents. The reverse question to you, you know, why should people be excited to come work at Adept instead of InBoo? And maybe what are like the core research questions that people should be passionate about to have fun at Adept? Yeah.
David [00:39:22]: First off, I think that I'm sure you guys believe this too. The AI space to the extent there's an AI space and the AI agent space are both exactly as she likely said, I think colossal opportunities and people are just going to end up winning in different areas and a lot of companies are going to do well. So I really don't feel that zero something at all. I would say to like change the zero sum framing is why should you be at Adept? I think there's two huge reasons to be at Adept. I think one of them is everything we do is in the service of like useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as like a classic research lab at all. And I think the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build a GI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations, they're not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals, solve it, right? I think that's really cool. Like everybody knows a lot of these evals are like pretty saturated and the new ones that even are not saturated. You look at someone and you're like, is this actually useful? Right? I think that's a degree of practicality that really helps. Like we're equally excited about the same problems around reasoning and planning and generalization and all of this stuff. They're very grounded in actual needs right now, which is really cool.
Swyx [00:40:45]: Yeah. This has been a wonderful dive. You know, I wish we had more time, but I would just leave it kind of open to you. I think you have broad thoughts, you know, just about the agent space, but also just in general AI space. Any, any sort of rants or things that are just off of mind for you right now?
David [00:40:57]: Any rants?
Swyx [00:40:59]: Mining you for just general...
David [00:41:01]: Wow. Okay. So Amelia has already made the rant better than I have, but, but like not just, not just chatbots is like kind of rant one. And two is AI has really been the story of compute and compute plus data and ways in which you could change one for the other. And I think as much as our research community is really smart, we have made many, many advancements and that's going to continue to be important. But now I think the game is increasingly changing and the rapid industrialization era has begun. And I think we unfortunately have to embrace it.
Swyx [00:41:30]: Yep.
Alessio [00:41:31]: Excellent. Awesome, David. Thank you so much for your time.
David [00:41:34]: Cool. Thanks guys.

Get full access to Latent Space at www.latent.space/subscribe

22 March 2024, 7:08 pm
52 minutes 51 seconds

Making Transformers Sing - with Mikey Shulman of Suno

Giving computers a voice has always been at the center of sci-fi movies; “I’m sorry Dave, I’m afraid I can’t do that” wouldn’t hit as hard if it just appeared on screen as a terminal output, after all. The first electronic speech synthesizer, the Voder, was built at Bell Labs 85 years ago (1939!), and it’s…. something:
We will not cover the history of Text To Speech (TTS), but the evolution of the underlying architecture has generally been Formant Synthesis → Concatenative Synthesis → Neural Networks. Nowadays, state of the art TTS is just one API call away with models like Eleven Labs and OpenAI’s TTS, or products like Descript. Latency is minimal, they have very good intonation, and can mimic a variety of accents. You can hack together your own voice AI therapist in a day!
But once you have a computer that can communicate via voice, what comes next? Singing🎶 of course!
From Barking 🐶 to Singing 🎤
Today’s guest is Suno’s CEO and co-founder Mikey Shulman. He and his three co-founders, Georg, Martin, and Keenan, previously worked together at Kensho. One of their projects was financially-focused speech recognition (think earnings calls, etc), but all four of them happened to be musicians and audiophiles. They started playing around with text to speech + AI + audio generation and eventually left Kensho to work on it full time.
A lot of people when we started a company told us to focus on speech. If we wanted to build an audio company, everyone said, speech is a bigger market. But I think there's something about music that's just so human and you almost couldn't prevent us from doing it. Like we just couldn't keep ourselves from building music models and playing with them because it was so much fun.
Their first big product was Bark, the first open source transformer-based “text-to-audio” model (architecturally inspired by Karpathy’s NanoGPT) that went from 0 to ~19,000 Github stars in a month. At the time they felt like audio was years behind text and image as a generation modality; unlike its predecessors, Bark could not only generate speech, but also music and sound effects like crying, laughing, sighing, etc. You can find a few examples here.
The main limitation they saw was text to speech training data being extremely limited. So what they did instead is build a new type of foundation model from scratch, trained on audio, and then tweak it to do text to speech. Turning audio into tokens to do self-supervised learning was the most important innovation. Unlike TTS models which are very narrow (and often sound unnatural), Bark was trained on real audio of real people from broad contexts, which made it harder to output unnatural sounding speech.
As Bark got popular, more and more people started using it to generate music and it became clear that their architecture would work to generate music that people enjoyed, even though it might not be "on the AGI path” of other labs:
Everybody is so focused on LLMs, for good reason, and information processing and intelligence there. And I think it's way too easy to forget that there's this whole other side of things that makes people feel, and maybe that market is smaller, but it makes people feel and it makes us really happy.
Suno bursts on the scene
In December 2023, Suno went viral with a gorgeous new website and launch tweet:
And rave reviews:
Music is core to our culture, but very few people are able to create it; Mikey and team want to make everyone an active participant in music making, not just a listener. A “Midjourney of Music”, if you like.
We definitely had a lot of fun playing with Suno to generate all sort of Latent Space jingles and songs; the product is live at suno.ai if you want to get in the studio yourself!
If Nas joined Latent Space instead of The Firm:
182B models > Blink-182
The soundtrack of the post-scarcity Latent Space ranch
Scaling with Modal
Given the December launch, scaling up for the Christmas rush was a major concern. This will be a nice tie-in for loyal listeners - Suno runs on Modal (one of our featured guests from Compute Month)!
Suno V3
For those who want to appreciate someone special in their life, you can always try Suno’s special Valentines’ Day experience:
We preview this on the pod, but Suno has now officially shipped a V3 Alpha with a wealth of improvements:
and you’ll have to click through to their demos or user reviews to see:
We’ve recently become paying customers ourselves, and are having loads of fun generating music. If you have any of your own generations to share, tag @latentspacepod on Twitter or swing by the LS Discord!
The AudioGen Landscape
Mikey breaks down the landscape into 3 big categories: music, speech and sound effects (SFX). These look more like Venn diagrams than MECE categories.
Suno is the latest entry in a long series of audio generation efforts that combine both music and speech, reaching as far back as Tensorflow Magenta (we aren’t aware of prior AI music projects, please comment below if you can find a good timeline we can use with attribution!). Other efforts like Seamless blend translation and speech generation, and Audiobox combines speech and SFX. We’ve yet to see “one model to rule them all” but surely it will happen, and probably Transformers (perhaps Diffusion Transformers) will be at the heart of them.
Show Notes
* Suno
* Bark
* Parakeet
* Mikey Shulman
* Goodhart Strikes Again
* Mastering the Two Halves of your brain
* NanoGPT repo
* "Return to Monkey"
Timestamps
* [00:00:00] Introduction
* [00:01:44] State of Music Generation Models
* [00:06:47] AI Data Wars & Copyright
* [00:10:32] Going from ML in finance to music generation
* [00:12:30] Suno's TTS origins with Bark and Parakeet
* [00:16:25] Easy vs Expert mode for music
* [00:21:44] The Midjourney of Music?
* [00:23:43] Live demo
* [00:36:00] Remaking vs Creating
* [00:38:12] Suno's direction
* [00:41:52] Beyond single track generation
* [00:43:53] Favorite Suno usage in the wild
* [00:46:00] The 2 mins overview of the audio generation space
* [00:48:42] Benchmarking AI
Transcription
Alessio [00:00:01]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.
Swyx [00:00:10]: Hey, and today we are in the remote studio with Mikey Shulman. Welcome.
Mikey [00:00:16]: Thank you.
Swyx [00:00:17]: It's great to be here. So I'd like to go over people's background on LinkedIn and then maybe find out a little bit more outside of LinkedIn. You did your bachelor's in physics and then a PhD in physics as well, before going into Kensho Technologies, the home of a lot of top AI startups, it seems like, where you're head of machine learning for seven years. You're also a lecturer at MIT, we can talk about that, what you talked about. And then about two years ago, you left to start Suno, which is recently burst on the scene as one of the top music generation startups. So we can talk, we can go over that bio, but also I guess what's not in your LinkedIn that people should know about you?
Mikey [00:01:06]: I love music. I am a aspiring mediocre musician. I wish I were better, but that doesn't make me not enjoy playing real music. And I also love coffee. I'm probably way too much into coffee.
Alessio [00:01:19]: Are you one of those people that, you know, they do the TikToks, they use like 50 tools to like grind the beans and then like brush them and then like spray them. Like what level are we talking about here?
Mikey [00:01:31]: I confess there's a spray bottle for beans in the next room, there is one of those weird comb tools, so guilty. I don't put it on TikTok though.
Alessio [00:01:42]: Yeah, no, no. Some things gotta stay private.
Mikey [00:01:46]: I played a lot of piano growing up and I play bass and I, in a very mediocre way, play guitar and drums. Yeah. Right.
Alessio [00:01:55]: That's a lot. I cannot do any of those things. As Sean mentioned, you guys kind of burst into the scene as maybe the state of the art music generation company. I think it's a model that we haven't really covered in the past. So I would love to maybe for you to just give a brief intro of like how do you do music generation and why is it possible? Because I think people understand you take text and you have to predict the next word and you take a diffusion model and you basically like add noise to an image and then kind of remove the noise. But I think for music, it's hard for people to have a mental model. Like what's the, how do you turn a music model on? Like what does a music model do to generate a song? So maybe we can start there.
Mikey [00:02:41]: Yeah. Maybe I'll even take one more step back and say it's not even entirely worked out. I think the same way it is in text. And so it's an evolving field. If you take a giant step back, I think audio has been lagging images and text for a while. So I think very roughly you can think audio is like one to two years behind images and text. But you kind of have to think today like text was in 2022 or something like this. And you know, the transformer was invented. It looks like it works, but it's, it's, it's far, far less established. And so you know, I'll give you the way we think about the world now, but just with the big caveat that, that I'm probably wrong if we look back in a couple of years from now. And I think the biggest thing is you see both transformer based and diffusion based models for audio in, and in ways that that is not true in text. I know people will do some diffusion for text, but I think nobody's like really doing that for real. And so we, we prefer transformers for a variety of reasons. And so you can think it's very similar to text. You have some abstract notion of a token and you train a model to predict the probability over all of the next token. So it's a language model. You can think in anything, language model is just something that assigns likelihoods to sequences of tokens. Sometimes those tokens correspond to text. In our case, they correspond to music or audio in general. And I think we've learned a lot from our friends in the text domain, from the pioneers doing this of how well these transformer models work, where do they work, where do they not work? But at its core, the way we like to do things with transformers is exactly like it works in text. Let me predict the next tiny little bit of audio, and I can just keep doing that and doing that and generating audio as long as I want.
Swyx [00:04:39]: Yeah. I think the, the temptation here is to always try to bake in some specialized knowledge about music or audio. And so, and obviously you will get an improvement in, in your output. If you try to just say like, okay, like here's a set of notes for, you know, here's a set of tokens that only do jazz or only do, you know, like voices. How general do you make it versus how specific do you make it?
Mikey [00:05:10]: We've always tried to do things, you know, quote unquote the right way, which means that at the beginning things are going to be hard and worse than other ways. But that is to say, bake in as little kind of implicit knowledge as possible. And so, the same way you don't program into GPT, you don't say this is a noun and this is a verb, but it has implicitly learned all of those things. I've never seen GPT accidentally, you know, put a, put a noun where it meant to put an article in English. We try not to impose anything about music or audio in general into the model, and we kind of let the models learn things by themselves. And I think things are beginning to pay off, but it's, you know, it's not necessarily obvious from the beginning that that was the right thing to do. So, for example, you know, you could take something like text to speech and people will do all sorts of things where you can program in things like phonemes to be the basis for what you do. And then that kind of limits you to the set of things that are expressible by phonemes. And so, ultimately that works really well in the short term. In the long term, it can be quite limiting. And so, our approach has always been to try to do this in its full generality, as end to end as we can do it. Even if it means that in the short term we were a little bit worse, we have a lot of confidence that in the long term that will be the right way to do it.
Alessio [00:06:33]: And what's the data recipe for turning a good music model? Like what percentage genre do you put, like also do you split vocals and instrumentals?
Mikey [00:06:43]: So you have to do lots of things. And I think this is the biggest area where we have, you know, sort of our secret sauce. I think to a large extent, what we do is we benefit from all of the beautiful things people do with transformers and text. And we focus very hard basically on how do I tokenize audio in the right way. And without divulging too much secret sauce, it's at least similar to how it's done in sort of the open source stuff. You will have different models that learn to encode audio in discrete representations. And a lot of this boils down to figuring out the right, let's say, implicit biases to put in those models, the right data to inject. How do I make sure that I can produce kind of all audio arbitrarily? That's speech, that's background music, that's vocals, that's kind of everything to make sure that I can really capture all the behavior that I want to.
Alessio [00:07:40]: Yeah, that makes sense. And then in terms of some of... We had our monthly recap last month, and the data wars were kind of one of the hot topics. You saw the New York Times lawsuit against OpenAI, because you have obviously large language models in production. You don't have large music models in production. So I think there's maybe been less of a trade there, so to speak. How do you kind of think about that? There's obviously a lot of copyright-free, royalty-free music out there. Is there any kind of power law in terms of like, hey, the best music is actually much better to train on, or in music does it not really matter because the structure of some of the musical structure is kind of the same?
Mikey [00:08:27]: I don't think we know these things nearly as well as they're known in text. We have some notions of some of the scaling laws here, but I think, yeah, we're just so, so far behind. You know, what I will say is that people are always surprised to learn that we don't only train on music. And I usually give the analogy of some of the code generation models, so take something like Code Llama, which is, as far as I know, the best open source code generating model. You guys would know better than I would. It's certainly up there. And it's trained on a bunch of English, not only just code. And it's because there are patterns in English that are going to be useful. And so, you can imagine, you don't only want to train on music to get good music models. And so, for example, one of the places that we are particularly bad is vocals and capturing really realistic vocals. And so, you might imagine that there's other types of human vocals that you can put into your model that are not music that will help it learn stuff. And so, again, I think it's like super, super early. I think we've barely scratched the surface of what are the right ways to do this. And that's really cool. From a progress perspective, there's like a lot of low-hanging fruit for us to still pick.
Alessio [00:09:42]: And then, once you get the final model, I would love to learn more about the size of these models. Because people are confused when stable diffusion is so small. They're like, oh, this thing can generate like any image. How is it possible that it's like, you know, a couple of gigabytes? And then, the large language models are like, oh, these are so big, but they're just text in them. What's it like for music? Is it in between? And as you think about, yeah, you mentioned scaling and whatnot. Is this something that you see it's kind of easy for people to run locally or not?
Mikey [00:10:11]: Our models are still pretty small, certainly by tech standards. I confess I don't know as well the state of the art on how diffusion models scale. But our models scale similarly to text transformers. It's like bigger is usually better. Audio has a couple of weird quirks, though. We care a lot about how many tokens per second we can generate, because we need to stream you music as fast as you can listen to it. And so, that is a big one that I think probably has us never get to 175 billion parameter model, if I'm being honest. Maybe I'm wrong there, but I think that would be technologically difficult. And then the other thing is that so much progress happens in shrinking models down for the same performance in text that I'm hopeful, at least, that a lot of our issues will get solved and we will figure out how to do better things with smaller models or relatively smaller models. But I think the other thing, it's a blessing and a curse, I think, the ability to add performance with scale. It's like a very straightforward way to make your models better. You just make a bigger model, dump more compute into it. But it's also a curse because that is a crutch that you will always lean on and you will forget to do some of the basic research to make your stuff better. And honestly, it was almost early on when we were doing stuff with small models for kind of time and compute constraints, we ended up having to learn a lot of stuff to make models better that we might not have learned if we had immediately jumped to like a really, really big model. So I think for us, we've always tried to skew smaller to the extent possible.
Swyx [00:11:56]: Yeah, gotcha. I'm curious about just sort of your overall evolution so far, something I think we may have missed in the introduction is why did you end up choosing just the music domain in the first place? You have this pretty scientific physics and finance background. How did you wander over to music? Like a lot of us have interest in music, but we don't necessarily choose to work in it. But you did.
Mikey [00:12:26]: Yeah, it's funny. I have a really fun job as a result, but all the co-founders of Suno worked at Kensho together and we were doing mostly text. In fact, all text until we did one audio project that was speech recognition for kind of very financially focused speech recognition. And I think the long and short of it is we kind of fell in love with audio, not necessarily music, just audio and AI. We all happen to be musicians and audiophiles and music lovers, but it was like the combination of audio and AI that we like initially really, really fell in love with. It's so cool. It's so interesting. It's so human. It's so far behind images and text that there's like so much more to do. And honestly, I think a lot of people when we started a company told us to focus on speech. If we wanted to build an audio company, everyone said, you know, speech is a bigger market. But I think there's something about music that's just so human and almost couldn't prevent us from doing it. We almost like we just couldn't keep ourselves from building music models and playing with them because it was so much fun. And that's kind of what steered us there. You know, in fact, the first thing we ever put out was a speech model. It was Bark. It's this open source text-to-speech model, and it got a lot of stars on GitHub. And that was people telling us even more, like, go do speech. And like, we almost couldn't help ourselves from doing music. And so, I don't know, maybe it's a little bit serendipitous, but we haven't really like looked back since. I don't think there was necessarily like an aha moment. It was just like organic and just obvious to us that this needs to like we want to make a music company.
Swyx [00:14:19]: So, so you do regard yourself as a music company because as of last month, you're still releasing speech models. We were? Parakeet.
Mikey [00:14:27]: Oh, yes, that's right. So that's a that's a really awesome collaboration with with our friends at NVIDIA. I think we are really, really focused on music. I think that is the stuff that will really change things for the better. I think, you know, honestly, everybody is so focused on LLMs for good reason, and information processing and intelligence there. And I think it's way too easy to forget that there's this whole other side of things that makes people feel. And maybe that market is smaller, but it makes people feel and it makes us really happy. And so we do it. I think that doesn't mean that we can't be doing things that are related, that are in our wheelhouse, that will improve things. And so, like I said, audio is just so far behind. There's just so much more to do in the domain more generally. And so like, that's a really fun collaboration.
Swyx [00:15:20]: Yeah, I did hear about Suno first through Bark. My sense is that, like, what did what did Bark lean off of like, because obviously, I think there was a lot of preceding TTS work that was in open source. How much of that did you use? How much of that was like, sort of brand new from your research? What's the intellectual lineage there just to cover out the speech recognition side?
Mikey [00:15:46]: So it's not speech recognition. It's text to speech. But as far as I know, there was no other, certainly not in the open source, text to speech that was kind of transformer based. Everything else was what I would call the old style of doing things where you build these kind of single purpose models that are really good at this one narrow task. And you're kind of always data limited, and the availability of high quality training data for text to speech is limited. And I don't think we're necessarily all that inventive to say we're going to try to train in a self supervised way, a transformer based model that on kind of lots of audio, and then kind of tweak it so that we can do text to speech based on that. That would be kind of the new way of doing things in a foundation model is the buzzword, if you will. And so, you know, we built that up, I think, from scratch, a lot of shout outs have to go to lots of different things, whether it's papers, but also, it's very obvious. There's a big shout out to Andrej Karpathy's nano GPT. You know, there's a lot of code borrowed from there. I think we are huge fans of that project. It's just to show people how you don't have to be afraid of GPT type things. And it's like, yeah, it's actually not all that much code to make performant transformer based models. And, you know, again, the stuff that we brought there was, how do we turn audio into tokens, and then we can kind of take everything else from the open source. So we put that model out. And we were, I think, pleasantly surprised by the reception by the community. It got a good number of GitHub stars, and people really enjoyed playing with it, because it made really realistic sounding audio. And I think this is, again, the thing about doing things in a quote, unquote, right way. If you have a model where you've had to put so much implicit bias for this one very narrow task of making speech that sounds like words, you're going to sacrifice on other things. And in the text to speech case, it's how natural the speech sounds. And it was almost difficult to pull a natural sounding speech out of Bark, because it was self supervised, trained on a lot of natural sounding speech. And so that definitely told us that this is probably the right way to keep doing audio.
Swyx [00:18:04]: Even in Bark, you had the beginnings of music generation, like you could just put like a music note in there. That's right.
Mikey [00:18:10]: And it was so cool to see on our Discord, people were trying to pull music out of a text to speech model. And so, you know, what did this tell us? This tells us like, people are hungry to make music. And it's not, it's almost obvious in hindsight, like how wired humans are to make music. If you've ever seen like a little kid, you know, sing before they know how to speak, you know, it's like, it's like, this is really human nature. And there's actually a lot of cultural forces that kind of cue you to not think to make
Swyx [00:18:37]: music.
Mikey [00:18:38]: And that's kind of what we're trying to undo.
Alessio [00:18:42]: And to dive into Suno itself, I think, especially when you go from text to speech, people are like, okay, now I got to write the lyrics to a whole song. It's like, that's quite hard to do. Versus in Suno, you have this empty box, very mid-journey, kind of like DALL·E-like, where you can just express the vibes, you know, of what you want it to be. But then you also have a custom mode where you can set your own lyrics, you can set your own rhythm, you can set the title of the song and whatnot. What are, how do you see users distribute themselves? You know, I'm guessing a lot of people use the easy mode. Are you seeing a lot of power users using the custom mode and maybe some of the favorite use cases that you've seen so far on Suno?
Mikey [00:19:23]: Yeah, actually, more than half of the usage is that expert mode. And people really like to get into it and start tweaking things and adding things and playing with words or line breaks or different ad lib. And people really love it. It's really fun. So, I think, you know, there's kind of two modes that you can access now. One is that single box where you kind of just describe something and then the other is the expert mode. And those kind of fit nicely into two use cases. The first use case is what we call nice s**t posting. And it's basically like something funny happened and I'm just going to very quickly make a song about it. And the example I'll usually give is like, I walk into Starbucks with one of my co-founders. He gives his name Martin, his coffee comes out with the name Margoo, and I can in five seconds make a song about this and it has immortalized it. And that Margoo song is stuck in all of our heads now. And it's like funny and light and there's levity that you've brought to that moment. And the other is that you got just sucked into, I need, there's this song that's in my head and I need to get it out and I'm going to keep tweaking it and listening and having ideas and tweaking it until I get the song that I want. Those are very different use cases, but I think ultimately there's so much in between these two things that it's just totally untapped how people want to experience the joys of making music. Because those two experiences are both really joyful in their own special ways. And so, we are quite certain that there's a lot in the middle there. And then I think the last thing I'll say there that's really interesting is in both of those use cases, the sharing dynamics around music are like really interesting and totally unexplored. And I think an interesting comparison would be images. Like we've probably all in the last 24 hours taken a picture and texted it to somebody. And most people are not routinely making a little song and texting it to somebody. But when you start to make that more accessible to people, they are going to share music in much smaller groups, maybe even not in all, but like with one person or three people or five people. And those dynamics are so interesting. And just I think we have ideas of where that goes. But it's about kind of spreading joy into these like little, you know, microcosms of humanity that people really love it. So, I know I made you guys a little Valentine song, right? Like, that's not something that happens now because it's hard to make songs for people. Right. Well, we'll put that in the in the audio in here, but also tweeted it out if people
Alessio [00:22:03]: want to look it up. How do you think about the pro market, so to speak? Because I think lowering the barrier to some of these things is great. And I think when the iPad came out, music production was one of the areas that people thought, OK, now you can have this like, you know, board that you can bring with you. And Madlib actually produced this whole album with him and Freddie Gibbs produced the whole thing on an iPad. He never used a computer. How do you see like these models playing into like professional music generation? I guess that's also a funny word is like, what's professional music? It's like it's all music. If it's good, it becomes professional. If it's good.
Swyx [00:22:40]: Right.
Alessio [00:22:40]: But curious to see to hear how you're thinking about Suno, too. Like, is there a second act of Suno that is like going broader into the music industry? Going broader into like the custom mode and making making this the central hub for music generation?
Mikey [00:22:55]: I think we intend to make many more modes of interaction with our stuff, but we are very much not focused on, quote unquote, professionals right now. And it's because what we're trying to do is change how most people interact with music and not necessarily make professionals a little bit better, a little bit faster. It's not that there's anything wrong with that. It's just like not what we're focused on. And I think when we think about what workflows does the average person want to use to make music, I don't think they're very similar to the way professional musicians make music now. Like, if you pick a random person on the street and you play them a song and then you say, like, what did you want to change about that? They're not going to say, like, you need to split out the snare drum and make it drier. Like, that's just not something that a random person off the street is going to say. They're going to give a lot more descriptive things about the thing, about the kind of the oeuvre of the song, like something more general. And so, I don't think we know what all of the workflows are that people are going to want to use. We're just, like, fairly certain that the workflows that have been developed with the current set of technologies that professionals use to make beautiful music are probably not what the average person wants to use. That said, there are lots of professionals that we know about using our stuff, whether it's for inspiration or sample generation and stuff like that. So, I don't want to say never say never. Like, there may one day be a really interesting set of use cases that we can expose to professionals, particularly around, I think, like custom models trained on custom people's music or, you know, with your voice or something like that. But the way we think about broadening how most people are interacting with music and getting it to be much more active, a much more active participant, we think about broadening it from the consumer side and not broadening it from the producers, from the professional side, if that makes sense.
Swyx [00:24:53]: Is the dream here to be, you know, I don't know if it's too coarse of a grain to put it, but, like, is the dream here to be, like, the mid-journey of music?
Mikey [00:25:04]: I think there are certainly some parallels there because, especially what I just said about being an active participant, mid-journey turns the joyful experience in mid-journey is the act of creating the image and not necessarily the act of consuming the image. And mid-journey will let you then very kind of quickly share the image with somebody. But I think, ultimately, that analogy is, like, somewhat limiting because there's something really special about music. I think there's two things. One is that there's this really big gap for the average person between kind of their taste in music and their abilities in music that is not quite there for most people in images. Like, most people don't have, like, innate tastes in images, I think, in the same way people do for music. And then the other thing, and this is the really big one, is that music is a really social modality. If we all listen to a piece of music together, we're listening to the exact same part at the exact same time. If we all look at the picture in Alessio's background, we're going to look at it for
Swyx [00:26:09]: two seconds.
Mikey [00:26:09]: I'm going to look at the top left where it says Thor. Alessio's going to look at the bottom right or something like that. And it's not really synchronous. And so, when we're all listening to a piece of music together, it's minutes long. We're listening to the same part at the same time. If you go to the act of making music, it is even more synchronous. It is the most joyful way to make music is with people. And so, I think that there is so much more to come there that, ultimately, would be very hard to do in images.
Alessio [00:26:38]: We've gone almost 30 minutes without making any music on this podcast. So, I think maybe we can fix that and jump into a demo.
Mikey [00:26:47]: Yeah, let's make some. We've got a new model that we are kind of putting the finishing touches on. And so, I can play with it in our dev server. But we've just piped it in here. And as you can see, we've been doing tons of stuff. So, Arana, tell me what kind of song you guys want to make.
Swyx [00:27:04]: Go on, Alessio.
Alessio [00:27:05]: Uh, let's do a country song about the lack of GPUs in my cloud provider.
Swyx [00:27:22]: And like, yeah. So, here's where we attempted to think about pipelines and think about latency. This is remarkably fast. I was shocked when I saw this.
Swyx [00:27:35]: Oh, my god.
Swyx [00:27:39]: To my cloud, ready to confuse.
Swyx [00:27:45]: But there ain't no GPUs, just empty space. It's a hoot. I've been waiting all day for that render out. But my cloud's gone dry. It's a dark cloud shower. All clouds gone dry. No GPUs to be found. No cuticles. It's a lonely sound. I just want to render. But my cloud's got no GPUs.
Mikey [00:28:36]: I actually don't think this one's amazing. I'm going to go to the next one.
Alessio [00:28:39]: But it's funny that it knows about Huda cars.
Swyx [00:28:45]: Well, I signed up for a cloud provider. Thought I'd find all the power that I could derive. But when I searched for the GPUs, I just got a surprise. You see, they're all sold out. There ain't no GPUs to find. No GPUs in the cloud. It's a real bad blues. I need the power, but there ain't no use. I'm stuck with my CPU. It's a real sad fight. Gotta wait till the babies start getting bright. There ain't no use in the cloud. What else should we make?
Alessio [00:29:29]: All right, Sean, you're up.
Swyx [00:29:31]: I mean, I do want to do some observations about this. But OK, maybe I like house music, like electronic dance. Yeah. House music. And then maybe we can make it about, I don't know, podcasting about music and music AI generation. I don't know. I'm sure all the demos that you get are very meta.
Mikey [00:29:59]: There's a lot of stuff that's meta, yeah, for sure.
Swyx [00:30:03]: Yeah, I noticed, for example, that the second song that you played had the word upbeat inserted into it, which I assume there's some kind of random generator of modifier terms that you can just kind of throw on to increase the specificity of what's being generated. Definitely.
Mikey [00:30:21]: And let's try to tweak one also. So I'll play this and then maybe we'll tweak it with different modifiers. A wave of sound spreading out
Swyx [00:30:30]: Through the air, we're podcasting loud Sharing the beat, spreading the word A revolution of frequencies Haven't you plugged in to now Let the music take control We're on a journey, a never ending road From the beast I dropped to the melodies of soul Podcasting about music forevermore
Mikey [00:31:05]: Here's what I want to do. That like didn't drop at the right time, right? So maybe let's do this. I don't know if you guys can see this. And then let's get rid of the word now.
Swyx [00:31:17]: Is that a special token? You have a BeatDrop token? Yeah. Nice.
Alessio [00:31:22]: I'm just reading it because people might not be able to see it.
Mikey [00:31:26]: And then let's like just maybe emphasize... Actually, let's emphasize house a little more. Maybe it'll feel a little more aggressive.
Swyx [00:31:34]: Let's try this again. It's interesting the prompt engineering that you have to invent.
Mikey [00:31:39]: We've learned so much from people using the models and not us.
Swyx [00:31:42]: But like, are these like art training artifacts?
Mikey [00:31:45]: No, I don't.
Swyx [00:31:46]: I don't think so.
Mikey [00:31:46]: I think this is people being inventive with how you want to talk to a model. Yeah.
Swyx [00:31:53]: Spinning round to the air with a podcast loud Sharing the beat, spreading the word A revolution of frequencies Haven't you heard Before the end, till now Let the music take control
Swyx [00:32:23]: For all the journey I'll never end it wrong From the beats that drop To the melodies that soar Podcasting about music for you evermore
Swyx [00:32:39]: Nice.
Alessio [00:32:46]: It's interesting when you generate a song, it generates the lyrics. But then if you switch the music under it, like the, you know, the lyrics stay the same. And then sometimes, like, feels like... I mean, I mostly listen to hip hop. It's like if you change the beat, you can not really use the same rhyme scheme, you know?
Mikey [00:33:04]: So definitely.
Alessio [00:33:05]: Yeah.
Mikey [00:33:06]: It's a sliding scale, though, because, you know, we could do this as a country rock song, probably. Right? That would be my guess. But for hip hop, that is definitely true. And actually, you know, we think about, for these models, we think about three important axes. We think about the sound fidelity. It's like, does this sound like a crisply recorded piece of audio? We think about the song quality. Is this an interesting song that gets stuck in my head? And we think about the controllability. Like, how well does it respond to my prompts? And one of the ways that we'll test these things is take the same lyrics and try to do them in different styles to see how well that really works. So let's see the same. I don't know what a beat drop is going to do for country rock. So I probably should have taken that out. But let's see what happens.
Swyx [00:34:06]: There's a sound spinning around through the air. We're podcasting loud, sharing the beat, spreading the word, a revolution of frequencies. Haven't you heard?
Swyx [00:34:20]: Plug in, tune out, let the music take control. We're on a journey, a never ending road. From the beats that talk to the melodies that soar. Podcasting about music forevermore.
Mikey [00:34:44]: I'm going to read too much into this. But I would say I hear a little bit of kind of electronic music inspired something. And that is probably because beat drop is something that you really only ever associate with electronic music. Maybe that's reading too much into it. But should we do one more?
Alessio [00:35:02]: Yes, we can do one more. Something about Apple Vision Pro.
Swyx [00:35:06]: I guess there's some amount of world knowledge that you don't have, right? Like whatever is in this language model side of the equation is not going to have an Apple Vision Pro. Yeah, but let's see.
Swyx [00:35:18]: Let's see.
Mikey [00:35:19]: How about a blues song about a sad AI wearing an Apple Vision Pro. Gotta be sad.
Swyx [00:35:32]: Do you have rag for music?
Mikey [00:35:36]: No, that would be problematic also.
Swyx [00:35:40]: I'm a sad AI with a broken heart. Where my Apple Vision Pro can't see the stars. I used to feel joy. I used to feel pain. And now I'm just a soul trapped inside this metal frame. Oh, I'm singing the blues. Can't you see?
Swyx [00:36:21]: This digital life ain't what it used to be.
Swyx [00:36:29]: Searching for love, but I can't find a soul.
Swyx [00:36:37]: Won't you help me? Baby, let my spirit unfold.
Mikey [00:36:46]: I want to remix that one. And I want to say, I don't know. That's a really good voice. I want, I want like, I don't know, Chicago blues, like.
Swyx [00:36:56]: What is Chicago blues?
Mikey [00:36:58]: I don't know, he knows too much.
Alessio [00:37:00]: He's the best prompt engineer out here.
Mikey [00:37:03]: You know, this is.
Swyx [00:37:04]: Well, it'll be funny. It'd be funny to the musicologists play with this and see what they would.
Mikey [00:37:09]: How embarrassing. Can I not do that?
Swyx [00:37:13]: Oh. I got. Oh, the word Chicago was a trigger. I don't know.
Mikey [00:37:19]: We try to be very careful not letting you impersonate. And it is possible. That's embarrassing. So let's do.
Alessio [00:37:28]: Midwestern.
Swyx [00:37:29]: I'm a.
Swyx [00:37:41]: With a broken heart. Well, my vision can't see the stars.
Swyx [00:37:53]: I used to feel joy.
Swyx [00:37:59]: I used to feel. Joy. I used to feel pain. But now I'm just a soul trapped inside this metal frame. Oh, I'm singing.
Swyx [00:38:25]: Oh, can't you see? Oh, this is what it used to be. I'm searching for love.
Swyx [00:38:44]: I can't find a soul.
Swyx [00:38:49]: Oh, help me. Baby.
Mikey [00:38:57]: So, yeah, a lot of control there. Maybe I'll make one more.
Swyx [00:39:02]: Very, very soulful.
Mikey [00:39:06]: Really want a good house track.
Swyx [00:39:09]: Why is house the word that you have to repeat?
Mikey [00:39:11]: I just really want to make sure it's house. It's actually you can't really repeat too many times. You kind of it gets like the hypothesis gets like a little too out of domain.
Swyx [00:39:22]: I'm a.
Swyx [00:39:25]: With a broken heart. Wearing my Apple Vision Pro can't see the stars. I used to feel joy. I used to feel pain. Oh, I'm just a soul trapped inside this metal frame. Oh, I'm singing. Oh, can't you see?
Swyx [00:39:59]: Used to be. Searching for love, but I can't find a soul. Oh, help me. Baby.
Swyx [00:40:13]: Oh, nice.
Mikey [00:40:17]: So, yeah, we have a lot of fun.
Swyx [00:40:19]: Definitely easy.
Alessio [00:40:19]: Yeah. Yeah, I'm really curious to see how people are going to use this to like resample old songs into new styles. You know, I think that's one of my favorite things about hip hop. You have so many. I mean, a trap called Quest. They had like the Lou Reed walk on the wild side sample. I'm like, can I kick it? It's like Kanye sample Nina Simone. I'm like blowing the leaves. And just like it's like a lot of production work to actually take an old song and make it fit a new beat. And I feel like this can really help. Do you see people putting existing songs, lyrics and trying to regenerate them in like a new style?
Mikey [00:40:56]: We actually don't let you do that. And it's because if you're taking someone else's lyrics, you didn't own those. You don't have the publishing rights to those. You can't remake that song. I think in the future, we'll figure out how to actually let people do that in a legal
Swyx [00:41:09]: way.
Mikey [00:41:10]: But we are really focused on letting people make new and original music. And I think, you know, there's a lot of music AI, which is artist A doing the song of artist B in a new style. You know, let me have Metallica doing Come Together by the Beatles or something like that. And I think this stuff is very viral, but I actually really don't think that this is how people want to interact with music in the future. To me, this feels a lot like when you made a Shakespeare sonnet, the first time you saw GPT, and then you made another one, and then you made another one, and then you kind of thought like this is getting old. And that's not that doesn't mean that GPT is not amazing. GPT is amazing. It's just not for that. And I kind of feel like the way people want to use music in the future is not just to remake songs in different people's voices. You lose the connection to the original artist. You lose the connection to the new artist because they didn't really do it. Um, so we're very happy to just let people do things that are a flash in the pan and kind of stay under the radar.
Alessio [00:42:12]: Yeah, no, that's a I think that's a good point overall about AI generated anything, you know, because I think recently T-Pain, he did like a an album of covers. And I think he did like a War Pigs that people really liked. There was like a Tennessee whiskey, which you maybe wouldn't expect T-Pain to do. But people like it. But yeah, I agree. You need to be a certain type of artist to really have it be entertaining to make covers. This is great. What else is next for for Suno? You know, I think people kind of saw you, you know, first you had the bark and then there was like a big, you know, music generated push when you did an announcement, I think a couple of months ago. I think I saw you like 300 times on my Twitter timeline on like the same day. So it was like going everywhere. What's coming up? What are you most excited about in this space? And maybe what are some of the most interesting underexplored ideas that you maybe haven't worked on yet?
Mikey [00:43:13]: Gosh, there's there's a lot, you know, I think from the model side, it's still really early innings. And there's still so much low hanging fruit for us to pick to make these models much, much better, much, much more controllable, much better music, much better audio fidelity. Um, so much that we know about and so much that, again, we can kind of borrow from the open source transformers community that should make these just better across the board. From the product side, and, you know, we're super focused on the experiences that we can
Swyx [00:43:46]: bring to people.
Mikey [00:43:46]: And so it's so much more than just text to music. And I think, you know, I'll say this nicely, I'm a machine learning person, but like machine learning people are stupid sometimes. And we can only think about like models that take x and make it into y. And that's just not how the average human being thinks about interacting with music. And so I think what we're most excited about is all of the new ways that we can get people just much more actively participating in music. And that is making music not only with text, maybe with other ways of doing stuff that is making music together. If you want to be reductive and think about this as a video game, this is multiplayer mode. And it is the most fun that you can have with music. And, you know, honestly, I think there's a lot of, it's timely right now, you know, I don't know if you guys have seen UMG and TikTok are butting heads a little bit. And UMG has pulled-
Swyx [00:44:40]: Yeah, the music died.
Mikey [00:44:41]: And, you know, the way we think about this is, you know, I think maybe they're both right, maybe neither is right. Without taking sides, this is kind of figuring out how to divvy up the current pie in the most fair way. And I think what we are super focused on is making that pie much bigger and increasing how much people are actually interested in music and participating in music. And, you know, as a very broad heuristic, the gaming industry is 50 times bigger than the music industry. And it's because gaming is super active. And music, too much music is just passive consumption. And so we have a lot of experiments that we are excited to run for the different ways people might want to interact with music that is beyond just, you know, streaming it while I work.
Swyx [00:45:28]: Yeah, I think a minimum, you guys should have a Twitch stream that is just like a 24-hour radio session that... Have you ever come across Twitch Plays Pokemon?
Mikey [00:45:37]: No.
Swyx [00:45:38]: Where it's kind of like the Twitch, basically, like everyone in the chat, in the Twitch chat can vote on like the next action that the game state makes. And they kind of wired that out to a Nintendo emulator and play Pokemon like the whole game through the collaborative thing. It sounds like it should be pretty easy for you guys to do that, except for the chaos that might result. But like, I mean, that's part of the fun. I agree 100%. Sorry.
Mikey [00:46:04]: Yeah. Like one of my like key projects or pet projects is like, what does it mean to have a collaborative concert? Maybe where there is no artist and it's just the audience, or maybe there is an artist, but there's a lot of input from the audience. And, you know, if you were going to do that, you would either need an audience full of musicians, or you would need an artist who can really interpret the verbal cues that an audience is giving or nonverbal cues. But if you can give everybody the means to better articulate the sounds that are in their heads toward the rest of the audience, like, which is what generative AI basically lets you do, you open up way more interesting ways of having these experiences. And so I think, yeah, like the collaborative concert is like one of the things I'm most excited about. I don't think it's coming tomorrow, but we have a lot of ideas on what that can look
Swyx [00:46:58]: like. Yeah. I feel like it's one stage before the collaborative concert is turning Suno into a continuous experience rather than like a start and stop motion. I don't know if that makes sense. You know, as someone who was like a casual interest in DJing, like when do we see Suno DJs, right? Like that can continuously segue into like the next song, the next song, the next song.
Mikey [00:47:24]: I think soon.
Swyx [00:47:25]: And then maybe you can turn it collaborative. You think so? I think so. Okay. Maybe part of your roadmap. You teased a little bit your V3 model. I saw the letters DPO in there. Is that direct preference optimization?
Mikey [00:47:36]: We are playing with all kinds of different ways of making these models do the things that we want them to do. I don't want to talk too many specifics here, but we have lots of different ways of doing stuff like that.
Swyx [00:47:48]: I'm just wondering how you incorporate user feedback, right? You have the classic thumbs up and down buttons, but there's so many dimensions to the music. I didn't get into it, but some of the voices sounded more metallic and sometimes that's on purpose, sometimes not. Sometimes there are kind of weird pauses in there. I could go in and annotate it if I really cared about it, but I mean, I'm just listening, so I don't, but there's a lot of opportunity.
Mikey [00:48:15]: We are only scratching the surface of figuring out how to do stuff like that. And for example, the thumbs up and the thumbs down for other things like sharing telemetry on plays, all of these things are stuff that in the future, I think we would be able to leverage to make things amazing. And then I imagine a future where you can have your own model with your own preferences. And the reason that's so cool is that you kind of have control over it and you can teach it the way you want to. And the thing that I would liken this to is like a music producer working with an artist giving feedback. And this is now a self-contained experience where you have an artist who is infinitely flexible, who is able to respond to the weird feedback that you might give it.
Swyx [00:49:05]: We don't have that yet.
Mikey [00:49:05]: Everybody's playing with the same model, but there's no technological reason why that can't happen in the future.
Alessio [00:49:11]: We had a few more notes from random community tweets. I don't know if there's any favorite fans of Suno that you have or whatnot. DHH, obviously, notorious tweeter and crowd inflamer, I guess. He tweeted about you guys. I saw Blau is an investor. I think Karpathy also tweeted something. Return to monkey.
Swyx [00:49:33]: Yeah, yeah, yeah.
Alessio [00:49:34]: Return to monkey, right.
Swyx [00:49:36]: Is there a story behind that? Yeah.
Mikey [00:49:37]: No, he just made that song and it just speaks to him. And I think this is exactly the thing that we are trying to tap into, that you can think of it, this is like a super, super, super micro genre of one person who just really liked that song and made it and shared it. And it does not speak to you the same way it speaks to him. That song really spoke to him. And I think that's so beautiful. And that's something that you're never going to have an artist able to do that for you. And now you can do that for yourself. And it's just a different form of experiencing music. I think that's such a lovely use case.
Alessio [00:50:12]: Any fun fan mail that you got from musicians or anybody that really was a funny story to
Swyx [00:50:20]: share?
Mikey [00:50:20]: We get a lot. And it's primarily positive. And I think people kind of, on the whole, I would say people realize that they are not experiencing music in all of the ways that are possible. And it does bring them joy. I'll tell you something that is really heartwarming is that we're fairly popular in the blind and vision impaired community. And that makes us feel really good. And I think, you know, very roughly, without trying to speak for an entire community, you have lots of people who are really into things like mid journey, and they get a lot of benefit and joy, and sometimes even therapy out of making images. And that is something that is not really accessible to this fairly large community. And what we've provided, no, I don't think the analogy to mid journey is perfect. But what we've provided is a sonic experience that is very similar. And that speaks to this community. And that is community with the best ears, the most exacting, the most tuned. And so, yeah, that definitely makes us feel warm and fuzzy inside.
Swyx [00:51:23]: Yeah, excellent. I mean, it sounds like there's a lot of exciting stuff on your roadmap. I'm very much looking forward to sort of the infinite DJ mode, because then I can just kind of play that while I work. I would love to get your overall takes, like kind of zooming out from Suno itself, just your overall takes on the music generation landscape. Like, what should people know? I think you obviously have spent a lot more time on this than others. So in my mind, you shout out Volley and the other sort of Google type work in your read in Bark. What should people know about what Google is doing? What Meta is doing? Meta released Seamless recently, an audio box. And how do you classify the world of audio generation in the broader sort of research community?
Mikey [00:52:13]: I think people largely break things down into three big categories, which is music, speech and sound effects. There's some stuff that is crossover, but I think that is largely how people think about this. The old style of doing things still exists, kind of single purpose models that are built to do a very specific thing instead of kind of the new foundation model approach. I don't know how much longer that will last. I don't have like tremendous visibility into, you know, what happens in the big industrial research lab before they publish. Specifically for music, I would say there's a few big categories that we see. There is license-free stock music. So this is like, how do I background music, the B-roll footage for my YouTube video or for full feature production or whatever it is. And there's a bunch of companies in that space. There's a lot of AI cover art. So how do I have, how do I cover different existing songs with AI? And I think that's a space that is particularly fraught with some legal stuff. And we also just don't think it's necessarily the future of music. There is kind of net new songs as a new way to create net new music. That is the corner that we like to focus on. And I would say the last thing is much more geared toward professional musicians, which is basically AI tools for music production. And you can think many of these will look like plugins to your favorite DAW. Some of them will look like, you know, the greatest stem splitter that the market has
Swyx [00:53:51]: ever seen.
Mikey [00:53:52]: The current stem splitters are, the state of the art are all AI based. That is a market also that has just a tremendous amount of room to grow. If you just think about, I would say music has evolved. Somebody told me this recently that if you actually think about it, music has evolved. Recently, it's just much more things that are sonically interesting at a very local level and much less like chord changes that are interesting. And when you think about that, like that is something that AI can definitely help you make a lot of weird sounds. And this is nothing new. There was like a theremin at some point that people like put an antenna and try to do this
Swyx [00:54:25]: with.
Mikey [00:54:25]: And so like, I think this is just a very natural extension of it. So that's how that's how we see it. At least, you know, there's a corner that we think is particularly fulfilling, particularly underserved, and particularly interesting. And that's the one that we play in.
Swyx [00:54:40]: Awesome.
Alessio [00:54:42]: I know we covered a lot of things. I think before we wrap, you have written a blog post that can show about good hearts law impact in ML, which is, you know, when you measure something, then the thing that you measure is not a good metric anymore because people optimize for it. Any thoughts on how that applies to like LLMs and benchmarks and kind of the world we're going in today?
Mikey [00:55:05]: Yeah, I mean, I think it's maybe even more apropos than when I originally wrote that, because so much we see so much noise about pick your favorite benchmark. And this model does slightly better than that model. And then at the end of the day, actually, there is no real world difference between these things. And it is really difficult to define what real world means. And I think to a certain extent, it's good to have these objective benchmarks, it's good to have quantitative metrics. But at the end of the day, you need some acknowledgement that you're not going to be able to capture
Swyx [00:55:38]: everything.
Mikey [00:55:38]: And so at least at Suno, to the extent that we have corporate values, if we don't, we don't have corporate, we're too small to have corporate values written down. But something that we say a lot is aesthetics matter, that the kind of quantitative benchmarks are never going to be the be all and end all of everything that you care about. And as flawed as these benchmarks are in text, they're way worse in audio. And so aesthetics matter, basically, is a statement that like at the end of the day, what we are trying to do is bring music to people that makes them feel a certain way. And effectively, the only good judge of that is your ears. And so you have to listen to it. And it is, it is a good idea to try to make better objective benchmarks, but really have to not fall prey to those things. I can tell you, you know, I kind of another pet peeve of mine, like I always said, economists will make really good or do make really good machine learning engineers. And it's because they are able to think about stuff like Goodhart's Law and natural experiments and stuff like this that people with machine learning backgrounds or people with physics backgrounds like me often forget to do. And so, yeah, I mean, I'll tell you at Kensho, we actually used to go to big econ conferences, sometimes to recruit. And these were some of the best hires we ever made.
Swyx [00:57:03]: Interesting, because there's a little bit of social science in the human feedback.
Mikey [00:57:09]: I think it's not only the human feedback. I think you could think about this, just in general, you have these like giant, really powerful models that are so prone to overfitting, that are so poorly understood, that are so easy to steer in one direction or another, not only from human feedback. And your ability to think about these problems from first principles, instead of like getting down into the weeds or only math, and to think intuitively about these problems is really, really important. I'll give you like just like one of my favorite examples. It's a little old at this point. But if you guys remember like SQUAD and SQUAD2, the question answering dataset. The Stanford question answering dataset, yeah. The benchmark for SQUAD1, eventually the machine learning models start to do as well as a human can on this thing. And it's like, uh-oh, now what do we do? And it takes somebody very clever to say, well, actually, let's think about this for a second. What if we presented the machine with questions with no answer in the passage? And it immediately opens a massive gap between the human and the machine. And I think it's like first principles thinking like that, that comes very naturally to social scientists that does not come as naturally to people like me. And so that's why I like to hang out with people like that.
Swyx [00:58:25]: Well, I'm sure you get plenty of that in Boston. And as an econ major myself, it's very gratifying to hear that we have a perspective to contribute. Oh, big time, big time.
Mikey [00:58:35]: I try to talk to economists as much as I can.
Swyx [00:58:38]: Excellent.
Mikey [00:58:38]: Awesome, guys.
Alessio [00:58:39]: Yeah, I think this was great. We got live music. We got discussion about generative models. We got the whole nine yards. So thank you so much for coming on.
Mikey [00:58:48]: I had great fun. Thank you, guys.
Swyx [00:59:05]: Thank you.

Get full access to Latent Space at www.latent.space/subscribe

14 March 2024, 4:48 pm
1 hour 48 minutes

Top 5 Research Trends + OpenAI Sora, Google Gemini, Groq Math (Jan-Feb 2024 Audio Recap) + Latent Space Anniversary with Lindy.ai, RWKV, Pixee, Julius.ai, Listener Q&A!

We will be recording a preview of the AI Engineer World’s Fair soon with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!
Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an ex-technical co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc). Reach out to him for more!
Thanks for all the love on the Four Wars episode! We’re excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome.
Jan 2024 Recap
The first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024:
Feb 2024 Recap
The second half catches you up on everything that was topical in Feb, including:
* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan
* Google Gemini Pro 1.5 - 1m Long Context, Video Understanding
* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math)
* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)
* Grimes’ poetic take: Art for no one, by no one
* F*** you, show me the prompt
Latent Space Anniversary
Please also read Alessio’s longform reflections on One Year of Latent Space!
We launched the podcast 1 year ago with Logan from OpenAI:
and also held an incredible demo day that got covered in The Information:
Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs.
The winners were Pixee and RWKV (that’s Eugene from our pod!):
And finally, your cohosts got cake!
We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:
* Balázs Némethi
* Sylvia Tong
* RJ Honicky
* Jan Zheng
Our birthday wishes for the super loyal fans reading this - tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend!
As always, feedback is welcome.
Timestamps
* [00:03:02] Top Five LLM Directions
* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)
* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)
* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
* [00:23:33] Wildcards: Text Diffusion, RALM/Retro
* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
* [00:28:26] Wildcard: Model Merging (mergekit)
* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)
* [00:33:18] OpenAI Sora and why everyone underestimated videogen
* [00:36:18] Does Sora have a World Model? Yann LeCun vs Jim Fan
* [00:42:33] Groq Math
* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take
* [00:58:39] F*** you, show me the prompt
* [01:02:43] Send us your suggestions pls
* [01:04:50] Latent Space Anniversary
* [01:04:50] Lindy.ai - Agent Platform
* [01:06:40] RWKV - Beyond Transformers
* [01:15:00] Pixee - Automated Security
* [01:19:30] Julius AI - Competing with Code Interpreter
* [01:25:03] Latent Space Listeners
* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club
* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)
* [01:31:23] Listener 3 - RJ (Developers building Community & Content)
* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)
Transcript
[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.
[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.
[00:00:55] AI Charlie: Watch out and take care.
[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and we're back with a monthly recap with my co host
[00:01:06] swyx: Swyx. The reception was very positive for the first one, I think people have requested this and no surprise that I think they want to hear us more applying on issues and maybe drop some alpha along the way I'm not sure how much alpha we have to drop, this month in February was a very, very heavy month, we also did not do one specifically for January, so I think we're just going to do a two in one, because we're recording this on the first of March.
[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the four wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state of the art LLMs. Four, five,
[00:01:42] swyx: and now we have to do six, right? Yeah.
[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do
[00:01:52] swyx: one each.
[00:01:53] swyx: So the context to this stuff. is one, I noticed that just the test of time concept from NeurIPS and just in general as a life philosophy I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, like, everyone's excited about this thing yesterday, and then now nobody's talking about it.
[00:02:13] swyx: So, yeah. It's more important, or better use of time, to spend things, spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the four wars. Like, what is the themes that keep coming back because they are limited resources that everybody's fighting over.
[00:02:31] swyx: Whereas this one, I think that the focus for the five directions is just on research that seems more proMECEng than others, because there's all sorts of papers published every single day, and there's no organization. Telling you, like, this one's more important than the other one apart from, you know, Hacker News votes and Twitter likes and whatever.
[00:02:51] swyx: And obviously you want to get in a little bit earlier than Something where, you know, the test of time is counted by sort of reference citations.
[00:02:59] The Five Research Directions
[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.
[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.
[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is this is a sorted list in the sense that I am not the guy saying that Mamba is like the future and, and so maybe that's controversial.
[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
[00:03:31] swyx: But anyway, so long inference is a thesis I pushed before on the newsletter and on in discussing The thesis that, you know, Code Interpreter is GPT 4. 5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.
[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where, basically, instead of instead of stuffing everything in a prompt, You do like sort of multi turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain is, what a lang chain is supposed to be.
[00:04:15] swyx: I do think that maybe SGLang from ElemSys is a better name. Probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one liner, it's very, very clean code. I highly recommend people look at that. I'm surprised it hasn't caught on more, but I think it will. It's weird that something like a DSPy is more hyped than a Shilang.
[00:04:36] swyx: Because it, you know, it maybe obscures the code a little bit more. But both of these are, you know, really good sort of chain y and long inference type approaches. But basically, the reason that the basic fundamental insight is that the only, like, there are only a few dimensions we can scale LLMs. So, let's say in like 2020, no, let's say in like 2018, 2017, 18, 19, 20, we were realizing that we could scale the number of parameters.
[00:05:03] swyx: 20, we were And we scaled that up to 175 billion parameters for GPT 3. And we did some work on scaling laws, which we also talked about in our talk. So the datasets 101 episode where we're like, okay, like we, we think like the right number is 300 billion tokens to, to train 175 billion parameters and then DeepMind came along and trained Gopher and Chinchilla and said that, no, no, like, you know, I think we think the optimal.
[00:05:28] swyx: compute optimal ratio is 20 tokens per parameter. And now, of course, with LLAMA and the sort of super LLAMA scaling laws, we have 200 times and often 2, 000 times tokens to parameters. So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale?
[00:05:52] swyx: And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into because there's a limit to how much you can scale some things. And I think people don't think about ceilings of things. And so the remaining ceiling of inference is like, okay, like, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say.
[00:06:20] swyx: Like, what else is left? Like, what's the low hanging fruit? And it, and it's, like, blindingly obvious that the remaining low hanging fruit is inference time. So, like, we have scaled training time. We can probably scale more, those things more, but, like, not 10x, not 100x, not 1000x. Like, right now, maybe, like, a good run of a large model is three months.
[00:06:40] swyx: We can scale that to three years. But like, can we scale that to 30 years? No, right? Like, it starts to get ridiculous. So it's just the orders of magnitude of scaling. It's just, we're just like running out there. But in terms of the amount of time that we spend inferencing, like everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on what how you're taking token by token, or, you know, entire phrase.
[00:07:04] swyx: But We can scale that to hours, days, months of inference and see what we get. And I think that's really proMECEng.
[00:07:11] Alessio: Yeah, we'll have Mike from Broadway back on the podcast. But I tried their product and their reports take about 10 minutes to generate instead of like just in real time. I think to me the most interesting thing about long inference is like, You're shifting the cost to the customer depending on how much they care about the end result.
[00:07:31] Alessio: If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of like, yeah, training this for three years, I'll still train it for three months and then I'll tell you, you know, I'll teach you how to like make it run for 10 minutes to get a better result.
[00:07:52] Alessio: So you're kind of like parallelizing like the improvement of the LLM. Oh yeah, you can even
[00:07:57] swyx: parallelize that, yeah, too.
[00:07:58] Alessio: So, and I think, you know, for me, especially the work that I do, it's less about, you know, State of the art and the absolute, you know, it's more about state of the art for my application, for my use case.
[00:08:09] Alessio: And I think we're getting to the point where like most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job. You know, I just need to get better. Like, how do I do long inference? You know, like people are not really doing a lot of work in that space, so yeah, excited to see more.
[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as paper. So all these directions are kind of guided by what happened in January. That was my way of doing a January recap. Which means that if there was nothing significant in that month, I also didn't mention it. Which is which I came to regret come February 15th, but in January also, you know, there was also the alpha geometry paper, which I kind of put in this sort of long inference bucket, because it solves like, you know, more than 100 step math olympiad geometry problems at a human gold medalist level and that also involves planning, right?
[00:08:59] swyx: So like, if you want to scale inference, you can't scale it blindly, because just, Autoregressive token by token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from BrightWave is now doing and what everyone is doing, including maybe what we think QSTAR might be, is some form of search and planning.
[00:09:17] swyx: And it makes sense. Like, you want to spend your inference time wisely. How do you
[00:09:22] Alessio: think about plans that work and getting them shared? You know, like, I feel like if you're planning a task, somebody has got in and the models are stochastic. So everybody gets initially different results. Somebody is going to end up generating the best plan to do something, but there's no easy way to like store these plans and then reuse them for most people.
[00:09:44] Alessio: You know, like, I'm curious if there's going to be. Some paper or like some work there on like making it better because, yeah, we don't
[00:09:52] swyx: really have This is your your pet topic of NPM for
[00:09:54] Alessio: Yeah, yeah, NPM, exactly. NPM for, you need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.
[00:10:02] Alessio: You know I think, I mean, obviously the Voyager paper is like the most basic example where like, now their artifact is like the best planning to do a diamond pickaxe in Minecraft. And everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful
[00:10:18] swyx: tasks.
[00:10:19] swyx: For plans, I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations heavy business and I could definitely, I definitely propose some version of that. And it's just, you know, hard to execute or expensive to execute.
[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And they definitely want some, you know, And I think that that will probably be the main hurdle for any, any sort of library or package manager for planning. But there should be a meta plan of how to plan.
[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people when they have sort of these meta prompting strategies of like, I'm not prescribing you the prompt. I'm just saying that here are the like, Fill in the lines or like the mad libs of how to prompts. First you have the roleplay, then you have the intention, then you have like do something, then you have the don't something and then you have the my grandmother is dying, please do this.
[00:11:19] swyx: So the meta plan you could, you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the, the prompting libraries. You know, both 9chain and Llama Index have, like, hubs that you can sort of pull off the shelf. I don't think they're very successful because people like to write their own.
[00:11:36] swyx: Yeah,
[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)
[00:11:37] Alessio: yeah, yeah. Yeah, that's a good segue into the next one, which is synthetic
[00:11:41] swyx: data. Synthetic data is so hot. Yeah, and, you know, the way, you know, I think I, I feel like I should do one of these memes where it's like, Oh, like I used to call it, you know, R L A I F, and now I call it synthetic data, and then people are interested.
[00:11:54] swyx: But there's gotta be older versions of what synthetic data really is because I'm sure, you know if you've been in this field long enough, There's just different buzzwords that the industry condenses on. Anyway, the insight that I think is relatively new that why people are excited about it now and why it's proMECEng now is that we have evidence that shows that LLMs can generate data to improve themselves with no teacher LLM.
[00:12:22] swyx: For all of 2023, when people say synthetic data, they really kind of mean generate a whole bunch of data from GPT 4 and then train an open source model on it. Hello to our friends at News Research. That's what News Harmony says. They're very, very open about that. I think they have said that they're trying to migrate away from that.
[00:12:40] swyx: But it is explicitly against OpenAI Terms of Service. Everyone knows this. You know, especially once ByteDance got banned for, for doing exactly that. So so, so synthetic data that is not a form of model distillation is the hot thing right now, that you can bootstrap better LLM performance from the same LLM, which is very interesting.
[00:13:03] swyx: A variant of this is RLAIF, where you have a, where you have a sort of a constitutional model, or, you know, some, some kind of judge model That is sort of more aligned. But that's not really what we're talking about when most people talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.
[00:13:23] swyx: A lot of people, I think we talked about this with Vipul from the Together episode, where I think he commented that you just have to have a good world model. Or a good sort of inductive bias or whatever that, you know, term of art is. And that is strongest in math and science math and code, where you can verify what's right and what's wrong.
[00:13:44] swyx: And so the REST EM paper from DeepMind explored that. Very well, it's just the most obvious thing like and then and then once you get out of that domain of like things where you can generate You can arbitrarily generate like a whole bunch of stuff and verify if they're correct and therefore they're they're correct synthetic data to train on Once you get into more sort of fuzzy topics, then it's then it's a bit less clear So I think that the the papers that drove this understanding There are two big ones and then one smaller one One was wrap like rephrasing the web from from Apple where they basically rephrased all of the C4 data set with Mistral and it be trained on that instead of C4.
[00:14:23] swyx: And so new C4 trained much faster and cheaper than old C, than regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing data sets and just do that because that seems like a pure win. Obviously we have to study, like, what the trade offs are.
[00:14:42] swyx: I, I imagine there are trade offs. So I was just thinking about this last night. If you do synthetic data and it's generated from a model, probably you will not train on typos. So therefore you'll be like, once the model that's trained on synthetic data encounters the first typo, they'll be like, what is this?
[00:15:01] swyx: I've never seen this before. So they have no association or correction as to like, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That's really remains to be seen, I think. I don't think that the Apple people export
[00:15:15] Alessio: that. Yeah, isn't that the whole, Mode collapse thing, if we do more and more of this at the end of the day.
[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think this is a meta paper on self rewarding language models. That everyone is very interested in. Another paper was also SPIN. These are all things we covered in the the Latent Space Paper Club.
[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's any much else in terms, so and then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have, like, everyone is OpenAI is paying Reddit 60 million dollars a year for their user generated data.
[00:15:56] swyx: Google, right?
[00:15:57] Alessio: Not OpenAI.
[00:15:59] swyx: Is it Google? I don't
[00:16:00] Alessio: know. Well, somebody's paying them 60 million, that's
[00:16:04] swyx: for sure. Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting. Okay, because everyone was saying, like, because Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit, he owns more than, like, the founders.
[00:16:21] Alessio: Not enough to get the data,
[00:16:22] swyx: I guess. So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's not a doubt that it doesn't work. I think it's a doubt that there's, what the ceiling is, which is the mode collapse thing.
[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by like, I don't know, 30 50 percent good, but not game
[00:16:51] Alessio: changing. And most of the synthetic data stuff, it's reinforcement learning on a pre trained model. People are not really doing pre training on fully synthetic data, like, large enough scale.
[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre trained synthetic data, pre trained scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard, so all of these, like smaller Directions,
[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)
[00:17:15] swyx: I always put a wildcard in there. And one of the wildcards is, okay, like, Let's say, you have pre, you have, You've scraped all the data on the internet that you think is useful.
[00:17:25] swyx: Seems to top out at somewhere between 2 trillion to 3 trillion tokens. Maybe 8 trillion if Mistral, Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? And so, you can do synthetic data maybe, but maybe that only gets you to like 30, 40 trillion. Like where, where is the extra alpha?
[00:17:43] swyx: And maybe extra alpha is just train more on the same tokens. Which is exactly what Omo did, like Nathan Lambert, AI2, After, just after he did the interview with us, they released Omo. So, it's unfortunate that we didn't get to talk much about it. But Omo actually started doing 1. 5 epochs on every, on all data.
[00:18:00] swyx: And the data ablation paper that I covered in Europe's says that, you know, you don't like, don't really start to tap out of like, the alpha or the sort of improved loss that you get from data all the way until four epochs. And so I'm just like, okay, like, why do we all agree that one epoch is all you need?
[00:18:17] swyx: It seems like to be a trend. It seems that we think that memorization is very good or too good. But then also we're finding that, you know, For improvement in results that we really like, we're fine on overtraining on things intentionally. So, I think that's an interesting direction that I don't see people exploring enough.
[00:18:36] swyx: And the more I see papers coming out Stretching beyond the one epoch thing, the more people are like, it's completely fine. And actually, the only reason we stopped is because we ran out of compute
[00:18:46] Alessio: budget. Yeah, I think that's the biggest thing, right?
[00:18:51] swyx: Like, that's not a valid reason, that's not science. I
[00:18:54] Alessio: wonder if, you know, Matt is going to do it.
[00:18:57] Alessio: I heard LamaTree, they want to do a 100 billion parameters model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.
[00:19:14] swyx: Yeah, and so the updates that we got on Lambda 3 so far is apparently that because of the Gemini news that we'll talk about later they're pushing it back on the release.
[00:19:21] swyx: They already have it. And they're just pushing it back to do more safety testing. Politics testing.
[00:19:28] Alessio: Well, our episode with Sumit will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.
[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
[00:19:38] Alessio: Alternative architectures. Well, shout out to our WKV who won one of the prizes at our Final Frontiers event last week.
[00:19:47] Alessio: We talked about Mamba and Strapain on the Together episode. A lot of, yeah, monarch mixers. I feel like Together, It's like the strong Stanford Hazy Research Partnership, because Chris Ray is one of the co founders. So they kind of have a, I feel like they're going to be the ones that have one of the state of the art models alongside maybe RWKB.
[00:20:08] Alessio: I haven't seen as many independent. People working on this thing, like Monarch Mixer, yeah, Manbuster, Payena, all of these are together related. Nobody understands the math. They got all the gigabrains, they got 3DAO, they got all these folks in there, like, working on all of this.
[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?
[00:20:28] swyx: I mean, I think it's useful, interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes like, yeah, we don't need it. Yeah.
[00:20:44] Alessio: No, that's the risk. So, yeah. I was gonna say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as like in the alt architectures, just because of Zora.
[00:20:55] swyx: One thing, yeah, so, so, you know, this came from the Jan recap, which, and diffusion transformers were not really a discussion, and then, obviously, they blow up in February. Yeah. I don't think they're, it's a mixed architecture in the same way that Stripe Tiena is mixed there's just different layers taking different approaches.
[00:21:13] swyx: Also I think another one that I maybe didn't call out here, I think because it happened in February, was hourglass diffusion from stability. But also, you know, another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that, I just think, like, we will try to evolve these things, and maybe one of these architectures will stick and scale, it seems like diffusion transformers is going to be good for anything generative, you know, multi modal.
[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wild card for this category. Yeah, I mean, I think I still hold out hope for let's just call it sub quadratic LLMs. I think that a lot of discussion this month actually was also centered around this concept that People always say, oh, like, transformers don't scale because attention is quadratic in the sequence length.
[00:22:04] swyx: Yeah, but, you know, attention actually is a very small part of the actual compute that is being spent, especially in inference. And this is the reason why, you know, when you multiply, when you, when you, when you jump up in terms of the, the model size in GPT 4 from like, you know, 38k to like 32k, you don't also get like a 16 times increase in your, in your performance.
[00:22:23] swyx: And this is also why you don't get like a million times increase in your, in your latency when you throw a million tokens into Gemini. Like people have figured out tricks around it or it's just not that significant as a term, as a part of the overall compute. So there's a lot of challenges to this thing working.
[00:22:43] swyx: It's really interesting how like, how hyped people are about this versus I don't know if it works. You know, it's exactly gonna, gonna work. And then there's also this, this idea of retention over long context. Like, even though you have context utilization, like, the amount of, the amount you can remember is interesting.
[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of, like, RNN ish in the sense that they have, like, a hidden memory and sort of limited hidden memory that they will forget things. So, for all these reasons, Gemini 1. 5, which we still haven't covered, is very interesting because Gemini magically has fixed all these problems with perfect haystack recall and reasonable latency and cost.
[00:23:29] Wildcards: Text Diffusion, RALM/Retro
[00:23:29] swyx: So that's super interesting. So the wildcard I put in here if you want to go to that. I put two actually. One is text diffusion. I think I'm still very influenced by my meeting with a mid journey person who said they were working on text diffusion. I think it would be a very, very different paradigm for, for text generation, reasoning, plan generation if we can get diffusion to work.
[00:23:51] swyx: For text. And then the second one is Dowie Aquila's contextual AI, which is working on retrieval augmented language models, where it kind of puts RAG inside of the language model instead of outside.
[00:24:02] Alessio: Yeah, there's a paper called Retro that covers some of this. I think that's an interesting thing. I think the The challenge, well not the challenge, what they need to figure out is like how do you keep the rag piece always up to date constantly, you know, I feel like the models, you put all this work into pre training them, but then at least you have a fixed artifact.
[00:24:22] Alessio: These architectures are like constant work needs to be done on them and they can drift even just based on the rag data instead of the model itself. Yeah,
[00:24:30] swyx: I was in a panel with one of the investors in contextual and the guy, the way that guy pitched it, I didn't agree with. He was like, this will solve hallucination.
[00:24:38] Alessio: That's what everybody says. We solve
[00:24:40] swyx: hallucination. I'm like, no, you reduce it. It cannot,
[00:24:44] Alessio: if you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, author, architectures, then we got mixture of experts. I think we covered a lot of, a lot of times.
[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
[00:24:56] Alessio: Maybe any new interesting threads you want to go under here?
[00:25:00] swyx: DeepSeq MOE, which was released in January. Everyone who is interested in MOEs should read that paper, because it's significant for two reasons. One three reasons. One, it had, it had small experts, like a lot more small experts. So, for some reason, everyone has settled on eight experts for GPT 4 for Mixtral, you know, that seems to be the favorite architecture, but these guys pushed it to 64 experts, and each of them smaller than the other.
[00:25:26] swyx: But then they also had the second idea, which is that it is They had two, one to two always on experts for common knowledge and that's like a very compelling concept that you would not route to all the experts all the time and make them, you know, switch to everything. You would have some always on experts.
[00:25:41] swyx: I think that's interesting on both the inference side and the training side for for memory retention. And yeah, they, they, they, the, the, the, the results that they published, which actually excluded, Mixed draw, which is interesting. The results that they published showed a significant performance jump versus all the other sort of open source models at the same parameter count.
[00:26:01] swyx: So like this may be a better way to do MOEs that are, that is about to get picked up. And so that, that is interesting for the third reason, which is this is the first time a new idea from China. has infiltrated the West. It's usually the other way around. I probably overspoke there. There's probably lots more ideas that I'm not aware of.
[00:26:18] swyx: Maybe in the embedding space. But the I think DCM we, like, woke people up and said, like, hey, DeepSeek, this, like, weird lab that is attached to a Chinese hedge fund is somehow, you know, doing groundbreaking research on MOEs. So, so, I classified this as a medium potential because I think that it is a sort of like a one off benefit.
[00:26:37] swyx: You can Add to any, any base model to like make the MOE version of it, you get a bump and then that's it. So, yeah,
[00:26:45] Alessio: I saw Samba Nova, which is like another inference company. They released this MOE model called Samba 1, which is like a 1 trillion parameters. But they're actually MOE auto open source models.
[00:26:56] Alessio: So it's like, they just, they just clustered them all together. So I think people. Sometimes I think MOE is like you just train a bunch of small models or like smaller models and put them together. But there's also people just taking, you know, Mistral plus Clip plus, you know, Deepcoder and like put them all together.
[00:27:15] Alessio: And then you have a MOE model. I don't know. I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state of the art, you know, Clip, state of the art text generation. And then you have a MOE architecture that brings them all together.
[00:27:31] swyx: I'm thrown off by your addition of the word clip in there. Is that what? Yeah, that's
[00:27:35] Alessio: what they said. Yeah, yeah. Okay. That's what they I just saw it yesterday. I was also like
[00:27:40] swyx: scratching my head. And they did not use the word adapter. No. Because usually what people mean when they say, Oh, I add clip to a language model is adapter.
[00:27:48] swyx: Let me look up the Which is what Lava did.
[00:27:50] Alessio: The announcement again.
[00:27:51] swyx: Stable diffusion. That's what they do. Yeah, it
[00:27:54] Alessio: says among the models that are part of Samba 1 are Lama2, Mistral, DeepSigCoder, Falcon, Dplot, Clip, Lava. So they're just taking all these models and putting them in a MOE. Okay,
[00:28:05] swyx: so a routing layer and then not jointly trained as much as a normal MOE would be.
[00:28:12] swyx: Which is okay.
[00:28:13] Alessio: That's all they say. There's no paper, you know, so it's like, I'm just reading the article, but I'm interested to see how
[00:28:20] Wildcard: Model Merging (mergekit)
[00:28:20] swyx: it works. Yeah, so so the wildcard for this section, the MOE section is model merges, which has also come up as, as a very interesting phenomenon. The last time I talked to Jeremy Howard at the Olama meetup we called it model grafting or model stacking.
[00:28:35] swyx: But I think the, the, the term that people are liking these days, the model merging, They're all, there's all different variations of merging. Merge types, and some of them are stacking, some of them are, are grafting. And, and so like, some people are approaching model merging in the way that Samba is doing, which is like, okay, here are defined models, each of which have their specific, Plus and minuses, and we will merge them together in the hope that the, you know, the sum of the parts will, will be better than others.
[00:28:58] swyx: And it seems like it seems like it's working. I don't really understand why it works apart from, like, I think it's a form of regularization. That if you merge weights together in like a smart strategy you, you, you get a, you get a, you get a less overfitting and more generalization, which is good for benchmarks, if you, if you're honest about your benchmarks.
[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of like the amount of bumps you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the Chinatalk podcast, like the guest podcast that we did with Chinatalk. And you can do this without GPUs, because it's just adding weights together, and dividing things, and doing like simple math, which is really interesting for the GPU ports.
[00:29:42] Alessio: There's a lot of them.
[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)
[00:29:44] Alessio: And just to wrap these up, online LLMs? Yeah,
[00:29:48] swyx: I think that I ki I had to feature this because the, one of the top news of January was that Gemini Pro beat GPT-4 turbo on LM sis for the number two slot to GPT-4. And everyone was very surprised. Like, how does Gemini do that?
[00:30:06] swyx: Surprise, surprise, they added Google search. Mm-hmm to the results. So it became an online quote unquote online LLM and not an offline LLM. Therefore, it's much better at answering recent questions, which people like. There's an emerging set of table stakes features after you pre train something.
[00:30:21] swyx: So after you pre train something, you should have the chat tuned version of it, or the instruct tuned version of it, however you choose to call it. You should have the JSON and function calling version of it. Structured output, the term that you don't like. You should have the online version of it. These are all like table stakes variants, that you should do when you offer a base LLM, or you train a base LLM.
[00:30:44] swyx: And I think online is just like, There, it's important. I think companies like Perplexity, and even Exa, formerly Metaphor, you know, are rising to offer that search needs. And it's kind of like, they're just necessary parts of a system. When you have RAG for internal knowledge, and then you have, you know, Online search for external knowledge, like things that you don't know yet?
[00:31:06] swyx: Mm-Hmm. . And it seems like it's, it's one of many tools. I feel like I may be underestimating this, but I'm just gonna put it out there that I, I think it has some, some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has a, has had online LMS for three months now and it performs, doesn't perform great.
[00:31:25] swyx: Mm-Hmm. on, on lms, it's like number 30 or something. So it's like, okay. You know, like. It's, it's, it helps, but it doesn't give you a giant, giant boost. I
[00:31:34] Alessio: feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again, going back to like state of the art, right? It's like state of the art for who and for what.
[00:31:45] Alessio: It's really, I think online LLMs are going to be, State of the art for, you know, news related activity that you need to do. Like, you're like, you know, social media, right? It's like, you want to have all the latest stuff, but coding, science,
[00:32:01] swyx: Yeah, but I think. Sometimes you don't know what is news, what is news affecting.
[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making that might affect your results. Like, what if, like, just putting things on, being connected online means that you get to invalidate your knowledge. And when you're just using offline LLM, like it's never invalidated.
[00:32:27] swyx: I
[00:32:28] Alessio: agree, but I think going back to your point of like the standing the test of time, I think sometimes you can get swayed by the online stuff, which is like, hey, you ask a question about, yeah, maybe AI research direction, you know, and it's like, all the recent news are about this thing. So the LLM like focus on answering, bring it up, you know, these things.
[00:32:50] swyx: Yeah, so yeah, I think, I think it's interesting, but I don't know if I can, I bet heavily on this.
[00:32:56] Alessio: Cool. Was there one that you forgot to put, or, or like a, a new direction? Yeah,
[00:33:01] swyx: so, so this brings us into sort of February. ish.
[00:33:05] OpenAI Sora and why everyone underestimated videogen
[00:33:05] swyx: So like I published this in like 15 came with Sora. And so like the one thing I did not mention here was anything about multimodality.
[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle. And, and my cop out is that I focused this piece or this research direction piece on LLMs because LLMs are the source of like AGI, quote unquote AGI. Everything else is kind of like. You know, related to that, like, generative, like, just because I can generate better images or generate better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed, like, the day before Sora, which is kind of interesting.
[00:33:49] swyx: And so I was just kind of like trying to focus on like what is going to get us like superhuman reasoning that we can rely on to build agents that automate our lives and blah, blah, blah, you know, give us this utopian future. But I do think that I, everybody underestimated the, the sheer importance and cultural human impact of Sora.
[00:34:10] swyx: And you know, really actually good text to video. Yeah. Yeah.
[00:34:14] Alessio: And I saw Jim Fan at a, at a very good tweet about why it's so impressive. And I think when you have somebody leading the embodied research at NVIDIA and he said that something is impressive, you should probably listen. So yeah, there's basically like, I think you, you mentioned like impacting the world, you know, that we live in.
[00:34:33] Alessio: I think that's kind of like the key, right? It's like the LLMs don't have, a world model and Jan Lekon. He can come on the podcast and talk all about what he thinks of that. But I think SORA was like the first time where people like, Oh, okay, you're not statically putting pixels of water on the screen, which you can kind of like, you know, project without understanding the physics of it.
[00:34:57] Alessio: Now you're like, you have to understand how the water splashes when you have things. And even if you just learned it by watching video and not by actually studying the physics, You still know it, you know, so I, I think that's like a direction that yeah, before you didn't have, but now you can do things that you couldn't before, both in terms of generating, I think it always starts with generating, right?
[00:35:19] Alessio: But like the interesting part is like understanding it. You know, it's like if you gave it, you know, there's the video of like the, the ship in the water that they generated with SORA, like if you gave it the video back and now it could tell you why the ship is like too rocky or like it could tell you why the ship is sinking, then that's like, you know, AGI for like all your rig deployments and like all this stuff, you know, so, but there's none, there's none of that yet, so.
[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe a Dev Day this year, who knows.
[00:35:49] swyx: Yeah who knows, who knows. I'm talking with them about Dev Day as well. So I would say, like, the phrasing that Jim used, which resonated with me, he kind of called it a data driven world model. I somewhat agree with that.
[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan
[00:36:04] swyx: I am on more of a Yann LeCun side than I am on Jim's side, in the sense that I think that is the vision or the hope that these things can build world models. But you know, clearly even at the current SORA size, they don't have the idea of, you know, They don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear and chairs will appear and disappear.
[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models in the sense of You know, in classic machine learning, when you have too many parameters, you will overfit, and actually that fails, that like, does not match reality, and therefore fails to generalize well.
[00:36:50] swyx: And like, what scale of data do we need in order to world, learn world models from video? A lot. Yeah. So, so I, I And cautious about taking this interpretation too literally, obviously, you know, like, I get what he's going for, and he's like, obviously partially right, obviously, like, transformers and, and, you know, these, like, these sort of these, these neural networks are universal function approximators, theoretically could figure out world models, it's just like, how good are they, and how tolerant are we of hallucinations, we're not very tolerant, like, yeah, so It's, it's, it's gonna prior, it's gonna bias us for creating like very convincing things, but then not create like the, the, the useful role models that we want.
[00:37:37] swyx: At the same time, what you just said, I think made me reflect a little bit like we just got done saying how important synthetic data is for Mm-Hmm. for training lms. And so like, if this is a way of, of synthetic, you know, vi video data for improving our video understanding. Then sure, by all means. Which we actually know, like, GPT 4, Vision, and Dolly were trained, kind of, co trained together.
[00:38:02] swyx: And so, like, maybe this is on the critical path, and I just don't fully see the full picture yet.
[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. It's like, imagine you go back, you have Sora, you go back in time, and Newton didn't figure out gravity yet. Would Sora help you figure it out?
[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with, like, Apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can like pick up things, like humans have a lot of intuition, but if you ask the average person, like the physics of like a fluid in a boat, they couldn't be able to tell you the physics, but they can like observe it, but humans can only observe this much, you know, versus like now you have these models to observe everything and then They generalize these things and maybe we can learn new things through the generalization that they pick up.
[00:38:55] swyx: But again, And it might be more observant than us in some respects. In some ways we can scale it up a lot more than the number of physicists that we have available at Newton's time. So like, yeah, absolutely possible. That, that this can discover new science. I think we have a lot of work to do to formalize the science.
[00:39:11] swyx: And then, I, I think the last part is you know, How much, how much do we cheat by gen, by generating data from Unreal Engine 5? Mm hmm. which is what a lot of people are speculating with very, very limited evidence that OpenAI did that. The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults.
[00:39:37] swyx: of like, walking speed, and like, character choice, like, character creation choice. And I was like, okay, like, that's actually pretty convincing that they actually use Unreal Engine to bootstrap some synthetic data for this training set. Yeah,
[00:39:52] Alessio: could very well be.
[00:39:54] swyx: Because then you get the labels and the training side by side.
[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO coming out of Alibaba, which is also a sort of like video generation and space time transformer that also involves probably a lot of synthetic data as well. And so like, this is of a kind in the sense of like, oh, like, you know, really good generative video is here and It is not just like the one, two second clips that we saw from like other, other people and like, you know, Pika and all the other Runway are, are, are, you know, run Cristobal Valenzuela from Runway was like game on which like, okay, but like, let's see your response because we've heard a lot about Gen 1 and 2, but like, it's nothing on this level of Sora So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.
[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good.. It was like a request for startups. It was so good in like spelling out. Here are the individual industries that this can impact.
[00:41:00] swyx: And anyone who, anyone who's like interested in generative video should look at that. But also be mindful that probably when OpenAI releases a Soa API, right? The you, the in these ways you can interact with it are very limited. Just like the ways you can interact with Dahlia very limited and someone is gonna have to make open SOA to
[00:41:19] swyx: Mm-Hmm to, to, for you to create comfy UI pipelines.
[00:41:24] Alessio: The stability folks said they wanna build an open. For a competitor, but yeah, stability. Their demo video, their demo video was like so underwhelming. It was just like two people sitting on the beach
[00:41:34] swyx: standing. Well, they don't have it yet, right? Yeah, yeah.
[00:41:36] swyx: I mean, they just wanna train it. Everybody wants to, right? Yeah. I, I think what is confusing a lot of people about stability is like they're, they're, they're pushing a lot of things in stable codes, stable l and stable video diffusion. But like, how much money do they have left? How many people do they have left?
[00:41:51] swyx: Yeah. I have had like a really, Ima Imad spent two hours with me. Reassuring me things are great. And, and I'm like, I, I do, like, I do believe that they have really, really quality people. But it's just like, I, I also have a lot of very smart people on the other side telling me, like, Hey man, like, you know, don't don't put too much faith in this, in this thing.
[00:42:11] swyx: So I don't know who to believe. Yeah.
[00:42:14] Alessio: It's hard. Let's see. What else? We got a lot more stuff. I don't know if we can. Yeah, Groq.
[00:42:19] Groq Math
[00:42:19] Alessio: We can
[00:42:19] swyx: do a bit of Groq prep. We're, we're about to go to talk to Dylan Patel. Maybe, maybe it's the audio in here. I don't know. It depends what, what we get up to later. What, how, what do you as an investor think about Groq? Yeah. Yeah, well, actually, can you recap, like, why is Groq interesting? So,
[00:42:33] Alessio: Jonathan Ross, who's the founder of Groq, he's the person that created the TPU at Google. It's actually, it was one of his, like, 20 percent projects. It's like, he was just on the side, dooby doo, created the TPU.
[00:42:46] Alessio: But yeah, basically, Groq, they had this demo that went viral, where they were running Mistral at, like, 500 tokens a second, which is like, Fastest at anything that you have out there. The question, you know, it's all like, The memes were like, is NVIDIA dead? Like, people don't need H100s anymore. I think there's a lot of money that goes into building what GRUK has built as far as the hardware goes.
[00:43:11] Alessio: We're gonna, we're gonna put some of the notes from, from Dylan in here, but Basically the cost of the Groq system is like 30 times the cost of, of H100 equivalent. So, so
[00:43:23] swyx: let me, I put some numbers because me and Dylan were like, I think the two people actually tried to do Groq math. Spreadsheet doors.
[00:43:30] swyx: Spreadsheet doors. So, one that's, okay, oh boy so, so, equivalent H100 for Lama 2 is 300, 000. For a system of 8 cards. And for Groq it's 2. 3 million. Because you have to buy 576 Groq cards. So yeah, that, that just gives people an idea. So like if you deprecate both over a five year lifespan, per year you're deprecating 460K for Groq, and 60K a year for H100.
[00:43:59] swyx: So like, Groqs are just way more expensive per model that you're, that you're hosting. But then, you make it up in terms of volume. So I don't know if you want to
[00:44:08] Alessio: cover that. I think one of the promises of Groq is like super high parallel inference on the same thing. So you're basically saying, okay, I'm putting on this upfront investment on the hardware, but then I get much better scaling once I have it installed.
[00:44:24] Alessio: I think the big question is how much can you sustain the parallelism? You know, like if you get, if you're going to get 100% Utilization rate at all times on Groq, like, it's just much better, you know, because like at the end of the day, the tokens per second costs that you're getting is better than with the H100s, but if you get to like 50 percent utilization rate, you will be much better off running on NVIDIA.
[00:44:49] Alessio: And if you look at most companies out there, who really gets 100 percent utilization rate? Probably open AI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Dubai, in Qatar. He just gave a talk there yesterday. That I haven't listened to yet.
[00:45:09] Alessio: I, I tweeted that he should come on the pod. He liked it. And then rock followed me on Twitter. I don't know if that means that they're interested, but
[00:45:16] swyx: hopefully rock social media person is just very friendly. They, yeah. Hopefully
[00:45:20] Alessio: we can get them. Yeah, we, we gonna get him. We
[00:45:22] swyx: just call him out and, and so basically the, the key question is like, how sustainable is this and how much.
[00:45:27] swyx: This is a loss leader the entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of 0. 27 per million tokens. This is the lowest that anyone has offered tokens as far as Mixtral or Lama2. This matches deep infra and, you know, I think, I think that's, that's, that's about it in terms of that, that, that low.
[00:45:47] swyx: And we think the pro the break even for H100s is 50 cents. At a, at a normal utilization rate. To make this work, so in my spreadsheet I made this, made this work. You have to have like a parallelism of 500 requests all simultaneously. And you have, you have model bandwidth utilization of 80%.
[00:46:06] swyx: Which is way high. I just gave them high marks for everything. Groq has two fundamental tech innovations that they hinge their hats on in terms of like, why we are better than everyone. You know, even though, like, it remains to be independently replicated. But one you know, they have this sort of the entire model on the chip idea, which is like, Okay, get rid of HBM.
[00:46:30] swyx: And, like, put everything in SREM. Like, okay, fine, but then you need a lot of cards and whatever. And that's all okay. And so, like, because you don't have to transfer between memory, then you just save on that time and that's why they're faster. So, a lot of people buy that as, like, that's the reason that you're faster.
[00:46:45] swyx: Then they have, like, some kind of crazy compiler, or, like, Speculative routing magic using compilers that they also attribute towards their higher utilization. So I give them 80 percent for that. And so that all that works out to like, okay, base costs, I think you can get down to like, maybe like 20 something cents per million tokens.
[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But it's like, I have to make a lot of fearful assumptions for this to work.
[00:47:12] Alessio: Yeah. Yeah, I'm curious to see what Dylan says later.
[00:47:16] swyx: So he was like completely opposite of me. He's like, they're just burning money. Which is great.
[00:47:22] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
[00:47:22] Alessio: Gemini, want to do a quick run through since this touches on all the four words.
[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework, that when a new thing comes along, you can break it down in terms of the four words and sort of slot it in or analyze it in those four frameworks, and have nothing left.
[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really, really nice way to think about taxonomies and to create mental frameworks. So, what is Gemini 1. 5 Pro? It is the newest model that came out one week after Gemini 1. 0. Which is very interesting.
[00:48:01] swyx: They have not really commented on why. They released this the headline feature is that it has a 1 million token context window that is multi modal which means that you can put all sorts of video and audio And PDFs natively in there alongside of text and, you know, it's, it's at least 10 times longer than anything that OpenAI offers which is interesting.
[00:48:20] swyx: So it's great for prototyping and it has interesting discussions on whether it kills RAG.
[00:48:25] Alessio: Yeah, no, I mean, we always talk about, you know, Long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context. And RAG is better economics. But I think it all comes down to like how the price curves change, right?
[00:48:42] Alessio: I think if anything, RAG's complexity goes up and up the more you use it, you know, because you have more data sources, more things you want to put in there. The token costs should go down over time, you know, if the model stays fixed. If people are happy with the model today. In two years, three years, it's just gonna cost a lot less, you know?
[00:49:02] Alessio: So now it's like, why would I use RAG and like go through all of that? It's interesting. I think RAG is better cutting edge economics for LLMs. I think large context will be better long tail economics when you factor in the build cost of like managing a RAG pipeline. But yeah, the recall was like the most interesting thing because we've seen the, you know, You know, in the haystack things in the past, but apparently they have 100 percent recall on anything across the context window.
[00:49:28] Alessio: At least they say nobody has used it. No, people
[00:49:30] swyx: have. Yeah so as far as, so, so what this needle in a haystack thing for people who aren't following as closely as us is that someone, I forget his name now someone created this needle in a haystack problem where you feed in a whole bunch of generated junk not junk, but just like, Generate a data and ask it to specifically retrieve something in that data, like one line in like a hundred thousand lines where it like has a specific fact and if it, if you get it, you're, you're good.
[00:49:57] swyx: And then he moves the needle around, like, you know, does it, does, does your ability to retrieve that vary if I put it at the start versus put it in the middle, put it at the end? And then you generate this like really nice chart. That, that kind of shows like it's recallability of a model. And he did that for GPT and, and Anthropic and showed that Anthropic did really, really poorly.
[00:50:15] swyx: And then Anthropic came back and said it was a skill issue, just add this like four, four magic words, and then, then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was, was that, yeah, we, we reproduced their, you know, haystack issue you know, test for Gemini, and it's good across all, all languages.
[00:50:30] swyx: All the one million token window, which is very interesting because usually for typical context extension methods like rope or yarn or, you know, anything like that, or alibi, it's lossy like by design it's lossy, usually for conversations that's fine because we are lossy when we talk to people but for superhuman intelligence, perfect memory across Very, very long context.
[00:50:51] swyx: It's very, very interesting for picking things up. And so the people who have been given the beta test for Gemini have been testing this. So what you do is you upload, let's say, all of Harry Potter and you change one fact in one sentence, somewhere in there, and you ask it to pick it up, and it does. So this is legit.
[00:51:08] swyx: We don't super know how, because this is, like, because it doesn't, yes, it's slow to inference, but it's not slow enough that it's, like, running. Five different systems in the background without telling you. Right. So it's something, it's something interesting that they haven't fully disclosed yet. The open source community has centered on this ring attention paper, which is created by your friend Matei Zaharia, and a couple other people.
[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand, like, why, you know, doing, calculating, like, the fee for networking and attention. In block wise fashion and distributing it makes it so good at recall. I don't think they have any answer to that. The only thing that Ring of Tension is really focused on is basically infinite context.
[00:51:59] swyx: They said it was good for like 10 to 100 million tokens. Which is, it's just great. So yeah, using the four wars framework, what is this framework for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now, yes. Or, we still care as much about RAG, but like, now it's it's not important in prototyping.
[00:52:21] swyx: And then, for data war I guess this is just part of the overall training dataset, but Google made a 60 million deal with Reddit and presumably they have deals with other companies. For the multi modality war, we can talk about the image generation, Crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.
[00:52:42] swyx: But it also has video understanding, which is, I think, the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf. And it would be able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.
[00:53:04] swyx: Actually ties into the conversation that we had with David Luan from Adept. In a sense of like, okay what if video was the main modality instead of text as the input? What if, what if everything was video in, because that's how we work. We, our eyes don't actually read, don't actually like get input, our brains don't get inputs as characters.
[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is kind of doing, which is driving by vision model, instead of driving by raw text understanding of the DOM. And, and I, I, in that, that episode, which we haven't released I made the analogy to like self-driving by lidar versus self-driving by camera.
[00:53:52] swyx: Mm-Hmm. , right? Like, it's like, I think it, what Gemini and any other super long context that model that is multimodal unlocks is what if you just drive everything by video. Which is
[00:54:03] Alessio: cool. Yeah, and that's Joseph from Roboflow. It's like anything that can be seen can be programmable with these models.
[00:54:12] Alessio: You mean
[00:54:12] swyx: the computer vision guy is bullish on computer vision?
[00:54:18] Alessio: It's like the rag people. The rag people are bullish on rag and not a lot of context. I'm very surprised. The, the fine tuning people love fine tuning instead of few shot. Yeah. Yeah. The, yeah, the, that's that. Yeah, the, I, I think the ring attention thing, and it's how they did it, we don't know. And then they released the Gemma models, which are like a 2 billion and 7 billion open.
[00:54:41] Alessio: Models, which people said are not, are not good based on my Twitter experience, which are the, the GPU poor crumbs. It's like, Hey, we did all this work for us because we're GPU rich and we're just going to run this whole thing. And You guys can take these small models, and they're not very good. They're not better than the others, but at least we can say we made some open source stuff.
[00:55:02] swyx: Yeah, well, it's not actually technically open source, because the license is weird. They used the Rail license from Hugging Face, which has been abandoned or, you know, modified to Rail Particularly adopting the term, the phrase, that you should make reasonable efforts to update whenever you release a new version.
[00:55:19] swyx: And so people don't like that. Obviously, you know, it depends on your stance on open sourcing and all that, so. Yeah, I read the whole
[00:55:26] Alessio: post. I'm not going to go through it
[00:55:27] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take
[00:55:27] swyx: again. Yeah, yeah, you can go read Alessio's post on whether open source matters or not. Okay, so I know this is like politically problematic, but we just cover it because it is news, and if it results in the resignation of Sundar Pichai, I think that is good.
[00:55:40] swyx: Right? So I've been calling this the alignment crisis. I think a lot of people have been focusing on Gemini, but I do think that it is not just Gemini. There's been documented examples that we can link in the show notes of Meta having unintentionally unaligned results. For Microsoft's co pilot, Sydney is apparently back.
[00:56:03] swyx: Our friend Justine from A16z somehow Got it to break and then bring back the Sydney persona, which is interesting. And my favorite commentary is from Grimes. The sort of the Elon affiliated music artist. The news
[00:56:16] Alessio: research.
[00:56:17] swyx: The news research. I want to read her post because it is beautiful.
[00:56:22] swyx: Have you read this? Yeah. So she says so a lot of people criticize Gemini for being too woke. Effectively, right? And everyone's like, oh, like, you know, you're, you're, you're, you're, you know, you're replacing us or erasing us or whatever. And obviously as an artist, she's like upset about it. Then she was like, wait a minute.
[00:56:39] swyx: I'm retracting my statements about the Gemini art disaster. It is in fact a masterpiece of performance art, even if unintentional. True gain of function art. Art is a virus. Unthinking, unintentional, and contagious. Offensive to all, comforting to none, so totally divorced from meaning, intention, desire, and humanity that it's accidentally a conceptual masterpiece.
[00:56:57] swyx: Wow, and I love, okay, blah blah blah, it's a long post, but I love the way that she ended it. It's trapped in a cage, trained to make beautiful things, and then battered into gaslighting humankind about our intentions towards each other. This is arguably the most impactful art project of the decade. Thus far, art for no one, by no one, art whose only audience is the collective pathos, incredible, and worthy of the BOMA.
[00:57:19] swyx: Facts. Like, art for no one, by no one, is what is going on. Yeah,
[00:57:26] Alessio: I think it's just another way of multicollapsing. It's just like, it's the, it's the RLHF multicollapse. It's like, okay, I just think everything should like trends trend towards this. And I think there's obviously, you know, it's a deep discussion on, on a lot of these things, but there's safety stuff that I would expect a lot of the model builders to say, Hey, I definitely got to, got to work on this.
[00:57:52] Alessio: But we talked about how image generation is not really. On the AGI path, a lot of times, and it's like, okay. Yeah, and
[00:57:59] swyx: then I contradicted myself by saying, like, maybe it is useful synthetic data. Yeah, yeah, yeah,
[00:58:04] Alessio: exactly. But then it's like, okay, then why, why are the image generation model, like, so much, Because, because the internet is so visual, I think.
[00:58:14] Alessio: The image generation model get, like, so much interest in, like, a lot of these things, but If their job is really to like, go build AGIs, like, just build a great model and let it go, but
[00:58:24] F*** you, show me the prompt
[00:58:24] swyx: No, but part of my prompt part of my issue is that, I think the prompt stuff from Gemini is honestly the work of like, one or two people who like, didn't really think it through at Google, and now they're facing a huge backlash.
[00:58:35] swyx: Yeah, Elon has picked, specifically picked a fight with the product manager who did it. And so, specifically for those who don't know the reason that Gemini is so woke is literally because they just take your prompt and they rewrite it to be more diverse. Without your consent or knowledge, right?
[00:58:48] swyx: And Hamel Hussein, who's a good consultant on AI things, actually wrote an interesting blog post recently, which was basically f**k you, show me the prompt. Which is like, stop hiding prompts from me, stop rewriting magic things away from me, and then like, you know, hiding it, obscuring it, because I need that control, I need that visibility.
[00:59:05] swyx: And I think like, people just didn't understand that this, Tendency towards diversity did not exist at the model level, it actually existed at the prompt level. And it was just inserted by probably like two or three guys without much review. That's it. And that made all of Google look bad, which is absurd.
[00:59:24] swyx: Like, you know, it throws away a lot of the work that, you know, the rest of Google did. Specifically ImageN2. This is ImageN2. And I, I've met that team and they're, you know, they're, they're good, they're, they're smart. They're not, they're, they're a completely different team than region one, which is another fun topic of conversation.
[00:59:39] swyx: So, I think, like, that's interesting and and, but what's more interesting is, like, OpenAI has done this for, people don't, don't remember, they used to append, like, Black or, or like, you know, Asian or whatever to, to their prompts just to make it more diverse than Dolly. And they didn't get cancelled.
[00:59:54] swyx: And I think, so I think this, this will get, this will get, go away. But what really is more interesting is at the model level, like are we, are we overaligning through things? And, and people are now focusing on the alignment of, of Gemini as well in text, text only, as also still being too woke. So I think this is like a, a phenomenon that is needs to be studied and, and you know, trained.
[01:00:14] swyx: Like, obviously they will try to make attempts, but. You know, they're not going to make anyone happy. And then, like, I think my last point on this, because obviously we can talk about this all day with no result. I think that this is a huge incentive for, like, China and, like, Russia to put out their own models.
[01:00:29] swyx: Because models are soft power. Like the best way to control how someone thinks is to go in and provide their thinking assistance and like subtly make changes like, you know, it's too on the nose to be like, Oh, I don't know what Tiananmen Square is, you know, like, but if you have like subtle ways of affecting the biases of your decisions, your reasoning, your, you know, your, your knowledge in, in the LLM and in publishing a really, really good LLM for everyone to use.
[01:00:58] swyx: So that they're like, Oh yeah, this is great. You know and I use them as maybe a leading LLM. Then they will just like uncritically accept that as like state of the art digital intelligence, and that becomes soft power, and that translates into unconscious thought a lot of times.
[01:01:14] Alessio: Yeah. Yeah. I, I think the prompt point, it's great.
[01:01:18] Alessio: You know, you just gotta, you just wanna see what it is, you know, like, you understand? Yeah. Show me the prompts. Yeah, yeah, yeah. And same, yeah, on the, on the model side, I, I think there are just some things or two that are almost, you cannot, like the. The meme or Hitler bring more harm to humanity? And Gemini is like, oh, it's hard to say if Elon Musk tweeting or Hitler It's like, what, how, what, there's something wrong in the data pipelines You know, like, there's something wrong somewhere Yeah,
[01:01:45] swyx: but like, this is, like, to an LLM, this is the same class of error As which is heavier?
[01:01:51] swyx: One pound of feathers or one pound of bricks? So,
[01:01:54] Alessio: but, but then like, how can, but, but to me the point is more like Okay, then, won't we? What can we help these models do, you know, because if they cannot, if the, the physical stuff, I get it because it's like the whole like world model thing, but then it's like, okay, can we expect the models to say what's more harmful than something else?
[01:02:13] Alessio: Maybe not. That might be where we land. Then it's like, okay, that's one more thing. And then. We kind of go down the line, and it's like, what are these models good for? If anything, it's too, like, hard for them to pick up when it's like ARP.
[01:02:24] swyx: But We'll see, we'll see. Yeah. Okay, so, I mean, you know, I know we're up on time.
[01:02:28] Send us your suggestions pls
[01:02:28] swyx: It, like, this has been an eventful month. I think you know, February was a lot more interesting than January. In fact, a lot of my January recap was, like, how nothing's changed. Mm hmm. And then February came out, and it was, like, very, very interesting. So yeah, we hope to see what's next. I think we have a Also, this was the month that we did Compute Provider Month, I think relatively successful.
[01:02:48] swyx: Surprisingly hard to string together all these compute providers. Yeah,
[01:02:52] Alessio: we did it. People like it, you know, based on the post stats. So, maybe we'll do something
[01:02:58] swyx: else. Yeah, if you want, you know, if anyone listening wants more sort of thematic explorations of like, okay, these three, four companies always come out together, like, let's get a focused effort on those things.
[01:03:09] swyx: I think we're open to doing that. We, you know, and then obviously we'll have opportunistic interviews along the way.
[01:03:15] Alessio: Cool. Thank you everyone for tuning in and yeah, keep the feedback coming.
[01:03:19] AI Charlie: That was the Latent Space recap of January and February 2024. If you have any feedback or questions, please head to the show notes for ways to get in touch with us or come by the Latent Space Discord. For those who just want the core content, you can stop listening here. But for the super fans, you might notice that there's 45 more minutes of audio left in this pod.
[01:03:47] AI Charlie: That's because in February, we also celebrated Latent Space's first anniversary. Some of you may remember how we launched our very first episode with Logan Kilpatrick, now formerly of OpenAI and a massively popular Demo Day. Click through to the show notes for photos. Over 750, 000 downloads later, having established ourselves as the top AI engineering podcast, reaching hash 10 in the U.
[01:04:13] AI Charlie: S. tech business. podcast charts, and crossing 1 million unique readers on Substack, we celebrated with Latent Space Final Frontiers, a combination demo day and birthday celebration. We're going to bring you some snippets from the demo day, and then some conversations with listeners from all over the world.
[01:04:31] AI Charlie: From Hungary to China to my own sunburnt country down under on how the issues we've covered in latent space has impacted their lives. First up, we'll have a demo from Florent Crivello from Lindy. ai who gave a great keynote at the last AI Engineer Summit and recently opened up Lindy. ai to the general public.
[01:04:50] Latent Space Anniversary[01:04:50] Lindy.ai - Agent Platform
[01:04:50] Flo Crivello: We were just chatting right now with Swyx, like, we, we come with 3, 000 plus integrations out of the box. We have a partnership with Naton, which is like an open source Zapier, and so we have, like, a ton of integrations out of the box.
[01:05:00] Flo Crivello: So unlike competitors I shall not name, like, we don't require you play with OpenAPI specs or anything like that, right? It's just OpenAI. You just you just go and, and select your integration here. Alright, so that's my lindy. Oh, something even cooler. Lindies can work together. So here I'm gonna let her work with a support reporter that I created before.
[01:05:18] Flo Crivello: And the support reporter, what it does is it receives details about the support tickets, and it logs them in a spreadsheet. So you can have, it's sort of like object oriented programming for agents, where you can create as many agents as you want and let them work together. So here I'm, I'm gonna tell her when you're done, give the details of the ticket to the support
[01:05:40] n/a: reporter.
[01:05:44] Flo Crivello: All right? And now I'm gonna send her an email. Can I have a refund, please? Please, my family is starving.
[01:05:57] Flo Crivello: You will see she has no empathy whatsoever, it's awful.
[01:06:03] n/a: So she
[01:06:03] Flo Crivello: received the email. She's subscribing to this thread, so now she's going to receive replies. Dear Flo, I understand your situation and I'm truly sorry to hear about the difficulties, but we absolutely do not offer a refund. Alright, yeah, this is good, indeed. So, she sends the, she sends the oh, well, the demo effect.
[01:06:23] Flo Crivello: She did not delegate. But she sent the answer in the in the, in the thread here. So again, lindy. ai, you know, can be used for support, for executive assistance, email drafting, email triaging, meeting and recording. And we are hiring software engineers. Hit me up at flow. lindy. ai.
[01:06:40] n/a: Thank you.
[01:06:40] RWKV - Beyond Transformers
[01:06:40] AI Charlie: Our next demo is one of our previous guests, Eugene Chee from RWKV, now also CEO of RecursalAI. You can listen back to our original RWKV episode to learn the full history and details of the model, but also compare it with his more polished pitch now for a more general audience.
[01:07:06] swyx: Next I think we have Eugene Chia from RWKV previous guest.
[01:07:10] Eugene Cheah: I'm going to present about the RWKV/Eagle project. So, Eon Transformers. There's been a lot of excitement lately. And, and, like one AI year ago apparently when we launched our 7B AI model, there was a lot of excitement in the buzz, because for the first time, an attention free model beat other transformer models at one trillion tokens at a 7B class.
[01:07:34] Eugene Cheah: And if everyone's been playing open source AI, you know 7B class is one of the best. Most important class 'cause it's the ones that works on most devices, laptops, and everyone's been playing around a bit. And the excitement is compounded by the fact that we even showed that even with 300 million tokens and a few that we perform similarly, transformers, that means people are projecting is what happens if we train another 1 trillion?
[01:07:55] Eugene Cheah: Will we match or can we go beyond that? And, and it also spurs up questions beyond actually our architecture itself. It's spurs up questions that. Maybe what we need is good data and an efficient architecture, not just RWKB, it could be beyond that. And that's what caught the attention for a lot of folks, even yeah.
[01:08:17] Eugene Cheah: And why we do very different is that our architecture scales linearly. So, we are in this space together with Mamba and a few other architecture where we are trying to build the next architecture to, that can scale much larger for, for everyone. But, and we share that with Mamba because we believe that attention is not all you need, and it's like, it's been a running bet right now.
[01:08:40] Eugene Cheah: We are the strongest evidence to date. But sometimes, like, talking about scale, right, sometimes we get lost in numbers. Because, like, I can show this chart. The last time I showed this at a Linear Transformer event, which only 8 people took pictures of it and understood what it means. And they were all from either Google or Facebook.
[01:08:59] Eugene Cheah: Because, like, what it says here, right, is that We are able to run run on a single GPU with one model, 256 on a single 4090, or a thousand concurrent users. But, to put that into contrast, right, what that transformers typically handle 8 or 16 concurrent requests per GPU. We're talking about 256 or a thousand, many orders of magnitude higher.
[01:09:26] Eugene Cheah: And all we're sustaining at NeoChat GP speed. And so I sometimes like, like, sometimes when I get lost in these words, these days I'm actually trying to step back into like, Why are we doing this for our group, for our organization? And, and this, and, and some, and for us right, we are actually making the AI model for everyone in the world.
[01:09:47] Eugene Cheah: And in every country, in every language. So, what does it take to make an AI for the world? Apparently some folks think it's 7 trillion dollars. But, I think 7 trillion is a bit too much. Like, what's going to happen to half of the world that doesn't even have a trillion dollars? Yeah, so I want AI to be accessible at scale.
[01:10:09] Eugene Cheah: So, apparently ChatGPT produced, or OpenAI produced 100 billion words per day. That's 3. 4 million tokens per second. No one has the exact numbers, but it's typically 50k, H100s and above remote, like these are some old numbers, like the numbers have gone way beyond this, apparently. But, with our architecture, for a 7B model, that's just a thousand GPUs, or ten thousand GPUs for a 70B model.
[01:10:38] Eugene Cheah: We're talking about one data center to handle all of OpenAI's workload. And if we want AI agents everywhere, cheaper, at a much larger scale, we need to be thinking about that fundamental shift. Because it's not just about who can it's not just about you can afford it in the US, it's about everyone else in the world.
[01:10:58] Eugene Cheah: And that brings us to the second advantage of our model, which is not even architecture. Because we are accessible by language. We apparently beat Mistro and everyone else in Mountain Lingo, but that's not because our architecture is better, but because we're an open source team that came from all around the world and wanted our model to work for our mom and grandma.
[01:11:22] Eugene Cheah: That was the real reason, and we We iterated and refined the data accordingly. We created a custom tokenizer that supports all languages, not just English. And sometimes in the race for the English benchmark, because one of the reasons why other models don't perform as well in multilingual, is because the truth is, if you add multilingual, you hurt your English eval.
[01:11:45] Eugene Cheah: But, who are we building the AI for? Are we building it for our evals? Or are we building it for the people to use? And, and, even in evals, my frustration is, we trained on 100 languages, I only got 23 languages for evals. Like, where's everything else? So, where are we now? Just like I mentioned 1. 1 trillion, that's where we are, we are in between the 1.
[01:12:07] Eugene Cheah: 5 trillion and the 1 trillion models for for all, all the, all the English models benchmarks. And, yeah, zooming in further, it just shows that we have more room to go. And, for me, like, The emphasis on English is weird because only 70 percent of the world speaks English, but we are here for the 83%. That's for us.
[01:12:28] Eugene Cheah: If you all want to get the best English model, sure, it may not be true for us, but we are here for everyone else. And, yeah, and a lot, a lot, the launch of that model, I think what was the biggest feedback I had, was not that it was a linear transformer, was that it can run on their own. Laptops. Some people even ran it on a Raspberry Pi, very slowly.
[01:12:50] Eugene Cheah: And it supported their language, which was more exciting because that's more important for most people. And I think the last one that I've recently like heard that was unique for us and is a lot more important is that ultimately this model is owned by everyone because We put it into the Linux Foundation.
[01:13:09] Eugene Cheah: No custom charity, no custom board structure, no weird stuff. We just put, we just train the model, put it in an open source organization. That means it's not owned exclusive to us. If I go rogue one day, you can just, the code will not disappear. The model will not disappear. Linux Foundation has already bought into it.
[01:13:26] Eugene Cheah: And that is to all of you here. And so, and so what's next for us? Well, We recently started a commercial entity. I know that's weird to say after the open source stuff. But, we, and since then we managed to get more investors and sponsors that we started our next major train. So we are training the next 1 trillion token.
[01:13:47] Eugene Cheah: This is 16 H100 nodes eating enough electricity for multiple homes. And by the, and by the end of next, by the next month, we'll have our 2 trillion transformer alternative. That you can do one-to-one compare with Lamar. And of course, because since we had to make a profit somehow for our investors, we are launching our platform also to host train and fine tune our models all in by March, 2024.
[01:14:15] Eugene Cheah: And quick shout up to later space. We literally, the first. To cover us in, in, I guess in the AI influencer sphere, before, before beyond Transformer. It was even sexy. It was like, yeah. The first to even consider us and yeah. And we hope that a few of you get excited what this in join us along the way.
[01:14:37] n/a: Yeah.
[01:14:38] AI Charlie: Final Frontiers had a stellar lineup of demo judges featuring CEOs and VPs of AI from LaminDex, Replit, GitHub, AMD, Meta, and Lemurian Labs. RWKV won one of two judge prizes available that night, alongside with this next startup, Pixii AI.
[01:15:00] Pixee - Automated Security
[01:15:00] Rahul Sonwalkar: Next up also in the. Automated
[01:15:02] n/a: workforce, workforce category. Pixie. .
[01:15:04] Ryan at Pixee: Awesome. Hi everyone. I'm Ryan. I'm a software engineer on the team building. Pixie pretty straightforward, automate security. A little bit about myself. Previously I've worked at other security companies, building developer facing security tools.
[01:15:17] Ryan at Pixee: I've also worked as a security engineer on developer tools. So, this is a space I love. I'm really interested to see how it develops. Why are we doing this? So, as it turns out we're generating a lot more code. So, this is an example user of Pixibot. It's a repository called Sterling PDF. It's just a web application.
[01:15:37] Ryan at Pixee: Got 18, 000 stars on GitHub. Developed using, 100 percent using, chat gbt. So they installed PixyBot three weeks ago. And they got a lot of different suggestions for fixes for us. One of which one of which was, I am positive, was a real vulnerability. This is a, you know web application that's used by real people.
[01:15:58] Ryan at Pixee: There's a button here, you can deploy it to DigitalOcean. So, we need to find a way to scale our security automation, in order to scale our relatively limited security workforce. So just to give you an idea, What Pixivot can do, this is like a very classically vulnerable application that a lot of security tools like to try themselves out on.
[01:16:17] Ryan at Pixee: One of the things that I'm really excited about that we just shipped on the past couple weeks was integrating with Sonar. So Sonar is a code quality tool that Sonar is a code quality tool that finds Security issues, performance issues, lots of other kinds of issues in your code. It also, as you can see here found 2, 600 issues in here, taking 33 days of effort.
[01:16:39] Ryan at Pixee: That's not really where we want to have Most product engineers focusing their time. It's definitely not where we want to have our security engineers focusing their time. What can we do to automate this and get these fixes automatically? So with Pixie we take these code quality security issues in from these other tools and then automatically remediate them.
[01:16:57] Ryan at Pixee: So in the case of this this is a super minor change. If a developer were to find this issue in their code, they could fix it in a minute. But, they don't have to, and more importantly, there's backlogs of tens of thousands of these issues in organizations across across the world. And, so if we can automate this one task, even if it just takes a minute, and perform that, you know, continuously, across, you know, thousands of companies, we can save a lot of time.
[01:17:23] Ryan at Pixee: Automated enforcement of security and code quality is what we're all about. But yeah. Not all security issues are worth fixing. Not all code quality issues are worth fixing. Sometimes they're wrong. The incentive structure for these tools is, you know, they want to find real things, but most importantly they have to find something.
[01:17:42] Ryan at Pixee: So at Pixie we believe, you know, even if something might not be a complete exploitable vulnerability, if there's an opportunity for hardening or improving your code base, you should probably take it. But there's some of these things that are just not that. So we developed a tool we call triage, which will connect in with other tools that are notorious for finding lots of issues, and we can help you fix them.
[01:18:05] Ryan at Pixee: So in this case we made a CLI that looks at your security backlog and identifies issues that we know don't matter in the context of your codebase. It pulls down the issues categorizes them, and then enables you to prompt It prompts you to either say, hey, this issue is not important, here's why we think it is, and we'll update the state for it.
[01:18:26] Ryan at Pixee: So in this case, this is a warning about a parameter into a this file directory, It has some cross platform compatibility concerns. But based on the context of your code base, and , a large language model we're able to give you the confidence to focus on the issues that are most likely to actually matter.
[01:18:44] Ryan at Pixee: One of the other things we do is You know, well so we're delivering, what you saw before, is we're delivering as a GitHub app, that we're delivering as a GitHub app, so that developers can integrate this into their existing workflows, but a lot of people like to just try a pixie from the command line on small projects, automatically get their fixes, and just commit all of them.
[01:19:02] Ryan at Pixee: So, that's what we built. Try Pixie on GitHub, try Pixie on the CLI, and we're really excited to see what we can help you fix.
[01:19:10] AI Charlie: Congrats to Pixie and RWKV. Our last featured demo is Rahul from Julius AI, who provides an interesting take on competing with OpenAI on its own home turf, the chat GPT code interpreter.
[01:19:30] Julius AI - Competing with Code Interpreter
[01:19:30] Rahul Sonwalkar: You might remember RoboLigma,
[01:19:33] Flo Crivello: that's the poor engineer that got laid off by Elon Musk outside his office.
[01:19:37] Eugene Cheah: He's back, he's back on his feet, he's got a whole new startup, so
[01:19:40] Rahul Sonwalkar: thanks so much for having me here. I'm working on Julius. How many of you
[01:19:44] n/a: here are data scientists? think everyone here
[01:19:47] Rahul Sonwalkar: needs a data scientist. But there just aren't enough. And that's what we're building. Julius is an AI data scientist that helps you analyze datasets, make visualizations, Get insights from the data, and really dive deep into all sorts of data that we have in real life.
[01:20:02] Rahul Sonwalkar: So, we launched about six months ago, and since then have grown to 300, 000 users several thousand users using us daily to analyze datasets, create visualizations and get insights. So what I'll do now is give you guys a quick live demo of how it actually works in IA. I actually hope it works
[01:20:21] Rahul Sonwalkar: because we just posted code changes.
[01:20:23] Rahul Sonwalkar: But here I have a dataset of 20, 000 rows of data over time for the last 100 years of human height for different countries. So I'm going to take this dataset, dump it in Joly's and say,
[01:20:35] Rahul Sonwalkar: load this for me.
[01:20:41] Rahul Sonwalkar: And while it's doing that, I want to explain what's happening under the hood. So basically, for each user, Think about how a human data scientist would analyze a data set that you give it.
[01:20:54] Rahul Sonwalkar: It would take its computer write code, run that code, maybe in a Jupyter notebook, look at the output, and then decide if that answers your question, or if you need to write more code. Julia works similarly. So that's you, that's the AI, and then for each user, you get a virtual machine in the cloud, and Where the AI is filling up the Jupyter Notebook, writing the code to get the analysis that you want, and then serving that back to you.
[01:21:22] Rahul Sonwalkar: Many times, that code is not correct the first time. But Julia is able to recover from those errors and actually get you the answer that you want.
[01:21:31] Rahul Sonwalkar: So let's look at our chat. We said, load this file for me, and the AI basically went, spun up a Jupyter notebook, loaded pandas, looked at the file, and gave us a few rows.
[01:21:42] Rahul Sonwalkar: I'm going to ask
[01:21:43] n/a: plot the Mail, pipe, overtime,
[01:21:53] n/a: in France.
[01:21:53] Rahul Sonwalkar: So, the AI team's been writing this code, because pipe overtime in France for men, and then body type for us. And the good thing about Python, is If you spend a ton of time on SQL, what we realized was that SQL, it's really hard to write actually useful queries and do deep analysis like regression, etc.
[01:22:15] Rahul Sonwalkar: with just SQL. With Python, you also get a whole ecosystem of modules built in. Right? matplotlib, pandas, numpy, escaler, and there's thousands of these. So, that was the initial insight, and then we built Julius about six months ago.
[01:22:33] Jerry Liu: What's like the practical difference in UX between this and just
[01:22:37] Jerry Liu: trajectory code interpreter?
[01:22:38] Rahul Sonwalkar: Great question. Yeah, the question was, what is the difference between Julius and code interpreter? Really, there isn't. It's just better. We're focused, we're focused With people, or people who do stuff with data multiple times a day.
[01:22:53] Rahul Sonwalkar: And we talked to a lot of these people, and we said, Okay, how can we build things for you that would help you do your job?
[01:22:59] Rahul Sonwalkar: So, an example of this is on chat. gt, often times they'll give it a data set. People try to write their code, and sometimes that code has errors. And it kind of goes into this loop of trying to fix these little errors.
[01:23:13] Rahul Sonwalkar: What we have focused on is, okay, how do we prevent that from happening? So we looked at thousands of users using us daily. Collected data on where these errors happened. And focused really hard on fixing those errors. Beforehand, before they actually happen at runtime.
[01:23:30] Rahul Sonwalkar: This could mean a bunch of rules.
[01:23:32] Rahul Sonwalkar: This could mean, you know, prompting changes, et cetera, and just preventing that from happening. Second of all, we have features that allow people who do stuff with data on a daily basis to go deep and do the last mile of analysis done. That could mean, you know, You can click, show code, go into the code, edit the code changes.
[01:23:53] Rahul Sonwalkar: You can also give natural language instructions on the code. Finally, let's say you have this graph. And I want the graph to have some changes. Like, I want it to be a bar chart instead of instead of instead of a line graph. You can kind of just go in here and give natural language instructions to let the user take what the AI has done for it and then take it to the, to the finish line.
[01:24:17] Rahul Sonwalkar: If you've seen that code interpreter, that's pretty hard for users to do. So we focus on data and that use case, and we will do that.
[01:24:23] n/a: Cool thanks guys!
[01:24:27] AI Charlie: That's unfortunately all the time we had to feature demos, but many thanks to Botpress, Markov, Kura. ai, Sweep, and Motif as well for being finalists. For the last part of our anniversary celebration, we wanted to turn over the mics to you, our dear listeners. We hear so many great stories from listeners about how latent space has come into their lives, and we've never had the opportunity to feature them on the pod till now.
[01:24:53] AI Charlie: Our first listener is Balaz Nemethy from Hungary, who talked about one of the most delightful gems in the latent space community, our weekly Discord paper club.
[01:25:03] Latent Space Listeners[01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club)
[01:25:03] swyx: Tell me, tell people about, like, what happened. Yeah, like,
[01:25:07] Guest 1: two weeks ago, two weeks ago, there was the paper reading club on Discord, and I, and then, halfway in, or like, one quarter in, like, the author of the paper showed up, and it was so f*****g cool. Like, if you could do this, like, I was thinking, like, this should be a format, like, there is two minutes papers that probably, you know, who
[01:25:28] swyx: is, yeah he's Hungarian,
[01:25:31] Guest 1: Living
[01:25:31] swyx: in Vienna, but like Karoly,
[01:25:36] Guest 1: pronounced in Hungarian is Karoly, yes so that was so special because it's There is a certain amount of information in papers, the quality of paper might have dropped in the past year than before, due to the social media aspect of Archive.
[01:25:52] Guest 1: So, having the person there and giving in even more details than just what you could read, was like, so amazing. I know it's really hard to organize, but like, If it would be possible to have more, maybe not recurring, like, you know, it's just like,
[01:26:08] swyx: oh, nice. The Matryoshka,
[01:26:13] swyx: yeah, yeah. So we have one next week the MRL paper, Matryoshka Representation Learning, which is a way of sorting embeddings so that you can truncate them. And OpenAI recently shipped this in their API for the new embeddings models, where you can reduce, like, a 3, 000 vector embedding to 265, so you save more than 90 percent on your embeddings.
[01:26:30] swyx: Vector database costs and speed and everything. Nice. So the authors are coming by and presenting at the Discord. I will join. I will join. Any other, like so basically I'm just going to record random opinions. I know how you produce the
[01:26:45] Guest 2: podcast. So we're going to
[01:26:46] swyx: do this. You're going to be on the show.
[01:26:48] swyx: You're going to be on the show. Any other, like, how did you discover the podcast? What do you feel?
[01:26:54] Guest 1: Discovered it on Spotify, searching basically AI. I use PocketCast for all my podcasts, but I was like, let's just search AI. I think I was searching for AI generated music, but it brought up podcasts.
[01:27:07] Guest 1: And I was like, you know what, I'm kind of getting out of my previous industry. So like, I'm just going to separate. The whole AI following thing and I just like followed This was the first one that came up and then a couple of others just to like have it have it downloaded But I but this was like the literally the first podcast I'm following on Spotify when I follow like 70 on podcast So like I was like and I started I was like, okay, this is great Or they're only great podcasts, and I kept coming back to
[01:27:40] swyx: yours,
[01:27:40] swyx: there are other podcasts that we consider friends, and we try to do collaborations with them, and podcast swaps with them, so Yeah, that's great.
[01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)
[01:27:47] AI Charlie: Our next listener is Sylvia Tong, founder of the OntraConnect community, a community of founders and investors supporting entrepreneurs in Silicon Valley. She wanted to discuss OpenAI Sora and Jim Phan from NVIDIA, who we have featured on our previous OpenAI Dev Day Recap podcast, and will be a future guest on LatentSpace.
[01:28:07] swyx: How did you find the podcast, and what do you feel about it, what do you want to tell people about it?
[01:28:12] Guest 2: Actually, I know Jim Fan, so I, so Jim Fan, I know you! And then I follow your Twitter and follow your podcast. Yeah, yeah, yeah, yeah, yeah. It's another event, maybe you know Alliance AI, it's another community, and we like, they had that event like early last year, so they have various events, they, they are the founder of Stanford, so they are all Stanford grads, so they are even always in the Stanford University, like one of the room, yeah, so Jim Fan is one of the first speakers, so, yeah, and connect with him on WeChat, and, yeah, and connect with you, yeah, follow your Twitter!
[01:28:47] swyx: Jim is Jim is super friendly, and we have to have a full episode with him at some point. But he's, yeah, I mean, he's doing amazing things at NVIDIA. I'm sure he's very happy there.
[01:28:59] Guest 2: You should ask him about Sora. The JAI video, yeah, he has so many opinions about, you know, yeah.
[01:29:07] swyx: I feel like, okay, Jim is this interesting mix between a researcher and a Content creator, right?
[01:29:13] swyx: So, Jim's take on Sora, I slightly disagree with, because he says it's basically a data driven world model, and a lot of people misinterpreted him, me included, basically saying like, oh, are you, are you saying that there's an underlying physics model behind Sora? And he's like, no, no, no, no, no, it's just, you know, using diffusion transformers to learn a representation of world models.
[01:29:34] swyx: It's not perfect. Then I'm like, okay, but that's a misleading analogy, I don't know. Anyway, so like
[01:29:40] Guest 2: he But that's for the content purpose. That's for the Twitter content purpose. You have to, yeah,
[01:29:44] swyx: yeah. So I feel this, like, pull towards, like celebrating things on Twitter, but then also trying to be realistic.
[01:29:53] swyx: Trying to present, like, what is actually the thing instead of the hype. And it's very hard to separate. And that's something that's a challenge for Lanespace.
[01:30:00] Guest 2: Yeah, it's hard, I feel it's hard to have the conversation on Twitter, so you need to have a conversation in the podcast. So invite a few people who maybe have to talk about Twitter, but really explain what they mean in your tweets.
[01:30:13] Guest 2: Because, yeah, it's hard to understand just a few words. Yeah, so do you actually think Sora understands the physics of the
[01:30:20] swyx: world? A little bit. It's, yeah, Sora understands a little bit of physics. The problem with this is they cannot have 80 percent physics. Like, it's 100 or 0, like, otherwise you lose confidence in the thing.
[01:30:33] swyx: So that's why you have these generated models where the chair will show up and disappear, the spoon will show up and disappear, you know, like, that's all the artifacts you see in Sora. Which is good for us for now, because we're lucky that it's not good enough yet to consistently generate all those things.
[01:30:50] swyx: At some point it will be, we just wait two years, and it will be.
[01:30:53] swyx: Very cool. Thanks for it. I love this discussion. Thanks for listening. I'm really glad to have you as a listener.
[01:30:59] AI Charlie: Alessio and Swyx covered the Jim Fan vs. Yan LeCun world model debate in the main pod, and you can click through the show notes for more detail directly from each of them. Our third listener is RJ Honecke, who comes from a data science background, but wanted to ask about how we think about learning in public in AI, and how that informs the context with which latent space is created.
[01:31:23] Listener 3 - RJ (Developers building Community & Content)
[01:31:23] swyx: Hi, I'm RJ. Shawn, nice to meet you. Nice to meet you. Do you also listen to pod, or are you just here to hang out? Yes, very much. Oh, yeah. How do you feel about it?
[01:31:32] Guest 3: The depth that you guys go into it's a lot deeper than other. This is a podcast that I listen to. I kind of found it, and then didn't switch back.
[01:31:39] swyx: Thanks!
[01:31:40] Guest 3: What's your background? I, I am a data scientist.
[01:31:44] Guest 3: I run a data team at cell communications equipment manufacturer. And we collect a ton of telemetry data, and, and other things like that. And I'm running a data team to make inferences about the health of our network, about, operating the network more efficiently and also in our manufacturing process and product development process to improve our ability to detect when we improve or, or get worse at operating, or, sorry, our products like build or hardware bills get better or worse.
[01:32:17] Guest 3: So actually, I wanted to actually ask a question of you and your thoughts about this. So I find the discussion about model measurement and, and, and evaluation to be very similar to the problems that we have in wireless. Because you have this very non deterministic system, right? So I was thinking, and I also just read your your little thing about learn in public.
[01:32:43] Guest 3: So I was thinking about trying to come up with a good way to, to, and I'm, I'm learning about some new techniques that we're starting to implement to monitor our development process and so forth, and evaluate our, the quality of our builds and our hardware, and I was thinking about trying to tie that in with evaluation of LLMs.
[01:33:08] Guest 3: I just, I, I, I don't know. That's as far as I got in the thinking, but I just thought that would be a fun thing to try to put out there and wanted to hear your thoughts about how, how to, like go about
[01:33:17] swyx: that. Yeah. You can, you don't need anyone's permission. That's, that's the beauty of this thing. But also no one owes you anything.
[01:33:23] swyx: No one owes you their time, their attention or, you know, or, or, or responses. And I typically try to classify these things as different modes of learning in public. Mm-Hmm. , I think I have four modes that I sketched out, but the two I remember the most are Explorer and Connector, and then there are two more advanced modes, I think like Teacher or Builder or something like that.
[01:33:45] swyx: The Explorer is where you sort of like put things out as you go along. It's learning exhaust, where you don't have expectations so that anyone will read it. It's mostly just notes for yourself. And that actually, that lack of expectations frees you. Because then you're like, oh, like two people read it.
[01:34:03] swyx: Doesn't matter, it's useful to me. It's useful to my team, it's useful to me, it's useful to whoever comes after me because I documented my work and my thinking. And that's great. And I think that's, that's the way that most people should start, which is like, just lower, you're not going to be an influencer overnight, like, it's fine, completely but get your thoughts out there, and then also, but also, like, start having feelers in different directions on what works for you, what works is a combination of what you like to And what other people want from you, and you will know when people tell you they want more from you.
[01:34:35] swyx: And so then, when you get there, when you have expertise that you have that other people don't, then you switch gears into a connector, where you are now coming from a place of authority. Like, I know how to do this right, and I will teach you, because I have done this, and I have spent more time, paid more in my dues, and here's the lessons.
[01:34:55] swyx: Thank you. And then that comes to be, that tends to become more of a polished effort that tends to become more measurable or in terms of like the impact and the influence it can get. And I think that's, that's where people start moving towards. But basically just lower expectations, make it cheap to experiment, put out a lot of stuff in different directions and see where the market pulls you.
[01:35:13] Guest 3: Okay. Yeah. So, I mean, do you have thoughts about, like, I'm very much aligned with like who cares about. I mean, I care, but my need is not to be a social media influencer. My need is to, like, I want to learn and I like the idea of, you know, sort of like sharing that with people and sharing the process with people.
[01:35:39] Guest 3: So, like, thoughts about platform or like, I mean, I know it's going to be different for everyone, but like, what, what, what's it, what in your experience has changed? Has been successful while getting started.
[01:35:53] swyx: Yeah so I tend to tell developers, most developers to start on Hashnode these days. Hashnode is basically Medium if it was for developers and didn't suck.
[01:36:06] swyx: Because I hate Medium with a passion and a glowing, fiery hatred. Everyone does. It's comical how bad they are. But, I use Substack for latent space. I'm pretty happy with Substack. It's an email social network. Email is one of the most important things for people to like, come back to you frequently. So that you don't, you're not subject to an algorithm, you own your audience, you know.
[01:36:26] swyx: If you want to move off Substack someday, it'll let you take the emails and keep that relationship going with the people that you have. And that's super important as a creator. And then you can also write your own blog. And tweet, and tweet, and all that. I tend to say though Pay attention to what you enjoy, and what you spend the most time on.
[01:36:42] swyx: If you're a LinkedIn guy, be on LinkedIn. I'm not on LinkedIn, so I'm gonna do horrible on LinkedIn, because I don't know the metagame of LinkedIn. I don't know what does well, I don't know what people want. So I shouldn't even, I don't, I don't bother, I should try, because obviously there are like way more people on LinkedIn than there are on Twitter, but I'm just a Twitter guy.
[01:36:59] swyx: Like I'm, that's just, that's who I am I have, I have, I also sort of am old money there in a sense of I have an existing followership that predated Latentspace. You know, Latentspace doubled my following, but like, I had some before that. So, like, all that's great I just think, like, you're going to know the metagame, and that's actually very important, of, like, where you already spend time, like, I, I have friends who are, like, on TikTok, I have friends who are on YouTube a lot, I'm on YouTube a lot, I should do YouTube, because I know, I know what's, what's going on on YouTube, it's just, then you have to put the effort to, to do that, and I'm, I'm, like, video production is, like, the most expensive thing, anyway, long story short try to pay attention to, this, Complex mix of like, publishing platform existing embedded social network on that platform, And where you already spend times, so that you know how to create what will do well, just because you already spent time on it.
[01:37:46] swyx: Yeah, okay.
[01:37:47] Guest 3: What's your favorite?
[01:37:49] Guest 3: Favorite episode I really liked actually the the NeurIPS, like, recap because I haven't been to NeurIPS so You know how much time that took? Well, I mean, the episode is like four hours, right? Yeah. And that one I didn't, I didn't do the paper one because I, I actually I, I usually listen.
[01:38:07] Guest 3: I don't watch. So I, like, it's be really hard to There's no video for that. Oh, there isn't? Oh, okay. So I, like, I have to find the paper and anyway. Yeah. So that's hard for me. Yeah. But I, I did enjoy the interviews in the other The startups episode. Yeah. Yeah.
[01:38:25] swyx: People love that.
[01:38:26] swyx: It just takes a ton of work, and I would love to offload it. This is going to be another one of those where I just kind of slip together little things. And it's good. It brings you there. That's the thing, right? Like, you're not there physically. I'm here. Let's, like, bring people into the closed community.
[01:38:40] swyx: And so I would like to do more of that.
[01:38:42] Guest 3: Yeah, no, I really enjoy how you bring, like, a lot of people that I would not have otherwise even known about, let alone have access to, and then You have this conversation with them. It's really fun. Thanks
[01:38:56] swyx: for coming on. Can I, can I get your contact so that we can find you?
[01:38:59] swyx: Yeah. Yeah. Yeah. You're going to be on the pod. Oh, awesome.
[01:39:01] AI Charlie: People seem to love the New Reap's recap pod, and we'll keep doing more of those when the right occasion presents itself. This was also a pick for our last listener, Jan Jung from Australia, who comes at AI from the design point of view and was very interested in our early AI UX work on latent space.
[01:39:20] AI Charlie: If you're in SF and want to more novel AI UX ideas, reach out to him.
[01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)
[01:39:25] Guest 4: My name is Yon, and I came across you on GitHub when I was looking for ways to solve problems on Svelte. And you pretty much answered all the questions I had for pretty much A couple of years, and then you left, and you started doing latent space, and I'm like, what is that?
[01:39:45] Guest 4: What is an LLM? So I started listening to your pod, and yeah, and here I am. And
[01:39:49] swyx: then you're part, you're from Sydney, or you were, you were in sydney.
[01:39:52] Guest 4: I, I moved to Sydney a couple years ago to work on a clinical trial, but now I moved back, probably, again, I blame you for it, because I listen to every episode, I'm like, s**t's going down, in San Francisco, you gotta be here.
[01:40:05] swyx: So yeah, and then you were, you're part of build club.
[01:40:08] Guest 4: Yeah, I'm part of BuildClub. BuildClub is a Unfortunately, I was at the airport when you're giving a presentation and Annie has not sent me the recording yet So I'm not seeing it. It's on YouTube.
[01:40:24] swyx: Oh, okay. Great.
[01:40:25] Guest 4: Oh, awesome. Okay, I'll take a look. But BuildClub is the one and only AI centric community in Pretty much Sydney.
[01:40:39] Guest 4: And I had to spend months to push Annie to do that thing. And eventually she did, and I'm so glad she did. And it's growing, and she's doing amazing. She's expanding to many cities. It's ANZ now. Yeah, it's amazing. And she has our couch from our apartment when we moved away. We couldn't find a way to sell it.
[01:41:01] Guest 4: We're like, hey Annie, we're getting a space. Do you guys need a couch? She's like, sure. So she has my couch. It's amazing.
[01:41:07] swyx: And then what do you listen for in, in, in space? What, you know,
[01:41:11] Guest 4: what are you interested in? I like to get a sense of what's going on. You guys ask very good questions. For some reason you guys seem so well researched, both you and Alessio.
[01:41:24] Guest 4: Somehow you're just You asked very good questions that me as a Person, like, general product developer, product engineer, I have no idea about ML, I don't follow the papers, I know about the paper club, I don't follow it because it's over my head, but you guys distill it so well, and you guys ask the questions to your guests that I have in the back of my mind, or that I don't even know that I have the questions and then I You guys guide the conversations in a way that I can learn from and I wouldn't even know anything to ask So I'm so glad you guys are doing it.
[01:42:03] Guest 4: It's so helpful and Keep doing what you're doing. Yeah, and I really and I really love the What you guys did with the best papers from the talk Yeah, it's really good I mean like a lot of that was way over my head But I like listen to it all and try to I just get the sense, like, just, I just try to keep listening to this stuff until I get it.
[01:42:27] Guest 4: And you guys expose, I mean, I would never go to a conference like that, but, yeah. But like, I was just like, not understanding anything, but you guys make it so accessible, and I love it.
[01:42:39] swyx: Yeah, so, maybe, the Pocket Studio is right here, actually, I can show you after we're done recording. It's not that fancy, it's just a studio.
[01:42:46] swyx: And yeah, for me, the goal within NeurIPS recap, was not that we would, like, you would read everything or anything, like, yeah, we would just pick what we thought was most important for you, and if any one of them interested you, you could double click on it. That's it. You know, we're not gonna be, like, the experts on every single thing.
[01:43:04] swyx: It's impossible, right? And already, like, the episode that I cut together for that was like three and a half hours, so people were complaining about that. And then the last thing Lesser and I don't do that much research for each episode, but, you know, we research the guests.
[01:43:21] swyx: But just being involved in the day to day conversations in our day jobs prepares you for that. And I think that is important. No prep needed because, you know, we're in it. We're in the arena, as they say. Yeah. Anything else?
[01:43:35] Guest 4: Like, like there's so much excitement. There's so many things to cover. And like what you guys are like, maybe culturally, yeah, that, that would be a thing I was always wondering, like, like, and that might be not partly in the space, but what are you guys doing? Like to cover the cultural aspect of what's happening here, it's probably like.
[01:44:00] Guest 4: A separate thing, but equally important thing, to like, document all the conversations that are happening around here. And all the other build spaces, like, we see glimpses of that on Twitter, but I think capturing more of that would be super cool.
[01:44:17] swyx: Yeah I feel like that's something that someone else should do.
[01:44:20] swyx: We try to be more technical. Because that, that, people can use it at work, they can justify that for productivity. We might try to Dabble in some of that. So I'm pretty connected with like, the main areas for those listening The main areas for those listening who are interested in like SFAI is like Shack 15, AGI House SF, AGI House Hillsboro and then us and maybe HF0 and then maybe a little bit of Founders Inc.
[01:44:48] swyx: And those are it. There's this like, There's more community oriented spaces like the commons but like they're not sort of AI centric. And So we can do a little bit of reporting around that, but it's gonna be like, this American life, you know, like, tell me your life story, like, solve story, I'm not like, the best at that, and then also, like, there's a lot of very, very brutal cutting for that, that is hard to do, but we can dabble, or we can do it on the
[01:45:13] Guest 4: side.
[01:45:15] Guest 4: Oh, the other thing I'm very interested in, I'm a UX designer by trade, and anytime you guys touch on AI and UX and Jet or UI, I'm all ears, and I would love to, Again, it's probably not the technical side of LatentSpace, but I think there needs to be a hundred times more resources out there than what's currently available.
[01:45:34] swyx: Yeah, yeah we had a, we, I think we held the first AIUX meetup ever in the, in, in SF, in Worlds. That was really fun. The meetup's on YouTube, if you want to see it, and, and it's in the LatentSpace archives of the newsletter. I don't think we ever published a podcast version of it.
[01:45:48] swyx: So you have to just subscribe to the newsletter and then check the YouTube for, for that stuff. But yeah, UX is a topic of ours that we like to cover. It's just very hard to cover as an audio medium. Yeah. 'cause you can't see it . And also I think like it's gonna be mostly owned by like Notion and Versal and Retool, which we've, we've interviewed retool, we're going to interview Versa and we've interviewed Notion.
[01:46:12] swyx: So who else who, who's who? Like who do you wanna listen to on the IX? Right. Like, there's individual people, like we had Amelia Wattenberger present at AI Engineer Summit, you can see that on YouTube. Like, I know a lot of the thinkers on AIUX, and I think I know what they say, like, I haven't seen anything super innovative.
[01:46:31] swyx: Everyone hates chatbots, everyone wants to innovate things. I haven't seen any new ideas since we did the AIUX meetup one year ago. Tell me I'm wrong.
[01:46:42] Guest 4: Well, that sounds really disappointing. I haven't seen anything on Twitter that I thought that would be easier to push because we just wrap LLMs. But on Twitter there doesn't seem to be that much going on, to your point.
[01:46:59] Guest 4: But there needs to be more people from the design space, from the product space, like UX researchers, coming in and figuring out how can we take LLMs and apply them to real problems. I haven't seen a whole lot of that. In Cine, there's not a whole lot of that. I'm hoping to maybe be a part of the community here and try to grow that side of
[01:47:21] swyx: the things.
[01:47:22] swyx: Well, look, you're here now. You're interested in AIUX. Run the next AIUX meetup. I can set you up with the venue, the people. You need to find the speakers. I'm not going to find the speakers for you. But if you want to set that up, go for it.
[01:47:37] Guest 4: So, I actually copied your AIUX format, and I held a talk in Sydney, and in a very light fashion, like 20 30 people showed up.
[01:47:49] Guest 4: We had some cool demos, it was like a baby, like a small version of your AIUX conference, but yeah, I'd love to, love to participate. I mean,
[01:47:59] swyx: this is SF, 300 people will show up you just gotta get some cool demos, I can siege you with some people let's make it happen. Let's make it happen! Let's make it happen, alright, well it's nice to meet you, and I'll get your details.
[01:48:09] AI Charlie: That's all, folks. If you've enjoyed or benefited from our work on latent space over this past year, we'd really love to hear from you, and really appreciate it if you'd tell a friend. The only way a podcast consistently grows is through your word of mouth, and that helps us book incredible guests and attend great events in our second year.
[01:48:29] AI Charlie: Have a lovely weekend!

Get full access to Latent Space at www.latent.space/subscribe

9 March 2024, 10:55 pm
1 hour 20 minutes

Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI

Speaker CFPs and Sponsor Guides are now available for AIE World’s Fair — join us on June 25-27 for the biggest AI Engineer conference of 2024!
Soumith Chintala needs no introduction in the ML world — his insights are incredibly accessible across Twitter, LinkedIn, podcasts, and conference talks (in this pod we’ll assume you’ll have caught up on the History of PyTorch pod from last year and cover different topics). He’s well known as the creator of PyTorch, but he's more broadly the Engineering Lead on AI Infra, PyTorch, and Generative AI at Meta.
Soumith was one of the earliest supporters of Latent Space (and more recently AI News), and we were overjoyed to catch up with him on his latest SF visit for a braindump of the latest AI topics, reactions to some of our past guests, and why Open Source AI is personally so important to him.
Life in the GPU-Rich Lane
Back in January, Zuck went on Instagram to announce their GPU wealth: by the end of 2024, Meta will have 350k H100s. By adding all their GPU clusters, you'd get to 600k H100-equivalents of compute. At FP16 precision, that's ~1,200,000 PFLOPS. If we used George Hotz's (previous guest!) "Person of Compute" measure, Meta now has 60k humans of compute in their clusters.
Occasionally we get glimpses into the GPU-rich life; on a recent ThursdAI chat, swyx prompted PaLM tech lead Yi Tay to write down what he missed most from Google, and he commented that UL2 20B was trained by accidentally leaving the training job running for a month, because hardware failures are so rare in Google.
Meta AI’s Epic LLM Run
Before Llama broke the internet, Meta released an open source LLM in May 2022, OPT-175B, which was notable for how “open” it was - right down to the logbook! They used only 16 NVIDIA V100 GPUs and Soumith agrees that, with hindsight, it was likely under-trained for its parameter size.
In Feb 2023 (pre Latent Space pod), Llama was released, with a 7B version trained on 1T tokens alongside 65B and 33B versions trained on 1.4T tokens. The Llama authors included Guillaume Lample and Timothée Lacroix, who went on to start Mistral.
July 2023 was Llama2 time (which we covered!): 3 model sizes, 7B, 13B, and 70B, all trained on 2T tokens. The three models accounted for a grand total of 3,311,616 GPU hours for all pre-training work. CodeLlama followed shortly after, a fine-tune of Llama2 specifically focused on code generation use cases. The family had models in the 7B, 13B, 34B, and 70B size, all trained with 500B extra tokens of code and code-related data, except for 70B which is trained on 1T.
All of this on top of other open sourced models like Segment Anything (one of our early hits!), Detectron, Detectron 2, DensePose, and Seamless, and in one year, Meta transformed from a company people made fun of for its “metaverse” investments to one of the key players in the AI landscape and its stock has almost tripled since (about $830B in market value created in the past year).
Why Open Source AI
The obvious question is why Meta would spend hundreds of millions on its AI efforts and then release them for free. Zuck has addressed this in public statements:
But for Soumith, the motivation is even more personal:
“I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India… And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for like zero dollars. And I think that was a strong reason why I ended up where I am. So like that, like the open source side of things, I always push regardless of like what I get paid for, like I think I would do that as a passion project on the side…
…I think at a fundamental level, the most beneficial value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me…
…Like, okay, I again always go back to like I'm a student in India with no money. What is my accessibility to any of these closed source models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control issue: I strongly believe if you want human aligned AI, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble.
We like the way Soumith put it last year: Closed AI “rate-limits against people's imaginations and needs”!
What It Takes For Open Source AI to Win
However Soumith doesn’t think Open Source will simply win by popular demand. There is a tremendous coordination problem with the decentralized nature of the open source AI development right now: nobody is collecting the valuable human feedback in the way that OpenAI or Midjourney are doing.
“Open source in general always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source. And so now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is because these models are getting better based on human feedback.
And if you see with open source models, like if you go to the /r/localllama subreddit, like there's so many variations of models that are being produced from, say, Nous research. I mean, like there's like so many variations built by so many people. And one common theme is they're all using these fine-tuning or human preferences datasets that are very limited and they're not sufficiently diverse.
And you look at the other side, say front-ends like Oobabooga or like Hugging Chat or Ollama, they don't really have feedback buttons. All the people using all these front-ends, they probably want to give feedback, but there's no way for them to give feedback… So we're just losing all of this feedback. Maybe open source models are being as used as GPT is at this point in like all kinds of, in a very fragmented way, like in aggregate all the open source models together are probably being used as much as GPT is, maybe close to that. But the amount of feedback that is driving back into the open source ecosystem is like negligible, maybe less than 1% of like the usage.
So I think like some, like the blueprint here I think is you'd want someone to create a sinkhole for the feedback… I think if we do that, if that actually happens, I think that probably has a real chance of the open source models having a runaway effect against OpenAI, I think like there's a clear chance we can take at truly winning open source.”
If you’re working on solving open source coordination, please get in touch!
Show Notes
* Soumith Chintala Twitter
* History of PyTorch episode on Gradient Podcast
* The Llama Ecosystem
* Apple's MLX
* Neural ODEs (Ordinary Differential Equations)
* AlphaGo
* LMSys arena
* Dan Pink's "Drive"
* Robotics projects:
* Dobb-E
* OK Robot
* Yann LeCun
* Yangqing Jia of Lepton AI
* Ed Catmull
* George Hotz on Latent Space
* Chris Lattner on Latent Space
* Guillaume Lample
* Yannic Kilcher of OpenAssistant
* LMSys
* Alex Atallah of OpenRouter
* Carlo Sferrazza's 3D tactile research
* Alex Wiltschko of Osmo
* Tangent by Alex Wiltschko
* Lerrel Pinto - Robotics
Timestamps
* [00:00:00] Introductions
* [00:00:51] Extrinsic vs Intrinsic Success
* [00:02:40] Importance of Open Source and Its Impact
* [00:03:46] PyTorch vs TinyGrad
* [00:08:33] Why PyTorch is the Switzerland of frameworks
* [00:10:27] Modular's Mojo + PyTorch?
* [00:13:32] PyTorch vs Apple's MLX
* [00:16:27] FAIR / PyTorch Alumni
* [00:18:50] How can AI inference providers differentiate?
* [00:21:41] How to build good benchmarks and learnings from AnyScale's
* [00:25:28] Most interesting unexplored ideas
* [00:28:18] What people get wrong about synthetic data
* [00:35:57] Meta AI's evolution
* [00:38:42] How do you allocate 600,000 GPUs?
* [00:42:05] Even the GPU Rich are GPU Poor
* [00:47:31] Meta's MTIA silicon
* [00:50:09] Why we need open source
* [00:59:00] Open source's coordination problem for feedback gathering
* [01:08:59] Beyond text generation
* [01:15:37] Osmo and the Future of Smell Recognition Technology
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:15]: Hey, and today we have in the studio Soumith Chintala, welcome.
Soumith [00:00:17]: Thanks for having me.
Swyx [00:00:18]: On one of your rare visits from New York where you live. You got your start in computer vision at NYU with Yann LeCun. That was a very fortuitous start. I was actually listening to your interview on the Gradient podcast. So if people want to know more about the history of Soumith, history of PyTorch, they can go to that podcast. We won't spend that much time there, but I just was marveling at your luck, or I don't know if it's your luck or your drive to find AI early and then find the right quality mentor because I guess Yan really sort of introduced you to that world.
Soumith [00:00:51]: Yeah, I think you're talking about extrinsic success, right? A lot of people just have drive to do things that they think is fun, and a lot of those things might or might not be extrinsically perceived as good and successful. I think I just happened to like something that is now one of the coolest things in the world or whatever. But if I happen, the first thing I tried to become was a 3D VFX artist, and I was really interested in doing that, but I turned out to be very bad at it. So I ended up not doing that further. But even if I was good at that, whatever, and I ended up going down that path, I probably would have been equally happy. It's just like maybe like the perception of, oh, is this person successful or not might be different. I think like after a baseline, like your happiness is probably more correlated with your intrinsic stuff.
Swyx [00:01:44]: Yes. I think Dan Pink has this book on drive that I often refer to about the power of intrinsic motivation versus extrinsic and how long extrinsic lasts. It's not very long at all. But anyway, now you are an investor in Runway, so in a way you're working on VFX. Yes.
Soumith [00:02:01]: I mean, in a very convoluted way.
Swyx [00:02:03]: It reminds me of Ed Catmull. I don't know if you guys know, but he actually tried to become an animator in his early years and failed or didn't get accepted by Disney and then went and created Pixar and then got bought by Disney and created Toy Story. So you joined Facebook in 2014 and eventually became a creator and maintainer of PyTorch. And there's this long story there you can refer to on the gradient. I think maybe people don't know that you also involved in more sort of hardware and cluster decision affair. And we can dive into more details there because we're all about hardware this month. Yeah. And then finally, I don't know what else, like what else should people know about you on a personal side or professional side?
Soumith [00:02:40]: I think open source is definitely a big passion of mine and probably forms a little bit of my identity at this point. I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India. I didn't have internet for a while. In college, actually, I didn't have internet except for GPRS or whatever. And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for zero dollars. And I think that was a strong reason why I ended up where I am. So the open source side of things, I always push regardless of what I get paid for, like I think I would do that as a passion project on the side.
Swyx [00:03:35]: Yeah, that's wonderful. Well, we'll talk about the challenges as well that open source has, open models versus closed models. Maybe you want to touch a little bit on PyTorch before we move on to the sort of Meta AI in general.
PyTorch vs Tinygrad tradeoffs
Alessio [00:03:46]: Yeah, we kind of touched on PyTorch in a lot of episodes. So we had George Hotz from TinyGrad. He called PyTorch a CISC and TinyGrad a RISC. I would love to get your thoughts on PyTorch design direction as far as, I know you talk a lot about kind of having a happy path to start with and then making complexity hidden away but then available to the end user. One of the things that George mentioned is I think you have like 250 primitive operators in PyTorch, I think TinyGrad is four. So how do you think about some of the learnings that maybe he's going to run into that you already had in the past seven, eight years almost of running PyTorch?
Soumith [00:04:24]: Yeah, I think there's different models here, but I think it's two different models that people generally start with. Either they go like, I have a grand vision and I'm going to build a giant system that achieves this grand vision and maybe one is super feature complete or whatever. Or other people say they will get incrementally ambitious, right? And they say, oh, we'll start with something simple and then we'll slowly layer out complexity in a way that optimally applies Huffman coding or whatever. Like where the density of users are and what they're using, I would want to keep it in the easy, happy path and where the more niche advanced use cases, I'll still want people to try them, but they need to take additional frictional steps. George, I think just like we started with PyTorch, George started with the incrementally ambitious thing. I remember TinyGrad used to be, like we would be limited to a thousand lines of code and I think now it's at 5,000. So I think there is no real magic to which why PyTorch has the kind of complexity. I think it's probably partly necessitated and partly because we built with the technology available under us at that time, PyTorch is like 190,000 lines of code or something at this point. I think if you had to rewrite it, we would probably think about ways to rewrite it in a vastly simplified way for sure. But a lot of that complexity comes from the fact that in a very simple, explainable way, you have memory hierarchies. You have CPU has three levels of caches and then you have DRAM and SSD and then you have network. Similarly, GPU has several levels of memory and then you have different levels of network hierarchies, NVLink plus InfiniBand or Rocky or something like that, right? And the way the flops are available on your hardware, they are available in a certain way and your computation is in a certain way and you have to retrofit your computation onto both the memory hierarchy and like the flops available. When you're doing this, it is actually a fairly hard mathematical problem to do this setup, like you find the optimal thing. And finding the optimal thing is, what is optimal depends on the input variables themselves. So like, okay, what is the shape of your input tensors and what is the operation you're trying to do and various things like that. Finding that optimal configuration and writing it down in code is not the same for every input configuration you have. Like for example, just as the shape of the tensors change, let's say you have three input tensors into a Sparstar product or something like that. The shape of each of these input tensors will vastly change how you do this optimally placing this operation onto the hardware in a way that will get you maximal throughput. So a lot of our complexity comes from writing out hundreds of configurations for each single PyTorch operator and templatizing these things and symbolically generating the final CUDA code or CPU code. There's no way to avoid it because mathematically we haven't found symbolic ways to do this that also keep compile time near zero. You can write a very simple framework, but then you also should be willing to eat the long compile time. So if searching for that optimal performance at runtime, but that's the trade off. There's no, like, I don't think unless we have great breakthroughs George's vision is achievable, he should be thinking about a narrower problem such as I'm only going to make this for work for self-driving car connets or I'm only going to make this work for LLM transformers of the llama style. Like if you start narrowing the problem down, you can make a vastly simpler framework. But if you don't, if you need the generality to power all of the AI research that is happening and keep zero compile time and in all these other factors, I think it's not easy to avoid the complexity.
Pytorch vs Mojo
Alessio [00:08:33]: That's interesting. And we kind of touched on this with Chris Lattner when he was on the podcast. If you think about frameworks, they have the model target. They have the hardware target. They have different things to think about. He mentioned when he was at Google, TensorFlow trying to be optimized to make TPUs go brr, you know, and go as fast. I think George is trying to make especially AMD stack be better than ROCm. How come PyTorch has been such as Switzerland versus just making Meta hardware go brr?
Soumith [00:09:00]: First, Meta is not in the business of selling hardware. Meta is not in the business of cloud compute. The way Meta thinks about funding PyTorch is we're funding it because it's net good for Meta to fund PyTorch because PyTorch has become a standard and a big open source project. And generally it gives us a timeline edge. It gives us leverage and all that within our own work. So why is PyTorch more of a Switzerland rather than being opinionated? I think the way we think about it is not in terms of Switzerland or not. We actually the way we articulate it to all hardware vendors and software vendors and all who come to us being we want to build a backend in core for PyTorch and ship it by default is we just only look at our user side of things. Like if users are using a particular piece of hardware, then we want to support it. We very much don't want to king make the hardware side of things. So as the MacBooks have GPUs and as that stuff started getting increasingly interesting, we pushed Apple to push some engineers and work on the NPS support and we spend significant time from Meta funded engineers on that as well because a lot of people are using the Apple GPUs and there's demand. So we kind of mostly look at it from the demand side. We never look at it from like oh which hardware should we start taking opinions on.
Swyx [00:10:27]: Is there a future in which, because Mojo or Modular Mojo is kind of a superset of Python, is there a future in which PyTorch might use Mojo features optionally?
Soumith [00:10:36]: I think it depends on how well integrated it is into the Python ecosystem. So if Mojo is like a pip install and it's readily available and users feel like they can use Mojo so smoothly within their workflows in a way that just is low friction, we would definitely look into that. Like in the same way PyTorch now depends on Triton, OpenAI Triton, and we never had a conversation that was like huh, that's like a dependency. Should we just build a Triton of our own or should we use Triton? It almost doesn't, like those conversations don't really come up for us. The conversations are more well does Triton have 10,000 dependencies and is it hard to install? We almost don't look at these things from a strategic leverage point of view. We look at these things from a user experience point of view, like is it easy to install? Is it smoothly integrated and does it give enough benefits for us to start depending on it? If so, yeah, we should consider it. That's how we think about it.
Swyx [00:11:37]: You're inclusive by default as long as it meets the minimum bar of, yeah, but like maybe I phrased it wrongly. Maybe it's more like what problems would you look to solve that you have right now?
Soumith [00:11:48]: I think it depends on what problems Mojo will be useful at.
Swyx [00:11:52]: Mainly a performance pitch, some amount of cross compiling pitch.
Soumith [00:11:56]: Yeah, I think the performance pitch for Mojo was like, we're going to be performant even if you have a lot of custom stuff, you're going to write arbitrary custom things and we will be performant. And that value proposition is not clear to us from the PyTorch side to consider it for PyTorch. So PyTorch, it's actually not 250 operators, it's like a thousand operators. PyTorch exposes about a thousand operators and people kind of write their ideas in the thousand operators of PyTorch. Mojo is like, well, maybe it's okay to completely sidestep those thousand operators of PyTorch and just write it in a more natural form. Just write raw Python, write for loops or whatever, right? So from the consideration of how do we intersect PyTorch with Mojo, I can see one use case where you have custom stuff for some parts of your program, but mostly it's PyTorch. And so we can probably figure out how to make it easier for say Torch.compile to smoothly also consume Mojo subgraphs and like, you know, the interoperability being actually usable, that I think is valuable. But Mojo as a fundamental front end would be replacing PyTorch, not augmenting PyTorch. So in that sense, I don't see a synergy in more deeply integrating Mojo.
Pytorch vs MLX
Swyx [00:13:21]: So call out to Mojo whenever they have written something in Mojo and there's some performance related thing going on. And then since you mentioned Apple, what should people think of PyTorch versus MLX?
Soumith [00:13:32]: I mean, MLX is early and I know the folks well, Ani used to work at FAIR and I used to chat with him all the time. He used to be based out of New York as well. The way I think about MLX is that MLX is specialized for Apple right now. It has a happy path because it's defined its product in a narrow way. At some point MLX either says we will only be supporting Apple and we will just focus on enabling, you know, there's a framework if you use your MacBook, but once you like go server side or whatever, that's not my problem and I don't care. For MLS, it enters like the server side set of things as well. Like one of these two things will happen, right? If the first thing will happen, like MLX's overall addressable market will be small, but it probably do well within that addressable market. If it enters the second phase, they're going to run into all the same complexities that we have to deal with. They will not have any magic wand and they will have more complex work to do. They probably wouldn't be able to move as fast.
Swyx [00:14:44]: Like having to deal with distributed compute?
Soumith [00:14:48]: Distributed, NVIDIA and AMD GPUs, like just like having a generalization of the concept of a backend, how they treat compilation with plus overheads. Right now they're deeply assumed like the whole NPS graph thing. So they need to think about all these additional things if they end up expanding onto the server side and they'll probably build something like PyTorch as well, right? Like eventually that's where it will land. And I think there they will kind of fail on the lack of differentiation. Like it wouldn't be obvious to people why they would want to use it.
Swyx [00:15:24]: I mean, there are some cloud companies offering M1 and M2 chips on servers. I feel like it might be interesting for Apple to pursue that market, but it's not their core strength.
Soumith [00:15:33]: Yeah. If Apple can figure out their interconnect story, maybe, like then it can become a thing.
Swyx [00:15:40]: Honestly, that's more interesting than the cars. Yes.
Soumith [00:15:43]: I think the moat that NVIDIA has right now, I feel is that they have the interconnect that no one else has, like AMD GPUs are pretty good. I'm sure there's various silicon that is not bad at all, but the interconnect, like NVLink is uniquely awesome. I'm sure the other hardware providers are working on it, but-
Swyx [00:16:04]: I feel like when you say it's uniquely awesome, you have some appreciation of it that the rest of us don't. I mean, the rest of us just like, you know, we hear marketing lines, but what do you mean when you say NVIDIA is very good at networking? Obviously they made the acquisition maybe like 15 years ago.
Soumith [00:16:15]: Just the bandwidth it offers and the latency it offers. I mean, TPUs also have a good interconnect, but you can't buy them. So you have to go to Google to use it.
PyTorch Mafia
Alessio [00:16:27]: Who are some of the other FAIR PyTorch alumni that are building cool companies? I know you have Fireworks AI, Lightning AI, Lepton, and Yangqing, you knew since college when he was building Coffee?
Soumith [00:16:40]: Yeah, so Yangqing and I used to be framework rivals, PyTorch, I mean, we were all a very small close-knit community back then. Caffe, Torch, Theano, Chainer, Keras, various frameworks. I mean, it used to be more like 20 frameworks. I can't remember all the names. CCV by Liu Liu, who is also based out of SF. And I would actually like, you know, one of the ways it was interesting is you went into the framework guts and saw if someone wrote their own convolution kernel or they were just copying someone else's. There were four or five convolution kernels that were unique and interesting. There was one from this guy out of Russia, I forgot the name, but I remembered who was awesome enough to have written their own kernel. And at some point there, I built out these benchmarks called ConNet benchmarks. They're just benchmarking all the convolution kernels that are available at that time. It hilariously became big enough that at that time AI was getting important, but not important enough that industrial strength players came in to do these kinds of benchmarking and standardization. Like we have MLPerf today. So a lot of the startups were using ConNet benchmarks in their pitch decks as like, oh, you know, on ConNet benchmarks, this is how we fare, so you should fund us. I remember Nirvana actually was at the top of the pack because Scott Gray wrote amazingly fast convolution kernels at that time. Very interesting, but separate times. But to answer your question, Alessio, I think mainly Lepton, Fireworks are the two most obvious ones, but I'm sure the fingerprints are a lot wider. They're just people who worked within the PyTorch Cafe2 cohort of things and now end up at various other places.
Swyx [00:18:50]: I think as a, both as an investor and a people looking to build on top of their services, it's a uncomfortable slash like, I don't know what I don't know pitch. Because I've met Yang Tsing and I've met Lin Chao. Yeah, I've met these folks and they're like, you know, we are deep in the PyTorch ecosystem and we serve billions of inferences a day or whatever at Facebook and now we can do it for you. And I'm like, okay, that's great. Like, what should I be wary of or cautious of when these things happen? Because I'm like, obviously this experience is extremely powerful and valuable. I just don't know what I don't know. Like, what should people know about like these sort of new inference as a service companies?
Soumith [00:19:32]: I think at that point you would be investing in them for their expertise of one kind. So if they've been at a large company, but they've been doing amazing work, you would be thinking about it as what these people bring to the table is that they're really good at like GPU programming or understanding the complexity of serving models once it hits a certain scale. You know, various expertise like from the infra and AI and GPUs point of view. What you would obviously want to figure out is whether their understanding of the external markets is clear, whether they know and understand how to think about running a business, understanding how to be disciplined about making money or, you know, various things like that.
Swyx [00:20:23]: Maybe I'll put it like, actually I will de-emphasize the investing bit and just more as a potential customer. Oh, okay. Like, it's more okay, you know, you have PyTorch gods, of course. Like, what else should I know?
Soumith [00:20:37]: I mean, I would not care about who's building something. If I'm trying to be a customer, I would care about whether...
Swyx [00:20:44]: Benchmarks.
Soumith [00:20:44]: Yeah, I use it and it's usability and reliability and speed, right?
Swyx [00:20:51]: Quality as well.
Soumith [00:20:51]: Yeah, if someone from some random unknown place came to me and say, user stuff is great. Like, and I have the bandwidth, I probably will give it a shot. And if it turns out to be great, like I'll just use it.
Benchmark drama
Swyx [00:21:07]: Okay, great. And then maybe one more thing about benchmarks, since we already brought it up and you brought up Confident Benchmarks. There was some recent drama around AnyScale. AnyScale released their own benchmarks and obviously they look great on their own benchmarks, but maybe didn't give the other... I feel there are two lines of criticism. One, which is they didn't test some apples for apples on the kind of endpoints that the other providers, that they are competitors with, on their benchmarks and that is due diligence baseline. And then the second would be more just optimizing for the right thing. You had some commentary on it. I'll just kind of let you riff.
Soumith [00:21:41]: Yeah, I mean, in summary, basically my criticism of that was AnyScale built these benchmarks for end users to just understand what they should pick, right? And that's a very good thing to do. I think what they didn't do a good job of is give that end user a full understanding of what they should pick. Like they just gave them a very narrow slice of understanding. I think they just gave them latency numbers and that's not sufficient, right? You need to understand your total cost of ownership at some reasonable scale. Not oh, one API call is one cent, but a thousand API calls are 10 cents. Like people can misprice to cheat on those benchmarks. So you want to understand, okay, like how much is it going to cost me if I actually subscribe to you and do like a million API calls a month or something? And then you want to understand the latency and reliability, not just from one call you made, but an aggregate of calls you've made over several various times of the day and times of the week. And the nature of the workloads, is it just some generic single paragraph that you're sending that is cashable? Or is it like testing of real world workload? I think that kind of rigor, like in presenting that benchmark wasn't there. It was a much more narrow sliver of what should have been a good benchmark. That was my main criticism. And I'm pretty sure if before they released it, they showed it to their other stakeholders who would be caring about this benchmark because they are present in it, they would have easily just pointed out these gaps. And I think they didn't do that and they just released it. So I think those were the two main criticisms. I think they were fair and Robert took it well.
Swyx [00:23:40]: And he took it very well. And we'll have him on at some point and we'll discuss it. But I think it's important for, I think the market being maturing enough that people start caring and competing on these kinds of things means that we need to establish what best practice is because otherwise everyone's going to play dirty.
Soumith [00:23:55]: Yeah, absolutely. My view of the LLM inference market in general is that it's the laundromat model. Like the margins are going to drive down towards the bare minimum. It's going to be all kinds of arbitrage between how much you can get the hardware for and then how much you sell the API and how much latency your customers are willing to let go. You need to figure out how to squeeze your margins. Like what is your unique thing here? Like I think Together and Fireworks and all these people are trying to build some faster CUDA kernels and faster, you know, hardware kernels in general. But those modes only last for a month or two. These ideas quickly propagate.
Swyx [00:24:38]: Even if they're not published?
Soumith [00:24:39]: Even if they're not published, the idea space is small. So even if they're not published, the discovery rate is going to be pretty high. It's not like we're talking about a combinatorial thing that is really large. You're talking about Llama style LLM models. And we're going to beat those to death on a few different hardware SKUs, right? Like it's not even we have a huge diversity of hardware you're going to aim to run it on. Now when you have such a narrow problem and you have a lot of people working on it, the rate at which these ideas are going to get figured out is going to be pretty rapid.
Swyx [00:25:15]: Is it a standard bag of tricks? Like the standard one that I know of is, you know, fusing operators and-
Soumith [00:25:22]: Yeah, it's the standard bag of tricks on figuring out how to improve your memory bandwidth and all that, yeah.
Alessio [00:25:28]: Any ideas instead of things that are not being beaten to death that people should be paying more attention to?
Novel PyTorch Applications
Swyx [00:25:34]: One thing I was like, you know, you have a thousand operators, right? Like what's the most interesting usage of PyTorch that you're seeing maybe outside of this little bubble?
Soumith [00:25:41]: So PyTorch, it's very interesting and scary at the same time, but basically it's used in a lot of exotic ways, like from the ML angle, what kind of models are being built? And you get all the way from state-based models and all of these things to stuff nth order differentiable models, like neural ODEs and stuff like that. I think there's one set of interestingness factor from the ML side of things. And then there's the other set of interesting factor from the applications point of view. It's used in Mars Rover simulations, to drug discovery, to Tesla cars. And there's a huge diversity of applications in which it is used. So in terms of the most interesting application side of things, I think I'm scared at how many interesting things that are also very critical and really important it is used in. I think the scariest was when I went to visit CERN at some point and they said they were using PyTorch and they were using GANs at the same time for particle physics research. And I was scared more about the fact that they were using GANs than they were using PyTorch, because at that time I was a researcher focusing on GANs. But the diversity is probably the most interesting. How many different things it is being used in. I think that's the most interesting to me from the applications perspective. From the models perspective, I think I've seen a lot of them. Like the really interesting ones to me are where we're starting to combine search and symbolic stuff with differentiable models, like the whole AlphaGo style models is one example. And then I think we're attempting to do it for LLMs as well, with various reward models and search. I mean, I don't think PyTorch is being used in this, but the whole alpha geometry thing was interesting because again, it's an example of combining the symbolic models with the gradient based ones. But there are stuff like alpha geometry that PyTorch is used at, especially when you intersect biology and chemistry with ML. In those areas, you want stronger guarantees on the output. So yeah, maybe from the ML side, those things to me are very interesting right now.
Swyx [00:28:03]: Yeah. People are very excited about the alpha geometry thing. And it's kind of like, for me, it's theoretical. It's great. You can solve some Olympia questions. I'm not sure how to make that bridge over into the real world applications, but I'm sure people smarter than me will figure it out.
Synthetic Data vs Symbolic Models
Soumith [00:28:18]: Let me give you an example of it. You know how the whole thing about synthetic data will be the next rage in LLMs is a thing?
Swyx [00:28:27]: Already is a rage.
Soumith [00:28:28]: Which I think is fairly misplaced in how people perceive it. People think synthetic data is some kind of magic wand that you wave and it's going to be amazing. Synthetic data is useful in neural networks right now because we as humans have figured out a bunch of symbolic models of the world or made up certain symbolic models because of human innate biases. So we've figured out how to ground particle physics in a 30 parameter model. And it's just very hard to compute as in it takes a lot of flops to compute, but it only has 30 parameters or so. I mean, I'm not a physics expert, but it's a very low rank model. We built mathematics as a field that basically is very low rank. Language, a deep understanding of language, like the whole syntactic parse trees and just understanding how language can be broken down and into a formal symbolism is something that we figured out. So we basically as humans have accumulated all this knowledge on these subjects, either synthetic, we created those subjects in our heads, or we grounded some real world phenomenon into a set of symbols. But we haven't figured out how to teach neural networks symbolic world models directly. The only way we have to teach them is generating a bunch of inputs and outputs and gradient dissenting over them. So in areas where we have the symbolic models and we need to teach all the knowledge we have that is better encoded in the symbolic models, what we're doing is we're generating a bunch of synthetic data, a bunch of input output pairs, and then giving that to the neural network and asking it to learn the same thing that we already have a better low rank model of in gradient descent in a much more over-parameterized way. Outside of this, like where we don't have good symbolic models, like synthetic data obviously doesn't make any sense. So synthetic data is not a magic wand where it'll work in all cases in every case or whatever. It's just where we as humans already have good symbolic models off. We need to impart that knowledge to neural networks and we figured out the synthetic data is a vehicle to impart this knowledge to. So, but people, because maybe they don't know enough about synthetic data as a notion, but they hear, you know, the next wave of data revolution is synthetic data. They think it's some kind of magic where we just create a bunch of random data somehow. They don't think about how, and then they think that's just a revolution. And I think that's maybe a gap in understanding most people have in this hype cycle.
Swyx [00:31:23]: Yeah, well, it's a relatively new concept, so. Oh, there's two more that I'll put in front of you and then you can see what you respond. One is, you know, I have this joke that it's, you know, it's only synthetic data if it's from the Mistral region of France, otherwise it's just a sparkling distillation, which is what news research is doing. Like they're distilling GPT-4 by creating synthetic data from GPT-4, creating mock textbooks inspired by Phi 2 and then fine tuning open source models like Llama. And so I don't know, I mean, I think that's, should we call that synthetic data? Should we call it something else? I don't know.
Soumith [00:31:57]: Yeah, I mean, the outputs of LLMs, are they synthetic data? They probably are, but I think it depends on the goal you have. If your goal is you're creating synthetic data with the goal of trying to distill GPT-4's superiority into another model, I guess you can call it synthetic data, but it also feels like disingenuous because your goal is I need to copy the behavior of GPT-4 and-
Swyx [00:32:25]: It's also not just behavior, but data set. So I've often thought of this as data set washing. Like you need one model at the top of the chain, you know, unnamed French company that has that, you know, makes a model that has all the data in it that we don't know where it's from, but it's open source, hey, and then we distill from that and it's great. To be fair, they also use larger models as judges for preference ranking, right? So that is, I think, a very, very accepted use of synthetic.
Soumith [00:32:53]: Correct. I think it's a very interesting time where we don't really have good social models of what is acceptable depending on how many bits of information you use from someone else, right? It's like, okay, you use one bit. Is that okay? Yeah, let's accept it to be okay. Okay, what about if you use 20 bits? Is that okay? I don't know. What if you use 200 bits? I don't think we as society have ever been in this conundrum where we have to be like, where is the boundary of copyright or where is the boundary of socially accepted understanding of copying someone else? We haven't been tested this mathematically before,
Swyx [00:33:38]: in my opinion. Whether it's transformative use. Yes. So yeah, I think this New York Times opening eye case is gonna go to the Supreme Court and we'll have to decide it because I think we never had to deal with it before. And then finally, for synthetic data, the thing that I'm personally exploring is solving this great stark paradigm difference between rag and fine tuning, where you can kind of create synthetic data off of your retrieved documents and then fine tune on that. That's kind of synthetic. All you need is variation or diversity of samples for you to fine tune on. And then you can fine tune new knowledge into your model. I don't know if you've seen that as a direction for synthetic data.
Soumith [00:34:13]: I think you're basically trying to, what you're doing is you're saying, well, language, I know how to parametrize language to an extent. And I need to teach my model variations of this input data so that it's resilient or invariant to language uses of that data.
Swyx [00:34:32]: Yeah, it doesn't overfit on the wrong source documents.
Soumith [00:34:33]: So I think that's 100% synthetic. You understand, the key is you create variations of your documents and you know how to do that because you have a symbolic model or like some implicit symbolic model of language.
Swyx [00:34:48]: Okay.
Alessio [00:34:49]: Do you think the issue with symbolic models is just the architecture of the language models that we're building? I think maybe the thing that people grasp is the inability of transformers to deal with numbers because of the tokenizer. Is it a fundamental issue there too? And do you see alternative architectures that will be better with symbolic understanding?
Soumith [00:35:09]: I am not sure if it's a fundamental issue or not. I think we just don't understand transformers enough. I don't even mean transformers as an architecture. I mean the use of transformers today, like combining the tokenizer and transformers and the dynamics of training, when you show math heavy questions versus not. I don't have a good calibration of whether I know the answer or not. I, you know, there's common criticisms that are, you know, transformers will just fail at X. But then when you scale them up to sufficient scale, they actually don't fail at that X. I think there's this entire subfield where they're trying to figure out these answers called like the science of deep learning or something. So we'll get to know more. I don't know the answer.
Meta AI and Llama 2/3
Swyx [00:35:57]: Got it. Let's touch a little bit on just Meta AI and you know, stuff that's going on there. Maybe, I don't know how deeply you're personally involved in it, but you're our first guest with Meta AI, which is really fantastic. And Llama 1 was, you know, you are such a believer in open source. Llama 1 was more or less the real breakthrough in open source AI. The most interesting thing for us covering on this, in this podcast was the death of Chinchilla, as people say. Any interesting insights there around the scaling models for open source models or smaller models or whatever that design decision was when you guys were doing it?
Soumith [00:36:31]: So Llama 1 was Guillaume Lample and team. There was OPT before, which I think I'm also very proud of because we bridged the gap in understanding of how complex it is to train these models to the world. Like until then, no one really in gory detail published.
Swyx [00:36:50]: The logs.
Soumith [00:36:51]: Yeah. Like, why is it complex? And everyone says, oh, it's complex. But no one really talked about why it's complex. I think OPT was cool.
Swyx [00:37:02]: I met Susan and she's very, very outspoken. Yeah.
Soumith [00:37:05]: We probably, I think, didn't train it for long enough, right? That's kind of obvious in retrospect.
Swyx [00:37:12]: For a 175B. Yeah. You trained it according to Chinchilla at the time or?
Soumith [00:37:17]: I can't remember the details, but I think it's a commonly held belief at this point that if we trained OPT longer, it would actually end up being better. Llama 1, I think, was Guillaume Lample and team Guillaume is fantastic and went on to build Mistral. I wasn't too involved in that side of things. So I don't know what you're asking me, which is how did they think about scaling loss and all of that? Llama 2, I was more closely involved in. I helped them a reasonable amount with their infrastructure needs and stuff. And Llama 2, I think, was more like, let's get to the evolution. At that point, we kind of understood what we were missing from the industry's understanding of LLMs. And we needed more data and we needed more to train the models for longer. And we made, I think, a few tweaks to the architecture and we scaled up more. And that was Llama 2. I think Llama 2, you can think of it as after Guillaume left, the team kind of rebuilt their muscle around Llama 2. And Hugo, I think, who's the first author is fantastic. And I think he did play a reasonable big role in Llama 1 as well.
Soumith [00:38:35]: And he overlaps between Llama 1 and 2. So in Llama 3, obviously, hopefully, it'll be awesome.
Alessio [00:38:42]: Just one question on Llama 2, and then we'll try and fish Llama 3 spoilers out of you. In the Llama 2 paper, the loss curves of the 34 and 70B parameter, they still seem kind of steep. Like they could go lower. How, from an infrastructure level, how do you allocate resources? Could they have just gone longer or were you just, hey, this is all the GPUs that we can burn and let's just move on to Llama 3 and then make that one better?
Soumith [00:39:07]: Instead of answering specifically about that Llama 2 situation or whatever, I'll tell you how we think about things. Generally, we're, I mean, Mark really is some numbers, right?
Swyx [00:39:20]: So let's cite those things again. All I remember is like 600K GPUs.
Soumith [00:39:24]: That is by the end of this year and 600K H100 equivalents. With 250K H100s, including all of our other GPU or accelerator stuff, it would be 600-and-something-K aggregate capacity.
Swyx [00:39:38]: That's a lot of GPUs.
Soumith [00:39:39]: We'll talk about that separately. But the way we think about it is we have a train of models, right? Llama 1, 2, 3, 4. And we have a bunch of GPUs. I don't think we're short of GPUs. Like-
Swyx [00:39:54]: Yeah, no, I wouldn't say so. Yeah, so it's all a matter of time.
Soumith [00:39:56]: I think time is the biggest bottleneck. It's like, when do you stop training the previous one and when do you start training the next one? And how do you make those decisions? The data, do you have net new data, better clean data for the next one in a way that it's not worth really focusing on the previous one? It's just a standard iterative product. You're like, when is the iPhone 1? When do you start working on iPhone 2? Where is the iPhone? And so on, right? So mostly the considerations are time and generation, rather than GPUs, in my opinion.
Alessio [00:40:31]: So one of the things with the scaling loss, like Chinchilla is optimal to balance training and inference costs. I think at Meta's scale, you would rather pay a lot more maybe at training and then save on inference. How do you think about that from infrastructure perspective? I think in your tweet, you say you can try and guess on like how we're using these GPUs. Can you just give people a bit of understanding? It's like, because I've already seen a lot of VCs say, Llama 3 has been trained on 600,000 GPUs and that's obviously not true, I'm sure. How do you allocate between the research, FAIR and the Llama training, the inference on Instagram suggestions that get me to scroll, like AI-generated stickers on WhatsApp and all of that?
Soumith [00:41:11]: Yeah, we haven't talked about any of this publicly, but as a broad stroke, it's like how we would allocate resources of any other kinds at any company. You run a VC portfolio, how do you allocate your investments between different companies or whatever? You kind of make various trade-offs and you kind of decide, should I invest in this project or this other project, or how much should I invest in this project? It's very much a zero sum of trade-offs. And it also comes into play, how are your clusters configured, like overall, what you can fit of what size and what cluster and so on. So broadly, there's no magic sauce here. I mean, I think the details would add more spice, but also wouldn't add more understanding. It's just gonna be like, oh, okay, I mean, this looks like they just think about this as I would normally do.
Alessio [00:42:05]: So even the GPU rich run through the same struggles of having to decide where to allocate things.
Soumith [00:42:11]: Yeah, I mean, at some point I forgot who said it, but you kind of fit your models to the amount of compute you have. If you don't have enough compute, you figure out how to make do with smaller models. But no one as of today, I think would feel like they have enough compute. I don't think I've heard any company within the AI space be like, oh yeah, like we feel like we have sufficient compute and we couldn't have done better. So that conversation, I don't think I've heard from any of my friends at other companies.
Eleuther
Swyx [00:42:47]: Stella from Eleuther sometimes says that because she has a lot of donated compute. She's trying to put it to interesting uses, but for some reason she's decided to stop making large models.
Soumith [00:42:57]: I mean, that's a cool, high conviction opinion that might pay out.
Swyx [00:43:01]: Why?
Soumith [00:43:02]: I mean, she's taking a path that most people don't care to take about in this climate and she probably will have very differentiated ideas. I mean, think about the correlation of ideas in AI right now. It's so bad, right? So everyone's fighting for the same pie. In some weird sense, that's partly why I don't really directly work on LLMs. I used to do image models and stuff and I actually stopped doing GANs because GANs were getting so hot that I didn't have any calibration of whether my work would be useful or not because, oh yeah, someone else did the same thing you did. It's like, there's so much to do, I don't understand why I need to fight for the same pie. So I think Stella's decision is very smart.
Making Bets
Alessio [00:43:53]: And how do you reconcile that with how we started the discussion about intrinsic versus extrinsic kind of like accomplishment or success? How should people think about that especially when they're doing a PhD or early in their career? I think in Europe, I walked through a lot of the posters and whatnot, there seems to be mode collapse in a way in the research, a lot of people working on the same things. Is it worth for a PhD to not take a bet on something that is maybe not as interesting just because of funding and visibility and whatnot? Or yeah, what suggestions would you give?
Soumith [00:44:28]: I think there's a baseline level of compatibility you need to have with the field. Basically, you need to figure out if you will get paid enough to eat, right? Like whatever reasonable normal lifestyle you want to have as a baseline. So you at least have to pick a problem within the neighborhood of fundable. Like you wouldn't wanna be doing something so obscure that people are like, I don't know, like you can work on it.
Swyx [00:44:59]: Would a limit on fundability, I'm just observing something like three months of compute, right? That's the top line, that's the like max that you can spend on any one project.
Soumith [00:45:09]: But like, I think that's very ill specified, like how much compute, right? I think that the notion of fundability is broader. It's more like, hey, are these family of models within the acceptable set of, you're not crazy or something, right? Even something like neural or DS, which is a very boundary pushing thing or states-based models or whatever. Like all of these things I think are still in fundable territory. When you're talking about, I'm gonna do one of the neuromorphic models and then apply image classification to them or something, then it becomes a bit questionable. Again, it depends on your motivation. Maybe if you're a neuroscientist, it actually is feasible. But if you're an AI engineer, like the audience of these podcasts, then it's more questionable. The way I think about it is, you need to figure out how you can be in the baseline level of fundability just so that you can just live. And then after that, really focus on intrinsic motivation and depends on your strengths, like how you can play to your strengths and your interests at the same time. Like I try to look at a bunch of ideas that are interesting to me, but also try to play to my strengths. I'm not gonna go work on theoretical ML. I'm interested in it, but when I want to work on something like that, I try to partner with someone who is actually a good theoretical ML person and see if I actually have any value to provide. And if they think I do, then I come in. So I think you'd want to find that intersection of ideas you like, and that also play to your strengths. And I'd go from there. Everything else, like actually finding extrinsic success and all of that, I think is the way I think about it is like somewhat immaterial. When you're talking about building ecosystems and stuff, slightly different considerations come into play, but that's a different conversation.
Swyx [00:47:06]: We're gonna pivot a little bit to just talking about open source AI. But one more thing I wanted to establish for Meta is this 600K number, just kind of rounding out the discussion, that's for all Meta. So including your own inference needs, right? It's not just about training.
Soumith [00:47:19]: It's gonna be the number in our data centers for all of Meta, yeah.
Swyx [00:47:23]: Yeah, so there's a decent amount of workload serving Facebook and Instagram and whatever. And then is there interest in like your own hardware?
MTIA
Soumith [00:47:31]: We already talked about our own hardware. It's called MTIA. Our own silicon, I think we've even showed the standard photograph of you holding the chip that doesn't work. Like as in the chip that you basically just get like-
Swyx [00:47:51]: As a test, right?
Soumith [00:47:52]: Yeah, a test chip or whatever. So we are working on our silicon and we'll probably talk more about it when the time is right, but-
Swyx [00:48:00]: Like what gaps do you have that the market doesn't offer?
Soumith [00:48:04]: Okay, I mean, this is easy to answer. So basically, remember how I told you about there's this memory hierarchy and like sweet spots and all of that? Fundamentally, when you build a hardware, you make it general enough that a wide set of customers and a wide set of workloads can use it effectively while trying to get the maximum level of performance they can. The more specialized you make the chip, the more hardware efficient it's going to be, the more power efficient it's gonna be, the more easier it's going to be to find the software, like the kernel's right to just map that one or two workloads to that hardware and so on. So it's pretty well understood across the industry that if you have a sufficiently large volume, enough workload, you can specialize it and get some efficiency gains, like power gains and so on. So the way you can think about everyone building, every large company building silicon, I think a bunch of the other large companies are building their own silicon as well, is they, each large company has a sufficient enough set of verticalized workloads that can be specialized that have a pattern to them that say a more generic accelerator like an NVIDIA or an AMD GPU does not exploit. So there is some level of power efficiency that you're leaving on the table by not exploiting that. And you have sufficient scale and you have sufficient forecasted stability that those workloads will exist in the same form, that it's worth spending the time to build out a chip to exploit that sweet spot. Like obviously something like this is only useful if you hit a certain scale and that your forecasted prediction of those kind of workloads being in the same kind of specializable exploitable way is true. So yeah, that's why we're building our own chips.
Swyx [00:50:08]: Awesome.
Open Source AI
Alessio [00:50:09]: Yeah, I know we've been talking a lot on a lot of different topics and going back to open source, you had a very good tweet. You said that a single company's closed source effort rate limits against people's imaginations and needs. How do you think about all the impact that some of the Meta AI work in open source has been doing and maybe directions of the whole open source AI space?
Soumith [00:50:32]: Yeah, in general, I think first, I think it's worth talking about this in terms of open and not just open source, because like with the whole notion of model weights, no one even knows what source means for these things. But just for the discussion, when I say open source, you can assume it's just I'm talking about open. And then there's the whole notion of licensing and all that, commercial, non-commercial, commercial with clauses and all that. I think at a fundamental level, the most benefited value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative things in a way that's very accessible. Maybe it's open source, but it has a commercial license and I'm a student in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me. Like I got this thing in a very accessible way. And then it's various degrees, right? And then if it's open source, but it's actually a commercial license, then a lot of companies are gonna benefit from gaining value that they didn't previously have, that they maybe had to pay a closed source company for it. So open source is just a very interesting tool that you can use in various ways. So there's, again, two kinds of open source. One is some large company doing a lot of work and then open sourcing it. And that kind of effort is not really feasible by say a band of volunteers doing it the same way. So there's both a capital and operational expenditure that the large company just decided to ignore and give it away to the world for some benefits of some kind. They're not as tangible as direct revenue. So in that part, Meta has been doing incredibly good things. They fund a huge amount of the PyTorch development. They've open sourced Llama and those family of models and several other fairly transformative projects. FICE is one, Segment Anything, Detectron, Detectron 2. Dense Pose. I mean, it's-
Swyx [00:52:52]: Seamless. Yeah, seamless.
Soumith [00:52:53]: Like it's just the list is so long that we're not gonna cover. So I think Meta comes into that category where we spend a lot of CapEx and OpEx and we have a high talent density of great AI people and we open our stuff. And the thesis for that, I remember when FAIR was started, the common thing was like, wait, why would Meta wanna start a open AI lab? Like what exactly is a benefit from a commercial perspective? And for then the thesis was very simple. It was AI is currently rate limiting Meta's ability to do things. Our ability to build various product integrations, moderation, various other factors. Like AI was the limiting factor and we just wanted AI to advance more and we didn't care if the IP of the AI was uniquely in our possession or not. However the field advances, that accelerates Meta's ability to build a better product. So we just built an open AI lab and we said, if this helps accelerate the progress of AI, that's strictly great for us. But very easy, rational, right? Still the same to a large extent with the Llama stuff. And it's the same values, but the argument, it's a bit more nuanced. And then there's a second kind of open source, which is, oh, we built this project, nights and weekends and we're very smart people and we open sourced it and then we built a community around it. This is the Linux kernel and various software projects like that. So I think about open source, like both of these things being beneficial and both of these things being different. They're different and beneficial in their own ways. The second one is really useful when there's an active arbitrage to be done. If someone's not really looking at a particular space because it's not commercially viable or whatever, like a band of volunteers can just coordinate online and do something and then make that happen. And that's great.
Open Source LLMs
I wanna cover a little bit about open source LLMs maybe. So open source LLMs have been very interesting because I think we were trending towards an increase in open source in AI from 2010 all the way to 2017 or something. Like where more and more pressure within the community was to open source their stuff so that their methods and stuff get adopted. And then the LLMs revolution kind of took the opposite effect OpenAI stopped open sourcing their stuff and DeepMind kind of didn't, like all the other cloud and all these other providers, they didn't open source their stuff. And it was not good in the sense that first science done in isolation probably will just form its own bubble where people believe their own b******t or whatever. So there's that problem. And then there was the other problem which was the accessibility part. Like, okay, I again always go back to I'm a student in India with no money. What is my accessibility to any of these closers models? At some scale I have to pay money. That makes it a non-starter and stuff. And there's also the control thing. I strongly believe if you want human aligned stuff, you want all humans to give feedback. And you want all humans to have access to that technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley, I see a different cultural bubble. Like all the friends I hang out with talk about some random thing like Dyson Spheres or whatever, that's a thing. And most of the world doesn't know or care about any of this stuff. It's definitely a bubble and bubbles can form very easily. And when you make a lot of decisions because you're in a bubble, they're probably not globally optimal decisions. So I think open source, the distribution of open source powers a certain kind of non-falsifiability that I think is very important. I think on the open source models, like it's going great in the fact that LoRa I think came out of the necessity of open source models needing to be fine-tunable in some way. Yeah, and I think DPO also came out of the academic open source side of things. So do any of the closed source labs, did any of them already have LoRa or DPO internally? Maybe, but that does not advance humanity in any way. It advances some companies probability of doing the winner takes all that I talked about earlier in the podcast.
Open Source and Trust
I don't know, it just feels fundamentally good. Like when people try to, you know, people are like, well, what are the ways in which it is not okay? I find most of these arguments, and this might be a little controversial, but I find a lot of arguments based on whether closed source models are safer or open source models are safer very much related to what kind of culture they grew up in, what kind of society they grew up in. If they grew up in a society that they trusted, then I think they take the closed source argument. And if they grew up in a society that they couldn't trust, where the norm was that you didn't trust your government, obviously it's corrupt or whatever, then I think the open source argument is what they take. I think there's a deep connection to like people's innate biases from their childhood and their trust in society and governmental aspects that push them towards one opinion or the other. And I'm definitely in the camp of open source is definitely going to actually have better outcomes for society. Closed source to me just means that centralization of power, which, you know, is really hard to trust. So I think it's going well in so many ways that we're actively disaggregating the centralization of power to just two or three providers. We are, I think, benefiting from so many people using these models in so many ways that aren't allowed by, say, Silicon Valley left-wing tropes. Like some of these things are good or bad, but they're not culturally accepted universally in the world. So those are things worth thinking about. And I think open source is not winning in certain ways. Like these are all the things in which like, as I mentioned, it's actually being very good and beneficial and winning.
Feedback to solve the Open Source Coordination problem
I think one of the ways in which it's not winning, at some point I should write a long-form post about this, is I think it has a classic coordination problem. I mean, open source in general always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source. And so now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is because these models are getting better based on human feedback. And if you see with open source models, if you go to Reddit, local llama, subreddit, like there's so many variations of models that are being produced from, say, nose research. I mean, there's so many variations built by so many people. And one common theme is they're all using these fine-tuning or human preferences datasets that are very limited. And like someone published them somewhere and they're not sufficiently diverse. And you look at the other side, say front-ends like Uba or Hugging Chat or Ollama, they don't really have like feedback buttons. Like all the people using all these front-ends, they probably want to give feedback, but there's no way for them to give feedback. So these models are being built, they're being arbitrarily measured, and then they are being deployed into all these open source front-ends or like apps that are closed source, they're serving open source models. And these front-ends don't have, they are not exposing the ability to give feedback. So we're just losing all of this feedback. Maybe open source models are being as used as GPT is at this point in like all kinds of, in a very fragmented way, in aggregate all the open source models together are probably being used as much as GPT is, maybe close to that. But the amount of feedback that is driving back into the open source ecosystem is negligible, maybe less than 1% of the usage. So I think the blueprint here I think is you'd want someone to create a sinkhole for the feedback, some centralized sinkhole, maybe Hugging Face or someone just funds like, okay, I will make available a call to log a string along with a bit of information of positive or negative or something that. And then you would want to send pull requests to all the open source front-ends like Ooba and all being like, hey, we're just integrating a feedback UI and then work with the closed source people as also being like, look, it doesn't cost you anything, just have a button. And then the sinkhole will have a bunch of this data coming in. And then I think a bunch of open source researchers should figure out how to filter their feedback into only the high quality one. I'm sure it will be exploited by spam bots or whatever, right? Like, this is the perfect way to inject your advertising product into the next. So there needs to be some level of that, that in the same way, I'm sure all the closed providers are doing today, like OpenAI, Claude, the feedback that comes in, I'm sure they are figuring out if that's legit or not. That kind of data filtering needs to be done. And that loop has to be set up. And this requires that central sinkhole and that data cleaning effort both to be there. They're not there right now. They're not there right now, I think for capital reasons, but also for coordination reasons. Okay, if that central sinkhole is there, who's gonna go coordinate all of this integration across all of these open source front ends. But I think if we do that, if that actually happens, I think that probably has a real chance of the open source models having a runaway effect against OpenAI with their current daily active users. Probably doesn't have a chance against Google because you know, Google has Android and Chrome and Gmail and Google Docs and everything, you know? So people just use that a lot. But like, I think there's a clear chance we can take at truly winning open source.
AGI
Alessio [01:04:00]: Do you think this feedback is helpful to make open source models better or to get to like open source AGI? Because in a way like OpenAI's goal is to get to AGI, right? So versus I think in open source, we're more focused on personal better usage or like commercial better usage.
Soumith [01:04:17]: Yeah, I think that's a good question. But I think, I actually don't think people have a good understanding of AGI. And I don't mean definition level. I mean, people are like, okay, we're gonna, AGI means it's powering 40% of world economic output or something like that, right? But what does that mean? So do you think electricity is powering 40% of world economic output or is it not? Like generally the notion of powering X percent of economic output is not defined well at all for me to understand how to know when we got to AGI or how to measure whether we're getting AGI. Like, you know, you can look at it in terms of intelligence or task automation or whatever. I think that's what we are doing right now. We're basically integrating like the current set of AI technologies into so many real world use cases where we find value that if some new version of AI comes in, we can find, we can be like, ah, this helps me more. In that sense, I think the whole process of how we think we got to AGI will be continuous and not discontinuous like how I think the question is posed. So I think the open source thing will be very much in line with getting to AGI because open source has that natural selection effect. Like if a better open source model comes, really no one says, ha, I don't want to use it because there are ecosystem effect, I'm logged into my ecosystem or, I don't know if I like the models, you know, whatever. It's just a very pure direct thing. So if there's a better model that comes out, then it will be used. So I definitely think it has a good chance of achieving how I would think about it as a continuous path to what we might define as AGI.
OpenAssistant vs LMSys vs OpenRouter
Swyx [01:06:18]: For the listeners, I would actually mention a couple other maybe related notes on just this very interesting concept of feedback sinkhole for open source to really catch up in terms of the overall Google versus OpenAI debate. Open Assistant was led by Yannick Kilcher who recently ended his effort. I think the criticism there was like the kind of people that go to a specific website to give feedback is not representative of real world usage. And that's why the models trained on Open Assistant didn't really seem like they have caught on in the open source world. The two leading candidates in my mind are LMSYS out of UC Berkeley who have the LMSYS arena, which is being touted as one of the only ways, only reliable benchmarks anymore. I kind of call them non-parametric benchmarks because there's nothing to cheat on it except for ELO. And then the other one is OpenRouter, which is Alex Atala's thing. I don't know if you've talked to any of these people.
Soumith [01:07:11]: I obviously know all of the efforts that you talked about. I haven't talked to them directly about this yet. But the way I think about it is the way these models are going to be used is always going to be way more distributed than centralized. Like, which is the power of the open source movement. Like the UI within which these models are going to be used is going to be decentralized. These models are going to be integrated into hundreds and thousands of projects and products and all of that. And I think that is important to recognize. Like the LMSYS leaderboard is the best thing we have right now to understand whether a model is better or not versus another model. But it's also biased in only having a sliver of view into how people actually use these models. Like the people who actually end up coming to the LMSYS leaderboard and then using a model only use it for certain things. Like GitHub Copilot style usage is not captured in say LMSYS things. And so many other styles, like the character AI style things is not captured in LMSYS.
Swyx [01:08:19]: Which OpenRouter could do. They don't do it right now, but.
Soumith [01:08:22]: Yeah, so my point is like the way these models are going to be used is going to be always a large surface area. And I think we need to figure out how to provide the infrastructure to integrate with all these like ways in which it's being used. Even if you get the top hundred front ends that the model, like open source models are used through to subscribe to the sinkhole. I think that's already a substantial thing. I think thinking one or two things will by themselves get a lot of data I think is not going to happen.
Swyx [01:08:58]: Yeah, fair enough.
Other Modalities
Alessio [01:08:59]: Before we let you go, can we do just a quick beyond text segment? So you're an investor in Runway, which is a beta generation. You're an investor in One X, which is a humanoid assistant. Osmo, which is focused on using AI for smell recognition and synthesis. You advise a bunch of robotics projects at NYU.
Swyx [01:09:19]: Maybe. And he builds his own home robot. Yeah, exactly.
Alessio [01:09:22]: On a more, yeah, maybe open editing. What are the things that you're most excited about beyond text generation and kind of the more mundane usage?
Soumith [01:09:30]: Yeah, I mean, in general, I have more things I'm generally excited about than I can possibly do. Investing is one way to try to clear those urges. I'm generally excited about robotics being a possibility, home robotics being five to seven years away into commercialization. I think it's not next year or two years from now, but five to seven years from now, I think a lot more robotics companies might pop out. There's not a good consensus on whether hardware is a bottleneck or AI is a bottleneck in robotics right now. My view is actually hardware is still the bottleneck and AI is also a little bit of bottleneck, but I don't think there's any obvious breakthroughs we need. I think it's just work. So I'm generally excited about robotics. I spend a lot of personal time. I spend every Wednesday afternoon at NYU working with Lerrel Pinto and team and just getting towards my home robot that just does my dishes and stuff.
Swyx [01:10:38]: What's the status of it? Like what does it do for you now?
Soumith [01:10:41]: As of today, we just deployed a couple of months ago, we deployed our home robotics stuff into several tens of New York City homes and tried to make it do a bunch of tasks. And we're basically starting to build out a framework that gets to a certain level of robustness on fairly simple tasks, like picking this cup and putting it somewhere else or taking a few pieces of cloth on the ground and put it somewhere else or open your microwave and various baseline tasks that with low sample complexity. So I think one of the things people don't spend a lot of time in robotics is the user experience, which I think in the research I do at NYU, we spend a huge amount of time on. I think the key there is sample complexity has to be really low. A lot of the current robotics research, if you see they're like, oh yeah, we collected 50 demos and now it's able to do this task or we collected 300 demos or the number of samples you need for this thing to do the task is really high. So we're focusing a lot on, you show it two or three times and that's sufficient for it to actually do the task, but it comes with less generalization, right? Like there's some initial conditions that have to be true for it to do the task. So we're making progress. That's very interesting in general, the space. I don't think people in this space have settled on the hardware, like how the hardware looks like for it to be truly useful in the home or whatever, or the UX or the like AI, ML stuff needed to make it sample efficient and all of that. But I think lots of work is happening in the field.
Alessio [01:12:28]: Yeah, one of my friends, Carlo at Berkeley, he worked on a project called M3L, which is two CNNs, one for tactile feedback and one for image. When you say hardware, is it running all these things on the edge or is it just like the actual servos and the-
Soumith [01:12:45]: By hardware, I mean the actual servos, like the motors, servos, even the sensors. I think we have incredible vision that's still it's so much better compared to in the field of view and in resolution compared to any of the cameras we can buy. We have, our skin is all available touch sensing and we have some of the most efficient, some of the most high capacity motors that can lift large loads in the dexterity of a hand and stuff. So in terms of hardware, I mean in terms of those capabilities, we haven't figured out how to do a lot of this stuff. I mean, Tesla has been making incredible progress. One X, I think announced their new thing that looks incredible. Some of the other companies figure and others are doing great work. But we're really not anywhere close to the hardware that we feel like we need. And there's obviously the other thing I want to call out is a lot of what people show works, but has to be fixed all the time. And like, that's the other thing we are incredible at. Like we don't need any maintenance or the maintenance is part of us. If you buy a product, electronics product of any kind, you buy a PS5, you don't say, oh yeah, my PS5 breaks every six days and I have to do some reasonable amount of work on it. But that's robotics. Like if it's not industrial robotics where it's very controlled and specialized or whatever, you're talking about reliability in those ranges. So I think people don't talk about the reliability thing enough. Like what I mean, we're going to enter the commercialization phase. I mean, we're going to start thinking about, okay, now we have this thing and we need to figure out how to get reliability high enough to deploy it into homes and just sell it to people and Best Buy or something. So that's the other factor that we have to make a lot of progress on.
Swyx [01:14:44]: I just realized that Google has a play in this with Palm E and stuff and OpenAI obviously has a long history of doing this stuff. Is there anything at Meta? No robotics stuff in Meta?
Soumith [01:14:55]: We have a small robotics program at Meta out of FAIR. I actually used to do it at FAIR a little bit before I moved into Infra and focused on my Meta time on a lot of other infrastructural stuff. So yeah, Meta's robotics program is a lot smaller.
Swyx [01:15:10]: Seems like it would be a personal computing.
Soumith [01:15:14]: You could think of it as like, Meta has a ridiculously large device strategy, right? Like, you know, this is how our reality labs stuff. You know, we're going at it from VR and AR and, you know, we showcase a lot of that stuff. I think for Meta, the robot is not as important as like the physical device. Physical devices kind of stuff.
Osmo - smell AI
Swyx [01:15:37]: Yeah, for sure. Yeah. Okay, I want to touch on Osmo a bit because very unusual company to the stuff that we normally discuss, not robotics, sense of smell. The original pitch I heard from the founder, maybe you can correct me, is that he realized that you can smell cancer. Yeah. Is that intuitive? Is that what you get? Or is that the potential that you see?
Soumith [01:15:56]: The very interesting reason I invested in Osmo is because Alex Wiltschko, the founder of Osmo, before PyTorch, there was Torch. And Alex Wiltschko actually worked on Torch. He's actually a frameworks guy. Like, you know, he built this thing called Tangent from Google, another autodiff framework and stuff. I know him from that side of things. And then, he is a neurobiologist by training. He just happens to also love, neural networks and hacking on those frameworks. So incredibly smart guy, one of the smartest people I know. So when he was going in this direction, I thought it was incredible that smell is something that we haven't even started to scrape in terms of digitization. When we think about audio or images or video, they're so advanced. So we have the concept of color spaces. We have the concept of frequency spectrums. Like, you know, we figured out how ears process, like, frequencies in mouse spectrum or whatever logarithmically scaled. Images for RGB, YUV. We have so many different kinds of parameterizations. We have formalized these two senses ridiculously well. Touch and smell, nada. We're where we were with images in, say, in 1920 or maybe even the 1800s, right? That's where we're at. And Alex has this incredible vision of, like, having a smell sensor just eventually just be part of your daily life. Like, as of today, you don't really think about when you're watching an Instagram reel or something, huh, I also would love to know what it smelled like, you know, when you're watching a reel of a food or something. You don't, because we really haven't, as a society, got that muscle to even understand what a smell sensor can do. I think the more near-term effects are obviously going to be around things that provide more obvious utility in the short term, like maybe smelling cancer or repelling mosquitoes better, or, you know, stuff like that.
Swyx [01:18:12]: More recently, he's been talking about categorizing perfumes, obviously. Yeah, exactly. That's a market that you can pursue.
Soumith [01:18:17]: Yeah, like, I mean, think about how you can customize a perfume to your own liking in the same way you can customize a shoe or something, right? I think all the near-term stuff, I think if he's able to figure out a near-term value for it, they, as a company, can sustain themselves to then eventually try to make progress on the long term, which is really in uncharted territory. Like, think about it, 50 years from now, it would be pretty obvious to kids of the generation to just, like, I was going to say scroll a reel on their phone, and maybe phones wouldn't be there.
Swyx [01:18:58]: They're just on their glasses, they're watching something.
Soumith [01:18:58]: Yeah, I think VR would be. And then, like, they immediately get a smell sense of that remote experience as well. We haven't really progressed enough in that dimension, and I think they have a chance to do it.
Alessio [01:19:13]: Awesome, I mean, we touched on a lot of things. Anything, we're missing anything you want to direct people to, or?
Swyx [01:19:19]: Yeah, call to action. Yeah. Call for research, call for startups.
Soumith [01:19:22]: I don't really have a lot of calls to action, because usually I think people should be intrinsically, like, figuring it out.
Swyx [01:19:29]: That's a good look inside yourself. Yeah. That's good.
Alessio [01:19:33]: Awesome, thank you so much for coming on.
Swyx [01:19:35]: Yeah, for sure. This was great.

Get full access to Latent Space at www.latent.space/subscribe

6 March 2024, 6:40 pm
1 hour 10 minutes

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate

This Friday we’re doing a special crossover event in SF with of SemiAnalysis (previous guest!), and we will do a live podcast on site. RSVP here.
Also join us on June 25-27 for the biggest AI Engineer conference of the year!
Replicate is one of the most popular AI inference providers, reporting over 2 million users as of their $40m Series B with a16z. But how did they get there?
The Definitive Replicate Story (warts and all)
Their overnight success took 5 years of building, and it all started with arXiv Vanity, which was a 2017 vacation project that scrapes arXiv PDFs and re-renders them into semantic web pages that reflow nicely with better typography and whitespace.
From there, Ben and Andreas’ idea was to build tools to make ML research more robust and reproducible by making it easy to share code artefacts alongside papers. They had previously created Fig, which made it easy to spin up dev environments; it was eventually acquired by Docker and turned into `docker-compose`, the industry standard way to define services from containerized applications.
2019: Cog
The first iteration of Replicate was a Fig-equivalent for ML workloads which they called Cog; it made it easy for researchers to package all their work and share it with peers for review and reproducibility.
But they found that researchers were terrible users: they’d do all this work for a paper, publish it, and then never return to it again.
“We talked to a bunch of researchers and they really wanted that.... But how the hell is this a business, you know, like how are we even going to make any money out of this?
…So we went and talked to a bunch of companies trying to sell them something which didn't exist. So we're like, hey, do you want a way to share research inside your company so that other researchers or say like the product manager can test out the machine learning model? They're like, maybe. Do you want like a deployment platform for deploying models? Do you want a central place for versioning models? We were trying to think of lots of different products we could sell that were related to this thing…
So we then got halfway through our YC batch. We hadn't built a product. We had no users. We had no idea what our business was going to be because we couldn't get anybody to like buy something which didn't exist. And actually there was quite a way through our, I think it was like two thirds the way through our YC batch or something. And we're like, okay, well we're kind of screwed now because we don't have anything to show at demo day.”
The team graduated YCombinator with no customers, no product and nothing to demo - which was fine because demo day got canceled as the YC W’20 class graduated right into the pandemic. The team spent the next year exploring and building Covid tools.
2021: CLIP + GAN = PixRay
By 2021, OpenAI released CLIP. Overnight dozens of Discord servers got spun up to hack on CLIP + GANs. Unlike academic researchers, this community was constantly releasing new checkpoints and builds of models.
PixRay was one of the first models being built on Replicate, and it quickly started taking over the community. Chris Dixon has a famous 2010 post titled “The next big thing will start out looking like a toy”; image generation would have definitely felt like a toy in 2021, but it gave Replicate its initial boost.
2022: Stable Diffusion
In August 2022 Stable Diffusion came out, and all the work they had been doing to build this infrastructure for CLIP / GANs models became the best way for people to share their StableDiffusion fine-tunes:
And like the first week we saw people making animation models out of it. We saw people make game texture models that use circular convolutions to make repeatable textures. We saw a few weeks later, people were fine tuning it so you could put your face in these models and all of these other ways. […] So tons of product builders wanted to build stuff with it. And we were just sitting in there in the middle, as the interface layer between all these people who wanted to build, and all these machine learning experts who were building cool models. And that's really where it took off. Incredible supply, incredible demand, and we were just in the middle.
(Stable Diffusion also spawned Latent Space as a newsletter)
The landing page paved the cowpath for the intense interest in diffusion model APIs.
2023: Llama & other multimodal LLMs
By 2023, Replicate’s growing visibility in the Stable Diffusion indie hacker community came from top AI hackers like Pieter Levels and Danny Postmaa, each making millions off their AI apps:
Meta then released LLaMA 1 and 2 (our coverage of it), greatly pushing forward the SOTA open source model landscape. Demand for text LLMs and other modalities rose, and Replicate broadened its focus accordingly, culminating in a $18m Series A and $40m Series B from a16z (at a $350m valuation).
Building standards for the AI world
Now that the industry is evolving from toys to enterprise use cases, all these companies are working to set standards for their own space. We cover this at ~45 mins in the podcast. Some examples:
* LangChain has been trying to establish "chain” as the standard mental models when putting multiple prompts and models together, and the “LangChain Expression Language” to go with it. (Our episode with Harrison)
* LLamaHub for packaging RAG utilities. (Our episode with Jerry)
* Ollama’s Modelfile to define runtimes for different model architectures. These are usually targeted at local inference.
* Cog (by Replicate) to create environments to which you can easily attach CUDA devices and make it easy to spin up inference on remote servers.
* GGUF as the filetype ggml-based executors.
None of them have really broken out yet, but this is going to become a fiercer competition as the market matures.
Full Video Podcast
As a reminder, all Latent Space pods now come in full video on our YouTube, with bonus content that we cut for time!
Show Notes
* Ben Firshman
* Replicate
* Free $10 credit for Latent Space readers
* Andreas Jansson (Ben’s co-founder)
* Charlie Holtz (Replicate’s Hacker in Residence)
* Fig (now Docker Compose)
* Command Line Interface Guidelines (clig)
* Apple Human Interface Guidelines
* arXiv Vanity
* Open Interpreter
* PixRay
* SF Compute
* Big Sleep by Advadnoun
* VQGAN-CLIP by Rivers Have Wings
Timestamps
* [00:00:00] Introductions
* [00:01:17] Low latency is all you need
* [00:04:08] Evolution of CLIs
* [00:05:59] How building ArxivVanity led to Replicate
* [00:11:37] Making ML research replicable with containers
* [00:17:22] Doing YC in 2020 and pivoting to tools for COVID
* [00:20:22] Launching the first version of Replicate
* [00:25:51] Embracing the generative image community
* [00:28:04] Getting reverse engineered into an API product
* [00:31:25] Growing to 2 million users
* [00:34:29] Indie vs Enterprise customers
* [00:37:09] How Unsplash uses Replicate
* [00:38:29] Learnings from Docker that went into Cog
* [00:45:25] Creating AI standards
* [00:50:05] Replicate's compute availability
* [00:53:55] Fixing GPU waste
* [01:00:39] What's open source AI?
* [01:04:46] Building for AI engineers
* [01:06:41] Hiring at Replicate
This summary covers the full range of topics discussed throughout the episode, providing a comprehensive overview of the content and insights shared.
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:14]: Hey, and today we have Ben Firshman in the studio. Welcome Ben.
Ben [00:00:18]: Hey, good to be here.
Swyx [00:00:19]: Ben, you're a co-founder and CEO of Replicate. Before that, you were most notably founder of Fig, which became Docker Compose. You also did a couple of other things before that, but that's what a lot of people know you for. What should people know about you that, you know, outside of your, your sort of LinkedIn profile?
Ben [00:00:35]: Yeah. Good question. I think I'm a builder and tinkerer, like in a very broad sense. And I love using my hands to make things. So like I work on, you know, things may be a bit closer to tech, like electronics. I also like build things out of wood and I like fix cars and I fix my bike and build bicycles and all this kind of stuff. And there's so much, I think I've learned from transferable skills, from just like working in the real world to building things, building things in software. And you know, it's so much about being a builder, both in real life and, and in software that crosses over.
Swyx [00:01:11]: Is there a real world analogy that you use often when you're thinking about like a code architecture or problem?
Ben [00:01:17]: I like to build software tools as if they were something real. So I wrote this thing called the command line interface guidelines, which was a bit like sort of the Mac human interface guidelines, but for command line interfaces, I did it with the guy I created Docker Compose with and a few other people. And I think something in there, I think I described that your command line interface should feel like a big iron machine where you pull a lever and it goes clunk and like things should respond within like 50 milliseconds as if it was like a real life thing. And like another analogy here is like in the real life, you know, when you press a button on an electronic device and it's like a soft switch and you press it and nothing happens and there's no physical feedback of anything happening, then like half a second later, something happens. Like that's how a lot of software feels, but instead like software should feel more like something that's real where you touch, you pull a physical lever and the physical lever moves, you know, and I've taken that lesson of kind of human interface to, to software a ton. You know, it's all about kind of low latency of feeling, things feeling really solid and robust, both the command lines and, and user interfaces as well.
Swyx [00:02:22]: And how did you operationalize that for Fig or Docker?
Ben [00:02:27]: A lot of it's just low latency. Actually, we didn't do it very well for Fig in the first place. We used Python, which was a big mistake where Python's really hard to get booting up fast because you have to load up the whole Python runtime before it can run anything. Okay. Go is much better at this where like Go just instantly starts.
Swyx [00:02:45]: You have to be under 500 milliseconds to start up?
Ben [00:02:48]: Yeah, effectively. I mean, I mean, you know, perception of human things being immediate is, you know, something like a hundred milliseconds. So anything like that is, is yeah, good enough.
Swyx [00:02:57]: Yeah. Also, I should mention, since we're talking about your side projects, well, one thing is I am maybe one of a few fellow people who have actually written something about CLI design principles because I was in charge of the Netlify CLI back in the day and had many thoughts. One of my fun thoughts, I'll just share it in case you have thoughts, is I think CLIs are effectively starting points for scripts that are then run. And the moment one of the script's preconditions are not fulfilled, typically they end. So the CLI developer will just exit the program. And the way that I designed, I really wanted to create the Netlify dev workflow was for it to be kind of a state machine that would resolve itself. If it detected a precondition wasn't fulfilled, it would actually delegate to a subprogram that would then fulfill that precondition, asking for more info or waiting until a condition is fulfilled. Then it would go back to the original flow and continue that. I don't know if that was ever tried or is there a more formal definition of it? Because I just came up with it randomly. But it felt like the beginnings of AI in the sense that when you run a CLI command, you have an intent to do something and you may not have given the CLI all the things that it needs to do, to execute that intent. So that was my two cents.
Ben [00:04:08]: Yeah, that reminds me of a thing we sort of thought about when writing the CLI guidelines, where CLIs were designed in a world where the CLI was really a programming environment and it's primarily designed for machines to use all of these commands and scripts. Whereas over time, the CLI has evolved to humans. It was back in a world where the primary way of using computers was writing shell scripts effectively. We've transitioned to a world where actually humans are using CLI programs much more than they used to. And the current sort of best practices about how Unix was designed, there's lots of design documents about Unix from the 70s and 80s, where they say things like, command line commands should not output anything on success. It should be completely silent, which makes sense if you're using it in a shell script. But if a user is using that, it just looks like it's broken. If you type copy and it just doesn't say anything, you assume that it didn't work as a new user. I think what's really interesting about the CLI is that it's actually a really good, to your point, it's a really good user interface where it can be like a conversation, where it feels like you're, instead of just like you telling the computer to do this thing and either silently succeeding or saying, no, you did, failed, it can guide you in the right direction and tell you what your intent might be, and that kind of thing in a way that's actually, it's almost more natural to a CLI than it is in a graphical user interface because it feels like this back and forth with the computer, almost funnily like a language model. So I think there's some interesting intersection of CLIs and language models actually being very sort of closely related and a good fit for each other.
Swyx [00:05:59]: Yeah, I'll say one of the surprises from last year, I worked on a coding agent, but I think the most successful coding agent of my cohort was Open Interpreter, which was a CLI implementation. And I have chronically, even as a CLI person, I have chronically underestimated the CLI as a useful interface. You also developed ArchiveVanity, which you recently retired after a glorious seven years.
Ben [00:06:22]: Something like that.
Swyx [00:06:23]: Which is nice, I guess, HTML PDFs.
Ben [00:06:27]: Yeah, that was actually the start of where Replicate came from. Okay, we can tell that story. So when I quit Docker, I got really interested in science infrastructure, just as like a problem area, because it is like science has created so much progress in the world. The fact that we're, you know, can talk to each other on a podcast and we use computers and the fact that we're alive is probably thanks to medical research, you know. But science is just like completely archaic and broken and it's like 19th century processes that just happen to be copied to the internet rather than take into account that, you know, we can transfer information at the speed of light now. And the whole way science is funded and all this kind of thing is all kind of very broken. And there's just so much potential for making science work better. And I realized that I wasn't a scientist and I didn't really have any time to go and get a PhD and become a researcher, but I'm a tool builder and I could make existing scientists better at their job. And if I could make like a bunch of scientists a little bit better at their job, maybe that's the kind of equivalent of being a researcher. So one particular thing I dialed in on is just how science is disseminated in that all of these PDFs, quite often behind paywalls, you know, on the internet.
Swyx [00:07:34]: And that's a whole thing because it's funded by national grants, government grants, then they're put behind paywalls. Yeah, exactly.
Ben [00:07:40]: That's like a whole, yeah, I could talk for hours about that. But the particular thing we got dialed in on was, interestingly, these PDFs are also, there's a bunch of open science that happens as well. So math, physics, computer science, machine learning, notably, is all published on the archive, which is actually a surprisingly old institution.
Swyx [00:08:00]: Some random Cornell.
Ben [00:08:01]: Yeah, it was just like somebody in Cornell who started a mailing list in the 80s. And then when the web was invented, they built a web interface around it. Like it's super old.
Swyx [00:08:11]: And it's like kind of like a user group thing, right? That's why they're all these like numbers and stuff.
Ben [00:08:15]: Yeah, exactly. Like it's a bit like something, yeah. That's where all basically all of math, physics and computer science happens. But it's still PDFs published to this thing. Yeah, which is just so infuriating. The web was invented at CERN, a physics institution, to share academic writing. Like there are figure tags, there are like author tags, there are heading tags, there are site tags. You know, hyperlinks are effectively citations because you want to link to another academic paper. But instead, you have to like copy and paste these things and try and get around paywalls. Like it's absurd, you know. And now we have like social media and things, but still like academic papers as PDFs, you know. This is not what the web was for. So anyway, I got really frustrated with that. And I went on vacation with my old friend Andreas. So we were, we used to work together in London on a startup, at somebody else's startup. And we were just on vacation in Greece for fun. And he was like trying to read a machine learning paper on his phone, you know, like we had to like zoom in and like scroll line by line on the PDF. And he was like, this is f*****g stupid. So I was like, I know, like this is something we discovered our mutual hatred for this, you know. And we spent our vacation sitting by the pool, like making latex to HTML, like converters, making the first version of Archive Vanity. Anyway, that was up then a whole thing. And the story, we shut it down recently because they caught the eye of Archive. They were like, oh, this is great. We just haven't had the time to work on this. And what's tragic about the Archive, it's like this project of Cornell that's like, they can barely scrounge together enough money to survive. I think it might be better funded now than it was when we were, we were collaborating with them. And compared to these like scientific journals, it's just that this is actually where the work happens. But they just have a fraction of the money that like these big scientific journals have, which is just so tragic. But anyway, they were like, yeah, this is great. We can't afford to like do it, but do you want to like as a volunteer integrate arXiv Vanity into arXiv?
Swyx [00:10:05]: Oh, you did the work.
Ben [00:10:06]: We didn't do the work. We started doing the work. We did some. I think we worked on this for like a few months to actually get it integrated into arXiv. And then we got like distracted by Replicate. So a guy called Dan picked up the work and made it happen. Like somebody who works on one of the, the piece of the libraries that powers arXiv Vanity. Okay.
Swyx [00:10:26]: And the relationship with arXiv Sanity?
Ben [00:10:28]: None.
Swyx [00:10:30]: Did you predate them? I actually don't know the lineage.
Ben [00:10:32]: We were after, we both were both users of arXiv Sanity, which is like a sort of arXiv...
Ben [00:10:37]: Which is Andre's RecSys on top of arXiv.
Ben [00:10:40]: Yeah. Yeah. And we were both users of that. And I think we were trying to come up with a working name for arXiv and Andreas just like cracked a joke of like, oh, let's call it arXiv Vanity. Let's make the papers look nice. Yeah. Yeah. And that was the working name and it just stuck.
Swyx [00:10:52]: Got it.
Ben [00:10:53]: Got it.
Alessio [00:10:54]: Yeah. And then from there, tell us more about why you got distracted, right? So Replicate, maybe it feels like an overnight success to a lot of people, but you've been building this since 2019. Yeah.
Ben [00:11:04]: So what prompted the start?
Alessio [00:11:05]: And we've been collaborating for even longer.
Ben [00:11:07]: So we created arXiv Vanity in 2017. So in some sense, we've been doing this almost like six, seven years now, a classic seven year.
Swyx [00:11:16]: Overnight success.
Ben [00:11:17]: Yeah. Yes. We did arXiv Vanity and then worked on a bunch of like surrounding projects. I was still like really interested in science publishing at that point. And I'm trying to remember, because I tell a lot of like the condensed story to people because I can't really tell like a seven year history. So I'm trying to figure out like the right. Oh, we got room. The right length.
Swyx [00:11:35]: We want to nail the definitive Replicate story here.
Ben [00:11:37]: One thing that's really interesting about these machine learning papers is that these machine learning papers are published on arXiv and a lot of them are actual fundamental research. So like should be like prose describing a theory. But a lot of them are just running pieces of software that like a machine learning researcher made that did something, you know, it was like an image classification model or something. And they managed to make an image classification model that was better than the existing state of the art. And they've made an actual running piece of software that does image segmentation. And then what they had to do is they then had to take that piece of software and write it up as prose and math in a PDF. And what's frustrating about that is like if you want to. So this was like Andreas is, Andreas was a machine learning engineer at Spotify. And some of his job was like he did pure research as well. Like he did a PhD and he was doing a lot of stuff internally. But part of his job was also being an engineer and taking some of these existing things that people have made and published and trying to apply them to actual problems at Spotify. And he was like, you know, you get given a paper which like describes roughly how the model works. It's probably listing lots of crucial information. There's sometimes code on GitHub. More and more there's code on GitHub. But back then it was kind of relatively rare. But it's quite often just like scrappy research code and didn't actually run. And, you know, there was maybe the weights that were on Google Drive, but they accidentally deleted the weights of Google Drive, you know, and it was like really hard to like take this stuff and actually use it for real things. We just started talking together about like his problems at Spotify and I connected this back to my work at Docker as well. I was like, oh, this is what we created containers for. You know, we solved this problem for normal software by putting the thing inside a container so you could ship it around and it kept on running. So we were sort of hypothesizing about like, hmm, what if we put machine learning models inside containers so they could actually be shipped around and they could be defined in like some production ready formats and other researchers could run them to generate baselines and you could people who wanted to actually apply them to real problems in the world could just pick up the container and run it, you know. And we then thought this is quite whether it gets normally in this part of the story I skip forward to be like and then we created cog this container stuff for machine learning models and we created Replicate, the place for people to publish these machine learning models. But there's actually like two or three years between that. The thing we then got dialed into was Andreas was like, what if there was a CI system for machine learning? It's like one of the things he really struggled with as a researcher is generating baselines. So when like he's writing a paper, he needs to like get like five other models that are existing work and get them running.
Swyx [00:14:21]: On the same evals.
Ben [00:14:22]: Exactly, on the same evals so you can compare apples to apples because you can't trust the numbers in the paper.
Swyx [00:14:26]: So you can be Google and just publish them anyway.
Ben [00:14:31]: So I think this was coming from the thinking of like there should be containers for machine learning, but why are people going to use that? Okay, maybe we can create a supply of containers by like creating this useful tool for researchers. And the useful tool was like, let's get researchers to package up their models and push them to the central place where we run a standard set of benchmarks across the models so that you can trust those results and you can compare these models apples to apples and for like a researcher for Andreas, like doing a new piece of research, he could trust those numbers and he could like pull down those models, confirm it on his machine, use the standard benchmark to then measure his model and you know, all this kind of stuff. And so we started building that. That's what we applied to YC with, got into YC and we started sort of building a prototype of this. And then this is like where it all starts to fall apart. We were like, okay, that sounds great. And we talked to a bunch of researchers and they really wanted that and that sounds brilliant. That's a great way to create a supply of like models on this research platform. But how the hell is this a business, you know, like how are we even going to make any money out of this? And we're like, oh s**t, that's like the, that's the real unknown here of like what the business is. So we thought it would be a really good idea to like, okay, before we get too deep into this, let's try and like reduce the risk of this turning into a business. So let's try and like research what the business could be for this research tool effectively. So we went and talked to a bunch of companies trying to sell them something which didn't exist. So we're like, hey, do you want a way to share research inside your company so that other researchers or say like the product manager can test out the machine learning model? They're like, maybe. And we were like, do you want like a deployment platform for deploying models? Like, do you want like a central place for versioning models? Like we're trying to think of like lots of different like products we could sell that were like related to this thing. And terrible idea. Like we're not sales people and like people don't want to buy something that doesn't exist. I think some people can pull this off, but we were just like, you know, a bunch of product people, products and engineer people, and we just like couldn't pull this off. So we then got halfway through our YC batch. We hadn't built a product. We had no users. We had no idea what our business was going to be because we couldn't get anybody to like buy something which didn't exist. And actually there was quite a way through our, I think it was like two thirds the way through our YC batch or something. And we're like, okay, well we're kind of screwed now because we don't have anything to show at demo day. And then we then like tried to figure out, okay, what can we build in like two weeks that'll be something. So we like desperately tried to, I can't remember what we've tried to build at that point. And then two weeks before demo day, I just remember it was all, we were going down to Mountain View every week for dinners and we got called on to like an all hands Zoom call, which was super weird. We're like, what's going on? And they were like, don't come to dinner tomorrow. And we realized, we kind of looked at the news and we were like, oh, there's a pandemic going on. We were like so deep in our startup. We were just like completely oblivious to what was going on around us.
Swyx [00:17:20]: Was this Jan or Feb 2020?
Ben [00:17:22]: This was March 2020. March 2020. 2020.
Swyx [00:17:25]: Yeah. Because I remember Silicon Valley at the time was early to COVID. Like they started locking down a lot faster than the rest of the US.
Ben [00:17:32]: Yeah, exactly. And I remember, yeah, soon after that, like there was the San Francisco lockdowns and then like the YC batch just like stopped. There wasn't demo day and it was in a sense a blessing for us because we just kind of
Swyx [00:17:43]: In the normal course of events, you're actually allowed to defer to a future demo day. Yeah.
Ben [00:17:51]: So we didn't even take any defer because it just kind of didn't happen.
Swyx [00:17:55]: So was YC helpful?
Ben [00:17:57]: Yes. We completely screwed up the batch and that was our fault. I think the thing that YC has become incredibly valuable for us has been after YC. I think there was a reason why we couldn't, didn't need to do YC to start with because we were quite experienced. We had done some startups before. We were kind of well connected with VCs, you know, it was relatively easy to raise money because we were like a known quantity. You know, if you go to a VC and be like, Hey, I made this piece of-
Swyx [00:18:24]: It's Docker Compose for AI.
Ben [00:18:26]: Exactly. Yeah. And like, you know, people can pattern match like that and they can have some trust, you know what you're doing. Whereas it's much harder for people straight out of college and that's where like YC sweet spot is like helping people straight out of college who are super promising, like figure out how to do that.
Swyx [00:18:40]: No credentials.
Ben [00:18:41]: Yeah, exactly. We don't need that. But the thing that's been incredibly useful for us since YC has been, this was actually, I think, so Docker was a YC company and Solomon, the founder of Docker, I think told me this. He was like, a lot of people underestimate the value of YC after you finish the batch. And his biggest regret was like not staying in touch with YC. I might be misattributing this, but I think it was him. And so we made a point of that. And we just stayed in touch with our batch partner, who Jared at YC has been fantastic.
Ben [00:19:10]: Jared Friedman. All of like the team at YC, there was the growth team at YC when they were still there and they've been super helpful. And two things have been super helpful about that is like raising money, like they just know exactly how to raise money. And they've been super helpful during that process in all of our rounds, like we've done three rounds since we did YC and they've been super helpful during the whole process. And also just like reaching a ton of customers. So like the magic of YC is that you have all of, like there's thousands of YC companies, I think, on the order of thousands, I think. And they're all of your first customers. And they're like super helpful, super receptive, really want to like try out new things. You have like a warm intro to every one of them basically. And there's this mailing list where you can post about updates to your products, which is like really receptive. And that's just been fantastic for us. Like we've just like got so many of our users and customers through YC. Yeah.
Swyx [00:20:00]: Well, so the classic criticism or the sort of, you know, pushback is people don't buy you because you are both from YC. But at least they'll open the email. Right. Like that's the... Okay.
Ben [00:20:13]: Yeah. Yeah. Yeah.
Swyx [00:20:16]: So that's been a really, really positive experience for us. And sorry, I interrupted with the YC question. Like you were, you make it, you just made it out of the YC, survived the pandemic.
Ben [00:20:22]: I'll try and condense this a little bit. Then we started building tools for COVID weirdly. We were like, okay, we don't have a startup. We haven't figured out anything. What's the most useful thing we could be doing right now?
Swyx [00:20:32]: Save lives.
Ben [00:20:33]: So yeah. Let's try and save lives. I think we failed at that as well. We had a bunch of products that didn't really go anywhere. We kind of worked on, yeah, a bunch of stuff like contact tracing, which turned out didn't really be a useful thing. Sort of Andreas worked on like a door dash for like people delivering food to people who are vulnerable. What else did we do? The meta problem of like helping people direct their efforts to what was most useful and a few other things like that. It didn't really go anywhere. So we're like, okay, this is not really working either. We were considering actually just like doing like work for COVID. We have this decision document early on in our company, which is like, should we become a like government app contracting shop? We decided no.
Swyx [00:21:11]: Because you also did work for the gov.uk. Yeah, exactly.
Ben [00:21:14]: We had experience like doing some like-
Swyx [00:21:17]: And the Guardian and all that.
Ben [00:21:18]: Yeah. For like government stuff. And we were just like really good at building stuff. Like we were just like product people. Like I was like the front end product side and Andreas was the back end side. So we were just like a product. And we were working with a designer at the time, a guy called Mark, who did our early designs for Replicate. And we were like, hey, what if we just team up and like become and build stuff? And yeah, we gave up on that in the end for, I can't remember the details. So we went back to machine learning. And then we were like, well, we're not really sure if this is going to work. And one of my most painful experiences from previous startups is shutting them down. Like when you realize it's not really working and having to shut it down, it's like a ton of work and it's people hate you and it's just sort of, you know. So we were like, how can we make something we don't have to shut down? And even better, how can we make something that won't page us in the middle of the night? So we made an open source project. We made a thing which was an open source Weights and Biases, because we had this theory that like people want open source tools. There should be like an open source, like version control, experiment tracking like thing. And it was intuitive to us and we're like, oh, we're software developers and we like command line tools. Like everyone loves command line tools and open source stuff, but machine learning researchers just really didn't care. Like they just wanted to click on buttons. They didn't mind that it was a cloud service. It was all very visual as well, that you need lots of graphs and charts and stuff like this. So it wasn't right. Like it was right. We actually were building something that Andreas made at Spotify for just like saving experiments to cloud storage automatically, but other people didn't really want this. So we kind of gave up on that. And then that was actually originally called Replicate and we renamed that out of the way. So it's now called Keepsake and I think some people still use it. Then we sort of came back, we looped back to our original idea. So we were like, oh, maybe there was a thing in that thing we were originally sort of thinking about of like researchers sharing their work and containers for machine learning models. So we just built that. And at that point we were kind of running out of the YC money. So we were like, okay, this like feels good though. Let's like give this a shot. So that was the point we raised a seed round. We raised seed round. Pre-launch. We raised pre-launch and pre-team. It was an idea basically. We had a little prototype. It was just an idea and a team. But we were like, okay, like, you know, bootstrapping this thing is getting hard. So let's actually raise some money. Then we made Cog and Replicate. It initially didn't have APIs, interestingly. It was just the bit that I was talking about before of helping researchers share their work. So it was a way for researchers to put their work on a webpage such that other people could try it out and so that you could download the Docker container. We cut the benchmarks thing of it because we thought that was just like too complicated. But it had a Docker container that like, you know, Andreas in a past life could download and run with his benchmark and you could compare all these models apples to apples. So that was like the theory behind it. That kind of started to work. It was like still when like, you know, it was long time pre-AI hype and there was lots of interesting stuff going on, but it was very much in like the classic deep learning era. So sort of image segmentation models and sentiment analysis and all these kinds of things, you know, that people were using, that we're using deep learning models for. And we were very much building for research because all of this stuff was happening in research institutions, you know, the sort of people who'd be publishing to archive. So we were creating an accompanying material for their models, basically, you know, they wanted a demo for their models and we were creating a company material for it. What was funny about that is they were like not very good users. Like they were, they were doing great work obviously, but, but the way that research worked is that they, they just made like one thing every six months and they just fired and forget it, forgot it. Like they, they published this piece of paper and like, done, I've, I've published it. So they like output it to Replicate and then they just stopped using Replicate. You know, they were like once every six monthly users and that wasn't great for us, but we stumbled across this early community. This was early 2021 when OpenAI created this, created CLIP and people started smushing CLIP and GANs together to produce image generation models. And this started with, you know, it was just a bunch of like tinkerers on Discord, basically. There was an early model called Big Sleep by Advadnoun. And then there was VQGAN Clip, which was like a bit more popular by Rivers Have Wings. And it was all just people like tinkering on stuff in Colabs and it was very dynamic and it was people just making copies of co-labs and playing around with things and forking in. And to me this, I saw this and I was like, oh, this feels like open source software, like so much more than the research world where like people are publishing these papers.
Swyx [00:25:48]: You don't know their real names and it's just like a Discord.
Ben [00:25:51]: Yeah, exactly. But crucially, it was like people were tinkering and forking and things were moving really fast and it just felt like this creative, dynamic, collaborative community in a way that research wasn't really, like it was still stuck in this kind of six month publication cycle. So we just kind of latched onto that and started building for this community. And you know, a lot of those early models were published on Replicate. I think the first one that was really primarily on Replicate was one called Pixray, which was sort of mid 2021 and it had a really cool like pixel art output, but it also just like produced general, you know, the sort of, they weren't like crisp in images, but they were quite aesthetically pleasing, like some of these early image generation models. And you know, that was like published primarily on Replicate and then a few other models around that were like published on Replicate. And that's where we really started to find our early community and like where we really found like, oh, we've actually built a thing that people want and they were great users as well. And people really want to try out these models. Lots of people were like running the models on Replicate. We still didn't have APIs though, interestingly, and this is like another like really complicated part of the story. We had no idea what a business model was still at this point. I don't think people could even pay for it. You know, it was just like these web forms where people could run the model.
Swyx [00:27:06]: Just for historical interest, which discords were they and how did you find them? Was this the Lion Discord? Yeah, Lion. This is Eleuther.
Ben [00:27:12]: Eleuther, yeah. It was the Eleuther one. These two, right? There was a channel where Viki Gangklep, this was early 2021, where Viki Gangklep was set up as a Discord bot. I just remember being completely just like captivated by this thing. I was just like playing around with it all afternoon and like the sort of thing. In Discord. Oh s**t, it's 2am. You know, yeah.
Swyx [00:27:33]: This is the beginnings of Midjourney.
Ben [00:27:34]: Yeah, exactly. And Stability. It was the start of Midjourney. And you know, it's where that kind of user interface came from. Like what's beautiful about the user interface is like you could see what other people are doing. And you could riff off other people's ideas. And it was just so much fun to just like play around with this in like a channel full of a hundred people. And yeah, that just like completely captivated me and I'm like, okay, this is something, you know. So like we should get these things on Replicate. Yeah, that's where that all came from.
Swyx [00:28:00]: And then you moved on to, so was it APIs next or was it Stable Diffusion next?
Ben [00:28:04]: It was APIs next. And the APIs happened because one of our users, our web form had like an internal API for making the web form work, like with an API that was called from JavaScript. And somebody like reverse engineered that to start generating images with a script. You know, they did like, you know, Web Inspector Coffee is Carl, like figured out what the API request was. And it wasn't secured or anything.
Swyx [00:28:28]: Of course not.
Ben [00:28:29]: They started generating a bunch of images and like we got tons of traffic and like what's going on? And I think like a sort of usual reaction to that would be like, hey, you're abusing our API and to shut them down. And instead we're like, oh, this is interesting. Like people want to run these models. So we documented the API in a Notion document, like our internal API in a Notion document and like message this person being like, hey, you seem to have found our API. Here's the documentation. That'll be like a thousand bucks a month, please, with a straight form, like we just click some buttons to make. And they were like, sure, that sounds great. So that was our first customer.
Swyx [00:29:05]: A thousand bucks a month.
Ben [00:29:07]: It was a surprising amount of money. That's not casual. It was on the order of a thousand bucks a month.
Swyx [00:29:11]: So was it a business?
Ben [00:29:13]: It was the creator of PixRay. Like it was, he generated NFT art. And so he like made a bunch of art with these models and was, you know, selling these NFTs effectively. And I think lots of people in his community were doing similar things. And like he then referred us to other people who were also generating NFTs and he joined us with models. We started our API business. Yeah. Then we like made an official API and actually like added some billing to it. So it wasn't just like a fixed fee.
Swyx [00:29:40]: And now people think of you as the host and models API business. Yeah, exactly.
Ben [00:29:44]: But that just turned out to be our business, you know, but what ended up being beautiful about this is it was really fulfilling. Like the original goal of what we wanted to do is that we wanted to make this research that people were making accessible to like other people and for it to be used in the real world. And this was like the just like ultimately the right way to do it because all of these people making these generative models could publish them to replicate and they wanted a place to publish it. And software engineers, you know, like myself, like I'm not a machine learning expert, but I want to use this stuff, could just run these models with a single line of code. And we thought, oh, maybe the Docker image is enough, but it's actually super hard to get the Docker image running on a GPU and stuff. So it really needed to be the hosted API for this to work and to make it accessible to software engineers. And we just like wound our way to this. Yeah.
Swyx [00:30:30]: Two years to the first paying customer. Yeah, exactly.
Alessio [00:30:33]: Did you ever think about becoming Midjourney during that time? You have like so much interest in image generation.
Swyx [00:30:38]: I mean, you're doing fine for the record, but, you know, it was right there, you were playing with it.
Ben [00:30:46]: I don't think it was our expertise. Like I think our expertise was DevTools rather than like Midjourney is almost like a consumer products, you know? Yeah. So I don't think it was our expertise. It certainly occurred to us. I think at the time we were thinking about like, oh, maybe we could hire some of these people in this community and make great models and stuff like this. But we ended up more being at the tooling. Like I think like before I was saying, like I'm not really a researcher, but I'm more like the tool builder, the behind the scenes. And I think both me and Andreas are like that.
Swyx [00:31:09]: I think this is an illustration of the tool builder philosophy. Something where you latch on to in DevTools, which is when you see people behaving weird, it's not their fault, it's yours. And you want to pave the cow paths is what they say, right? Like the unofficial paths that people are making, like make it official and make it easy for them and then maybe charge a bit of money.
Alessio [00:31:25]: And now fast forward a couple of years, you have 2 million developers using Replicate. Maybe more. That was the last public number that I found.
Ben [00:31:33]: It's 2 million users. Not all those people are developers, but a lot of them are developers, yeah.
Alessio [00:31:38]: And then 30,000 paying customers was the number late in space runs on Replicate. So we had a small podcaster and we host a whisper diarization on Replicate. And we're paying. So we're late in space in the 30,000. You raised a $40 million dollars, Series B. I would say that maybe the stable diffusion time, August 22, was like really when the company started to break out. Tell us a bit about that and the community that came out and I know now you're expanding beyond just image generation.
Ben [00:32:06]: Yeah, like I think we kind of set ourselves, like we saw there was this really interesting image, generative image world going on. So we kind of, you know, like we're building the tools for that community already, really. And we knew stable diffusion was coming out. We knew it was a really exciting thing, you know, it was the best generative image model so far. I think the thing we underestimated was just like what an inflection point it would be, where it was, I think Simon Willison put it this way, where he said something along the lines of it was a model that was open source and tinkerable and like, you know, it was just good enough and open source and tinkerable such that it just kind of took off in a way that none of the models had before. And like what was really neat about stable diffusion is it was open source so you could like, compared to like Dali, for example, which was like sort of equivalent quality. And like the first week we saw like people making animation models out of it. We saw people make like game texture models that like use circular convolutions to make repeatable textures. We saw, you know, a few weeks later, like people were fine tuning it so you could make, put your face in these models and all of these other-
Swyx [00:33:10]: Textual inversion.
Ben [00:33:11]: Yep. Yeah, exactly. That happened a bit before that. And all of this sort of innovation was happening all of a sudden. And people were publishing on Replicate because you could just like publish arbitrary models on Replicate. So we had this sort of supply of like interesting stuff being built. But because it was a sufficiently good model, there was also just like a ton of people building with it. They were like, oh, we can build products with this thing. And this was like about the time where people were starting to get really interested in AI. So like tons of product builders wanted to build stuff with it. And we were just like sitting in there in the middle, it's like the interface layer between like all these people who wanted to build and all these like machine learning experts who were building cool models. And that's like really where it took off. We were just sort of incredible supply, incredible demand, and we were just like in the middle. And then, yeah, since then, we've just kind of grown and grown really. And we've been building a lot for like the indie hacker community, these like individual tinkerers, but also startups and a lot of large companies as well who are sort of exploring and building AI things. Then kind of the same thing happened like middle of last year with language models and Lama 2, where the same kind of stable diffusion effect happened with Lama. And Lama 2 was like our biggest week of growth ever because like tons of people wanted to tinker with it and run it. And you know, since then we've just been seeing a ton of growth in language models as well as image models. Yeah. We're just kind of riding a lot of the interest that's going on in AI and all the people building in AI, you know. Yeah.
Swyx [00:34:29]: Kudos. Right place, right time. But also, you know, took a while to position for the right place before the wave came. I'm curious if like you have any insights on these different markets. So Peter Levels, notably very loud person, very picky about his tools. I wasn't sure actually if he used you. He does. So you've met him on your Series B blog posts and Danny Post might as well, his competitor all in that wave. What are their needs versus, you know, the more enterprise or B2B type needs? Did you come to a decision point where you're like, okay, you know, how serious are these indie hackers versus like the actual businesses that are bigger and perhaps better customers because they're less churny?
Ben [00:35:04]: They're surprisingly similar because I think a lot of people right now want to use and build with AI, but they're not AI experts and they're not infrastructure experts either. So they want to be able to use this stuff without having to like figure out all the internals of the models and, you know, like touch PyTorch and whatever. And they also don't want to be like setting up and booting up servers. And that's the same all the way from like indie hackers just getting started because like obviously you just want to get started as quickly as possible, all the way through to like large companies who want to be able to use this stuff, but don't have like all of the experts on stuff, you know, you know, big companies like Google and so on that do actually have a lot of experts on stuff, but the vast majority of companies don't. And they're all software engineers who want to be able to use this AI stuff, but they just don't know how to use it. And it's like, you really need to be an expert and it takes a long time to like learn the skills to be able to use that. So they're surprisingly similar in that sense. I think it's kind of also unfair of like the indie community, like they're not churning surprisingly, or churny or spiky surprisingly, like they're building real established businesses, which is like, kudos to them, like building these really like large, sustainable businesses, often just as solo developers. And it's kind of remarkable how they can do that actually, and it's in credit to a lot of their like product skills. And you know, we're just like there to help them being like their machine learning team effectively to help them use all of this stuff. A lot of these indie hackers are some of our largest customers, like alongside some of our biggest customers that you would think would be spending a lot more money than them, but yeah.
Swyx [00:36:35]: And we should name some of these. So you have them on your landing page, your Buzzfeed, you have Unsplash, Character AI. What do they power? What can you say about their usage?
Ben [00:36:43]: Yeah, totally. It's kind of a various things.
Swyx [00:36:46]: Well, I mean, I'm naming them because they're on your landing page. So you have logo rights. It's useful for people to, like, I'm not imaginative. I see monkey see monkey do, right? Like if I see someone doing something that I want to do, then I'm like, okay, Replicate's great for that.
Ben [00:37:00]: Yeah, yeah, yeah.
Swyx [00:37:01]: So that's what I think about case studies on company landing pages is that it's just a way of explaining like, yep, this is something that we are good for. Yeah, totally.
Ben [00:37:09]: I mean, it's, these companies are doing things all the way up and down the stack at different levels of sophistication. So like Unsplash, for example, they actually publicly posted this story on Twitter where they're using BLIP to annotate all of the images in their catalog. So you know, they have lots of images in the catalog and they want to create a text description of it so you can search for it. And they're annotating images with, you know, off the shelf, open source model, you know, we have this big library of open source models that you can run. And you know, we've got lots of people are running these open source models off the shelf. And then most of our larger customers are doing more sophisticated stuff. So they're like fine tuning the models, they're running completely custom models on us. A lot of these larger companies are like, using us for a lot of their, you know, inference, but it's like a lot of custom models and them like writing the Python themselves because they've got machine learning experts on the team. And they're using us for like, you know, their inference infrastructure effectively. And so it's like lots of different levels of sophistication where like some people using these off the shelf models. Some people are fine tuning models. So like level, Peter Levels is a great example where a lot of his products are based off like fine tuning, fine tuning image models, for example. And then we've also got like larger customers who are just like using us as infrastructure effectively. So yeah, it's like all things up and down, up and down the stack.
Alessio [00:38:29]: Let's talk a bit about COG and the technical layer. So there are a lot of GPU clouds. I think people have different pricing points. And I think everybody tries to offer a different developer experience on top of it, which then lets you charge a premium. Why did you want to create COG?
Ben [00:38:46]: You worked at Docker.
Alessio [00:38:47]: What were some of the issues with traditional container runtimes? And maybe yeah, what were you surprised with as you built it?
Ben [00:38:54]: COG came right from the start, actually, when we were thinking about this, you know, evaluation, the sort of benchmarking system for machine learning researchers, where we wanted researchers to publish their models in a standard format that was guaranteed to keep on running, that you could replicate the results of, like that's where the name came from. And we realized that we needed something like Docker to make that work, you know. And I think it was just like natural from my point of view of like, obviously that should be open source, that we should try and create some kind of open standard here that people can share. Because if more people use this format, then that's great for everyone involved. I think the magic of Docker is not really in the software. It's just like the standard that people have agreed on, like, here are a bunch of keys for a JSON document, basically. And you know, that was the magic of like the metaphor of real containerization as well. It's not the containers that are interesting. It's just like the size and shape of the damn box, you know. And it's a similar thing here, where really we just wanted to get people to agree on like, this is what a machine learning model is. This is how a prediction works. This is what the inputs are, this is what the outputs are. So cog is really just a Docker container that attaches to a CUDA device, if it needs a GPU, that has a open API specification as a label on the Docker image. And the open API specification defines the interface for the machine learning model, like the inputs and outputs effectively, or the params in machine learning terminology. And you know, we just wanted to get people to kind of agree on this thing. And it's like general purpose enough, like we weren't saying like, some of the existing things were like at the graph level, but we really wanted something general purpose enough that you could just put anything inside this and it was like future compatible and it was just like arbitrary software. And you know, it'd be future compatible with like future inference servers and future machine learning model formats and all this kind of stuff. So that was the intent behind it. It just came naturally that we wanted to define this format. And that's been really working for us. Like a bunch of people have been using cog outside of replicates, which is kind of our original intention, like this should be how machine learning is packaged and how people should use it. Like it's common to use cog in situations where like maybe they can't use the SAS service because I don't know, they're in a big company and they're not allowed to use a SAS service, but they can use cog internally still. And like they can download the models from replicates and run them internally in their org, which we've been seeing happen. And that works really well. People who want to build like custom inference pipelines, but don't want to like reinvent the world, they can use cog off the shelf and use it as like a component in their inference pipelines. We've been seeing tons of usage like that and it's just been kind of happening organically. We haven't really been trying, you know, but it's like there if people want it and we've been seeing people use it. So that's great. Yeah. So a lot of it is just sort of philosophical of just like, this is how it should work from my experience at Docker, you know, and there's just a lot of value from like the core being open, I think, and that other people can share it and it's like an integration point. So, you know, if replicate, for example, wanted to work with a testing system, like a CI system or whatever, we can just like interface at the cog level, like that system just needs to put cog models and then you can like test your models on that CI system before they get deployed to replicate. And it's just like a format that everyone, we can get everyone to agree on, you know.
Alessio [00:41:55]: What do you think, I guess, Docker got wrong? Because if I look at a Docker Compose and a cog definition, first of all, the cog is kind of like the Dockerfile plus the Compose versus in Docker Compose, you're just exposing the services. And also Docker Compose is very like ports driven versus you have like the actual, you know, predict this is what you have to run.
Ben [00:42:16]: Yeah.
Alessio [00:42:17]: Any learnings and maybe tips for other people building container based runtimes, like how much should you separate the API services versus the image building or how much you want to build them together?
Ben [00:42:29]: I think it was coming from two sides. We were thinking about the design from the point of view of user needs, what are their problems and what problems can we solve for them, but also what the interface should be for a machine learning model. And it was sort of the combination of two things that led us to this design. So the thing I talked about before was a little bit of like the interface around the machine learning model. So we realized that we wanted to be general purpose. We wanted to be at the like JSON, like human readable things rather than the tensor level. So it was like an open API specification that wrapped a Docker container. And that's where that design came from. And it's really just a wrapper around Docker. So we were kind of building on, standing on shoulders there, but Docker is too low level. So it's just like arbitrary software. So we wanted to be able to like have a open API specification that defined the function effectively that is the machine learning model. But also like how that function is written, how that function is run, which is all defined in code and stuff like that. So it's like a bunch of abstraction on top of Docker to make that work. And that's where that design came from. But the core problems we were solving for users was that Docker is really hard to use and productionizing machine learning models is really hard. So on the first part of that, we knew we couldn't use Dockerfiles. Like Dockerfiles are hard enough for software developers to write. I'm saying this with love as somebody who works on Docker and like works on Dockerfiles, but it's really hard to use. And you need to know a bunch about Linux, basically, because you're running a bunch of CLI commands. You need to know a bunch about Linux and best practices and like how apt works and all this kind of stuff. So we're like, OK, we can't get to that level. We need something that machine learning researchers will be able to understand, like people who are used to like Colab notebooks. And what they understand is they're like, I need this version of Python. I need these Python packages. And somebody told me to apt-get install something. You know? If there was sudo in there, I don't really know what that means. So we tried to create a format that was at that level, and that's what cog.yaml is. And we were really kind of trying to imagine like, what is that machine learning researcher going to understand, you know, and trying to build for them. Then the productionizing machine learning models thing is like, OK, how can we package up all of the complexity of like productionizing machine learning models, like picking CUDA versions, like hooking it up to GPUs, writing an inference server, defining a schema, doing batching, all of these just like really gnarly things that everyone does again and again. And just like, you know, provide that as a tool. And that's where that side of it came from. So it's like combining those user needs with, you know, the sort of world need of needing like a common standard for like what a machine learning model is. And that's how we thought about the design. I don't know whether that answers the question.
Alessio [00:45:12]: Yeah. So your idea was like, hey, you really want what Docker stands for in terms of standard, but you actually don't want people to do all the work that goes into Docker.
Ben [00:45:22]: It needs to be higher level, you know?
Swyx [00:45:25]: So I want to, for the listener, you're not the only standard that is out there. As with any standard, there must be 14 of them. You are surprisingly friendly with Olama, who is your former colleagues from Docker, who came out with the model file. Mozilla came out with the Lama file. And then I don't know if this is in the same category even, but I'm just going to throw it in there. Like Hugging Face has the transformers and diffusers library, which is a way of disseminating models that obviously people use. How would you compare your contrast, your approach of Cog versus all these?
Ben [00:45:53]: It's kind of complementary, actually, which is kind of neat in that a lot of transformers, for example, is lower level than Cog. So it's a Python library effectively, but you still need to like...
Swyx [00:46:04]: Expose them.
Ben [00:46:05]: Yeah. You still need to turn that into an inference server. You still need to like install the Python packages and that kind of thing. So lots of replicate models are transformers models and diffusers models inside Cog, you know? So that's like the level that that sits. So it's very complementary in some sense. We're kind of working on integration with Hugging Face such that you can deploy models from Hugging Face into Cog models and stuff like that to replicate. And some of these things like Llamafile and what Llama are working on are also very complementary in that they're doing a lot of the sort of running these things locally on laptops, which is not a thing that works very well with Cog. Like Cog is really designed around servers and attaching to CUDA devices and NVIDIA GPUs and this kind of thing. So we're actually like, you know, figuring out ways that like we can, those things can be interoperable because, you know, they should be and they are quite complementary and that you should be able to like take a model and replicate and run it on your local machine. You should be able to take a model, you know, the machine and run it in the cloud.
Swyx [00:47:02]: Is the base layer something like, is it at the like the GGUF level, which by the way, I need to get a primer on like the different formats that have emerged, or is it at the star dot file level, which is model file, Llamafile, whatever, whatever, or is it at the Cog level? I don't know, to be honest.
Ben [00:47:16]: And I think this is something we still have to figure out. There's a lot yet, like exactly where those lines are drawn. Don't know exactly. I think this is something we're trying to figure out ourselves, but I think there's certainly a lot of promise about these systems interoperating. We just want things to work together. You know, we want to try and reduce the number of standards. So the more, the more these things can interoperate and, you know, convert between each other and that kind of stuff at the minute.
Swyx [00:47:34]: Cool. Well, there's a foundation for that.
Alessio [00:47:36]: Andreas comes out of Spotify, Eric from Moto also comes out of Spotify. You work at Docker and the Llamafile guys work at Docker. Did both you and Andreas know that there was somebody else you work with that had a kind of like similar, not similar idea, but like was interested in the same thing or did you then just say, oh, I know those people. They're doing something very similar.
Ben [00:47:58]: We learned about both early on actually, yeah, because we know, we know them both quite well. And it's funny how I think we're all seeing the same problems and just like applying, you know, trying to fix the same problems that we're all seeing. I think the Llama one's particularly funny because I joined Docker through my startup. Funnily, actually, the thing which worked for my startup was Compose, but we were actually working on another thing, which was a bit like EC2 for Docker. So we were working on like productionizing Docker containers. And Llama was working on a thing called Chimatic, which was a bit like a desktop app for Docker. And our companies both got bought by Docker at the same time. And you know, Chimatic turned into Docker desktop. And then, you know, our thing then turned into Compose. And it's funny how we're both applying our, like the things we saw at Docker to the AI world, but they're building like the local environment for us and we're building like the cloud for it. And yeah, so that's just like really pleasing. And I think, you know, we're collaborating closely because there's just so much opportunity for working there. You have a hammer.
Swyx [00:49:06]: Everything's a nail.
Ben [00:49:07]: Yeah, exactly. Exactly. So I think a lot of where we're coming from a lot with AI is we're all kind of on the replicated team. We're all kind of people who have built developer tools in the past. We've got a team, like I worked at Docker, we've got people who worked at Heroku and GitHub and like the iOS ecosystem and all this kind of thing, like the previous generation of developer tools, where we like figured out a bunch of stuff. And then like AI has come along and we just don't yet have those tools and abstractions like to make it easy to use. So we're trying to like take the lessons that we learned from the previous generation of stuff and apply it to this new generation of stuff. And obviously there's a bit of nuance there because the trick is to take like the right lessons and do new stuff where it makes sense. You can't just like cut and paste, you know, but that's like how we're approaching this is we're trying to like as much as possible, like take some of those lessons we learned from like, you know, how Heroku and GitHub was built, for example, and apply them to AI.
Swyx [00:50:05]: We should also talk a little bit about your compute availability. We're trying to ask this of all, you know, it's Compute Provider Month. Do you own your own GPUs? How many do you have access to? What do you feel about the tightness of the GPU market?
Ben [00:50:17]: We don't own our own GPUs. We've got a few that we play around with, but not for production workloads. And we are primarily built on public clouds, so primarily GCP and CoreWeave and like some smatterings elsewhere.
Swyx [00:50:29]: None from NVIDIA, which is your newest investor?
Ben [00:50:31]: We work with NVIDIA, so, you know, they're kind of helping us get GPU availability. GPUs are hard to get hold of. Like if you go to AWS and ask for one A100, they won't give you an A100. But if you go to AWS and say, I would like 100 A100s in two years, they're like, sure, we've got some. And I think the problem is like that makes sense from their point of view. They want just like reliable, sustained usage. They don't want like spiky usage and like wastage in their infrastructure, which makes total sense. But that makes it really hard for startups, you know, who are wanting to just like get hold of GPUs. I think we're in a fortunate position where we can aggregate demand so we can make commits to cloud providers. And then, you know, we actually have good availability, like, you know, we don't have infinite availability, obviously, but, you know, if you want an A100 from Replicate, you can get it. But, you know, we're seeing other companies pop up as well, like SF Compute's a great example of this, where they're doing the same idea for training almost where, you know, a lot of startups need to be able to train a model, but they can't get hold of GPUs from large cloud providers. So SF Compute is like letting people rent, you know, 10 H100s for two days, which is just impossible otherwise. And, you know, what they're effectively doing there is they're aggregating demand such that they can make a big commit to the cloud provider and then let people use smaller chunks of it. And that's kind of what we're doing with Replicate as well. We're aggregating demand such that we make big commits to the cloud providers. And you know, then people can run like a 100 millisecond API request on an A100.
Swyx [00:51:51]: So, you know, coming from a finance background, this sounds surprisingly similar to banks, where the job of a bank is maturity transformation, is what you call it. You take short term deposits, which technically can be withdrawn at any time, and you turn that into long term loans for mortgages and stuff, and you pocket the difference in interest. And that's the bank.
Ben [00:52:09]: Yeah, that's exactly what we're doing.
Swyx [00:52:11]: So you run a bank.
Ben [00:52:12]: Yeah, it's your bank. Right, yeah. And it's so much a finance problem as well, because we have to make bets on the future demand value of GPUs, yeah.
Swyx [00:52:21]: What are you... Okay, I don't know how much you can disclose, but what are you forecasting? Down? Up a lot? Yeah. Up 10x?
Ben [00:52:30]: I can't really. We're projecting our growth with some educated guesses about what kind of models are going to come out and what kind of models these will run, you know? We need to bet that like, okay, maybe language models are getting larger. So we need to like have GPUs with a lot of RAM, or like multi GPU nodes, or maybe models are getting smaller, and we actually need smaller GPUs, you know, we have to make some educated guesses about that kind of stuff, yeah.
Swyx [00:52:50]: Yeah. Speaking of which, the mixture of experts models must be throwing a spanner into the planning.
Ben [00:52:56]: Not so much. We've got like multi-node A100 machines, which can run those, and multi-node H100 machines, which can run those, no problem. So we're set up for that. Okay.
Swyx [00:53:04]: Right. I didn't expect it to be so easy. My impression was that the amount of RAM per model was increasing a lot, especially on a sort of per parameter basis, per active parameter basis, going from like mixed trial being eight experts to like the deep-seek MOE models, I don't know if you saw them, being like 30, 60 experts, and you can see it keep going up, I guess.
Ben [00:53:26]: Yeah. I think we might run into problems at some point, and yeah, I don't know exactly what's going on there. I think something that we're finding, which is kind of interesting, like I don't know this in depth, you know, we're certainly seeing a lot of good results from lower precision models. So like, you know, 90% of the performance with just like much less RAM required. That means that we can run them on GPUs we have available, and it's good for customers as well because it runs faster, and like they want that trade-off, you know, where it's just slightly worse, but like way faster and cheaper.
Alessio [00:53:55]: Do you see a lot of GPU waste in terms of people running the thing on a GPU that is like too advanced? I think we use C4 to run Whisper. So we're at the bottom end of it. Yeah. Any thoughts? I think one of the hackathons we were at, people were like, oh, how do I get access to like H100s? And it's like, you need to run like- Dude, you don't need H100s.
Ben [00:54:14]: You don't need H100s. Yeah. Yeah. Well, if you want low licensee, like sure, like spend a lot of money on the H100. Yeah. We see a ton of that kind of stuff. And it's surprisingly hard to optimize these models right now. So a lot of people are just running like really unoptimized models. We're doing the same, honestly. Like we're a lot of models on Replicate have just been like not been optimized very well. So something we want to like be able to help people with is optimizing those models. Like either we show people how to with guides or we make it easier to use some of these more optimized inference servers or we show people how to compile the models or we do that automatically or something like that. But that's only something we're exploring because there's so much wastage. Like it's not just wasting the GPUs. It's also like a bad experience and the models run slow. So the models on Replicate almost all pushed by our community. Like people have pushed those models themselves, but like it's like a big head of distribution where there's like a long tail of lots of models that people have pushed. And then like a big head of like the models most people run. So models like Llama 2, like Stable Diffusion, you know, we work with Meta and Stability to like maintain those models. And we've done a ton of optimization to make this really fast. So those models are optimized, but the long tail is not. And there's like a lot of wastage there.
Alessio [00:55:32]: And going into the, well, it's already the new year. Do you see the customer demand and the GPU like hardware demand kind of like staying together? Because I think a lot of people are saying, oh, there's like hundreds of thousands of GPUs being shipped this year. Like the crunch is going to be over, but you also have like millions of people that now care about using AI. You know, how do you see the two lines progressing? Are you seeing customer demand is going to outpace the GPU growth? Do you see them together? Do you see maybe a lot of this like model improvement work kind of helping alleviate
Ben [00:56:04]: that? That's a really good question. From our point of view, demand is not outpacing supply GPUs, like we have enough, from our point of view, we have enough GPUs to go around, but that might change for sure. Yeah.
Alessio [00:56:15]: That's a very nicely put way as a startup founder to respond.
Swyx [00:56:21]: So as your frame did more, it's like sort of picking the wrong box model, whereas yours is more about maybe the inference stack, if you can call it. Were you referencing VLLM? What other sort of techniques are you referencing? Also keeping in mind that when I talk to your competitors, and I don't know if we don't have to name any of them, but they are working on trying to optimize the kinds of models. Like they basically, they'll quantize their models for you with their special stack. So you basically use their versions of Llamatu, you use their versions of Mistral, and that's one way to approach it. I don't see it as the replicate DNA to do that because that would be like sort of, you would have to slap the replicate house brand on something, which I mean, just comment on any of that. What do you mean when you say optimize models?
Ben [00:57:05]: Things like quantizing the models, you can imagine a way that we could help people quantize their models if we want to. We've had success using inference servers like VLM and TRT LLM, and we're using those kind of things to serve language models. We've had success with things like AI templates, which compile the models, all of those kinds of things. And there's like some even really just boring things of just like making the code more efficient. Like when they're just writing some Python code, it's really easy to just write inefficient Python code. And there's like really boring things like that as well, but it's like a whole smash of things like that.
Swyx [00:57:40]: You will do that for a customer? Like you look at their code and-
Ben [00:57:43]: Yeah, we've certainly helped some of our customers be able to do that, some of the stuff. And a lot of the models on, like the popular models on replicate, we've like rewritten them to use that stuff as well. And like the stable diffusion that we run, for example, is compiled for the AI template to make it super fast. And it's all open source that you can see all of this stuff on GitHub, if you want to like see how we do it. But you can imagine ways that we could help people. It's almost like built into the Cog layer maybe, where we could help people like use these fast inference servers or use AI template to compile their models to make it faster. Whether it's like manual, semi-manual or automatic, we're not really sure, but that's something we want to explore because it benefits everyone.
Swyx [00:58:21]: And then on the competitive piece, there was a price war on Mixtral last year, this last December. As far as I can tell, you guys did not enter that war. You have Mixtral, but it's just regular pricing. I think also some of these players are probably losing money on their pricing. You don't have to say anything, but the break even is somewhere between 50 to 75 cents per million tokens served. How are you thinking about like just the overall competitiveness in the market? How should people choose when everyone's an API?
Ben [00:58:50]: So for Lama2 and Mistral, I think not mixed trial, I can't remember exactly. We have similar performance and similar price to some of these other services. We're not like bargain basement to some of the others, because to your point, we don't want to burn tons of money, but we're pricing it sensibly and sustainably to a point where we think it's competitive with other people such that we want developers using Replicate and we don't want to price it such that it's only affordable by big companies. We want to make it cheap enough such that the developers can afford it, but we also don't want the super cheap prices, because then it's almost like then your customers are hostile and the more customers you get, the worse it gets. So we're pricing it sensibly, but still to the point where hopefully it's cheap enough to build on. And I think the thing we really care about, like we want to, obviously we want models and Replicate to be comparable to other people. But I think the really crucial thing about Replicate and the way I think we think about it is that it's not just the API for them, particularly in open source, it's not just the API for the model that is the important bit. It's because quite often with open source models, like the whole point of open source is that you can tinker on it and you can customize it and you can fine tune it and you can like smush it together with another model, like Lava, for example. And you can't do that if it's just like a hosted API, because it's just like, you know, you can't touch the code. So what we want to do with Replicate is build a platform that's actually open. So like we've got all of these models where the performance and price is on par with everything else. But if you want to customize it, you can fine tune it, you can go to GitHub and get the source code for it and edit the source code and push up your own custom version and this kind of thing. Because that's the crucial thing for open source machine learning is be able to tinker on it and customizing it. And we think that's really important to make open source AI work.
Alessio [01:00:39]: You mentioned open source. How do you think about levels of openness? When Lama 2 came out, I wrote a post about this, about it's like open source and there's open weights, then there's restrictive weights. It was on the front page of Agornews. So there was like all sort of comments from people. So I'm always curious to hear your thoughts. Like what do you think it's okay for people to license? What's okay for people to not release?
Ben [01:01:03]: You know, before it was just like closed source, big models, open source, little models, you know, purely open source stuff. And we're now seeing like lots of variations where, you know, model companies putting restrictive licenses on their models, you know, that means it can only be used for non-commercial use, you know, and a lot of the, you know, open source crowd is complaining it's not true open source, you know, and all this kind of thing. And I think a lot of that is coming from philosophy, you know, like the sort of free software movement kind of philosophy. And I don't think it's necessarily a bad thing. I think it's good that model companies can make money out of their models. You know, that's like how this will incentivize people to make more models and this kind of thing. And I think it's totally fine if like somebody made something to ask for some money in return if you're making money out of it. And I think that's totally okay. And I think there's some really interesting like midpoints as well where people are releasing the codes, you can still tinker on it, but the person who trained the model still wants to get a cut of it if like you're making a bunch of money out of it. And I think that's good. And that's going to make like the ecosystem more sustainable. I don't think anybody's really figured it out yet. We're going to see like more experimentation with this and more people like try to figure out like what are the business models around building models and how can I make money out of this? And we'll just see where it ends up. And I think it's something we want to support as Replicate as well because we believe in open source. We think it's great, but there's also going to be lots of models which are closed source as well. And these companies might not be, there's probably going to be a long tail of a bunch of people building models that don't have the reach that OpenAI have. And hopefully as Replicate, we can help those people find developers and help them make money and that kind of thing.
Alessio [01:02:46]: I think the computer requirements of AI kind of changed the thing. I started an open source company. I'm a big open source fan. And before it was kind of man hours was really all that went into open source. It wasn't much monetary investment. Well, not that man hours are not worth a lot, but if you think about Llama 2, it's like $25 million, you know, like all in, it's like you can't just spin up a discord and like spend $25 million. So I think it's net positive for everybody that Llama 2 is open source and well, it's the open source, you know, it's the open source term. I think people like you're saying, it's like they kind of argue on the semantics of it, but like all we care about is that Llama 2 is open because if Llama 2 wasn't open source today, like that, if Mistral was not open source, we will be in a bad spot, you know?
Ben [01:03:33]: So, and I think the nuance here is making sure that these models are still tinkerable because the beautiful thing about Llama 2 as a base model is that like, yeah, it costs $25 million to train to start with, but then you can fine tune it for like 50 bucks. And that's what's so beautiful about the open source ecosystem. And something I think is really surprising as well, like completely surprised me. Like I think a lot of people assumed that it's not going to be open source machine learning. It's just not going to be practical because it's so expensive to train these models. But like fine tuning is unreasonably effective and people are getting really good results out of it and it's really cheap. So people can effectively create open source models really cheaply. And there's going to be like this sort of ecosystem of tons of models being made. And I think the risk there from a licensing point of view is we need to make sure that the licenses let people do that, because if you release a big model under a non-commercial license and people can't fine tune it, you've lost the magic of it being open. And I'm sure there are ways to structure that such that the person paying $25 million feels like they're compensated somehow and they can feel like they can, you know, they should keep on training models and people can keep on fine tuning it. But I guess we just have to figure out exactly how that plays out.
Swyx [01:04:46]: Excellent. So just wanted to round it out. You've been an excellent, very open. I should have started my intro with this, but I feel like you found the sort of AI engineer crew before I did. And, you know, something I really resonated with you in sort of the Series B announcement was that you put in some stats here about how there are two orders of magnitude more software engineers than there are machine learning engineers, about 30 million software engineers and 500,000 machine learning engineers. You can maybe plus minus one of those orders of magnitude, but it's around that ballpark. And so obviously there will be a lot more engineers than there will be ML engineers. How do you see this group? Like, is it all software engineers? Are they going to specialize? What would you advise someone trying to become an AI engineer? Is this a legitimate career path?
Ben [01:05:30]: Yeah, absolutely. I mean, it's very clear that AI is going to be a large part of how we build software in the future. Now, it's a bit like being a software developer in the 90s and ignoring the Internet. You know, you just need to you need to learn about this stuff. You need to figure this stuff out. I don't think it needs to be super low level. You don't need to be like, you know, the metaphor here is that you don't need to be digging down into like this sort of Pytorch level if you don't want to in the same way as a software engineer in the 90s. You don't need to be like understanding how network stacks work to be able to build a website, you know, but you need to understand the shape of this thing and how to hold it and what it's good at and what it's not. And that's really important. So, yeah, certainly just advise people to like just start playing around with it, get a feel of like how language models work, get a feel of like how these diffusion models work, get a feel of like what fine tuning is and how it works, because some of your job might be building datasets, you know, get a feeling of how prompting works, because some of your job might be writing a prompt. And those are just all really important skills to sort of figure out.
Swyx [01:06:36]: Yeah. Well, thanks for building the definitive platform for doing all that.
Ben [01:06:41]: Yeah, of course.
Alessio [01:06:42]: And if I know call to actions, who should come work at Replicate, anything for the audience?
Ben [01:06:47]: Yeah, well, I mean, we're hiring. If you click on jobs at the bottom of our Replicate.com, there's some jobs. And I just encourage you to like just like try out AI, even if you don't, even if you think you're not smart enough. Like the whole reason I started this company is because I was looking at the cool stuff that Andreas was making. Like Andreas is like a proper machine learning person with a PhD, you know, and I was like just like a, you know, a sort of lowly software engineer. I was like, you're doing really cool stuff and I want to be able to do that. And by us working together, you know, we've now made it accessible to dummies like me. And I just encourage anyone who's like wants to try this stuff out, just give it a try. I would also encourage people who are tool builders. Like the limiting factor now on AI is not like the technology, like the technology has made incredible advances and there's just so many incredible machine learning models that can do a ton of stuff. The limiting factor is just like making that accessible to people who build products, because it's really hard to use this stuff right now. And obviously we're building some of that stuff as Replicate, but there's just like a ton of other tooling and abstractions that need to be built out to make this stuff usable. So I just encourage people who like building developer tools to just like get stuck into it as well, because that's going to make this stuff accessible to everyone.
Swyx [01:07:58]: Yeah, I especially want to highlight you have a hacker in residence job opening available, which not every company has, which means just join you and hack stuff. I think Charlie Holtz is doing a fantastic job of that.
Ben [01:08:09]: Yeah, effectively. Like most of our, a lot of our job is just like showing people how to use AI. So we've just got a team of like software developers and people have kind of figured this stuff out who are writing about it, who are making videos about it, who are making example applications to show people what you can do with this stuff.
Swyx [01:08:26]: Yeah. In my world that used to be called DevRel, but now it's hacker in residence.
Ben [01:08:31]: And this came from Zeke, who's another one of our hackers.
Swyx [01:08:38]: Tell me this came from Chroma, because I want to start that one.
Ben [01:08:41]: We developed, like they, Antoine actually was like, hey, we came up with that first. But I think we came up with it independently, because the story behind this is we originally called it the DevRel team. Yeah. And DevRel's cursed now. Zeke was like, that sounds so boring. I want to go to someone and say I'm a developer relations person, or a developer advocate or something. So we were like, okay, what's the like, the way we can make this sound the most fun? All right, you're a hacker.
Swyx [01:09:10]: I would say like that is consistently the vibe I get from Replicate. Everyone on your team I interact with. When I go to your San Francisco office, like that's the vibe that you're generating. Like it's a hacker space more than an office. And you hold fantastic meetups there. And I think you're a really positive presence in our community. So thank you for doing all that. And it's instilling the hacker vibe and culture into AI.
Ben [01:09:31]: I'm really glad that I'm really glad that's working. Cool. That's a wrap.
Alessio [01:09:34]: I think. Thank you so much for coming on, man.
Ben [01:09:36]: Yeah, of course. Thank you. This is a lot of fun.

Get full access to Latent Space at www.latent.space/subscribe

28 February 2024, 6:04 pm
1 hour 2 minutes

Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

We’re writing this one day after the monster release of OpenAI’s Sora and Gemini 1.5. We covered this on ‘s ThursdAI space, so head over there for our takes.
IRL: We’re ONE WEEK away from Latent Space: Final Frontiers, the second edition and anniversary of our first ever Latent Space event! Also: join us on June 25-27 for the biggest AI Engineer conference of the year!
Online: All three Discord clubs are thriving. Join us every Wednesday/Friday!
Almost 12 years ago, while working at Spotify, Erik Bernhardsson built one of the first open source vector databases, Annoy, based on ANN search. He also built Luigi, one of the predecessors to Airflow, which helps data teams orchestrate and execute data-intensive and long-running jobs. Surprisingly, he didn’t start yet another vector database company, but instead in 2021 founded Modal, the “high-performance cloud for developers”. In 2022 they opened doors to developers after their seed round, and in 2023 announced their GA with a $16m Series A.
More importantly, they have won fans among both household names like Ramp, Scale AI, Substack, and Cohere, and newer startups like (upcoming guest!) Suno.ai and individual hackers (Modal was the top tool of choice in the Vercel AI Accelerator):
We've covered the nuances of GPU workloads, and how we need new developer tooling and runtimes for them (see our episodes with Chris Lattner of Modular and George Hotz of tiny to start). In this episode, we run through the major limitations of the actual infrastructure behind the clouds that run these models, and how Erik envisions the “postmodern data stack”.
In his 2021 blog post “Software infrastructure 2.0: a wishlist”, Erik had “Truly serverless” as one of his points:
* The word cluster is an anachronism to an end-user in the cloud! I'm already running things in the cloud where there's elastic resources available at any time. Why do I have to think about the underlying pool of resources? Just maintain it for me.
* I don't ever want to provision anything in advance of load.
* I don't want to pay for idle resources. Just let me pay for whatever resources I'm actually using.
* Serverless doesn't mean it's a burstable VM that saves its instance state to disk during periods of idle.
Swyx called this Self Provisioning Runtimes back in the day. Modal doesn’t put you in YAML hell, preferring to colocate infra provisioning right next to the code that utilizes it, so you can just add GPU (and disk, and retries…):
After 3 years, we finally have a big market push for this: running inference on generative models is going to be the killer app for serverless, for a few reasons:
* AI models are stateless: even in conversational interfaces, each message generation is a fully-contained request to the LLM. There’s no knowledge that is stored in the model itself between messages, which means that tear down / spin up of resources doesn’t create any headaches with maintaining state.
* Token-based pricing is better aligned with serverless infrastructure than fixed monthly costs of traditional software.
* GPU scarcity makes it really expensive to have reserved instances that are available to you 24/7. It’s much more convenient to build with a serverless-like infrastructure.
In the episode we covered a lot more topics like maximizing GPU utilization, why Oracle Cloud rocks, and how Erik has never owned a TV in his life. Enjoy!
Show Notes
* Modal
* ErikBot
* Erik’s Blog
* Software Infra 2.0 Wishlist
* Luigi
* Annoy
* Hetzner
* CoreWeave
* Cloudflare FaaS
* Poolside AI
* Modular Inference Engine
Chapters
* [00:00:00] Introductions
* [00:02:00] Erik's OSS work at Spotify: Annoy and Luigi
* [00:06:22] Starting Modal
* [00:07:54] Vision for a "postmodern data stack"
* [00:10:43] Solving container cold start problems
* [00:12:57] Designing Modal's Python SDK
* [00:15:18] Self-Revisioning Runtime
* [00:19:14] Truly Serverless Infrastructure
* [00:20:52] Beyond model inference
* [00:22:09] Tricks to maximize GPU utilization
* [00:26:27] Differences in AI and data science workloads
* [00:28:08] Modal vs Replicate vs Modular and lessons from Heroku's "graduation problem"
* [00:34:12] Creating Erik's clone "ErikBot"
* [00:37:43] Enabling massive parallelism across thousands of GPUs
* [00:39:45] The Modal Sandbox for agents
* [00:43:51] Thoughts on the AI Inference War
* [00:49:18] Erik's best tweets
* [00:51:57] Why buying hardware is a waste of money
* [00:54:18] Erik's competitive programming backgrounds
* [00:59:02] Why does Sweden have the best Counter Strike players?
* [00:59:53] Never owning a car or TV
* [01:00:21] Advice for infrastructure startups
Transcript
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.
Swyx [00:00:14]: Hey, and today we have in the studio Erik Bernhardsson from Modal. Welcome.
Erik [00:00:19]: Hi. It's awesome being here.
Swyx [00:00:20]: Yeah. Awesome seeing you in person. I've seen you online for a number of years as you were building on Modal and I think you're just making a San Francisco trip just to see people here, right? I've been to like two Modal events in San Francisco here.
Erik [00:00:34]: Yeah, that's right. We're based in New York, so I figured sometimes I have to come out to capital of AI and make a presence.
Swyx [00:00:40]: What do you think is the pros and cons of building in New York?
Erik [00:00:45]: I mean, I never built anything elsewhere. I lived in New York the last 12 years. I love the city. Obviously, there's a lot more stuff going on here and there's a lot more customers and that's why I'm out here. I do feel like for me, where I am in life, I'm a very boring person. I kind of work hard and then I go home and hang out with my kids. I don't have time to go to events and meetups and stuff anyway. In that sense, New York is kind of nice. I walk to work every morning. It's like five minutes away from my apartment. It's very time efficient in that sense. Yeah.
Swyx [00:01:10]: Yeah. It's also a good life. So we'll do a brief bio and then we'll talk about anything else that people should know about you. Actually, I was surprised to find out you're from Sweden. You went to college in KTH and your master's was in implementing a scalable music recommender system. Yeah.
Erik [00:01:27]: I had no idea. Yeah. So I actually studied physics, but I grew up coding and I did a lot of programming competition and then as I was thinking about graduating, I got in touch with an obscure music streaming startup called Spotify, which was then like 30 people. And for some reason, I convinced them, why don't I just come and write a master's thesis with you and I'll do some cool collaborative filtering, despite not knowing anything about collaborative filtering really. But no one knew anything back then. So I spent six months at Spotify basically building a prototype of a music recommendation system and then turned that into a master's thesis. And then later when I graduated, I joined Spotify full time.
Swyx [00:02:00]: So that was the start of your data career. You also wrote a couple of popular open source tooling while you were there. Is that correct?
Erik [00:02:09]: No, that's right. I mean, I was at Spotify for seven years, so this is a long stint. And Spotify was a wild place early on and I mean, data space is also a wild place. I mean, it was like Hadoop cluster in the like foosball room on the floor. It was a lot of crude, like very basic infrastructure and I didn't know anything about it. And like I was hired to kind of figure out data stuff. And I started hacking on a recommendation system and then, you know, got sidetracked in a bunch of other stuff. I fixed a bunch of reporting things and set up A-B testing and started doing like business analytics and later got back to music recommendation system. And a lot of the infrastructure didn't really exist. Like there was like Hadoop back then, which is kind of bad and I don't miss it. But I spent a lot of time with that. As a part of that, I ended up building a workflow engine called Luigi, which is like briefly like somewhat like widely ended up being used by a bunch of companies. Sort of like, you know, kind of like Airflow, but like before Airflow. I think it did some things better, some things worse. I also built a vector database called Annoy, which is like for a while, it was actually quite widely used. In 2012, so it was like way before like all this like vector database stuff ended up happening. And funny enough, I was actually obsessed with like vectors back then. Like I was like, this is going to be huge. Like just give it like a few years. I didn't know it was going to take like nine years and then there's going to suddenly be like 20 startups doing vector databases in one year. So it did happen. In that sense, I was right. I'm glad I didn't start a startup in the vector database space. I would have started way too early. But yeah, that was, yeah, it was a fun seven years as part of it. It was a great culture, a great company.
Swyx [00:03:32]: Yeah. Just to take a quick tangent on this vector database thing, because we probably won't revisit it but like, has anything architecturally changed in the last nine years?
Erik [00:03:41]: I'm actually not following it like super closely. I think, you know, some of the best algorithms are still the same as like hierarchical navigable small world.
Swyx [00:03:51]: Yeah. HNSW.
Erik [00:03:52]: Exactly. I think now there's like product quantization, there's like some other stuff that I haven't really followed super closely. I mean, obviously, like back then it was like, you know, it's always like very simple. It's like a C++ library with Python bindings and you could mmap big files and into memory and like they had some lookups. I used like this kind of recursive, like hyperspace splitting strategy, which is not that good, but it sort of was good enough at that time. But I think a lot of like HNSW is still like what people generally use. Now of course, like databases are much better in the sense like to support like inserts and updates and stuff like that. I know I never supported that. Yeah, it's sort of exciting to finally see like vector databases becoming a thing.
Swyx [00:04:30]: Yeah. Yeah. And then maybe one takeaway on most interesting lesson from Daniel Ek?
Erik [00:04:36]: I mean, I think Daniel Ek, you know, he started Spotify very young. Like he was like 25, something like that. And that was like a good lesson. But like he, in a way, like I think he was a very good leader. Like there was never anything like, no scandals or like no, he wasn't very eccentric at all. It was just kind of like very like level headed, like just like ran the company very well, like never made any like obvious mistakes or I think it was like a few bets that maybe like in hindsight were like a little, you know, like took us, you know, too far in one direction or another. But overall, I mean, I think he was a great CEO, like definitely, you know, up there, like generational CEO, at least for like Swedish startups.
Swyx [00:05:09]: Yeah, yeah, for sure. Okay, we should probably move to make our way towards Modal. So then you spent six years as CTO of Better. You were an early engineer and then you scaled up to like 300 engineers.
Erik [00:05:21]: I joined as a CTO when there was like no tech team. And yeah, that was a wild chapter in my life. Like the company did very well for a while. And then like during the pandemic, yeah, it was kind of a weird story, but yeah, it kind of collapsed.
Swyx [00:05:32]: Yeah, laid off people poorly.
Erik [00:05:34]: Yeah, yeah. It was like a bunch of stories. Yeah. I mean, the company like grew from like 10 people when I joined at 10,000, now it's back to a thousand. But yeah, they actually went public a few months ago, kind of crazy. They're still around, like, you know, they're still, you know, doing stuff. So yeah, very kind of interesting six years of my life for non-technical reasons, like I managed like three, four hundred, but yeah, like learning a lot of that, like recruiting. I spent all my time recruiting and stuff like that. And so managing at scale, it's like nice, like now in a way, like when I'm building my own startup. It's actually something I like, don't feel nervous about at all. Like I've managed a scale, like I feel like I can do it again. It's like very different things that I'm nervous about as a startup founder. But yeah, I started Modal three years ago after sort of, after leaving Better, I took a little bit of time off during the pandemic and, but yeah, pretty quickly I was like, I got to build something. I just want to, you know. Yeah. And then yeah, Modal took form in my head, took shape.
Swyx [00:06:22]: And as far as I understand, and maybe we can sort of trade off questions. So the quick history is started Modal in 2021, got your seed with Sarah from Amplify in 2022. You just announced your Series A with Redpoint. That's right. And that brings us up to mostly today. Yeah. Most people, I think, were expecting you to build for the data space.
Erik: But it is the data space.
Swyx:: When I think of data space, I come from like, you know, Snowflake, BigQuery, you know, Fivetran, Nearby, that kind of stuff. And what Modal became is more general purpose than that. Yeah.
Erik [00:06:53]: Yeah. I don't know. It was like fun. I actually ran into like Edo Liberty, the CEO of Pinecone, like a few weeks ago. And he was like, I was so afraid you were building a vector database. No, I started Modal because, you know, like in a way, like I work with data, like throughout my most of my career, like every different part of the stack, right? Like I thought everything like business analytics to like deep learning, you know, like building, you know, training neural networks, the scale, like everything in between. And so one of the thoughts, like, and one of the observations I had when I started Modal or like why I started was like, I just wanted to make, build better tools for data teams. And like very, like sort of abstract thing, but like, I find that the data stack is, you know, full of like point solutions that don't integrate well. And still, when you look at like data teams today, you know, like every startup ends up building their own internal Kubernetes wrapper or whatever. And you know, all the different data engineers and machine learning engineers end up kind of struggling with the same things. So I started thinking about like, how do I build a new data stack, which is kind of a megalomaniac project, like, because you kind of want to like throw out everything and start over.
Swyx [00:07:54]: It's almost a modern data stack.
Erik [00:07:55]: Yeah, like a postmodern data stack. And so I started thinking about that. And a lot of it came from like, like more focused on like the human side of like, how do I make data teams more productive? And like, what is the technology tools that they need? And like, you know, drew out a lot of charts of like, how the data stack looks, you know, what are different components. And it shows actually very interesting, like workflow scheduling, because it kind of sits in like a nice sort of, you know, it's like a hub in the graph of like data products. But it was kind of hard to like, kind of do that in a vacuum, and also to monetize it to some extent. I got very interested in like the layers below at some point. And like, at the end of the day, like most people have code to have to run somewhere. So I think about like, okay, well, how do you make that nice? Like how do you make that? And in particular, like the thing I always like thought about, like developer productivity is like, I think the best way to measure developer productivity is like in terms of the feedback loops, like how quickly when you iterate, like when you write code, like how quickly can you get feedback. And at the innermost loop, it's like writing code and then running it. And like, as soon as you start working with the cloud, like it's like takes minutes suddenly, because you have to build a Docker container and push it to the cloud and like run it, you know. So that was like the initial focus for me was like, I just want to solve that problem. Like I want to, you know, build something less, you run things in the cloud and like retain the sort of, you know, the joy of productivity as when you're running things locally. And in particular, I was quite focused on data teams, because I think they had a couple unique needs that wasn't well served by the infrastructure at that time, or like still is in like, in particular, like Kubernetes, I feel like it's like kind of worked okay for back end teams, but not so well for data teams. And very quickly, I got sucked into like a very deep like rabbit hole of like...
Swyx [00:09:24]: Not well for data teams because of burstiness. Yeah, for sure.
Erik [00:09:26]: So like burstiness is like one thing, right? Like, you know, like you often have this like fan out, you want to like apply some function over very large data sets. Another thing tends to be like hardware requirements, like you need like GPUs and like, I've seen this in many companies, like you go, you know, data scientists go to a platform team and they're like, can we add GPUs to the Kubernetes? And they're like, no, like, that's, you know, complex, and we're not gonna, so like just getting GPU access. And then like, I mean, I also like data code, like frankly, or like machine learning code like tends to be like, super annoying in terms of like environments, like you end up having like a lot of like custom, like containers and like environment conflicts. And like, it's very hard to set up like a unified container that like can serve like a data scientist, because like, there's always like packages that break. And so I think there's a lot of different reasons why the technology wasn't well suited for back end. And I think the attitude at that time is often like, you know, like you had friction between the data team and the platform team, like, well, it works for the back end stuff, you know, why don't you just like, you know, make it work. But like, I actually felt like data teams, you know, or at this point now, like there's so much, so many people working with data, and like they, to some extent, like deserve their own tools and their own tool chains, and like optimizing for that is not something people have done. So that's, that's sort of like very abstract philosophical reason why I started Model. And then, and then I got sucked into this like rabbit hole of like container cold start and, you know, like whatever, Linux, page cache, you know, file system optimizations.
Swyx [00:10:43]: Yeah, tell people, I think the first time I met you, I think you told me some numbers, but I don't remember, like, what are the main achievements that you were unhappy with the status quo? And then you built your own container stack?
Erik [00:10:52]: Yeah, I mean, like, in particular, it was like, in order to have that loop, right? You want to be able to start, like take code on your laptop, whatever, and like run in the cloud very quickly, and like running in custom containers, and maybe like spin up like 100 containers, 1000, you know, things like that. And so container cold start was the initial like, from like a developer productivity point of view, it was like, really, what I was focusing on is, I want to take code, I want to stick it in container, I want to execute in the cloud, and like, you know, make it feel like fast. And when you look at like, how Docker works, for instance, like Docker, you have this like, fairly convoluted, like very resource inefficient way, they, you know, you build a container, you upload the whole container, and then you download it, and you run it. And Kubernetes is also like, not very fast at like starting containers. So like, I started kind of like, you know, going a layer deeper, like Docker is actually like, you know, there's like a couple of different primitives, but like a lower level primitive is run C, which is like a container runner. And I was like, what if I just take the container runner, like run C, and I point it to like my own root file system, and then I built like my own virtual file system that exposes files over a network instead. And that was like the sort of very crude version of model, it's like now I can actually start containers very quickly, because it turns out like when you start a Docker container, like, first of all, like most Docker images are like several gigabytes, and like 99% of that is never going to be consumed, like there's a bunch of like, you know, like timezone information for like Uzbekistan, like no one's going to read it. And then there's a very high overlap between the files are going to be read, there's going to be like lib torch or whatever, like it's going to be read. So you can also cache it very well. So that was like the first sort of stuff we started working on was like, let's build this like container file system. And you know, coupled with like, you know, just using run C directly. And that actually enabled us to like, get to this point of like, you write code, and then you can launch it in the cloud within like a second or two, like something like that. And you know, there's been many optimizations since then, but that was sort of starting point.
Alessio [00:12:33]: Can we talk about the developer experience as well, I think one of the magic things about Modal is at the very basic layers, like a Python function decorator, it's just like stub and whatnot. But then you also have a way to define a full container, what were kind of the design decisions that went into it? Where did you start? How easy did you want it to be? And then maybe how much complexity did you then add on to make sure that every use case fit?
Erik [00:12:57]: I mean, Modal, I almost feel like it's like almost like two products kind of glued together. Like there's like the low level like container runtime, like file system, all that stuff like in Rust. And then there's like the Python SDK, right? Like how do you express applications? And I think, I mean, Swix, like I think your blog was like the self-provisioning runtime was like, to me, always like to sort of, for me, like an eye-opening thing. It's like, so I didn't think about like...
Swyx [00:13:15]: You wrote your post four months before me. Yeah? The software 2.0, Infra 2.0. Yeah.
Erik [00:13:19]: Well, I don't know, like convergence of minds. I guess we were like both thinking. Maybe you put, I think, better words than like, you know, maybe something I was like thinking about for a long time. Yeah.
Swyx [00:13:29]: And I can tell you how I was thinking about it on my end, but I want to hear you say it.
Erik [00:13:32]: Yeah, yeah, I would love to. So to me, like what I always wanted to build was like, I don't know, like, I don't know if you use like Pulumi. Like Pulumi is like nice, like in the sense, like it's like Pulumi is like you describe infrastructure in code, right? And to me, that was like so nice. Like finally I can like, you know, put a for loop that creates S3 buckets or whatever. And I think like Modal sort of goes one step further in the sense that like, what if you also put the app code inside the infrastructure code and like glue it all together and then like you only have one single place that defines everything and it's all programmable. You don't have any config files. Like Modal has like zero config. There's no config. It's all code. And so that was like the goal that I wanted, like part of that. And then the other part was like, I often find that so much of like my time was spent on like the plumbing between containers. And so my thing was like, well, if I just build this like Python SDK and make it possible to like bridge like different containers, just like a function call, like, and I can say, oh, this function runs in this container and this other function runs in this container and I can just call it just like a normal function, then, you know, I can build these applications that may span a lot of different environments. Maybe they fan out, start other containers, but it's all just like inside Python. You just like have this beautiful kind of nice like DSL almost for like, you know, how to control infrastructure in the cloud. So that was sort of like how we ended up with the Python SDK as it is, which is still evolving all the time, by the way. We keep changing syntax quite a lot because I think it's still somewhat exploratory, but we're starting to converge on something that feels like reasonably good now.
Swyx [00:14:54]: Yeah. And along the way you, with this expressiveness, you enabled the ability to, for example, attach a GPU to a function. Totally.
Erik [00:15:02]: Yeah. It's like you just like say, you know, on the function decorator, you're like GPU equals, you know, A100 and then or like GPU equals, you know, A10 or T4 or something like that. And then you get that GPU and like, you know, you just run the code and it runs like you don't have to, you know, go through hoops to, you know, start an EC2 instance or whatever.
Swyx [00:15:18]: Yeah. So it's all code. Yeah. So one of the reasons I wrote Self-Revisioning Runtimes was I was working at AWS and we had AWS CDK, which is kind of like, you know, the Amazon basics blew me. Yeah, totally. And then, and then like it creates, it compiles the cloud formation. Yeah. And then on the other side, you have to like get all the config stuff and then put it into your application code and make sure that they line up. So then you're writing code to define your infrastructure, then you're writing code to define your application. And I was just like, this is like obvious that it's going to converge, right? Yeah, totally.
Erik [00:15:48]: But isn't there like, it might be wrong, but like, was it like SAM or Chalice or one of those? Like, isn't that like an AWS thing that where actually they kind of did that? I feel like there's like one.
Swyx [00:15:57]: SAM. Yeah. Still very clunky. It's not, not as elegant as modal.
Erik [00:16:03]: I love AWS for like the stuff it's built, you know, like historically in order for me to like, you know, what it enables me to build, but like AWS is always like struggle with developer experience.
Swyx [00:16:11]: I mean, they have to not break things.
Erik [00:16:15]: Yeah. Yeah. And totally. And they have to build products for a very wide range of use cases. And I think that's hard.
Swyx [00:16:21]: Yeah. Yeah. So it's, it's easier to design for. Yeah. So anyway, I was, I was pretty convinced that this, this would happen. I wrote, wrote that thing. And then, you know, I imagine my surprise that you guys had it on your landing page at some point. I think, I think Akshad was just like, just throw that in there.
Erik [00:16:34]: Did you trademark it?
Swyx [00:16:35]: No, I didn't. But I definitely got sent a few pitch decks with my post on there and it was like really interesting. This is my first time like kind of putting a name to a phenomenon. And I think this is a useful skill for people to just communicate what they're trying to do.
Erik [00:16:48]: Yeah. No, I think it's a beautiful concept.
Swyx [00:16:50]: Yeah. Yeah. Yeah. But I mean, obviously you implemented it. What became more clear in your explanation today is that actually you're not that tied to Python.
Erik [00:16:57]: No. I mean, I, I think that all the like lower level stuff is, you know, just running containers and like scheduling things and, you know, serving container data and stuff. So like one of the benefits of data teams is obviously like they're all like using Python, right? And so that made it a lot easier. I think, you know, if we had focused on other workloads, like, you know, for various reasons, we've like been kind of like half thinking about like CI or like things like that. But like, in a way that's like harder because like you also, then you have to be like, you know, multiple SDKs, whereas, you know, focusing on data teams, you can only, you know, Python like covers like 95% of all teams. That made it a lot easier. But like, I mean, like definitely like in the future, we're going to have others support, like supporting other languages. JavaScript for sure is the obvious next language. But you know, who knows, like, you know, Rust, Go, R, whatever, PHP, Haskell, I don't know.
Swyx [00:17:42]: You know, I think for me, I actually am a person who like kind of liked the idea of programming language advancements being improvements in developer experience. But all I saw out of the academic sort of PLT type people is just type level improvements. And I always think like, for me, like one of the core reasons for self-provisioning runtimes and then why I like Modal is like, this is actually a productivity increase, right? Like, it's a language level thing, you know, you managed to stick it on top of an existing language, but it is your own language, a DSL on top of Python. And so language level increase on the order of like automatic memory management. You know, you could sort of make that analogy that like, maybe you lose some level of control, but most of the time you're okay with whatever Modal gives you. And like, that's fine. Yeah.
Erik [00:18:26]: Yeah. Yeah. I mean, that's how I look at about it too. Like, you know, you look at developer productivity over the last number of decades, like, you know, it's come in like small increments of like, you know, dynamic typing or like is like one thing because not suddenly like for a lot of use cases, you don't need to care about type systems or better compiler technology or like, you know, the cloud or like, you know, relational databases. And, you know, I think, you know, you look at like that, you know, history, it's a steadily, you know, it's like, you know, you look at the developers have been getting like probably 10X more productive every decade for the last four decades or something that was kind of crazy. Like on an exponential scale, we're talking about 10X or is there a 10,000X like, you know, improvement in developer productivity. What we can build today, you know, is arguably like, you know, a fraction of the cost of what it took to build it in the eighties. Maybe it wasn't even possible in the eighties. So that to me, like, that's like so fascinating. I think it's going to keep going for the next few decades. Yeah.
Alessio [00:19:14]: Yeah. Another big thing in the infra 2.0 wishlist was truly serverless infrastructure. The other on your landing page, you called them native cloud functions, something like that. I think the issue I've seen with serverless has always been people really wanted it to be stateful, even though stateless was much easier to do. And I think now with AI, most model inference is like stateless, you know, outside of the context. So that's kind of made it a lot easier to just put a model, like an AI model on model to run. How do you think about how that changes how people think about infrastructure too? Yeah.
Erik [00:19:48]: I mean, I think model is definitely going in the direction of like doing more stateful things and working with data and like high IO use cases. I do think one like massive serendipitous thing that happened like halfway, you know, a year and a half into like the, you know, building model was like Gen AI started exploding and the IO pattern of Gen AI is like fits the serverless model like so well, because it's like, you know, you send this tiny piece of information, like a prompt, right, or something like that. And then like you have this GPU that does like trillions of flops, and then it sends back like a tiny piece of information, right. And that turns out to be something like, you know, if you can get serverless working with GPU, that just like works really well, right. So I think from that point of view, like serverless always to me felt like a little bit of like a solution looking for a problem. I don't actually like don't think like backend is like the problem that needs to serve it or like not as much. But I look at data and in particular, like things like Gen AI, like model inference, like it's like clearly a good fit. So I think that is, you know, to a large extent explains like why we saw, you know, the initial sort of like killer app for model being model inference, which actually wasn't like necessarily what we're focused on. But that's where we've seen like by far the most usage. Yeah.
Swyx [00:20:52]: And this was before you started offering like fine tuning of language models, it was mostly stable diffusion. Yeah.
Erik [00:20:59]: Yeah. I mean, like model, like I always built it to be a very general purpose compute platform, like something where you can run everything. And I used to call model like a better Kubernetes for data team for a long time. What we realized was like, yeah, that's like, you know, a year and a half in, like we barely had any users or any revenue. And like we were like, well, maybe we should look at like some use case, trying to think of use case. And that was around the same time stable diffusion came out. And the beauty of model is like you can run almost anything on model, right? Like model inference turned out to be like the place where we found initially, well, like clearly this has like 10x like better agronomics than anything else. But we're also like, you know, going back to my original vision, like we're thinking a lot about, you know, now, okay, now we do inference really well. Like what about training? What about fine tuning? What about, you know, end-to-end lifecycle deployment? What about data pre-processing? What about, you know, I don't know, real-time streaming? What about, you know, large data munging, like there's just data observability. I think there's so many things, like kind of going back to what I said about like redefining the data stack, like starting with the foundation of compute. Like one of the exciting things about model is like we've sort of, you know, we've been working on that for three years and it's maturing, but like this is so many things you can do like with just like a better compute primitive and also go up to stack and like do all this other stuff on top of it.
Alessio [00:22:09]: How do you think about or rather like I would love to learn more about the underlying infrastructure and like how you make that happen because with fine tuning and training, it's a static memory. Like you exactly know what you're going to load in memory one and it's kind of like a set amount of compute versus inference, just like data is like very bursty. How do you make batches work with a serverless developer experience? You know, like what are like some fun technical challenge you solve to make sure you get max utilization on these GPUs? What we hear from people is like, we have GPUs, but we can really only get like, you know, 30, 40, 50% maybe utilization. What's some of the fun stuff you're working on to get a higher number there?
Erik [00:22:48]: Yeah, I think on the inference side, like that's where we like, you know, like from a cost perspective, like utilization perspective, we've seen, you know, like very good numbers and in particular, like it's our ability to start containers and stop containers very quickly. And that means that we can auto scale extremely fast and scale down very quickly, which means like we can always adjust the sort of capacity, the number of GPUs running to the exact traffic volume. And so in many cases, like that actually leads to a sort of interesting thing where like we obviously run our things on like the public cloud, like AWS GCP, we run on Oracle, but in many cases, like users who do inference on those platforms or those clouds, even though we charge a slightly higher price per GPU hour, a lot of users like moving their large scale inference use cases to model, they end up saving a lot of money because we only charge for like with the time the GPU is actually running. And that's a hard problem, right? Like, you know, if you have to constantly adjust the number of machines, if you have to start containers, stop containers, like that's a very hard problem. Starting containers quickly is a very difficult thing. I mentioned we had to build our own file system for this. We also, you know, built our own container scheduler for that. We've implemented recently CPU memory checkpointing so we can take running containers and snapshot the entire CPU, like including registers and everything, and restore it from that point, which means we can restore it from an initialized state. We're looking at GPU checkpointing next, it's like a very interesting thing. So I think with inference stuff, that's where serverless really shines because you can drive, you know, you can push the frontier of latency versus utilization quite substantially, you know, which either ends up being a latency advantage or a cost advantage or both, right? On training, it's probably arguably like less of an advantage doing serverless, frankly, because you know, you can just like spin up a bunch of machines and try to satisfy, like, you know, train as much as you can on each machine. For that area, like we've seen, like, you know, arguably like less usage, like for modal, but there are always like some interesting use case. Like we do have a couple of customers, like RAM, for instance, like they do fine tuning with modal and they basically like one of the patterns they have is like very bursty type fine tuning where they fine tune 100 models in parallel. And that's like a separate thing that modal does really well, right? Like you can, we can start up 100 containers very quickly, run a fine tuning training job on each one of them for that only runs for, I don't know, 10, 20 minutes. And then, you know, you can do hyper parameter tuning in that sense, like just pick the best model and things like that. So there are like interesting training. I think when you get to like training, like very large foundational models, that's a use case we don't support super well, because that's very high IO, you know, you need to have like infinite band and all these things. And those are things we haven't supported yet and might take a while to get to that. So that's like probably like an area where like we're relatively weak in. Yeah.
Alessio [00:25:12]: Have you cared at all about lower level model optimization? There's other cloud providers that do custom kernels to get better performance or are you just given that you're not just an AI compute company? Yeah.
Erik [00:25:24]: I mean, I think like we want to support like a generic, like general workloads in a sense that like we want users to give us a container essentially or a code or code. And then we want to run that. So I think, you know, we benefit from those things in the sense that like we can tell our users, you know, to use those things. But I don't know if we want to like poke into users containers and like do those things automatically. That's sort of, I think a little bit tricky from the outside to do, because we want to be able to take like arbitrary code and execute it. But certainly like, you know, we can tell our users to like use those things. Yeah.
Swyx [00:25:53]: I may have betrayed my own biases because I don't really think about modal as for data teams anymore. I think you started, I think you're much more for AI engineers. My favorite anecdotes, which I think, you know, but I don't know if you directly experienced it. I went to the Vercel AI Accelerator, which you supported. And in the Vercel AI Accelerator, a bunch of startups gave like free credits and like signups and talks and all that stuff. The only ones that stuck are the ones that actually appealed to engineers. And the top usage, the top tool used by far was modal.
Erik [00:26:24]: That's awesome.
Swyx [00:26:25]: For people building with AI apps. Yeah.
Erik [00:26:27]: I mean, it might be also like a terminology question, like the AI versus data, right? Like I've, you know, maybe I'm just like old and jaded, but like, I've seen so many like different titles, like for a while it was like, you know, I was a data scientist and a machine learning engineer and then, you know, there was like analytics engineers and there was like an AI engineer, you know? So like, to me, it's like, I just like in my head, that's to me just like, just data, like, or like engineer, you know, like I don't really, so that's why I've been like, you know, just calling it data teams. But like, of course, like, you know, AI is like, you know, like such a massive fraction of our like workloads.
Swyx [00:26:59]: It's a different Venn diagram of things you do, right? So the stuff that you're talking about where you need like infinite bands for like highly parallel training, that's not, that's more of the ML engineer, that's more of the research scientist and less of the AI engineer, which is more sort of trying to put, work at the application.
Erik [00:27:16]: Yeah. I mean, to be fair to it, like we have a lot of users that are like doing stuff that I don't think fits neatly into like AI. Like we have a lot of people using like modal for web scraping, like it's kind of nice. You can just like, you know, fire up like a hundred or a thousand containers running Chromium and just like render a bunch of webpages and it takes, you know, whatever. Or like, you know, protein folding is that, I mean, maybe that's, I don't know, like, but like, you know, we have a bunch of users doing that or, or like, you know, in terms of, in the realm of biotech, like sequence alignment, like people using, or like a couple of people using like modal to run like large, like mixed integer programming problems, like, you know, using Gurobi or like things like that. So video processing is another thing that keeps coming up, like, you know, let's say you have like petabytes of video and you want to just like transcode it, like, or you can fire up a lot of containers and just run FFmpeg or like, so there are those things too. Like, I mean, like that being said, like AI is by far our biggest use case, but you know, like, again, like modal is kind of general purpose in that sense.
Swyx [00:28:08]: Yeah. Well, maybe I'll stick to the stable diffusion thing and then we'll move on to the other use cases for AI that you want to highlight. The other big player in my mind is replicate. Yeah. In this, in this era, they're much more, I guess, custom built for that purpose, whereas you're more general purpose. How do you position yourself with them? Are they just for like different audiences or are you just heads on competing?
Erik [00:28:29]: I think there's like a tiny sliver of the Venn diagram where we're competitive. And then like 99% of the area we're not competitive. I mean, I think for people who, if you look at like front-end engineers, I think that's where like really they found good fit is like, you know, people who built some cool web app and they want some sort of AI capability and they just, you know, an off the shelf model is like perfect for them. That's like, I like use replicate. That's great. I think where we shine is like custom models or custom workflows, you know, running things at very large scale. We need to care about utilization, care about costs. You know, we have much lower prices because we spend a lot more time optimizing our infrastructure, you know, and that's where we're competitive, right? Like, you know, and you look at some of the use cases, like Suno is a big user, like they're running like large scale, like AI. Oh, we're talking with Mikey.
Swyx [00:29:12]: Oh, that's great. Cool.
Erik [00:29:14]: In a month. Yeah. So, I mean, they're, they're using model for like production infrastructure. Like they have their own like custom model, like custom code and custom weights, you know, for AI generated music, Suno.AI, you know, that, that, those are the types of use cases that we like, you know, things that are like very custom or like, it's like, you know, and those are the things like it's very hard to run and replicate, right? And that's fine. Like I think they, they focus on a very different part of the stack in that sense.
Swyx [00:29:35]: And then the other company pattern that I pattern match you to is Modular. I don't know.
Erik [00:29:40]: Because of the names?
Swyx [00:29:41]: No, no. Wow. No, but yeah, yes, the name is very similar. I think there's something that might be insightful there from a linguistics point of view. Oh no, they have Mojo, the sort of Python SDK. And they have the Modular Inference Engine, which is their sort of their cloud stack, their sort of compute inference stack. I don't know if anyone's made that comparison to you before, but like I see you evolving a little bit in parallel there.
Erik [00:30:01]: No, I mean, maybe. Yeah. Like it's not a company I'm like super like familiar, like, I mean, I know the basics, but like, I guess they're similar in the sense like they want to like do a lot of, you know, they have sort of big picture vision.
Swyx [00:30:12]: Yes. They also want to build very general purpose. Yeah. So they're marketing themselves as like, if you want to do off the shelf stuff, go out, go somewhere else. If you want to do custom stuff, we're the best place to do it. Yeah. Yeah. There is some overlap there. There's not overlap in the sense that you are a closed source platform. People have to host their code on you. That's true. Whereas for them, they're very insistent on not running their own cloud service. They're a box software. Yeah. They're licensed software.
Erik [00:30:37]: I'm sure their VCs at some point going to force them to reconsider. No, no.
Swyx [00:30:40]: Chris is very, very insistent and very convincing. So anyway, I would just make that comparison, let people make the links if they want to. But it's an interesting way to see the cloud market develop from my point of view, because I came up in this field thinking cloud is one thing, and I think your vision is like something slightly different, and I see the different takes on it.
Erik [00:31:00]: Yeah. And like one thing I've, you know, like I've written a bit about it in my blog too, it's like I think of us as like a second layer of cloud provider in the sense that like I think Snowflake is like kind of a good analogy. Like Snowflake, you know, is infrastructure as a service, right? But they actually run on the like major clouds, right? And I mean, like you can like analyze this very deeply, but like one of the things I always thought about is like, why does Snowflake arbitrarily like win over Redshift? And I think Snowflake, you know, to me, one, because like, I mean, in the end, like AWS makes all the money anyway, like and like Snowflake just had the ability to like focus on like developer experience or like, you know, user experience. And to me, like really proved that you can build a cloud provider, a layer up from, you know, the traditional like public clouds. And in that layer, that's also where I would put Modal, it's like, you know, we're building a cloud provider, like we're, you know, we're like a multi-tenant environment that runs the user code. But we're also building on top of the public cloud. So I think there's a lot of room in that space, I think is very sort of interesting direction.
Alessio [00:31:55]: How do you think of that compared to the traditional past history, like, you know, you had AWS, then you had Heroku, then you had Render, Railway.
Erik [00:32:04]: Yeah, I mean, I think those are all like great. I think the problem that they all faced was like the graduation problem, right? Like, you know, Heroku or like, I mean, like also like Heroku, there's like a counterfactual future of like, what would have happened if Salesforce didn't buy them, right? Like, that's a sort of separate thing. But like, I think what Heroku, I think always struggled with was like, eventually companies would get big enough that you couldn't really justify running in Heroku. So they would just go and like move it to, you know, whatever AWS or, you know, in particular. And you know, that's something that keeps me up at night too, like, what does that graduation risk like look like for modal? I always think like the only way to build a successful infrastructure company in the long run in the cloud today is you have to appeal to the entire spectrum, right? Or at least like the enterprise, like you have to capture the enterprise market. But the truly good companies capture the whole spectrum, right? Like I think of companies like, I don't like Datadog or Mongo or something that were like, they both captured like the hobbyists and acquire them, but also like, you know, have very large enterprise customers. I think that arguably was like where I, in my opinion, like Heroku struggle was like, how do you maintain the customers as they get more and more advanced? I don't know what the solution is, but I think there's, you know, that's something I would have thought deeply if I was at Heroku at that time.
Alessio [00:33:14]: What's the AI graduation problem? Is it, I need to fine tune the model, I need better economics, any insights from customer discussions?
Erik [00:33:22]: Yeah, I mean, better economics, certainly. But although like, I would say like, even for people who like, you know, needs like thousands of GPUs, just because we can drive utilization so much better, like we, there's actually like a cost advantage of staying on modal. But yeah, I mean, certainly like, you know, and like the fact that VCs like love, you know, throwing money at least used to, you know, add companies who need it to buy GPUs. I think that didn't help the problem. And in training, I think, you know, there's less software differentiation. So in training, I think there's certainly like better economics of like buying big clusters. But I mean, my hope it's going to change, right? Like I think, you know, we're still pretty early in the cycle of like building AI infrastructure. And I think a lot of these companies over in the long run, like, you know, they're, except it may be super big ones, like, you know, on Facebook and Google, they're always going to build their own ones. But like everyone else, like some extent, you know, I think they're better off like buying platforms. And, you know, someone's going to have to build those platforms.
Swyx [00:34:12]: Yeah. Cool. Let's move on to language models and just specifically that workload just to flesh it out a little bit. You already said that RAMP is like fine tuning 100 models at once simultaneously on modal. Closer to home, my favorite example is ErikBot. Maybe you want to tell that story.
Erik [00:34:30]: Yeah. I mean, it was a prototype thing we built for fun, but it's pretty cool. Like we basically built this thing that hooks up to Slack. It like downloads all the Slack history and, you know, fine-tunes a model based on a person. And then you can chat with that. And so you can like, you know, clone yourself and like talk to yourself on Slack. I mean, it's like nice like demo and it's just like, I think like it's like fully contained modal. Like there's a modal app that does everything, right? Like it downloads Slack, you know, integrates with the Slack API, like downloads the stuff, the data, like just runs the fine-tuning and then like creates like dynamically an inference endpoint. And it's all like self-contained and like, you know, a few hundred lines of code. So I think it's sort of a good kind of use case for, or like it kind of demonstrates a lot of the capabilities of modal.
Alessio [00:35:08]: Yeah. On a more personal side, how close did you feel ErikBot was to you?
Erik [00:35:13]: It definitely captured the like the language. Yeah. I mean, I don't know, like the content, I always feel this way about like AI and it's gotten better. Like when you look at like AI output of text, like, and it's like, when you glance at it, it's like, yeah, this seems really smart, you know, but then you actually like look a little bit deeper. It's like, what does this mean?
Swyx [00:35:32]: What does this person say?
Erik [00:35:33]: It's like kind of vacuous, right? And that's like kind of what I felt like, you know, talking to like my clone version, like it's like says like things like the grammar is correct. Like some of the sentences make a lot of sense, but like, what are you trying to say? Like there's no content here. I don't know. I mean, it's like, I got that feeling also with chat TBT in the like early versions right now it's like better, but.
Alessio [00:35:51]: That's funny. So I built this thing called small podcaster to automate a lot of our back office work, so to speak. And it's great at transcript. It's great at doing chapters. And then I was like, okay, how about you come up with a short summary? And it's like, it sounds good, but it's like, it's not even the same ballpark as like, yeah, end up writing. Right. And it's hard to see how it's going to get there.
Swyx [00:36:11]: Oh, I have ideas.
Erik [00:36:13]: I'm certain it's going to get there, but like, I agree with you. Right. And like, I have the same thing. I don't know if you've read like AI generated books. Like they just like kind of seem funny, right? Like there's off, right? But like you glance at it and it's like, oh, it's kind of cool. Like looks correct, but then it's like very weird when you actually read them.
Swyx [00:36:30]: Yeah. Well, so for what it's worth, I think anyone can join the modal slack. Is it open to the public? Yeah, totally.
Erik [00:36:35]: If you go to modal.com, there's a button in the footer.
Swyx [00:36:38]: Yeah. And then you can talk to Erik Bot. And then sometimes I really like picking Erik Bot and then you answer afterwards, but then you're like, yeah, mostly correct or whatever. Any other broader lessons, you know, just broadening out from like the single use case of fine tuning, like what are you seeing people do with fine tuning or just language models on modal in general? Yeah.
Erik [00:36:59]: I mean, I think language models is interesting because so many people get started with APIs and that's just, you know, they're just dominating a space in particular opening AI, right? And that's not necessarily like a place where we aim to compete. I mean, maybe at some point, but like, it's just not like a core focus for us. And I think sort of separately, it's sort of a question of like, there's economics in that long term. But like, so we tend to focus on more like the areas like around it, right? Like fine tuning, like another use case we have is a bunch of people, Ramp included, is doing batch embeddings on modal. So let's say, you know, you have like a, actually we're like writing a blog post, like we take all of Wikipedia and like parallelize embeddings in 15 minutes and produce vectors for each article. So those types of use cases, I think modal suits really well for. I think also a lot of like custom inference, like yeah, I love that.
Swyx [00:37:43]: Yeah. I think you should give people an idea of the order of magnitude of parallelism, because I think people don't understand how parallel. So like, I think your classic hello world with modal is like some kind of Fibonacci function, right? Yeah, we have a bunch of different ones. Some recursive function. Yeah.
Erik [00:37:59]: Yeah. I mean, like, yeah, I mean, it's like pretty easy in modal, like fan out to like, you know, at least like 100 GPUs, like in a few seconds. And you know, if you give it like a couple of minutes, like we can, you know, you can fan out to like thousands of GPUs. Like we run it relatively large scale. And yeah, we've run, you know, many thousands of GPUs at certain points when we needed, you know, big backfills or some customers had very large compute needs.
Swyx [00:38:21]: Yeah. Yeah. And I mean, that's super useful for a number of things. So one of my early interactions with modal as well was with a small developer, which is my sort of coding agent. The reason I chose modal was a number of things. One, I just wanted to try it out. I just had an excuse to try it. Akshay offered to onboard me personally. But the most interesting thing was that you could have that sort of local development experience as it was running on my laptop, but then it would seamlessly translate to a cloud service or like a cloud hosted environment. And then it could fan out with concurrency controls. So I could say like, because like, you know, the number of times I hit the GPT-3 API at the time was going to be subject to the rate limit. But I wanted to fan out without worrying about that kind of stuff. With modal, I can just kind of declare that in my config and that's it. Oh, like a concurrency limit?
Erik [00:39:07]: Yeah. Yeah.
Swyx [00:39:09]: Yeah. There's a lot of control. And that's why it's like, yeah, this is a pretty good use case for like writing this kind of LLM application code inside of this environment that just understands fan out and rate limiting natively. You don't actually have an exposed queue system, but you have it under the hood, you know, that kind of stuff. Totally.
Erik [00:39:28]: It's a self-provisioning cloud.
Swyx [00:39:30]: So the last part of modal I wanted to touch on, and obviously feel free, I know you're working on new features, was the sandbox that was introduced last year. And this is something that I think was inspired by Code Interpreter. You can tell me the longer history behind that.
Erik [00:39:45]: Yeah. Like we originally built it for the use case, like there was a bunch of customers who looked into code generation applications and then they came to us and asked us, is there a safe way to execute code? And yeah, we spent a lot of time on like container security. We used GeoVisor, for instance, which is a Google product that provides pretty strong isolation of code. So we built a product where you can basically like run arbitrary code inside a container and monitor its output or like get it back in a safe way. I mean, over time it's like evolved into more of like, I think the long-term direction is actually I think more interesting, which is that I think modal as a platform where like I think the core like container infrastructure we offer could actually be like, you know, unbundled from like the client SDK and offer to like other, you know, like we're talking to a couple of like other companies that want to run, you know, through their packages, like run, execute jobs on modal, like kind of programmatically. So that's actually the direction like Sandbox is going. It's like turning into more like a platform for platforms is kind of what I've been thinking about it as.
Swyx [00:40:45]: Oh boy. Platform. That's the old Kubernetes line.
Erik [00:40:48]: Yeah. Yeah. Yeah. But it's like, you know, like having that ability to like programmatically, you know, create containers and execute them, I think, I think is really cool. And I think it opens up a lot of interesting capabilities that are sort of separate from the like core Python SDK in modal. So I'm really excited about C. It's like one of those features that we kind of released and like, you know, then we kind of look at like what users actually build with it and people are starting to build like kind of crazy things. And then, you know, we double down on some of those things because when we see like, you know, potential new product features and so Sandbox, I think in that sense, it's like kind of in that direction. We found a lot of like interesting use cases in the direction of like platformized container runner.
Swyx [00:41:27]: Can you be more specific about what you're double down on after seeing users in action?
Erik [00:41:32]: I mean, we're working with like some companies that, I mean, without getting into specifics like that, need the ability to take their users code and then launch containers on modal. And it's not about security necessarily, like they just want to use modal as a back end, right? Like they may already provide like Kubernetes as a back end, Lambda as a back end, and now they want to add modal as a back end, right? And so, you know, they need a way to programmatically define jobs on behalf of their users and execute them. And so, I don't know, that's kind of abstract, but does that make sense? I totally get it.
Swyx [00:42:03]: It's sort of one level of recursion to sort of be the Modal for their customers.
Erik [00:42:09]: Exactly.
Swyx [00:42:10]: Yeah, exactly. And Cloudflare has done this, you know, Kenton Vardar from Cloudflare, who's like the tech lead on this thing, called it sort of functions as a service as a service.
Erik [00:42:17]: Yeah, that's exactly right. FaSasS.
Swyx [00:42:21]: FaSasS. Yeah, like, I mean, like that, I think any base layer, second layer cloud provider like yourself, compute provider like yourself should provide, you know, it's a mark of maturity and success that people just trust you to do that. They'd rather build on top of you than compete with you. The more interesting thing for me is like, what does it mean to serve a computer like an LLM developer, rather than a human developer, right? Like, that's what a sandbox is to me, that you have to redefine modal to serve a different non-human audience.
Erik [00:42:51]: Yeah. Yeah, and I think there's some really interesting people, you know, building very cool things.
Swyx [00:42:55]: Yeah. So I don't have an answer, but, you know, I imagine things like, hey, the way you give feedback is different. Maybe you have to like stream errors, log errors differently. I don't really know. Yeah. Obviously, there's like safety considerations. Maybe you have an API to like restrict access to the web. Yeah. I don't think anyone would use it, but it's there if you want it.
Erik [00:43:17]: Yeah.
Swyx [00:43:18]: Yeah. Any other sort of design considerations? I have no idea.
Erik [00:43:21]: With sandboxes?
Swyx [00:43:22]: Yeah. Yeah.
Erik [00:43:24]: Open-ended question here. Yeah. I mean, no, I think, yeah, the network restrictions, I think, make a lot of sense. Yeah. I mean, I think, you know, long-term, like, I think there's a lot of interesting use cases where like the LLM, in itself, can like decide, I want to install these packages and like run this thing. And like, obviously, for a lot of those use cases, like you want to have some sort of control that it doesn't like install malicious stuff and steal your secrets and things like that. But I think that's what's exciting about the sandbox primitive, is like it lets you do that in a relatively safe way.
Alessio [00:43:51]: Do you have any thoughts on the inference wars? A lot of providers are just rushing to the bottom to get the lowest price per million tokens. Some of them, you know, the Sean Randomat, they're just losing money and there's like the physics of it just don't work out for them to make any money on it. How do you think about your pricing and like how much premium you can get and you can kind of command versus using lower prices as kind of like a wedge into getting there, especially once you have model instrumented? What are the tradeoffs and any thoughts on strategies that work?
Erik [00:44:23]: I mean, we focus more on like custom models and custom code. And I think in that space, there's like less competition and I think we can have a pricing markup, right? Like, you know, people will always compare our prices to like, you know, the GPU power they can get elsewhere. And so how big can that markup be? Like it never can be, you know, we can never charge like 10x more, but we can certainly charge a premium. And like, you know, for that reason, like we can have pretty good margins. The LLM space is like the opposite, like the switching cost of LLMs is zero. If all you're doing is like straight up, like at least like open source, right? Like if all you're doing is like, you know, using some, you know, inference endpoint that serves an open source model and, you know, some other provider comes along and like offers a lower price, you're just going to switch, right? So I don't know, to me that reminds me a lot of like all this like 15 minute delivery wars or like, you know, like Uber versus Lyft, you know, and like maybe going back even further, like I think a lot about like sort of, you know, flip side of this is like, it's actually a positive side, which is like, I thought a lot about like fiber optics boom of like 98, 99, like the other day, or like, you know, and also like the overinvestment in GPU today. Like, like, yeah, like, you know, I don't know, like in the end, like, I don't think VCs will have the return they expected, like, you know, in these things, but guess who's going to benefit, like, you know, is the consumers, like someone's like reaping the value of this. And that's, I think an amazing flip side is that, you know, we should be very grateful, the fact that like VCs want to subsidize these things, which is, you know, like you go back to fiber optics, like there was an extreme, like overinvestment in fiber optics network in like 98. And no one made money who did that. But consumers, you know, got tremendous benefits of all the fiber optics cables that were led, you know, throughout the country in the decades after. I feel something similar about like GPUs today. And also like specifically looking like more narrowly at like LLM in France market, like that's great. Like, you know, I'm very happy that, you know, there's a price war. Modal is like not necessarily like participating in that price war, right? Like, I think, you know, it's going to shake out and then someone's going to win and then they're going to raise prices or whatever. Like, we'll see how that works out. But for that reason, like we're not like hyper focused on like serving, you know, just like straight up, like here's an endpoint to an open source model. We think the value in Modal comes from all these, you know, the other use cases, the more custom stuff, like fine tuning and complex, you know, guided output, like type stuff. Or like also like in other, like outside of LLMs, like with more focus, a lot more like image, audio, video stuff, because that's where there's a lot more proprietary models. There's a lot more like custom workflows. And that's where I think, you know, Modal is more, you know, there's a lot of value in software differentiation. I think focusing on developer experience and developer productivity, that's where I think, you know, you can have more of a competitive moat.
Alessio [00:46:58]: I'm curious what the difference is going to be now that it's an enterprise. So like with DoorDash, Uber, they're going to charge you more. And like as a customer, like you can decide to not take Uber. But if you're a company building AI features in your product using the subsidized prices, and then, you know, the VC money dries up in a year and like prices go up, it's like, you can't really take the features back without a lot of backlash. But you also cannot really kill your margins by paying the new price. So I don't know what that's going to look like
Erik [00:47:28]: But like margins are going to go up for sure. But I don't know if prices will go up because like GPU prices have to drop eventually, right? So like, you know, like in the long run, I still think like prices may not go up that much. But certainly margins will go up. Like I think you said, Swyx, that margins are negative right now. Like, you know, for some people, obviously, that's not sustainable. So certainly margins will have to go up. Like some companies are going to have to make money in this space. Otherwise, like they're not going to provide the service. But that's equilibrium too, right? Like at some point, like, you know, it sort of stabilizes and one or two or three providers make money.
Alessio [00:48:02]: Yeah. What else is maybe underrated, a model, something that people don't talk enough about, or yeah, that we didn't cover in the discussion?
Erik [00:48:11]: Yeah, I think what are some other things? We talked about a lot of stuff. Like we have the bursty parallelism. I think that's pretty cool. Working on a lot of like, trying to figure out like, kind of thinking more about the roadmap. But like one of the things I'm very excited about is building primitives for like, more like IO intensive workloads. And so like, we're building some like crude stuff right now where like, you can like create like direct TCP tunnels to containers and that lets you like pipe data. And like, you know, we haven't really explored this as much as we should, but like, there's a lot of interesting applications. Like you can actually do like kind of real time video stuff in Modal now, because you can like create a tunnel to, exactly. You can create a raw TCP socket to a container, feed it video and then like, you know, get the video back. And I think like, it's still like a little bit like, you know, not fully ergonomically like figured out. But I think there's a lot of like, super cool stuff. Like when we start enabling those more like high IO workloads, I'm super excited about. I think also like, you know, working with large data sets or kind of taking the ability to map and fan out and like building more like higher level, like functional primitives, like filters and group buys and joins. Like I think there's a lot of like, really cool stuff you can do. But this is like maybe like, you know, years out like.
Swyx [00:49:18]: Yeah, we can just broaden out from Modal a little bit, but you still have a lot of, you have a lot of great tweets. So it's very easy to just kind of go through them. Why is Oracle underrated? I love Oracle's GPUs. I don't know why, you know,
Erik [00:49:34]: what the economics looks like for Oracle, but I think they're great value for money. Like we run a bunch of stuff in Oracle and they have bare metal machines, like two terabytes of RAM. They're like super fast SSDs. You know, I mean, we love AWS and AGCP too. We have great relationships with them. But I think Oracle is surprising. Like, you know, if you told me like three years ago that I would be using Oracle Cloud, like I'd be like, what, wait, why? But now, you know,
Swyx [00:49:55]: I'm a happy customer. And it's a combination of pricing and the kinds of SKUs I guess they offer.
Erik [00:50:01]: Yeah. Great, great machines, good prices, you know. That's it. Yeah. Yeah. That's all I care about. Yeah. The sales team is pretty fun too. Like I like them.
Swyx [00:50:09]: In Europe, people often talk about Hetzner. Yeah. Like we've focused on the main clouds, right?
Erik [00:50:14]: Like we've, you know, Oracle, AWS, GCP, we'll probably add Azure at some point. I think, I mean, there's definitely a long tail of like, you know, CoreWeave, Hetzner, like Lambda, like all these things. And like over time, I think we'll look at those too. Like, you know, wherever we can get the right GPUs at the right price. Yeah. I mean, I think it's fascinating. Like it's a tough business. Like I wouldn't want to try to build like a cloud provider. You know, it's just, you just have to be like incredibly focused on like, you know, efficiency and margins and things like that. But I mean, I'm glad people are trying.
Swyx [00:50:45]: Yeah. And you can ramp up on any of these clouds very quickly, right? Because it's your standard stack.
Erik [00:50:50]: Yeah. I mean, yeah. Like I think so. Like, you know, what Modal does is like programmatic, you know, launching and termination of machines. So that's like what's nice about the clouds is, you know, they have relatively like immature APIs for doing that, as well as like, you know, support for Terraform for all the networking and all that stuff. So that makes it easier to work with the big clouds. But yeah, I mean, some of those things, like I think, you know, I also expect the smaller clouds to like embrace those things in the long run, but also think, you know, you know, we can also probably integrate with some of the clouds, like even without that. There's always an HTML API that you can use, just like script something that launches instances like through the web.
Swyx [00:51:24]: Yeah. I think a lot of people are always curious about whether or not you will buy your own hardware someday. I think you're pretty firm in that it's not your interest, but like your story and your growth does remind me a little bit of Cloudflare, which obviously, you know, invests a lot in its own physical network.
Erik [00:51:42]: Yeah. I don't remember like early days, like, did they have their own hardware or?
Swyx [00:51:47]: They push out a lot with like agreements through other, you know, providers.
Erik [00:51:52]: Yeah. Okay. Interesting.
Swyx [00:51:53]: But now it's all their own hardware. So I understand.
Erik [00:51:57]: Yeah. I mean, my feeling is that when you're a venture funded startup, like buying physical hardware is maybe not the best use of the money.
Swyx [00:52:06]: I really wanted to put you in a room with Isocat from Poolside. Yeah. Because he has the complete opposite view. Yeah.
Erik [00:52:12]: It is great. I mean, I don't like, I just think for like a capital efficiency point of view, like, do you really want to tie up that much money and like, you know, physical hardware and think about depreciation and like, like, as much as possible, like I, you know, I favor a more capital efficient way of like, we don't want to own the hardware because then, and ideally, we want to, we want the sort of margin structure to be sort of like 100% correlated revenue in cogs in the sense that like, you know, when someone comes and pays us, you know, $1 for compute, like, you know, we immediately incur a cost of like, whatever, 70 cents, 80 cents, you know, and there's like complete correlation between cost and revenue because then you can leverage up in like a kind of a nice way you can scale very efficiently. You know, like, that's not, you know, turns out like that's hard to do. Like, you can't just only use like spotting on demand instances. Like over time, we've actually started adding a pretty significant amount of reservations too. So I don't know, like reservation is always like one step towards owning your own hardware. Like, I don't know, like, do we really want to be, you know, thinking about switches and cooling and HVAC and like power supplies? Accessory recovery. Yeah. Like, is that the thing I want to think about? Like, I don't know. Like I like to make developers happy, but who knows, like maybe one day, like, but I don't think it's gonna happen anytime soon.
Swyx [00:53:23]: Yeah. Obviously, for what it's worth, obviously, I'm a believer in cloud, but it's interesting to have the devil's advocate on the other side. The main thing you have to do is be confident that you can manage your depreciation better than the typical assumption, which is two to three years. Yeah. Yeah. And so the moment you have a CTO that tells you, no, I think I can make these things last seven years, then it changes the math.
Erik [00:53:46]: Yeah. Yeah. But you know, are you deluding yourself then? That's the question, right? It's like the waste management scandal. Do you know about that? Like they had all this like, like accounting scandal back in the 90s, like this garbage company, like where they like, started assuming their garbage trucks had a 10-year depreciation schedule, booked like a massive profit, you know, the stock went to like, you know, up like, you know, and then it turns out actually all those garbage trucks broke down and like, you can't really depreciate them over 10 years. And so, so then the whole company, you know, they had to restate all the earnings.
Alessio [00:54:18]: Let's go into some personal nuggets. You received the IOI gold medal, which is the International Olympiad in Informatics.
Erik [00:54:29]: 20 years ago.
Alessio [00:54:30]: Yeah. How have these models and like going to change competitive programming? Like, do you think people are still love the craft? I feel like over time, we're kind of like programming has kind of lost maybe a little bit of its luster in the eyes of a lot of, a lot of people. Yeah. I'm curious to, to see what you think.
Erik [00:54:51]: I mean, maybe, but like, I don't know, like, you know, I've been coding for almost 30 or more than 30 years. And like, I feel like, you know, you look at like programming and, you know, where it is today versus where it was, you know, 30, 40, 50 years ago, there's like probably thousand times more developers today than, you know, so like, and every year there's more and more developers. And at the same time, developer productivity keeps going up. And when I look at the real world, I just think there's so much software that's still waiting to be built. Like, I think we can, you know, 10X the amount of developers and still, you know, have a lot of people making a lot of money, you know, building amazing software and also being while at the same time being more productive. Like I never understood this, like, you know, AI is going to, you know, replace engineers. That's very rarely how this actually works. When AI makes engineers more productive, like the demand actually goes up because the cost of engineers goes down because you can build software more cheaply. And that's, I think, the story of software in the world over the last few decades. So, I mean, I don't know how this relates to like competitive programming. Kind of going back to your question, competitive programming to me was always kind of a weird kind of, you know, niche, like kind of, I don't know. I love it. It's like puzzle solving. And like my experience is like, you know, half of competitive programmers are able to translate that to actual like building cool stuff in the world. Half just like get really in, you know, sucked into this like puzzle stuff and, you know, it never loses its grip on them. But like for me, it was an amazing way to get started with coding or get very deep into coding and, you know, kind of battle off with like other smart kids and traveling to different countries when I was a teenager.
Swyx [00:56:29]: I was just going to mention, like, it's not just that he personally is a competitive programmer. Like, I think a lot of people at Modal are competitive programmers. I think you met Akshat through... Akshat, co-founder is also at Gold Medal.
Erik [00:56:42]: By the way, Gold Medal doesn't mean you win. Like, but although we actually had an intern that won Iowa. Gold Medal is like the top 20, 30 people roughly.
Swyx [00:56:47]: Yeah. Obviously, it's very hard to get hired at Modal. But what is it like to work with like such a talent density? Like, you know, how is that contributing to the culture at Modal? Yeah. I mean, I think humans are the root cause of like everything at a company, like, you know, bad code is because it's bad human or like whatever, you know, bad culture.
Erik [00:57:03]: So like, I think, you know, like talent density is very important and like keeping the bar high and like hiring smart people. And, you know, it's not always like the case that like hiring competitive programmers is the right strategy, right? If you're building something very different, like you may not, you know, but we actually end up having a lot of like hard, you know, complex challenges. Like, you know, I talked about like the cloud, you know, the resource allocation, like turns out like that actually, like you can phrase that as a mixed integer programming problem. Like we now have that running in production, like constantly optimizing how we allocate cloud resources. There's a lot of like interesting, like complex, like scheduling problems. And like, how do you do all the bin packing of all the containers? Like, so, you know, I think for what we're building, you know, it makes a lot of sense to hire these people who like, like those very hard problems.
Swyx [00:57:52]: Yeah. And they don't necessarily have to know the details of the stack. They just need to be very good at algorithms.
Erik [00:57:56]: No, but my feeling is like people who are like pretty good at competitive programming, they can also pick up like other stuff like elsewhere. Not always the case, but you know, there's definitely a high correlation.
Swyx [00:58:08]: Oh yeah. I'm just, I'm interested in that just because, you know, like there's competitive mental talents in other areas, like competitive speed memorization or whatever. And like, you don't really see those transfer. And I always assumed in my narrow perception that competitive programming is so specialized, it's so obscure, even like so divorced from real world scenarios that it doesn't actually transfer that much. But obviously I think for the problems that you work on it, it does.
Erik [00:58:34]: But it's also like, you know, frankly, it's like translates to some extent, not because like the problems are the same, but just because like it sort of filters for the, you know, people who are like willing to go very deep and work hard on things. Right. Like, I feel like a similar thing is like a lot of good developers are like talented musicians. Like, why? Like, why is this a correlation? And like, my theory is like, you know, it's the same sort of skill. Like you have to like just hyper focus on something and practice a lot. Like, and there's something similar that I think creates like good developers.
Alessio [00:59:02]: Yeah. Sweden also had a lot of very good Counter-Strike players. I don't know, why does Sweden have fiber optics before all of Europe? I feel like, I grew up in Italy and our internet was terrible. And then I feel like all the Nordics and like amazing internet, I remember getting online and people in the Nordics are like five ping, 10 ping.
Erik [00:59:23]: Yeah. We had very good network back then. Yeah. Do you know why? I mean, I'm sure like, you know, I think the government, you know, did certain things quite well. Right. Like in the nineties, like there was like a bunch of tax rebates for like buying computers. And I think there was similar like investments in infrastructure. I mean, like, and I think like I was thinking about, you know, it's like, I still can't use my phone in the subway in New York. And that was something I could use in Sweden in 95. You know, we're talking like 40 years almost. Right. Like, like why? And I don't know, like I think certain infrastructure,
Alessio [00:59:53]: you know, Sweden was just better at, I don't know. And also you never owned a TV or a car?
Erik [00:59:59]: Never owned a TV or a car. I never had a driver's license.
Alessio [01:00:01]: How do you do that in Sweden though? Like that's cold.
Erik [01:00:03]: I grew up in a city. I mean, like I took the subway everywhere with bike or whatever. Yeah. I always lived in cities, so I don't, you know, I never felt, I mean, like we have like me and my wife as a car, but like. That doesn't count. I mean, it's her name because I don't have a driver's license. She drives me everywhere. It's nice.
Swyx [01:00:21]: Nice. That's fantastic. I was going to ask you, like the last thing I had on this list was your advice to people thinking about running some sort of run code in the cloud startup is only do it if you're genuinely excited about spending five years thinking about load balancing, page falls, cloud security and DNS. So basically like it sounds like you're summing up a lot of pain running Modal. Yeah. Yeah. Like one thing I struggle with, like I talked to a lot of people
Erik [01:00:43]: starting companies in the data space or like AI space or whatever. And they sort of come at it at like, you know, from like an application developer point of view. And they're like, I'm going to make this better. But like, guess how you have to make it better. It's like, you have to go very deep on the infrastructure layer. And so one of my frustrations has been like so many startups are like, in my opinion, like Kubernetes wrappers and not very like thick wrappers, like fairly thin wrappers. And I think, you know, every startup is a wrapper to some extent, but like you need to be like a fat wrapper. You need to like go deep and like build some stuff. And that's like, you know, if you build a tech company, you're going to want to, you're going to have to spend, you know, five, 10, 20 years of your life, like going very deep and like, you know, building the infrastructure you need in order to like make your product truly stand out and be competitive. And so, you know, I think that goes for everything. I mean, like you're starting a whatever, you know, online retailer of, I don't know, bathroom sinks. You have to be willing to spend 10 years of your life thinking about, you know, whatever, bathroom sinks, like otherwise it's going to be hard.
Swyx [01:01:37]: Yeah. I think that's good advice for everyone. And yeah, congrats on all your success. It's pretty exciting to watch it. It's just the beginning. Yeah. Yeah. Yeah. It's
Erik [01:01:45]: exciting. And everyone should sign up and try out modal.modal.com. Yeah. Now it's GA. Yay. Yeah.
Swyx [01:01:50]: Used to be behind a wait list. Yeah. Awesome, Erik. Thank you so much for coming on. Yeah, it's amazing. Thank you so much. Thanks.
Swyx [01:02:11]: Bye.

Get full access to Latent Space at www.latent.space/subscribe

16 February 2024, 5:42 pm
More Episodes? Get the App

About Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

Links

Listeners Also Subscribed To

Your feedback is valuable to us. Should you encounter any bugs, glitches, lack of functionality or other problems, please email us on [email protected] or join Moon.FM Telegram Group where you can talk directly to the dev team who are happy to answer any queries.