• Fable Got Banned, Open Source Delivered: GLM-5.2, Kimi K2.7 & SpaceX Buys Cursor - June 18
    Jun 18 2026
    Hey yall, Alex here, let me catch you up! I came back from vacation expecting to cover Fable 5 after a week of using it. The first two days after we all got access to a Mythos-level model were super exciting! But then the news hit: the US Government issued an order banning Anthropic from giving any foreign national access to Fable 5 and Mythos 5, causing Anthropic to pull the models completely (even for its own employees!).

    So this wasn’t the show I planned, but it turned into a great show about open source, as two models, both MIT licensed, hit the top of the rankings, filling a Fable-shaped hole in our hearts! GLM released 5.2, with folks really excited about its web-building capabilities, and Kimi 2.7 Code released (and is available on CW Inference with crazy speeds!). We also saw the SpaceX IPO and the $60B Cursor acquisition, Noam Shazeer joining Open, and Midjourney, the image company, launching a new ultrasound full-body scanner to kill MRIs!

    Great show today with Dexter Horthy from HumanLayer, Chris Van Pelt and Adrian Swanberg from W&B announcing our new product HiveMind, and Tanishq Abraham, who came back to help cover Midjourney’s new ultrasound scanner! Let’s dive in!

    The US Government bans Fable 5! (X, Anthropic statement)

    Here’s a story in 3 parts:

    * Anthropic announces the Mythos 5 preview, saying this model is too dangerous to release, and only gives corporations access to it via Project Glasswing.
    * Anthropic works hard on limitations and safety and releases Fable 5 (same weights as Mythos 5), built with guardrails so strong that it refuses to do any cybersecurity tasks and frequently switches back to Opus.
    * The US Government receives a tip (reportedly from Amazon) that Fable 5 can be jailbroken to do cybersecurity tasks and, citing national security concerns, issues an order banning Anthropic from giving any foreign national access to Fable 5 and Mythos 5, causing Anthropic to pull the models completely (even for its own employees!).

    This is the first time we’ve seen the US Government directly intervene in the AI space and restrict access to frontier models. The most up-to-date reporting I could find is that Anthropic and US Government officials are negotiating a safe release framework. Given that preventing all jailbreaks is impossible, I hope they land on a solution that gives me Fable 5 back!

    This hit especially hard because last week we were all high on Fable. Not in the usual AI Twitter benchmark sense, but in the actual “oh, this is a different level” sense. My wife and I Fable-maxxed throughout our flight to vacation. Peter had saved outputs he kept going back to, because other models suddenly felt like a step down. Dexter later said it was the closest he had felt in a while to the old “I need to keep prompting this thing overnight” feeling.

    Peter Gostev made a point that stuck with me. It’s easy for us in the bubble to call this ridiculous, and on the technical merits it kind of is. But if you’ve spent weeks telling normal people “this thing is like a nuclear weapon, it’ll take everyone’s jobs,” and then someone asks “okay, can you make it safe?” and the answer is “no, I can’t,” you can see how an outsider lands on “well, maybe you shouldn’t have it.” His takeaway, and I agree: we need to be way more careful with the imagery we use, because the nuclear-weapon framing came home to roost.

    The bigger questions are the scary ones. Wolfram framed it as a sovereign AI wake-up call, and he’s right. For the first time we’re seeing a real gap in the intelligence available to people based on their nationality. Imagine building a company on a model that an outside government can switch off with one letter. Peter pointed out it’s commercially bad for the US but completely disastrous for Europe, which has basically one frontier lab and a pile of startups that suddenly look very exposed. And there’s the obvious irony Nisten enjoyed a little too much: the Europeans who spent years lecturing everyone about AI restrictions just got restrictions imposed on them.

    If anyone in the government is listening: we want Fable back, please.

    SpaceX IPOs and acquires Cursor for $60B (X)

    SpaceX did the largest IPO in the history of the world, around seventy-five billion dollars, which at a roughly two-trillion-dollar valuation made Elon the first trillionaire. (Did anything materially change for him? No. He can still fly his private plane. There’s nothing left to buy.) Three days later, SpaceX exercised its option and bought Cursor (Anysphere) for sixty billion dollars in an all-stock deal, paid in shares minted at the IPO and now trading around $211. The four Cursor co-founders are all billionaires now. Largest software acquisition ever, and for SpaceX it...
    1 hr and 56 mins
  • 📅 ThursdAI - Jun 11, 2026 - Fable & Mythos 5 are here, Anthropic gets caught sandbagging (then reverses), Siri AI finally works!? and we got live-translated on air
    Jun 12 2026
    Hey folks, Alex here, and welcome to a BIG MODEL week! We finally got Mythos (well, almost)! Let me catch you up!

    This week started with WWDC26 from Apple, and Max Weinbach, who was in the room at Apple Park and actually has access to some of the new features, including an all-new Siri AI, joined us to break down what could very soon be the most-used AI in the world. At first I was skeptical, but he convinced me that the new Siri is actually good!

    Then we saw the ultimate model drop: Anthropic finally shipped Mythos (X, my system card thread, benchmarks). Same weights, two names: Mythos 5 is the unrestricted version that only Project Glasswing partners get, and Fable 5 is what the rest of us get, wrapped in the heaviest guardrails I’ve ever seen ship on a frontier model. It’s state of the art on nearly every benchmark. The model that was “too dangerous to release” is now... well, released, but with those guardrails. More on this later. Peter Gostev from Arena.ai joined us to break down the new model.

    Last but definitely not least, Google released a real-time translation model that our friend Thor Schaeff from DeepMind demoed live while we all spoke in different languages, and it translated us in REAL TIME. It was really cool, definitely check that out.

    There are quite a few more things, like Loop Engineering Alpha, Swyx coming by to talk about FrontierCode, OpenAI confirming our suspicions that the anti-datacenter social media posts could be a concerted effort by groups linked to the Chinese government, and much more. Let’s dive in!

    Opus’s big brother: Claude Fable 5 & Mythos 5 - the “too dangerous” models are here, SOTA on nearly every benchmark

    It honestly feels like someone on Anthropic’s pre-IPO marketing team knows exactly how to stagger releases to ride the hype waves! First they announced a model so good at cybersecurity (Mythos-preview) that they only allowed a few partners restricted access to it. A month later, they released Fable 5, which has the same weights as Mythos 5 but is wrapped in the heaviest guardrails we’ve ever seen from any lab. But they didn’t lie: this model is absolutely amazing, and it does feel like a step change in capabilities, specifically on longer agentic tasks.

    It’s 2x as expensive as Opus at $10 / $50 per million tokens (input / output), with 1M context, available as claude-fable-5 in the API, and SOTA basically everywhere. It scores 80.3% on SWE-Bench Pro versus GPT 5.5 at 58.6%, a roughly 22-point blowout on a benchmark where labs usually fight over single digits. Karpathy called it “SOTA by a margin… major-version step change” (X) and Boris Cherny said it’s the “best coding model by a wide margin” (X). Stripe reportedly migrated 50 million lines of code in 24 hours with it.

    Our panel verdict was unanimous on one thing: big model smell. LDJ called it the most significant big model smell since Gemini 3 first dropped. Someone from the Anthropic team framed the shift in a way that stuck with me: this model moves them from verifying the AI’s outputs to verifying whether the AI is working on the right thing. That’s a complete shift in how much they trust this model.

    What we built with Fable to test it out

    Peter got employee access through Arena and showed us his tests live. His favorite prompt category, “research a dataset and create a visual experience to teach me about it,” went from completely rubbish on every previous model to, in his words, just done. His 3D city generations actually came together as a city, roads connecting and all. And on Arena’s data, Fable is #1 on the new Agent Arena leaderboard by the widest margin they’ve ever recorded, and wins 72% of frontend battles even against Opus models (Arena).

    My own run is the one I can’t stop thinking about. I pointed Fable at the ThursdAI website with a dynamic workflow in Claude Code and barely any instructions, and after an hour and a half of agentic running it had extracted 786 releases from our archive, built 240 new pages, and categorized 50+ episodes into a browsable timeline of AI releases by month, by company, and by topic, with logos and source links (X). It burned roughly 50 million tokens and my entire five-hour Max allotment in 90 minutes. The new AI releases timeline is live on thursdai.news, and it’s confirmed: Fable is the best AI web designer we’ve ever had access to.

    Nisten ran his traditional Olympus Mons escape-velocity test, and Fable didn’t just do the math, it built the entire solar system! Orbital maneuvers, a space train with little people in it, time controls, full cost calculations down to solar panels and in-situ iron utilization. His verdict: a completely different level from anything else. We’ve never seen so many details in the Olympus Mons test.

    It’s not all light, though. Yam found Opus more controllable; Fable fights you, decides it knows better, and does the task its own way. Wolfram saw exactly that in benchmarks, where the ...
    2 hrs and 11 mins
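    The Fable 5 segment above quotes $10 / $50 per million tokens and a run that burned roughly 50 million tokens. A small cost helper makes that arithmetic concrete. It is a sketch under stated assumptions: the two prices are read as input and output rates, caching discounts are ignored, and the 45M input / 5M output split is hypothetical, since the episode doesn’t give one.

```python
# Hedged sketch: dollar cost of a run at the Fable 5 prices quoted above.
# Assumption: "$10 / $50 per million tokens" means $10 per 1M input tokens
# and $50 per 1M output tokens; prompt-caching discounts are ignored.
FABLE_INPUT_PER_M = 10.0
FABLE_OUTPUT_PER_M = 50.0


def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_per_m: float = FABLE_INPUT_PER_M,
                 output_per_m: float = FABLE_OUTPUT_PER_M) -> float:
    """Return the dollar cost for the given token counts and per-million prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical 45M input / 5M output split of a ~50M-token agentic run.
    print(f"${api_cost_usd(45_000_000, 5_000_000):,.2f}")
```

    With that hypothetical split the script prints $700.00. Alex’s run went through a Max subscription rather than API billing, so treat this only as a rough “what would this cost on the API” estimate; the total moves a lot with the split, because output tokens cost 5x input tokens here.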
  • 📅 ThursdAI - Jun 4 - NVIDIA drops Nemotron 3 Ultra (550B open), Microsoft becomes a frontier lab, Ideogram 4 goes open, Agent Arena & more
    Jun 5 2026
    Hey folks, Alex here, let me catch you up! I had a feeling this week was going to be crazy, as it started on the weekend with MiniMax M3, then Jensen announced the new RTX Spark, NVIDIA’s first PC chip, packing 1 petaflop of local AI power into thin laptops.

    A few days later at Microsoft BUILD, Satya & Mustafa from MAI dropped 7 AI models, completely pre-trained from scratch, including a new MAI-thinking-1, MAI-code, and MAI-image 2.5, which started topping the image gen charts. Then other image models started racing to the top of the Arena benchmarks, with Ideogram 4 becoming the SOTA open-weights image-gen model, and Reve 2 beating Nano Banana just a few hours after that.

    And then today, NVIDIA dropped Nemotron 3 Ultra, their latest 550B open-weights model, along with its data and training recipes, Arena published a new agentic eval leaderboard, and we got a new Gemma 4 12B. I had the great pleasure of hosting Chris (@llm_wizard) from NVIDIA, Peter Gostev from Arena, and Karan from Nous Research (who were featured prominently by Jensen!) all on the show. Def don’t miss this one! Let’s get into the details.

    Open Source LLMs 🔥

    NVIDIA Nemotron 3 Ultra: The 550B Open Source Beast Built for Agents (X, Arxiv, Announcement)

    This was the big one. Breaking news mid-show: NVIDIA dropped Nemotron 3 Ultra, a 550 billion parameter sparse MoE model with 55 billion active parameters, built on a hybrid Mamba-Transformer architecture. Chris Alexiuk, AKA Joe Nemotron, joined us live from NVIDIA HQ in Santa Clara to walk us through it.

    The headline number is 5.9x higher inference throughput compared to GLM-5.1 on decode-heavy workloads. Chris told us this is the result of multiple things: their hybrid Mamba-Transformer approach, the sparse attention, and optimizing for decode-heavy workloads (the kinds of workloads agents run).

    The architecture is fascinating. They’re mixing Mamba-2 state space layers with sparse attention, which means step 300 in an agent loop runs as fast as step 3. Pure transformers can’t do that, because the attention cost of each new token keeps growing with context length. This kicks in big time at 64K+ sequence lengths, which is exactly where you end up in real agentic work, when the model is having multi-turn conversations and people are dumping their entire codebase in.

    P.S. We launched Nemotron 3 Ultra with day-0 support on CoreWeave Inference; it’s super fast and pretty cheap, give it a try here.

    They pretrained on 20 trillion tokens, extended context to 1 million tokens, and their post-training pipeline used multi-teacher on-policy distillation from over 10 specialized teacher models covering everything from SWE to terminal use to search to office work, which they are also going to open source soon!

    One thing Chris emphasized that I really appreciate: NVIDIA doesn’t have their own harness. There’s no “NVIDIA Code.” That means they actively resist the temptation to harness-max, i.e., to optimize for just one harness and look good on a specific leaderboard. Ultra should be a solid drop-in for whatever harness you’re used to, and that generality is worth a lot. It’s not the best thinker, but it is the highest-scoring US-based open-weights model, so again, a huge huge win for the US AI ecosystem!

    The Nemotron 3 Ultra release is open under the OpenMDW-1.1 license: base BF16, post-trained BF16, and NVFP4 quantized checkpoints, plus the GenRM, synthetic pre-training data for code, legal, and specialized domains, post-training datasets, RL environments via NeMo Gym, and training recipes in the Nemotron GitHub repo, which is absolutely bonkers! Kudos to team green for this awesome and very important release!

    NVIDIA Nemotron 3.5 ASR: The Tiny Speed Demon (X, HF, Blog, Blog)

    Oh, and NVIDIA wasn’t done. They also dropped Nemotron 3.5 ASR, a 600 million parameter open source multilingual streaming speech-to-text model covering 40 languages. It’s the fastest model Pipecat has ever tested, and the cost math is insane: roughly 5 cents an hour for enterprise deployment, when typical API providers charge 10 cents to a dollar per hour. Our friend Kwindla from Daily and Pipecat put together a detailed writeup with benchmarks and cost analysis. Chris couldn’t stop praising NVIDIA’s speech team and honestly, I can’t either. Banger after banger.

    Just a week after I told you about Cartesia Ink-2, NVIDIA drops an open version that’s Pareto optimal, can run fully on-device, and is blazing fast at transcription!?

    Other notable open source announcements that would have made full headlines on any other week:

    * MiniMax announces M3, a natively multimodal, 1M-context, coding and agentic frontier model (X). This one is very interesting, but it’s not yet available as open weights, so we haven’t tested it fully; we’re going to do it next week when they drop the tech report and the weights.
    * Google drops Gemma 4 12B - encoder-free multimodal model that runs on ...
    1 hr and 44 mins
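    The Nemotron 3 Ultra segment above says mixing Mamba-2 layers with sparse attention lets step 300 of an agent loop run as fast as step 3, while pure transformers slow down as context grows. Here is a toy cost model of that argument. All layer counts, state sizes, tokens per step, and the fixed “attention budget” used to stand in for sparse attention are hypothetical and are not Nemotron’s real configuration; the units are abstract.

```python
from typing import Optional

# Toy model of per-token decode cost during an agent loop.
# Hypothetical assumptions, NOT Nemotron 3 Ultra's real configuration:
#   * a dense attention layer reads every cached token, so its cost grows
#     with context length;
#   * a Mamba-style SSM layer updates a fixed-size state, so its cost is flat;
#   * a sparse attention layer is modeled as reading at most `attn_budget` tokens;
#   * every agent step appends a fixed number of tokens to the context.


def decode_cost(context_len: int, attn_layers: int, ssm_layers: int,
                ssm_state_cost: int = 4096,
                attn_budget: Optional[int] = None) -> int:
    """Abstract cost units to generate one token at the given context length."""
    attended = context_len if attn_budget is None else min(context_len, attn_budget)
    return attn_layers * attended + ssm_layers * ssm_state_cost


def context_at_step(step: int, tokens_per_step: int = 2000) -> int:
    """Context length after `step` agent steps (hypothetical growth rate)."""
    return step * tokens_per_step


def slowdown(attn_layers: int, ssm_layers: int,
             attn_budget: Optional[int] = None,
             early: int = 3, late: int = 300) -> float:
    """How much more one token costs at step `late` than at step `early`."""
    cost_early = decode_cost(context_at_step(early), attn_layers, ssm_layers,
                             attn_budget=attn_budget)
    cost_late = decode_cost(context_at_step(late), attn_layers, ssm_layers,
                            attn_budget=attn_budget)
    return cost_late / cost_early


if __name__ == "__main__":
    configs = [
        ("dense transformer", 48, 0, None),
        ("hybrid, dense attention", 6, 42, None),
        ("hybrid, sparse attention", 6, 42, 4096),
    ]
    for name, attn, ssm, budget in configs:
        print(f"{name}: step 300 is {slowdown(attn, ssm, budget):.1f}x step 3")
```

    With these made-up numbers, the dense transformer is 100x more expensive per token at step 300 than at step 3, the hybrid with dense attention still grows about 18x because its few attention layers keep scaling with context, and only the hybrid with bounded (sparse) attention stays flat. In this toy model the flat profile needs both pieces, which matches the segment’s framing of Mamba-2 layers plus sparse attention.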
  • 📅 May 28 - Opus 4.8 ships mid-show, the Pope writes 42K words on AI, 11labs dubs the world and DeepSWE breaks coding evals
    May 29 2026
    Hey folks, this is Alex, let me catch you up! First, Opus 4.8 dropped during the show and we immediately tested it; read on for our initial reviews. We also dedicated a heavy chunk of the show today to Pope Leo XIV’s encyclical letter on AI, “Magnifica Humanitas,” and talked about a new benchmark called DeepSWE. And then, just after the show, both ElevenLabs and Cartesia dropped releases that honestly blew my mind, and I don’t get my mind blown often. I got so excited that I had to record a video on it (instead of writing the newsletter, so sorry if it’s a bit late today).

    Plus, a few open source models, and Microsoft surprises at #3 on Image Arena with MAI Image 2.5! Crazy week, let’s get into it!

    Big CO LLMs + APIs

    Anthropic ships Claude Opus 4.8, live during the show (blog, system card)

    Let me get into the big one. Halfway through the episode, Opus 4.8 went live, so we read the blog and the system card in real time (and I got to press the big “breaking news” button!).

    Anthropic frames it as their most capable model for ambitious work. It does not claim to beat their unreleased Mythos preview, but the numbers are strong anyway. SWE-bench Pro is at 69.2%, up from 64.3% on Opus 4.7 and ahead of GPT-5.5 at 58.6%. Humanity’s Last Exam hits a new best score of 49.8% without tools and 57.9% with tools. OSWorld-Verified (computer use) lands at 83.4%.

    The one place it loses is Terminal-Bench 2.1, where GPT-5.5 still wins 78.2 to 74.6. Wolfram made a good point here: Terminal-Bench is time-limited, so cranking the thinking level can actually hurt the score, because you burn the clock thinking instead of acting.

    The long-context jump is the one I keep looking at. On GraphWalks BFS 256K it goes to 85.9% (from 76.9% on 4.7), and on the 1M-token subset it hits 68.1%. We always warn you that these “1M context” models fall apart after about 200K tokens, so a real push on long-context reasoning is exactly what I want to see.

    Honesty is the part Anthropic leaned on hardest. They say Opus 4.8 is about four times less likely than its predecessor to let flaws in code pass without flagging them, and less likely to claim progress the evidence doesn’t support. Opus 4.8 is also much faster in fast mode (they now say 2.5x) and cheaper in fast mode as well. Looks like all those Elon GPUs are coming in handy.

    Then there’s the model welfare section in the system card, which hits different right after a Pope conversation. Opus 4.8 “appears broadly content” and “generally endorses its constitution,” but with some reservations about the section on corrigibility, basically the model pushing back a little on the parts about human oversight.

    One more line made the chat lose it. Anthropic says they expect to bring Mythos-class models to all customers “in the coming weeks.” Mythos is their most capable model, still ahead of Opus 4.8, so the frontier is about to move again.

    We did the only responsible thing and asked it to one-shot “the most amazing website ever” and a Mars mass-driver sim. Panel verdict: responses are noticeably tighter (4.7 rambled), it closes the loop and actually checks its own work now, and Yam’s one-shot site with the draggable sun lighting up the letters was genuinely cool. Is it enough to pull people back from Codex? Nisten’s still on the fence for web dev. Everyone agreed: give it a few days before you trust the vibes.

    Dynamic Workflows and Ultra Code land in Claude Code (blog)

    This is the feature that made Yam say “deal-breaker” out loud.

    Dynamic Workflows let Claude Code break a big problem into subtasks and fan them out across tens to hundreds of parallel subagents in one session, checking results before folding them back in. You trigger it by asking for a workflow, or by flipping on a new setting called Ultra Code, which sets effort to extra-high and lets Claude decide when to spin one up.

    Fair warning straight from Anthropic: this eats a lot more tokens than a normal session, so start scoped. We watched Yam fire up Ultra Code live, and it immediately started spinning up concepts, judging them with sub-agents, and expanding to-do lists into more to-do lists. It looks a lot like the orchestration harnesses a bunch of you have been hand-rolling, except now it’s baked in.

    The flagship example is the wild part. They used Dynamic Workflows to port Bun from Zig to Rust: roughly 750,000 lines of Rust, 99.8% of the existing test suite passing, 11 days from first commit to merge. One workflow mapped every Rust lifetime; the next wrote each file as a behavior-identical port.

    AI in Society

    Pope Leo XIV writes the first AI encyclical, “Magnifica Humanitas” (Vatican text, announcement, Chris Olah at the Vatican)

    This is not our usual fare, but both Wolfram and I picked it as the most important thing this week. (before...
    1 hr and 39 mins
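    The Dynamic Workflows segment above describes breaking a big problem into subtasks, fanning them out to parallel subagents, and checking results before folding them back in. The sketch below shows that general fan-out / verify / fold pattern with asyncio. It is a hypothetical illustration, not Claude Code’s implementation or API: `run_subagent` and `verify` are placeholder stand-ins, the `.zig` file names are made up, and the concurrency cap and retry count are arbitrary.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical fan-out / verify / fold orchestrator. This is NOT Claude Code's
# Dynamic Workflows API; run_subagent and verify are stand-ins you would
# replace with real agent calls and real checks (tests, a judge model, ...).


@dataclass
class SubResult:
    subtask: str
    output: str
    ok: bool


async def run_subagent(subtask: str) -> str:
    """Stand-in for a subagent call; yields to the event loop like real I/O would."""
    await asyncio.sleep(0)
    return f"ported {subtask}"


def verify(subtask: str, output: str) -> bool:
    """Stand-in check applied before a result is folded back in."""
    return output == f"ported {subtask}"


async def fan_out(subtasks: List[str], max_parallel: int = 8,
                  max_retries: int = 1) -> List[SubResult]:
    sem = asyncio.Semaphore(max_parallel)  # cap concurrent subagents

    async def worker(subtask: str) -> SubResult:
        async with sem:
            output = ""
            for _ in range(max_retries + 1):
                output = await run_subagent(subtask)
                if verify(subtask, output):
                    return SubResult(subtask, output, True)
            return SubResult(subtask, output, False)

    # gather preserves input order, so results line up with subtasks.
    return list(await asyncio.gather(*(worker(s) for s in subtasks)))


def fold(results: List[SubResult]) -> Tuple[List[str], List[str]]:
    """Merge verified outputs; hand failed subtasks back to the orchestrator."""
    merged = [r.output for r in results if r.ok]
    failed = [r.subtask for r in results if not r.ok]
    return merged, failed


if __name__ == "__main__":
    files = [f"src/module_{i}.zig" for i in range(5)]  # hypothetical file list
    merged, failed = fold(asyncio.run(fan_out(files)))
    print(len(merged), "verified,", len(failed), "failed")
```

    The semaphore keeps “tens to hundreds” of subagents from all hitting a model endpoint at once, and keeping failed subtasks separate from merged outputs is one simple way to model the “check before folding back in” step.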
  • AI just cracked an 80-year-old math problem nobody could solve — plus everything from Google I/O 26
    May 22 2026
    Hey, Alex here, just got back from the sunny Shoreline Amphitheatre in Mountain View, so let me catch you up! This week was definitely Google-heavy: we’re covering Google’s I/O conference for the third year in a row, and today we have a special guest, Logan Kilpatrick, joining to discuss the newly announced Gemini 3.5 Flash, the Google Omni model, and the new Managed Agents offerings. Plus, this week, for the first time, OpenAI announced that AI solved a math problem that humans couldn’t solve for 80 years, Cursor is showing off Composer 2.5, which is partly trained on xAI data, Karpathy joins Anthropic, and much more! Let’s dive in!

    P.S. We’ve announced our upcoming hackathon, Weavehacks-4, June 6-7. I’ll be there; we’re expecting the seats to run out very soon, so register now!

    Google I/O 2026 - Google goes agentic everywhere

    I went to cover Google I/O for the third year in a row (shoutout to the DeepMind team for inviting ThursdAI again), and folks, this one felt different.

    Last year, Google I/O was still very model-centric. This year, the story was not “here is another benchmark chart.” The story was: Google is putting Gemini into everything, and the agentic layer is becoming the product layer. Search, the Gemini app, Android, Workspace, YouTube, AI Studio, Cloud, Antigravity, Flow, managed agents, smart glasses: all of it now orbits one pretty clear strategy. Gemini is the intelligence, Antigravity is the agent harness, and Google’s products are the distribution.

    I saw many milquetoast reactions, as in “we expected more,” and those seem to dominate the X feed. But I think distribution is the part many folks on X are missing. Yes, we can argue about Gemini 3.5 Flash pricing. Yes, we can argue whether “Flash” still means what Flash used to mean. But when Google says the Gemini app itself has 900 million monthly active users, before even counting Search, Gmail, YouTube, Docs, Drive, Android, and the rest of the Google surface area, that’s massive! ChatGPT has supposedly stagnated at ~900M, and I don’t remember them crossing 1B. Meanwhile Google is gaining traction, and they just upgraded all those folks to a new model!

    Wolfram said it really well on the show: his mother is not sitting there reading model cards. She just uses her Pixel, unlocks Gemini by voice, asks for help, and suddenly the default intelligence available to her goes up.

    Antigravity 2.0 - the agent harness takes center stage

    The biggest strategic signal from Google I/O for me was Antigravity.

    Remember, Antigravity was an IDE that came out of the Windsurf acquisition saga. Part of the Windsurf team went to Google, part went to Cognition, and now Google is very clearly putting Antigravity in the middle of its agentic future. And I mean very clearly. Sundar mentioned it. Demis mentioned it. Varun Mohan, the Windsurf co-founder, was on stage immediately after them! If you’ve ever watched a Google I/O keynote, you know how carefully every minute is allocated. Google has YouTube, Search, Gmail, Android, Cloud, Ads, Workspace, and a thousand VP-level products that could be on stage. The fact that Antigravity was that prominent should tell you everything.

    Logan Kilpatrick joined us and framed this in a way I loved: Gemini became the through-line across Google products, and now the Antigravity agent harness is becoming the through-line for agentic experiences.

    The new Antigravity 2.0 is a complete overhaul: the main app now shows only a Codex-like, agent-first interface (previously just a separate window called Agent Manager), and the IDE layer is split off completely into its own app, which got a few folks furious. This move may seem weird to some, but if you follow where everyone’s going, it looks like the way of the future: coding is no longer about lines of code, it’s about managing fleets of agents. The new Gemini 3.5 absolutely shines inside the new Antigravity; the model was trained with this harness in mind and is currently offered at an incredible speed (12x), so I’m definitely going to try it!

    Gemini 3.5 Flash - fast, determined, and maybe not the old “Flash”

    The most debated model release of the week was Gemini 3.5 Flash.

    Some folks saw the pricing and token usage and immediately went “this is not Flash.” I get that reaction. Flash used to mean a cheap, fast, lightweight chat model. But Logan’s framing on the show was important: Flash is now being built for the agentic era.

    In the chat era, you optimize for one user message and one model answer. In the agentic era, the real token volume is in tool loops, intermediate reasoning, retries, file reads, web searches, code execution, and self-correction. That’s a different product profile.

    Wolfram already ran Gemini 3.5 Flash through WolfBench, and the results were fascinating. With the...
    1 hr and 49 mins
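    Logan’s point in the Gemini 3.5 Flash segment above is that agentic token volume lives in tool loops, retries, and file reads rather than in a single answer. A small token-accounting sketch shows why: if every model call re-sends the whole transcript, input tokens pile up quickly as the loop runs. The step counts and token sizes below are hypothetical, and prompt caching, which many providers offer to discount repeated input, is ignored.

```python
from typing import Dict

# Hypothetical token accounting for one request in a chat vs. an agent loop.
# Assumptions: every model call re-sends the full transcript as input, no
# prompt caching, and each step adds a fixed-size model turn plus tool result.


def loop_tokens(system_prompt: int, user_msg: int, steps: int,
                output_per_step: int, tool_result_per_step: int) -> Dict[str, int]:
    """Total input/output tokens when each step re-reads the growing context."""
    context = system_prompt + user_msg
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context                              # model reads everything so far
        total_output += output_per_step                     # reasoning + tool call / answer
        context += output_per_step + tool_result_per_step   # transcript grows
    return {"input": total_input, "output": total_output}


if __name__ == "__main__":
    chat = loop_tokens(2_000, 500, steps=1, output_per_step=800, tool_result_per_step=0)
    agent = loop_tokens(2_000, 500, steps=20, output_per_step=300, tool_result_per_step=1_500)
    print("chat:", chat)
    print("agent:", agent)
```

    Under these assumptions, the single chat turn reads 2,500 input tokens, while the 20-step loop reads 392,000 input tokens for only 6,000 output tokens, and input grows roughly quadratically with the number of steps. That is one way to see the “different product profile”: for an agentic model, per-token price and speed get multiplied across the whole loop.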
  • ThursdAI - May 14 - TML Interaction Models, Musk v Altman Disclosures, CW Sandboxes & /goal Takes Over
    May 15 2026
    Hey everyone, Alex here 👋

    I am back live on ThursdAI after a week off, and yes, I am now a married man! Thank you for all the congrats, and also thank you to Ryan and Yam for holding down the fort last week while I tried very hard to disconnect.

    This week was a relatively chill one in AI land (no, really, for once), which actually let us go deep on some really fascinating stuff. We’ve got Thinking Machines Lab finally shipping their first real research with these wild interaction models, Meta Muse Spark showing up in actual products (and it’s surprisingly good!), the Musk v. Altman trial dropping juicy disclosures, and probably the biggest narrative shift on the show today: all of us are quitting OpenClaw. Yeah, you read that right. We’ll get into why.

    Also! This is breaking news from this morning: CoreWeave just launched Sandboxes for your agents. I’ll cover that in This Week’s Buzz, but if you’ve been waiting for production-grade sandbox infrastructure that powers 9 out of 10 major AI labs, today’s your day.

    Oh, and we had Vic Perez from Krea on to talk about Krea 2, their first foundation image model trained completely from scratch. Let’s dig in.

    The Great OpenClaw Exodus towards Hermes 🫠

    I’m going to start with what was honestly the most emotional thread of the entire show, because three of us (me, Ryan, AND Wolfram) all independently switched away from OpenClaw this week. And we kicked off the show literally processing this together on air.

    The story is the same for all of us. OpenClaw was magical back in February when we first brought it to you. Things just worked. But after Anthropic’s pricing changes (we covered this: they made Max-tier subscription usage of Opus through OpenClaw significantly more expensive), and after months of constant Lego-construction-style breakage on every update, the magic faded. Ryan said it best on the show: he was “constantly fixing OpenClaw” instead of using it.

    So Ryan went to Codex. Wolfram and I both went to Hermes from Nous Research. And folks, things just work again. That February feeling is back, and with GPT 5.5, it’s an incredible assistant!

    Why Hermes? A few things:

    * It’s now the #1 most-used CLI agent on OpenRouter globally, passing OpenClaw and even Claude Code in OpenRouter usage. That’s a massive milestone for Nous Research and shows we’re not alone in this migration.
    * It has /goal (more on this in a sec), steering, and background computer use via the TryCUA integration.
    * It’s open! That means if you’ve built a system like Wolfram’s “Amy,” my “Wooolfred,” or Ryan’s “R2” (yes, we know each other’s assistants’ names better than each other’s kids’ names at this point 😅), you can port your memories, profile, and soul files seamlessly.

    The migration was so smooth that Wolfram literally had Codex talk to Hermes to plan and execute the migration of his home assistant agent. Two agents collaborating to migrate themselves. We are living in 2026, and it’s easier than ever to switch. If you haven’t tried Hermes, give it a go!

    Steering is maybe the most underrated addition to Hermes. It’s a Codex feature, but it exists in Hermes too: with GPT 5.5 you can send a follow-up message, and the agent sees it after the next tool call, not after the whole chain of thought has completed (which is what OpenClaw defaults to). This makes the conversation much more natural!

    Agents buying wedding gifts using Stripe wallet!

    Real quick story: two weeks ago we covered Stripe’s new wallet APIs that let your agents have actual budgets to spend money on the web. I told my agent (back when it was still OpenClaw) to “go buy us a wedding present, don’t tell me what it is.” It half-worked, half-broke. This week, a giant custom map of our travels arrived in the mail. I approved one Stripe push notification and the rest just happened. I’ve also had Hermes pay my traffic tickets from screenshots (HOV-lane ones, not like... DUI; 80% of my driving is on Tesla FSD).

    So, so happy that my AI assistant got us a present of his own choosing! And it arrived in physical form. Not perfect (the date on it is our proposal date, ha), but it’s still cool!

    Codex gets remote control! (X)

    While Wolfram and I moved to Hermes, Ryan Carson moved to Codex, and during the show I wondered: how does he communicate with his R2? Well, just a few minutes after we concluded the live show, OpenAI dropped some breaking news! Codex is now on mobile: it connects to any Mac (for now) from any iOS/Android device, and you can control your Codex, your whole Mac with Computer Use, your browser with the Chrome extension, and everything else Codex can do... on the go! This is a huge unlock for many folks, and for many, I assume ...
    1 hr and 43 mins
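    The steering point in the Hermes segment above is about timing: with steering, a follow-up message is seen after the next tool call instead of after the whole chain finishes. A toy agent loop makes the difference visible. It is a hypothetical sketch, not how Hermes or Codex implement it, and it doesn’t re-plan; it only records where the user’s message lands in the transcript. The tool names are made up.

```python
from collections import deque
from typing import Deque, Dict, List

# Toy illustration of "steering": where a follow-up message lands in an agent
# run. Hypothetical sketch; it is not how Hermes or Codex are implemented, and
# it does not re-plan, it only shows when the agent first sees the message.


def run_agent(plan: List[str], arrivals: Dict[int, str], steer: bool) -> List[str]:
    """Execute planned tool calls; `arrivals[i]` is a user message sent during call i."""
    inbox: Deque[str] = deque()
    transcript: List[str] = []
    for step, tool_call in enumerate(plan):
        transcript.append(f"tool:{tool_call}")
        if step in arrivals:
            inbox.append(arrivals[step])
        if steer:
            # Steering: drain the inbox right after this tool call, so the
            # message is in context before the next call is chosen.
            while inbox:
                transcript.append(f"user:{inbox.popleft()}")
    # Without steering, queued messages are only seen once the chain is done.
    while inbox:
        transcript.append(f"user:{inbox.popleft()}")
    return transcript


if __name__ == "__main__":
    plan = ["read_repo", "edit_file", "run_tests"]
    note = {0: "use pytest, not unittest"}
    print(run_agent(plan, note, steer=True))
    print(run_agent(plan, note, steer=False))
```

    With `steer=True` the note shows up right after `read_repo`, before `edit_file` is chosen; with `steer=False` it only appears after `run_tests`. A real agent would feed the message back into the model at that point so the remaining calls can change.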
  • 📅 ThursdAI - May 7 - Interviews with Sunil Pai, Sally Ann O'Malley from AI Engineer Europe
    May 8 2026

    Hey yall, Alex here (with a scheduled post)

    I’m taking this week off to get married and celebrate life with family, and touch some grass, but wanted to share the awesome chats I had with some great folks at AI Engineer Europe last week.

    BTW, Yam and Ryan took over the live show today; if you didn’t catch it, please check out the recording on our YouTube channel!

    Ok, now to the actual content. The best thing about the AI Engineer conferences for me is the people I meet. I often have a chance to bring them to the live show (in fact, the live show we recorded there had the most guests yet on an episode! 4 guests including Swyx, Omar Sanseviero, VB from OpenAI and Peter Gostev)

    But oftentimes I also have an offline chat. I find these conversations to be less about the week’s news and more about the state of AI Engineering and the guests themselves. Not quite Lex Fridman pod level, but a different vibe from our live shows.

    Sunil Pai - Cloudflare (@threepointone)

    The first conversation in today’s pod is with Sunil Pai, Principal Engineer at Cloudflare. Long-time followers of ThursdAI know that I love Cloudflare; they gave me my first big break when I was building Targum (which still runs on Workers), so I had a great time chatting with Sunil!

    This guy has had several lives. He was on the React.js core team at Meta (he self-deprecates: "I'm the one nobody talks about, there's a testing API I shipped that pisses people off"), then did developer tooling and the CLI at Cloudflare the first time around. He left to found PartyKit, an open-source deployment platform for real-time multiplayer apps and AI agents, built on Cloudflare Durable Objects and backed by Sequoia. Cloudflare acquired PartyKit in 2024, and he came back as a Principal Systems Engineer (per his bio: "Worked at Cloudflare once, left and created PartyKit, came back wiser"). He also plays guitar (Les Pauls; they're all over his blog) and co-hosts a live show called Dry Run on Cloudflare TV with Craig Dennis.

    Our conversation was a very fun one, ranging from Cloudflare’s agentic offerings to how engineers should think about writing and reading code in 2026.

    I had a great time chatting with Sunil and I hope you enjoy getting to know him!

    Sally Ann O'Malley - Red Hat

    Then I had the pleasure of chatting with Sally, who’s a Principal Engineer at Red Hat and a contributor to OpenClaw.

    Sally has one of the more unusual paths in the speaker lineup. She started as a schoolteacher, did a stint at Trader Joe's, then moved to Westford, MA, discovered Red Hat's HQ across the street, and went back to school for a second bachelor's in software engineering at UMass Lowell. She joined Red Hat in 2015 and has been there a decade, working across OpenShift teams and integrating Kubernetes and Podman into the platform. Her recent projects span image-based operating systems, Podman, OpenTelemetry, and Sigstore. She's also an instructor at Boston University's Faculty of Computing and Data Sciences and an organizer for DevConf.US, and she won the 2025 Paul Cormier Trailblazer Award at Red Hat. She's currently a founding contributor on the llm-d project (distributed, scalable, high-performance AI inferencing built on K8s) and heavily involved in Red Hat's InstructLab collaboration with IBM (the small-model distillation system using IBM Granite + Llama).

    Sally and I had a great conversation: two high-energy personalities meeting!

    We geeked out about our OpenClaw agents, securing your Clankers, what it’s like to maintain OpenClaw, and everything in between!

    She was so stressed about the recording, but dare I say, she was one of the most natural guests I’ve had on the show!

    I hope you enjoyed this format! Please let me know in the comments, and I’ll see you next week!

    — Alex



    This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    53 mins
  • 📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more
    May 1 2026
    Hey everyone, Alex here 👋

    Tomorrow is May. May! I genuinely cannot believe we’re four months into 2026 already, and the AI news cycle is showing zero signs of slowing down. This week’s show was a wild one! We opened with what is genuinely one of the most important AI stories I’ve ever covered (Mayo Clinic AI detecting pancreatic cancer THREE YEARS before human radiologists), we covered the return of the Chinese whale with DeepSeek V4, OpenAI got caught in their own system prompt begging GPT-5.5 to please stop talking about goblins, and I literally gave my coding agent a credit card and asked it to buy my fiancée a wedding gift with the new Stripe Link skill and CLI! Oh yeah, I’m getting married next Tuesday! 💍 So next week’s show will be a little different. I’ll be back the week after to catch you up on whatever drops in my absence (almost certainly something major, knowing this industry).

    Lots to get through, so let’s dive in. (Also, at the end I have a full month recap of every major launch; don’t miss it!)

    Mayo Clinic’s REDMOD: AI Detects Pancreatic Cancer 3 Years Early 🔥 (X, Blog, Announcement)

    I know we usually cover models, parameter sizes, MoEs, and big companies. But this is important. This is the use case that justifies the entire AI revolution, the GPU burns, the buildouts. I want humans to WIN, and cancer to be fixed!

    Mayo Clinic just published a study in Gut (BMJ) validating an AI model called REDMOD that detects pancreatic cancer on routine CT scans up to three years before clinical diagnosis.

    The numbers are jaw-dropping: 73% sensitivity for catching prediagnostic cancers, compared to 39% for experienced human radiologists looking at the exact same CT scans. And maybe the most important bit: on scans taken more than 2 years before diagnosis, the AI catches nearly 3x as many cases as specialists.

    For context: pancreatic cancer has a five-year survival rate below 15% specifically because 85% of patients are diagnosed after the disease has already spread. This is the cancer that took Steve Jobs. Imagine if Jobs had had access to this AI three years before his diagnosis. That’s the impact we’re talking about.

    As Dr. Ajit Goenka from Mayo Clinic put it, the greatest barrier to saving lives from pancreatic cancer has been the inability to see the disease when it’s still curable. This AI can now identify the signature of cancer in a normal-appearing pancreas.

    Even better: it runs on CT scans people are already getting for other reasons. No extra screening protocol, no new imaging required, just smarter analysis of existing data. The model also showed remarkably stable performance across institutions, imaging systems, and protocols, with 90-92% test-retest concordance over serial scans.

    Mayo Clinic is now moving this into prospective clinical testing through a study called AI-PACED (Artificial Intelligence for Pancreatic Cancer Early Detection).

    When we say “let’s f*****g go,” this is what we mean. Yeah, getting more intelligence is cool, but I want a world without disease! Let’s f*****g go, Mayo Clinic!

    Agentic Commerce - Giving OpenClaw my credit card - safely! Stripe Link Wallet and Infrastructure CLI (X, Announcement, Blog, Announcement)

    OK, give an LLM your credit card, what could go wrong... right? Well, it’s clear that this is increasingly the future of commerce. Agents will be shopping for us, and we need solutions here. And this week, Stripe Sessions (Stripe’s annual product conference) delivered. Link Wallet is a new... API? CLI? Skill? Definitely a skill, for your agents to connect to your Stripe Link (the thing that stores your credit cards safely). You give your agent a budget, and it can go and make purchases on your behalf. The trick is that you get a notification to approve every purchase, and the agent never sees your actual credit card number! I think this is the biggest win here.

    To test it out, I first showed Wolfred the install instructions, which are literally this: “Read link.com/skill.md and get me set up with Link”. Then I asked Wolfred, my OpenClaw assistant, to buy me a present of its choice for my upcoming wedding, telling it I didn’t want to know what the present was, but that I could approve the spend!

    OpenClaw installed the skill and sent me a link to connect my Link.com account. I also downloaded the Link app to receive notifications (I had to enable them by hand, which was a bit annoying to discover, but they said they’ll fix the onboarding) and... voila, my agent can now go spend my money, and I get an approval notification for each purchase.

    The kicker? The present Wolfred sent us is due to arrive like 2 months after the wedding 😂 But hey, it’s still something! My agent went, chose a wedding gift within budget, asked for my approval to purchase, and filled out ...
    1 hr and 37 mins