• Google’s New Weapon in AI — Cloudflare

    Tech strategy is difficult AND fascinating because it’s unpredictable. In addition to worrying about the actions of direct competitors (e.g., Samsung vs Apple), companies also need to worry about the actions of ecosystem players (e.g., smartphone platforms and AI vendors) whose moves may be intended for something else entirely but have far-reaching consequences.

    In the competition between frontier AI models, it is no secret that Google, where the Transformer architecture that virtually all LLMs are based on was invented, was caught off-guard by the rapid rise of OpenAI, of AI-powered search vendors like Perplexity, and of Chinese participants like DeepSeek and Alibaba/Qwen. While Google (and its subsidiary DeepMind) has doubled down on its own impressive AI efforts, the general perception in the tech industry has been that Google is on defense.

    But, as I started with, tech strategy is not just about your direct competition; it’s also about the ecosystem. Cloudflare, which offers distributed internet security solutions (which protect this blog and let me access my home server remotely), recently announced that it would start blocking the web scrapers that AI companies use, citing concerns from website publishers that their content is being used without compensation.

    However, because search is still a key source of traffic for most websites, this “default block” is almost certainly not turned on (at least by most website owners) for Google’s own scrapers, giving Google’s internal AI efforts a unique data advantage over its non-search-engine rivals.

    Time will tell how the major AI vendors will adapt to this, but judging by the announcement this morning that Cloudflare is now actively flagging AI-powered search engine Perplexity as a bad actor, Cloudflare may have just given Google a powerful new weapon in its AI competition.


    Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
    Gabriel Corral, Vaibhav Singal, Brian Mitchell, Reid Tatoris
    | Cloudflare Blog

  • Helsinki goes a full year without a traffic death

    Helsinki, Finland (population: ~650,000) has not had a single car-crash-related death in over a year! That is impressive! They believe a combination of lower speed limits, improved pedestrian/cycling infrastructure, public transit improvements, and traffic cameras all contributed.

    While Americans love cars (this writer included!), the fact that roughly 40,000 Americans die each year in motor vehicle crashes (not to mention ~2.6 million emergency department visits related to motor vehicle crashes in 2022 and $470 billion in medical costs) should push us to question whether we can do better by learning from the experiences of other places.

    I don’t relish driving even slower in a city, but it’s hard to deny the alternative is even grimmer.


  • Cloudflare Tunnels for Your Home Server

    (Note: this is part of my ongoing series on cheaply selfhosting)

    If you’ve been following my selfhosting journey, you’ll know that I host some web applications — among them, network storage for my family, tools for working with PDFs, media streamers, a newsreader, etc. — from an inexpensive home server on my home network. It’s mostly a fun hobby, but it’s taught me a great deal about web applications and Docker containers, and it’s helped me save some time and money by giving me applications I control (and can offer to others).

    But one of the big questions facing every self-hoster is how to access these applications securely when away from home. This is a conundrum, as the two traditional options available have major tradeoffs:

    1. Opening up ports to the internet — One way to do this is to open up ports on your internet router and forward traffic arriving on those ports to your server. While this is the most direct solution to the problem of granting access to your hosted applications, it has several issues:
      • First, some internet service providers and routers don’t actually let you do this!
      • Second, by opening up a port on your router, you’ll be opening up a door for everyone on the internet to access. This could expose your home network to malicious actors, and requires you to either accept the risk or set up additional security mechanisms to protect yourself.
      • Third, unless your internet service provider has granted you a static IP address (which is relatively rare for consumer internet plans), the IP address of your home will change unpredictably. Therefore, in order to access your home server, you’ll need to set up a Dynamic DNS service, which adds additional complexity to manage.
    2. VPNs or VPN-like solutions (Twingate, Tailscale, etc) — The alternative to opening up a port is to leverage VPN and VPN-like technologies. This is much more secure (and, in fact, I use Twingate, a great VPN-like service, for this type of secure remote access). But, this has one major downside: it requires each device that wants to access your hosted applications to have a special client installed. This can be a hassle (especially if you want to grant access to someone less tech-savvy), and, in some cases, near impossible (if you’re dealing with devices like a connected TV or eReader or if the device is behind a firewall that doesn’t like VPNs).

    I wanted a third option that:

    • would work nicely and securely with practically any internet-connected device
    • didn’t require client installation or configuration
    • didn’t require me to open up any new ports on my home router or expose a public IP address
    • could integrate authentication (as an extra layer of security)

    That’s how I landed on Cloudflare tunnels!

    Cloudflare Tunnels

    Enter Cloudflare Tunnels, a product in the Cloudflare Zero Trust family of offerings. By running a small piece of software called cloudflared on your home network (i.e. as a Docker container on your home server), you can link:

    • the services/resources on your home network
    • domains hosted and secured by Cloudflare
    • third party authentication services (like Google login)

    What that means is my local Stirling PDF tools (which live on my home server at the domain pdf.home) can now be reached by any internet-connected device at https://pdf.[mydomain.com] while locked behind a Google login which only allows users with specific email addresses through (i.e. my wife and myself)! All for free!

    How to Setup

    Transferring Your Domains

    To get started, transfer your domains to Cloudflare. The specific instructions for this will vary by domain registrar (see some guidelines from Cloudflare). While you can technically just change the nameservers, I would highly recommend fully transferring your domains to Cloudflare for three reasons:

    • Cost: Cloudflare (as of this writing) offers at-cost domain registration. This means they don’t add any markup on top of what it costs to actually register the domain and so it’s typically cheaper to buy and renew domains with Cloudflare
    • Security: Cloudflare offers free and automatic HTTPS protection on all domains and basic DDOS protection as well
    • Extra Configurable Protection: I am not a cybersecurity expert, but Cloudflare, even on its free tier, offers generous protection and domain features that you can further customize: bot protection, analytics, a sophisticated web application firewall, etc.

    Creating the Tunnel

    Once you have your domains transferred to Cloudflare, go into your Cloudflare dashboard and create your tunnel. Start by clicking on Zero Trust in the sidebar. Then go to Networks > Tunnels and click on Create Tunnel.

    Select the Cloudflared option

    You will be asked to name your connector — pick any name that suits you; I went with OMV (since my home server is an OpenMediaVault server).

    Then copy the installation command. Paste it somewhere and extract the really long token that starts with “ey…” as you’ll need it for the next step.

    Setting up Cloudflared

    Set up cloudflared. The following instructions are for OpenMediaVault. Depending on your home server setup, you may need to do things differently to get a Docker container up and running with Docker Compose, but the compose file and the general order of operations should match. Assuming you use OpenMediaVault…

    • If you haven’t already, make sure you have OMV Extras and Docker Compose installed (refer to the section Docker and OMV-Extras in my previous post; you’ll want to follow all 10 steps, as I refer to different parts of that process throughout this post) and have a static local IP address assigned to your server.
    • Log in to your OpenMediaVault web admin panel, and then go to [Services > Compose > Files] in the sidebar. Press the + button in the main interface to add a new Docker compose file.

      Under Name, put down cloudflared and, under File, adapt the following (pasting in the long token you copied from the installation command):
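    # cloudflared only makes outbound connections to Cloudflare, so no router ports need to be opened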
    services: 
      cloudflared: 
        image: cloudflare/cloudflared 
        container_name: cloudflare-tunnel 
        restart: unless-stopped 
        command: tunnel run 
        environment: 
          - TUNNEL_TOKEN={{the long token from before that starts with ey...}}
    • Once you’re done, hit Save and you should be returned to your list of Docker compose files. Notice that the new Cloudflared entry you created has a Down status, showing the container has yet to be initialized.
    • To start your Cloudflared container, click on the new Cloudflared entry and press the up button. This will create the container, download any files needed, and run it.

    Go back to your Cloudflare Zero Trust dashboard and click on Networks > Tunnels. If your Docker container worked, you should see a HEALTHY status showing that your Cloudflared container is up and running and connected to Cloudflare

    Connecting your Services to the Tunnel

    Click on your now active tunnel in the Cloudflare interface and click on Edit (or use the three-dot menu on the right hand side and select Configure) and then click on the Public Hostnames tab at the top. Press the Add a public hostname button.

    For each service you want to make available, you will need to enter:

    • The Domain you wish to use (and have transferred to Cloudflare)
      • The Subdomain you want to map that service to — if the domain you wish to use is example.com, an example subdomain would be subdomain.example.com. If you leave this blank, it will map the “naked” domain (in this case example.com)
      • The Path you want to map the service to — if the domain and subdomain is subdomain.example.com and you add a path /path, then the service would be mapped to subdomain.example.com/path
    • The Type of service — Cloudflare will map many different types of resources, but chances are it’ll be HTTP.
    • The URL of the service relative to your network — this is the IP address (including port) that you use within your network. For example: 192.168.85.22:5678 (assuming your home server’s local IP is 192.168.85.22 and the port the service you want to link is set to 5678)

    Press Save once you’re done and go ahead and test the subdomain/domain/path you just added (i.e. go to https://subdomain.example.com/path). It should take you straight to your application, except now it’s through a publicly accessible URL secured behind Cloudflare SSL!

    Suggestions on Service Configuration

    You need to repeat the above process for every selfhosted application that you want to make publicly available. Some suggestions based on what I did:

    • I made public every service I host with a few exceptions related to security, such as:
      • The OpenMediaVault console & WeTTY — Since this controls my entire home server setup (and grants access to all my network attached storage), it felt a little too important to make it easy to access (at least not without a VPN-like solution like the one I use, Twingate)
      • The PiHole administrative console — Similarly, because my PiHole is so vital to how the internet functions on my home network and regulates DNS in my home, it felt like locking this behind Twingate was reasonable
      • The NAS — As there are important and sensitive files on the OpenMediaVault file server, this was again one of the things where security trumped expediency.
      • Duplicati — I was less concerned about security here, but Duplicati is largely a “set it and forget it” type of backup tool, so it felt like there was little benefit to make this publicly available (and only potential risks)
      • The Ubooquity Admin interface — I’m again not super concerned about security here, but I have rarely needed to use it, so it didn’t make sense to add to my “surface area of attack” by exposing this as well
    • For a media server like Plex (or Jellyfin or Emby), you don’t have to, but I’d encourage you to connect two domains:
      • One that is easily memorable by you (i.e. plex.yourdomain.com) for you to access via browser over HTTPS, protected by authentication and access control (see later in the post for how to configure)
      • One that has a long, hard-to-guess subdomain (i.e. hippo-oxygen-face.yourdomain.com) that will still be served over HTTPS but will not be protected by authentication. This will allow access to devices like smart TVs and the Plex clients which do not expect the servers to have additional authentication on top of them.
      If you have Plex and you follow this second suggestion, you can further secure your server by going into your Plex configuration panel from a browser and pressing the wrench icon in the upper right (which takes you to settings)

      Under your server settings (not Plex Web or your account settings, which are above), go to Settings > Remote Access and press the Disable Remote Access button. This disables Plex’s built-in Relay feature which, while reasonably functional, is not under your control, is limited in bandwidth, and typically forces your server to transcode more than necessary.

      To allow Plex apps (such as those on a TV or smartphone) to access your server, you’ll need to let Plex know what the right URL is. To do that, go to Settings > Network and scroll down to Custom server access URLs. Here you’ll enter your hard-to-guess subdomain (i.e. https://hippo-oxygen-face.yourdomain.com) and press Save Changes. This informs Plex (and therefore all Plex clients) where to look for your media server.


      To confirm it all works, login to your Plex account at https://app.plex.tv/ and confirm that your server shows up (you may have to wait the first time you do this as Plex connects to your server).

      Because this approach does NOT have extra access control and authentication, and because there are malicious actors who scan the internet for unguarded media server domains, it’s important that your subdomain here be long and hard-to-guess.

    Authentication and Access Control

    Because Cloudflare Tunnels are part of Cloudflare’s enterprise offering to help IT organizations make their applications secure & accessible, they come with authentication support and access controls built in for any application connected to your Cloudflare tunnel. This means you can easily protect your web applications against unwanted access.

    To set this up, log back in to the Cloudflare dashboard, go to Zero Trust, and then go to Access > Policies in the sidebar and press the Add a policy button.

    Enter a Policy name (pick something that describes how you’re restricting access, like “Jack and Jill only”).

    You can then add the specific rules that govern the policy. Cloudflare supports a wide range of rules (including limiting based on IP address, country, etc), but assuming you just want to restrict access to specific individuals, I’d pick Emails under Selector and add the emails of the individuals being granted access under Value. Once you’re set, press the Save button at the bottom!

    Now you have a policy which can restrict a given application only to users with specific email addresses 🙌🏻.

    Now, we just need to set up Cloudflare to apply that policy (and a specific login method) to the services in question. To do that, in the Cloudflare Zero Trust dashboard, go to Access > Applications in the sidebar and press the Add an application button in the screen that comes up.

    Select the Self-hosted option. And then enter your Application name. Press the Add public hostname button and enter in the Subdomain, Domain, and Path for your previously-connected subdomain.

    Scroll down to Access Policies, press the Select existing policies button, check the policy you just created, and then hit the Confirm button. The policy should now show up under the application’s Access Policies.

    Finally, you can configure which login methods you want to support. Out of the box, Cloudflare supports one-time PIN as a login method. Any user who lands on the domain in question for the first time will be prompted to enter their email and, to verify they are who they say they are, they’ll be sent a PIN at that email address which they’ll then need to enter. This is straightforward, and if that’s all you want, accept the current default settings.

    However, if, like me, you prefer to have your users login via a 3rd party authentication service (like Google or Facebook), then you have a little bit of extra work to do. Press the Manage login methods link where you’ll be taken to a screen in a new tab to configure your Authentication options. Where it says Login methods, press the Add new button.

    You’ll be given the ability to add support for 3rd party logins through a number of identity providers (see below).

    You can select any identity provider you wish — I went with Google — but whatever you select, Cloudflare will provide instructions for how to connect that provider to Cloudflare Zero Trust. These instructions can be quite complicated (see the Google instructions below) but if you follow Cloudflare’s instructions, you should be fine.

    Once you’re done, press the Save button and return to the tab where you were configuring the application.

    Under Login methods you should see that Cloudflare has checked the Accept all available identity providers toggle. You can keep that option but, as I only want my users to use Google, I unchecked that toggle and un-selected the One-time PIN option. I also checked the Instant Auth option (only available if there’s just one authentication method selected), which skips the authentication method selection step for your users. Then I pressed Next.

    The next two screens have additional optional configuration options which you can skip through by pressing Next and Save. Et voila! You have now configured an authentication and access control system on top of your now publicly accessible web service. Repeat this process for every service you want to put authentication & access control on and you’ll be set!

    I have a few services I share access to with my wife and a few that are just for myself, so I’ve configured two access policies, which I apply to my services differently. For services I intend to let anyone reach without access control (for example my Plex server for Plex apps), I simply don’t add them as an application in Cloudflare for access control (and just host them via subdomain).

    I hope this is helpful for anyone who wants to make their selfhosted services accessible securely through the web. If you’re interested in how to set up a home server on OpenMediaVault or how to self-host different services, check out all my posts on the subject!

  • LLMs Get Trounced at Chess

    While Large Language Models (LLMs) have demonstrated they can do many things well enough, it’s important to remember that these are not “thinking machines” so much as impressively competent “writing machines” (able to figure out what words are likely to follow).

    Case in point: both OpenAI’s ChatGPT and Microsoft Copilot lost to the chess-playing engine of an old Atari game (Video Chess) which takes up a mere 4 KB of memory (compared with the billions of parameters and GBs of specialized accelerator memory needed to make LLMs work).

    It’s a small (yet potent) reminder that (1) different kinds of AI are necessary for different tasks (i.e. Google’s revolutionary AlphaZero probably would’ve made short work of the Atari engine) and (2) we shouldn’t underestimate how well small but highly specialized algorithms can perform.


  • The War on Harvard’s Private Equity/VC Collateral Damage

    Republicans have declared a “war on Harvard” in recent months and one front of that is a request to the SEC to look at how Harvard’s massive endowment values illiquid assets like venture capital and private equity.

    What’s fascinating is that in targeting Harvard in this way the Republicans may have declared war on Private Equity and Venture Capital in general. As their holdings (in privately held companies) are highly illiquid, it is considered accounting “standard practice” to simply ask the investment funds to provide “fair market” valuations of those assets.

    This is a practical necessity, as it is highly difficult to value these companies (which rarely trade and where even highly paid professionals miss the mark). But, it means that investment firms are allowed to “grade their own homework”, pretending that valuations for some companies are much higher than they actually have a right to be, resulting in quite a bit of “grade inflation” across the entire sector.

    If Harvard is forced to re-value these according to a more objective standard — like the valuations of these assets according to a 409a valuation or a secondary transaction (where shares are sold without the company being involved) both of which artificially deflate prices — then it wouldn’t be a surprise to see significant “grade deflation” which could have major consequences for private capital:

    • Less capital for private equity / venture capital: Many institutional investors (LPs) like private equity / venture capital in part because the “grade inflation” buffers the price turbulence that more liquid assets (like stocks) experience (so long as the long-term returns are good). Those investors will find private equity and venture capital less attractive if the current practices are replaced with something more like “grade deflation”
    • A shift in investments from higher risk companies to more mature ones: If Private Equity / Venture Capital investments need to be graded on a harsher scale, they will be less likely to invest in higher risk companies (which are more likely to experience valuation changes under stricter methodologies) and more likely to invest in more mature companies with more predictable financials (ones that are closer to acting like publicly traded companies). This would be a blow to smaller and earlier stage companies.

  • CAR-T Bests Solid Tumors

    The promise of immune cell therapies is to direct the incredible sophistication of the immune system to the medical targets we want to hit. In the case of CAR-T cells, we take a patient’s own immune T-cells and “program” them through genetic modification to go after cancer cells. While the process of the genetic modification to create those cells is still incredibly expensive and challenging, we’ve seen amazing progress with CAR-T in liquid tumors (e.g., leukemias, lymphomas, etc).

    But, when it comes to solid tumors, it’s been far more challenging. Enter this Phase II clinical trial from China (summarized in Nature News). The researchers performed a randomized controlled trial on 266 patients with gastric or gastro-esophageal cancer that had resisted previous treatment, assigning 2/3 to receive CAR-T and the rest best medical care (the control). The results (see the survival curve below) are impressive — while the difference in median progression-free survival is only about 1.5 months, it’s very clear that by month 8 there are no progression-free patients in the control group versus something like ~25% of the CAR-T group.

    The side effect profile is still challenging (with 99% of patients in the CAR-T group experiencing moderately severe side effects), but this is (sadly) to be expected with CAR-T treatments.

    While it remains to be seen how this scales up in a Phase III study with a larger population, this is an incredibly promising finding — giving clinicians a new tool in their arsenal for dealing with a wider range of cancer targets and suggesting that cell therapies still have more tricks up their sleeves.


  • Great Expectations

    This is an old piece from Morgan Housel from May 2023. It highlights how optimistic expectations can serve as a “debt” that needs to be “paid off”.

    To illustrate this, he gives a fascinating example — the Japanese stock market. From 1965 to 2022, both the Japanese stock market and the S&P 500 (a basket of mostly large American companies) had similar returns. As most people know, Japan has had a miserable three “lost decades” of growth and stock performance. But Housel presents this fact in an interesting light: it wasn’t that Japan did poorly, it’s that Japan did all of its growth in a 25-year run between 1965 and 1990 and then spent the following three decades “paying off” that “expectations debt”.

    Housel concludes, as he oftentimes does, with wisdom for all of us: “An asset you don’t deserve can quickly become a liability … reality eventually catches up, and demands repayment in equal proportion to your delusions – plus interest”.

    Manage your great expectations.


    Expectations Debt
    Morgan Housel

  • Why Self-Hosting is Having a Moment

    I am a selfhoster (as anyone who’s read many of my more recent blog posts knows). I’m also a fan of the selfh.st site (which documents a lot of news & relevant interviews from the self-hosting world), so I was delighted to see the owner of selfh.st get interviewed in Ars Technica.

    Nothing earth-shattering, but I appreciated (and agreed with) his breakdown of why self-hosting is flourishing today (excerpt below). For me, personally, the ease of setting up selfhosted services with Docker and the low cost of storage and mini-PCs turned this from an impractical idea into one that I’ve come to rely on for my own “personal tech stack”.


  • I’m just a Medical Guideline

    Medical guidelines are incredibly important — they impact everything from your doctor’s recommendations to the treatments and medications your insurance covers — but they are somewhat shrouded in mystery.

    This piece from Emily Oster’s ParentData is a good overview of what they are (and aren’t) — and gives a pretty good explanation of why a headline from the popular press probably isn’t capturing the nuance and review of clinical evidence that goes into them.

    (and yes, that title is a Schoolhouse Rock reference)


  • On (Stock Market) Bubbles

    I spotted this memo from Oaktree Capital founder Howard Marks and thought it was a sobering and grounded take on what makes a stock market bubble, on reasons to be alarmed about the current concentration of market capitalization in the so-called “Magnificent Seven”, and on how eerily similar this moment is to the “Nifty Fifty” and “Dot Com Bubble” eras of irrational exuberance. Whether you agree with him or not, it’s a worthwhile piece of wisdom to remember.

    This graph that Marks borrowed from JP Morgan is also quite intriguing (terrifying?)


    On Bubble Watch
    Howard Marks

  • Pivoting from Consumer to Utility: Span

    As a Span customer, I’ve always appreciated their vision: to make home electrification cleaner, simpler, and more efficient through beautifully designed, tech-enabled electrical panels. But, let’s be honest, selling a product like this directly to consumers is tough. Electrical panels are not top-of-mind for most people until there’s a problem — and explaining the value proposition of “a smarter electrical panel” to justify the high price tag can be a real challenge. That’s why I’m unsurprised by their recent shift in strategy towards utilities.

    This pivot to partnering with utility companies makes a lot of sense. Instead of trying to convince individual homeowners to upgrade, Span can now work directly with those who can impact community-scale electrification.

    While avoiding costly service upgrades is undeniably beneficial for utilities, understanding precisely how that translates into financial savings for them requires much more nuance. That, along with the fact that rebates & policy will vary wildly by locality, raises many uncertainties about pricing strategy (not to mention that there are other, larger smart electric panel companies like Leviton and Schneider Electric, albeit with less functional and less well-designed offerings).

    I wish the company well. We need better electrical infrastructure in the US (and especially California, where I live) and one way to achieve that is for companies like Span to find a successful path to market.


    Span’s quiet turn toward utilities
    Lisa Martine Jenkins | Latitude Media

  • RISC-V in Computers

    One of the most exciting technological developments from the semiconductor side of things is the rapid development of the ecosystem around the open-source RISC-V instruction set architecture (ISA). One landmark in its rise is that the architecture appears to be moving beyond just behind-the-scenes projects to challenging Intel/AMD’s x86 architecture and ARM (used by Apple and Qualcomm) in customer-facing applications.

    This article highlights this crucial development by reporting on early adopters bringing RISC-V into higher-end devices like laptops. Companies like Framework and DeepComputing have just launched or are planning to launch RISC-V laptops. RISC-V-powered hardware still has a steep mountain of software and performance challenges to climb (as evidenced by how long it has taken the ARM ecosystem to become credible in PCs). But Intel’s recent setbacks and ARM’s legal battles with Qualcomm over licensing (which pretty much guarantee that every company using ARM is now going to work on RISC-V), coupled with the open-source nature of RISC-V potentially allowing for a lot more innovation in form factors and functionality, may have created an opening here for enterprising companies willing to make the investment.


    This Year, RISC-V Laptops Really Arrive
    Matthew S. Smith | IEEE Spectrum

  • Revenge of the Plug-In Hybrid

    While growing vehicle electrification is inevitable, it always surprised me that US automakers would skip past plug-in hybrid (PHEV) technology to embrace only all-electric vehicles. While many have attacked Toyota’s more deliberate “slow-and-steady” approach to vehicle electrification, it always seemed to me that, until we had broadly available, high-quality electric vehicle charging infrastructure and until all-electric vehicles were broadly available at the price point of a non-luxury family car (i.e. a Camry or RAV4), electric vehicles were going to be more of an upper-middle-class/wealthy phenomenon. Considering plug-in hybrids’ success in the Chinese automotive market (where they are growing faster than all-electric vehicles!), it always felt odd that the category wouldn’t make its way into the US market as the natural next step in vehicle electrification.

    It sounds like Dodge Ram (a division of Stellantis) agrees. It intends to delay the all-electric version of its Ram 1500 in favor of starting with its extended-range plug-in hybrid version, the Ramcharger. Extended-range electric vehicles (EREVs) are plug-in hybrids similar to the Chevy Volt: they employ an electric powertrain plus a gasoline-powered generator that supplies additional range when the battery runs low.

    Interestingly, Nissan, Hyundai, Mazda, and GM’s Buick have made similar announcements as well.

    While it still remains to be seen how well these EREVs/PHEVs are adopted — the price points that are being discussed still feel too high to me — seeing broader adoption of plug-in hybrid technology (supplemented with gas-powered range extension) feels like the natural next step on our path to vehicle electrification.


    As EV Sales Stall, Plug-In Hybrids Get a Reboot
    Lawrence Ulrich | IEEE Spectrum

  • Helping Multi-Agent AI Experimentation

    Inspired by some work from a group at Stanford on building a lab out of AI agents, I’ve been experimenting with multi-agent AI conversations and workflows. But, because the space (at least to me) has seemed more focused on building more capable individual agents than on coordinating and working with many agents, the existing tools and libraries have made it difficult to carry out experiments.

    To facilitate some of my own exploration work, I built what I’m calling a Multi-Agent ChatLab — a browser-based, completely portable setup to define multiple AI agents and facilitate conversations between them. This has made my experimentation work vastly simpler and I hope it can help someone else.

    And, to show off the tool, and for your amusement (and given my love of military history), here is a screengrab from the tool where I set up two AI Agents — one believing itself to be Napoleon Bonaparte and one believing itself to be the Duke of Wellington (the British commander who defeated Napoleon at Waterloo) — and had them describe (and compare!) the hallmarks of their military strategy.

  • Decarbonizing Shipping with Wind

    The shipping industry is known for being fairly dirty environmentally, due largely to the fact that the most common fuel used in shipping — bunker fuel — contributes to carbon emissions, significant air pollution, and water pollution (from spills and from the common practice of dumping the byproducts of the sulphur scrubbing used to curtail air pollution).

    While much of the effort to green shipping has focused on the use of alternative fuels like hydrogen, ammonia, and methanol as replacements for bunker fuel, I recently saw an article on the use of automated & highly durable sail technology to let ships leverage wind as a means to reduce fuel consumption.

    I don’t have any inside information on what the cost / speed tradeoffs are for the technology, nor whether or not there’s a credible path to scaling to handle the massive container ships that dominate global shipping, but it’s a fascinating technology vector, and a direct result of the growing realization by the shipping industry that it needs to green itself.


  • Google’s Quantum Error Correction Breakthrough

    One of the most exciting areas of technology development (albeit one that doesn’t get a ton of mainstream media coverage) is the race to build a working quantum computer that exhibits “below threshold quantum computing” — the ability to do calculations utilizing quantum mechanics accurately.

    One of the key limitations to achieving this has been the sensitivity of quantum computing systems — in particular the qubits that capture the superposition of multiple states that allow quantum computers to exploit quantum mechanics for computation — to the world around them. Imagine if your computer’s accuracy would change every time someone walked in the room — even if it was capable of amazing things, it would not be especially practical. As a result, much research to date has been around novel ways of creating physical systems that can protect these quantum states.

    Google has (in a paper in Nature) demonstrated their new Willow quantum computing chip, which implements a quantum error correction method that spreads the quantum state information of a single “logical” qubit across multiple entangled “physical” qubits to create a more robust system. Beyond proving that their quantum error correction method worked, what is most remarkable to me is that they’re able to extrapolate a scaling law for their error correction — a way of estimating how much better their system gets at avoiding loss of quantum state as they increase the number of physical qubits per logical qubit — which could suggest a “scale up” path towards building functional, practical quantum computers.

    I will confess that quantum mechanics was never my strong suit (beyond needing it for a class on statistical mechanics eons ago in college), and my understanding of the core physics underlying what they’ve done in the paper is limited, but this is an incredibly exciting feat on our way towards practical quantum computing systems!


  • Cynefin

    I had never heard of this framework for thinking about how to address problems before. Shout-out to my friend Chris Yiu and his new Substack Secret Weapon about improving productivity for teaching me about this. It’s surprisingly insightful about when to think about something as a process problem vs an expertise problem vs experimentation vs direction.


    Problems come in many forms
    Chris Yiu | Secret Weapon

  • The Hits Business — Games Edition

    In entertainment, games deliver the best return on investment in terms of hours of deep engagement per dollar. When done right, they blend stunning visuals and sound, earworm-like musical scores, compelling stories and acting, and a sense of progression that is second to none.

    Case in point: I bought the complete edition of the award-winning The Witcher 3: Wild Hunt for $10 during a Steam sale in 2021. According to Steam, I’ve logged over 200 hours (I had to doublecheck that number!) playing the game, between two playthroughs and the amazing expansions Hearts of Stone and Blood and Wine — an amazing 20 hours/dollar spent. Even paying full freight (as of this writing, the complete edition including both expansions costs $50), that would still be a remarkable 4 hours/dollar. Compare that with the price of admission to a movie or theater or concert.

    The Witcher 3 has now surpassed 50 million sales — comfortably earning over $1 billion in revenue which is an amazing feat for any media property.

    But as amazing and as lucrative as these games can be, they cannot escape the cruel hit-driven nature of their industry, where a small number of games generate the majority of financial returns. This has resulted in studios chasing ever more expensive games built on familiar intellectual property (i.e. Star Wars), which has, to many game players, cut the soul from the games and has led to financial instability at even popular game studios.

    This article from IGN summarizes the state of the industry well — with so-called AAA games now costing $200 million to create, not to mention hundreds of millions more to market, more and more studios have to wind down, as few games can generate enough revenue to cover the cost of development and marketing.

    The article predicts — and I hope it’s right — that the games industry will learn some of the lessons that many studios in Hollywood/the film industry have been forced to: embrace more small-budget games to experiment with new forms and IP. Blockbusters will have their place, but going all-in on blockbusters is a recipe for hollowing out the industry and cutting off the creativity it needs.

    Or, as the author so nicely puts it: “Maybe studios can remember that we used to play video games because they were fun – not because of their bigger-than-last-year maps carpeted by denser, higher-resolution grass that you walk across to finish another piece of side content that pushes you one digit closer to 100% completion.”


  • Why is it so Hard to Build a Diagnostic Business?

    Everywhere you look, the message seems clear: early detection (of cancer & disease) saves lives. Yet behind the headlines, companies developing these screening tools face a different reality. Many tests struggle to gain approval, adoption, or even financial viability. The problem isn’t that the science is bad — it’s that the math is brutal.

    This piece unpacks the economic and clinical trade-offs at the heart of the early testing / disease screening business. Why do promising technologies struggle to meet cost-effectiveness thresholds, despite clear scientific advances? And what lessons can diagnostic innovators take from these challenges to improve their odds of success? By the end, you’ll have a clearer view of the challenges and opportunities in bringing new diagnostic tools to market — and why focusing on the right metrics can make all the difference.

    The brutal math of diagnostics


    Technologists often prioritize metrics like sensitivity (also called recall) — the ability of a diagnostic test to correctly identify individuals with a condition (i.e., if the sensitivity of a test is 90%, then 90% of patients with the disease will register as positives and the remaining 10% will be false negatives) — because it’s often the key scientific challenge and aligns nicely with the idea of getting more patients earlier treatment.

    But when it comes to adoption and efficiency, specificity — the ability of a diagnostic test to correctly identify healthy individuals (i.e., if the specificity of a test is 90%, then 90% of healthy patients will register as negatives and the remaining 10% will be false positives) — is usually the more important, and more overlooked, criterion.

    The reason specificity is so important is that it can have a profound impact on a test’s Positive Predictive Value (PPV) — whether or not a positive test result means a patient actually has a disease (i.e., if the positive predictive value of a test is 90%, then a patient that registers as positive has a 90% chance of having the disease and 10% chance of actually being healthy — being a false positive).

    What is counter-intuitive, even to many medical and scientific experts, is that because (by definition) most patients are healthy, many high accuracy tests have disappointingly low PPV as most positive results are actually false positives.

    Let me present an example (see table below for summary of the math) that will hopefully explain:

    • There are an estimated 1.2 million people in the US with HIV — that is roughly 0.36% (the prevalence) of the US population
    • Let’s say we have an HIV test with 99% sensitivity and 99% specificity — a 99% (very) accurate test!
    • If we tested 10,000 Americans at random, you would expect roughly 36 of them (0.36% x 10,000) to be HIV positive. That means, roughly 9,964 are HIV negative
      • 99% sensitivity means 99% of the 36 HIV positive patients will test positive (99% x 36 = ~36)
      • 99% specificity means 99% of the 9,964 HIV negative patients will test negative (99% x 9,964 = ~9,864) while 1% (1% x 9,964 = ~100) would be false positives
    • This means that even though the test is 99% accurate, it only has a positive predictive value of ~26% (36 true positives out of 136 total positive results)
    Math behind the hypothetical HIV test example (Google Sheet link)

    Below (if you’re on a browser) is an embedded calculator which will run this math for any values of disease prevalence and sensitivity / specificity (and here is a link to a Google Sheet that will do the same), but you’ll generally find that low disease rates result in low positive predictive values for even very accurate diagnostics.
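    If you’d rather run the numbers yourself, here is a minimal Python sketch of that same PPV calculation (the function name and example values are mine, just for illustration):

    # PPV from prevalence, sensitivity, and specificity
    def positive_predictive_value(prevalence, sensitivity, specificity):
        true_positives = prevalence * sensitivity                # share of the population correctly flagged
        false_positives = (1 - prevalence) * (1 - specificity)   # share of healthy people incorrectly flagged
        return true_positives / (true_positives + false_positives)

    # The hypothetical HIV test above: 0.36% prevalence, 99% sensitivity, 99% specificity
    print(positive_predictive_value(0.0036, 0.99, 0.99))  # ~0.26, i.e. only ~26% PPV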

    Typically, introducing a new diagnostic means balancing true positives against the burden of false positives. After all, for patients, false positives will result in anxiety, invasive tests, and, sometimes, unnecessary treatments. For healthcare systems, they can be a significant economic burden as the cost of follow-up testing and overtreatment add up, complicating their willingness to embrace new tests.

    Below (if you’re on a browser) is an embedded calculator which will run the basic diagnostic economics math: for different values of the cost of testing and of follow-up testing, it calculates the screening cost per patient helped (and here is a link to a Google Sheet that will do the same).
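    And here is a comparable Python sketch of that economics math, treating “cost per patient helped” as the total screening spend per true positive found (the cost figures below are placeholders I picked for illustration, not values from the Google Sheet):

    # Total screening spend per true positive found
    def cost_per_true_positive(prevalence, sensitivity, specificity, test_cost, followup_cost):
        true_positives = prevalence * sensitivity
        false_positives = (1 - prevalence) * (1 - specificity)
        # every screened person pays for the test; every positive (true or false) triggers a follow-up work-up
        cost_per_person_screened = test_cost + (true_positives + false_positives) * followup_cost
        return cost_per_person_screened / true_positives

    # e.g. a $100 test with 99% sensitivity/specificity for a 0.36%-prevalence disease,
    # with a $1,000 follow-up work-up for every positive result
    print(round(cost_per_true_positive(0.0036, 0.99, 0.99, 100, 1000)))  # ~$31,854 per true positive found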

    Finally, while diagnostics businesses face many of the same development hurdles as drug developers — the need to develop cutting-edge technology, to carry out large clinical studies to prove efficacy, and to manage a complex regulatory and reimbursement landscape — unlike drug developers, diagnostic businesses face significant pricing constraints. Successful treatments can command high prices for treating a disease. But successful diagnostic tests, no matter how sophisticated, cannot, because they ultimately don’t treat diseases, they merely identify them.

    Case Study: Exact Sciences and Cologuard

    Let’s take Cologuard (from Exact Sciences) as an example. Cologuard is a combination genomic and immunochemistry test for colon cancer carried out on patient stool samples. Its two primary alternatives are:

    1. a much less sensitive fecal immunochemistry test (FIT) — which uses antibodies to detect blood in the stool as a potential, imprecise sign of colon cancer
    2. colonoscopies — a procedure where a skilled physician uses an endoscope to enter and look for signs of cancer in a patient’s colon. It’s considered the “gold standard” as it functions both as diagnostic and treatment (a physician can remove or biopsy any lesion or polyp they find). But, because it’s invasive and uncomfortable for the patient, this test is typically only done every 4-10 years

    Cologuard is (as of this writing) Exact Sciences’ primary product line, responsible for a large portion of the company’s $2.5 billion in 2023 revenue. It can detect earlier-stage colon cancer as well as pre-cancerous growths that could lead to cancer. Impressively, Exact Sciences also commands a gross margin greater than 70%, a high margin achieved mainly by pharmaceutical and software companies that have low per-unit costs of production. This has resulted in Exact Sciences, as of this writing, having a market cap over $11 billion.

    Yet for all its success, Exact Sciences is also a cautionary note, illustrating the difficulties of building a diagnostics company.

    • The company was founded in 1995, yet didn’t see meaningful revenue from selling diagnostics until 2014 (nearly 20 years later, after it received FDA approval for Cologuard)
    • The company has never had a profitable year (this includes the last 10 years it’s been in-market), losing over $200 million in 2023, and in the first three quarters of 2024, it has continued to be unprofitable.
    • Between 1997 (the first year we have good data from their SEC filings as summarized in this Google Sheet) and 2014 when it first achieved meaningful diagnostic revenue, Exact Sciences lost a cumulative $420 million, driven by $230 million in R&D spending, $88 million in Sales & Marketing spending, and $33 million in CAPEX. It funded those losses by issuing over $624 million in stock (diluting investors and employees)
    • From 2015-2023, it has needed to raise an additional $3.5 billion in stock and convertible debt (net of paybacks) to cover its continued losses (over $3 billion from 2015-2023)
    • Prior to 2014, Exact Sciences attempted to commercialize colon cancer screening technologies through partnerships with LabCorp (ColoSure and PreGenPlus). These were not very successful and led to concerns from the FDA and insurance companies. This forced Exact Sciences to invest heavily in clinical studies to win over the payers and the FDA, including a pivotal ~10,000 patient study to support Cologuard which recruited patients from over 90 sites and took over 1.5 years.
    • It took Exact Sciences 3 years after FDA approval of Cologuard for its annual diagnostic revenues to exceed what it spends on sales & marketing. It continues to spend aggressively there ($727M in 2023).

    While it’s difficult to know precisely what the company’s management / investors would do differently if they could do it all over again, the brutal math of diagnostics certainly played a key role.

    From a clinical perspective, Cologuard faces the same low positive predictive value problem all diagnostic screening tests face. From the data in their study on ~10,000 patients, it’s clear that, despite having a much higher sensitivity for cancer (92.3% vs 73.8%) and a higher AUROC (94% vs 89%) than the existing FIT test, the PPV of Cologuard is only 3.7% (lower than the FIT test’s 6.9%).

    Even using a broader disease definition that includes the pre-cancerous advanced lesions Exact Sciences touted as a strength, the gap on PPV does not narrow (Cologuard: 23.6% vs FIT: 32.6%)

    Clinical comparison of FIT vs Cologuard
    (Google Sheet link)

    The economic comparison with the FIT test fares even worse, due to the higher cost of Cologuard as well as its higher rate of false positives. Under the Centers for Medicare & Medicaid Services’ 2024Q4 laboratory fee schedule, a FIT test costs $16 (CPT code: 82274), but Cologuard costs $509 (CPT code: 81528), over 30x higher! If each positive Cologuard and FIT test results in a follow-up colonoscopy (which has a cost of $800-1000 according to this 2015 analysis), the screening cost per cancer patient found is 5.2-7.1x higher for Cologuard than for the FIT test.

    Cost comparison of FIT vs Cologuard
    (Google Sheet link)

    This quick math has been confirmed in several studies.

    From ACS Clinical Congress 2022 Presentation

    While Medicare and the US Preventive Services Task Force concluded that the cost of Cologuard and the increase in false positives / colonoscopy complications was worth the improved early detection of colon cancer, they stayed largely silent on how its cost-efficacy compares with the FIT test. It’s this unfavorable comparison that has probably required Exact Sciences to invest so heavily in sales and marketing to drive sales. That Cologuard has been so successful is a testament both to the value of being the only FDA-approved test on the market and to Exact Sciences’ efforts in making Cologuard so well-known (how many other diagnostics do you know of that have an SNL skit dedicated to them?).

    Not content to rest on the laurels of Cologuard, Exact Sciences recently published a ~20,000 patient study on their next generation colon cancer screening test: Cologuard Plus. While the study suggests Exact Sciences has improved the test across the board, the company’s marketing around Cologuard Plus having both >90% sensitivity and specificity is misleading, because the figures for sensitivity and specificity are for different conditions: sensitivity for colorectal cancer but specificity for colorectal cancer OR advanced precancerous lesion (see the table below).

    Sensitivity and Specificity by Condition for Cologuard Plus Study
    (Google Sheet link)

    Disentangling these numbers shows that while Cologuard Plus has narrowed its PPV disadvantage (now worse by 1% on colorectal cancer and even on cancer or lesion) and its cost-efficacy disadvantage (now “only” 4.4-5.8x more expensive) vs the FIT test (see tables below), it still hasn’t closed the gap.

    Clinical: Cologuard+ vs FIT (Google Sheet link)
    Economic: Cologuard+ vs FIT (Google Sheet link)

    Time will tell if this improved test performance translates to continued sales performance for Exact Sciences, but it is telling that despite the significant time and resources that went into developing Cologuard Plus, the data suggests it’s still likely more cost effective for health systems to adopt FIT over Cologuard Plus as a means of preventing advanced colon cancer.

    Lessons for diagnostics companies

    The underlying math of the diagnostics business and Exact Sciences’ long path to dramatic sales hold several key lessons for diagnostic entrepreneurs:

    1. Focus on specificity — Diagnostic technologists pay too little attention to specificity and too much to sensitivity. Positive predictive value and the cost-benefit for a health system are largely going to swing on specificity.
    2. Aim for higher value tests — Because the development and required validation for a diagnostic can be as high as that of a drug or medical device, it is important to pursue opportunities where the diagnostic can command a high price. These are usually markets where the alternatives are very expensive because they require new technology (e.g. advanced genetic tests) or a great deal of specialized labor (e.g. colonoscopy) or where the diagnostic directly decides on a costly course of treatment (e.g. a companion diagnostic for an oncology drug).
    3. Go after unmet needs — If a test is able to fill a mostly unmet need — for example, if the alternatives are extremely inaccurate or poorly adopted — then adoption will be determined by awareness (because there aren’t credible alternatives) and pricing will be determined by sensitivity (because this drives the delivery of better care). This also simplifies the sales process.
    4. Win beyond the test — Because performance can only ever get to 100%, each incremental point of sensitivity and specificity is exponentially harder to achieve yet delivers less medical or financial value. As a result, it can be advantageous to focus on factors beyond the test, such as regulatory approval / guidelines adoption, patient convenience, time to result, and impact on follow-up tests and procedures. Cologuard gained a great deal from being “the first FDA-approved colon cancer screening test”. Non-invasive prenatal testing, despite low positive predictive values and limited disease coverage, gained adoption in part by helping to triage follow-up amniocentesis (a procedure with a small but still frightening rate of miscarriage, ~0.5%). Rapid antigen tests for COVID have similarly been adopted despite having lower sensitivity and specificity than PCR tests, owing to their speed, low cost, and ability to be carried out at home.

    Diagnostics developers must carefully navigate the intersection of scientific innovation and financial reality, while grappling with the fact that even the most impressive technology may be insufficient to achieve market success without taking clinical and economic factors into account.

    Ultimately, the path forward for diagnostic innovators lies in prioritizing specificity, targeting high-value and unmet needs, and crafting solutions that deliver value beyond the test itself. While Exact Sciences’ journey underscores the difficulty of these challenges, it also illustrates that with persistence, thoughtful investment, and strategic differentiation, it is possible to carve out a meaningful and impactful space in the market.

  • The Challenge of Capacity

    The rise of Asia as a force to be reckoned with in large scale manufacturing of critical components like batteries, solar panels, pharmaceuticals, chemicals, and semiconductors has left US and European governments seeking to catch up with a bit of a dilemma.

    These activities largely moved to Asia because financially-motivated management teams in the West (correctly) recognized that:

    • they were low return in a conventional financial sense (requiring tremendous investment and maintenance)
    • most of these activities had a heavy labor component (and higher wages in the US/Europe meant US/European firms were at a cost disadvantage)
    • these activities tend to benefit from economies of scale and regional industrial ecosystems, so it makes sense for an industry to have fewer and larger suppliers
    • much of the value was concentrated in design and customer relationship, activities the Western companies would retain

    What the companies failed to take into account was the speed at which Asian companies like WuXi, TSMC, Samsung, LG, CATL, Trina, Tongwei, and many others would consolidate (usually with government support), ultimately “graduating” into dominant positions with real market leverage and with the profitability to invest into the higher value activities that were previously the sole domain of Western industry.

    Now, scrambling to reposition themselves closer to the forefront in some of these critical industries, these governments have tried to kickstart domestic efforts, only to face the economic realities that led to the outsourcing to begin with.

    Northvolt, a major European effort to produce advanced batteries in Europe, is one example of this. Despite raising tremendous private capital and securing European government support, the company filed for bankruptcy a few days ago.

    While much hand-wringing is happening in climate-tech circles, I take a different view: this should really not come as a surprise. Battery manufacturing (like semiconductor, solar, and pharmaceutical manufacturing) requires huge amounts of capital and painstaking trial-and-error to perfect operations, just to produce products that are steadily dropping in price over the long term. It’s fundamentally a difficult and not-very-rewarding endeavor. And it’s for that reason that the West “gave up” on these years ago.

    But if US and European industrial policy is to be taken seriously here, the respective governments need to internalize that reality and be committed for the long haul. The idea that what these Asian companies are doing is “easily replicated” is simply not true, and the question is not if but when the next recipient of government support will fall into dire straits.