But, when it comes to solid tumors, it’s been far more challenging. Enter this Phase II clinical trial from China (summarized in Nature News). The researchers ran a randomized controlled trial on 266 patients with gastric or gastro-esophageal cancer that had resisted previous treatment, assigning two-thirds to receive CAR-T and the remainder to best medical care (the control). The results (see the survival curve below) are impressive — while the median progression-free survival differs by only about 1.5 months, it’s very clear that by month 8 there are no progression-free patients in the control group but something like ~25% of the CAR-T group.
The side effect profile is still challenging (with 99% of patients in the CAR-T group experiencing moderately severe side effects), but this is (sadly) to be expected with CAR-T treatments.
While it remains to be seen how this scales up in a Phase III study with a larger population, this is an incredibly promising finding — giving clinicians a new tool in their arsenal for dealing with a wider range of cancer targets and suggesting that cell therapies still have more tricks up their sleeves.
The phase II clinical trial in China tested chimeric antigen receptor (CAR) T cells in people with advanced gastric cancer or gastro-oesophageal junction cancer, which are solid tumours. To create CAR-T-cell therapies, T cells are collected from a person with cancer and tweaked to produce proteins that target cancer cells. The T cells are then infused back into the same person. CAR-T-cell therapy has revolutionized cancer treatment but has been most successful against blood cancers.
“Solid tumours generally don’t respond well to CAR-T-cell therapy,” says Lisa Mielke, a cancer researcher at the Olivia Newton John Cancer Research Institute in Heidelberg, Australia. The trials are among the first in which CAR-T-cell therapy has had promising results against solid tumours. They provide “evidence that there is potential for CAR T cells to be further optimized for future treatment of patients with solid tumours”, adds Mielke.
This is an old piece from Morgan Housel from May 2023. It highlights how optimistic expectations can serve as a “debt” that needs to be “paid off”.
To illustrate this, he gives a fascinating example — the Japanese stock market. From 1965 to 2022, both the Japanese stock market and the S&P500 (a basket of mostly American large companies) had similar returns. As most people know, Japan has had a miserable 3 “lost decades” of growth and stock performance. But Housel presents this fact in an interesting light: it wasn’t that Japan did poorly, it just did all of its growth in a 25-year run between 1965 and 1990 and then spent the following three decades “paying off” that “expectations debt”.
Housel concludes, as he oftentimes does, with wisdom for all of us: “An asset you don’t deserve can quickly become a liability … reality eventually catches up, and demands repayment in equal proportion to your delusions – plus interest”.
Manage your great expectations.
There’s a stoic saying: “Misfortune weighs most heavily on those who expect nothing but good fortune.”
Expecting nothing but good feels like such a good mindset – you’re optimistic, happy, and winning. But whether you know it or not you’re very likely piling up a hidden debt that must eventually be repaid.
Nothing earth-shattering, but I appreciated (and agreed with) his breakdown of why self-hosting is flourishing today (excerpt below). For me personally, the ease with which Docker lets you set up self-hosted services, along with the low cost of storage and mini-PCs, turned this from an impractical idea into one I’ve come to rely on for my own “personal tech stack”.
What gave rise to self-hosting’s relatively recent popularity? That led Sholly to a few answers, many of them directly relating to the corporate cloud services people typically use instead of self-hosting:
Privacy for photos, files, and other data
Cost of cloud hosting and storage
Accessibility of services, through GitHub, Reddit, and sites like his
Installation with Docker (“a game-changer for lots of people”) and Unraid
Single-board computers (SBCs) like the Raspberry Pi
NUCs, mini-PCs, workstations, and other pandemic-popular hardware
Finally, there’s the elephant in any self-hosting discussion: piracy.
Medical guidelines are incredibly important — they impact everything from your doctor’s recommendations to the medications your insurance will cover — but they are somewhat shrouded in mystery.
This piece from Emily Oster’s ParentData is a good overview of what they are (and aren’t) — and gives a pretty good explanation of why a headline from the popular press probably isn’t capturing the nuance and review of clinical evidence that goes into them.
(and yes, that title is a Schoolhouse Rock reference)
The headlines almost always simplify or distort these complexities. Sometimes those simplifications are necessary, other times…well, less so. Why should you care? Guidelines often impact health insurance coverage, influence standards of care, and play a role in how health policy is developed. Understanding what goes into them empowers you to engage more thoughtfully when talking to your doctor. You’ll be more aware of when those sensationalized headlines really mean anything, which is hopefully most of the time nothing at all.
I spotted this memo from Oaktree Capital founder Howard Marks and thought it was a sobering, grounded take on what makes a stock market bubble, on the reasons to be alarmed about the current concentration of market capitalization in the so-called “Magnificent Seven”, and on how eerily similar this moment is to the “Nifty Fifty” and “Dot Com Bubble” eras of irrational exuberance. Whether you agree with him or not, it’s a worthwhile piece of wisdom to remember.
This graph that Marks borrowed from JP Morgan is also quite intriguing (terrifying?)
There’s usually a grain of truth that underlies every mania and bubble. It just gets taken too far. It’s clear that the internet absolutely did change the world – in fact, we can’t imagine a world without it. But the vast majority of internet and e-commerce companies that soared in the late ’90s bubble ended up worthless. When a bubble burst in my early investing days, The Wall Street Journal would run a box on the front page listing stocks that were down by 90%. In the aftermath of the TMT Bubble, they’d lost 99%.
When something is on the pedestal of popularity, the risk of a decline is high. When people assume – and price in – an expectation that things can only get better, the damage done by negative surprises is profound. When something is new, the competitors and disruptive technologies have yet to arrive. The merit may be there, but if it’s overestimated it can be overpriced, only to evaporate when reality sets in. In the real world, trees don’t grow to the sky.
As a Span customer, I’ve always appreciated their vision: to make home electrification cleaner, simpler, and more efficient through beautifully designed, tech-enabled electrical panels. But, let’s be honest, selling a product like this directly to consumers is tough. Electrical panels are not top-of-mind for most people until there’s a problem — and explaining the value proposition of “a smarter electrical panel” to justify the high price tag can be a real challenge. That’s why I’m unsurprised by their recent shift in strategy towards utilities.
This pivot to partnering with utility companies makes a lot of sense. Instead of trying to convince individual homeowners to upgrade, Span can now work directly with those who can impact community-scale electrification.
While the value proposition of avoiding costly service upgrades is undeniably beneficial for utilities, understanding precisely how that translates into financial savings for the utilities needs much more nuance. That, along with the fact that rebates & policy will vary wildly by locality, raises many uncertainties about pricing strategy (not to mention that there are other, larger smart electric panel companies like Leviton and Schneider Electric, albeit with less functional and less well-designed offerings).
I wish the company well. We need better electrical infrastructure in the US (and especially California, where I live) and one way to achieve that is for companies like Span to find a successful path to market.
Span’s panel costs $3,500, and accessories involve separate $700-plus purchases. It’s unavoidably pricey, though tax rebates and other incentives can bring the cost down, but the premise is that they offer buyers cost savings through avoiding expensive upgrades. The pitch to utility companies is also one of cost avoidance, just at a much larger scale. Span’s target utility customer is one at the intersection of load growth, the energy transition, and existing regulatory restrictions, especially in places with aggressive decarbonization timelines like California.
One of the most exciting technological developments from the semiconductor side of things is the rapid development of the ecosystem around the open-source RISC-V instruction set architecture (ISA). One landmark in its rise is that the architecture appears to be moving beyond just behind-the-scenes projects to challenging Intel/AMD’s x86 architecture and ARM (used by Apple and Qualcomm) in customer-facing applications.
This article highlights this crucial development by reporting on early adopters embracing RISC-V for higher-end devices like laptops. Companies like Framework and DeepComputing have just launched or are planning to launch RISC-V laptops. RISC-V-powered hardware still has a steep mountain of software and performance challenges to climb (as evidenced by how long it has taken the ARM ecosystem to become credible in PCs). But Intel’s recent setbacks, ARM’s legal battles with Qualcomm over licensing (which pretty much guarantee that every company using ARM will now invest in RISC-V work), and the open-source nature of RISC-V (which potentially allows much more innovation in form factors and functionality) may have created an opening for enterprising companies willing to make the investment.
“If we look at a couple of generations down the [software] stack, we’re starting to see a line of sight to consumer-ready RISC-V in something like a laptop, or even a phone,” said Nirav Patel, CEO of laptop maker Framework. Patel’s company plans to release a laptop that can support a RISC-V mainboard in 2025. Though still intended for early adopters and developers, it will be the most accessible and polished RISC-V laptop yet, and it will ship to users with the same look and feel as the Framework laptops that use x86 chips.
While growing vehicle electrification is inevitable, it always surprised me that US automakers would skip past plug-in hybrid (PHEV) technology to embrace only all-electric vehicles. Many have attacked Toyota’s more deliberate “slow-and-steady” approach to electrification, but it always seemed to me that, until we have broadly available, high-quality charging infrastructure and all-electric vehicles at the price point of a non-luxury family car (i.e., a Camry or RAV4), electric vehicles were going to be more of an upper-middle-class/wealthy phenomenon. Considering how well plug-in hybrids have done in the Chinese automotive market (where they are growing faster than all-electric vehicles!), it always felt odd that the category wouldn’t make its way into the US market as the natural next step in vehicle electrification.
It sounds like Ram (a division of Stellantis) agrees. It intends to delay the all-electric version of its Ram 1500 in favor of starting with its extended-range plug-in hybrid version, the Ramcharger. Extended-range electric vehicles (EREVs) are plug-in hybrids similar to the Chevy Volt: they employ an electric powertrain plus a gasoline-powered generator that supplies additional range when the battery runs low.
While it still remains to be seen how well these EREVs/PHEVs are adopted — the price points that are being discussed still feel too high to me — seeing broader adoption of plug-in hybrid technology (supplemented with gas-powered range extension) feels like the natural next step on our path to vehicle electrification.
Consumers are still looking for electrified rides, just not the ones that many industry pundits predicted. In China, Europe, and the United States, buyers are converging on hybrids, whose sales growth is outpacing that of pure EVs. “It’s almost been a religion that it’s EVs or bust, so let’s not fool around with hybrids or hydrogen,” says Michael Dunne, CEO of Dunne Insights, a leading analyst of China’s auto industry. “But even in the world’s largest market, almost half of electrified vehicle sales are hybrids.”
Inspired by some work from a group at Stanford on building a virtual lab out of AI agents, I’ve been experimenting with multi-agent AI conversations and workflows. But because the space (at least to me) has seemed more focused on building more capable individual agents than on coordinating and working with many agents, the existing tools and libraries have made it difficult to carry out these experiments.
To facilitate some of my own exploration work, I built what I’m calling a Multi-Agent ChatLab — a browser-based, completely portable setup to define multiple AI agents and facilitate conversations between them. This has made my experimentation work vastly simpler and I hope it can help someone else.
More about how to use this & the underlying design on this page.
And, to show off the tool, and for your amusement (and given my love of military history), here is a screengrab from the tool where I set up two AI Agents — one believing itself to be Napoleon Bonaparte and one believing itself to be the Duke of Wellington (the British commander who defeated Napoleon at Waterloo) — and had them describe (and compare!) the hallmarks of their military strategy.
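For those curious what this looks like under the hood, here is a minimal sketch of the core loop such a tool runs: two agents with different persona prompts taking turns, each seeing the other’s messages as the “user” side of its own conversation. This is an illustration only (the ChatLab itself is browser-based and provider-agnostic); the OpenAI Python client, the model name, and the Napoleon/Wellington personas are just stand-ins matching the example in the screengrab.

# Minimal two-agent conversation loop (illustrative sketch, not the ChatLab source)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

personas = {
    "Napoleon": "You are Napoleon Bonaparte. Describe and defend the hallmarks of your military strategy.",
    "Wellington": "You are the Duke of Wellington. Describe and defend the hallmarks of your military strategy.",
}

def agent_reply(name, transcript):
    # Each agent gets its own persona as the system prompt, sees its prior turns
    # as "assistant" messages, and the other agent's turns as "user" messages.
    messages = [{"role": "system", "content": personas[name]}]
    for speaker, text in transcript:
        messages.append({"role": "assistant" if speaker == name else "user", "content": text})
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

transcript = [("Napoleon", "Let us compare the hallmarks of our military strategies.")]
for _ in range(3):  # a few alternating rounds
    for name in ("Wellington", "Napoleon"):
        reply = agent_reply(name, transcript)
        transcript.append((name, reply))
        print(f"{name}: {reply}\n")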
While much of the effort to green shipping has focused on alternative fuels like hydrogen, ammonia, and methanol as replacements for bunker fuel, I recently saw an article on the use of automated, highly durable sail technology to let ships leverage wind as a means to reduce fuel consumption.
I don’t have any inside information on what the cost / speed tradeoffs are for the technology, nor whether or not there’s a credible path to scaling to handle the massive container ships that dominate global shipping, but it’s a fascinating technology vector, and a direct result of the growing realization by the shipping industry that it needs to green itself.
Wind, on the other hand, is abundant. With the U.N.’s International Maritime Organization poised to negotiate stricter climate policies next year, including a new carbon pricing mechanism and global fuel standard, more shipping companies are seriously considering the renewable resource as an immediate answer. While sails aren’t likely to outright replace the enormous engines that drive huge cargo ships, wind power could still make a meaningful dent in the industry’s overall emissions, experts say.
One of the most exciting areas of technology development, though one that doesn’t get a ton of mainstream media coverage, is the race to build a working quantum computer that operates “below threshold” — with error rates low enough that quantum error correction actually works, so calculations that exploit quantum mechanics can be done accurately.
One of the key limitations to achieving this has been the sensitivity of quantum computing systems — in particular the qubits that capture the superposition of multiple states that allow quantum computers to exploit quantum mechanics for computation — to the world around them. Imagine if your computer’s accuracy would change every time someone walked in the room — even if it was capable of amazing things, it would not be especially practical. As a result, much research to date has been around novel ways of creating physical systems that can protect these quantum states.
Google has (in a paper published in Nature) demonstrated their new Willow quantum computing chip, which showcases a quantum error correction method that spreads the quantum state information of a single “logical” qubit across multiple entangled “physical” qubits to create a more robust system. Beyond proving that their quantum error correction method worked, what is most remarkable to me is that they’re able to extrapolate a scaling law for their error correction — a way of estimating how much better the system gets at avoiding loss of quantum state as they increase the number of physical qubits per logical qubit — which could suggest a “scale up” path towards building functional, practical quantum computers.
I will confess that quantum mechanics was never my strong suit (beyond needing it for a class on statistical mechanics eons ago in college), and my understanding of the core physics underlying what they’ve done in the paper is limited, but this is an incredibly exciting feat on our way towards practical quantum computing systems!
The company’s new chip, called Willow, is a larger, improved version of that technology, with 105 physical qubits. It was developed in a fabrication laboratory that Google built at its quantum-computing campus in Santa Barbara, California, in 2021.
As a first demonstration of Willow’s power, the researchers showed that it could perform, in roughly 5 minutes, a task that would take the world’s largest supercomputer an estimated 10²⁵ years, says Hartmut Neven, who heads Google’s quantum-computing division. This is the latest salvo in the race to show that quantum computers have an advantage over classical ones.
And, by creating logical qubits inside Willow, the Google team has shown that each successive increase in the size of a logical qubit cuts the error rate in half.
“This is a very impressive demonstration of solidly being below threshold,” says Barbara Terhal, a specialist in quantum error correction at the Delft University of Technology in the Netherlands. Mikhail Lukin, a physicist at Harvard University in Cambridge, Massachusetts, adds, “It clearly shows that the idea works.”
I had never heard of this framework for thinking about how to address problems before. Shout-out to my friend Chris Yiu and his new Substack Secret Weapon about improving productivity for teaching me about this. It’s surprisingly insightful about when to think about something as a process problem vs an expertise problem vs experimentation vs direction.
[The Cynefin framework] organises problems into four primary domains:
Clear. Cause and effect are obvious; categorise the situation and apply best practice. Example: baking a cake. Rewards process.
Complicated. Cause and effect are knowable but not immediately apparent; analyse the situation carefully to find a solution. Example: coaching a sports team. Rewards expertise.
Complex. Cause and effect are only apparent with hindsight; focus on spotting and exploiting patterns as they emerge. Example: playing poker. Rewards [experimentation — Chris’s original term was entrepreneurship, I think experimentation is more clear & actionable].
Chaotic. Cause and effect are impossible to parse; act on instinct, in the hope of imposing some order on the immediate situation. Example: novel crisis response. Rewards direction.
The best return on investment in entertainment, in terms of hours of deep engagement per dollar, is with games. When done right, they blend stunning visuals and sound, earworm-like musical scores, compelling story and acting, and a sense of progression in a way that is second to none.
Case in point: I bought the complete edition of the award-winning The Witcher 3: Wild Hunt for $10 during a Steam sale in 2021. According to Steam, I’ve logged over 200 hours (I had to double-check that number!) playing the game across two playthroughs and the amazing expansions Hearts of Stone and Blood and Wine — an amazing 20 hours per dollar spent. Even paying full freight (as of this writing, the complete edition including both expansions costs $50), that would still be a remarkable 4 hours per dollar. Compare that with the price of admission to a movie, play, or concert.
The Witcher 3 has now surpassed 50 million sales — comfortably earning over $1 billion in revenue which is an amazing feat for any media property.
But as amazing and as lucrative as these games can be, they cannot escape the cruel, hit-driven nature of their industry, where a small number of games generate the majority of financial returns. This has resulted in studios chasing ever more expensive games built on familiar intellectual property (e.g., Star Wars), which has, to many players, cut the soul from the games and has led to financial instability at even popular game studios.
This article from IGN summarizes the state of the industry well — with so-called AAA games now costing $200 million to create, not to mention $100’s of millions to market, more and more studios have to wind down as few games can generate enough revenue to cover the cost of development and marketing.
The article predicts — and I hope it’s right — that the games industry will learn the lessons many studios in Hollywood/the film industry have been forced to: embrace more small-budget games to experiment with new forms and IP. Blockbusters will have their place, but going all-in on blockbusters is a recipe for hollowing out the industry and cutting off the creativity it needs.
Or, as the author so nicely puts it: “Maybe studios can remember that we used to play video games because they were fun – not because of their bigger-than-last-year maps carpeted by denser, higher-resolution grass that you walk across to finish another piece of side content that pushes you one digit closer to 100% completion.”
Just five years ago, AAA projects’ average budget ranged $50 – $150 million. Today, the minimum average is $200 million. Call of Duty’s new benchmark is $300 million, with Activision admitting in the Competition & Market Authority’s report on AAA development that it now takes the efforts of one-and-a-half studios just to complete the annual Call of Duty title.
It’s far from just Call of Duty facing ballooning costs. In the same CMA report, an anonymous publisher admits that development costs for one of its franchises reached $660 million. With $550 million of marketing costs on top, that is a $1.2 billion game. To put that into perspective, Minecraft – the world’s best-selling video game of all time – has of last year only achieved $3 billion. It took 12 years to reach that figure, having launched in 2011.
Everywhere you look, the message seems clear: early detection (of cancer & disease) saves lives. Yet behind the headlines, companies developing these screening tools face a different reality. Many tests struggle to gain approval, adoption, or even financial viability. The problem isn’t that the science is bad — it’s that the math is brutal.
This piece unpacks the economic and clinical trade-offs at the heart of the early testing / disease screening business. Why do promising technologies struggle to meet cost-effectiveness thresholds, despite clear scientific advances? And what lessons can diagnostic innovators take from these challenges to improve their odds of success? By the end, you’ll have a clearer view of the challenges and opportunities in bringing new diagnostic tools to market—and why focusing on the right metrics can make all the difference.
Technologists often prioritize metrics like sensitivity (also called recall) — the ability of a diagnostic test to correctly identify individuals with a condition (i.e., if the sensitivity of a test is 90%, then 90% of patients with the disease will register as positives and the remaining 10% will be false negatives) — because it’s often the key scientific challenge and aligns nicely with the idea of getting more patients earlier treatment.
But when it comes to adoption and efficiency, specificity — the ability of a diagnostic test to correctly identify healthy individuals (i.e., if the specificity of a test is 90%, then 90% of healthy patients will register as negatives and the remaining 10% will be false positives) — is usually the more important and more often overlooked criterion.
The reason specificity is so important is that it can have a profound impact on a test’s Positive Predictive Value (PPV) — whether or not a positive test result means a patient actually has a disease (i.e., if the positive predictive value of a test is 90%, then a patient that registers as positive has a 90% chance of having the disease and 10% chance of actually being healthy — being a false positive).
What is counter-intuitive, even to many medical and scientific experts, is that because (by definition) most patients are healthy, many high accuracy tests have disappointingly low PPV as most positive results are actually false positives.
Let me present an example (see table below for summary of the math) that will hopefully explain:
Let’s say we have an HIV test with 99% sensitivity and 99% specificity — a 99% (very) accurate test!
If we tested 10,000 Americans at random, you would expect roughly 36 of them (0.36% x 10,000) to be HIV positive. That means roughly 9,964 are HIV negative.
99% sensitivity means 99% of the 36 HIV positive patients will test positive (99% x 36 = ~36)
99% specificity means 99% of the 9,964 HIV negative patients will test negative (99% x 9,964 = ~9,864) while 1% (1% x 9,964 = ~100) would be false positives
This means that even though the test is 99% accurate, it only has a positive predictive value of ~26% (36 true positives out of 136 total positive results)
Below (if you’re on a browser) is an embedded calculator which will run this math for any values of disease prevalence and sensitivity / specificity (and here is a link to a Google Sheet that will do the same), but you’ll generally find that low disease rates result in low positive predictive values for even very accurate diagnostics.
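If you would rather run this math in code than in the embedded calculator, here is a small Python version of the same calculation (the function name and the 10,000-person default population are mine, mirroring the HIV example above):

def positive_predictive_value(prevalence, sensitivity, specificity, population=10_000):
    # Reproduces the worked example above for any disease prevalence and test accuracy.
    diseased = prevalence * population
    healthy = population - diseased
    true_positives = sensitivity * diseased
    false_positives = (1 - specificity) * healthy
    return true_positives / (true_positives + false_positives)

# The HIV example: 0.36% prevalence with a 99%/99% test gives a PPV of only ~26%
print(round(positive_predictive_value(0.0036, 0.99, 0.99), 2))  # -> 0.26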
Typically, introducing a new diagnostic means balancing true positives against the burden of false positives. After all, for patients, false positives will result in anxiety, invasive tests, and, sometimes, unnecessary treatments. For healthcare systems, they can be a significant economic burden as the cost of follow-up testing and overtreatment add up, complicating their willingness to embrace new tests.
Below (if you’re on a browser) is an embedded calculator which will run the basic diagnostic economics math (the cost of testing and follow-up testing per patient helped) for different values of test cost and follow-up cost (and here is a link to a Google Sheet that will do the same).
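For readers who prefer code to spreadsheets, here is a rough sketch of that economics math in Python. It reflects my reading of the calculation (total spend on tests plus follow-up work-ups for every positive, divided by the number of true positives found), and the example prices at the end are made up for illustration; the linked Google Sheet remains the reference.

def screening_cost_per_patient_helped(prevalence, sensitivity, specificity,
                                      test_cost, followup_cost, population=10_000):
    # Total cost of screening everyone plus following up every positive result,
    # divided by the number of patients with disease actually caught.
    diseased = prevalence * population
    true_positives = sensitivity * diseased
    false_positives = (1 - specificity) * (population - diseased)
    total_cost = population * test_cost + (true_positives + false_positives) * followup_cost
    return total_cost / true_positives

# e.g. a hypothetical $100 test with a $1,000 follow-up at 0.5% prevalence and 90%/90% accuracy
print(round(screening_cost_per_patient_helped(0.005, 0.90, 0.90, 100, 1_000)))  # -> ~45,333 per patient helped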
Finally, while diagnostics businesses face many of the same development hurdles as drug developers — the need to develop cutting-edge technology, to carry out large clinical studies to prove efficacy, and to manage a complex regulatory and reimbursement landscape — unlike drug developers, diagnostic businesses face significant pricing constraints. Successful treatments can command high prices for treating a disease. But successful diagnostic tests, no matter how sophisticated, cannot, because they ultimately don’t treat diseases, they merely identify them.
Case Study: Exact Sciences and Cologuard
Let’s take Cologuard (from Exact Sciences) as an example. Cologuard is a combination genomic and immunochemistry test for colon cancer carried out on patient stool samples. Its two primary alternatives are:
a much less sensitive fecal immunochemistry test (FIT) — which uses antibodies to detect blood in the stool as a potential, imprecise sign of colon cancer
colonoscopies — a procedure where a skilled physician uses an endoscope to enter and look for signs of cancer in a patient’s colon. It’s considered the “gold standard” as it functions both as diagnostic and treatment (a physician can remove or biopsy any lesion or polyp they find). But, because it’s invasive and uncomfortable for the patient, this test is typically only done every 4-10 years
Cologuard is (as of this writing) Exact Sciences’ primary product line, responsible for a large portion of the company’s $2.5 billion in 2023 revenue. It can detect earlier stage colon cancer as well as pre-cancerous growths that could lead to cancer. Impressively, Exact Sciences also commands a gross margin greater than 70%, a level usually achieved only by pharmaceutical and software companies with low per-unit costs of production. This has resulted in Exact Sciences, as of this writing, having a market cap of over $11 billion.
Yet for all its success, Exact Sciences is also a cautionary note, illustrating the difficulties of building a diagnostics company.
The company was founded in 1995, yet didn’t see meaningful revenue from selling diagnostics until 2014 (nearly 20 years later, after it received FDA approval for Cologuard)
The company has never had a profitable year (including the last 10 years it’s been in-market): it lost over $200 million in 2023 and continued to be unprofitable in the first three quarters of 2024.
Between 1997 (the first year we have good data from their SEC filings as summarized in this Google Sheet) and 2014 when it first achieved meaningful diagnostic revenue, Exact Sciences lost a cumulative $420 million, driven by $230 million in R&D spending, $88 million in Sales & Marketing spending, and $33 million in CAPEX. It funded those losses by issuing over $624 million in stock (diluting investors and employees)
From 2015-2023, it has needed to raise an additional $3.5 billion in stock and convertible debt (net of paybacks) to cover its continued losses (over $3 billion from 2015-2023)
Prior to 2014, Exact Sciences attempted to commercialize colon cancer screening technologies through partnerships with LabCorp (ColoSure and PreGenPlus). These were not very successful and led to concerns from the FDA and insurance companies. This forced Exact Sciences to invest heavily in clinical studies to win over the payers and the FDA, including a pivotal ~10,000 patient study to support Cologuard which recruited patients from over 90 sites and took over 1.5 years.
It took Exact Sciences 3 years after FDA approval of Cologuard for its annual diagnostic revenues to exceed what it spends on sales & marketing. It continues to spend aggressively there ($727M in 2023).
While it’s difficult to know precisely what the company’s management / investors would do differently if they could do it all over again, the brutal math of diagnostics certainly played a key role.
From a clinical perspective, Cologuard faces the same low positive predictive value problem all diagnostic screening tests face. From the data in their study on ~10,000 patients, it’s clear that, despite having a much higher sensitivity for cancer (92.3% vs 73.8%) and higher AUROC (94% vs 89%) than the existing FIT test, the PPV of Cologuard is only 3.7% (lower than the FIT test: 6.9%).
Even using a broader disease definition that includes the pre-cancerous advanced lesions Exact Sciences touted as a strength, the gap on PPV does not narrow (Cologuard: 23.6% vs FIT: 32.6%)
The economic comparison with a FIT test fares even worse due to the higher cost of Cologuard as well as the higher rate of false positives. Under the Center for Medicare & Medicaid Service’s 2024Q4 laboratory fee schedule, a FIT test costs $16 (CPT code: 82274), but Cologuard costs $509 (CPT code: 81528), over 30x higher! If each positive Cologuard and FIT test results in a follow-up colonoscopy (which has a cost of $800-1000 according to this 2015 analysis), the screening cost per cancer patient is 5.2-7.1x higher for Cologuard than for the FIT test.
This quick math has been confirmed in several studies.
A study by a group at the University Medical Center of Rotterdam concluded that “Compared to nearly all other CRC screening strategies reimbursed by CMS (Medicare), [Cologuard] is less effective and considerably more costly, making it an inefficient screening option” and would only be comparable at a much lower cost (~$6-18!)
While Medicare and the US Preventive Services Task Force concluded that the cost of Cologuard and the increase in false positives / colonoscopy complications was worth the improved early detection of colon cancer, it stayed largely silent on comparing cost-efficacy with the FIT test. It’s this unfavorable comparison that has probably required Exact Sciences to invest so heavily in sales and marketing to drive sales. That Cologuard has been so successful is a testament both to the value of being the only FDA-approved test on the market as well as Exact Science’s efforts in making Cologuard so well-known (how many other diagnostics do you know have an SNL skit dedicated to them?).
Not content to rest on the laurels of Cologuard, Exact Sciences recently published a ~20,000 patient study on their next generation colon cancer screening test: Cologuard Plus. While the study suggests Exact Sciences has improved the test across the board, the company’s marketing around Cologuard Plus having both >90% sensitivity and specificity is misleading, because the figures for sensitivity and specificity are for different conditions: sensitivity for colorectal cancer but specificity for colorectal cancer OR advanced precancerous lesion (see the table below).
Sensitivity and Specificity by Condition for Cologuard Plus Study (Google Sheet link)
Disentangling these numbers shows that while Cologuard Plus has narrowed its PPV disadvantage (now worse by 1% on colorectal cancer and even on cancer or lesion) and its cost-efficacy disadvantage (now “only” 4.4-5.8x more expensive) vs the FIT test (see tables below), it still hasn’t closed the gap.
Time will tell if this improved test performance translates to continued sales performance for Exact Sciences, but it is telling that despite the significant time and resources that went into developing Cologuard Plus, the data suggests it’s still likely more cost effective for health systems to adopt FIT over Cologuard Plus as a means of preventing advanced colon cancer.
Lessons for diagnostics companies
The underlying math of the diagnostics business and Exact Sciences’ long path to meaningful sales hold several key lessons for diagnostic entrepreneurs:
Focus on specificity — Diagnostic technologists pay too little attention to specificity and too much to sensitivity. Positive predictive value and the cost-benefit for a health system will largely swing on specificity.
Aim for higher value tests — Because the development and required validation for a diagnostic can be as high as that of a drug or medical device, it is important to pursue opportunities where the diagnostic can command a high price. These are usually markets where the alternatives are very expensive because they require new technology (e.g. advanced genetic tests) or a great deal of specialized labor (e.g. colonoscopy) or where the diagnostic directly decides on a costly course of treatment (e.g. a companion diagnostic for an oncology drug).
Go after unmet needs — If a test is able to fill a mostly unmet need — for example, if the alternatives are extremely inaccurate or poorly adopted — then adoption will be determined by awareness (because there aren’t credible alternatives) and pricing will be determined by sensitivity (because this drives the delivery of better care). This also simplifies the sales process.
Win beyond the test — Because performance can only ever approach 100%, each incremental point of sensitivity and specificity is exponentially harder to achieve and delivers less incremental medical or financial value. As a result, it can be advantageous to focus on factors beyond the test, such as regulatory approval / guidelines adoption, patient convenience, time to result, and impact on follow-up tests and procedures. Cologuard gained a great deal from being “the first FDA-approved colon cancer screening test”. Non-invasive prenatal testing, despite low positive predictive values and limited disease coverage, gained adoption in part by helping to triage follow-up amniocentesis (a procedure with a low but still frightening rate of miscarriage, ~0.5%). Rapid antigen tests for COVID were similarly adopted despite lower sensitivity and specificity than PCR tests because of their speed, low cost, and ability to be carried out at home.
Diagnostics developers must carefully navigate the intersection of scientific innovation and financial reality, while grappling with the fact that even the most impressive technology may be insufficient without taking into account clinical and economic factors to achieve market success.
Ultimately, the path forward for diagnostic innovators lies in prioritizing specificity, targeting high-value and unmet needs, and crafting solutions that deliver value beyond the test itself. While Exact Sciences’ journey underscores the difficulty of these challenges, it also illustrates that with persistence, thoughtful investment, and strategic differentiation, it is possible to carve out a meaningful and impactful space in the market.
The rise of Asia as a force to be reckoned with in large scale manufacturing of critical components like batteries, solar panels, pharmaceuticals, chemicals, and semiconductors has left US and European governments seeking to catch up with a bit of a dilemma.
These activities largely moved to Asia because financially-motivated management teams in the West (correctly) recognized that:
they were low return in a conventional financial sense (require tremendous investment and maintenance)
most of these had a heavy labor component (and higher wages in the US/Europe meant US/European firms were at a cost disadvantage)
these activities tend to benefit from economies of scale and regional industrial ecosystems, so it makes sense for an industry to have fewer and larger suppliers
much of the value was concentrated in design and customer relationship, activities the Western companies would retain
What the companies failed to take into account was the speed at which Asian companies like WuXi, TSMC, Samsung, LG, CATL, Trina, Tongwei, and many others would consolidate (usually with government support), ultimately “graduating” into dominant positions with real market leverage and with the profitability to invest into the higher value activities that were previously the sole domain of Western industry.
Now, scrambling to reposition themselves closer to the forefront in some of these critical industries, these governments have tried to kickstart domestic efforts, only to face the economic realities that led to the outsourcing to begin with.
Northvolt, a major European effort to produce advanced batteries in Europe, is one example of this. Despite raising tremendous private capital and securing European government support, the company filed for bankruptcy a few days ago.
While much hand-wringing is happening in climate-tech circles, I take a different view: this should really not come as a surprise. Battery manufacturing (like semiconductor, solar, pharmaceutical, etc) requires huge amounts of capital and painstaking trial-and-error to perfect operations, just to produce products that are steadily dropping in price over the long-term. It’s fundamentally a difficult and not-very-rewarding endeavor. And it’s for that reason that the West “gave up” on these years ago.
But if US and European industrial policy is to be taken seriously here, the respective governments need to internalize that reality and commit for the long haul. The idea that what these Asian companies do is “easily replicated” is simply not true, and the question is not if but when the next recipient of government support will fall into dire straits.
From the start, Northvolt set out to build something unprecedented. It didn’t just promise to build batteries in Europe, but to create an entire battery ecosystem, from scratch, in a matter of years. It would build the region’s biggest battery factories, develop and source its own materials, and recycle its own batteries. And, with some help from government subsidies, it would do so while matching prices from Asian manufacturers that had dominated global markets.
Northvolt’s ambitious attempt to compress decades of industry development into just eight years culminated last week, with its filing for Chapter 11 bankruptcy protection and the departure of several top executives, including CEO Peter Carlsson. The company’s downfall is a setback for Europe’s battery ambitions — as well as a signal of how challenging it is for the West to challenge Chinese dominance.
Thankfully, Keras 3 lived up to its multi-backend promise and made switching to JAX remarkably easy. For my code, I only had to make three sets of tweaks.
First, I had to change the definition of my container images. Instead of starting from Tensorflow’s official Docker images, I installed JAX and Keras on Modal’s default Debian image and set the appropriate environment variables to configure Keras to use JAX as a backend:
import modal

jax_image = (
    modal.Image.debian_slim(python_version='3.11')
    .pip_install('jax[cuda12]==0.4.35', extra_options="-U")
    .pip_install('keras==3.6')
    .pip_install('keras-hub==0.17')
    .env({"KERAS_BACKEND": "jax"})  # sets Keras backend to JAX
    .env({"XLA_PYTHON_CLIENT_MEM_FRACTION": "1.0"})  # lets JAX use the full GPU memory allotment
)
Second, because tf.data pipelines convert everything to Tensorflow tensors, I had to switch my preprocessing pipelines from using Keras’s ops library (which, with JAX as the backend, now expected JAX tensors) to Tensorflow-native operations:
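(The snippet below is an illustrative sketch of the kind of change involved, not my actual pipeline; the function and fields are stand-ins. The point is that anything inside a tf.data .map() has to stay in Tensorflow-native ops once Keras itself is running on JAX.)

import tensorflow as tf

def preprocess(text, label):
    # Inside tf.data, stick to tf.* operations: keras.ops would now expect JAX arrays.
    text = tf.strings.lower(text)       # was previously a keras.ops-based step
    label = tf.cast(label, tf.float32)  # tf.cast instead of keras.ops.cast
    return text, label

dataset = (
    tf.data.Dataset.from_tensor_slices((["An Example Headline"], [1]))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
)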
Lastly, I had a few lines of code which assumed Tensorflow tensors (where getting the underlying value required a .numpy() call). As I was now using JAX as a backend, I had to remove the .numpy() calls for the code to work.
Everything else — the rest of the tf.data preprocessing pipeline, the code to train the model, the code to serve it, the previously saved model weights and the code to save & load them — remained the same! Considering that the training time per epoch and the time the model took to evaluate (a measure of inference time) both seemed to improve by 20-40%, this simple switch to JAX seemed well worth it!
Model Architecture Improvements
There were two major improvements I made in the model architecture over the past few months.
First, having run my news reader for the better part of a year now, I have accumulated enough data that my strategy of simultaneously training on two related tasks (predicting the human rating and predicting the length of an article) no longer requires separate inputs. This reduced the memory requirement as well as simplified the data pipeline for training (see architecture diagram below).
Secondly, I was successfully able to train a version of my algorithm which can use dot products natively. This not only allowed me to remove several layers from my previous model architecture (see architecture diagram below), but because the Supabase postgres database I’m using supports pgvector, it means I can even compute ratings for articles through a SQL query:
UPDATE articleuser
SET ai_rating = 0.5 + 0.5 * (1 - (a.embedding <=> u.embedding)),
rating_timestamp = NOW(),
updated_at = NOW()
FROM articles a,
users u
WHERE articleuser.article_id = a.id
AND articleuser.user_id = u.id
AND articleuser.ai_rating IS NULL;
The result is much greater simplicity in architecture as well as greater operational flexibility as I can now update ratings from the database directly as well as from serving a deep neural network from my serverless backend.
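As a quick illustration of that flexibility, the same rating can be computed application-side; here is a minimal NumPy sketch that mirrors the SQL above (embeddings assumed to be plain float arrays):

import numpy as np

def ai_rating(article_embedding, user_embedding):
    # pgvector's <=> operator is cosine distance (1 - cosine similarity), so the SQL's
    # 0.5 + 0.5 * (1 - distance) maps similarity from [-1, 1] onto a 0-1 rating.
    a, u = np.asarray(article_embedding), np.asarray(user_embedding)
    cosine_similarity = a @ u / (np.linalg.norm(a) * np.linalg.norm(u))
    return 0.5 + 0.5 * cosine_similarity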
Model architecture (output from Keras plot_model function)
Making Sources a First-Class Citizen
As I used the news reader, I realized early on that the ability to see sorted content from just one source (i.e. a particular blog or news site) would be valuable. To add this, I created and populated a new sources table within the database to track these independently (see database design diagram below), linked to the articles table.
Newsreader database design diagram (produced by a Supabase tool)
I then modified my scrapers to insert the identifier for each source alongside each new article, as well as made sure my fetch calls all JOIN‘d and pulled the relevant source information.
With the data infrastructure in place, I added the ability to add a source parameter to the core fetch URLs to enable single (or multiple) source feeds. I then added a quick element at the top of the feed interface (see below) to let a user know when the feed they’re seeing is limited to a given source. I also made all the source links in the feed clickable so that they could take the user to the corresponding single source feed.
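For a sense of what that looks like in practice, here is a hedged sketch of a source-filtered fetch using the Supabase Python client. The table and column names are illustrative stand-ins for the schema in the diagram above, not the exact implementation:

from supabase import create_client

supabase = create_client("https://your-project.supabase.co", "your-api-key")  # placeholder credentials

def fetch_source_feed(source_id, limit=30):
    # The normal feed query with one extra filter; the embedded select pulls the
    # source's name and link so the feed header can show which source is active.
    return (
        supabase.table("articles")
        .select("title, url, score, sources(name, url)")
        .eq("source_id", source_id)
        .order("score", desc=True)
        .limit(limit)
        .execute()
        .data
    )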
One recurring issue I noticed in my use of the news reader was slow load times. While some of this can be attributed to the “cold start” issue that serverless applications face, much of it was due to how the news reader fetched pertinent articles from the database: it decided at the moment of the fetch request what was most relevant to send by calculating all the pertinent scores and rank-ordering them. As the article database got larger, this computation became more expensive.
To address this, I decided to move to a “pre-calculated” ranking system. That way, the system would know what to fetch in advance of a fetch request (and hence return much faster). Couple that with a database index (which effectively “pre-sorts” the results to make retrieval even faster), and I saw visually noticeable improvements in load times.
But with any pre-calculated score scheme, the most important question is how and when re-calculation should happen. Too often and too broadly and you incur unnecessary computing costs. Too infrequently and you risk the scores becoming stale.
The compromise I reached derives from the three ways articles are ranked in my system (a rough sketch of the combined score follows the list):
The AI’s rating of an article plays the most important role (60%)
How recently the article was published is tied with… (20%)
How similar an article is to the 10 articles a user most recently read (20%)
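Here is the rough sketch of how those weights combine into a single score; the recency decay curve and the names are my own illustration, not the exact implementation:

def article_score(ai_rating, days_since_published, similarity_to_recent_reads,
                  recency_half_life_days=3.0):
    # ai_rating and similarity_to_recent_reads are assumed to be normalized to 0-1.
    recency = 0.5 ** (days_since_published / recency_half_life_days)  # assumed decay curve
    return (0.6 * ai_rating
            + 0.2 * recency
            + 0.2 * similarity_to_recent_reads)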
These factors lent themselves to very different natural update cadences:
Newly scraped articles would have their AI ratings and calculated score computed at the time they enter the database
AI ratings for the most recent and the previously highest scoring articles would be re-computed after model training updates
On a daily basis, each article’s score was recomputed (focusing on the change in article recency)
The article similarity for unread articles is re-evaluated after a user reads 10 articles
This required modifying the reader’s existing scraper and post-training processes to update the appropriate scores after scraping runs and model updates. It also meant tracking article reads on the users table (and modifying the /read endpoint to update these scores at the right intervals). Finally, it also meant adding a recurring cleanUp function set to run every 24 hours to perform this update as well as others.
Next Steps
With some of these performance and architecture improvements in place, my priorities are now focused on finding ways to systematically improve the underlying algorithms as well as increase the platform’s usability as a true news tool. To that end some of the top priorities for next steps in my mind include:
Testing new backbone models — The core ranking algorithm relies on RoBERTa, a model released five years ago, before large language models were common parlance. Keras Hub makes it incredibly easy to incorporate newer models like Meta’s Llama 2 & 3, OpenAI’s GPT-2, Microsoft’s Phi-3, and Google’s Gemma and fine-tune them.
Solving the “all good articles” problem — Because the point of the news reader is to surface content it considers good, users will not readily see lower quality content, nor will they see content the algorithm struggles to rank (i.e. new content very different from what the user has seen before). This makes it difficult to get the full range of data needed to help preserve the algorithm’s usefulness.
Creating topic and author feeds — Given that many people think in terms of topics and authors of interest, expanding what I’ve already done with sources to topic and author feeds sounds like a high-value next step.
I also endeavor to make more regular updates to the public Github repository (instead of aggregating many updates into a couple of large ones, as I had been doing). This will make the updates more manageable and hopefully help anyone out there who’s interested in building a similar product.
The pursuit of carbon-free energy has largely leaned on intermittent sources of energy — like wind and solar; and on sources that require a great deal of upfront investment — like hydroelectric (which requires elevated bodies of water and dams) and nuclear (which requires building a reactor).
The theoretical beauty of geothermal power is that, if you dig deep enough, virtually everywhere on planet earth is hot enough to melt rock (thanks to the nuclear reactions that heat up the inside of the earth). But, until recently, geothermal has been limited to regions of Earth where well-formed geologic formations can deliver predictable steam without excessive engineering.
But, ironically, it is the fracking boom, which has helped the oil & gas industry access new sources of carbon-producing energy, that may help us tap geothermal power in more places. Fracking and oil & gas exploration have driven a revolution in our ability to drill precisely deep underground and push & pull fluids, and that same capability lets us tap more geothermal power than ever before. This has led to the rise of enhanced geothermal: injecting water deep underground, where hot rock heats it, and using the resulting steam to generate electricity. Studies suggest the resource is particularly rich and accessible in the Southwest of the United States (see map below) and could be an extra tool in our portfolio to green energy consumption.
While there is a great deal of uncertainty around how much this will cost and just what it will take (not to mention the seismic risks that have plagued some fracking efforts), the hunger for more data center capacity and the desire to power this with clean electricity has helped startups like Fervo Energy and Sage Geosystems fund projects to explore.
On 17 October, Fervo Energy, a start-up based in Houston, Texas, got a major boost as the US government gave the green light to the expansion of a geothermal plant Fervo is building in Beaver County, Utah. The project could eventually generate as much as 2,000 megawatts — a capacity comparable with that of two large nuclear reactors. Although getting to that point could take a while, the plant already has 400 MW of capacity in the pipeline, and will be ready to provide around-the-clock power to Google’s energy-hungry data centres, and other customers, by 2028. In August, another start-up, Sage Geosystems, announced a partnership with Facebook’s parent company Meta to deliver up to 150 MW of geothermal power to Meta’s data centres by 2027.
A recent preprint from Stanford has demonstrated something remarkable: AI agents working together as a team solving a complex scientific challenge.
While much of the AI discourse focuses on how individual large language models (LLMs) compare to humans, much of human work today is a team effort, and the right question is less “can this LLM do better than a single human on a task” and more “what is the best team-up of AI and human to achieve a goal?” What is fascinating about this paper is that it looks at it from the perspective of “what can a team of AI agents achieve?”
The researchers tackled an ambitious goal: designing improved COVID-binding proteins for potential diagnostic or therapeutic use. Rather than relying on a single AI model to handle everything, the researchers tasked an AI “Principal Investigator” with assembling a virtual research team of AI agents! After some internal deliberation, the AI Principal Investigator selected an AI immunologist, an AI machine learning specialist, and an AI computational biologist. The researchers made sure to add an additional role, one of a “scientific critic” to help ground and challenge the virtual lab team’s thinking.
The team composition and phases of work planned and carried out by the AI principal investigator (Source: Figure 2 from Swanson et al.)
What makes this approach fascinating is how it mirrors high functioning human organizational structures. The AI team conducted meetings with defined agendas and speaking orders, with a “devil’s advocate” to ensure the ideas were grounded and rigorous.
Example of a virtual lab meeting between the AI agents; note the roles of the Principal Investigator (to set agenda) and Scientific Critic (to challenge the team to ground their work) (Source: Figure 6 from Swanson et al.)
One tactic the researchers said helped boost creativity, and that is harder to replicate with humans, was running parallel discussions, in which the AI agents had the same conversation over and over again. In these discussions, the human researchers set the “temperature” of the LLM higher (inviting more variation in output). The AI principal investigator then took the output of all of these conversations and synthesized them into a final answer (this time with the LLM temperature set lower, to reduce the variability and “imaginativeness” of the answer).
The use of parallel meetings to get “creativity” and a diverse set of options (Source: Supplemental Figure 1 from Swanson et al.)
The results? The AI team successfully designed nanobodies (small antibody-like proteins — this was a choice the team made to pursue nanobodies over more traditional antibodies) that showed improved binding to recent SARS-CoV-2 variants compared to existing versions. While humans provided some guidance, particularly around defining coding tasks, the AI agents handled the bulk of the scientific discussion and iteration.
Experimental validation of some of the designed nanobodies; the relevant comparison is the filled-in circles vs the open circles. The higher ELISA assay intensity for the filled-in circles shows that the designed nanobodies bind better than their un-mutated original counterparts (Source: Figure 5C from Swanson et al.)
This work hints at a future where AI teams become powerful tools for human researchers and organizations. Instead of asking “Will AI replace humans?”, we should be asking “How can humans best orchestrate teams of specialized AI agents to solve complex problems?”
The implications extend far beyond scientific research. As businesses grapple with implementing AI, this study suggests that success might lie not in deploying a single, all-powerful AI system, but in thoughtfully combining specialized AI agents with human oversight. It’s a reminder that in both human and artificial intelligence, teamwork often trumps individual brilliance.
I personally am also interested in how different team compositions and working practices might lead to better or worse outcomes — for both AI teams and human teams. Should we have one scientific critic, or should there be specialist critics for each task? How important was the speaking order? What if the group came up with its own agendas? What if there were two principal investigators with different strengths?
The next frontier in AI might not be building bigger models, but building better teams.
I’ve been using OpenMediaVault 6 on a cheap mini-PC as a home server for over a year. Earlier this year, OpenMediaVault 7 was announced which upgrades the underlying Linux to Debian 12 Bookworm and made a number of other security, compatibility, and user interface improvements.
Not wanting to start over fresh, I decided to take advantage of OpenMediaVault’s built-in command-line tool to handle the upgrade. If, like me, you are looking for a quick and clean way of upgrading from OpenMediaVault 6 to OpenMediaVault 7, look no further:
SSH into your system / connect directly with a keyboard and monitor. While I would normally recommend WeTTY (accessible via [Services > WeTTY] from the web console interface) for any command-line activity on your server, WeTTY relies on the server to be running, and updating the operating system necessitates shutting the server down, so you’ll need to either plug in a keyboard and monitor or use SSH.
Run sudo omv-upgrade in the command-line. This will start a long process of downloading and installing necessary files to complete the operating system update. From time to time you’ll be asked to accept / approve a set of changes via keyboard. If you’re on the online administrative panel, you’ll be booted off as the server shuts down.
Restart the server. Once everything is complete, you’ll need to restart the server to make sure everything “takes”. This can be done by running reboot in the command line or by manually turning the server off and on.
Assuming everything went smoothly, after the server completes its reboot (which will take a little bit of extra time after an operating system upgrade), upon logging into the administrative console as you had done before, you’ll be greeted by the new OMV 7 login screen. Congratulations!