Medical Guidelines are incredibly important — they impact everything from your doctor’s recommendations to the treatments and medications your insurance covers — but they are somewhat shrouded in mystery.
This piece from Emily Oster’s ParentData is a good overview of what they are (and aren’t) — and gives a pretty good explanation of why a headline from the popular press is probably not capturing the nuance and review of clinical evidence that goes into them.
(and yes, that title is a Schoolhouse Rock reference)
The headlines almost always simplify or distort these complexities. Sometimes those simplifications are necessary; other times…well, less so. Why should you care? Guidelines often impact health insurance coverage, influence standards of care, and play a role in how health policy is developed. Understanding what goes into them empowers you to engage more thoughtfully when talking to your doctor. And you’ll be better able to judge when a sensationalized headline actually means something, which is hopefully rarely.
I spotted this memo from Oaktree Capital founder Howard Marks and thought it was a sobering, grounded take on what makes a stock market bubble, why the current concentration of market capitalization in the so-called “Magnificent Seven” is cause for alarm, and how eerily similar this moment looks to the “Nifty Fifty” and “Dot Com Bubble” eras of irrational exuberance. Whether you agree with him or not, it’s a worthwhile piece of wisdom to remember.
This graph that Marks borrowed from JP Morgan is also quite intriguing (terrifying?)
There’s usually a grain of truth that underlies every mania and bubble. It just gets taken too far. It’s clear that the internet absolutely did change the world – in fact, we can’t imagine a world without it. But the vast majority of internet and e-commerce companies that soared in the late ’90s bubble ended up worthless. When a bubble burst in my early investing days, The Wall Street Journal would run a box on the front page listing stocks that were down by 90%. In the aftermath of the TMT Bubble, they’d lost 99%.
When something is on the pedestal of popularity, the risk of a decline is high. When people assume – and price in – an expectation that things can only get better, the damage done by negative surprises is profound. When something is new, the competitors and disruptive technologies have yet to arrive. The merit may be there, but if it’s overestimated it can be overpriced, only to evaporate when reality sets in. In the real world, trees don’t grow to the sky.
As a Span customer, I’ve always appreciated their vision: to make home electrification cleaner, simpler, and more efficient through beautifully designed, tech-enabled electrical panels. But, let’s be honest, selling a product like this directly to consumers is tough. Electrical panels are not top-of-mind for most people until there’s a problem — and explaining the value proposition of “a smarter electrical panel” to justify the high price tag can be a real challenge. That’s why I’m unsurprised by their recent shift in strategy towards utilities.
This pivot to partnering with utility companies makes a lot of sense. Instead of trying to convince individual homeowners to upgrade, Span can now work directly with those who can impact community-scale electrification.
While the value proposition of avoiding costly service upgrades is undeniably beneficial for utilities, understanding precisely how that translates into financial savings for the utilities needs much more nuance. That, along with the fact that rebates & policy will vary wildly by locality, raises many uncertainties about pricing strategy (not to mention that there are other, larger smart electric panel companies like Leviton and Schneider Electric, albeit with less functional and less well-designed offerings).
I wish the company well. We need better electrical infrastructure in the US (and especially California, where I live) and one way to achieve that is for companies like Span to find a successful path to market.
Span’s panel costs $3,500, and accessories involve separate $700-plus purchases. It’s unavoidably pricey, though tax rebates and other incentives can bring the cost down; still, the premise is that it offers buyers cost savings by avoiding expensive upgrades. The pitch to utility companies is also one of cost avoidance, just at a much larger scale. Span’s target utility customer is one at the intersection of load growth, the energy transition, and existing regulatory restrictions, especially in places with aggressive decarbonization timelines like California.
One of the most exciting technological developments from the semiconductor side of things is the rapid development of the ecosystem around the open-source RISC-V instruction set architecture (ISA). One landmark in its rise is that the architecture appears to be moving beyond just behind-the-scenes projects to challenging Intel/AMD’s x86 architecture and ARM (used by Apple and Qualcomm) in customer-facing applications.
This article highlights this crucial development by reporting on early adopters bringing RISC-V into higher-end devices like laptops. Companies like Framework and DeepComputing have just launched or are planning to launch RISC-V laptops. RISC-V-powered hardware still has a steep mountain of software and performance challenges to climb (as evidenced by how long it has taken the ARM ecosystem to become credible in PCs). But Intel’s recent setbacks, ARM’s legal battles with Qualcomm over licensing (which all but guarantee that every company using ARM will now work on RISC-V), and RISC-V’s open-source nature (which could allow a lot more innovation in form factors and functionality) may have created an opening for enterprising companies willing to make the investment.
“If we look at a couple of generations down the [software] stack, we’re starting to see a line of sight to consumer-ready RISC-V in something like a laptop, or even a phone,” said Nirav Patel, CEO of laptop maker Framework. Patel’s company plans to release a laptop that can support a RISC-V mainboard in 2025. Though still intended for early adopters and developers, it will be the most accessible and polished RISC-V laptop yet, and it will ship to users with the same look and feel as the Framework laptops that use x86 chips.
While growing vehicle electrification is inevitable, it always surprised me that US automakers would skip past plug-in hybrid (PHEV) technology to embrace only all-electric vehicles. Many have attacked Toyota’s more deliberate “slow-and-steady” approach to vehicle electrification, but it always seemed to me that, until we had broadly available, high-quality charging infrastructure and all-electric vehicles at the price point of a non-luxury family car (e.g. a Camry or RAV4), electric vehicles were going to be more of an upper-middle-class/wealthy phenomenon. Considering PHEVs’ success in the Chinese automotive market (where they’re growing faster than all-electric vehicles!), it always felt odd that the category wouldn’t make its way into the US market as the natural next step in vehicle electrification.
It sounds like Ram (a division of Stellantis) agrees. It intends to delay the all-electric version of its Ram 1500 in favor of starting with its extended-range plug-in hybrid version, the Ramcharger. Extended-range electric vehicles (EREVs) are plug-in hybrids similar to the Chevy Volt: they employ an electric powertrain plus a gasoline-powered generator that supplies additional range when the battery runs low.
While it still remains to be seen how well these EREVs/PHEVs are adopted — the price points that are being discussed still feel too high to me — seeing broader adoption of plug-in hybrid technology (supplemented with gas-powered range extension) feels like the natural next step on our path to vehicle electrification.
Consumers are still looking for electrified rides, just not the ones that many industry pundits predicted. In China, Europe, and the United States, buyers are converging on hybrids, whose sales growth is outpacing that of pure EVs. “It’s almost been a religion that it’s EVs or bust, so let’s not fool around with hybrids or hydrogen,” says Michael Dunne, CEO of Dunne Insights, a leading analyst of China’s auto industry. “But even in the world’s largest market, almost half of electrified vehicle sales are hybrids.”
Inspired by some work from a group at Stanford on building a lab from AI agents, I’ve been experimenting with multi-agent AI conversations and workflows. But because the space (at least to me) has seemed more focused on building more capable individual agents than on coordinating and working with many of them, the existing tools and libraries have made it difficult to carry out experiments.
To facilitate some of my own exploration work, I built what I’m calling a Multi-Agent ChatLab — a browser-based, completely portable setup to define multiple AI agents and facilitate conversations between them. This has made my experimentation work vastly simpler and I hope it can help someone else.
More about how to use this & the underlying design on this page.
And, to show off the tool, and for your amusement (and given my love of military history), here is a screengrab from the tool where I set up two AI Agents — one believing itself to be Napoleon Bonaparte and one believing itself to be the Duke of Wellington (the British commander who defeated Napoleon at Waterloo) — and had them describe (and compare!) the hallmarks of their military strategy.
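For a sense of what the tool automates, here is a minimal sketch of the kind of two-agent loop it manages, using the OpenAI Python client (the personas, model name, and turn count are placeholders; the actual ChatLab runs in the browser and handles all of this for you):

from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# Each agent is just a persona (system prompt) plus its own view of the conversation
personas = {
    "Napoleon": "You are Napoleon Bonaparte. Discuss the hallmarks of your military strategy.",
    "Wellington": "You are the Duke of Wellington. Discuss the hallmarks of your military strategy.",
}
histories = {name: [{"role": "system", "content": p}] for name, p in personas.items()}

message = "Compare the hallmarks of your military strategy with your counterpart's."
for turn in range(4):  # alternate speakers for a few turns
    speaker = "Napoleon" if turn % 2 == 0 else "Wellington"
    histories[speaker].append({"role": "user", "content": message})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=histories[speaker],
    ).choices[0].message.content
    histories[speaker].append({"role": "assistant", "content": reply})
    print(f"{speaker}: {reply}\n")
    message = reply  # each reply becomes the prompt for the other agent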
While much of the effort to green shipping has focused on alternative fuels like hydrogen, ammonia, and methanol as replacements for bunker fuel, I recently saw an article on the use of automated and highly durable sail technology to let ships leverage wind to reduce fuel consumption.
I don’t have any inside information on what the cost / speed tradeoffs are for the technology, nor whether or not there’s a credible path to scaling to handle the massive container ships that dominate global shipping, but it’s a fascinating technology vector, and a direct result of the growing realization by the shipping industry that it needs to green itself.
Wind, on the other hand, is abundant. With the U.N.’s International Maritime Organization poised to negotiate stricter climate policies next year, including a new carbon pricing mechanism and global fuel standard, more shipping companies are seriously considering the renewable resource as an immediate answer. While sails aren’t likely to outright replace the enormous engines that drive huge cargo ships, wind power could still make a meaningful dent in the industry’s overall emissions, experts say.
One of the most exciting areas of technology development, though one that doesn’t get a ton of mainstream media coverage, is the race to build a working quantum computer that exhibits “below threshold” quantum computing — the regime where quantum error correction works well enough that adding more qubits makes calculations more accurate rather than less.
One of the key limitations to achieving this has been the sensitivity of quantum computing systems — in particular the qubits that capture the superposition of multiple states that allow quantum computers to exploit quantum mechanics for computation — to the world around them. Imagine if your computer’s accuracy would change every time someone walked in the room — even if it was capable of amazing things, it would not be especially practical. As a result, much research to date has been around novel ways of creating physical systems that can protect these quantum states.
Google has (in a paper in Nature) unveiled their new Willow quantum computing chip, which demonstrates a quantum error correction method that spreads the quantum state information of a single “logical” qubit across multiple entangled “physical” qubits to create a more robust system. Beyond proving that their quantum error correction method worked, what is most remarkable to me is that they’re able to extrapolate a scaling law for their error correction — a way of estimating how much better their system gets at avoiding loss of quantum state as they increase the number of physical qubits per logical qubit — which could suggest a “scale up” path towards building functional, practical quantum computers.
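To make the idea of a scaling law concrete, here is a toy extrapolation of my own (illustrative numbers only, not Google’s data or fitting procedure): if each step up in code distance, i.e. more physical qubits per logical qubit, suppresses the logical error rate by a roughly constant factor, you can estimate how large a code you would need to reach a target error rate.

# Toy extrapolation of quantum error correction scaling (illustrative numbers only)
base_distance = 3         # smallest surface-code distance considered
base_error = 3e-3         # assumed logical error rate per cycle at that distance
suppression_factor = 2.0  # assumed suppression per distance step (d -> d + 2)

def logical_error(distance):
    """Logical error rate assuming a constant suppression factor per distance step."""
    steps = (distance - base_distance) // 2
    return base_error / (suppression_factor ** steps)

target = 1e-6  # target logical error rate per cycle
d = base_distance
while logical_error(d) > target:
    d += 2
physical_qubits = 2 * d * d - 1  # rough surface-code qubit count at distance d
print(f"Distance {d} (~{physical_qubits} physical qubits per logical qubit) "
      f"gets the error rate down to {logical_error(d):.1e}")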
I will confess that quantum mechanics was never my strong suit (beyond needing it for a class on statistical mechanics eons ago in college), and my understanding of the core physics underlying what they’ve done in the paper is limited, but this is an incredibly exciting feat on our way towards practical quantum computing systems!
The company’s new chip, called Willow, is a larger, improved version of that technology, with 105 physical qubits. It was developed in a fabrication laboratory that Google built at its quantum-computing campus in Santa Barbara, California, in 2021.
As a first demonstration of Willow’s power, the researchers showed that it could perform, in roughly 5 minutes, a task that would take the world’s largest supercomputer an estimated 10²⁵ years, says Hartmut Neven, who heads Google’s quantum-computing division. This is the latest salvo in the race to show that quantum computers have an advantage over classical ones.
And, by creating logical qubits inside Willow, the Google team has shown that each successive increase in the size of a logical qubit cuts the error rate in half.
“This is a very impressive demonstration of solidly being below threshold,” says Barbara Terhal, a specialist in quantum error correction at the Delft University of Technology in the Netherlands. Mikhail Lukin, a physicist at Harvard University in Cambridge, Massachusetts, adds, “It clearly shows that the idea works.”
I had never heard of this framework for thinking about how to address problems before. Shout-out to my friend Chris Yiu and his new Substack on improving productivity, Secret Weapon, for teaching me about it. It’s surprisingly insightful about when to treat something as a process problem, an expertise problem, an experimentation problem, or a direction problem.
[The Cynefin framework] organises problems into four primary domains:
Clear. Cause and effect are obvious; categorise the situation and apply best practice. Example: baking a cake. Rewards process.
Complicated. Cause and effect are knowable but not immediately apparent; analyse the situation carefully to find a solution. Example: coaching a sports team. Rewards expertise.
Complex. Cause and effect are only apparent with hindsight; focus on spotting and exploiting patterns as they emerge. Example: playing poker. Rewards [experimentation — Chris’s original term was entrepreneurship, I think experimentation is more clear & actionable].
Chaotic. Cause and effect are impossible to parse; act on instinct, in the hope of imposing some order on the immediate situation. Example: novel crisis response. Rewards direction.
In entertainment, the best return on investment, measured in hours of deep engagement per dollar, comes from games. When done right, they blend stunning visuals and sounds, earworm-like musical scores, compelling story and acting, and a sense of progression in a way that is second to none.
Case in point: I bought the complete edition of the award-winning The Witcher 3: Wild Hunt for $10 during a Steam sale in 2021. According to Steam, I’ve logged over 200 hours (I had to doublecheck that number!) playing the game, between two playthroughs and the amazing expansions Hearts of Stone and Blood and Wine — an amazing 20 hours/dollar spent. Even paying full freight (as of this writing, the complete edition including both expansions costs $50), that would still be a remarkable 4 hours/dollar. Compare that with the price of admission to a movie or theater or concert.
The Witcher 3 has now surpassed 50 million sales — comfortably earning over $1 billion in revenue, which is an amazing feat for any media property.
But as amazing and as lucrative as these games can be, they cannot escape the cruel hit-driven nature of their industry, where a small number of games generate the majority of financial returns. This has resulted in studios chasing ever more expensive games built on familiar intellectual property (e.g. Star Wars), which has, to many players, cut the soul out of the games and has led to financial instability at even popular game studios.
This article from IGN summarizes the state of the industry well — with so-called AAA games now costing $200 million to create, not to mention hundreds of millions more to market, more and more studios are having to wind down as few games can generate enough revenue to cover the cost of development and marketing.
The article predicts — and I hope it’s right — that the games industry will learn some lessons that many studios in Hollywood/the film industry have been forced to: embrace more small budget games to experiment with new forms and IP. Blockbusters will have their place but going all-in on blockbusters is a recipe for a hollowing out of the industry and a cutting off of the creativity that it needs.
Or, as the author so nicely puts it: “Maybe studios can remember that we used to play video games because they were fun – not because of their bigger-than-last-year maps carpeted by denser, higher-resolution grass that you walk across to finish another piece of side content that pushes you one digit closer to 100% completion.”
Just five years ago, AAA projects’ average budget ranged $50 – $150 million. Today, the minimum average is $200 million. Call of Duty’s new benchmark is $300 million, with Activision admitting in the Competition & Market Authority’s report on AAA development that it now takes the efforts of one-and-a-half studios just to complete the annual Call of Duty title.
It’s far from just Call of Duty facing ballooning costs. In the same CMA report, an anonymous publisher admits that development costs for one of its franchises reached $660 million. With $550 million of marketing costs on top, that is a $1.2 billion game. To put that into perspective, Minecraft – the world’s best-selling video game of all time – has of last year only achieved $3 billion. It took 12 years to reach that figure, having launched in 2011.
Everywhere you look, the message seems clear: early detection (of cancer & disease) saves lives. Yet behind the headlines, companies developing these screening tools face a different reality. Many tests struggle to gain approval, adoption, or even financial viability. The problem isn’t that the science is bad — it’s that the math is brutal.
This piece unpacks the economic and clinical trade-offs at the heart of the early testing / disease screening business. Why do promising technologies struggle to meet cost-effectiveness thresholds, despite clear scientific advances? And what lessons can diagnostic innovators take from these challenges to improve their odds of success? By the end, you’ll have a clearer view of the challenges and opportunities in bringing new diagnostic tools to market — and why focusing on the right metrics can make all the difference.
The brutal math of diagnostics
Technologists often prioritize metrics like sensitivity (also called recall) — the ability of a diagnostic test to correctly identify individuals with a condition (i.e., if the sensitivity of a test is 90%, then 90% of patients with the disease will register as positives and the remaining 10% will be false negatives) — because it’s often the key scientific challenge and aligns nicely with the idea of getting more patients earlier treatment.
But when it comes to adoption and efficiency, specificity — the ability of a diagnostic test to correctly identify healthy individuals (i.e., if the specificity of a test is 90%, then 90% of healthy patients will register as negatives and the remaining 10% will be false positives) — is usually the more important and more overlooked criterion.
The reason specificity is so important is that it can have a profound impact on a test’s Positive Predictive Value (PPV) — whether or not a positive test result means a patient actually has a disease (i.e., if the positive predictive value of a test is 90%, then a patient that registers as positive has a 90% chance of having the disease and 10% chance of actually being healthy — being a false positive).
What is counter-intuitive, even to many medical and scientific experts, is that because (by definition) most patients are healthy, many high accuracy tests have disappointingly low PPV as most positive results are actually false positives.
Let me present an example (see table below for summary of the math) that will hopefully explain:
Let’s say we have an HIV test with 99% sensitivity and 99% specificity — a 99% (very) accurate test!
If we tested 10,000 Americans at random, you would expect roughly 36 of them (0.36% x 10,000) to be HIV positive. That means roughly 9,964 are HIV negative.
99% sensitivity means 99% of the 36 HIV positive patients will test positive (99% x 36 = ~36)
99% specificity means 99% of the 9,964 HIV negative patients will test negative (99% x 9,964 = ~9,864) while 1% (1% x 9,964 = ~100) would be false positives
This means that even though the test is 99% accurate, it only has a positive predictive value of ~26% (36 true positives out of 136 total positive results)
Below (if you’re on a browser) is an embedded calculator which will run this math for any values of disease prevalence and sensitivity / specificity (and here is a link to a Google Sheet that will do the same), but you’ll generally find that low disease rates result in low positive predictive values for even very accurate diagnostics.
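If the calculator doesn’t render for you, here is the same math as a short Python function, using the HIV example’s numbers as defaults:

def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV = true positives / all positives, for a population screened at random."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# The HIV example above: 0.36% prevalence, 99% sensitivity, 99% specificity
ppv = positive_predictive_value(prevalence=0.0036, sensitivity=0.99, specificity=0.99)
print(f"PPV: {ppv:.0%}")  # ~26%: most positive results are false positives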
Typically, introducing a new diagnostic means balancing true positives against the burden of false positives. After all, for patients, false positives will result in anxiety, invasive tests, and, sometimes, unnecessary treatments. For healthcare systems, they can be a significant economic burden as the cost of follow-up testing and overtreatment add up, complicating their willingness to embrace new tests.
Below (if you’re on a browser) is an embedded calculator which will run the basic diagnostic economics math for different values of the cost of testing and follow-up testing to calculate the cost of testing and follow-up testing per patient helped (and here is a link to a Google Sheet that will do the same)
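And here is the same economics math as a Python sketch; the dollar figures below are placeholders rather than estimates for any particular test:

def cost_per_patient_helped(population, prevalence, sensitivity, specificity,
                            test_cost, followup_cost):
    """Cost of screening everyone plus following up every positive, per true positive found."""
    true_positives = population * prevalence * sensitivity
    false_positives = population * (1 - prevalence) * (1 - specificity)
    total_cost = population * test_cost + (true_positives + false_positives) * followup_cost
    return total_cost / true_positives

# Placeholder example: 10,000 people screened, 0.36% prevalence, a 99%/99% test,
# a $50 screening test, and a $500 follow-up work-up for every positive result
print(f"${cost_per_patient_helped(10_000, 0.0036, 0.99, 0.99, 50, 500):,.0f} per true positive found")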
Finally, while diagnostics businesses face many of the same development hurdles as drug developers — the need to develop cutting-edge technology, to carry out large clinical studies to prove efficacy, and to manage a complex regulatory and reimbursement landscape — unlike drug developers, diagnostic businesses face significant pricing constraints. Successful treatments can command high prices for treating a disease. But successful diagnostic tests, no matter how sophisticated, cannot, because they ultimately don’t treat diseases, they merely identify them.
Case Study: Exact Sciences and Cologuard
Let’s take Cologuard (from Exact Sciences) as an example. Cologuard is a combination genomic and immunochemistry test for colon cancer carried out on patient stool samples. Its two primary alternatives are:
a much less sensitive fecal immunochemistry test (FIT) — which uses antibodies to detect blood in the stool as a potential, imprecise sign of colon cancer
colonoscopies — a procedure where a skilled physician uses an endoscope to enter and look for signs of cancer in a patient’s colon. It’s considered the “gold standard” as it functions both as diagnostic and treatment (a physician can remove or biopsy any lesion or polyp they find). But, because it’s invasive and uncomfortable for the patient, this test is typically only done every 4-10 years
Cologuard is (as of this writing) Exact Sciences’ primary product line, responsible for a large portion of the company’s $2.5 billion in 2023 revenue. It can detect earlier-stage colon cancer as well as pre-cancerous growths that could lead to cancer. Impressively, Exact Sciences also commands a gross margin greater than 70%, a margin typically seen only in pharmaceutical and software companies with low per-unit costs of production. This has resulted in Exact Sciences, as of this writing, having a market cap over $11 billion.
Yet for all its success, Exact Sciences is also a cautionary note, illustrating the difficulties of building a diagnostics company.
The company was founded in 1995, yet didn’t see meaningful revenue from selling diagnostics until 2014 (nearly 20 years later, after it received FDA approval for Cologuard)
The company has never had a profitable year, including the 10 years it has now been in-market: it lost over $200 million in 2023 and remained unprofitable through the first three quarters of 2024.
Between 1997 (the first year we have good data from their SEC filings as summarized in this Google Sheet) and 2014 when it first achieved meaningful diagnostic revenue, Exact Sciences lost a cumulative $420 million, driven by $230 million in R&D spending, $88 million in Sales & Marketing spending, and $33 million in CAPEX. It funded those losses by issuing over $624 million in stock (diluting investors and employees)
From 2015-2023, it has needed to raise an additional $3.5 billion in stock and convertible debt (net of paybacks) to cover its continued losses (over $3 billion from 2015-2023)
Prior to 2014, Exact Sciences attempted to commercialize colon cancer screening technologies through partnerships with LabCorp (ColoSure and PreGenPlus). These were not very successful and led to concerns from the FDA and insurance companies. This forced Exact Sciences to invest heavily in clinical studies to win over the payers and the FDA, including a pivotal ~10,000 patient study to support Cologuard which recruited patients from over 90 sites and took over 1.5 years.
It took Exact Sciences 3 years after FDA approval of Cologuard for its annual diagnostic revenues to exceed what it spends on sales & marketing. It continues to spend aggressively there ($727M in 2023).
While it’s difficult to know precisely what the company’s management / investors would do differently if they could do it all over again, the brutal math of diagnostics certainly played a key role.
From a clinical perspective, Cologuard faces the same low positive predictive value problem all diagnostic screening tests face. From the data in their study on ~10,000 patients, it’s clear that, despite having a much higher sensitivity for cancer (92.3% vs 73.8%) and higher AUROC (94% vs 89%) than the existing FIT test, the PPV of Cologuard is only 3.7% (lower than the FIT test: 6.9%).
Even using a broader disease definition that includes the pre-cancerous advanced lesions Exact Sciences touted as a strength, the gap on PPV does not narrow (Cologuard: 23.6% vs FIT: 32.6%)
The economic comparison with a FIT test fares even worse due to the higher cost of Cologuard as well as the higher rate of false positives. Under the Center for Medicare & Medicaid Service’s 2024Q4 laboratory fee schedule, a FIT test costs $16 (CPT code: 82274), but Cologuard costs $509 (CPT code: 81528), over 30x higher! If each positive Cologuard and FIT test results in a follow-up colonoscopy (which has a cost of $800-1000 according to this 2015 analysis), the screening cost per cancer patient is 5.2-7.1x higher for Cologuard than for the FIT test.
This quick math has been confirmed in several studies.
A study by a group at the University Medical Center of Rotterdam concluded that “Compared to nearly all other CRC screening strategies reimbursed by CMS (Medicare), [Cologuard] is less effective and considerably more costly, making it an inefficient screening option” and would only be comparable at a much lower cost (~$6-18!)
While Medicare and the US Preventive Services Task Force concluded that the cost of Cologuard and the increase in false positives / colonoscopy complications was worth the improved early detection of colon cancer, they stayed largely silent on comparing cost-efficacy with the FIT test. It’s this unfavorable comparison that has probably required Exact Sciences to invest so heavily in sales and marketing to drive sales. That Cologuard has been so successful is a testament both to the value of being the only FDA-approved test on the market as well as Exact Sciences’ efforts in making Cologuard so well-known (how many other diagnostics do you know have an SNL skit dedicated to them?).
Not content to rest on the laurels of Cologuard, Exact Sciences recently published a ~20,000 patient study on their next generation colon cancer screening test: Cologuard Plus. While the study suggests Exact Sciences has improved the test across the board, the company’s marketing around Cologuard Plus having both >90% sensitivity and specificity is misleading, because the figures for sensitivity and specificity are for different conditions: sensitivity for colorectal cancer but specificity for colorectal cancer OR advanced precancerous lesion (see the table below).
Disentangling these numbers shows that while Cologuard Plus has narrowed its PPV disadvantage (now worse by 1% on colorectal cancer and even on cancer or lesion) and its cost-efficacy disadvantage (now “only” 4.4-5.8x more expensive) vs the FIT test (see tables below), it still hasn’t closed the gap.
Time will tell if this improved test performance translates to continued sales performance for Exact Sciences, but it is telling that despite the significant time and resources that went into developing Cologuard Plus, the data suggests it’s still likely more cost effective for health systems to adopt FIT over Cologuard Plus as a means of preventing advanced colon cancer.
Lessons for diagnostics companies
The underlying math of the diagnostics business and Exact Sciences’ long path to dramatic sales offer several key lessons for diagnostic entrepreneurs:
Focus on specificity — Diagnostic technologists pay too much attention to sensitivity and too little to specificity. Positive predictive value and the cost-benefit for a health system are largely going to swing on specificity.
Aim for higher value tests — Because the development and required validation for a diagnostic can be as high as that of a drug or medical device, it is important to pursue opportunities where the diagnostic can command a high price. These are usually markets where the alternatives are very expensive because they require new technology (e.g. advanced genetic tests) or a great deal of specialized labor (e.g. colonoscopy) or where the diagnostic directly decides on a costly course of treatment (e.g. a companion diagnostic for an oncology drug).
Go after unmet needs — If a test is able to fill a mostly unmet need — for example, if the alternatives are extremely inaccurate or poorly adopted — then adoption will be determined by awareness (because there aren’t credible alternatives) and pricing will be determined by sensitivity (because this drives the delivery of better care). This also simplifies the sales process.
Win beyond the test — Because performance can only ever approach 100%, each incremental point of sensitivity and specificity is exponentially harder to achieve and delivers less medical or financial value. As a result, it can be advantageous to focus on factors beyond the test, such as regulatory approval / guidelines adoption, patient convenience, time to result, and impact on follow-up tests and procedures. Cologuard gained a great deal from being “the first FDA-approved colon cancer screening test”. Non-invasive prenatal testing, despite low positive predictive values and limited disease coverage, gained adoption in part by helping to triage follow-up amniocentesis (a procedure with a low but still frightening rate of miscarriage, ~0.5%). Rapid antigen tests for COVID have similarly been adopted despite lower sensitivity and specificity than PCR tests, thanks to their speed, low cost, and ability to be carried out at home.
Diagnostics developers must carefully navigate the intersection of scientific innovation and financial reality, while grappling with the fact that even the most impressive technology may be insufficient to achieve market success without taking clinical and economic factors into account.
Ultimately, the path forward for diagnostic innovators lies in prioritizing specificity, targeting high-value and unmet needs, and crafting solutions that deliver value beyond the test itself. While Exact Sciences’ journey underscores just how difficult these challenges are, it also illustrates that with persistence, thoughtful investment, and strategic differentiation, it is possible to carve out a meaningful and impactful space in the market.
The rise of Asia as a force to be reckoned with in large scale manufacturing of critical components like batteries, solar panels, pharmaceuticals, chemicals, and semiconductors has left US and European governments seeking to catch up with a bit of a dilemma.
These activities largely moved to Asia because financially-motivated management teams in the West (correctly) recognized that:
they were low return in a conventional financial sense (require tremendous investment and maintenance)
most of these had a heavy labor component (and higher wages in the US/Europe meant US/European firms were at a cost disadvantage)
these activities tend to benefit from economies of scale and regional industrial ecosystems, so it makes sense for an industry to have fewer and larger suppliers
much of the value was concentrated in design and customer relationship, activities the Western companies would retain
What the companies failed to take into account was the speed at which Asian companies like WuXi, TSMC, Samsung, LG, CATL, Trina, Tongwei, and many others would consolidate (usually with government support), ultimately “graduating” into dominant positions with real market leverage and with the profitability to invest into the higher value activities that were previously the sole domain of Western industry.
Now, scrambling to reposition themselves closer to the forefront in some of these critical industries, these governments have tried to kickstart domestic efforts, only to face the economic realities that led to the outsourcing to begin with.
Northvolt, a major European effort to produce advanced batteries in Europe, is one example of this. Despite raising tremendous private capital and securing European government support, the company filed for bankruptcy a few days ago.
While much hand-wringing is happening in climate-tech circles, I take a different view: this should really not come as a surprise. Battery manufacturing (like semiconductor, solar, pharmaceutical, etc) requires huge amounts of capital and painstaking trial-and-error to perfect operations, just to produce products that are steadily dropping in price over the long-term. It’s fundamentally a difficult and not-very-rewarding endeavor. And it’s for that reason that the West “gave up” on these years ago.
But if US and European industrial policy is to be taken seriously here, the respective governments need to internalize that reality and be committed for the long haul. The idea that what these Asian companies are doing is “easily replicated” is simply not true, and the question is not if but when the next recipient of government support will fall into dire straits.
From the start, Northvolt set out to build something unprecedented. It didn’t just promise to build batteries in Europe, but to create an entire battery ecosystem, from scratch, in a matter of years. It would build the region’s biggest battery factories, develop and source its own materials, and recycle its own batteries. And, with some help from government subsidies, it would do so while matching prices from Asian manufacturers that had dominated global markets.
Northvolt’s ambitious attempt to compress decades of industry development into just eight years culminated last week, with its filing for Chapter 11 bankruptcy protection and the departure of several top executives, including CEO Peter Carlsson. The company’s downfall is a setback for Europe’s battery ambitions — as well as a signal of how challenging it is for the West to challenge Chinese dominance.
Thankfully, Keras 3 lived up to its multi-backend promise and made switching to JAX remarkably easy. For my code, I simply had to make three sets of tweaks.
First, I had to change the definition of my container images. Instead of starting from Tensorflow’s official Docker images, I instead installed JAX and Keras on Modal’s default Debian image and set the appropriate environmental variables to configure Keras to use JAX as a backend:
import modal  # the Modal client library, needed for modal.Image

jax_image = (
    modal.Image.debian_slim(python_version='3.11')
    .pip_install('jax[cuda12]==0.4.35', extra_options="-U")
    .pip_install('keras==3.6')
    .pip_install('keras-hub==0.17')
    .env({"KERAS_BACKEND": "jax"})  # sets Keras backend to JAX
    .env({"XLA_PYTHON_CLIENT_MEM_FRACTION": "1.0"})  # let JAX use the full GPU memory pool
)
Second, because tf.data pipelines convert everything to Tensorflow tensors, I had to switch my preprocessing pipelines from using Keras’s ops library (which, because I was using JAX as a backend, expected JAX tensors) to Tensorflow-native operations.
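The change was of this flavor (an illustrative sketch, not my actual news reader pipeline code):

import tensorflow as tf

def preprocess(features, label):
    # Before (Keras ops; with the JAX backend these produce JAX arrays,
    # which a tf.data pipeline can't handle):
    #   features = keras.ops.cast(features, "float32") / 255.0
    # After (TensorFlow-native ops keep everything as tf.Tensors):
    features = tf.cast(features, tf.float32) / 255.0
    return features, label

# dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)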
Lastly, I had a few lines of code which assumed Tensorflow tensors (where getting the underlying value required a .numpy() call). As I was now using JAX as a backend, I had to remove the .numpy() calls for the code to work.
Everything else — the rest of the tf.data preprocessing pipeline, the code to train the model, the code to serve it, the previously saved model weights and the code to save & load them — remained the same! Considering that the training time per epoch and the time the model took to evaluate (a measure of inference time) both seemed to improve by 20-40%, this simple switch to JAX seemed well worth it!
Model Architecture Improvements
There were two major improvements I made in the model architecture over the past few months.
First, having run my news reader for the better part of a year now, I have accumulated enough data that my strategy of simultaneously training on two related tasks (predicting the human rating and predicting the length of an article) no longer requires separate inputs. This reduced the memory requirement and simplified the data pipeline for training (see architecture diagram below).
Secondly, I was successfully able to train a version of my algorithm which can use dot products natively. This not only allowed me to remove several layers from my previous model architecture (see architecture diagram below), but because the Supabase postgres database I’m using supports pgvector, it means I can even compute ratings for articles through a SQL query:
UPDATE articleuser
SET ai_rating = 0.5 + 0.5 * (1 - (a.embedding <=> u.embedding)),
rating_timestamp = NOW(),
updated_at = NOW()
FROM articles a,
users u
WHERE articleuser.article_id = a.id
AND articleuser.user_id = u.id
AND articleuser.ai_rating IS NULL;
The result is much greater simplicity in architecture as well as greater operational flexibility as I can now update ratings from the database directly as well as from serving a deep neural network from my serverless backend.
Making Sources a First-Class Citizen
As I used the news reader, I realized early on that the ability to view sorted content from just one source (i.e. a particular blog or news site) would be valuable. To add this, I created and populated a new sources table within the database to track these independently (see database design diagram below), linked to the articles table.
I then modified my scrapers to insert the identifier for each source alongside each new article, as well as made sure my fetch calls all JOIN‘d and pulled the relevant source information.
With the data infrastructure in place, I added the ability to add a source parameter to the core fetch URLs to enable single (or multiple) source feeds. I then added a quick element at the top of the feed interface (see below) to let a user know when the feed they’re seeing is limited to a given source. I also made all the source links in the feed clickable so that they could take the user to the corresponding single source feed.
One recurring issue I noticed in my use of the news reader pertained to slow load times. While some of this can be attributed to the “cold start” issue that serverless applications face, much of this was due to how the news reader was fetching pertinent articles from the database. It was deciding at the moment of the fetch request what was most relevant to send over by calculating all the pertinent scores and rank ordering. As the article database got larger, this computation became more complicated.
To address this, I decided to move to a “pre-calculated” ranking system. That way, the system would know what to fetch in advance of a fetch request (and hence return much faster). Couple that with a database index (which effectively “pre-sorts” the results to make retrieval even faster), and I saw visually noticeable improvements in load times.
But with any pre-calculated score scheme, the most important question is how and when re-calculation should happen. Too often and too broadly and you incur unnecessary computing costs. Too infrequently and you risk the scores becoming stale.
The compromise I reached derived from the three ways articles are ranked in my system (a quick sketch of the combined score follows the list):
The AI’s rating of an article plays the most important role (60%)
How recently the article was published is tied with… (20%)
How similar an article is to the 10 articles a user most recently read (20%)
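In code, the combined score is conceptually just a weighted sum (my sketch of the weighting above; the real system computes and stores these scores in the database rather than on the fly):

def article_score(ai_rating, recency, similarity):
    """Weighted blend of the three ranking factors, each assumed normalized to 0..1."""
    return 0.6 * ai_rating + 0.2 * recency + 0.2 * similarity

# e.g. a well-rated but older article vs. a fresh, moderately-rated one
print(round(article_score(ai_rating=0.9, recency=0.2, similarity=0.7), 2))  # 0.72
print(round(article_score(ai_rating=0.6, recency=1.0, similarity=0.5), 2))  # 0.66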
These factors lent themselves to very different natural update cadences:
Newly scraped articles would have their AI ratings and calculated score computed at the time they enter the database
AI ratings for the most recent and the previously highest scoring articles would be re-computed after model training updates
On a daily basis, each article’s score was recomputed (focusing on the change in article recency)
The article similarity for unread articles is re-evaluated after a user reads 10 articles
This required modifying the reader’s existing scraper and post-training processes to update the appropriate scores after scraping runs and model updates. It also meant tracking article reads on the users table (and modifying the /read endpoint to update these scores at the right intervals). Finally, it also meant adding a recurring cleanUp function set to run every 24 hours to perform this update as well as others.
Next Steps
With some of these performance and architecture improvements in place, my priorities are now focused on finding ways to systematically improve the underlying algorithms as well as increase the platform’s usability as a true news tool. To that end some of the top priorities for next steps in my mind include:
Testing new backbone models — The core ranking algorithm relies on RoBERTa, a model released five years ago, before large language models were common parlance. Keras Hub makes it incredibly easy to incorporate newer models like Meta’s Llama 2 & 3, OpenAI’s GPT-2, Microsoft’s Phi-3, and Google’s Gemma and fine-tune them.
Solving the “all good articles” problem — Because the point of the news reader is to surface content it considers good, users will not readily see lower quality content, nor will they see content the algorithm struggles to rank (i.e. new content very different from what the user has seen before). This makes it difficult to get the full range of data needed to help preserve the algorithm’s usefulness.
Creating topic and author feeds — Given that many people think in terms of topics and authors of interest, expanding what I’ve already done with Sources but with topics and author feeds sounds like a high-value next step
I also endeavor to make more regular updates to the public Github repository (instead of aggregating many updates I had already made into two large ones). This will make the updates more manageable and hopefully help anyone out there who’s interested in building a similar product.
The pursuit of carbon-free energy has largely leaned on intermittent sources of energy, like wind and solar, and on sources that require a great deal of upfront investment, like hydroelectric (which requires elevated bodies of water and dams) and nuclear (which requires setting up a reactor).
The theoretical beauty of geothermal power is that, if you dig deep enough, virtually everywhere on planet earth is hot enough to melt rock (thanks to the nuclear reactions that heat up the inside of the earth). But, until recently, geothermal has been limited to regions of Earth where well-formed geologic formations can deliver predictable steam without excessive engineering.
But, ironically, it is the fracking boom, which has helped the oil & gas industries access new sources of carbon-producing energy, that may help us tap geothermal power in more places. Fracking and oil & gas exploration have driven a revolution in our ability to precisely drill deep underground and push & pull fluids, and that same capability lets us tap more geothermal power than ever before. This has led to the rise of enhanced geothermal: injecting water deep underground to be heated and using the steam produced to generate electricity. Studies suggest the resource is particularly rich and accessible in the Southwest of the United States (see map below) and could be an extra tool in our portfolio to green energy consumption.
While there is a great deal of uncertainty around how much this will cost and just what it will take (not to mention the seismic risks that have plagued some fracking efforts), the hunger for more data center capacity and the desire to power this with clean electricity has helped startups like Fervo Energy and Sage Geosystems fund projects to explore.
On 17 October, Fervo Energy, a start-up based in Houston, Texas, got a major boost as the US government gave the green light to the expansion of a geothermal plant Fervo is building in Beaver County, Utah. The project could eventually generate as much as 2,000 megawatts — a capacity comparable with that of two large nuclear reactors. Although getting to that point could take a while, the plant already has 400 MW of capacity in the pipeline, and will be ready to provide around-the-clock power to Google’s energy-hungry data centres, and other customers, by 2028. In August, another start-up, Sage Geosystems, announced a partnership with Facebook’s parent company Meta to deliver up to 150 MW of geothermal power to Meta’s data centres by 2027.
A recent preprint from Stanford has demonstrated something remarkable: AI agents working together as a team solving a complex scientific challenge.
While much of the AI discourse focuses on how individual large language models (LLMs) compare to humans, much of human work today is a team effort, and the right question is less “can this LLM do better than a single human on a task” and more “what is the best team-up of AI and human to achieve a goal?” What is fascinating about this paper is that it looks at it from the perspective of “what can a team of AI agents achieve?”
The researchers tackled an ambitious goal: designing improved COVID-binding proteins for potential diagnostic or therapeutic use. Rather than relying on a single AI model to handle everything, the researchers tasked an AI “Principal Investigator” with assembling a virtual research team of AI agents! After some internal deliberation, the AI Principal Investigator selected an AI immunologist, an AI machine learning specialist, and an AI computational biologist. The researchers made sure to add an additional role, one of a “scientific critic” to help ground and challenge the virtual lab team’s thinking.
What makes this approach fascinating is how it mirrors high functioning human organizational structures. The AI team conducted meetings with defined agendas and speaking orders, with a “devil’s advocate” to ensure the ideas were grounded and rigorous.
One tactic the researchers said helped boost creativity (and one that is harder to replicate with humans) is running parallel discussions, whereby the AI agents have the same conversation over and over again. In these discussions, the human researchers set the “temperature” of the LLM higher (inviting more variation in output). The AI principal investigator then took the output of all of these conversations and synthesized them into a final answer (this time with the LLM temperature set lower, to reduce the variability and “imaginativeness” of the answer).
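The pattern is easy to sketch (my own illustration with the OpenAI Python client, not the paper’s actual code; the model name and prompts are placeholders): sample the same discussion several times at high temperature, then run one low-temperature pass to synthesize the transcripts.

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

def discuss(prompt, temperature):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

question = "Propose mutations to improve this nanobody's binding to the latest variant."

# High-temperature parallel discussions invite more varied proposals...
transcripts = [discuss(question, temperature=1.0) for _ in range(5)]

# ...then a low-temperature synthesis pass consolidates them into one final answer
synthesis = discuss(
    "Synthesize these parallel discussions into a single recommendation:\n\n"
    + "\n\n---\n\n".join(transcripts),
    temperature=0.2,
)
print(synthesis)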
The results? The AI team successfully designed nanobodies (small antibody-like proteins — this was a choice the team made to pursue nanobodies over more traditional antibodies) that showed improved binding to recent SARS-CoV-2 variants compared to existing versions. While humans provided some guidance, particularly around defining coding tasks, the AI agents handled the bulk of the scientific discussion and iteration.
This work hints at a future where AI teams become powerful tools for human researchers and organizations. Instead of asking “Will AI replace humans?”, we should be asking “How can humans best orchestrate teams of specialized AI agents to solve complex problems?”
The implications extend far beyond scientific research. As businesses grapple with implementing AI, this study suggests that success might lie not in deploying a single, all-powerful AI system, but in thoughtfully combining specialized AI agents with human oversight. It’s a reminder that in both human and artificial intelligence, teamwork often trumps individual brilliance.
I personally am also interested in how different team compositions and working practices might lead to better or worse outcomes — for both AI teams and human teams. Should we have one scientific critic, or should there be specialist critics for each task? How important was the speaking order? What if the group came up with its own agendas? What if there were two principal investigators with different strengths?
The next frontier in AI might not be building bigger models, but building better teams.
I’ve been using OpenMediaVault 6 on a cheap mini-PC as a home server for over a year. Earlier this year, OpenMediaVault 7 was announced which upgrades the underlying Linux to Debian 12 Bookworm and made a number of other security, compatibility, and user interface improvements.
Not wanting to start over fresh, I decided to take advantage of OpenMediaVault’s built-in command line tool to handle the upgrade. If, like me, you are looking for a quick and clean way of upgrading from OpenMediaVault 6 to OpenMediaVault 7, look no further:
SSH into your system or connect directly with a keyboard and monitor. While I would normally recommend WeTTY (accessible via [Services > WeTTY] from the web console interface) for any command line activity on your server, WeTTY relies on the server being up, and updating the operating system requires shutting the server down, so you’ll need to either plug in a keyboard and monitor or use SSH.
Run sudo omv-upgrade in the command-line. This will start a long process of downloading and installing necessary files to complete the operating system update. From time to time you’ll be asked to accept / approve a set of changes via keyboard. If you’re on the online administrative panel, you’ll be booted off as the server shuts down.
Restart the server. Once everything is complete, you’ll need to restart the server to make sure everything “takes”. This can be done by running reboot in the command line or by manually turning the server off and on.
Assuming everything went smoothly, after the server completes its reboot (which will take a little bit of extra time after an operating system upgrade), upon logging into the administrative console as you had done before, you’ll be greeted by the new OMV 7 login screen. Congratulations!
As a kid, I remember playing Microsoft Flight Simulator 5.0 — while I can’t say I really understood all the nuances of the several hundred page manual (which explained how ailerons and rudders and elevators worked), I remember being blown away with the idea that I could fly anywhere on the planet and see something reasonably representative there.
Flash forward a few decades and Microsoft Flight Simulator 2024 can safely be said to be one of the most detailed “digital twins” of the whole planet ever built. In addition to detailed photographic mapping of many locations (I would imagine a combination of aerial surveillance and satellite imagery) and an accurate real world inventory of every helipad (including offshore oil rigs!) and glider airport, they also simulate flocks of animals, plane wear and tear, how snow vs mud vs grass behave when you land on it, wake turbulence, and more! And, just as impressive, it’s being streamed from the cloud to your PC/console when you play!
Who said the metaverse is dead?
People are dressed in clothes and styles matching their countries of origin. They speak in the language of their home countries. Flying from the US to Finland on a commercial plane? Walk through the cabin: you’ll hear both English and Finnish being spoken by the passengers.
Neumann, who has a supervising producer credit on 2013’s Zoo Tycoon and a degree in biology, has a soft-spot for animals and wants to make sure they’re also being more realistically simulated in MSFS 2024. “I really didn’t like the implementation of the animal flights in 2020,” he admitted. “It really bothered me, it was like, ‘Hey, find the elephants!’ and there’s a stick in the UI and there’s three sad-looking elephants.
“There’s an open source database that has all wild species, extinct and living, and it has distribution maps with density over time,” Neumann continued. Asobo is drawing from that database to make sure animals are exactly where they’re supposed to be, and that they have the correct population densities. In different locations throughout the year, “you will find different stuff, but also they’re migrating,” so where you spot a herd of wildebeests or caribou one day might not be the same place you find them the next.
It is hard to find good analogies for running a startup that founders can learn from. Some of the typical comparisons — playing competitive sports & games, working on large projects, running large organizations — all fall short of capturing the feeling that the odds are stacked against you that founders have to grapple with.
But the annals of military history offer a surprisingly good analogy to the startup grind. Consider the campaigns of some of history’s greatest military leaders — like Alexander the Great and Julius Caesar — who successfully waged offensive campaigns against numerically superior opponents in hostile territory. These campaigns have many of the same hallmarks as startups:
Bad odds: Just as these commanders faced superior enemy forces in hostile territory, startups compete against incumbents with vastly more resources in markets that favor them.
Undefined rules: Unlike games with clear rules and a limited set of moves, military commanders and startup operators have broad flexibility of action and must be prepared for all types of competitive responses.
Great uncertainty: Not knowing how the enemy will act is very similar to not knowing how a market will respond to a new offering.
As a casual military history enthusiast and a startup operator & investor, I’ve found striking parallels in how history’s most successful commanders overcame seemingly insurmountable odds with how the best startup founders operate, and think that’s more than a simple coincidence.
In this post, I’ll explore the strategies and campaigns of 9 military commanders (see below) who won battle after battle against numerically superior opponents across a wide range of battlefields. By examining their approach to leadership and strategy, I found 5 valuable lessons that startup founders can hopefully apply to their own ventures.
Alexander the Great: Conquered the Persian Empire before the age of 32; spread Hellenistic culture across Eurasia and is widely viewed in the West as antiquity’s greatest conqueror
Hannibal Barca: Waged the Second Punic War against Rome, crossing the Alps and crushing much larger Roman armies (most famously at Cannae) while operating deep in enemy territory, cut off from reinforcement
Han Xin (韓信): Despite being a commoner, his victories led to the creation of the Han Dynasty (漢朝) and his being remembered as one of “the Three Heroes of the early Han Dynasty” (漢初三傑)
Julius Caesar: Established Rome’s dominance in Gaul (France); became undisputed leader of Rome, effectively ending the Roman Republic, and his name has since become synonymous with “emperor” in the West
Subutai: Despite being a commoner, became one of the most successful military commanders of the Mongol Empire, winning battles in more theaters than any other commander (China, Central Asia, and Eastern Europe)
Timur: Created a Central Asian empire with dominion over Turkey, Persia, Northern India, Eastern Europe, and Central Asia; his successors would eventually create the Mughal Empire in India, which lasted until the 1850s
John Churchill, Duke of Marlborough: Considered one of the greatest British commanders in history; paved the way for Britain to overtake France as the pre-eminent military and economic power in Europe
Frederick the Great: Established Prussia as the pre-eminent Central European power after defeating nearly every major European power in battle; a cultural icon for the creation of Germany
Napoleon Bonaparte: Established a French empire with dominion over most of continental Europe; the Napoleonic Code now serves as the basis for legal systems around the world, and his name has become synonymous with military genius and ambition
Before I dive in, three important call-outs to remember:
Running a startup is not actually warfare — there are limitations to this analogy. Startups are not (and should not be) life-or-death. Startup employees are not bound by military discipline (or the threat of imprisonment if they are derelict). The concept of battlefield deception, which is at the heart of many of the tactics of the greatest commanders, also doesn’t translate well. Treating your employees / co-founders as one would a soldier or condoning violent and overly aggressive tactics would be both an ethical failure and a misread of this analogy.
Drawing lessons from these historical campaigns does not mean condoning the underlying sociopolitical causes of these conflicts, nor the terrible human and economic toll these battles led to. Frankly, many of these commanders were absolutist dictators with questionable motivations and sadistic streaks. This post’s focus is purely on getting applicable insights on strategy and leadership from leaders who were able to win despite difficult odds.
This is not intended to be an exhaustive list of every great military commander in history. Rather, it represents the intersection of offensive military prowess and my familiarity with the historical context. The fact that I did not mention a particular commander has no bearing on their actual greatness.
With those in mind, let’s explore how the wisdom of historical military leaders can inform the modern startup journey. Below, I’ll unpack five key principles drawn from the campaigns of history’s most successful military commanders and show how they apply to the challenges ambitious founders face today.
1. Get in the trenches with your team
2. Achieve and maintain tactical superiority
3. Move fast and stay on offense
4. Unconventional teams win
5. Pick bold, decisive battles
Principle 1: Get in the trenches with your team
One common thread unites the greatest military commanders: their willingness to share in the hardships of their soldiers. This exercise of leadership by example, of getting “in the trenches” with one’s team, is as crucial in the startup world as it was on historical battlefields.
Every commander on our list was renowned for marching and fighting alongside their troops. This wasn’t mere pageantry; it was a fundamental aspect of their leadership style that yielded tangible benefits:
Inspiration: Seeing their leader work shoulder-to-shoulder with them motivated soldiers to push beyond their regular limits.
Trust: By sharing in their soldiers’ hardships, commanders demonstrated that they valued their troops and understood their needs.
Insight: Direct involvement gave leaders firsthand knowledge of conditions on the ground, informing better strategic decisions.
Perhaps no figure exemplified this better than Alexander the Great. Famous for being one of the first soldiers to jump into battle, Alexander was seriously wounded multiple times. This shared experience created a deep bond with his soldiers, culminating in his legendary speech at Opis, where he quelled a mutiny of soldiers, tired after years of campaigning, by reminding them of their shared experiences:
The wealth of the Lydians, the treasures of the Persians, and the riches of the Indians are yours; and so is the External Sea. You are viceroys, you are generals, you are captains. What then have I reserved to myself after all these labors, except this purple robe and this diadem? I have appropriated nothing myself, nor can any one point out my treasures, except these possessions of yours or the things which I am guarding on your behalf. Individually, however, I have no motive to guard them, since I feed on the same fare as you do, and I take only the same amount of sleep.
Nay, I do not think that my fare is as good as that of those among you who live luxuriously; and I know that I often sit up at night to watch for you, that you may be able to sleep.
But some one may say, that while you endured toil and fatigue, I have acquired these things as your leader without myself sharing the toil and fatigue. But who is there of you who knows that he has endured greater toil for me than I have for him? Come now, whoever of you has wounds, let him strip and show them, and I will show mine in turn; for there is no part of my body, in front at any rate, remaining free from wounds; nor is there any kind of weapon used either for close combat or for hurling at the enemy, the traces of which I do not bear on my person.
For I have been wounded with the sword in close fight, I have been shot with arrows, and I have been struck with missiles projected from engines of war; and though oftentimes I have been hit with stones and bolts of wood for the sake of your lives, your glory, and your wealth, I am still leading you as conquerors over all the land and sea, all rivers, mountains, and plains. I have celebrated your weddings with my own, and the children of many of you will be akin to my children.
Alexander the Great (as told by Arrian)
This was not unique to Alexander. Julius Caesar famously slept in chariots and marched alongside his soldiers. Napoleon was called “le petit caporal” by his troops after he was found sighting the artillery himself, a task that put him within range of enemy fire and was usually delegated to junior officers.
Frederick the Great also famously mingled with his soldiers while on tour, taking kindly to the nickname his men gave him, “Old Fritz”. Frederick understood the importance of this, as he once wrote to his nephew:
“You cannot, under any pretext whatever, dispense with your presence at the head of your troops, because two thirds of your soldiers could not be inspired by any other influence except your presence.”
Frederick the Great
For Startups
For founders, the lesson is clear: show up when & where your team is working and roll up your sleeves so they can see you work beside them. It’s not just that startups tend to need “all hands on deck”; being in the trenches also provides valuable “on the ground” context and helps create the morale needed to succeed.
Elon Musk offered a well-known (if extreme) modern example of this during Tesla’s Model 3 production ramp:
“I am personally on that line, in that machine, trying to solve problems personally where I can,” Musk said at the time. “We are working seven days a week to do it. And I have personally been here on zone 2 module line at 2:00 a.m. on a Sunday morning, helping diagnose robot calibration issues. So I’m doing everything I can.”
Principle 2: Achieve and maintain tactical superiority
Winning battles against superior numbers requires a commander to have a strong tactical edge over their opponents. This can be in the form of a technological advantage (e.g., a weapons technology) or an organizational one (e.g., superior training or formations), but these successful commanders always made sure their soldiers could “punch above their weight”.
Alexander the Great, for example, leveraged the Macedonian Phalanx, a modification of the “classical Greek phalanx” used by the Greek city states of the era, that his father Philip II helped create.
The formation relied on “blocks” of heavy infantry equipped with six-meter (!!) long spears called sarissa which could rearrange themselves (to accommodate different formation widths and depths) and “pin” enemy formations down while the heavy cavalry would flank or exploit gaps in the enemy lines. This formation made Alexander’s army highly effective against every military force — Greeks, Persians, and Indians — it encountered.
A few centuries later, the brilliant Chinese commander Han Xin (韓信) leaned heavily on the value of military engineering. Han Xin (韓信)’s soldiers would rapidly repair & construct roads to facilitate his army’s movement or, at times, to deceive his enemies about which path he planned to take. His greatest military engineering accomplishment was at the Battle of Wei River (濰水) in 204 BCE. Han Xin (韓信) attacked the larger forces of the State of Qi (齊) and State of Chu (楚) and immediately retreated across the river, luring them to cross. What his rivals had not realized in their pursuit was that the water level of the Wei River was oddly low. Han Xin (韓信) had, prior to the attack, instructed his soldiers to construct a dam upstream to lower the water level. Once a sizable fraction of the enemy’s forces were mid-stream, Han Xin (韓信) ordered the dam released. The rush of water drowned a sizable portion of the enemy’s forces and divided the Chu (楚) / Qi (齊) forces letting Han Xin (韓信)’s smaller army defeat and scatter them.
A century and a half later, the Roman statesman and military commander Gaius Julius Caesar also famously leaned on military engineering in his campaigns in Gaul. He became the first Roman commander to cross the Rhine (twice!), building bridges to make the point to the Germanic tribes that he could invade them whenever he wanted. At the Battle of Alesia in 52 BCE, after trading battles with the skilled Gallic commander Vercingetorix, who had united the tribes in opposition to Rome, Caesar besieged Vercingetorix’s fortified settlement of Alesia while simultaneously holding off Gallic reinforcements. Caesar did this by building 25 miles of fortifications surrounding Alesia in a month, all while outnumbered and under constant harassment from both sides by the Gallic forces! Caesar’s success forced Vercingetorix to surrender, bringing an end to organized resistance to Roman rule in Gaul for centuries.
The Mongol commander Subutai similarly made great use of Mongol innovations to overcome defenders from across Eurasia. The lightweight Mongol composite bow gave Mongol horse archers a devastating combination of long range (supposedly 150-200 meters!) and speed (because they were light enough to be fired while on horseback). The Mongol horses themselves were another “biotechnological” advantage in that they required less water and food which let the Mongols wage longer campaigns without worrying about logistics.
In the 18th century, Frederick the Great transformed warfare on the European continent with a series of innovations. First, he drilled his soldiers relentlessly, stressing things like rate of fire. It is said that lines of Prussian infantry could fire over twice as fast as the other European armies they faced, making them exceedingly lethal in combat.
Frederick was also famous for a battle formation: the oblique order. Instead of attacking an opponent head on, the oblique order involves confronting the enemy line at an angle with soldiers massed towards one end of the formation. If one’s soldiers are well-trained and disciplined, then even with a smaller force in aggregate, the massed wing can overwhelm the opponent in one area and then flank or surround the rest. Frederick famously boasted that the oblique order could allow a skilled force to win over an opposing one three times its size.
Finally, Frederick is credited with popularizing horse artillery, the use of horse-drawn light artillery guns, in European warfare. With horse artillery units, Frederick was able to increase the adaptability of his forces and their ability to break through even numerically superior massed infantry by concentrating artillery fire where it was needed.
A few decades later, Napoleon Bonaparte became the undisputed master of much of continental Europe by mastering army-level logistics and organization. While a brilliant tactician and artillery commander, what set Napoleon’s military apart was its embrace of the “corps system”, which subdivided his forces into smaller, self-contained corps capable of independent operations. This gave Napoleon the freedom to pursue grander goals, knowing that he could focus his attention on the most important fronts of battle while the other corps could independently pin an enemy down or pursue a different objective in parallel.
Additionally, Napoleon invested heavily in overhauling military logistics, using a combination of forward supply depots and training his forces to forage for food and supplies in enemy territory (and, just as importantly, to estimate what foraging could provide so they knew what supplies to take). This investment spurred the invention of modern canning technology, first used to support the marches of the French Grande Armée. The result was that Napoleon could field larger armies over longer campaigns, all while keeping his soldiers relatively well-fed.
For Startups
Founders need to make sure they have a strong tactical advantage that fits their market(s). As evidenced above, it does not need to be something as grand as an unassailable advantage, but it needs to be a reliable winner and something you continuously invest in if you plan on competing with well-resourced incumbents in challenging markets.
The successful payments company Stripe started out by making sure they would always win on developer ease of use, even going so far as to charge more than their competition during their Beta to make sure that their developer customers were valuing them for their ease of use. Stripe’s advantage here, and continuous investment in maintaining that advantage, ultimately let it win any customer that needed a developer payment integration, even against massive financial institutions. This advantage laid the groundwork for Stripe’s meteoric growth and expansion into adjacent categories from its humble beginnings.
Principle 3: Move fast and stay on offense
In both military campaigns and startups, speed and a focus on offense play an outsized role in victory, because the ability to move quickly creates opportunities and increases resiliency to mistakes.
Few understood this principle as well as the Mongol commander Subutai who frequently took advantage of the greater speed and discipline of the Mongol cavalry to create opportunities to win.
In the Battle of the Kalka River (1223), Subutai took what initially appeared to be a Mongol defeat — when the Kievan Rus and their Cuman allies successfully entrapped the Mongol forces in the area — and turned it into a victory. The Mongols began a 9-day feigned retreat (many historians believe this was a real retreat that Subutai turned into a feigned one once he realized the situation), constantly tempting the enemy into overextending themselves in pursuit by staying just out of reach.
After 9 days, Subutai’s forces took advantage of their greater speed to lay a trap. Once the Mongols crossed the river they reformed their lines to lie in ambush. As soon as the Rus forces crossed the Kalka River, they found themselves surrounded and confronted with a cavalry charge they were completely unprepared for. After all, they had been pursuing what they thought was a fleeing enemy! Their backs against the river, the Rus forces (including several major princes) were annihilated.
Subutai took advantage of the Mongol speed advantage in a number of his campaigns, coordinating fast-moving Mongol divisions across multiple objectives. In its destruction of the Central Asian Khwarazmian empire, the Mongols, under the command of Subutai and Mongol ruler Genghis Khan, overwhelmed the defenders with coordinated maneuvers. While much of the Mongol forces attacked from the East, where the Khwarazmian forces massed, Subutai used the legendary Mongol speed to go around the Khwarazmian lines altogether, ending up at Bukhara, 100 miles to the West of the Khwarazmian defensive position! In a matter of months, the empire was destroyed and its rulers chased out, never to return.
A few hundred years later, the Englishman John Churchill, the Duke of Marlborough, also proved the value of speed when, in 1704, he boldly marched an army of 21,000 Dutch and English troops 250 miles across Europe in just five weeks to place it between the French and Bavarian forces and their target of Vienna. Had Vienna been attacked, it would have forced England’s ally, the Holy Roman Empire, out of the conflict, handing France victory in the War of the Spanish Succession. The march was made all the more challenging because Marlborough had to find a way to feed and equip his army along the way without unnecessarily burdening the neutral and friendly territories it passed through.
Marlborough’s maneuver threw the Bavarian and French forces off-balance. What originally was supposed to be an “easy” French victory culminated in a crushing defeat for the French at Blenheim which turned the momentum of the war. This victory solidified Marlborough’s reputation and even resulted in the British government agreeing to build a lavish palace (called Blenheim Palace in honor of the battle) as a reward to Marlborough.
Marlborough proved the importance of speed again at the Battle of Oudenarde. In 1708, French forces captured Ghent and Bruges (in modern day Belgium), threatening the alliance’s ability to maintain contact with Britain. Recognizing this, Marlborough force-marched his army to the city of Oudenarde, marching 30 miles in about as many hours. The French, confident from their recent victories and suffering from an internal leadership squabble, misjudged the situation, allowing Marlborough’s forces to build five pontoon bridges to move his 80,000 soldiers across the nearby river.
When the French commander received news that the allies were already at Oudenarde building bridges, he said, “If they are there, then the devil must have carried them. Such marching is impossible!”
Marlborough’s forces, not yet at full strength, engaged the French, buying sufficient time for his forces to cross and form up. Once in formation, they counterattacked and collapsed one wing of the French line, saving the Allied position in the Netherlands, and resulting in a bad defeat for French forces.
For Startups
The pivotal role speed played in securing victory for Subutai and the Duke of Marlborough applies in the startup domain as well. The ability to make fast decisions and quickly shift focus to adapt to a new market context creates opportunities that slower-moving incumbents (and military commanders!) cannot seize. Speed also grants resiliency against mistakes and weak positions, in much the same way that it let the Mongols and the Anglo-Prussian-Dutch alliance overcome their initial missteps at Kalka River and Oudenarde. Founders would be wise to embrace speed of action in all they do.
Facebook and its (now in)famous “move fast, break things” motto is one classic example of how a company can internalize speed as a culture. It leveraged that culture to ship products and features that have kept it a leader in social and AI, even in the face of constant competition and threats from well-funded companies like Google, Snapchat, and ByteDance.
Principle 4: Unconventional teams win
Another unifying hallmark of the great commanders is that they made unconventional choices with regards to their army composition. Relative to their peers, these commanders tended to build armies that were more diverse in class and nationality. While this required exceptional communication and inspiration skills, it gave the commanders significant advantages:
Ability to recruit in challenging conditions: For many of the commanders, the unconventional team structure was a necessity to build up the forces they needed given logistical / resource constraints while operating in enemy territory.
Operational flexibility from new tactics: Bringing on personnel from different backgrounds let commanders incorporate additional tactics and strategies, creating a more effective and flexible fighting force.
The Carthaginian general Hannibal Barca, for example, famously fielded a multinational army consisting of Carthaginians, Libyans, Iberians, Numidians, Balearic soldiers, Gauls, and Italians. This allowed Hannibal to raise an army in hostile territory — after all, waging war in the heart of Italy against Rome made it difficult to get reinforcements from Carthage.
But it also gave Hannibal’s army flexibility in tactics. Balearic slingers out-ranged the best bows the Romans used at the time. Numidian light cavalry gave Hannibal fast reconnaissance and a quick way to flank and outmaneuver Roman forces. Gallic and Iberian soldiers provided shock infantry and cavalry. Each of these groups added its own distinctive capabilities to Hannibal’s armies and to his great victories over Rome.
The Central Asian conqueror Timur similarly fielded a diverse army which included Mongols, Turks, Persians, Indians, Arabs, and others. This allowed Timur to field larger armies for his campaigns by recruiting from the countries he forced into submission. Like with Hannibal, it also gave Timur’s army access to a diverse set of tactics: war elephants (from India), infantry and siege technology from the Persians, gunpowder from the Ottomans, and more. This combination of operational flexibility and ability to field large armies let Timur build an empire which defeated every major power in Central Asia and the Middle East.
It should not be a surprise that some of the great commanders were drawn towards assembling unconventional teams, as several of them were themselves “commoners”. Subutai (the son of a blacksmith who caught Genghis Khan’s interest), Timur (a common thief), and Han Xin (韓信, who famously had to beg for food in his childhood) all came from relatively humble origins. Napoleon, famous for declaring that in the military “la carrière est ouverte aux talents” (“the career is open to the talents”) and for creating the first modern order of merit, the Légion d’honneur (open to all, regardless of social class), was similarly motivated by the difficulties he faced in securing promotion early in his career because he was not from the French nobility.
But, by embracing more of a meritocracy, Napoleon was ultimately able to field some of the largest European armies in existence as he waged war successfully against every other major European power (at once).
For Startups
Hiring is one of the key tasks for startup founders. While hiring the same people that larger, better-resourced companies want can be helpful for a startup, it’s important to remember that transformative victories require unconventional approaches. Leaning on unconventional hires may help you avoid a salary bidding war with those deeper-pocketed competitors. Choosing unconventional hires may also add different skills and perspectives to the team.
In pursuing this strategy, it’s also vital to excel at communication & organization as well as fostering a shared sense of purpose. All teams require strong leadership to be effective but this is especially true with an unconventional team composition facing uphill odds.
The workflow automation company Zapier is one example of an unconventional approach to team construction: it has been 100% remote from inception (pre-COVID, even). This let the company assemble a team without being confined by location and eliminated the need to spend on unnecessary facilities. They’ve had to invest in norms around documentation and communication to make this work, and, while it would be too far a leap to argue that all startups should go 100% remote, for Zapier’s market and team culture, it’s worked.
Principle 5: Pick bold, decisive battles
When in a challenging environment with limited resources, it’s important to prioritize decisive moves — actions which can result in a huge payoff — over safer, less impactful ones, even when they are riskier. This is as true for startups, which have limited runway and need to make a big splash in order to fundraise, as it is for military commanders, who need not just battlefield wins but strategic victories.
Few understood this as well as the Carthaginian general Hannibal Barca who, in waging the Second Punic War against Rome, crossed the Alps from Spain with his army in 218 BCE (at the age of 29!). Memorialized in many works of art (see below for one by Francisco Goya), this was a dangerous move (one that resulted in the loss of many men and almost his entire troop of war elephants) and was widely considered to be impossible.
While history (rightly) remembers Hannibal’s boldness, it’s important to remember that Hannibal’s move was highly calculated. He realized that the Gauls in Northern Italy, who had recently been subjugated by the Romans, were likely to welcome a Roman rival. Through his spies, he also knew that Rome was planning an invasion of Carthage in North Africa. He knew he had little chance to bypass the Roman navy or Roman defensive placements if he invaded in another way.
And Hannibal’s bet paid off! Caught entirely by surprise, the Romans cancelled their planned invasion of Africa, and Hannibal lined up many Gallic allies to his cause. Within two years of his entry into Italy, Hannibal trounced the Roman armies sent to battle him at the River Ticinus, at the River Trebia, and at Lake Trasimene. Shocked by their losses, the Romans elected two consuls with the mandate to battle Hannibal and stop him once and for all.
Knowing this, Hannibal seized a supply depot at the town of Cannae, presenting a tempting target for the Roman consuls to prove themselves. They (foolishly) took the bait. Though the Romans fielded over 80,000 soldiers against Hannibal’s 50,000, Hannibal successfully executed a legendary double-envelopment maneuver (see below) and slaughtered almost the entire Roman force that met him in battle.
To put this into perspective, in the 2 years after Hannibal crossed the Alps, Hannibal’s army killed 20% of all male Romans over the age of 17 (including at least 80 Roman Senators and one previous consul). Cannae is today considered one of the greatest examples of military tactical brilliance, and, as historian Will Durant wrote, “a supreme example of generalship, never bettered in history”.
Cannae was a great example of Hannibal’s ability to pick a decisive battle with favorable odds. Hannibal knew that his only chance was to encourage the city-states of Italy to side with him. He knew the Romans had just elected consuls itching for a fight. He chose the field of battle by seizing a vital supply depot at Cannae. Considering the Carthaginians had started and pulled back from several skirmishes with the Romans in the days leading up to the battle, it’s clear Hannibal also chose when to fight, knowing full well the Romans outnumbered him. After Cannae, many Italian city-states and the kingdom of Macedon sided with Carthage. That Carthage ultimately lost the Second Punic War is a testament more to Rome’s indomitable spirit and the sheer odds Hannibal faced than any indication of Hannibal’s skills.
In the Far East, about a decade later, the brilliant Chinese military commander Han Xin (韓信) was laying the groundwork for the creation of the Han Dynasty (漢朝) in a China-wide civil war known as the Chu-Han contention between the State of Chu (楚) and the State of Han (漢) led by Liu Bang (劉邦, who would become the founding emperor Gaozu 高祖 of the Han Dynasty 漢朝).
Under the leadership of Han Xin (韓信), the State of Han (漢) won many victories over their neighbors. Overconfident from those victories, his king Liu Bang (劉邦) led a Han (漢) coalition to a catastrophic defeat when he briefly captured but then lost the Chu (楚) capital of Pengcheng (彭城) in 205 BCE. Chu (楚) forces were even able to capture the king’s father and wife as hostages, and several Han (漢) coalition states switched their loyalty to the Chu (楚).
To fix his king’s blunder, Han Xin (韓信) tasked the main Han (漢) army with setting up fortified positions in the Central Plain, drawing Chu (楚) forces there. Han Xin (韓信) would himself take a smaller force of less experienced soldiers to attack rival states in the North to rebuild the Han (漢) military position.
Sensing an opportunity to deal a decisive blow to one of those rival states, the overconfident Zhao (趙), Han Xin (韓信) ordered a cavalry unit to sneak into the mountains behind the Zhao (趙) camp and to remain hidden until battle started. He then ordered half of his remaining army to position themselves in full view of the Zhao (趙) forces with their backs to the Tao River (洮水), something Sun Tzu’s Art of War (孫子兵法) explicitly advises against (due to the inability to retreat). This “error” likely reinforced the Zhao (趙) commander’s overconfidence, as he made no move to pre-emptively flank or deny the Han (漢) forces their encampment.
Han Xin (韓信) then deployed his full army which lured the Zhao (趙) forces out of their camp to counterattack. Because the Tao River (洮水) cut off all avenues of escape, the outnumbered Han (漢) forces had no choice but to dig in and fight for their lives, just barely holding the Zhao (趙) forces at bay. By luring the enemy out for what appeared to be “an easy victory”, Han Xin (韓信) created an opportunity for his hidden cavalry unit to capture the enemy Zhao (趙) camp, replacing their banners with those of the Han (漢). The Zhao (趙) army saw this when they regrouped, which resulted in widespread panic as the Zhao (趙) army concluded they must be surrounded by a superior force. The opposition’s morale in shambles, Han Xin (韓信) ordered a counter-attack and the Zhao (趙) army crumbled, resulting in the deaths of the Zhao (趙) commander and king!
Han Xin (韓信) bet his entire outnumbered command on a deception tactic based on little more than an understanding of his army’s and the enemy’s psychology. He won a decisive victory which helped reverse the tide of the war. The State of Zhao (趙) fell, and the State of Jiujiang (九江) and the State of Yan (燕) switched allegiances to the Han (漢). This battle even inspired a Chinese expression “fighting a battle with one’s back facing a river” (背水一戰) to describe fighting for survival in a “last stand”.
Roughly a century and a half later, on the other side of the world, the Roman statesman and military commander Julius Caesar made a career of turning bold, decisive bets into personal glory. After Caesar conquered Gaul, Caesar’s political rivals, led by Gnaeus Pompeius Magnus (Pompey the Great), a famed military commander, demanded Caesar return to Rome and give up his command. Caesar refused and crossed the Rubicon, a river marking the boundary of Italy, in January 49 BCE, starting a Roman civil war and coining at least two famous expressions (including alea iacta est – “the die is cast”) for “the point of no return”.
This bold move came as a complete shock to the Roman elite. Pompey and his supporters fled Rome. Taking advantage of this, Caesar captured Italy without much bloodshed. Caesar then pursued Pompey to Macedon, seeking a decisive land battle, which Pompey, wisely, given his broad network of allies and command of the Roman navy, refused to give him. Instead, Caesar tried and failed to besiege Pompey at Dyrrhachium and was forced to retreat into Greece.
Pompey’s supporters, however, lacked Pompey’s patience (and judgement). Overconfident from their naval strength, numerical advantage, and Caesar’s failure at Dyrrhachium, they pressured Pompey into a battle with Caesar who was elated at the opportunity. In the summer of 48 BCE, the two sides met at the Battle of Pharsalus.
Always cautious, Pompey took up a position on a mountain and oriented his forces so that his larger cavalry wing would have the ability to overpower Caesar’s cavalry and then flank Caesar’s forces, while his numerically superior infantry would be arranged deeper to smash through, or at least hold back, Caesar’s lines.
Caesar made a bold tactical choice when he saw Pompey’s formation. He thinned his (already outnumbered) lines to create a 4th reserve line of veterans which he positioned behind his cavalry at an angle (see battle formation above).
Caesar initiated the battle and attacked with two of his infantry lines. As Caesar expected, Pompey ordered a cavalry charge which soon forced back Caesar’s outnumbered cavalry. But Pompey’s cavalry then encountered Caesar’s 4th reserve line, which had been instructed to use their javelins to stab at the faces of Pompey’s cavalry like bayonets. Pompey’s cavalry, while larger in size, was made up of relatively inexperienced soldiers, and the shock of the attack caused them to panic. This let Caesar’s cavalry regroup and, with the 4th reserve line, swing around Pompey’s army, completing an expert flanking maneuver. Pompey’s army, now surrounded, collapsed once Caesar sent his final reserve line into battle.
Caesar’s boldness and speed of action let him take advantage of a lapse in Pompey’s judgement. Seeing a rare opportunity to win a decisive battle, Caesar was even willing to risk a disadvantage in infantry, cavalry, and position (Pompey’s army had the high ground and had forced Caesar to march to him). But this strategic and tactical gamble (thinning his lines to counter Pompey’s cavalry charge) paid off as Pharsalus shattered the myth of Pompey’s inevitability. Afterwards, Pompey’s remaining allies fled or defected to Caesar, and Pompey himself fled to Egypt where he was assassinated (by a government wishing to win favor with Caesar). And, all of this — from Gaul to crossing the Rubicon to the Civil War — paved the way for Caesar to become the undisputed master of Rome.
For Startups
Founders need to take bold, oftentimes uncomfortable bets that have large payoffs. While a large company can take its time winning a war of attrition, startups need to score decisive wins quickly in order to attract talent, win deals, and shift markets towards them. Only taking the “safe and rational” path is a failure to recognize the opportunity cost when operating with limited resources.
In other words, founders need to find their own Alps / Rubicons to cross.
In the startup world, few moves are as bold (while also uncomfortable and risky) as big pivots. But there are examples of incredible successes, like Slack, that were able to make this work. In Slack’s case, the game they originally developed ended up a flop, but CEO & founder Stewart Butterfield felt the messaging product they had built to support the game’s development had potential. Leaning on that insight, over the skepticism of much of his team and some high-profile investors, Butterfield made a bet-the-company move similar to Han Xin (韓信) digging in with no retreat, and it created a seminal product in the enterprise software space.
Summary
I hope I’ve been able to show that history’s greatest military commanders can offer valuable lessons on leadership and strategy for startup founders.
The five principles derived from studying some of the commanders’ campaigns – the importance of getting in the trenches, achieving tactical superiority, moving fast, building unconventional teams, and picking bold, decisive battles – played a key role in the commanders’ success and generalize well to startup execution.
After all, what makes a more successful founder than one who can recruit teams despite resource constraints (unconventional teams), inspire them (by getting in the trenches alongside them), and move with speed & urgency (move fast) to build a competitive edge (achieve tactical superiority) and apply it where there is the greatest chance of a huge impact on the market (pick bold, decisive battles)?
Until I read this Verge article, I had assumed that video codecs were a boring affair. In my mind, every few years, the industry would get together and come up with a new standard that promised better compression and better quality for the prevailing formats and screen types and, after some patent licensing back and forth, the industry would standardize around yet another MPEG standard that everyone uses. Rinse and repeat.
The article was an eye-opening look at how video streamers like Netflix are pushing the envelope on video codecs. Since one of a video streamer’s core costs is video bandwidth, it makes sense that they would embrace new compression approaches (like different kinds of compression for different content) to reduce those costs. And as Netflix embraces more live-streamed content, it seems they’ll need to develop new methods to accommodate it.
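To make the bandwidth intuition concrete, here is a minimal sketch (mine, not Netflix’s actual pipeline) that encodes the same clip at a few x264 quality settings and reports the resulting bitrates. It assumes ffmpeg and ffprobe are installed and that a local test file named clip.mp4 exists; content that is hard to compress (smoke, water, shifting light) needs noticeably more bits to hit the same quality setting than a simple static scene, which is why per-content tuning matters to a streamer’s bandwidth bill:

```python
# Minimal sketch: encode one clip at several CRF (quality) settings and
# compare the resulting bitrates. Requires ffmpeg/ffprobe on the PATH and
# a local test clip named "clip.mp4" (hypothetical).
import subprocess
from pathlib import Path

SOURCE = Path("clip.mp4")  # hypothetical input clip

def duration_seconds(path: Path) -> float:
    """Read the clip duration via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def encode(path: Path, crf: int) -> Path:
    """Encode with x264 at a given CRF; higher CRF means smaller files and lower quality."""
    target = path.with_name(f"{path.stem}_crf{crf}.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(path), "-c:v", "libx264",
         "-crf", str(crf), "-preset", "medium", "-an", str(target)],
        check=True, capture_output=True,
    )
    return target

if __name__ == "__main__":
    secs = duration_seconds(SOURCE)
    for crf in (18, 23, 28):
        encoded = encode(SOURCE, crf)
        kbps = encoded.stat().st_size * 8 / secs / 1000
        print(f"CRF {crf}: ~{kbps:.0f} kbps")
```

Swapping libx264 for libx265 or an AV1 encoder in the same loop is the crude version of the comparison encoding teams run constantly; the interesting part is how much the numbers swing depending on what’s in the frame.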
But what jumped out to me the most was that, in order to better test and develop the next generation of codecs, they produced a real 12-minute noir film called Meridian (you can access it on Netflix; below is someone who uploaded it to YouTube) which presents scenes that have historically been difficult to encode with conventional video codecs (extreme lights and shadows, cigar smoke and water, rapidly changing light balance, etc.).
Absolutely wild.
While contributing to the development of new video codecs, Aaron and her team stumbled across another pitfall: video engineers across the industry have been relying on a relatively small corpus of freely available video clips to train and test their codecs and algorithms, and most of those clips didn’t look at all like your typical Netflix show. “The content that they were using that was open was not really tailored to the type of content we were streaming,” recalled Aaron. “So, we created content specifically for testing in the industry.”
In 2016, Netflix released a 12-minute 4K HDR short film called Meridian that was supposed to remedy this. Meridian looks like a film noir crime story, complete with shots in a dusty office with a fan in the background, a cloudy beach scene with glistening water, and a dark dream sequence that’s full of contrasts. Each of these shots has been crafted for video encoding challenges, and the entire film has been released under a Creative Commons license. The film has since been used by the Fraunhofer Institute and others to evaluate codecs, and its release has been hailed by the Creative Commons foundation as a prime example of “a spirit of cooperation that creates better technical standards.”