Search results for: “google”

Google’s Quantum Error Correction Breakthrough

December 11, 2024

One of the most exciting areas of technology development, but that doesn’t get a ton of mainstream media coverage, is the race to build a working quantum computer that exhibits “below threshold quantum computing” — the ability to do calculations utilizing quantum mechanics accurately.

One of the key limitations to achieving this has been the sensitivity of quantum computing systems — in particular the qubits that capture the superposition of multiple states that allow quantum computers to exploit quantum mechanics for computation — to the world around them. Imagine if your computer’s accuracy would change every time someone walked in the room — even if it was capable of amazing things, it would not be especially practical. As a result, much research to date has been around novel ways of creating physical systems that can protect these quantum states.

Google has (in a pre-print in Nature) demonstrated their new Willow quantum computing chip which demonstrates a quantum error correction method that spreads the quantum state information of a single “logical” qubit across multiple entangled “physical” qubits to create a more robust system. Beyond proving that their quantum error correction method worked, what is most remarkable to me, is that they’re able to extrapolate a scaling law for their error correction — a way of guessing how much better their system is at avoiding loss of quantum state as they increase the number of physical qubits per logical qubit — which could suggest a “scale up” path towards building functional, practical quantum computers.

I will confess that quantum mechanics was never my strong suit (beyond needing it for a class on statistical mechanics eons ago in college), and my understanding of the core physics underlying what they’ve done in the paper is limited, but this is an incredibly exciting feat on our way towards practical quantum computing systems!

The company’s new chip, called Willow, is a larger, improved version of that technology, with 105 physical qubits. It was developed in a fabrication laboratory that Google built at its quantum-computing campus in Santa Barbara, California, in 2021.

As a first demonstration of Willow’s power, the researchers showed that it could perform, in roughly 5 minutes, a task that would take the world’s largest supercomputer an estimated 10²⁵ years, says Hartmut Neven, who heads Google’s quantum-computing division. This is the latest salvo in the race to show that quantum computers have an advantage over classical ones.

And, by creating logical qubits inside Willow, the Google team has shown that each successive increase in the size of a logical qubit cuts the error rate in half.

“This is a very impressive demonstration of solidly being below threshold,” says Barbara Terhal, a specialist in quantum error correction at the Delft University of Technology in the Netherlands. Mikhail Lukin, a physicist at Harvard University in Cambridge, Massachusetts, adds, “It clearly shows that the idea works.”

‘A truly remarkable breakthrough’: Google’s new quantum chip achieves accuracy milestone
Davide Castelvecchi | Nature News
Laszlo Bock on Building Google’s Culture

March 2, 2016
Much has been written about what makes Google work so well: their ridiculously profitable advertising business model, the technology behind their search engine and data centers, and the amazing pay and perks they offer.

Source: the book

My experiences investing in and working with startups, however, has taught me that building a great company is usually less about a specific technical or business model innovation than about building a culture of continuous improvement and innovation. To try to get some insight into how Google does things, I picked up Google SVP of People Operations Laszlo Bock’s book Work Rules!

Bock describes a Google culture rooted in principles that came from founders Larry Page and Sergey Brin when they started the company: get the best people to work for you, make them want to stay and contribute, and remove barriers to their creativity. What’s great (to those interested in company building) is that Bock goes on to detail the practices Google has put in place to try to live up to these principles even as their headcount has expanded.

The core of Google’s culture boils down to four basic principles and much of the book is focused on how companies should act if they want to live up to them:
1. Presume trust: Many of Google’s cultural norms stem from a view that people are well-intentioned and trustworthy. While that may not seem so radical, this manifested at Google as a level of transparency with employees and a bias to say yes to employee suggestions that most companies are uncomfortable with. It raises interesting questions about why companies that say their talent is the most important thing treat them in ways that suggest a lack of trust.
2. Recruit the best: Many an exec pays lip service to this, but what Google has done is institute policies that run counter to standard recruiting practices to try to actually achieve this at scale: templatized interviews / forms (to make the review process more objective and standardized), hiring decisions made by cross-org committees (to insure a consistently high bar is set), and heavy use of data to track the effectiveness of different interviewers and interview tactics. While there’s room to disagree if these are the best policies (I can imagine hating this as a hiring manager trying to staff up a team quickly), what I admired is that they set a goal (to hire the best at scale) and have actually thought through the recruiting practices they need to do so.
3. Pay fairly [means pay unequally]: While many executives would agree with the notion that superstar employees can be 2-10x more productive, few companies actually compensate their superstars 2-10x more. While its unclear to me how effective Google is at rewarding superstars, the fact that they’ve tried to align their pay policies with their beliefs on how people perform is another great example of deviating from the norm (this time in terms of compensation) to follow through on their desire to pay fairly.
4. Be data-driven: Another “in vogue” platitude amongst executives, but one that very few companies live up to, is around being data-driven. In reading Bock’s book, I was constantly drawing parallels between the experimentation, data collection, and analyses his People Operations team carried out and the types of experiments, data collection, and analyses you would expect a consumer internet/mobile company to do with their users. Case in point: Bock’s team experimented with different performance review approaches and even cafeteria food offerings in the same way you would expect Facebook to experiment with different news feed algorithms and notification strategies. It underscores the principle that, if you’re truly data-driven, you don’t just selectively apply it to how you conduct business, you apply it everywhere.
Of course, not every company is Google, and not every company should have the same set of guiding principles or will come to same conclusions. Some of the processes that Google practices are impractical (i.e., experimentation is harder to set up / draw conclusions from with much smaller companies, not all professions have such wide variations in output as to drive such wide variations in pay, etc).

What Bock’s book highlights, though, is that companies should be thoughtful about what sort of cultural principles they want to follow and what policies and actions that translates into if they truly believe them. I’d highly recommend the book!
LLMs Get Trounced at Chess

July 10, 2025

While Large Language Models (LLMs) have demonstrated they can do many things well enough, it’s important to remember that these are not “thinking machines” so much as impressively competent “writing machines” (able to figure out what words are likely to follow).

Case in point: both OpenAI’s ChatGPT and Microsoft Copilot lost to the chess playing engine of an old Atari game (Video Chess) which takes up a mere 4 KB of memory to work (compared with the billions of parameters and GB’s of specialized accelerator memory needed to make LLMs work).

It’s a small (yet potent) reminder that (1) different kinds of AI are necessary for different tasks (i.e. Google’s revolutionary AlphaZero probably would’ve made short work of the Atari engine) and (2) don’t underestimate how small but highly specialized algorithms can perform.

Last month we reported on the somewhat-surprising news that an emulated Atari 2600 running the 1979 software Video Chess had “absolutely wrecked” an overconfident ChatGPT at the game of kings. Fans of schadenfreude rejoice, because Microsoft Copilot thought this was a chance to show its superiority to ChatGPT: And the Atari gave it a beating.

After conquering ChatGPT, Atari 2600 Video Chess destroys Microsoft Copilot: ‘The vintage silicon mastermind bested me fair and square’
Rich Stanton | PC Gamer
Multi-Agent ChatLab

December 21, 2024
Summary

I built a simple browser-based “ChatLab” for multi-agent AI experimentation (GitHub link). It is:
- Pure client-side implementation — no server or build system required, all you need is a modern browser and an account/API key with your LLM of choice
- Portable — Encapsulated in a single HTML file (the only external dependencies being Preact and Showdown)
- Secure — your secret LLM API keys don’t get sent to any third party server except the LLM vendor’s
- Versatile — Supports multiple LLM APIs starting with Google Gemini, Anthropic’s Claude, and OpenAI
- Easy to use — Supports rich text (via Markdown), basic color-coding to track different Agents in a conversation, editable message history, tabbed interface makes it easy to customize different Agents to use different system prompts and models, and ability for users to participate directly or to target a given Agent to respond
Motivation

As I explored the potential for multi-agent AI workflows, I quickly hit a wall with existing tools. The chat interfaces and developer tools made available by OpenAI, Google, and Anthropic aren’t built to support multiple participants, and I found myself resorting to cumbersome and error-prone copy-and-paste-and-edit workflows across multiple browser tabs or terminal windows.

So I built this to make it easier to experiment. I hope the MultiAgent ChatLab helps others who want to similarly experiment with multi-agent AI.

Design / Architecture

Streamlined Multi-Agent Conversation Handling
- My experimentation showed the major LLMs could be “encouraged” to act as if they were in conversations with multiple participants just by adding prefixes to messages (i.e. “[OpenAI]: That's a great idea! [Claude]: I agree! [User]: Ok, let's go!“) and passing each message as if it were input from the User.
  
  The challenge was that, without specialized tooling, this required a lot of effort to pass the same conversation history to multiple AI Agents because each message needed to have the right prefix added and then copied into the LLM API payload.
  
  The ChatLab tracks the message history automatically and adds (and strips) those prefixes behind-the-scenes.
- To customize the Agents, the ChatLab allows you to customize each Agent with a unique name (which determines the prefix), model, and system prompt (which gives the Agent instructions to follow) (see below).
User-Friendly Interface
- There is a simple tabbed interface to allow you to direct messages to a particular Agent to respond
- The message history is presented as a simple chat interface with colors that distinguish different agents from one another.
- The message interface also supports in-browser Markdown rendering to display rich-text responses properly
- Individual messages can also be edited and deleted, making it much easier to experiment
How to Use the ChatLab
1. Download the HTML File. You can grab it directly from GitHub. You can also just access it here.
2. Open in modern browser like Chrome. Just open the file locally — no additional setup needed
3. Configure your Agents: Identify the Agents you want participating and use the tabbed interface at the bottom to give them names and system prompts and pick the model you want to use for them
4. Carry out the Conversation.
  - Click on the tab for the Agent you want to start first with. If you want to start the conversation as a User, fill out the text field just below Message:. If you want the Agent to start, leave that text field blank. In either case, hit the Send button to invoke the AI model
  - You can send messages as the User by filling out the Message: field and pressing Send on the tab for the Agent you want to respond.
  - Or, if you just want an AI agent to go next, just press Send after selecting the tab for that agent.
  - If you want more space for the message (or if you just want the AIs to talk to each other), hit the Hide Controls button on the tab bar to collapse the input area. You can press Show Controls to bring it back.
5. Experiment and tweak away
  - You can adjust System Prompts and Models as you go
  - You can also Edit or Delete individual messages if you want the message history to reflect something different. The model will pass every message (and handle all the prefix addition and removal for you) each time
I’ve enjoyed watching AI Agents bantering with each other or with myself. The screenshot below shows some of the back and forth between one agent pretending to be Napoleon Bonaparte and another pretending to be the Duke of Wellington, the British commander who defeated Napoleon at Waterloo.

Conclusion

In an era where AI is increasingly collaborative, tools that simplify multi-agent workflows are essential. It’s my hope that the MultiAgent ChatLab can help more people prototype, experiment, and iterate on this new world of AI agent capability without needing an advanced setup or sharing sensitive API keys with external services.
Diagnostic Math

December 4, 2024
Between the deep learning work I’ve done in low vision and glaucoma and my time spent as a deeptech investor, I’ve spent a great deal of time looking at diagnostic technologies of all sorts and thinking about diagnostic tests and how to make them successful.

While each technology and entrepreneurial team is different, there are some commonalities that drive whether a diagnostic test is useful around concepts like sensitivity, specificity, positive predictive value, and AUROC.

I summarized some of these views in “Why is it so Hard to Build a Diagnostic Business?” and as part of that work, I created Google Sheets and browser-based calculators for some of the figures of merit that matter. You can access them from the post but also here as well:

Google sheets
- Main link
  - Tab on hypothetical HIV test from blog post
  - Tab with a positive predictive value (PPV) and number needed to screen (NNS) calculator (the key figures of merit to determine clinical utility for a test)
  - Tab with a calculator of cost to screen 1 true positive (the key figure of merit to determine cost-effectiveness of a test)
  - Tab showing how to calculate comparative cost-effectiveness of two tests
  - Tab showing the clinical utility comparison of FIT test and Cologuard from the 2014 NEJM paper
  - Tab showing the clinical utility comparison of FIT test and Cologuard Plus from the 2024 NEJM paper
  - Tab showing the economic comparison of FIT test and Cologuard from the 2014 NEJM paper
  - Tab showing the economic comparison of FIT test and Cologuard Plus from the 2024 NEJM paper
  - Tab on raw data from the 2014 NEJM paper that populates the other tabs
  - Tab on raw data from the 2024 NEJM paper that populates the other tabs
  - Tab on Exact Sciences financials pulled from their SEC filings
- Browser-based calculators
  - Calculator for positive predictive value (PPV) and number needed to screen (NNS) calculator (the key figures of merit to determine clinical utility for a test)
  - Calculator for cost to screen 1 true positive (the key figure of merit to determine cost-effectiveness of a test)
If you use these in preparing a research paper, grant proposal, or other publication, I would appreciate your acknowledging it by citing it in the references. Here is a suggested bibliography entry in APA or “author (date)” style:
```
Tseng, B. (2024). Calculators for diagnostic figures of merit [Computer software]. Retrieved [month day, year], from https://benjamintseng.com/portfolio/diagnostic-math/Code language: plaintext (plaintext)
```
Why is it so Hard to Build a Diagnostic Business?

December 4, 2024
Everywhere you look, the message seems clear: early detection (of cancer & disease) saves lives. Yet behind the headlines, companies developing these screening tools face a different reality. Many tests struggle to gain approval, adoption, or even financial viability. The problem isn’t that the science is bad — it’s that the math is brutal.

This piece unpacks the economic and clinical trade-offs at the heart of the early testing / disease screening business. Why do promising technologies struggle to meet cost-effectiveness thresholds, despite clear scientific advances? And what lessons can diagnostic innovators take from these challenges to improve their odds of success? By the end, you’ll have a clearer view of the challenges and opportunities in bringing new diagnostic tools to market—and why focusing on the right metrics can make all the difference.

The brutal math of diagnostics

Image Credit: Wikimedia

Technologists often prioritize metrics like sensitivity (also called recall) — the ability of a diagnostic test to correctly identify individuals with a condition (i.e., if the sensitivity of a test is 90%, then 90% of patients with the disease will register as positives and the remaining 10% will be false negatives) — because it’s often the key scientific challenge and aligns nicely with the idea of getting more patients earlier treatment.

But when it comes to adoption and efficiency, specificity — the ability of a diagnostic test to correctly identify healthy individuals (i.e., if the specificity of a test is 90%, then 90% of healthy patients will register as negatives and the remaining 10% will be false positives) — is usually the more important and overlooked criteria.

The reason specificity is so important is that it can have a profound impact on a test’s Positive Predictive Value (PPV) — whether or not a positive test result means a patient actually has a disease (i.e., if the positive predictive value of a test is 90%, then a patient that registers as positive has a 90% chance of having the disease and 10% chance of actually being healthy — being a false positive).

What is counter-intuitive, even to many medical and scientific experts, is that because (by definition) most patients are healthy, many high accuracy tests have disappointingly low PPV as most positive results are actually false positives.

Let me present an example (see table below for summary of the math) that will hopefully explain:
- There are an estimated 1.2 million people in the US with HIV — that is roughly 0.36% (the prevalence) of the US population
- Let’s say we have an HIV test with 99% sensitivity and 99% specificity — a 99% (very) accurate test!
- If we tested 10,000 Americans at random, you would expect roughly 36 of them (0.36% x 10,000) to be HIV positive. That means, roughly 9,964 are HIV negative
  - 99% sensitivity means 99% of the 36 HIV positive patients will test positive (99% x 36 = ~36)
  - 99% specificity means 99% of the 9,964 HIV negative patients will test negative (99% x 9,964 = ~9,864) while 1% (1% x 9,964 = ~100) would be false positives
- This means that even though the test is 99% accurate, it only has a positive predictive value of ~26% (36 true positives out of 136 total positive results)
Math behind the hypothetical HIV test example (Google Sheet link)

Below (if you’re on a browser) is an embedded calculator which will run this math for any values of disease prevalence and sensitivity / specificity (and here is a link to a Google Sheet that will do the same), but you’ll generally find that low disease rates result in low positive predictive values for even very accurate diagnostics.

Typically, introducing a new diagnostic means balancing true positives against the burden of false positives. After all, for patients, false positives will result in anxiety, invasive tests, and, sometimes, unnecessary treatments. For healthcare systems, they can be a significant economic burden as the cost of follow-up testing and overtreatment add up, complicating their willingness to embrace new tests.

Below (if you’re on a browser) is an embedded calculator which will run the basic diagnostic economics math for different values of the cost of testing and follow-up testing to calculate the cost of testing and follow-up testing per patient helped (and here is a link to a Google Sheet that will do the same)

Finally, while diagnostics businesses face many of the same development hurdles as drug developers — the need to develop cutting-edge technology, to carry out large clinical studies to prove efficacy, and to manage a complex regulatory and reimbursement landscape — unlike drug developers, diagnostic businesses face significant pricing constraints. Successful treatments can command high prices for treating a disease. But successful diagnostic tests, no matter how sophisticated, cannot, because they ultimately don’t treat diseases, they merely identify them.

Case Study: Exact Sciences and Cologuard

Let’s take Cologuard (from Exact Sciences) as an example. Cologuard is a combination genomic and immunochemistry test for colon cancer carried out on patient stool samples. It’s two primary alternatives are:
1. a much less sensitive fecal immunochemistry test (FIT) — which uses antibodies to detect blood in the stool as a potential, imprecise sign of colon cancer
2. colonoscopies — a procedure where a skilled physician uses an endoscope to enter and look for signs of cancer in a patient’s colon. It’s considered the “gold standard” as it functions both as diagnostic and treatment (a physician can remove or biopsy any lesion or polyp they find). But, because it’s invasive and uncomfortable for the patient, this test is typically only done every 4-10 years
Cologuard is (as of this writing) Exact Science’s primary product line, responsible for a large portion of Exact Science’s $2.5 billion in 2023 revenue. It can detect earlier stage colon cancer as well as pre-cancerous growths that could lead to cancer. Impressively, Exact Sciences also commands a gross margin greater than 70%, a high margin achieved mainly by pharmaceutical and software companies that have low per-unit costs of production. This has resulted in Exact Sciences, as of this writing, having a market cap over $11 billion.

Yet for all its success, Exact Sciences is also a cautionary note, illustrating the difficulties of building a diagnostics company.
- The company was founded in 1995, yet didn’t see meaningful revenue from selling diagnostics until 2014 (nearly 20 years later, after it received FDA approval for Cologuard)
- The company has never had a profitable year (this includes the last 10 years it’s been in-market), losing over $200 million in 2023, and in the first three quarters of 2024, it has continued to be unprofitable.
- Between 1997 (the first year we have good data from their SEC filings as summarized in this Google Sheet) and 2014 when it first achieved meaningful diagnostic revenue, Exact Sciences lost a cumulative $420 million, driven by $230 million in R&D spending, $88 million in Sales & Marketing spending, and $33 million in CAPEX. It funded those losses by issuing over $624 million in stock (diluting investors and employees)
- From 2015-2023, it has needed to raise an additional $3.5 billion in stock and convertible debt (net of paybacks) to cover its continued losses (over $3 billion from 2015-2023)
- Prior to 2014, Exact Sciences attempted to commercialize colon cancer screening technologies through partnerships with LabCorp (ColoSure and PreGenPlus). These were not very successful and led to concerns from the FDA and insurance companies. This forced Exact Sciences to invest heavily in clinical studies to win over the payers and the FDA, including a pivotal ~10,000 patient study to support Cologuard which recruited patients from over 90 sites and took over 1.5 years.
- It took Exact Sciences 3 years after FDA approval of Cologuard for its annual diagnostic revenues to exceed what it spends on sales & marketing. It continues to spend aggressively there ($727M in 2023).
While it’s difficult to know precisely what the company’s management / investors would do differently if they could do it all over again, the brutal math of diagnostics certainly played a key role.

From a clinical perspective, Cologuard faces the same low positive predictive value problem all diagnostic screening tests face. From the data in their study on ~10,000 patients, it’s clear that, despite having a much higher sensitivity for cancer (92.3% vs 73.8%) and higher AUROC (94% vs 89%) than the existing FIT test, the PPV of Cologuard is only 3.7% (lower than the FIT test: 6.9%).

Even using a broader disease definition that includes the pre-cancerous advanced lesions Exact Sciences touted as a strength, the gap on PPV does not narrow (Cologuard: 23.6% vs FIT: 32.6%)

Clinical comparison of FIT vs Cologuard
(Google Sheet link)

The economic comparison with a FIT test fares even worse due to the higher cost of Cologuard as well as the higher rate of false positives. Under the Center for Medicare & Medicaid Service’s 2024Q4 laboratory fee schedule, a FIT test costs $16 (CPT code: 82274), but Cologuard costs $509 (CPT code: 81528), over 30x higher! If each positive Cologuard and FIT test results in a follow-up colonoscopy (which has a cost of $800-1000 according to this 2015 analysis), the screening cost per cancer patient is 5.2-7.1x higher for Cologuard than for the FIT test.

Cost comparison of FIT vs Cologuard
(Google Sheet link)

This quick math has been confirmed in several studies.
- A study by a group at the University Medical Center of Rotterdam concluded that “Compared to nearly all other CRC screening strategies reimbursed by CMS (Medicare), [Cologuard] is less effective and considerably more costly, making it an inefficient screening option” and would only be comparable at a much lower cost (~$6-18!)
- Another study presented at the American College of Surgeons Clinical Congress in 2022 (see image below) looked at insurance reimbursement in the Alleghany Health Network and found that the FIT test was 5x cheaper in addressing early stage colon cancer.
From ACS Clinical Congress 2022 Presentation

While Medicare and the US Preventive Services Task Force concluded that the cost of Cologuard and the increase in false positives / colonoscopy complications was worth the improved early detection of colon cancer, it stayed largely silent on comparing cost-efficacy with the FIT test. It’s this unfavorable comparison that has probably required Exact Sciences to invest so heavily in sales and marketing to drive sales. That Cologuard has been so successful is a testament both to the value of being the only FDA-approved test on the market as well as Exact Science’s efforts in making Cologuard so well-known (how many other diagnostics do you know have an SNL skit dedicated to them?).

Not content to rest on the laurels of Cologuard, Exact Sciences recently published a ~20,000 patient study on their next generation colon cancer screening test: Cologuard Plus. While the study suggests Exact Sciences has improved the test across the board, the company’s marketing around Cologuard Plus having both >90% sensitivity and specificity is misleading, because the figures for sensitivity and specificity are for different conditions: sensitivity for colorectal cancer but specificity for colorectal cancer OR advanced precancerous lesion (see the table below).

Sensitivity and Specificity by Condition for Cologuard Plus Study
(Google Sheet link)

Disentangling these numbers shows that while Cologuard Plus has narrowed its PPV disadvantage (now worse by 1% on colorectal cancer and even on cancer or lesion) and its cost-efficacy disadvantage (now “only” 4.4-5.8x more expensive) vs the FIT test (see tables below), it still hasn’t closed the gap.

Clinical: Cologuard+ vs FIT (Google Sheet link)

Economic: Cologuard+ vs FIT (Google Sheet link)

Time will tell if this improved test performance translates to continued sales performance for Exact Sciences, but it is telling that despite the significant time and resources that went into developing Cologuard Plus, the data suggests it’s still likely more cost effective for health systems to adopt FIT over Cologuard Plus as a means of preventing advanced colon cancer.

Lessons for diagnostics companies

The underlying math of the diagnostics business and the lessons from Exact Sciences’ long path to dramatic sales has several key lessons for diagnostic entrepreneurs:
1. Focus on specificity — For diagnostic technologists, too little attention is paid to specificity while too much attention is paid on sensitivity. Positive predictive value and the cost-benefit for a health system are largely going to swing on specificity.
2. Aim for higher value tests — Because the development and required validation for a diagnostic can be as high as that of a drug or medical device, it is important to pursue opportunities where the diagnostic can command a high price. These are usually markets where the alternatives are very expensive because they require new technology (e.g. advanced genetic tests) or a great deal of specialized labor (e.g. colonoscopy) or where the diagnostic directly decides on a costly course of treatment (e.g. a companion diagnostic for an oncology drug).
3. Go after unmet needs — If a test is able to fill a mostly unmet need — for example, if the alternatives are extremely inaccurate or poorly adopted — then adoption will be determined by awareness (because there aren’t credible alternatives) and pricing will be determined by sensitivity (because this drives the delivery of better care). This also simplifies the sales process.
4. Win beyond the test — Because performance can only ever get to 100%, each incremental point on sensitivity and specificity is both exponentially harder to achieve but also delivers less medical or financial value. As a result, it can be advantageous to focus on factors beyond the test such as regulatory approval / guidelines adoption, patient convenience, time to result, and impact on follow-up tests and procedures. Cologuard gained a great deal from being “the first FDA-approved colon cancer screening test”. Non-invasive prenatal testing, despite low positive predictive values and limited disease coverage, gained adoption in part by helping to triage follow-up amniocentesis (a procedure which has a low but still frighteningly high rate of miscarriage ~0.5%). Rapid antigen tests for COVID have also similarly been adopted despite their lower sensitivity and specificity than PCR tests due to their speed, low cost, and ability to carry out at home.
Diagnostics developers must carefully navigate the intersection of scientific innovation and financial reality, while grappling with the fact that even the most impressive technology may be insufficient without taking into account clinical and economic factors to achieve market success.

Ultimately, the path forward for diagnostic innovators lies in prioritizing specificity, targeting high-value and unmet needs, and crafting solutions that deliver value beyond the test itself. While Exact Science’s journey underscores the difficulty of these challenges, it also illustrates that with persistence, thoughtful investment, and strategic differentiation, it is possible to carve out a meaningful and impactful space in the market.
Updating my AI News Reader

November 21, 2024
A few months ago, I shared that I had built an AI-powered personalized news reader which I use (and still do) on a near-daily basis. Since that post, I’ve made a couple of major improvements (which I have just reflected in my public Github).

Switching to JAX

I previously chose Keras 3 for my deep learning algorithm architecture because of its ease of use as well as the advertised ability to shift between AI/ML backends (at least between Tensorflow, JAX, and PyTorch). With Keras creator Francois Chollet noting significant speed-ups just from switching backends to JAX, I decided to give the JAX backend a shot.

Thankfully, Keras 3 lived up to it’s multi-backend promise and made switching to JAX remarkably easy. For my code, I simply had to make three sets of tweaks.

First, I had to change the definition of my container images. Instead of starting from Tensorflow’s official Docker images, I instead installed JAX and Keras on Modal’s default Debian image and set the appropriate environmental variables to configure Keras to use JAX as a backend:
```
jax_image = (
    modal.Image.debian_slim(python_version='3.11')
    .pip_install('jax[cuda12]==0.4.35', extra_options="-U")
    .pip_install('keras==3.6')
    .pip_install('keras-hub==0.17')
    .env({"KERAS_BACKEND":"jax"}) # sets Keras backend to JAX
    .env({"XLA_PYTHON_CLIENT_MEM_FRACTION":"1.0"})
Code language: Python (python)
```
Second, because tf.data pipelines convert everything to Tensorflow tensors, I had to switch my preprocessing pipelines from using Keras’s ops library (which, because I was using JAX as a backend, expected JAX tensors) to Tensorflow native operations:
```
ds = ds.map(
    lambda i, j, k, l: 
    (
        preprocessor(i), 
        j, 
        2*k-1, 
        loglength_norm_layer(tf.math.log(tf.cast(l, dtype=tf.float32)+1))
    ), 
    num_parallel_calls=tf.data.AUTOTUNE
)
Code language: Python (python)
```
Lastly, I had a few lines of code which assumed Tensorflow tensors (where getting the underlying value required a .numpy() call). As I was now using JAX as a backend, I had to remove the .numpy() calls for the code to work.

Everything else — the rest of the tf.data preprocessing pipeline, the code to train the model, the code to serve it, the previously saved model weights and the code to save & load them — remained the same! Considering that the training time per epoch and the time the model took to evaluate (a measure of inference time) both seemed to improve by 20-40%, this simple switch to JAX seemed well worth it!

Model Architecture Improvements

There were two major improvements I made in the model architecture over the past few months.

First, having run my news reader for the better part of a year now, I now have accumulated enough data where my strategy to simultaneously train on two related tasks (predicting the human rating and predicting the length of an article) no longer required separate inputs. This reduced the memory requirement as well as simplified the data pipeline for training (see architecture diagram below)

Secondly, I was successfully able to train a version of my algorithm which can use dot products natively. This not only allowed me to remove several layers from my previous model architecture (see architecture diagram below), but because the Supabase postgres database I’m using supports pgvector, it means I can even compute ratings for articles through a SQL query:
```
UPDATE articleuser
SET 
    ai_rating = 0.5 + 0.5 * (1 - (a.embedding <=> u.embedding)),
    rating_timestamp = NOW(),
    updated_at = NOW()
FROM 
    articles a, 
    users u
WHERE 
    articleuser.article_id = a.id
    AND articleuser.user_id = u.id
    AND articleuser.ai_rating IS NULL;
Code language: SQL (Structured Query Language) (sql)
```
The result is much greater simplicity in architecture as well as greater operational flexibility as I can now update ratings from the database directly as well as from serving a deep neural network from my serverless backend.

Model architecture (output from Keras plot_model function)

Making Sources a First-Class Citizen

As I used the news reader, I realized early on that the ability to just have sorted content from one source (i.e. a particular blog or news site) would be valuable to have. To add this, I created and populated a new sources table within the database to track these independently (see database design diagram below) which was linked to the articles table.

Newsreader database design diagram (produced by a Supabase tool)

I then modified my scrapers to insert the identifier for each source alongside each new article, as well as made sure my fetch calls all JOIN‘d and pulled the relevant source information.

With the data infrastructure in place, I added the ability to add a source parameter to the core fetch URLs to enable single (or multiple) source feeds. I then added a quick element at the top of the feed interface (see below) to let a user know when the feed they’re seeing is limited to a given source. I also made all the source links in the feed clickable so that they could take the user to the corresponding single source feed.
```
<div class="feed-container">
  <div class="controls-container">
    <div class="controls">
      ${source_names && source_names.length > 0 && html`
        <div class="source-info">
          Showing articles from: ${source_names.join(', ')}
        </div>
        <div>
          <a href="/">Return to Main Feed</a>
        </div>
      `}
    </div>
  </div>
</div>
Code language: HTML, XML (xml)
```
The interface when on a single source feed

Performance Speed-Up

One recurring issue I noticed in my use of the news reader pertained to slow load times. While some of this can be attributed to the “cold start” issue that serverless applications face, much of this was due to how the news reader was fetching pertinent articles from the database. It was deciding at the moment of the fetch request what was most relevant to send over by calculating all the pertinent scores and rank ordering. As the article database got larger, this computation became more complicated.

To address this, I decided to move to a “pre-calculated” ranking system. That way, the system would know what to fetch in advance of a fetch request (and hence return much faster). Couple that with a database index (which effectively “pre-sorts” the results to make retrieval even faster), and I saw visually noticeable improvements in load times.

But with any pre-calculated score scheme, the most important question is how and when re-calculation should happen. Too often and too broadly and you incur unnecessary computing costs. Too infrequently and you risk the scores becoming stale.

The compromise I reached derived itself from the three ways articles are ranked in my system:
1. The AI’s rating of an article plays the most important role (60%)
2. How recently the article was published is tied with… (20%)
3. How similar an article is with the 10 articles a user most recently read (20%
These factors lent themselves to very different natural update cadences:
- Newly scraped articles would have their AI ratings and calculated score computed at the time they enter the database
- AI ratings for the most recent and the previously highest scoring articles would be re-computed after model training updates
- On a daily basis, each article’s score was recomputed (focusing on the change in article recency)
- The article similarity for unread articles is re-evaluated after a user reads 10 articles
This required modifying the reader’s existing scraper and post-training processes to update the appropriate scores after scraping runs and model updates. It also meant tracking article reads on the users table (and modifying the /read endpoint to update these scores at the right intervals). Finally, it also meant adding a recurring cleanUp function set to run every 24 hours to perform this update as well as others.

Next Steps

With some of these performance and architecture improvements in place, my priorities are now focused on finding ways to systematically improve the underlying algorithms as well as increase the platform’s usability as a true news tool. To that end some of the top priorities for next steps in my mind include:
- Testing new backbone models — The core ranking algorithm relies on Roberta, a model released 5 years ago before large language models were common parlance. Keras Hub makes it incredibly easy to incorporate newer models like Meta’s Llama 2 & 3, OpenAI’s GPT2, Microsoft’s Phi-3, and Google’s Gemma and fine-tune them.
- Solving the “all good articles” problem — Because the point of the news reader is to surface content it considers good, users will not readily see lower quality content, nor will they see content the algorithm struggles to rank (i.e. new content very different from what the user has seen before). This makes it difficult to get the full range of data needed to help preserve the algorithm’s usefulness.
- Creating topic and author feeds — Given that many people think in terms of topics and authors of interest, expanding what I’ve already done with Sources but with topics and author feeds sounds like a high-value next step
I also endeavor to make more regular updates to the public Github repository (instead of aggregate many updates I had already made into two large ones). This will make the updates more manageable and hopefully help anyone out there who’s interested in building a similar product.
Not your grandma’s geothermal energy

November 15, 2024

The pursuit of carbon-free energy has largely leaned on intermittent sources of energy — like wind and solar; and sources that require a great deal of initial investment — like hydroelectric (which requires elevated bodies of water and dams) and nuclear (which require you to set up a reactor).

The theoretical beauty of geothermal power is that, if you dig deep enough, virtually everywhere on planet earth is hot enough to melt rock (thanks to the nuclear reactions that heat up the inside of the earth). But, until recently, geothermal has been limited to regions of Earth where well-formed geologic formations can deliver predictable steam without excessive engineering.

But, ironically, it is the fracking boom, which has helped the oil & gas industries get access to new sources of carbon-producing energy, which may help us tap geothermal power in more places. As fracking and oil & gas exploration has led to a revolution in our ability to precisely drill deep underground and push & pull fluids, it also presents the ability for us to tap more geothermal power than ever before. This has led to the rise of enhanced geothermal, the process by which we inject water deep underground to heat, and leverage the steam produced to generate electricity. Studies suggest the resource is particularly rich and accessible in the Southwest of the United States (see map below) and could be an extra tool in our portfolio to green energy consumption.

(Source: Figure 5 from NREL study on enhanced geothermal from Jan 2023)

While there is a great deal of uncertainty around how much this will cost and just what it will take (not to mention the seismic risks that have plagued some fracking efforts), the hunger for more data center capacity and the desire to power this with clean electricity has helped startups like Fervo Energy and Sage Geosystems fund projects to explore.

On 17 October, Fervo Energy, a start-up based in Houston, Texas, got a major boost as the US government gave the green light to the expansion of a geothermal plant Fervo is building in Beaver County, Utah. The project could eventually generate as much as 2,000 megawatts — a capacity comparable with that of two large nuclear reactors. Although getting to that point could take a while, the plant already has 400 MW of capacity in the pipeline, and will be ready to provide around-the-clock power to Google’s energy-hungry data centres, and other customers, by 2028. In August, another start-up, Sage Geosystems, announced a partnership with Facebook’s parent company Meta to deliver up to 150 MW of geothermal power to Meta’s data centres by 2027.

Geothermal power is vying to be a major player in the world’s clean-energy future
David Castelvecchi | Nature News

The Startup Battlefield: Lessons from History’s Greatest Military Leaders

September 18, 2024

It is hard to find good analogies for running a startup that founders can learn from. Some of the typical comparisons — playing competitive sports & games, working on large projects, running large organizations — all fall short of capturing the feeling that the odds are stacked against you that founders have to grapple with.

But the annals of military history offer a surprisingly good analogy to the startup grind. Consider the campaigns of some of history’s greatest military leaders — like Alexander the Great and Julius Caesar — who successfully waged offensive campaigns against numerically superior opponents in hostile territory. These campaigns have many of the same hallmarks as startups:

Bad odds: Just as these commanders faced superior enemy forces in hostile territory, startups compete against incumbents with vastly more resources in markets that favor them.
Undefined rules: Unlike games with clear rules and a limited set of moves, military commanders and startup operators have broad flexibility of action and must be prepared for all types of competitive responses.
Great uncertainty: Not knowing how the enemy will act is very similar to not knowing how a market will respond to a new offering.

As a casual military history enthusiast and a startup operator & investor, I’ve found striking parallels in how history’s most successful commanders overcame seemingly insurmountable odds with how the best startup founders operate, and think that’s more than a simple coincidence.

In this post, I’ll explore the strategies and campaigns of 9 military commanders (see below) who won battle after battle against numerically superior opponents across a wide range of battlefields. By examining their approach to leadership and strategy, I found 5 valuable lessons that startup founders can hopefully apply to their own ventures.

Leader	Represented	Notable Victories	Legacy
Alexander the Great	Macedon (336-323 BCE)	Tyre, Issus, Gaugamela, Persian Gate, Hydapses	Conquered the Persian Empire before the age of 32; spread Hellenistic culture across Eurasia and widely viewed in the West as antiquity’s greatest conqueror
Hannibal Barca	Carthage (221-202 BCE)	Ticinus, Trebia, Trasimene, Cannae	Brought Rome the closest to its defeat until its fall in 5th century CE; he operated freely within Italy for over a decade
Han Xin (韓信)	Han Dynasty (漢朝) (206-202 BCE)	Jingxing (井陘), Wei River (濰水), Anyi (安邑)	Despite being a commoner, his victories led to the creation of the Han Dynasty (漢朝) and his being remembered as one of “the Three Heroes of the Han Dynasty” (漢初三傑)
Gaius Julius Caesar	Rome (59-45 BCE)	Alesia, Pharsalus	Established Rome’s dominance in Gaul (France); became undisputed leader of Rome, effectively ending the Roman Republic, and his name has since become synonymous with “emperor” in the West
Subutai	Mongol Empire (1211-1248)	Khunan, Kalka River, Sanfengshan ( 三峰山), Mohi	Despite being a commoner, became one of the most successful military commanders in the Mongol Empire. Successfully won battles in more theaters than any other commander (China, Central Asia, and Eastern Europe)
Timur	Timurid Empire (1370-1405)	Kondurcha River, Terek River, Dehli, Ankara	Created Central Asian empire with dominion over Turkey, Persia, Northern India, Eastern Europe, and Central Asia. His successors would eventually create the Mughal Empire in India which continued until the 1850s
John Churchill, Duke of Marlborough	Britain (1670-1712)	Blenheim, Ramillies	Considered one of the greatest British commanders in history; Paved the way for Britain to overtake France as the pre-eminent military and economic power in Europe
Frederick the Great	Prussia (1740-1779)	Hohenfriedberg, Rossbach, Leuthen	Established Prussia as the pre-eminent Central European power after defeating nearly every major European power in battle; A cultural icon for the creation of Germany
Napoleon Bonaparte	France (1785-1815)	Rivoli, Tarvis, Ulm, Austerlitz, Jena-Auerstedt, Friedland, Dresden	Established a French empire with dominion over most of continental Europe; the Napoleonic code now serves as basis for legal systems around the world and the word Napoleon synonymous with military genius and ambition

Before I dive in, three important call-outs to remember:

Running a startup is not actually warfare — there are limitations to this analogy. Startups are not (and should not be) life-or-death. Startup employees are not bound by military discipline (or the threat of imprisonment if they are derelict). The concept of battlefield deception, which is at the heart of many of the tactics of the greatest commanders, also doesn’t translate well. Treating your employees / co-founders as one would a soldier or condoning violent and overly aggressive tactics would be both an ethical failure and a misread of this analogy.
Drawing lessons from these historical campaigns does not mean condoning the underlying sociopolitical causes of these conflicts, nor the terrible human and economic toll these battles led to. Frankly, many of these commanders were absolutist dictators with questionable motivations and sadistic streaks. This post’s focus is purely on getting applicable insights on strategy and leadership from leaders who were able to win despite difficult odds.
This is not intended to be an exhaustive list of every great military commander in history. Rather, it represents the intersection of offensive military prowess and my familiarity with the historical context. Just because I did not mention a particular commander has no bearing on their actual greatness.

With those in mind, let’s explore how the wisdom of historical military leaders can inform the modern startup journey. In the post, I’ll unpack five key principles (see below) drawn from the campaigns of history’s most successful military commanders, and show how they apply to the challenges ambitious founders face today.

1. Get in the trenches with your team

2. Achieve and maintain tactical superiority

3. Move fast and stay on offense

4. Unconventional teams win

5. Pick bold, decisive battles

Principle 1: Get in the trenches with your team

One common thread unites the greatest military commanders: their willingness to share in the hardships of their soldiers. This exercise of leadership by example, of getting “in the trenches” with one’s team, is as crucial in the startup world as it was on historical battlefields.

Every commander on our list was renowned for marching and fighting alongside their troops. This wasn’t mere pageantry; it was a fundamental aspect of their leadership style that yielded tangible benefits:

Inspiration: Seeing their leader work shoulder-to-shoulder with them motivated soldiers to push beyond their regular limits.
Trust: By sharing in their soldiers’ hardships, commanders demonstrated that they valued their troops and understood their needs.
Insight: Direct involvement gave leaders firsthand knowledge of conditions on the ground, informing better strategic decisions.

Perhaps no figure exemplified this better than Alexander the Great. Famous for being one of the first soldiers to jump into battle, Alexander was wounded seriously multiple times. This shared experience created a deep bond with his soldiers, culminating in his legendary speech at Opis where he was able to quell a mutiny of his soldiers, tired after years of campaigns, with a speech reminding them of their shared experiences:

Alexander the Great from Alexandria, Egypt (3rd Century BCE); Image Credit: Wikimedia

The wealth of the Lydians, the treasures of the Persians, and the riches of the Indians are yours; and so is the External Sea. You are viceroys, you are generals, you are captains. What then have I reserved to myself after all these labors, except this purple robe and this diadem? I have appropriated nothing myself, nor can any one point out my treasures, except these possessions of yours or the things which I am guarding on your behalf. Individually, however, I have no motive to guard them, since I feed on the same fare as you do, and I take only the same amount of sleep.

Nay, I do not think that my fare is as good as that of those among you who live luxuriously; and I know that I often sit up at night to watch for you, that you may be able to sleep.

But some one may say, that while you endured toil and fatigue, I have acquired these things as your leader without myself sharing the toil and fatigue. But who is there of you who knows that he has endured greater toil for me than I have for him? Come now, whoever of you has wounds, let him strip and show them, and I will show mine in turn; for there is no part of my body, in front at any rate, remaining free from wounds; nor is there any kind of weapon used either for close combat or for hurling at the enemy, the traces of which I do not bear on my person.

For I have been wounded with the sword in close fight, I have been shot with arrows, and I have been struck with missiles projected from engines of war; and though oftentimes I have been hit with stones and bolts of wood for the sake of your lives, your glory, and your wealth, I am still leading you as conquerors over all the land and sea, all rivers, mountains, and plains. I have celebrated your weddings with my own, and the children of many of you will be akin to my children.
Alexander the Great (as told by Arrian)

This was not unique to Alexander. Julius Caesar famously slept in chariots and marched alongside his soldiers. Napoleon was called “le petit caporal” by his troops after he was found sighting the artillery himself, a task that put him within range of enemy fire and was usually delegated to junior officers.

Frederick the Great also famously mingled with his soldiers while on tour, taking kindly to the nickname from his men, “Old Fritz”. Frederick understood the importance of this as he once wrote to his nephew:

“You cannot, under any pretext whatever, dispense with your presence at the head of your troops, because two thirds of your soldiers could not be inspired by any other influence except your presence.”
Frederick the Great

“Old Fritz” after the Battle of Hochkirch
Image credit: WikiMedia Commons

For Startups

For founders, the lesson is clear: show up when & where your team is and roll up your sleeves so they can see you work beside them. It’s not just that startups tend to need “all hands on deck”, but being in the trenches also provides “on the ground” context that is valuable and help create the morale needed to succeed.

Elon Musk, for example, famously spent time on the Tesla factory floor — even sleeping on it — while the company worked through issues with its Model 3 production, noting in an interview:

“I am personally on that line, in that machine, trying to solve problems personally where I can,” Musk said at the time. “We are working seven days a week to do it. And I have personally been here on zone 2 module line at 2:00 a.m. on a Sunday morning, helping diagnose robot calibration issues. So I’m doing everything I can.”

Principle 2: Achieve and maintain tactical superiority

To win battles against superior numbers requires a commander to have a strong tactical edge over their opponents. This can be in the form of a technological advantage (i.e. a weapons technology) or an organizational one (i.e. superior training or formations), but these successful commanders always made sure their soldiers could “punch above their weight”.

Alexander the Great, for example, leveraged the Macedonian Phalanx, a modification of the “classical Greek phalanx” used by the Greek city states of the era, that his father Philip II helped create.

Image Credit: RedTony via WikiMedia Commons

The formation relied on “blocks” of heavy infantry equipped with six-meter (!!) long spears called sarissa which could rearrange themselves (to accommodate different formation widths and depths) and “pin” enemy formations down while the heavy cavalry would flank or exploit gaps in the enemy lines. This formation made Alexander’s army highly effective against every military force — Greeks, Persians, and Indians — it encountered.

Macedonian Phalanx with *sarissa*; Image Credit: Wikimedia Commons

A few centuries later, the brilliant Chinese commander Han Xin (韓信) leaned heavily on the value of military engineering. Han Xin (韓信)’s soldiers would rapidly repair & construct roads to facilitate his army’s movement or, at times, to deceive his enemies about which path he planned to take. His greatest military engineering accomplishment was at the Battle of Wei River (濰水) in 204 BCE. Han Xin (韓信) attacked the larger forces of the State of Qi (齊) and State of Chu (楚) and immediately retreated across the river, luring them to cross. What his rivals had not realized in their pursuit was that the water level of the Wei River was oddly low. Han Xin (韓信) had, prior to the attack, instructed his soldiers to construct a dam upstream to lower the water level. Once a sizable fraction of the enemy’s forces were mid-stream, Han Xin (韓信) ordered the dam released. The rush of water drowned a sizable portion of the enemy’s forces and divided the Chu (楚) / Qi (齊) forces letting Han Xin (韓信)’s smaller army defeat and scatter them.

A century and a half later, Roman statesman and military commander Gaius Julius Caesar also famously advocated military engineering capability in his wars with the Germanic tribes in Gaul. He became the first Roman commander to cross the Rhine (twice!) by building bridges to make the point to the Germanic tribes that he could invade them whenever he wanted. At the Battle of Alesia in 52 BCE, after trading battles with the skilled Gallic commander Vercingetorix who had united the tribes in opposition to Rome, Caesar besieged Vercingetorix’s fortified settlement of Alesia while simultaneously holding off Gallic reinforcements. Caesar did this by building 25 miles of fortifications surrounding Alesia in a month, all while outnumbered and under constant harassment from both sides by the Gallic forces! Caesar’s success forced Vercingetorix to surrender, bringing an end to organized resistance to Roman rule in Gaul for centuries.

Vercingetorix Throws Down his Arms at the Feet of Julius Caesar by Lionel Royer; Image Credit: Wikimedia

The Mongol commander Subutai similarly made great use of Mongol innovations to overcome defenders from across Eurasia. The lightweight Mongol composite bow gave Mongol horse archers a devastating combination of long range (supposedly 150-200 meters!) and speed (because they were light enough to be fired while on horseback). The Mongol horses themselves were another “biotechnological” advantage in that they required less water and food which let the Mongols wage longer campaigns without worrying about logistics.

Mongol horse archers, Image credit: Wikimedia Commons

In the 18th century, Frederick the Great transformed warfare on the European continent with a series of innovations. First, he drilled his soldiers stressing things like firing speed. It is said that lines of Prussian riflemen could fire over twice as fast as other European armies they faced, making them exceedingly lethal in combat.

Frederick’s Leibgarde Batallion in action; Image credit: Military Heritage

Frederick was also famous for a battle formation: the oblique order. Instead of attacking an opponent head on, the oblique order involves confronting the enemy line at an angle with soldiers massed towards one end of the formation. If one’s soldiers are well-trained and disciplined, then even with a smaller force in aggregate, the massed wing can overwhelm the opponent in one area and then flank or surround the rest. Frederick famously boasted that the oblique order could allow a skilled force to win over an opposing one three times its size.

Finally, Frederick is credited with popularizing horse artillery, the use of horse-drawn light artillery guns, in European warfare. With horse artillery units, Frederick was able to increase the adaptability of his forces and their ability to break through even numerically superior massed infantry by concentrating artillery fire where it was needed.

Horse-drawn artillery unit; Image credit: Wikimedia Commons

A few decades later, Napoleon Bonaparte became the undisputed master of much of continental Europe by mastering army-level logistics and organization. While a brilliant tactician and artillery commander, what set Napoleon’s military apart was its embrace of the “corps system”, which subdivided his forces into smaller, self-contained corps that were capable of independent operations. This allowed Napoleon the ability to pursue grander goals, knowing that he could focus his attention on the most important fronts of battle, while the other corps could independently pin an enemy down or pursue a different objective in parallel.

Napoleon triumphantly entering Berlin by Charles Meynier; Image Credit: Wikimedia Commons

Additionally, Napoleon invested heavily in overhauling military logistics, using a combination of forward supply depots and teaching his forces to forage for food and supplies in enemy territory (and, just as importantly, how to estimate what foraging can do to help determine the necessary supplies to take). This investment led to the invention of modern canning technology, first used to support the marches of the French Grande Armée. The result was Napoleon could field larger armies over longer campaigns all while keeping his soldiers relatively well-fed.

For Startups

Founders need to make sure they have a strong tactical advantage that fits their market(s). As evidenced above, it does not need to be something as grand as an unassailable advantage, but it needs to be a reliable winner and something you continuously invest in if you plan on competing with well-resourced incumbents in challenging markets.

The successful payments company Stripe started out by making sure they would always win on developer ease of use, even going so far as to charge more than their competition during their Beta to make sure that their developer customers were valuing them for their ease of use. Stripe’s advantage here, and continuous investment in maintaining that advantage, ultimately let it win any customer that needed a developer payment integration, even against massive financial institutions. This advantage laid the groundwork for Stripe’s meteoric growth and expansion into adjacent categories from its humble beginnings.

Principle 3: Move fast and stay on offense

In both military campaigns and startups, speed and a focus on offense plays an outsized role in victory, because the ability to move quickly creates opportunities and increases resiliency to mistakes.

Few understood this principle as well as the Mongol commander Subutai who frequently took advantage of the greater speed and discipline of the Mongol cavalry to create opportunities to win.

In the Battle of the Kalka River (1223), Subutai took what initially appeared to be a Mongol defeat — when the Kievan Rus and their Cuman allies successfully entrapped the Mongol forces in the area — and turned it into a victory. The Mongols began a 9 day feigned retreat (many historians believe this was a real retreat that Subutai turned into a feigned one once he realized the situation), constantly tempting the enemy by staying just out of reach into overextending themselves in pursuit.

After 9 days, Subutai’s forces took advantage of their greater speed to lay a trap. Once the Mongols crossed the river they reformed their lines to lie in ambush. As soon as the Rus forces crossed the Kalka River, they found themselves surrounded and confronted with a cavalry charge they were completely unprepared for. After all, they had been pursuing what they thought was a fleeing enemy! Their backs against the river, the Rus forces (including several major princes) were annihilated.

Battle of Kalka River; Image Credit: Wikimedia Commons

Subutai took advantage of the Mongol speed advantage in a number of his campaigns, coordinating fast-moving Mongol divisions across multiple objectives. In its destruction of the Central Asian Khwarazmian empire, the Mongols, under the command of Subutai and Mongol ruler Genghis Khan, overwhelmed the defenders with coordinated maneuvers. While much of the Mongol forces attacked from the East, where the Khwarazmian forces massed, Subutai used the legendary Mongol speed to go around the Khwarazmian lines altogether, ending up at Bukhara, 100 miles to the West of the Khwarazmian defensive position! In a matter of months, the empire was destroyed and its rulers chased out, never to return.

Map of the Mongol force movements in the Mongol invasion of Khwarazmian Empire; Image Credit: Paul K. Davis, *Masters of the Battlefield*

A few hundred years later, the Englishman John Churchill, the Duke of Marlborough also proved the value of speed in 1704 when he boldly marched an army of 21,000 Dutch and English troops on a 250-mile march across Europe in just five weeks to place themselves between French and Bavarian forces and their target of Vienna. Had Vienna been attacked, it would have forced England’s ally the Holy Roman Empire out of the conflict, giving France the victory in the War of the Spanish Succession. This march was made all the more challenging as Marlborough had to find a way to feed and equip his army along this march without unnecessarily burdening the neutral and friendly territories they were marching through.

Marlborough’s “march to the Danube”; Image Credit: Rebel Redcoat

Marlborough’s maneuver threw the Bavarian and French forces off-balance. What originally was supposed to be an “easy” French victory culminated in a crushing defeat for the French at Blenheim which turned the momentum of the war. This victory solidified Marlborough’s reputation and even resulted in the British government agreeing to build a lavish palace (called Blenheim Palace in honor of the battle) as a reward to Marlborough.

Marlborough proved the importance of speed again at the Battle of Oudenarde. In 1708, French forces captured Ghent and Bruges (in modern day Belgium), threatening the alliance’s ability to maintain contact with Britain. Recognizing this, Marlborough force-marched his army to the city of Oudenarde, marching 30 miles in about as many hours. The French, confident from their recent victories and suffering from an internal leadership squabble, misjudged the situation, allowing Marlborough’s forces to build five pontoon bridges to move his 80,000 soldiers across the nearby river.

When the French commander received news that the allies were already at Oudenarde building bridges, he said, “If they are there, then the devil must have carried them. Such marching is impossible!“

Marlborough’s forces, not yet at full strength, engaged the French, buying sufficient time for his forces to cross and form up. Once in formation, they counterattacked and collapsed one wing of the French line, saving the Allied position in the Netherlands, and resulting in a bad defeat for French forces.

The Battle of Oudenarde, showing the position of the bridges the Allied forces needed to cross to get into position; Image Credit: WikiMedia Commons

For Startups

The pivotal role speed played in achieving victory for Subutai and the Duke of Marlborough apply in the startup domain as well. The ability to make fast decisions, to quickly shift focus to rapidly adapt to a new market context creates opportunities that slower moving incumbents (and military commanders!) cannot seize. Speed also gifts resiliency against mistakes and weak positions, in much the same way that speed let the Mongols and the Anglo-Prussian-Dutch alliance overcome their initial missteps at Kalka River and Oudenarde. Founders would be wise to remember to embrace speed of action in all they do.

Facebook and it’s (now in)famous “move fast, break things” motto is one classic example of how a company can internalize speed as a culture. It leveraged that to ship products and features which has kept it a leader in social and AI even in the face of constant competition and threats from well-funded companies like Google, Snapchat, and Bytedance.

Principle 4: Unconventional teams win

Another unifying hallmark of the great commanders is that they made unconventional choices with regards to their army composition. Relative to their peers, these commanders tended to build armies that were more diverse in class and nationality. While this required exceptional communication and inspiration skills, it gave the commanders significant advantages:

Ability to recruit in challenging conditions: For many of the commanders, the unconventional team structure was a necessity to build up the forces they needed given logistical / resource constraints while operating in enemy territory.
Operational flexibility from new tactics: Bringing on personnel from different backgrounds let commanders incorporate additional tactics and strategies, creating a more effective and flexible fighting force.

The Carthaginian general Hannibal Barca for example famously fielded a multi-nationality army consisting of Carthaginians, Libyans, Iberians, Numidians, Balearic soldiers, Gauls, and Italians. This allowed Hannibal to raise an army in hostile territory — after all, waging war in the heart of Italy against Rome made it difficult to get reinforcements from Carthage.

Illustration of troop types employed in the Second Punic War by Carthage/Hannibal Barca; Image Credit: Travis’s Ancient History

But, it also gave Hannibal’s army flexibility in tactics. Balearic slingers provided superior long range attack to the best Roman-used bows of the time. Numidian light cavalry provided Hannibal with fast reconnaissance and a quick way to flank and outmaneuver Roman forces. Gallic and Iberian soldiers provided shock infantry and cavalry. Each of these groups of soldiers added their own distinctive capabilities to Hannibal’s armies and great victories over Rome.

The Central Asian conqueror Timur similarly fielded a diverse army which included Mongols, Turks, Persians, Indians, Arabs, and others. This allowed Timur to field larger armies for his campaigns by recruiting from the countries he forced into submission. Like with Hannibal, it also gave Timur’s army access to a diverse set of tactics: war elephants (from India), infantry and siege technology from the Persians, gunpowder from the Ottomans, and more. This combination of operational flexibility and ability to field large armies let Timur build an empire which defeated every major power in Central Asia and the Middle East.

The Defeat by Timur of the Sultan of Dehli (from the Imperial Library of Emperor Akbar);
Image credit: Wikimedia

It should not be a surprise that some of the great commanders were drawn towards assembling unconventional teams as several of them were ultimately “commoners”. Subutai (a son of a blacksmith who Genghis Khan took interest in), Timur (a common thief), and Han Xin (韓信, who famously had to beg for food in his childhood) all came from relatively humble origins. Napoleon, famous for declaring the military “la carrier est ouvérte aux talents” (“the career open to the talents”) and creating the first modern order of merit Légion d’honneur (open to all, regardless of social class), was similarly motivated by the difficulties he faced in securing promotion early in his career due to his not being from the French nobility.

But, by embracing more of a meritocracy, Napoleon was ultimately able to field some of the largest European armies in existence as he waged war successfully against every other major European power (at once).

*First Légion d’Honneur Investiture* by Jean-Baptiste Debret;
Image Credit: Wikimedia

For Startups

Hiring is one of the key tasks for startup founders. While hiring the people that larger, better-resourced companies want to can be helpful for a startup, it’s important to always remember that transformative victories require unconventional approaches. Leaning on unconventional hires may help you get out of a salary bidding war with those deeper-pocketed competitors. Choosing unconventional hires may also add different skills and perspectives to the team.

In pursuing this strategy, it’s also vital to excel at communication & organization as well as fostering a shared sense of purpose. All teams require strong leadership to be effective but this is especially true with an unconventional team composition facing uphill odds.

The enterprise API company Zapier is one example of taking an unconventional approach to team construction by having been 100% remote from inception (pre-COVID even). This let the company assemble a team without being confined by location and eliminate the need to spend on unnecessary facilities. They’ve had to invest in norms around documentation and communication to make this work, and, while it’d be too far of a leap to argue all startups should go 100% remote, for Zapier’s market and team culture, it’s worked.

Principle 5: Pick bold, decisive battles

When in a challenging environment with limited resources, it’s important to prioritize decisive moves — actions which can result in a huge payoff — even if risky over safer, less impactful ones. This is as true for startups, which have limited runway and need to make a big splash in order to fundraise, as for military commanders who need more than just battlefield wins but strategic victories.

Few understood this as well as the Carthaginian general Hannibal Barca who, in waging the Second Punic War against Rome, crossed the Alps from Spain with his army in 218 BCE (at the age of 29!). Memorialized in many works of art (see below for one by Francisco Goya), this was a dangerous move (one that resulted in the loss of many men and almost his entire troop of war elephants) and was widely considered to be impossible.

*The Victorious Hannibal Seeing Italy from the Alps for the First Time* by Francisco Goya in Museo del Prado; Image Credit: Wikimedia

While history (rightly) remembers Hannibal’s boldness, it’s important to remember that Hannibal’s move was highly calculated. He realized that the Gauls in Northern Italy, who had recently been subjugated by the Romans, were likely to welcome a Roman rival. Through his spies, he also knew that Rome was planning an invasion of Carthage in North Africa. He knew he had little chance to bypass the Roman navy or Roman defensive placements if he invaded in another way.

And Hannibal’s bet paid off! Having caught the Romans entirely by surprise, they cancelled their planned invasion of Africa, and Hannibal lined up many Gallic allies to his cause. Within two years of his entry into Italy, Hannibal trounced the Roman armies sent to battle him at the River Ticinus, at the River Trebia, and at Lake Trasimene. Shocked by their losses, the Romans elected two consuls with the mandate to battle Hannibal and stop him once and for all.

Knowing this, Hannibal seized a supply depot at the town of Cannae, presenting a tempting target to the Roman consuls to prove themselves. They (foolishly) took the bait. Despite fielding over 80,000 soldiers against Hannibal’s 50,000, Hannibal successfully executed a legendary double-envelopment maneuver (see below) and slaughtered almost the entire Roman force that met him in battle.

Hannibal’s double envelopment of Roman forces at Cannae;
Image Credit: Wikimedia

To put this into perspective, in the 2 years after Hannibal crossed the Alps, Hannibal’s army killed 20% of all male Romans over the age of 17 (including at least 80 Roman Senators and one previous consul). Cannae is today considered one of the greatest examples of military tactical brilliance, and, as historian Will Durant wrote, “a supreme example of generalship, never bettered in history”.

Cannae was a great example of Hannibal’s ability to pick a decisive battle with favorable odds. Hannibal knew that his only chance was to encourage the city-states of Italy to side with him. He knew the Romans had just elected consuls itching for a fight. He chose the field of battle by seizing a vital supply depot at Cannae. Considering the Carthaginians had started and pulled back from several skirmishes with the Romans in the days leading up to the battle, it’s clear Hannibal also chose when to fight, knowing full well the Romans outnumbered him. After Cannae, many Italian city-states and the kingdom of Macedon sided with Carthage. That Carthage ultimately lost the Second Punic War is a testament more to Rome’s indomitable spirit and the sheer odds Hannibal faced than any indication of Hannibal’s skills.

In the Far East, about a decade later, the brilliant Chinese military commander Han Xin (韓信) was laying the groundwork for the creation of the Han Dynasty (漢朝) in a China-wide civil war known as the the Chu-Han contention between the State of Chu (楚) and the State of Han (漢) led by Liu Bang (劉邦, who would become the founding emperor Gaozu 高祖 of the Han Dynasty 漢朝).

Under the leadership of Han Xin (韓信), the State of Han (漢) won many victories over their neighbors. Overconfident from those victories, his king Liu Bang (劉邦) led a Han (漢) coalition to a catastrophic defeat when he briefly captured but then lost the Chu (楚) capital of Pengcheng (彭城) in 205 BCE. Chu forces (楚) were even able to capture the king’s father and wife as hostages, and several Han (漢) coalition states switched their loyalty to the Chu (楚).

Map of the 18 states that existed at the start of the Chu-Han Contention, the two sides being the Han (in light purple on the Southwest) and the Chu (in green on the East); Image Credit: Wikimedia

To fix his king’s blunder, Han Xin (韓信) tasked the main Han (漢) army with setting up fortified positions in the Central Plain, drawing Chu (楚) forces there. Han Xin (韓信) would himself take a smaller force of less experienced soldiers to attack rival states in the North to rebuild the Han (漢) military position.

After successfully subjugating the State of Wei (魏), Han Xin (韓信)’s forces moved to attack the State of Zhao (趙, also called Dai 代) through the Jingxing Pass (井陘關) in late 205 BCE. The Zhao (趙) forces, which outnumbered Han Xin (韓信)’s, encamped on the plain just outside the pass to meet them.

Sensing an opportunity to deal a decisive blow to the overconfident Zhao (趙), Han Xin (韓信) ordered a cavalry unit to sneak into the mountains behind the Zhao (趙) camp and to remain hidden until battle started. He then ordered half of his remaining army to position themselves in full view of the Zhao (趙) forces with their backs to the Tao River (洮水), something Sun Tzu’s Art of War (孫子兵法) explicitly advises against (due to the inability to retreat). This “error” likely reinforced the Zhao (趙) commander’s overconfidence, as he made no move to pre-emptively flank or deny the Han (漢) forces their encampment.

Han Xin (韓信) then deployed his full army which lured the Zhao (趙) forces out of their camp to counterattack. Because the Tao River (洮水) cut off all avenues of escape, the outnumbered Han (漢) forces had no choice but to dig in and fight for their lives, just barely holding the Zhao (趙) forces at bay. By luring the enemy out for what appeared to be “an easy victory”, Han Xin (韓信) created an opportunity for his hidden cavalry unit to capture the enemy Zhao (趙) camp, replacing their banners with those of the Han (漢). The Zhao (趙) army saw this when they regrouped, which resulted in widespread panic as the Zhao (趙) army concluded they must be surrounded by a superior force. The opposition’s morale in shambles, Han Xin (韓信) ordered a counter-attack and the Zhao (趙) army crumbled, resulting in the deaths of the Zhao (趙) commander and king!

Han Xin (韓信) bet his entire outnumbered command on a deception tactic based on little more than an understanding of his army’s and the enemy’s psychology. He won a decisive victory which helped reverse the tide of the war. The State of Zhao (趙) fell, and the State of Jiujiang (九江) and the State of Yan (燕) switched allegiances to the Han (漢). This battle even inspired a Chinese expression “fighting a battle with one’s back facing a river” (背水一戰) to describe fighting for survival in a “last stand”.

Caesar crosses the Rubicon by Bartolomeo Pinelli; Image Credit: Wikimedia

Roughly a century later, on the other side of the world, the Roman statesman and military commander Julius Caesar made a career of turning bold, decisive bets into personal glory. After Caesar conquered Gaul, Caesar’s political rivals led by Gnaeus Pompeius Magnus (Pompey the Great), a famed military commander, demanded Caesar return to Rome and give up his command. Caesar refused and crossed the Rubicon, a river marking the boundary of Italy, in January 49 BCE starting a Roman Civil War and coining at least two famous expressions (including alea iacta est – “the die is cast”) for “the point of no return”.

This bold move came as a complete shock to the Roman elite. Pompey and his supporters fled Rome. Taking advantage of this, Caesar captured Italy without much bloodshed. Caesar then pursued Pompey to Macedon, seeking a decisive land battle which Pompey, wisely, given his broad network of allies and command of the Roman navy, refused to give him. Instead, Caesar tried and failed to besiege Pompey at Dyrrhachium which forced him into retreat in Greece.

Pompey’s supporters, however, lacked Pompey’s patience (and judgement). Overconfident from their naval strength, numerical advantage, and Caesar’s failure at Dyrrhachium, they pressured Pompey into a battle with Caesar who was elated at the opportunity. In the summer of 48 BCE, the two sides met at the Battle of Pharsalus.

The initial battle formations at the Battle of Pharsalus; Image Credit: Wikimedia

Always cautious, Pompey took up a position on a mountain and oriented his forces such that his larger cavalry wing would have ability to overpower Caesar’s cavalry and then flank Caesar’s forces while his numerically superior infantry would be arranged deeper to smash through or at least hold back Caesar’s lines.

Caesar made a bold tactical choice when he saw Pompey’s formation. He thinned his (already outnumbered) lines to create a 4th reserve line of veterans which he positioned behind his cavalry at an angle (see battle formation above).

Caesar initiated the battle and attacked with two of his infantry lines. As Caesar expected, Pompey ordered a cavalry charge which soon forced back Caesar’s outnumbered cavalry. But Pompey’s cavalry then encountered Caesar’s 4th reserve line which had been instructed to use their javelins to stab at the faces of Pompey’s cavalry like bayonets. Pompey’s cavalry, while larger in size, was made up of relatively inexperienced soldiers and the shock of the attack caused them to panic. This let Caesar’s cavalry regroup and, with the 4th reserve line, swung around Pompey’s army completing an expert flanking maneuver. Pompey’s army, now surrounded, collapsed once Caesar sent his final reserve line into battle.

Caesar’s boldness and speed of action let him take advantage of a lapse in Pompey’s judgement. Seeing a rare opportunity to win a decisive battle, Caesar was even willing to risk a disadvantage in infantry, cavalry, and position (Pompey’s army had the high ground and had forced Caesar to march to him). But this strategic and tactical gamble (thinning his lines to counter Pompey’s cavalry charge) paid off as Pharsalus shattered the myth of Pompey’s inevitability. Afterwards, Pompey’s remaining allies fled or defected to Caesar, and Pompey himself fled to Egypt where he was assassinated (by a government wishing to win favor with Caesar). And, all of this — from Gaul to crossing the Rubicon to the Civil War — paved the way for Caesar to become the undisputed master of Rome.

For Startups

Founders need to take bold, oftentimes uncomfortable bets that have large payoffs. While a large company can take its time winning a war of attrition, startups need to score decisive wins quickly in order to attract talent, win deals, and shift markets towards them. Only taking the “safe and rational” path is a failure to recognize the opportunity cost when operating with limited resources.

In other words, founders need to find their own Alps / Rubicons to cross.

In the startup world, few moves are as bold (while also uncomfortable and risky) as big pivots. But, there are examples of incredible successes like Slack that were able to make this work. In Slack’s case, the game they originally developed ended up a flop, but CEO & founder Stewart Butterfield felt the messaging product they had built to support the game development had potential. Leaning on that insight, over the skepticism of much of his team and some high profile investors, Butterfield made a bet-the-company move similar to Han Xin (韓信) digging in with no retreat which created a seminal product in the enterprise software space.

Summary

I hope I’ve been able to show that history’s greatest military commanders can offer valuable lessons on leadership and strategy for startup founders.

The five principles derived from studying some of the commanders’ campaigns – the importance of getting in the trenches, achieving tactical superiority, moving fast, building unconventional teams, and picking bold, decisive battles – played a key role in the commanders’ success and generalize well to startup execution.

After all, what is a more successful founder than one who can recruit teams despite resource constraints (unconventional teams), inspire them (by getting in the trenches alongside them), and move with speed & urgency (move fast) to take a competitive edge (achieve tactical superiority) and apply it where there is the greatest chance of a huge impact on the market (pick bold, decisive battles).

My Two-Year Journey to Home Electrification

May 20, 2024
Summary
- Electrifying our (Bay Area) home was a complex and drawn-out process, taking almost two years.
- Installing solar panels and storage was particularly challenging, involving numerous hurdles and unexpected setbacks.
- We worked with a large solar installer (Sunrun) and, while the individuals we worked with were highly competent, handoffs within Sunrun and with other entities (like local utility PG&E and the local municipality) caused significant delays.
- While installing the heat pumps, smart electric panel, and EV charger was more straightforward, these projects also featured greater complexity than we expected.
- The project resulted in significant quality of improvements around home automation and comfort. However, bad pricing dynamics between electricity and natural gas meant direct cost savings from electrifying gas loads are, at best, small. While solar is an economic slam-dunk (especially given the rising PG&E rates our home sees), the batteries, in the absence of having backup, have less obvious economic value.
- Our experience underscored the need for the industry to adopt a more holistic approach to electrification and for policymakers to make the process more accessible for all homeowners to achieve the state’s ambitious goals.
Why

The decision to electrify our home was an easy one. From my years of investing in & following climate technologies, I knew that the core technologies were reliable and relatively inexpensive. As parents of young children, my wife and I were also determined to contribute positively to the environment. We also knew there were abundant financial supports from local governments and utilities to help make this all work.

Yet, as we soon discovered, what we expected to be a straightforward path turned into a nearly two-year process!

Even for a highly motivated household which had budgeted significant sums for it all, it was still shocking how long (and much money) it took. It made me skeptical that households across California would be able to do the same to meet California’s climate goals without additional policy changes and financial support.

The Plan

Two years ago, we set out a plan:
1. Smart electrical panel — From my prior experience, I knew that many home electrification projects required a main electrical panel upgrade. These were typically costly and left you at the mercy of the utility to actually carry them out (I would find out how true this was later!). Our home had an older main panel rated for 125 A and we suspected we would normally need a main panel upgrade to add on all the electrical loads we were considering.
  
  To try to get around this, we decided to get a smart electrical panel which could:
  - use software smarts to deal with the times where peak electrical load got high enough to need the entire capacity of the electrical line
  - give us the ability to intelligently manage backups and track solar production
  In doing our research, Span seemed like the clear winner. They were the most prominent company in the space and had the slickest looking device and app (many of their team had come from Tesla). They also had an EV charger product we were interested in, the Span Drive.
2. Heat pumps — To electrify is to ditch natural gas. As the bulk of our gas consumption was heating air and water, this involved replacing our gas furnace and gas water heater with heat pumps. In addition to significant energy savings — heat pumps are famous for their >200% efficiency (as they move heat rather than “create” it like gas furnaces do) — heat pumps would also let us add air conditioning (just run the heat pump in reverse!) and improve our air quality (from not combusting natural gas indoors). We found a highly rated Bay Area HVAC installer who specializes in these types of energy efficiency projects (called Building Efficiency) and trusted that they would pick the right heat pumps for us.
3. Solar and Batteries — No electrification plan is complete without solar. Our goal was to generate as much clean electricity as possible to power our new electric loads. We also wanted energy storage for backup power during outages (something that, while rare, we seemed to run into every year) and to take advantage of time-of-use rates (by storing solar energy when the price of electricity is low and then using it when the price is high).
  
  We looked at a number of solar installers and ultimately chose Sunrun. A friend of ours worked there at the time and spoke highly of a prepaid lease they offered that was vastly cheaper all-in than every alternative. It offered minimum energy production guarantees, came with a solid warranty, and the “peace of mind” that the installation would be done with one of the largest and most reputable companies in the solar industry.
4. EV Charger — Finally, with our plan to buy an electric vehicle, installing a home charger at the end of the electrification project was a simple decision. This would allow us to conveniently charge the car at home, and, with solar & storage, hopefully let us “fuel up” more cost effectively. Here, we decided to go with the Span Drive. It’s winning feature was the ability to provide Level 2 charging speeds without a panel upgrade (it does this by ramping up or down charging speeds depending on how much electricity the rest of the house needed). While pricey, the direct integration into our Span smart panel (and its app) and the ability to hit high charging rates without a panel upgrade felt like the smart path forward.
5. What We Left Out — There were two appliances we decided to defer “fully going green” on.
  
  The first was our gas stove (with electric oven). While induction stoves have significant advantages, because our current stove is still relatively new, works well, uses relatively little gas, and an upgrade would have required additional electrical work (installing a 240 V outlet), we decided to keep our current stove and consider a replacement at it’s end of life.
  
  The second was our electric resistive dryer. While heat pump based dryers would certainly save us a great deal of electricity, the existing heat pump dryers on the market have much smaller capacities than traditional resistive dryers, which may have necessitated our family of four doing additional loads of drying. As our current dryer was also only a few years old, and already running on electricity, we decided we would also wait to consider heat pump dryer only after it’s end of life.
With what we thought was a well-considered plan, we set out and lined up contractors.

But as Mike Tyson put it, “Everyone has a plan ’till they get punched in the face.”

The Actual Timeline

Smart Panel

The smart panel installation was one of the more straightforward parts of our electrification journey. Span connected us with a local electrician who quickly assessed our site, provided an estimate, and completed the installation in a single day. However, getting the permits to pass inspection was a different story.

We failed the first inspection due to a disagreement over the code between the electrician and the city inspector. This issue nearly turned into a billing dispute with the electrician, who wanted us to cover the extra work needed to meet the code (an unexpected cost). Fortunately, after a few adjustments and a second inspection, we passed.

The ability to control and monitor electric flows with the smart panel is incredibly cool. For the first few days, I checked the charts in the apps every few minutes tracking our energy use while running different appliances. It was eye-opening to see just how much power small, common household items like a microwave or an electric kettle could draw!

However, the true value of a smart panel is only achieved when it’s integrated with batteries or significant electric loads that necessitate managing peak demand. Without these, the monitoring and control benefits are more novelties and might not justify the cost.

Note: if you, like us, use Pihole to block tracking ads, you’ll need to disable it for the Span app. The app uses some sort of tracker that Pihole flags by default. It’s an inconvenience, but worth mentioning for anyone considering this path.

Heating

Building Efficiency performed an initial assessment of our heating and cooling needs. We had naively assumed they’d be able to do a simple drop-in replacement for our aging gas furnace and water heater. While the water heater was a straightforward replacement (with a larger tank), the furnace posed more challenges.

Initially, they proposed multiple mini-splits to provide zoned control, as they felt the crawlspace area where the gas furnace resided was too small for a properly sized heat pump. Not liking the aesthetics of mini-splits, we requested a proposal involving two central heat pump systems instead.

Additionally, during the assessment, they found some of our old vents, in particular the ones sending air to our kids’ rooms, were poorly insulated and too small (which explains why their rooms always seemed under-heated in the winter). To fix this, they had to cut a new hole through our garage concrete floor (!!) to run a larger, better-insulated vent from our crawlspace. They also added insulation to the walls of our kids’ rooms to improve our home’s ability to maintain a comfortable temperature (but which required additional furniture movement, drywall work, and a re-paint).

Building Efficiency spec’d an Ecobee thermostat to control the two central heat pumps. As we already had a Nest Learning Thermostat (with Nest temperature sensors covering rooms far from the thermostat), we wanted to keep our temperature control in the Nest app. At the time, we had gotten a free thermostat from Nest after signing with Sunrun. We realized later, what Sunrun gifted us was the cheaper (and, less attractive) Nest Thermostat which doesn’t support Nest temperature sensors (why?), so we had to buy our own Nest Learning Thermostat to complete the setup.

Despite some of these unforeseen complexities, the whole process went relatively smoothly. There were a few months of planning and scheduling, but the actual installation was completed in about a week. It was a very noisy (cutting a hole through concrete is not quiet!) and chaotic week, but, the process was quick, and the city inspection was painless.

Solar & Storage

The installation of solar panels and battery storage was a lengthy ordeal. Sunrun proposed a system with LONGI solar panels, two Tesla Powerwalls, a SolarEdge inverter, and a Tesla gateway. Despite the simplicity of the plan, we encountered several complications right away.

First, a main panel upgrade was required. Although we had installed the Span smart panel to avoid this, Sunrun insisted on the upgrade and offered to cover the cost. Our utility PG&E took over a year (!!) to approve our request, which started a domino of delays.

After PG&E’s approval, Sunrun discovered that local ordinances needed a concrete pad to be poured and safety fence erected around the panel, requiring a subcontractor and yet more coordination.

After the concrete pad was in place and the panel installed, we faced another wait for PG&E to connect the new setup. Ironically, during this wait, I received a request from Sunrun to pour another concrete pad. This was, thankfully, a false alarm and occurred because the concrete pad / safety fence work had not been logged in Sunrun’s tracking system!

The solar and storage installation itself took only a few days, but during commissioning, a technician found that half the panels weren’t connected properly, necessitating yet another visit before Sunrun could request an inspection from the city.

Sadly, we failed our first city inspection. Sunrun’s team had missed a local ordinance that required the Powerwalls to have a minimum distance between them and the sealing off of vents within a certain distance from each Powerwall. This necessitated yet another visit from Sunrun’s crew, and another city inspection (which we thankfully passed).

The final step was obtaining Permission to Operate (PTO) from PG&E. The application for this was delayed due to a clerical error. About four weeks after submission, we finally received approval.

Seeing the flow of solar electricity in my Span app (below) almost brought a tear to my eye. Finally!

EV Charger

When my wife bought a Nissan Ariya in early 2023, it came with a year of free charging with EVgo. We hoped this would allow us enough time to install solar before needing our own EV charger. However, the solar installation took longer than expected (by over a year!), so we had to expedite the installation of a home charger.

Span connected us with the same electrician who installed our smart panel. Within two weeks of our free charging plan expiring, the Span Drive was installed. The process was straightforward, with only two notable complications we had to deal with:
1. The 20 ft cable on the Span Drive sounds longer than it is in practice. We adjusted our preferred installation location to ensure it comfortably reached the Ariya’s charging port.
2. The Span software initially didn’t recognize the Span Drive after installation. This required escalated support from Span to reset the software, costing the poor electrician who had expected the commissioning step to be a few minute affair to stick around my home for several hours.
Result

So, “was it worth it?” Yes! There are significant environmental (our carbon footprint is meaningfully lower) benefits. But there were also quality of life improvements and financial gains from these investments in what are just fundamentally better appliances.

Quality of Life

Our programmable, internet-connected water heater allows us to adjust settings for vacations, saving energy and money effortlessly. It also lets us program temperature cycles to avoid peak energy pricing, heating water before peak rates hit.

With the new heat pumps, our home now has air conditioning, which is becoming increasingly necessary in the Bay Area’s warmer summers. Improved vents and insulation have also made our home (and, in particular, our kids’ rooms) more comfortable. We’ve also found that the heat from the heat pumps is more even and less drying compared to the old gas furnace, which created noticeable hot spots.

Backup power during outages is another significant benefit. Though we haven’t had to use it since we received permission to operate, we had an accidental trial run early on when a Sunrun technician let our batteries be charged for a few days in the winter. During two subsequent outages in the ensuing months, our system maintained power to our essential appliances, ensuring our kids didn’t even notice the disruptions!

The EV charger has also been a welcome change. While free public charging was initially helpful, reliably finding working and available fast chargers could be time-consuming and stressful. Now, charging at home is convenient and cost-effective, reducing stress and uncertainty.

Financial

There are two financial aspects to consider: the cost savings from replacing gas-powered appliances with electric ones and the savings from solar and storage.

On the first, the answer is not promising.

The chart below comes from our PG&E bill for Jan 2023. It shows our energy usage year-over-year. After installing the heat pumps in late October 2022, our natural gas consumption dropped by over 98% (from 5.86 therms/day to 0.10), while our electricity usage more than tripled (from 15.90 kWh/day to 50.20 kWh/day). Applying the conversion of 1 natural gas therm = ~29 kWh of energy shows that our total energy consumption decreased by over 70%, a testament to the much higher efficiency of heat pumps.

Our PG&E bill from Feb 2023 (for Jan 2023)

Surprisingly, however, our energy bills remained almost unchanged despite this! The graph below shows our PG&E bills over the 12 months ending in Jan 2023. Despite a 70% reduction in energy consumption, the bill stayed roughly the same. This is due to the significantly lower cost of gas in California compared to the equivalent amount of energy from electricity. It highlights a major policy failing in California: high electricity costs (relative to gas) will deter households from switching to greener options.

Our PG&E bill from Feb 2023 (for Jan 2023)

Solar, however, is a clear financial winner. With our prepaid lease, we’d locked in savings compared to 2022 rates (just by dividing the total prepaid lease amount by the expected energy production over the lifetime of the lease), and these savings have only increased as PG&E’s rates have risen (see chart below).

PG&E Rates 2022 vs 2024 (Source: PG&E; Google Sheet)

Batteries, on the other hand, are much less clear-cut financially due to their high initial cost and only modest savings from time-shifting electricity use. However, the peace of mind from having backup power during outages is valuable (not to mention the fact that, without a battery, solar panels can’t be used to power your home during an outage), and, with climate change likely to increase both peak/off-peak rate disparities and the frequency of outages, we believe this investment will pay off in the long run.

Taking Advantage of Time of Use Rates

Time of Use (TOU) rates, like PG&E’s electric vehicle time of use rates, offer a smart way to reduce electricity costs for homes with solar panels, energy storage, and smart automation. This approach has fundamentally changed how we manage home energy use. Instead of merely conserving energy by using efficient appliances or turning off devices when not needed, we now view our home as a giant configurable battery. We “save” energy when it’s cheap and use it when it’s expensive.
- Backup Reserve: We’ve set our Tesla Powerwall to maintain a 25% reserve. This ensures we always have a good supply of backup power for essential appliances (roughly 20 hours for our highest priority circuits by the Span app’s latest estimates) during outages
- Summer Strategy: During summer, our Powerwall operates in “Self Power” mode, meaning solar energy powers our home first, then charges the battery, and lastly any excess goes to the grid. This maximizes the use of our “free” solar energy. We also schedule our heat pumps to run during midday when solar production peaks and TOU rates are lower. This way, we “store” cheaper energy in the form of pre-chilled or pre-heated air and water which helps maintain the right temperatures for us later (when the energy is more expensive).
- Winter Strategy: In winter, we will switch the Powerwall to “Time-Based Control.” This setting preferentially charges the battery when electricity is cheap and discharges it when prices are high, maximizing the financial value of our solar energy during the months where solar production is likely to be limited.
This year will be our first full cycle with all systems in place, and we expect to make adjustments as rates and energy usage evolve. For those considering home electrification, hopefully these strategies give hints to what is possible to improve economic value of your setup.

Takeaways
- Two years is too long: The average household might not have started this journey if they knew the extent of time and effort involved. This doesn’t even consider the amount of carbon emissions from running appliances off grid energy due to the delays. Streamlining the process is essential to make electrification more accessible and appealing.
- Align gas and electricity prices with climate goals: The current pricing dynamics make it financially challenging for households to switch from gas appliances to greener options like heat pumps. To achieve California’s ambitious climate goals, it’s crucial to align the cost of electricity more closely with electrification.
- Streamline permitting: Electrification projects are slowed by complex, inconsistent permitting requirements across different jurisdictions. Simplifying and unifying these processes will reduce time and costs for homeowners and their contractors.
- Accelerate utility approvals: The two-year timeframe was largely due to delays from our local utility, PG&E. As utilities lack incentives to expedite these processes, regulators should build in ways to encourage utilities to move faster on home electrification-related approvals and activities, especially as many homes will likely need main panel upgrades to properly electrify.
- Improve financing accessibility: High upfront costs make it difficult for households to adopt electrification, even when there are significant long-term savings. Expanding financing options (like Sunrun’s leases) can encourage more households to invest in these technologies. Policy changes should be implemented so that even smaller installers have the ability to offer attractive financing options to their clients.
- Break down electrification silos: Coordination between HVAC specialists, solar installers, electricians, and smart home companies is sorely missing today. As a knowledgeable early adopter, I managed to integrate these systems on my own, but this shouldn’t be the expectation if we want broad adoption of electrification. The industry (in concert with policymakers) should make it easier for different vendors to coordinate and for the systems to interoperate more easily in order to help homeowners take full advantage of the technology.
This long journey highlighted to me, in a very visceral way, both the rewards and practical challenges of home electrification. While the environmental, financial, and quality-of-life benefits are clear, it’s also clear that we have a ways to go on the policy and practical hurdles before electrification becomes an easy choice for many more households. I only hope policymakers and technologists are paying attention. Our world can’t wait much longer.
Building a Personalized News Reader with AI

April 29, 2024
Summary

I built an AI-powered news reader (GitHub link) end-to-end tailored to my preferences with the goal of surfacing high quality content (rather than clickbait). It included:
- Data flow architecture encompassing scraping, AI model training & inference, frontend, and backend components.
- Deep neural network model for content rating, integrating a pre-trained language model
- Automated processes for scraping, regular model training, and batch inference using a serverless platform to maintain continuously refreshed and improving data at reasonable cost
- Use of PostgreSQL and the pgvector extension for robust data storage & serving and vector-based queries
- A FastAPI backend to serve API requests and integrate authentication
- A lightweight frontend using Preact and HTM with features like optimistic rendering and catching unauthenticated API calls
- The code was written in part by ChatGPT and my writeup includes learnings on how to use ChatGPT for this type of work
Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post and that have been reflected at the Github

Motivation

I am a knowledge junkie. There are few things as intellectually stimulating to me as reading high quality articles from the internet.

It should come as no surprise that I was an ardent user of Google Reader from its early days. For many years, it would not have been an exaggeration to say that the simple feed-based interface of Google Reader was the most important thing on the entire internet to me. And, consequently, I was incredibly saddened when Google ultimately killed it.

But, as much as I love (and continued to use) RSS readers and other social/algorithmic applications which have since filled the gap, these solutions suffer from a couple of key flaws:
- Many sites today don’t have RSS feeds. Take the essays of Paul Graham. Amazing content. No RSS feed whatsoever.
- RSS feeds have no sense of priority or relevance. In practice, RSS feeds are usually populated with all the content a service has put out. The result is, for many users of RSS readers, a great deal of time spent guessing which articles are worth reading and filtering out those that are less interesting.
- Social/algorithmic feeds optimize for engagement/clicks, not quality. These platforms (i.e. Facebook, Reddit, Twitter) generally reward the controversial clickbait that leads to viral sharing and emotional responses, because that’s what typically drives their business models. I wanted quality (or at least my definition of quality), favoring reading 1 great article over clicking on 10 and starting a flame war on 3 of them.
What I wanted was a service which would:
1. Deliver content from sources I care about
2. … filtered/ordered based on whether or not they were worthwhile
3. … with context (like a summary) so that I would know if something was worthwhile prior to reading it (and why)
4. … that would learn from my preferences over time
This project came from a desire to build the service that I wanted.

But, it was also a chance to build something from scratch of moderate complexity end-to-end that would take advantage of my existing AI, Python, and product skills as well as push me to learn some new ones. In some of those new areas (like front-end development), I also wanted to see what it would be like to lean on an LLM (Large Language Model) like ChatGPT to help write the code.

The story of how I did it 👇🏻

Architecture

Data Flow Architecture (Pink: frontend; Green: web backend; Yellow: AI model training/serving; Blue: database; Purple: scraper; White: storage)

I started by thinking through the data flow that would be needed to make an application like this work:
1. Scrapers (purple) — Need to run on a regular basis parsing RSS feeds, sitemaps (which relay to scrapers like Bing/Google everything that is on a webpage), and Google News feeds (which get content sites listed on Google News) for lists of new content. That content would then be parsed by other scrapers ultimately resulting in the core information needed for the algorithm to work stored on a queue.
2. AI model inference (yellow) — The queue would then be processed by (1) an AI model to devise scores for the content and (2) an LLM to generate a summary for the user. This information would then be stored in a database (blue) for use by the application.
3. Frontend (pink) / Web backend (green) — The web application would, through interfacing with the database (blue), surface the highest ranking content as well as store any of the user’s ratings and read status activity.
4. AI model training (yellow) — On a regular basis, new ratings data provided by the user and new articles would be pulled from the database (blue) and used to fine-tune the existing model so that changes in preferences and new data collected could be used to further improve performance. Once fine-tuned, a subset of articles would have their scores revisited to make sure that these new improvements would make it back to the application user.
AI

The heart of the application is the use of AI to rate content on “worthwhileness” and also provide sufficient context to the user to make that judgement.

Why not Just Use an LLM?

With OpenAI, Google, and Anthropic making their LLMs easily accessible via API, early on, I explored the possibility of using LLMs to rate content. However, I quickly realized that, while the LLMs were well-suited at topic categorization and article summarization, the rating problem was not going to be solved easily via LLM.

LLMs work best when there is a clear and precise definition of what they need to do. Because human preferences are difficult to explain precisely (even to the person with the preferences), it was not obvious to me how to perform the prompt engineering needed for such an LLM-centric approach to work across users and time (as preferences change).

Model Architecture

Instead, I took a more “traditional” deep neural network modeling approach to the rating problem. Here, I used the newly released Keras 3. I went with Keras both due to my personal familiarity (having previously used Keras for other projects), but also my view that Keras’s functional approach would make it much easier to experiment with different model architectures and to conduct “model surgery” (cutting and moving around pieces of a model). It also helped that Keras 3 had been re-designed to be multi-backend, meaning it now supported PyTorch and JAX (as well as Tensorflow), making my code potentially more portable and future proof as new tools and libraries emerged.

Model architecture (output from keras.utils.plot_model)

The model architecture I arrived at (see above) utilized three strategies to achieve a respectable outcome:
1. Neural collaborative filter — While I currently don’t expect this project to be used by many users, I wanted to build a collaborative filter style system where every rating in the system helped improve the ratings for every other user and article. In the architecture I chose, the model simultaneously learns article embeddings (representing the text of the content) and user embeddings (representing the preferences of the user). The combination of the two is then used to make a prediction about a given article for a given user.
  
  While I initially wanted the article embeddings and user embeddings to have the same dimension (so that I could simply yield scores using a dot product), I was ultimately unable to achieve that and went with fully connected deep neural network layers instead to produce the rating.
2. Taking advantage of an already pretrained language model — To take advantage of previously learned context, I incorporated the pretrained “backbone” of RoBERTa, an effort by Facebook researchers to optimize the BERT model first popularized by Google. This was very easy to do with Keras-NLP and became the key step my architecture used to create article embeddings without needing to try untested architectures on much larger datasets to get to solid predictions.
3. Simultaneous training on two tasks — In previous work, I learned that training a model to succeed at two different and only slightly related tasks at the same time can create a model which performs better on both tasks than had two different models been trained on each task individually.
  
  My “head-canon” on why this works is that simultaneously training a model on two different tasks likely pushes the model to find generalizable intermediates which can apply to both tasks.
  
  In the described model architecture above, there are two outputs: one corresponding to the rating prediction task (where the model will evaluate the likely score on a piece of content and compare it with actual user ratings), and, the other, a length prediction task (where the model tries to guess the length of the piece of content in question from just a 512 “token” snippet). The theory being that both tasks are quite different but still related (being able to ascertain the length of a piece from a snippet requires the model to understand aspects of writing style and narrative that are probably related to things like topic, argument, and relevance).
  
  This approach was also something that drew me as it allowed me to start with relatively limited training data (since rating many articles by hand is time consuming) because it’s very easy to assemble the data for the length of an article.
Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post, which include migrating the model to a JAX backend while staying on Keras 3; simplifying the overall model architecture (it now uses a dot product for the neural collaborative filter, and it no longer requires separate input data pipelines to train on both tasks simultaneously)

Training & Inference

These models were originated on Google CoLab before being run “in production” on the serverless platform Modal. For training, I utilized CoLab notebooks with V100 and L4 GPUs on their “high-memory” setting during experimentation (and, for later fine-tuning in production, Modal instances with L4 GPUs). To facilitate training, I would extract the relevant scraped information into CSV and/or in-memory data and then used Tensorflow’s tf.data API to build on-CPU data pipelines (to efficiently feed the GPUs) and which are natively supported by Keras’s easy-to-use Model.fit() and Model.evaluate() APIs.

For inference, I carried out some simple “model surgery”, extracting only the rating task / part of the model, and then used Keras’s built-in support for int8 quantization to reduce the memory required and improve the overall performance. I also used Modal’s image initialization and memory snapshot coupled with their performant distributed storage offering to reduce cold start time.

Some relatively simple experimentation showed that running inference on GPUs on Modal was only slightly faster (probably because the cold start and network latency that comes with Modal’s serverless architecture dominates the time needed) but significantly more expensive than running inference on CPU (I used images with access to 8-cores)

Summarization and Topic Extraction

For summarization and topic extraction, I used Anthropic’s new Claude 3 Haiku model. Haiku is known for being fast and cheap, charging only $0.25 per million tokens on input and $1.25 per million tokens on the output, making them very popular for AI uses not needing the capabilities of a much larger LLM.

Automation

Modal provides simple cron type functionality allowing you to schedule different processes to run in advance. For this project, I created two schedules to “run the system on autopilot”:
1. A scraper schedule — a scraping “job” is initiated every few hours which starts by reading the RSS feeds / sitemap / Google News feeds of sites of interest to find new articles. This would then trigger individual page scrapers to scrape the new articles in parallel. The output from these scrapers would then be pushed onto a Modal queue and a final inference job would be run to rate each article and push the results onto the database.
2. A training schedule — twice a month, user ratings from the database would be pulled to create a training set to fine tune the existing model. If the model’s performance exceeds a certain threshold, the resulting model parameters would be saved so that future inference calls would be directed to the new model. Finally, the model would be rerun on newer and previously highly rated articles to make sure that users are getting the most up to date scores.
While simplistic, these scheduled tasks allow the compute-intensive training and inference functions to be more efficiently run as batch jobs on a regular basis such that the model continuously improves as new data comes in.

Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post, which include adding a daily cleanUp routine to re-calculate certain pre-fetched article scores as well as to make sure there are no articles with missing algorithmic ratings or summaries

Database

To store and serve the data necessary for model training, inference, and application serving, I needed a database.

I chose PostgreSQL. It’s known for its performance and stability and is widely used (with battle-tested libraries for deep integration into Python). It also supports complex JOIN operations, something I’d need to be able to surface the right content with the right attributes to a given user (something many “NoSQL” and key-value and vector-based databases can struggle with).

The PostgreSQL community had released the pgvector extension in 2021 which enables native vector operations (like the kind I would want to use with the user embeddings and article embeddings I was working with).

There are many hosted Postgres solutions, but I went with Supabase. They’re a widely used vendor, have many features beyond their core database offering which I might want to use someday, and, at the center of their offerings, is a full (non-serverless) Postgres database with a generous free tier and the ability to enable extensions (like pgvector) and to either connect directly to the underlying Postgres database or use their API. It also helped that they had nice web interface to create and modify tables, compose and run SQL queries, inspect and export query results, and even visualize data schema (see below).

Database architecture (output from Supabase’s Schema Visualizer tool)

Given the data needs of the application, only three tables were needed (see above):
- One for Users of the service which stores authentication information, the user embedding, and a log of the exponential moving average article embedding that the user has recently read (in order to surface more novel content).
- One for Articles which stores information about an article to surface in the application and during training.
- One for Article-Users — In addition to being linked to rows in the Users table and Articles table, this table stores the information on the algorithm’s ratings for a given user (to sort/filter articles), the user’s own ratings (used for training the model), and the user’s read status on the article in question (used to make sure the application doesn’t serve previously read articles).
Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post, which include adding a Sources table to track individual sources and enable source-specific feeds and adding a database index on pre-calculated article ranking scores to speed up article retrieval. The blog post also includes an updated database schema visualization

Backend

To power the application’s backend, I went with FastAPI. It conforms to the ASGI spec that Modal already supports and is a high performance Python framework that can serve both the API the web front-end would use to pull/push data as well as serve the application HTML itself. FastAPI also comes with all the type validation & conversion and native JSON handling needed, reducing the amount of code I’d need to write.

There are four core actions the backend needs to support:
1. Authentication: FastAPI has an easy-to-use dependency injection system which makes it easy to require authentication on every API call and make it completely opaque to the function handling the call. I simply tied JSON Web Token creation to the login process and required the backend to see the right access token corresponding to a user when loading pages (as a cookie parameter) or interfacing with the API (as bearer tokens).
2. Fetching articles: This is where PostgreSQL’s ability to do more sophisticated JOINs comes in handy. I simply pulled the articles unread by a user sorted by a weighted combination of the AI’s rating, the recency of the article (the older, the worse it does), and how similar the article embedding is to an exponential moving average of the articles the user has just read (the more similar to what the user read, the worse it does). The Fetch API also takes an offset parameter to reduce the likelihood that the API returns pages more than once in a given session.
3. Marking an article as read: Setting the flag on the articleuser table for this is straightforward. What was more complicated was updating the exponential moving average on the user (in an attempt to show fresher topics).
4. Rating an article: The application allows users to rate an article with one of three ratings: 👍🏻(+1, this was worth the time), 👎🏻(+0, this was not worth the time), and 🤷🏻‍♂️(+0.5, not so bad I hate it, but not good enough to say it was worth the time).
Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post, which include pre-calculating article rankings and adding a database index on the pre-calculated article ranking scores to speed up article retrieval

Frontend

I have always been intimidated by frontend development. The complex build systems, the odd (to me, at least) data flow and syntax of modern Typescript/Node/React, and the need to interface with finicky CSS and design elements all seemed impossibly complex for this Python dabbler.

Looking for the “least painful way” forward, I chose Preact + HTM to give me the right mix of simplicity and power:
- It’s lightweight (Preact is 3 kb, adding HTM adds <1 kb more)
- Does not require any complicated build systems or compilation steps so that it can easily be incorporated into a single HTML page (served by my backend)
- Supports core React semantics around components and virtual DOM to make interactivity and state management easy
How to Get ChatGPT to be your Contract Software Engineer

While I had gone through the (very good) Preact tutorial, one tutorial does not make you an expert frontend developer. I knew I would need help. I took advantage of this opportunity to test out ChatGPT (using GPT 3.5 on an unpaid version) as a “cheap outsourced software engineer”.

The fact that I have code that works at all is a testament to the power of LLMs and the availability of high quality code-related text on the internet on which these LLMs are trained. But, it was not an easy, direct path. In the hopes this helps others who have thought or are thinking about doing something similar, here are some tips:
- Modern LLMs have very long context windows. Use them to provide huge amounts of context. LLMs do not have mindreading abilities, nor do they really have any context on what you’re interested in. But, they have long context windows (the amount of text they “remember” from your conversation). I found out, early on, that starting all of my conversations with ChatGPT (or a similar LLM) with long, precisely written descriptions was key to getting good results as it provided the context necessary for the LLM to ground its responses. Case in point: my starting point consisted of 1-2 paragraphs of introduction, 10+ bullet points on features, and another 1-2 paragraphs explaining exactly what I wanted the LLM to do for me. I found that skimping on this level of detail would result in the LLMs generating overly generic code that would require a great deal of follow-on work to use.
- You should probably only use LLM coding with conventionally popular frameworks that have been around for a while. LLMs only “know” what they’ve been trained on. If there are fewer websites dedicated to a coding framework / language or little content on the problem you’re facing, the LLM will likely be unable to answer your questions. Sadly, this means newer programming tricks and niche tools/frameworks are less reliably handled. This is part of the reason I went with something so React-like, even if some of the newer tools had attractive functionality.
- You should have a basic understanding of the coding you’re asking the LLM to do. While I was impressed with the breadth and quality of the code ChatGPT produced for me, there was no escaping that some 10-20% of the time there were issues with its output. Sometimes, this was as simple to fix as asking ChatGPT to fill out placeholder code it chose to omit. But sometimes this involved painstaking troubleshooting and testing. While ChatGPT was reasonably good at using my reports of error messages and broken behavior to fix its code, there were quite a number of times where it was either stumped or could not identify the right fix. Those times I was happy I had taken the Preact tutorial and had a solid understanding of programming fundamentals as I could fix the issue myself or give a specific command to the LLM on what to do next. Without that, I think I would’ve ultimately faced too many stumbling blocks to have gotten to a functioning application.
- Sometimes you just have to restart the conversation. Because an LLM eventually “runs out” of context memory (especially since its output can be long stretches of code), I found that an ongoing discussion with an LLM would eventually become less productive as the LLM would forgot context I had shared earlier in the discussion. When that happened, I would start over with a new message thread sharing the old context and adding new paragraphs and bullet points that were directly relevant to the task at hand. This seemed to yield better results and is something I’d advise doing for anyone using an LLM for coding.
With this, I was able to have the LLM write at least 90% of the HTML file (including the CSS and Javascript needed for my application). The 10% I had to write myself consisted mainly of:
- fixing a few issues the LLM struggled to handle (for example, it kept passing extra parameters to an event handler function which broke the functionality)
- hand-tuning some of the CSS to achieve the look I wanted (it was faster than explaining to the LLM what I wanted)
- debugging and writing my own Jinja2 templating code to prepopulate the application on initial load (even after repeated Google searching, I couldn’t find examples of people who use a Python backend template to provide initial data to a Javascript frontend)
- debugging the API calls — for some reason, LLM introduced multiple errors in the chained callbacks that I had to ultimately step in to fix
Screenshot of app at mobile screen-width

Architecture

The LLM helped create a relatively straightforward architecture for the code consisting of a feed-container <div> to hold feed-item‘s representing the individual articles. Each feed-item had event handlers to capture when a link or button had been pressed which would initiate an authenticated API call to the backend to mark an article as read or rate an article.

Various elements had state which would be reflected their appearance. A read article would have a grayed out look. The current applied ranking would show up as a pressed button (see right).

Preload / Templating

While the app was initially conceived of as a blank feed which would pull articles from the backend after initializing, I quickly realized during testing that would add a few seconds of latency. As a result, I moved to a model where feed articles would show up upon first load.

I did this by converting fetch results from the database into Javascript objects and fed them directly into the HTML through Python’s Jinja2 templating system (something I didn’t see many references to while searching online; see code below).

This allowed me to keep most of my existing frontend and backend code while still cutting down on the time a user had to wait to see the content.
```
const initialFeedData = [
  {% for item in fetchItems %}
  {
    articleId: {{ item.articleId }},
    title: "{{ item.title|safe }}",
    articleUrl: "{{ item.articleUrl }}",
    source: "{{ item.source }}",
    author_href: "{{ item.author_href }}",
    author_name: "{{ item.author_name|safe}}",
    date: "{{ item.date }}",
    score: {{ item.score }},
    summary: "{{ item.summary }}",
    rating: null,
    read: false
  }{{ "," if not loop.last }}
  {% endfor %}
];Code language: JavaScript (javascript)
```
Optimistic Rendering and Catching Unauthenticated API Calls

I also implemented optimistic rendering after some testing revealed that waiting for the backend to confirm an action made the user experience lag significantly. Instead, the frontend now assumes a state change (like marking an article as read or rating it) is instantaneous (happens right after the user action) and is only reversed in the rare case the backend returns an error. This made the app feel much snappier (even if a Modal cold start was delaying the backend response).

Intercepting events in this fashion also allowed me to add 401 response (signifying that a call was not properly authenticated) handling which would result in the session being ended and the user being kicked back to the login page.
```
onMarkAsRead = async (articleId, status) => {
  try {
      // Optimistically update the item's read state
      this.setState(prevState => ({
          items: prevState.items.map(item => {
              if (item.articleId === articleId) {
                  return { ...item, read: status };
              }
              return item;
          })
      }));
      const response = await fetch('/read', {
          method: 'POST',
          headers: {
              'Content-Type': 'application/json',
              'Authorization': `Bearer ${this.getAccessToken()}`
          },
          body: JSON.stringify({ articleId, status })
      });

      if (!response.ok) {
          if (response.status === 401) {
              window.location.href = '/login?message=Login+expired';
          }
          throw new Error('Failed to mark article as read/unread');
      }
  } catch (error) {
      console.error('Error marking article as read/unread:', error);
      // Revert the item's read state if the backend request fails
      this.setState(prevState => ({
          items: prevState.items.map(item => {
              if (item.articleId === articleId) {
                  return { ...item, read: !status };
              }
              return item;
          })
      }));
      throw error;
  }
};Code language: JavaScript (javascript)
```
Future Directions

This project started out as an experiment to see if I could build a service that I knew I wanted. In that respect, it has succeeded! Since this project has been online, I’ve stopped using my RSS reader and have seen my use of Twitter/X and the Android Home News feed decline.

But that doesn’t mean there aren’t additional directions I intend to take it.
- Performing lighter-weight training updates more frequently. My current model training setup is too overkill to run too frequently. But, there are some permutations of my current training path that I would like to experiment with which should reduce the compute load while still achieving performance improvements. These include
  - Just training updates to user embeddings and article “encoder”
  - Creating “dummy users” who are only interested in one topic or source or range of article lengths as a means of rapidly creating more partially relevant user data
- Training a dot product version of the collaborative filter. Another way to potentially increase performance is to train a version of the collaborative filter which does not require a downstream fully connected layer to generate an accurate rating score. While my previous efforts did not succeed, I had much less data at the time and could readily combine this with the “dummy users” idea in the last point. This would allow me to push more of the burden of finding matches to the database’s vector operations instead of an expensive batch compute process that happens after each training.
- Expanding scraper site coverage. My backend is currently only scraping a small handful of sites (mostly those I used to follow via RSS). With the success my rating algorithm has shown, it makes sense to attempt to scrape broader ranges of sites (for instance paywall-limited snippets) to see how it performs.
- Making the scrapers and endpoints fully async. Modal and FastAPI both support async function calls (a type of concurrency that can increase performance when many of the tasks that need to be accomplished have to wait for network latency). Given that much of the time spent by the backend is on waiting for database response, upgrading my Python database driver from the older psycopg2 to the newer, async-compatible psycopg3 should be able to help with performance and load.
- Using the topic information. Currently, topics are distilled by Anthropic’s Claude 3 Haiku model, but, other than being shown next to the article summary, are not really used by the backend. There is an opportunity here to take advantage of those topics as a means of filtering the feed or as a different input into the model training process.
If you are interested, reach out to me at mail-at-this-domain to learn more (especially if you’re interested in using this as a paying customer — I know this is what I want but I am less sure if there are other people who have the same interests).

I’ve also posted the core code to make this work onto GitHub.

Update (2024 Nov 21): Since first writing this, I’ve made a number of updates which I have described in this blog post, which also includes updated priorities for next steps
Backup Your Home Server with Duplicati

April 24, 2024
(Note: this is part of my ongoing series on cheaply selfhosting)

Through some readily available Docker containers and OpenMediaVault, I have a cheap mini-PC which serves as:
- ad blocker for all the devices in my household
- media streamer (so I can play movies and read downloaded ebooks/documents anywhere that has internet access)
- acts as my personal RSS/newsreader
- handles every PDF related operation you can think of
- functions as network storage for my family
But, over time, as the server has picked up more uses, it’s also become a vulnerability. If any of the drives on my machine ever fail, I’ll lose data that is personally (and sometimes economically) significant.

I needed a home server backup plan.

Duplicati

Duplicati is open source software that helps you efficiently and securely backup specific partitions and folders to any destination. This could be another home server or it can be a cloud service provider (like Amazon S3 or Backblaze B2 or even a consumer service like Dropbox, Google Drive, and OneDrive). While there are many other tools that can support backup, I went with Duplicati because I wanted:
- Support for consumer storage services as a target: I am a customer of Google Drive (through Google One) and Microsoft 365 (which comes with generous OneDrive subscription) and only intend to backup some of the files I’m currently storing (mainly some of the network storage I’m using to hold important files)
- A web-based control interface so I could access this from any computer (and not just whichever machine had the software I wanted)
- An active user forum so I could find how-to guides and potentially get help
- Available as a Docker container on linuxserver.io: linuxserver.io is well-known for hosting and maintaining high quality and up-to-date Docker container images
Installation

Update 2024 Dec 18: One reason Duplicati is a great solution is that it is actively being developed. However, occasionally this can introduce breaking changes. Since version 2.0.9.105, Duplicati now requires a password. This has required an update to the Docker compose setup to include an Encryption Key, a Password, and an earlier update required Nginx proxy to pass additional headers to handle the Websocket-based interface the web interface now uses to keep its interface dynamic. I’ve changed the text below to reflect these

To install Duplicati on OpenMediaVault:
- If you haven’t already, make sure you have OMV Extras and Docker Compose installed (refer to the section Docker and OMV-Extras in my previous post, you’ll want to follow all 10 steps as I refer to different parts of the process throughout this post) and have a static local IP address assigned to your server.
- Login to your OpenMediaVault web admin panel, and then go to [Services > Compose > Files] in the sidebar. Press the button in the main interface to add a new Docker compose file.
  
  Under Name put down Duplicati and under File, adapt the following (making sure the number of spaces are consistent)
```
---
services:
   duplicati:
     image: lscr.io/linuxserver/duplicati:latest
     container_name: duplicati
     ports:
       - <unused port number>:8200
     environment:
       - TZ: 'America/Los_Angeles'
       - PUID=<UID of Docker User>
       - PGID=<GID of Docker User>
       - DUPLICATI__WEBSERVICE_PASSWORD=<Password to access interface>
       - SETTINGS_ENCRYPTION_KEY=<random set of at least 8 characters/numbers>
     volumes:
       - <absolute paths to folders to backup>:<names to use in Duplicati interface>
       - <absolute path to shared config folder>/Duplicati:/config
     restart: unless-stopped
Code language: YAML (yaml)
```
- Under ports:, make sure to add an unused port number (I went with 8200).
  
  Replace <absolute path to shared config folder> with the absolute path to the config folder where you want Docker-installed applications to store their configuration information (accessible by going to [Storage > Shared Folders] in the administrative panel).
  
  You’ll notice there’s extra lines under volumes: for <absolute paths to folders to backup>. This should correspond with the folders you are interested in backing up. You should map them to names that will show up in the Duplicati interface that you recognize. For example, I directed my <absolute path to shared config folder> to /containerconfigs as one of the things I want to make sure I backup are my containers.
  
  Once you’re done, hit Save and you should be returned to your list of Docker compose files for the next step. Notice that the new Duplicati entry you created has a Down status, showing the container has yet to be initialized.
- To start your Duplicati container, click on the new Duplicati entry and press the (up) button. This will create the container, download any files needed, and run it.
  
  To show it worked, go to your-servers-static-ip-address:8200 from a browser that’s on the same network as your server (replacing 8200 if you picked a different port in the configuration file above) and you should see the Duplicati web interface which should look something like below
- You can skip this step if you didn’t set up Pihole and local DNS / Nginx proxy or if you don’t care about having a user-readable domain name for Duplicati. But, assuming you do and you followed my instructions, open up WeTTy (which you can do by going to wetty.home in your browser if you followed my instructions or by going to [Services > WeTTY] from OpenMediaVault administrative panel and pressing Open UI button in the main panel) and login as the root user. Run:
```
cd /etc/nginx/conf.d
ls
nano <your file name>.confCode language: Shell Session (shell)
```
- This opens up the text editor nano with the file you just listed. Use your cursor to go to the very bottom of the file and add the following lines (making sure to use tabs and end each line with a semicolon)
```
server {
    listen             80;
    server_name        <duplicati.home or the domain you'd like to use>;
    location / {
        proxy_pass             http://<your-server-static-ip>:<duplicati port no.>;
        proxy_http_version     1.1;
        proxy_set_header       Upgrade $http_upgrade;
        proxy_set_header       Connection "upgrade";
        proxy_set_header       Host $host;
        proxy_set_header       X-Real-IP $remote_addr;
        proxy_set_header       X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header       X-Forwarded-Proto $scheme;
    }
}Code language: HTML, XML (xml)
```
- And then hit Ctrl+X to exit, Y to save, and Enter to overwrite the existing file. Then in the command line run the following to restart Nginx with your new configuration loaded.
```
systemctl restart nginx
```
- Now, if your server sees a request for duplicati.home (or whichever domain you picked), it will direct them to Duplicati. With the additional proxy_http_version and proxy_set_header‘s, it will also properly forward the Websocket requests the web interface uses.
- Login to your Pihole administrative console (you can just go to pi.hole in a browser) and click on [Local DNS > DNS Records] from the sidebar. Under the section called Add a new domain/IP combination, fill out under Domain: the domain you just added above (i.e. duplicati.home) and next to IP Address: you should add your server’s static IP address. Press the Add button and it will show up below.
- To make sure it all works, enter the domain you just added (duplicati.home if you went with my default) in a browser and you should see the Duplicati interface!
Configuring your Backups

Duplicati conceives of each “backup” as a “source” (folder of files to backup), a “destination” (the place the files should be backed up to), a schedule (how often does the backup run), and some options to configure how the backup works.

After logging in (with the password you specified in the Docker compose file), to configure a “backup”, click on +Add Backup button on the menu on the lefthand side. I’ll show you the screens I went through to backup my Docker container configurations:
1. Add a name (I called it DockerConfigs) and enter a Passphrase (you can use the Generate link to create a strong password) which you’d use to restore from backup. Then hit Next
2. Enter a destination. Here, you can select another computer or folder connected to your network. You can also select an online storage service.
  
  I’m using Microsoft OneDrive — for a different service, a quick Google search or a search of the Duplicati how-to forum can give you more specific instructions, but the basic steps of generating an AuthID link appear to be similar across many services.
  
  I selected Microsoft OneDrive v2 and picked a path in my OneDrive for the backup to go to (Backup/dockerconfigs). I then clicked on the AuthID link and went through an authentication process to formally grant Duplicati access to OneDrive. Depending on the service, you may need to manually copy a long string of letters and numbers and colons into the text field. After all of that, to prove it all worked, press Test connection!
  
  Then hit Next
3. Select the source. Use the folder browsing widget on the interface to select the folder you wish to backup.
  
  If you recall in my configuration step, I mapped the <absolute path to shared config folder> to /containerconfigs which is why I selected this as a one-click way to backup all my Docker container configurations. If necessary, feel free to shut down and delete your current container and start over with a configuration where you point and map the folders in a better way.
  
  Then hit Next
4. Pick a schedule. Do you want to backup every day? Once a week? Twice a week? Since my docker container configurations don’t change that frequently, I decided to schedule weekly backups on Saturday early morning (so it wouldn’t interfere with something else I might be doing).
  
  Pick your option and then hit Next
5. Select your backup options. Unless you have a strong reason to, I would not change the remote volume size from the default (50 MB). The backup retention, however, is something you may want to think about. Duplicati gives you the option to hold on to every backup (something I would not do unless you have a massive amount of storage relative to the amount of data you want to backup), to hold on to backups younger than a certain age, to hold on to a specific number of backups, or customized permutations of the above.
  
  The option you should choose depends on your circumstances, but to share what I did. For some of my most important files, I’m using Duplicati’s smart backup retention option (which gives me one backup from the last week, one for each of the last 4 weeks, and one for each of the last 12 months). For some of my less important files (for example, my docker container configurations), I’m holding on to just the last 2 weeks worth of backups.
  
  Then hit Save and you’re set!
I hope this helps you on your self-hosted backup journey.

If you’re interested in how to setup a home server on OpenMediaVault or how to self-host different services, check out all my posts on the subject!
Why Intel has to make its foundry business work

April 10, 2024

Historically, Intel has (1) designed and (2) manufactured its chips that it sells (primarily into computer and server systems). It prided itself on having the most advanced (1) designs and (2) manufacturing technology, keeping both close to its chest.

In the late 90s/00s, semiconductor companies increasingly embraced the “fabless model”, whereby they would only do the (1) design while outsourcing the manufacturing to foundries like TSMC. This made it much easier and less expensive to build up a burgeoning chip business and is the secret to the success of semiconductor giants like NVIDIA and Qualcomm.

Companies like Intel scoffed at this, arguing that the combination of (1) design and (2) manufacturing gave their products an advantage, one that they used to achieve a dominant position in the computing chip segment. And, it’s an argument which underpins why they have never made a significant effort in becoming a contract manufacturer — after all, if part of your technological magic is the (2) manufacturing, why give it to anyone else?

The success of TSMC has brought a lot of questions about Intel’s advantage in manufacturing and, given recent announcements by Intel and the US’s CHIPS Act, a renewed focus on actually becoming a contract manufacturer to the world’s leading chip designers.

While much of the attention has been paid to the manufacturing prowess rivalry and the geopolitical reasons behind this, I think the real reason Intel has to make the foundry business work is simple: their biggest customers are all becoming chip designers.

While a lot of laptops and desktops and servers are still sold in the traditional fashion, the reality is more and more of the server market is being dominated by a handful of hyperscale data center operators like Amazon, Google, Meta/Facebook, and Microsoft, companies that have historically been able to obtain the best prices from Intel because of their volume. But, in recent years, in the chase for better and better performance and cost and power consumption, they have begun designing their own chips adapted to their own systems (as this latest Google announcement for Google’s own ARM-based server chips shows).

Are these chips as good as Intel’s across every dimension? Almost certainly not. It’s hard to overtake a company like Intel’s decades of design prowess and market insight. But, they don’t have to be. They only have to be better at the specific use case Google / Microsoft / Amazon / etc need it to be for.

And, in that regard, that leaves Intel with really only one option: it has to make the foundry business work, or it risks losing not just the revenue from (1) designing a data center chip, but from the (2) manufacturing as well.

Axion processors combine Google’s silicon expertise with Arm’s highest performing CPU cores to deliver instances with up to 30% better performance than the fastest general-purpose Arm-based instances available in the cloud today, up to 50% better performance and up to 60% better energy-efficiency than comparable current-generation x86-based instances¹. That’s why we’ve already started deploying Google services like BigTable, Spanner, BigQuery, Blobstore, Pub/Sub, Google Earth Engine, and the YouTube Ads platform on current generation Arm-based servers and plan to deploy and scale these services and more on Axion soon.

Introducing Google Axion Processors, our new Arm-based CPUs
Amin Vahdat | Google Blog
Geothermal data centers

March 8, 2024
The data centers that power AI and cloud services are limited by 3 things:
- the server hardware (oftentimes limited by access to advanced semiconductors)
- available space (their footprint is massive which makes it hard to put them close to where people live)
- availability of cheap & reliable (and, generally, clean) power
If you, as a data center operator, can tap a new source of cheap & reliable power, you will go very far as you alleviate one of the main constraints on the ability to add to your footprint.

It’s no small wonder, then, that Google is willing to explore partnerships with next-gen geothermal startups like Fervo in a meaningful long-term fashion.

But Google is hoping the road to commercialization for next-generation geothermal will mimic the early days of the tech industry’s solar procurement efforts, Maud Texier, global director of clean energy and decarbonization development, told Latitude Media.

“While there are some important differences between solar and geothermal technologies, we would like to see geothermal power follow a similar trajectory as solar has over the last few decades in terms of rapid cost declines and performance improvements,” Texier said.

Google: Geothermal could follow ‘similar trajectory’ to solar
Maeve Allsup | Latitude Media
The IE6 YouTube conspiracy

February 28, 2024

An oldie but a goodie — the story of how the YouTube team, post-Google acquisition, put up a “we won’t support Internet Explorer 6 in the future” message without any permission from anyone. (HT: Eric S)

IE6 had been the bane of our web development team’s existence. At least one to two weeks every major sprint cycle had to be dedicated to fixing new UI that was breaking in IE6. Despite this pain, we were told we had to continue supporting IE6 because our users might be unable to upgrade or might be working at companies that were locked in. IE6 users represented around 18% of our user base at that point. We understood that we could not just drop support for it. However, sitting in that cafeteria, having only slept about a few hours each in the previous days, our compassion for these users had completely eroded away. We began collectively fantasizing about how we could exact our revenge on IE6. One idea rose to the surface that quickly captured everyone’s attention. Instead of outright dropping IE6 support, what if we just threatened to? How would users react? Would they revolt against YouTube? Would they mail death threats to our team like had happened in the past? Or would they suddenly become loud advocates of modern browsers?

A Conspiracy to Kill IE6
Chris Zacharias
NVIDIA to make custom AI chips? Tale as old as time

February 18, 2024

Every standard products company (like NVIDIA) eventually gets lured by the prospect of gaining large volumes and high margins of a custom products business.

And every custom products business wishes they could get into standard products to cut their dependency on a small handful of customers and pursue larger volumes.

Given the above and the fact that NVIDIA did used to effectively build custom products (i.e. for game consoles and for some of its dedicated autonomous vehicle and media streamer projects) and the efforts by cloud vendors like Amazon and Microsoft to build their own Artificial Intelligence silicon it shouldn’t be a surprise to anyone that they’re pursuing this.

Or that they may eventually leave this market behind as well.

While using NVIDIA’s A100 and H100 processors for AI and high-performance computing (HPC) instances, major cloud service providers (CSPs) like Amazon Web Services, Google, and Microsoft are also advancing their custom processors to meet specific AI and general computing needs. This strategy enables them to cut costs as well as tailor capabilities and power consumption of their hardware to their particular needs. As a result, while NVIDIA’s AI and HPC GPUs remain indispensable for many applications, an increasing portion of workloads now run on custom-designed silicon, which means lost business opportunities for NVIDIA. This shift towards bespoke silicon solutions is widespread and the market is expanding quickly. Essentially, instead of fighting custom silicon trend, NVIDIA wants to join it.

Report: NVIDIA Forms Custom Chip Unit for Cloud Computing and More
Anton Shilov | Anandtech
Selfhosting FreshRSS

January 15, 2024
(Note: this is part of my ongoing series on cheaply selfhosting)

It’s been a few months since I started down the selfhosting/home server journey. Thanks to Docker, it has been relatively smooth sailing. Today, I have a cheap mini-PC based server that:
- blocks ads / online trackers on all devices
- stores and streams media (even for when I’m out of the house)
- acts as network storage (for our devices to store and share files)
- serves as a personal RSS/newsreader
The last one is new since my last post and, in the hopes that this helps others exploring what they can selfhost or who maybe have a home server and want to start deploying services, I wanted to share how I set up FreshRSS, a self-hosted RSS reader (on an OpenMediaVault v6 server)

Why a RSS Reader?

Like many who used it, I was a massive Google Reader fan. Until 2013 when it was unceremoniously shut down, it was probably the most important website I used after Gmail.

I experimented with other RSS clients over the years, but found that I did not like most commercial web-based clients (which were focused on serving ads or promoting feeds I was uninterested in) or desktop clients (which were difficult to sync between devices). So, I switched to other alternatives (i.e. Twitter) for a number of years.

FreshRSS

Wanting to return to the simpler days where I could simply follow the content I was interested in, I stumbled on the idea of self-hosting an RSS reader. Looking at the awesome-selfhosted feed reader category, I looked at the different options and chose to go with FreshRSS for a few reasons:
- It had the most Github stars of any feed reader — an imperfect but reasonable sign of a well-liked project.
- It has been around for over 10 years! and as a result has attracted a community of extension developers and even cloud RSS service providers — suggests the product has real stability and traction which will be helpful if things go wrong
- It’s available as a Docker container on linuxserver.io — linuxserver.io is well-known for hosting and maintaining high quality and up-to-date Docker container images
Installation

To install FreshRSS on OpenMediaVault:
- If you haven’t already, make sure you have OMV Extras and Docker Compose installed (refer to the section Docker and OMV-Extras in my previous post, you’ll want to follow all 10 steps as I refer to different parts of the process throughout this post) and have a static local IP address assigned to your server.
- Login to your OpenMediaVault web admin panel, and then go to [Services > Compose > Files] in the sidebar. Press the button in the main interface to add a new Docker compose file.
  
  Under Name put down FreshRSS and under File, adapt the following (making sure the number of spaces are consistent)
```
version: "2.1"
services:
   freshrss:
     container_name: freshrss
     image: lscr.io/linuxserver/freshrss:latest
     ports:
       - <unused port number like 3777>:80
     environment:
       - TZ: 'America/Los_Angeles'
       - PUID=<UID of Docker User>
       - PGID=<GID of Docker User>
     volumes:
       - '<absolute path to shared config folder>/FreshRSS:/config'
     restart: unless-stopped
```
  You’ll need to replace <UID of Docker User> and <GID of Docker User> with the UID and GID of the Docker user you created (which will be 1000 and 100 if you followed the steps I laid out, see Step 10 in the section “Docker and OMV-Extras” in my initial post)
  
  I live in the Bay Area so I set the timezone TZ to America/Los_Angeles. You can find yours here.
  
  Under ports:, make sure to add an unused port number (I went with 3777).
  
  Replace <absolute path to shared config folder> with the absolute path to the config folder where you want Docker-installed applications to store their configuration information (accessible by going to [Storage > Shared Folders] in the administrative panel).
  
  Once you’re done, hit Save and you should be returned to your list of Docker compose files for the next step. Notice that the new FreshRSS entry you created has a Down status, showing the container has yet to be initialized.
- To start your FreshRSS container, click on the new FreshRSS entry and press the (up) button. This will create the container, download any files needed, and run it.
  
  And that’s it! To prove it worked, go to your-servers-static-ip-address:3777 from a browser that’s on the same network as your server (replacing 3777 if you picked a different port in the configuration above) and you should see the FreshRSS installation page (see below)
- You can skip this step if you didn’t (as I laid out in my last post) set up Pihole and local DNS / Nginx proxy or if you don’t care about having a user-readable domain name for FreshRSS. But, assuming you do and you followed my instructions, open up WeTTy (which you can do by going to wetty.home in your browser if you followed my instructions or by going to [Services > WeTTY] from OpenMediaVault administrative panel and pressing Open UI button in the main panel) and login as the root user. Run:
```
cd /etc/nginx/conf.d
ls
```
  Pick out the file you created before for your domains and run
```
nano <your file name>.conf
```
  This opens up the text editor nano with the file you just listed. Use your cursor to go to the very bottom of the file and add the following lines (making sure to use tabs and end each line with a semicolon)
```
server {
    listen             80;
    server_name        <rss.home or the domain you'd like to use>;
    location / {
        proxy_pass     http://<your-server-static-ip>:<FreshRSS port number>;
    }
}
```
  And then hit Ctrl+X to exit, Y to save, and Enter to overwrite the existing file. Then in the command line run the following to restart Nginx with your new configuration loaded.
```
systemctl restart nginx
```
  Now, if your server sees a request for rss.home (or whichever domain you picked), it will direct them to FreshRSS.
  
  Login to your Pihole administrative console (you can just go to pi.hole in a browser) and click on [Local DNS > DNS Records] from the sidebar. Under the section called Add a new domain/IP combination, fill out under Domain: the domain you just added above (i.e. rss.home) and next to IP Address: you should add your server’s static IP address. Press the Add button and it will show up below.
  
  To make sure it all works, enter the domain you just added (rss.home if you went with my default) in a browser and you should see the FreshRSS installation page.
- Completing installation is easy. Thanks to the use of Docker, all of your PHP and files will be configured accurately so you should be able to proceed with the default options. Unless you’re planning to store millions of articles served to dozens of people, the default option of SQLite as database type should be sufficient in Step 3 (see below)
  
  This leaves the final task of configuring a username and password (and, again, unless you’re serving this to many users whom you’re worried will hack you, the default authentication method of Web form will work)
  
  Finally, press Complete installation and you will be taken to the login page:
Advice

Once you’ve logged in with the username and password you just set, the world is your oyster. If you’ve ever used an RSS reader, the interface is pretty straightforward, but the key is to use the Subscription management button in the interface to add RSS feeds and categories as you see fit. FreshRSS will, on a regular basis, look for new content from those feeds and put it in the main interface. You can then step through and stay up to date on the sites that matter to you. There are a lot more features you can learn about from the FreshRSS documentation.

On my end, I’d recommend a few things:
- How to find the RSS feed for a page — Many (but not all) blog/news pages have RSS feeds. The most reliable way to find it is to right click on the page you’re interested in from your browser and select View source (on Chrome you’d hit Ctrl+U). Hit Ctrl+F to trigger a search and look for rss. If there is an RSS feed, you’ll see something that says "application/rss+xml" and near it will usually be a URL that ends in /rss or /feed or something like that (my blog, for instance, hosted on benjamintseng.com has a feed at benjamintseng.com/rss).
  - Once you open up the feed,
- Learn the keyboard shortcuts — they’re largely the same as found on Gmail (and the old Google Reader) but they make using this much faster:
  - j to go to the next article
  - k to go to the previous article
  - r to toggle if something is read or not
  - v to open up the original page in a new tab
- Use the normal view, sorted oldest first — (you do this by tapping the Settings gear in the upper-right of the interface and then selecting Reading under Configuration in the menu). Even though I’ve aggressively curated the feeds I subscribe to, there is a lot of material and the “normal view” allows me to quickly browse headlines to see which ones are more worth my time at a glance. I can also use my mouse to selectively mark somethings as read so I can take a quick Inbox Zero style approach to my feeds. This allows me to think of the j shortcut as “move forward in time” and the k shortcut as “move backwards” and I can use the pulldown menu next to Mark as read button to mark content older than one day / one week as read if I get overwhelmed.
- Subscribe to good feeds — probably a given, but here are a few I follow to get you started:
  - xkcd (feed URL): classic webcomic
  - Saturday Morning Breakfast Cereal (feed URL): another great webcomic
  - Paul Graham’s Essays (feed URL): love or hate him, Paul Graham does some great work writing very simple but thoughtful pieces
  - Collaborative Fund blog (feed URL): Morgan Housel is one of my favorite writers and he is responsible for most of the posts here
  - Benjamin Tseng’s blog (feed URL): you’re already here, aren’t you? 😇 You can also subscribe via email. I write about interesting things I’m reading, how-to guides like this, and my thoughts on tech / science / finance
I hope this helps you get started!

(If you’re interested in how to setup a home server on OpenMediaVault or how to self-host different services, check out all my posts on the subject)
Consulting / Advisory

October 22, 2023
Hi! My name is Benjamin Tseng. My clients work with me because of my deep experience in:
- Startup Advisory — I’ve spent 15+ years investing with two cross-border VC firms (DCM and 1955 Capital) in deeptech companies and in leadership / advisory roles at several VC-backed startups
- Product Management — I’ve taken on product leadership roles at several VC-backed startups including Yik Yak (consumer social), Maximus (telemedicine), Clint Health (health IT), and Stir (creator economy/fintech)
- Market Analysis / Investment Due Diligence — I started my career at Bain doing strategic analysis for Fortune 500 semiconductor and eCommerce clients. I subsequently drove investment due diligence processes at two VC firms where I specialized in deeptech and healthcare opportunities.
- AI / ML work — I am a published researcher who’s applied AI/ML methods to electronic medical record data and have also built products powered by NLP and LLMs.
For more about my background, take a look at my CV. For examples of projects I’ve done in the past and am open to taking on, see Types of Client Work below.

If you’re interested in working with me in any of these capacities, please direct inquiries to [mail-at-<thisdomainname.com>].

Types of Client Work
- Early Stage Product Management & Strategy
  - Build actionable product plans that account for operational & regulatory complexity (e.g. integration with customer support/ops; integrations with 3rd parties like Stripe or Plaid or an EMR; handling KYC/AML; addressing HIPAA; complying with US telemedicine regulations; etc), unit economics / market analysis, and market research
  - Collaborate with engineers, designers, and other stakeholders on
    - Rapid prototyping for product discovery
    - 0-to-1 new product development and lightweight process creation
    - Product improvement pushes to address architecture/strategy issues and expand product reach
- Expert Technology / Market Analysis
  - Conduct market analysis and unit economics assessment as part of a strategic planning or investment due diligence process
  - Objectively assess novel technologies and translate findings into business insights
- Startup Advisory & Strategic Planning
  - Work with management to build actionable strategic plans that account for current and expected business activities, future financing needs, likely execution risks, and the need for stakeholder support
  - Assist management teams with fundraising strategy from deck creation to cap table & financial modeling to syndicate formation and negotiation
- Analytics and Metrics
  - Create or overhaul an existing metrics plan to help companies understand their product/business and devise better strategy
  - Collaborate with stakeholders to select analytics stacks and implement dashboards to realize a metrics plan
  - Execute on bespoke analyses (retention, lifetime value, segmentation, clustering, sentiment analysis, etc.) to answer key strategic and operational questions
- Data Science / Machine Learning / Artificial Intelligence
  - Leverage ML and deep learning/AI methods to tackle classification and prediction problems
  - Build LLM-powered applications leveraging both publicly available LLMs (i.e. from OpenAI, Claude, and Google Gemini) and open source LLMs run on controlled infrastructure (i.e. Llama 3)
  - Scrape publicly available datasets / web pages for information to power data products or AI/ML models
Pixel’s Parade of AI

October 7, 2023

I am a big Google Pixel fan, being an owner and user of multiple Google Pixel line products. As a result, I tuned in to the recent MadeByGoogle stream. While it was hard not to be impressed with the demonstrations of Google’s AI prowess, I couldn’t help but be a little baffled…

What was the point of making everything AI-related?

Given how low Pixel’s market share is in the smartphone market, you’d think the focus ought to be on explaining why “normies” should buy the phone or find the price tag compelling, but instead every feature had to tie back to AI in some way.

Don’t get me wrong, AI is a compelling enabler of new technologies. Some of the call and photo functionalities are amazing, both as technological demonstrations but also in terms of pure utility for the user.

But, every product person learns early that customers care less about how something gets done and more about whether the product does what they want it too. And, as someone who very much wants a meaningful rival to Apple and Samsung, I hope Google doesn’t forget that either.

But while Google can call itself an AI company all it likes, people ultimately just want phones filled with useful features. At a certain point, it risks putting the AI technology cart in front of the feature horse.

Google’s Pixel 8 launch was a parade of AI
Jon Porter | The Verge
Setting Up Pihole, Nginx Proxy, and Twingate with OpenMediaVault

August 10, 2023
(Note: this is part of my ongoing series on cheaply selfhosting)

I recently shared how I set up a (OpenMediaVault) home server on a cheap mini-PC. After posting it, I received a number of suggestions that inspired me to make a few additional tweaks to improve the security and usability of my server.

Read more if you’re interested in setting up (on an OpenMediaVault v6 server):
- Pihole, a “DNS filter” that blocks ads / trackers
- using Pihole as a local DNS server to have custom web addresses for software services running on your network and Nginx to handle port forwarding
- Twingate (a better alternative to opening up a port and setting up Dynamic DNS to grant secure access to your network)
Pihole

Pihole is a lightweight local DNS server (it gets its name from the Raspberry Pi, a <$100 device popular with hobbyists, that it can run fully on).

A DNS (or Domain Name Server) converts human readable addresses (like www.google.com) into IP addresses (like 142.250.191.46). As a result, every piece of internet-connected technology is routinely making DNS requests when using the internet. Internet service providers typically offer their own DNS servers for their customers. But, some technology vendors (like Google and CloudFlare) also offer their own DNS services with optimizations on speed, security, and privacy.

A home-grown DNS server like Pihole can layer additional functionality on top:
- DNS “filter” for ad / tracker blocking: Pihole can be configured to return dummy IP addresses for specific domains. This can be used to block online tracking or ads (by blocking the domains commonly associated with those activities). While not foolproof, one advantage this approach has over traditional ad blocking software is that, because this blocking happens at the network level, the blocking extends to all devices on the network (such as internet-connected gadgets, smart TVs, and smartphones) without needing to install any extra software.
- DNS caching for performance improvements: In addition to the performance gains from blocking ads, Pihole also boosts performance by caching commonly requested domains, reducing the need to “go out to the internet” to find a particular IP address. While this won’t speed up a video stream or download, it will make content from frequently visited sites on your network load faster by skipping that internet lookup step.
To install Pihole using Docker on OpenMediaVault:
- If you haven’t already, make sure you have OMV Extras and Docker Compose installed (refer to the section Docker and OMV-Extras in my previous post) and have a static local IP address assigned to the server.
- Login to your OpenMediaVault web admin panel, go to [Services > Compose > Files], and press the button. Under Name put down Pihole and under File, adapt the following (making sure the number of spaces are consistent)
```
version: "3"
services:
   pihole:
     container_name: pihole
     image: pihole/pihole:latest
     ports:
       - "53:53/tcp"
       - "53:53/udp"
       - "8000:80/tcp"
     environment:
       TZ: 'America/Los_Angeles'
       WEBPASSWORD: '<Password for the web admin panel>'
       FTLCONF_LOCAL_IPV4: '<your server IP address>'
     volumes:
       - '<absolute path to shared config folder>/pihole:/etc/pihole'
       - '<absolute path to shared config folder>/dnsmasq.d:/etc/dnsmasq.d'
     restart: unless-stopped
```
  You’ll need to replace <Password for the web admin panel> with the password you’ll want to use to be access the Pihole web configuration interface, <your server IP address> with the static local IP address for your server, and <absolute path to shared config folder> with the absolute path to the config folder where you want Docker-installed applications to store their configuration information (accessible by going to [Storage > Shared Folders] in the administrative panel).
  
  I live in the Bay Area so I set timezone TZ to America/Los_Angeles. You can find yours here.
  
  Under Ports, I’ve kept the port 53 reservation (as this is the standard port for DNS requests) but I’ve chosen to map the Pihole administrative console to port 8000 (instead of the default of port 80 to avoid a conflict with the OpenMediaVault admin panel default). Note: This will prevent you from using Pihole’s default pi.hole domain as a way to get to the Pihole administrative console out-of-the-box. Because standard web traffic goes to port 80 (and this configuration has Pihole listening at port 8080), pi.hole would likely just direct you to the OpenMediaVault panel. While you could let pi.hole take over port 80, you would need to move OpenMediaVault’s admin panel to a different port (which itself has complexity). I ultimately opted with keeping OpenMediaVault at port 80 knowing that I could configure Pihole and Nginx proxy (see below) to redirect pi.hole to the right port.
  
  You’ll notice this configures two volumes, one for dnsmasq.d, which is the DNS service, and one for pihole which provides an easy way to configure dnsmasq.d and download blocklists.
  
  Note: the above instructions assume your home network, like most, is IPv4 only. If you have an IPv6 network, you will need to add an IPv6: True line under environment: and replace the FTLCONF_LOCAL_IPV4:'<server IPv4 address>' with FTLCONF_LOCAL_IPV6:'<server IPv6 address>'. For more information, see the official Pihole Docker instructions.
  
  Once you’re done, hit Save and you should be returned to your list of Docker compose files for the next step. Notice that the new Pihole entry you created has a Down status, showing the container has yet to be initiated.
- Disabling systemd-resolved: Most modern Linux operating systems include a built-in DNS resolver that listens on port 53 called systemd-resolved. Prior to initiating the Pihole container, you’ll need to disable this to prevent that port conflict. Use WeTTy (refer to the section Docker and OMV-Extras in my previous post) or SSH to login as the root user to your OpenMediaVault command line. Enter the following command:
```
nano /etc/systemd/resolved.conf
```
  Look for the line that says #DNSStubListener=yes and replace it with DNSStubListener=no, making sure to remove the # at the start of the line. (Hit Ctrl+X to exit, Y to save, and Enter to overwrite the file). This configuration will tell systemd-resolved to stop listening to port 53.
  
  To complete the configuration change, you’ll need to edit the symlink /etc/resolv.conf to point to the file you just edited by running:
```
sh -c 'rm /etc/resolv.conf && ln -s /run/systemd/resolve/resolv.conf /etc/resolve.conf'
```
  Now all that remains is to restart systemd-resolved:
```
systemctl restart systemd-resolved
```
- How to start / update / stop / remove your Pihole container: You can manage all of your Docker Compose files by going to [Services > Compose > Files] in the OpenMediaVault admin panel. Click on the Pihole entry (which should turn it yellow) and press the (up) button. This will create the container, download any files needed, and, if you properly disabled systemd-resolved in the last step, initiate Pihole.
  
  And that’s it! To prove it worked, go to your-server-ip:8000 in a browser and you should see the login for the Pihole admin webpage (see below).
  
  From time to time, you’ll want to update the container. OMV makes this very easy. Every time you press the (pull) button in the [Services > Compose > Files] interface, Docker will pull the latest version (maintained by the Pihole team).
Now that you have Pihole running, it is time to enable and configure it for your network.
- Test Pihole from a computer: Before you change your network settings, it’s a good idea to make sure everything works.
  - On your computer, manually set your DNS service to your Pihole by putting in your server IP address as the address for your computer’s primary DNS server (Mac OS instructions; Windows instructions; Linux instructions). Be sure to leave any alternate / secondary addresses blank (many computers will issue DNS requests to every server they have on their list and if an alternative exists you may not end up blocking anything).
  - (Temporarily) disable any ad blocking service you may have on your computer / browser you want to test with (so that this is a good test of Pihole as opposed to your ad blocking software). Then try to go to https://consumerproductsusa.com/ — this is a URL that is blocked by default by Pihole. If you see a very spammy website promising rewards, either your Pihole does not work or you did not configure your DNS correctly.
  - Finally login to the Pihole configuration panel (your-server-ip:8000) using the password you set up during installation. From the dashboard click on the Queries Blocked box at the top (your colors may vary but it’s the red box on my panel, see below).
    
    On the next screen, you should see the domain consumerproductsusa.com next to the IP address of your computer, confirming that the address was blocked.
    
    You can now turn your ad blocking software back on!
  - You should now set the DNS service on your computer back to “automatic” or “DHCP” so that it will inherit its DNS settings from the network/router (and especially if this is a laptop that you may use on another network).
- Configure DNS on router: Once you’ve confirmed that the Pihole service works, you should configure the default DNS settings on your router to make Pihole the DNS service for your entire network. The instructions for this will vary by router manufacturer. If you use Google Wifi as I do, here are the instructions.
  
  Once this is completed, every device which inherits DNS settings from the router will now be using Pihole for their DNS requests.
  
  Note: one downside of this approach is that the Pihole becomes a single point of failure for the entire network. If the Pihole crashes or fails, for any reason, none of your network’s DNS requests will go through until the router’s settings are changed or the Pihole becomes functional again. Pihole generally has good reliability so this is unlikely to be an issue most of the time, but I am currently using Google’s DNS as a fallback on my Google Wifi (for the times when something goes awry with my server) and I would also encourage you to know how to change the DNS settings for your router in case things go bad so that your access to the internet is not taken out unnecessarily.
- Configure Pihole: To get the most out of Pihole’s ad blocking functionality, I would suggest three things
  - Select Good Upstream DNS Servers: From the Pihole administrative panel, click on Settings. Then select the DNS tab. Here, Pihole allows you to configure which external DNS services the DNS requests on your network should go to if they aren’t going to be blocked and haven’t yet been cached. I would recommend selecting the checkboxes next to Google and Cloudflare given their reputations for providing fast, secure, and high quality DNS services (and selecting multiple will provide redundancy).
  - Update Gravity periodically: Gravity is the system by which Pihole updates its list of domains to block. From the Pihole administrative panel, click on [Tools > Update Gravity] and click the Update button. If there are any updates to the blocklists you are using, these will be downloaded and “turned on”.
  - Configure Domains to block/allow: Pihole allows administrators to granularly customize the domains to block (blacklist) or allow (whitelist). From the Pihole administrative panel, click on Domains. Here, an admin can add a domain (or a regular expression for a family of domains) to the blacklist (if it’s not currently blocked) or the whitelist (if it currently is) to change what happens when a user on the network accesses the DNS.
    
    I added whitelist exclusions for link.axios.com to let me click through links from the Axios email newsletters I receive and www.googleadservices.com to let my wife click through Google-served ads. Pihole also makes it easy to manually take a domain that a device on your network has requested to block/allow. Tap on Total Queries from the Pihole dashboard, click on the IP address of the device making the request, and you’ll see every DNS request (including those which were blocked) with a link beside them to add to the domain whitelist or blacklist.
    
    Pihole will also allow admins to configure different rules for different sets of devices. This can be done by calling out clients (which can be done by clicking on Clients and picking their IP address / MAC address / hostnames), assigning them to groups (which can be defined by clicking on Groups), and then configuring domain rules to go with those groups (in Domains). Unfortunately because Google Wifi simply forwards DNS requests rather than distributes them, I can only do this for devices that are configured to directly point at the Pihole, but this could be an interesting way to impose parental internet controls.
Now you have a Pihole network-level ad blocker and DNS cache!

Local DNS and Nginx proxy

As a local DNS server, Pihole can do more than just block ads. It also lets you create human readable addresses for services running on your network. In my case, I created one for the OpenMediaVault admin panel (omv.home), one for WeTTy (wetty.home), and one for Ubooquity (ubooquity.home).

If your setup is like mine (all services use the same IP address but different ports), you will need to set up a proxy as DNS does not handle port forwarding. Luckily, OpenMediaVault has Nginx, a popular web server with a performant proxy, built-in. While many online tutorials suggest installing Nginx Proxy Manager, that felt like overkill, so I decided to configure Nginx directly.

To get started:
- Configure the A records for the domains you want in Pihole: Login to your Pihole administrative console (your-server-ip:8000) and click on [Local DNS > DNS Records] from the sidebar. Under the section called Add a new domain/IP combination, fill out the Domain: you want for a given service (like omv.home or wetty.home) and the IP Address: (if you’ve been following my guides, this will be your-server-ip). Press the Add button and it will show up below. Repeat for all the domains you want. If you have a setup similar to mine, you will see many domains pointed at the same IP address (because the different services are simply different ports on my server).
  
  To test if these work, enter any of the domains you just put in to a browser and it should take you to the login page for the OpenMediaVault admin panel (as currently they are just pointing at your server IP address).
  
  Note 1: while you can generally use whatever domains you want, it is suggested that you don’t use a TLD that could conflict with an actual website (i.e. .com) or that are commonly used by networking systems (i.e. .local or .lan). This is why I used .home for all of my domains (the IETF has a list they recommend, although it includes .lan which I would advise against as some routers such as Google Wifi use this)
  
  Note 2: Pihole itself automatically tries to forward pi.hole to its web admin panel, so you don’t need to configure that domain. The next step (configuring proxy port forwarding) will allow pi.hole to work.
- Edit the Nginx proxy configuration: Pihole’s Local DNS server will send users looking for one of the domains you set up (i.e. wetty.home) to the IP address you configured. Now you need your server to forward that request to the appropriate port to get to the right service.
  
  You can do this by taking advantage of the fact that Nginx, by default, will load any .conf file in the /etc/nginx/conf.d/ directory as a proxy configuration. Pick any file name you want (I went with dothome.conf because all of my service domains end with .home) and after using WeTTy or SSH to login as root, run:
```
nano /etc/nginx/conf.d/<your file name>.conf
```
  The first time you run this, it will open up a blank file. Nginx looks at the information in this file for how to redirect incoming requests. What we’ll want to do is tell Nginx that when a request comes in for a particular domain (i.e. ubooquity.home or pi.hole) that request should be sent to a particular IP address and port.
  
  Manually writing these configuration files can be a little daunting and, truth be told, the text file I share below is the result of a lot of trial and error, but in general there are 2 types of proxy commands that are relevant for making your domain setup work.
  
  One is a proxy_pass where Nginx will basically take any traffic to a given domain and just pass it along (sometimes with additional configuration headers). I use this below for wetty.home, pi.hole, ubooquityadmin.home, and ubooquity.home. It worked without the need to pass any additional headers for WeTTy and Ubooquity, but for pi.hole, I had to set several additional proxy headers (which I learned from this post on Reddit).
  
  The other is a 301 redirect where you tell the client to simply forward itself to another location. I use this for ubooquityadmin.home because the actual URL you need to reach is not / but /admin/ and the 301 makes it easy to setup an automatic forward. I then use the regex match ~ /(.*)$ to make sure every other URL is proxy_pass‘d to the appropriate domain and port.
  
  You’ll notice I did not include the domain I configured for my OpenMediaVault console (omv.home). That is because omv.home already goes to the right place without needing any proxy to port forward.
```
server {
        listen        80;
        server_name   pi.hole;
        location / {
                proxy_pass        http://<your-server-ip>:8000;
                proxy_set_header  Host            $host;
                proxy_set_header  X-Real-IP       $host;
                proxy_set_header  X-ForwardedFor  $proxy_add_x_forwarded_for;           
                proxy_hide_header X-Frame-Options;
                proxy_set_header  X-Frame-Options "SAMEORIGIN";
                proxy_read_timeout 90;
       }
}
server {
        listen        80;
        server_name   wetty.home;
        location / {
                proxy_pass       http://<your-server-ip>:2222;
                proxy_set_header  Host            $host;
                proxy_set_header  X-Real-IP       $host;
                proxy_set_header  X-ForwardedFor  $proxy_add_x_forwarded_for;           
        }
}
server {
        listen        80;
        server_name   ubooquity.home;
        location / {
                proxy_pass       http://<your-server-ip>:2202;
        }
}
server {
        listen        80;
        server_name   ubooquityadmin.home;
        location =/ {
                return 301 http://ubooquityadmin.home/admin;
        }
        location ~ /(.*)$ {
                proxy_pass        http://<your-server-ip>:2203/$1;
        }
}
```
  If you are using other domains, ports, or IP addresses, adjust accordingly. Be sure all your curly braces have their mates ({}) and that each line ends with a semicolon (;) or Nginx will crash. I use Tab‘s between statements (i.e. between listen and 80) to format them more nicely but Nginx will accept any number or type of whitespace.
  
  To test if your new configuration worked, save your changes (hit Ctrl+X to exit, Y to save, and Enter to overwrite the file if you are editing a pre-edited one). In the command line, run the following command to restart Nginx with your new configuration loaded.
```
systemctl restart nginx
```
  Try to login to your OpenMediaVault administrative panel in a browser. If that works, it means Nginx is up and running and you at least didn’t make any obvious syntax errors!
  
  Next try to access one of the domains you just configured (for instance pi.hole) to test if the proxy was configured correctly.
  
  If either of those steps failed, use WeTTy or SSH to log back in to the command line and use the command above to edit the file (you can delete everything if you want to start fresh) and rerun the restart command after you’ve made changes to see if that fixes it. It may take a little bit of doing if you have a tricky configuration but once you’re set, everyone on the network can now use your configured addresses to access the services on your network.
Twingate

In my previous post, I set up Dynamic DNS and a Wireguard VPN to grant secure access to the network from external devices (i.e. a work computer, my smartphone when I’m out, etc.). While it worked, the approach had two flaws:
1. The work required to set up each device for Wireguard is quite involved (you have to configure it on the VPN server and then pass credentials to the device via QR code or file)
2. It requires me to open up a port on my router for external traffic (a security risk) and maintain a Dynamic DNS setup that is vulnerable to multiple points of failure and could make changing domain providers difficult.
A friend of mine, after reading my post, suggested I look into Twingate instead. Twingate offers several advantages, including:
- Simple graphical configuration of which resources should be made available to which devices
- Easier to use client software with secure (but still easy to use) authentication
- No need to configure Dynamic DNS or open a port
- Support for local DNS rules (i.e. the domains I configured in Pihole)
I was intrigued (it didn’t hurt that Twingate has a generous free Starter plan that should work for most home server setups). To set up Twingate to enable remote access:
- Create a Twingate account and Network: Go to their signup page and create an account. You will then be asked to set up a unique Network name. The resulting address, <yournetworkname>.twingate.com, will be your Network configuration page from where you can configure remote access.
- Add a Remote Network: Click the Add button on the right-hand-side of the screen. Select On Premise for Location and enter any name you choose (I went with Home network).
- Add Resources: Select the Remote Network you just created (if you haven’t already) and use the Add Resource button to add an individual domain name or IP address and then grant access to a group of users (by default, it will go to everyone).
  
  With my configuration, I added 5 domains (pi.hole + the four .home domains I configured through Pihole) and 1 IP address (for the server, to handle the ubooquityadmin.home forwarding and in case there was ever a need to access an additional service on my server that I had not yet created a domain for).
- Install Connector Docker Container: To make the selected network resources available through Twingate requires installing a Twingate Connector to something internet-connected on the network.
  
  Press the Deploy Connector button on one of the connectors on the right-hand-side of the Remote Network page (mine is called flying-mongrel). Select Docker in Step 1 to get Docker instructions (see below). Then press the Generate Tokens button under Step 2 to create the tokens that you’ll need to link your Connector to your Twingate network and resources.
  
  With the Access Token and Refresh Token saved, you are ready to configure Docker to install. Login to the OpenMediaVault administrative panel and go to [Services > Compose > Files] and press the button. Under Name put down Twingate Connector and under File, enter the following (making sure the number of spaces are consistent)
```
services:
   twingate_connector:
     container_name: twingate_connector
     restart: unless-stopped
     image: "twingate/connector:latest"
     environment:
       - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
       - TWINGATE_API_ENDPOINT=/connector.stock
       - TWINGATE_NETWORK=<your network name>
       - TWINGATE_ACCESS_TOKEN=<your connector access token>
       - TWINGATE_REFRESH_TOKEN=<your connector refresh token>
       - TWINGATE_LOG_LEVEL=7
```
  You’ll need to replace <your network name> with the name of the Twingate network you created, <your connector access token> and <your connector refresh token> with the access token and refresh token generated from the Twingate website. Do not add any single or double quotation marks around the network name or the tokens as they will result in a failed authentication with Twingate (as I was forced to learn through experience).
  
  Once you’re done, hit Save and you should be returned to your list of Docker compose files. Click on the entry for Twingate Connector you just created and then press the (up) button to initialize the container.
  
  Go back to your Twingate network page and select the Remote Network your Connector is associated with. If you were successful, within a few moments, the Connector’s status will reflect this (see below for the before and after).
  
  If, after a few minutes there is still no change, you should check the container logs. This can be done by going to [Services > Compose > Services] in the OpenMediaVault administrative panel. Select the Twingate Connector container and press the (logs) button in the menubar. The TWINGATE_LOG_LEVEL=7 setting in the Docker configuration file sets the Twingate Connector to report all activities in great detail and should give you (or a helpful participant on the Twingate forum) a hint as to what went wrong.
- Add Users and Install Clients: Once the configuration is done and the Connector is set up, all that remains is to add user accounts and install the Twingate client software on the devices that should be able to access the network resources.
  
  Users can be added (or removed) by going to your Twingate network page and clicking on the Team link in the menu bar. You can Add User (via email) or otherwise customize Group policies. Be mindful of the Twingate Starter plan limit to 5 users…
  
  As for the devices, the client software can be found at https://get.twingate.com/. Once installed, to access the network, the user will simply need to authenticate.
- Remove my old VPN / Dynamic DNS setup. This is not strictly necessary, but if you followed my instructions from before, you can now undo those by:
  - Closing the port you opened from your Router configuration
  - Disabling Dynamic DNS setup from your domain provider
  - “Down”-ing and deleting the container and configuration file for DDClient (you can do this by going to [Services > Compose > Files] from OpenMediaVault admin panel)
  - Deleting the configured Wireguard clients and tunnels (you can do this by going to [Services > Wireguard] from the OpenMediaVault admin panel) and then disabling the Wireguard plugin (go to [System > Plugins])
  - Removing the Wireguard client from my devices
And there you have it! A secure means of accessing your network while retaining your local DNS settings and avoiding the pitfalls of Dynamic DNS and opening a port.

Resources

There were a number of resources that were very helpful in configuring the above. I’m listing them below in case they are helpful:
(If you’re interested in how to setup a home server on OpenMediaVault or how to self-host different services, check out all my posts on the subject)