Tag: AI

  • Updating my AI News Reader

    A few months ago, I shared that I had built an AI-powered personalized news reader, which I still use on a near-daily basis. Since that post, I've made a couple of major improvements (which I have just reflected in my public GitHub repository).

    Switching to JAX

    I previously chose Keras 3 for my deep learning architecture because of its ease of use as well as its advertised ability to switch between AI/ML backends (at least between TensorFlow, JAX, and PyTorch). With Keras creator Francois Chollet noting significant speed-ups just from switching backends to JAX, I decided to give the JAX backend a shot.

    Thankfully, Keras 3 lived up to its multi-backend promise and made switching to JAX remarkably easy. For my code, I simply had to make three sets of tweaks.

    First, I had to change the definition of my container images. Instead of starting from TensorFlow's official Docker images, I installed JAX and Keras on Modal's default Debian image and set the appropriate environment variables to configure Keras to use JAX as the backend:

    jax_image = (
        modal.Image.debian_slim(python_version='3.11')
        .pip_install('jax[cuda12]==0.4.35', extra_options="-U")
        .pip_install('keras==3.6')
        .pip_install('keras-hub==0.17')
        .env({"KERAS_BACKEND": "jax"})  # sets Keras backend to JAX
        .env({"XLA_PYTHON_CLIENT_MEM_FRACTION": "1.0"})  # let JAX pre-allocate the full GPU memory
    )

    Second, because tf.data pipelines convert everything to TensorFlow tensors, I had to switch my preprocessing pipelines from using Keras's ops library (which, because I was using JAX as a backend, expected JAX tensors) to TensorFlow-native operations:

    ds = ds.map(
        lambda i, j, k, l: 
        (
            preprocessor(i), 
            j, 
            2*k-1, 
            loglength_norm_layer(tf.math.log(tf.cast(l, dtype=tf.float32)+1))
        ), 
        num_parallel_calls=tf.data.AUTOTUNE
    )

    Lastly, I had a few lines of code that assumed TensorFlow tensors (where getting the underlying value required a .numpy() call). Since I was now using JAX as the backend, I had to remove those .numpy() calls for the code to work.
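
    To illustrate the kind of change involved (a simplified sketch with a stand-in model, not my actual code), a value that previously needed a .numpy() call can simply be converted directly:

    import numpy as np
    import keras  # assumes KERAS_BACKEND="jax" has been set, as in the image above

    # Stand-in model and input purely for illustration
    model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
    inputs = np.zeros((1, 4), dtype="float32")

    # Before (TensorFlow backend): rating = model(inputs).numpy()[0][0]
    # After (JAX backend): the returned JAX array converts directly
    rating = float(model(inputs)[0][0])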

    Everything else — the rest of the tf.data preprocessing pipeline, the code to train the model, the code to serve it, the previously saved model weights and the code to save & load them — remained the same! Considering that the training time per epoch and the time the model took to evaluate (a measure of inference time) both seemed to improve by 20-40%, this simple switch to JAX seemed well worth it!

    Model Architecture Improvements

    There were two major improvements I made in the model architecture over the past few months.

    First, having run my news reader for the better part of a year, I have now accumulated enough data that my strategy of simultaneously training on two related tasks (predicting the human rating and predicting the length of an article) no longer requires separate inputs. This reduced the memory requirement as well as simplified the data pipeline for training (see architecture diagram below).
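
    For a sense of what this looks like, here is a minimal Keras sketch (layer sizes, names, and loss weights are illustrative, not my exact architecture) of a single shared input feeding two task heads:

    import keras
    from keras import layers

    # One shared input and backbone, two task-specific heads
    inputs = keras.Input(shape=(768,), name="article_embedding")
    shared = layers.Dense(256, activation="relu")(inputs)
    rating = layers.Dense(1, activation="tanh", name="rating")(shared)
    log_length = layers.Dense(1, name="log_length")(shared)

    model = keras.Model(inputs=inputs, outputs=[rating, log_length])
    model.compile(
        optimizer="adam",
        loss={"rating": "mse", "log_length": "mse"},
        loss_weights={"rating": 1.0, "log_length": 0.5},  # illustrative weights
    )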

    Second, I was able to train a version of my algorithm that can use dot products natively. This not only allowed me to remove several layers from my previous model architecture (see architecture diagram below), but, because the Supabase Postgres database I'm using supports pgvector, it means I can even compute ratings for articles through a SQL query:

    UPDATE articleuser
    SET 
        ai_rating = 0.5 + 0.5 * (1 - (a.embedding <=> u.embedding)),
        rating_timestamp = NOW(),
        updated_at = NOW()
    FROM 
        articles a, 
        users u
    WHERE 
        articleuser.article_id = a.id
        AND articleuser.user_id = u.id
        AND articleuser.ai_rating IS NULL;

    The result is a much simpler architecture and greater operational flexibility: I can now update ratings directly from the database as well as by serving a deep neural network from my serverless backend.
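
    For reference, the rating formula in that SQL maps directly to a few lines of Python (pgvector's <=> operator is cosine distance, so 1 minus it is the cosine similarity):

    import numpy as np

    def ai_rating(article_emb: np.ndarray, user_emb: np.ndarray) -> float:
        # Cosine similarity between the two embeddings (1 - cosine distance)
        cos_sim = float(np.dot(article_emb, user_emb) /
                        (np.linalg.norm(article_emb) * np.linalg.norm(user_emb)))
        # Map similarity in [-1, 1] to a rating in [0, 1]
        return 0.5 + 0.5 * cos_sim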

    Model architecture (output from Keras plot_model function)

    Making Sources a First-Class Citizen

    As I used the news reader, I realized early on that being able to view sorted content from a single source (i.e. a particular blog or news site) would be valuable. To add this, I created and populated a new sources table within the database to track sources independently (see database design diagram below), linked to the articles table.

    Newsreader database design diagram (produced by a Supabase tool)

    I then modified my scrapers to insert the identifier for each source alongside each new article, and made sure my fetch calls all JOINed and pulled the relevant source information, along the lines of the sketch below.
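
    As a rough sketch (the real queries are more involved, and the table and column names here are assumptions based on the database diagram above), the fetch now looks something like:

    # Illustrative only: actual schema and filtering logic differ
    FETCH_FEED_SQL = """
        SELECT a.*, s.name AS source_name
        FROM articles AS a
        JOIN sources AS s ON a.source_id = s.id
        ORDER BY a.published_at DESC
        LIMIT 50;
    """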

    With the data infrastructure in place, I added the ability to pass a source parameter to the core fetch URLs to enable single-source (or multi-source) feeds. I then added a quick element at the top of the feed interface (see below) to let a user know when the feed they're seeing is limited to a given source. I also made all the source links in the feed clickable so that they take the user to the corresponding single-source feed.

    <div class="feed-container">
      <div class="controls-container">
        <div class="controls">
          ${source_names && source_names.length > 0 && html`
            <div class="source-info">
              Showing articles from: ${source_names.join(', ')}
            </div>
            <div>
              <a href="/">Return to Main Feed</a>
            </div>
          `}
        </div>
      </div>
    </div>
    The interface when on a single source feed

    Performance Speed-Up

    One recurring issue I noticed in my use of the news reader was slow load times. While some of this can be attributed to the "cold start" issue that serverless applications face, much of it was due to how the news reader fetched pertinent articles from the database: it decided at the moment of the fetch request what was most relevant to send over, calculating all the pertinent scores and rank-ordering on the spot. As the article database grew, this computation became increasingly expensive.

    To address this, I decided to move to a “pre-calculated” ranking system. That way, the system would know what to fetch in advance of a fetch request (and hence return much faster). Couple that with a database index (which effectively “pre-sorts” the results to make retrieval even faster), and I saw visually noticeable improvements in load times.
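
    The index itself is nothing exotic; something along these lines (illustrative names, not my actual schema) is what lets the database hand back already-sorted results:

    # Illustrative only: keeps each user's article scores pre-sorted for fast reads
    CREATE_SCORE_INDEX_SQL = """
        CREATE INDEX IF NOT EXISTS idx_articleuser_user_score
        ON articleuser (user_id, score DESC);
    """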

    But with any pre-calculated score scheme, the most important question is how and when re-calculation should happen. Recalculate too often or too broadly and you incur unnecessary computing costs. Recalculate too infrequently and you risk the scores becoming stale.

    The compromise I reached derives from the three ways articles are ranked in my system (see the sketch after this list):

    1. The AI’s rating of an article plays the most important role (60%)
    2. How recently the article was published is tied with… (20%)
    3. How similar an article is to the 10 articles the user most recently read (20%)
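
    In code, the combined score is essentially just a weighted sum (a sketch; the function and argument names are illustrative, and each input is assumed to be normalized to [0, 1]):

    def article_score(ai_rating: float, recency: float, similarity: float) -> float:
        # Weighted ranking described above: 60% AI rating, 20% recency, 20% similarity
        return 0.6 * ai_rating + 0.2 * recency + 0.2 * similarity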

    These factors lent themselves to very different natural update cadences:

    • Newly scraped articles have their AI ratings and calculated scores computed at the time they enter the database
    • AI ratings for the most recent and the previously highest-scoring articles are re-computed after model training updates
    • Each article's score is recomputed daily (to account for the change in article recency)
    • The article similarity for unread articles is re-evaluated after a user reads 10 articles

    This required modifying the reader's existing scraper and post-training processes to update the appropriate scores after scraping runs and model updates. It also meant tracking article reads on the users table (and modifying the /read endpoint to update these scores at the right intervals), as well as adding a recurring cleanUp function, set to run every 24 hours, to perform this update as well as others.
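
    The recurring piece is straightforward to express on Modal; a minimal sketch (the real cleanUp logic does quite a bit more) looks like this:

    import modal

    app = modal.App("news-reader")  # app name is illustrative

    @app.function(schedule=modal.Period(hours=24))
    def clean_up() -> None:
        # Placeholder for the real logic: recompute recency-driven scores,
        # refresh similarity scores, and tidy up anything stale
        pass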

    Next Steps

    With some of these performance and architecture improvements in place, my priorities are now focused on finding ways to systematically improve the underlying algorithms as well as increase the platform's usability as a true news tool. To that end, some of the top priorities for next steps in my mind include:

    • Testing new backbone models — The core ranking algorithm relies on RoBERTa, a model released five years ago, before large language models were common parlance. Keras Hub makes it incredibly easy to incorporate and fine-tune newer models like Meta's Llama 2 & 3, OpenAI's GPT-2, Microsoft's Phi-3, and Google's Gemma (see the sketch after this list).
    • Solving the “all good articles” problem — Because the point of the news reader is to surface content it considers good, users will not readily see lower quality content, nor will they see content the algorithm struggles to rank (i.e. new content very different from what the user has seen before). This makes it difficult to get the full range of data needed to help preserve the algorithm’s usefulness.
    • Creating topic and author feeds — Given that many people think in terms of topics and authors of interest, expanding what I've already done with Sources to topic and author feeds sounds like a high-value next step.
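
    For the backbone testing in particular, the swap should, at least on paper, be a short change via Keras Hub (a sketch; the preset names here are illustrative and should be checked against Keras Hub's current preset list):

    import keras_hub

    # Current-style backbone (RoBERTa); preset name is illustrative
    backbone = keras_hub.models.RobertaBackbone.from_preset("roberta_base_en")

    # Trying a newer model should, in principle, be the same kind of call:
    # backbone = keras_hub.models.GemmaBackbone.from_preset("gemma_2b_en")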

    I also endeavor to make more regular updates to the public GitHub repository (instead of aggregating many updates into two large ones, as I did here). This will make the updates more manageable and hopefully help anyone out there who's interested in building a similar product.

  • Who needs humans? Lab of AIs designs valid COVID-binding proteins

    A recent preprint from Stanford has demonstrated something remarkable: AI agents working together as a team solving a complex scientific challenge.

    While much of the AI discourse focuses on how individual large language models (LLMs) compare to humans, much of human work today is a team effort, and the right question is less "can this LLM do better than a single human on a task" and more "what is the best team-up of AI and human to achieve a goal?" What is fascinating about this paper is that it tackles the question from the perspective of "what can a team of AI agents achieve?"

    The researchers tackled an ambitious goal: designing improved COVID-binding proteins for potential diagnostic or therapeutic use. Rather than relying on a single AI model to handle everything, the researchers tasked an AI “Principal Investigator” with assembling a virtual research team of AI agents! After some internal deliberation, the AI Principal Investigator selected an AI immunologist, an AI machine learning specialist, and an AI computational biologist. The researchers made sure to add an additional role, one of a “scientific critic” to help ground and challenge the virtual lab team’s thinking.

    The team composition and phases of work planned and carried out by the AI principal investigator
    (Source: Figure 2 from Swanson et al.)

    What makes this approach fascinating is how it mirrors high-functioning human organizational structures. The AI team conducted meetings with defined agendas and speaking orders, with a "devil's advocate" to ensure the ideas were grounded and rigorous.

    Example of a virtual lab meeting between the AI agents; note the roles of the Principal Investigator (to set agenda) and Scientific Critic (to challenge the team to ground their work)
    (Source: Figure 6 from Swanson et al.)

    One tactic the researchers said boosted creativity, and one that is harder to replicate with humans, was running parallel discussions, in which the AI agents had the same conversation over and over again. In these discussions, the human researchers set the "temperature" of the LLM higher (inviting more variation in output). The AI Principal Investigator then took the output of all of these conversations and synthesized them into a final answer (this time with the LLM temperature set lower, to reduce the variability and "imaginativeness" of the answer).
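
    In pseudocode, the pattern looks something like this (a sketch, not the paper's actual implementation; run_agent here is a hypothetical stand-in for whatever LLM call the virtual lab framework uses):

    def run_agent(prompt: str, temperature: float) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM API call")

    def parallel_meetings(agenda: str, n_runs: int = 5) -> str:
        # Run the same discussion several times at high temperature for variety...
        transcripts = [run_agent(agenda, temperature=1.0) for _ in range(n_runs)]
        # ...then have the Principal Investigator agent synthesize them at low temperature
        merge_prompt = ("Synthesize these parallel discussions into one final answer:\n\n"
                        + "\n\n---\n\n".join(transcripts))
        return run_agent(merge_prompt, temperature=0.2)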

    The use of parallel meetings to get “creativity” and a diverse set of options
    (Source: Supplemental Figure 1 from Swanson et al.)

    The results? The AI team successfully designed nanobodies (small antibody-like proteins — this was a choice the team made to pursue nanobodies over more traditional antibodies) that showed improved binding to recent SARS-CoV-2 variants compared to existing versions. While humans provided some guidance, particularly around defining coding tasks, the AI agents handled the bulk of the scientific discussion and iteration.

    Experimental validation of some of the designed nanobodies; the relevant comparison is the filled-in circles vs the open circles. The higher ELISA assay intensity for the filled-in circles shows that the designed nanobodies bind better than their un-mutated original counterparts
    (Source: Figure 5C from Swanson et al.)

    This work hints at a future where AI teams become powerful tools for human researchers and organizations. Instead of asking “Will AI replace humans?”, we should be asking “How can humans best orchestrate teams of specialized AI agents to solve complex problems?”

    The implications extend far beyond scientific research. As businesses grapple with implementing AI, this study suggests that success might lie not in deploying a single, all-powerful AI system, but in thoughtfully combining specialized AI agents with human oversight. It’s a reminder that in both human and artificial intelligence, teamwork often trumps individual brilliance.

    I personally am also interested in how different team compositions and working practices might lead to better or worse outcomes — for both AI teams and human teams. Should we have one scientific critic, or should there be specialist critics for each task? How important was the speaking order? What if the group came up with their own agendas? What if there were two principal investigators with different strengths?

    The next frontier in AI might not be building bigger models, but building better teams.

  • Huggingface: security vulnerability?

    Anyone who's done any AI work is familiar with Huggingface. They are a repository of trained AI models and a maintainer of AI libraries and services that have helped push forward AI research. It is now considered standard practice for research teams with something to boast about to publish their models to Huggingface for all to embrace. This culture of open sharing has helped the field make its impressive strides in recent years and has helped make Huggingface a "center" of that community.

    However, this ease of use and availability of almost every publicly accessible model under the sun comes at a price. Because many AI models require additional assets as well as the execution of code to properly initialize, Huggingface's own tooling can become a vulnerability. Aware of this, Huggingface has instituted its own security scanning procedures on the models it hosts.

    But security researchers at JFrog have found that, even with such measures, there are models hosted on Huggingface that exploit gaps in its scanning to allow for remote code execution. One example model they identified baked a "phone home" functionality into a PyTorch model, which would initiate a secure connection between the server running the AI model and another (potentially malicious) computer (seemingly based in Korea).

    The JFrog researchers were also able to demonstrate that they could upload models that would let them execute arbitrary Python code without being flagged by Huggingface's security scans.

    While I think it’s a long way from suggesting that Huggingface is some kind of security cesspool, the research reminds us that so long as a connected system is both popular and versatile, there will always be the chance for security risk, and it’s important to keep that in mind.


  • NVIDIA to make custom AI chips? Tale as old as time

    Every standard products company (like NVIDIA) eventually gets lured by the prospect of gaining large volumes and high margins of a custom products business.

    And every custom products business wishes they could get into standard products to cut their dependency on a small handful of customers and pursue larger volumes.

    Given the above, the fact that NVIDIA used to effectively build custom products (e.g. for game consoles and for some of its dedicated autonomous vehicle and media streamer projects), and the efforts by cloud vendors like Amazon and Microsoft to build their own artificial intelligence silicon, it shouldn't be a surprise to anyone that NVIDIA is pursuing this.

    Or that they may eventually leave this market behind as well.


  • Going from Formula One to Odd One Out

    Market phase transitions have a tendency to be incredibly disruptive to market participants. A company or market segment that used to be the "alpha wolf" can suddenly find itself an outsider in a short time. Look at how quickly Research in Motion (makers of the Blackberry) went from industry darling to laggard after Apple's iPhone transformed the phone market.

    Something similar is happening in the high performance computing (HPC) world (colloquially known as supercomputers). Built to do the massive calculations needed to simulate complex physical phenomena, HPC was, for years, the "Formula One" of the computing world. New memory, networking, and processor technologies oftentimes got their start in HPC, as it was the application most in need of pushing the edge (and the one with the cash to spend on exotic new hardware to do it).

    The use of GPUs (graphical processing units) outside of games, for example, was an HPC calling card. NVIDIA's CUDA framework, which has helped give it such a lead in the AI semiconductor race, was originally built to accelerate the types of computations that HPC could benefit from.

    The success of deep learning as the chosen approach for AI benefited greatly from this initial work in HPC, as the math required to make deep learning work was similar enough that existing GPUs and programming frameworks could be adapted. And, as a result, HPC benefited as well, as more interest and investment flowed into the space.

    But we're now seeing a market transition. Unlike HPC, which performs mathematical operations requiring every last iota of precision on mostly dense matrices, AI inference works on sparse matrices and does not require much precision at all. This has resulted in a shift in the industry away from software and hardware that works for both HPC and AI and towards the much larger AI market specifically.

    Couple that with the recent semiconductor shortage (making it harder and more expensive to build HPC systems with the latest GPUs) and the fact that research suggests some HPC calculations are more efficiently simulated with AI methods than actually run (in the same way that NVIDIA now uses AI to take a game rendered at a lower resolution and simulate what it would look like at a higher resolution more effectively than actually rendering the game at the higher resolution natively), and I think we're beginning to see traditional HPC shift from the "Formula One of computing" to, increasingly, the "odd one out".


    Trying to Do More Real HPC in an Increasingly AI World
    Timothy Prickett Morgan | The Next Platform

  • Are Driverless Cars Safer? (This time with data)

    I'm over two months late to seeing this study, but a brilliant study design (use insurance data to measure rates of bodily injury and property damage) and a strong, noteworthy conclusion (no matter how you cut it, Waymo's autonomous vehicle service resulted in fewer injuries per mile and less property damage per mile than human drivers in the same area) make this worthwhile to return to! It's a short and sweet paper from researchers at Waymo, Swiss Re (the re-insurer), and Stanford that is well worth the 10-minute read!


  • AI for Defense

    My good friend Danny Goodman (and Co-Founder at Swarm Aero) recently wrote a great essay on how AI can help with America’s defense. He outlines 3 opportunities:

    • “Affordable mass”: Balancing/augmenting America’s historical strategy of pursuing only extremely expensive, long-lived “exquisite” assets (e.g. F-35’s, aircraft carriers) with autonomous and lower cost units which can safely increase sensor capability &, if it comes to it, serve as alternative targets to help safeguard human operators
    • Smarter war planning: Leveraging modeling & simulation to devise better tactics and strategies (think AlphaCraft on steroids)
    • Smarter procurement: Using AI to evaluate how programs and budget line items will actually impact America’s defensive capabilities to provide objectivity in budgeting

  • Pixel’s Parade of AI

    I am a big Google Pixel fan, being an owner and user of multiple Google Pixel line products. As a result, I tuned in to the recent MadeByGoogle stream. While it was hard not to be impressed with the demonstrations of Google’s AI prowess, I couldn’t help but be a little baffled…

    What was the point of making everything AI-related?

    Given how low Pixel’s market share is in the smartphone market, you’d think the focus ought to be on explaining why “normies” should buy the phone or find the price tag compelling, but instead every feature had to tie back to AI in some way.

    Don't get me wrong, AI is a compelling enabler of new technologies. Some of the call and photo functionalities are amazing, both as technological demonstrations and in terms of pure utility for the user.

    But, every product person learns early that customers care less about how something gets done and more about whether the product does what they want it to. And, as someone who very much wants a meaningful rival to Apple and Samsung, I hope Google doesn't forget that either.


  • The “Large Vision Model” (LVM) Era is Upon Us

    Unless you've been under a rock, you'll know the tech industry has been rocked by the rapid advance in performance of large language models (LLMs) such as ChatGPT. By adapting self-supervised learning methods, LLMs "learn" to sound like a human being by filling in gaps in language and, in doing so, become remarkably adept not just at language problems but at tasks requiring understanding and creativity.

    Interestingly, the same is happening in imaging, as models largely trained to fill in "gaps" in images are becoming amazingly adept. The group of a friend of mine, Pearse Keane, at University College London, for instance, just published a model trained using self-supervised learning methods on ophthalmological images which is capable of not only diagnosing diabetic retinopathy and glaucoma relatively accurately, but is also relatively good at predicting cardiovascular events and Parkinson's.

    At a talk, Andrew Ng captured it well by pointing out the parallels between the advances in language modeling that happened after the seminal Transformer paper and what is happening in the "large vision model" world, with this great illustration.

    From Andrew Ng (Image credit: EETimes)

  • Dr. Machine Learning

    How to realize the promise of applying machine learning to healthcare

    Not going to happen anytime soon, sadly: the Doctor from Star Trek: Voyager; Source: TrekCore

    Despite the hype, it’ll likely be quite some time before human physicians will be replaced with machines (sorry, Star Trek: Voyager fans).

    While "smart" technology like IBM's Watson and Alphabet's AlphaGo can solve incredibly complex problems, they are probably not quite ready to handle the messiness of qualitative, unstructured information from patients and caretakers who give vague answers ("it kind of hurts sometimes"), sometimes lie ("I swear I'm still a virgin!"), withhold information ("what does me smoking pot have to do with this?"), or have their own agendas and concerns ("I just need some painkillers and this will all go away").

    Instead, machine learning startups and entrepreneurs interested in medicine should focus on areas where they can augment the efforts of physicians rather than replace them.

    One great example of this is in diagnostic interpretation. Today, doctors manually process countless X-rays, pathology slides, drug adherence records, and other feeds of data (EKGs, blood chemistries, etc) to find clues as to what ails their patients. What gets me excited is that these tasks are exactly the type of well-defined “pattern recognition” problems that are tractable for an AI / machine learning approach.

    If done right, software can not only handle basic diagnostic tasks, but also dramatically improve accuracy and speed. This would let healthcare systems see more patients, make more money, improve the quality of care, and let medical professionals focus on managing other, messier data and on treating patients.

    As an investor, I'm very excited about the new businesses that can be built here and have put together the following "wish list" of what companies setting out to apply machine learning to healthcare should strive for:

    • Excellent training data and data pipeline: Having access to large, well-annotated datasets today, and the infrastructure and processes in place to build and annotate larger datasets tomorrow, is probably the main defining factor. While it's tempting for startups to cut corners here, that would be short-sighted, as the long-term success of any machine learning company ultimately depends on this being a core competency.
    • Low (ideally zero) clinical tradeoffs: Medical professionals tend to be very skeptical of new technologies. While it's possible to have great product-market fit with a technology that is much better on just one dimension, in practice, to get over the innate skepticism of the field, the best companies will be able to show great data that makes few clinical compromises (if any). For a diagnostic company, that means having better sensitivity and selectivity at the same stage in disease progression (ideally prospectively and not just retrospectively).
    • Not a pure black box: AI-based approaches too often work like a black box: you have no idea why it gave a certain answer. While this is perfectly acceptable when it comes to recommending a book to buy or a video to watch, it is less so in medicine where expensive, potentially life-altering decisions are being made. The best companies will figure out how to make aspects of their algorithms more transparent to practitioners, calling out, for example, the critical features or data points that led the algorithm to make its call. This will let physicians build confidence in their ability to weigh the algorithm against other messier factors and diagnostic explanations.
    • Solve a burning need for the market as it is today: Companies don't earn the right to change or disrupt anything until they've established a foothold in an existing market. This can be extremely frustrating, especially in medicine, given how conservative the field is and the drive in many entrepreneurs to shake up a healthcare system that has many flaws. But the practical reality is that all the participants in the system (payers, physicians, administrators, etc.) are too busy with their own issues (i.e. patient care, finding a way to get everything paid for) to just embrace a new technology, no matter how awesome it is. To succeed, machine diagnostic technologies should start not by upending everything with a radical solution, but by solving a clear pain point (that hopefully has a lot of big dollar signs attached to it!) for a clear customer.

    It's for reasons like this that I eagerly follow the development of companies applying machine learning to healthcare, like Google's DeepMind, Zebra Medical, and many more.