---
title: "From Zero to Product with Generative AI"
author: "Kamil Nicieja"
url: "https://read.kamil.fyi/2/from-zero-to-product-with-generative-ai"
---

# about this book

In most industries, I’d be considered middle-aged or even young by some standards, having just passed 30. However, in tech—I’m old. (Can’t imagine how people who remember the dot-com bubble must feel.)

This means I experienced the earlier wave of AI firsthand, which eventually became known as machine learning. I witnessed attempts to create products similar to ChatGPT using technology that we now recognize as a dead end. Archaic.

But it also means I’m uniquely equipped for this new generative age. Many topics that are new to those who recently discovered the benefits and challenges of, say, chatbots are familiar to me. I learned them the hard way. In 2015, I tried to build a company around this technology. But it was too early, and we failed. Now, I read the same pitches I once made on the landing pages of others and think, *what if.*

Thankfully, every failure brings gifts too, known as experience. The goal of this book is to share that experience with you, my reader, and equip you with the skills to:

- Grasp the fundamentals of generative AI based on practical case studies with just the right amount of theory

- Incorporate gen AI methods into your product design efforts

- Create new applications, ventures, and startups using generative AI technologies such as OpenAI’s GPT-4, DALL-E, or open-source alternatives like Llama and Stable Diffusion

- Hone your ability to effectively prompt these models

- Navigate the complete journey of crafting products powered by generative AI, from budget allocation and design to selecting between in-house and third-party solutions, and then to building prototypes

The book discusses various industries that could be transformed by generative AI, providing case studies to explain these impacts. It also explores both real and hypothetical examples of products to show how this emerging technology is reshaping the way the tech industry approaches the design, prototyping, and implementation of apps, services, and experiences.

- You can download the PDF version from [here.](https://drive.google.com/file/d/19vgAghHNtCBBJsSieL7za_Rff0Ox0DUf/view?usp=sharing)

- The EPUB file is available for download [here.](https://drive.google.com/file/d/1hZ1qdQUewS4hj3KI-oQtNUH01LThprzw/view?usp=sharing)

## Who should read this book

This book is intended for intermediate-level readers eager to apply generative AI models, particularly in the realm of new product development. When I refer to “product design,” I’m talking about more than just the user experience or user interface. I mean the comprehensive, high-level process of developing a product from start to finish. Keep in mind: design isn’t just what something looks and feels like—the design is how it *works.*

While this book is technical in nature, coding skills aren’t a prerequisite for grasping its content. It’s tailored for professionals—be they engineers, designers, project managers, executives, or founders—who have already brought products to market and are now seeking an introductory guide to integrating generative AI into their process.

Instead of focusing too much on technical challenges, this book maintains a high-level perspective. This approach allows non-engineers to understand how they can meaningfully contribute to their team’s AI initiatives without necessarily running code themselves. For those with technical expertise, the book offers insights into practical applications of their deep knowledge of AI model internals, especially when it comes to developing and launching new applications.

If you’re not familiar with the core terminology of modern artificial intelligence—concepts such as models, prompts, training, tokens, and hallucination, to name a few—this book will provide some foundational understanding. However, we won’t dwell on these terms excessively. If you find you need a deeper dive into such topics, it would be beneficial to consult additional resources before returning to this book.

## About the author

My name is Kamil Nicieja. I’m a programmer and a product person who’s worked with tech companies in the US and Europe. I’m currently a lead product engineer at Plane, a Y Combinator startup building a payroll, benefits, and compliance platform for fast-growing companies.

Beyond my software development work, I’ve also written [a book about product management](https://www.amazon.com/Writing-Great-Specifications-Specification-Example/dp/1617294101) and [co-founded a few startups myself.](https://kamil.fyi/projects/) I was once featured in Forbes’ 30 Under 30. But don’t worry, I’m not gonna defraud you… well, at least not too much. I can be wrong sometimes, you know, even when I write like I think that I can’t.

The concept for this book sprang from articles written for [“Before Growth,”](https://kamil.fyi) my newsletter about startups and their builders before product-market fit. Become a member and I’ll bring you research on the top companies of the future before they even step out of their garages.


*Image prompt: Mechanical robot with articulating limbs and high-tech circuitry, balancing on a gleaming, towering wave, mid-action, sea spray misting around, ocean sparkling with sunlight, set against a clear blue sky, digital painting, highly detailed, dramatic lighting*

# 1. Introduction to generative AI

When it comes to AI, we’re still in the early days. The majority of people haven’t yet adopted these tools for their work, hobbies, or day-to-day lives.

Some have experimented with early versions and found them lacking, abandoning ship before the current wave of tools, with their significantly improved capabilities, came into wider use. Even those who have found some applications often don’t venture beyond their starting point, opting to use these models for basic outputs.

Just recently, I helped a friend who runs a one-person business employ AI as a virtual marketing consultant. We used the tool to generate content and strategic ideas for the company. Yet, like all things, the assistant had its limitations. It needed many guiding questions and iterative refinements to produce truly valuable output; if you only skim the surface, so will it. But the outcomes were still notably better than what you’d expect from someone with no marketing experience.

Another friend of mine, who isn't into programming, asked me if I could suggest any articles or books that could help them understand a technical concept better.

I suggested a book—and also recommended using ChatGPT as a tutor, which I’ve been doing more often myself. The app is essentially a talking encyclopedia. (A Borgesian nightmare.) It easily breaks down barriers previously caused by a lack of skill, talent, or knowledge.

Not sure how to do something? Pair up with ChatGPT, Claude, or Gemini. Tell it about your current skill level and ask for guidance. It will adapt to your needs, allow you to ask follow-up questions, and even provide practical examples when possible.

I don’t often use it this way for tasks within my expertise, but it’s great for everything else. Just this week, I teamed up with it to brush up on some basic legal concepts related to my business. I later verified the information with a lawyer friend, but thanks to that previous chat, I already had a good understanding of the topic, which saved my friend some time getting me up to speed.

And while I believe the term “prompt engineering,” which we’ll talk about in the later chapters, is overhyped, I *do* think a particular mindset is necessary when working with neural models: you’ve got to guide and steer them. It’s an iterative refinement process, and anything less will yield rather superficial results—at least as things stand now. Paradoxically, it’s more art than science.

So, if you’re anxiously searching for an avenue to break into the AI field, concerned you’re already behind due to the progress of others, here’s a tip: learn to prompt well and then help somebody who can’t. It's a simple concept, true, but again—we’re still **early.** Most people don’t use any large language models daily yet. And these apps and models are… broad, to put it mildly. The interface is essentially just a text box, leaving it up to you to figure out how to make it useful for your needs or to learn from how others are using it.

As you can probably guess—this book is my way of helping.


## 1.1. What’s generative AI?

Generative AI is a class of artificial intelligence models designed with the specific goal of creating new data samples that resemble a given dataset. 

Unlike discriminative models, which focus on classifying or distinguishing between existing data points, generative models aim to understand the underlying data distribution in order to produce novel instances of data. The concept originates from probability theory and statistics but has seen a vast expansion in scope and complexity due to advancements in machine learning techniques and computational resources.

The most prominent architectures in the realm of generative AI include generative adversarial networks, variational autoencoders, and more recently, diffusion models. These architectures serve various purposes: GANs are excellent at generating high-quality and realistic images; VAEs are well-suited for generating new samples while also offering a structured latent space; and diffusion models have found success in applications that require iterative refinement of generated data, like image denoising or text-to-image tasks.

The applications of generative AI are extensive and cross-disciplinary. In the creative industries, for instance, these models have been used for art creation, music composition, and even scriptwriting. In science and healthcare, they find applications in drug discovery, simulating molecular structures, and generating synthetic medical data for research. Generative AI plays a crucial role in data augmentation, particularly useful in scenarios where acquiring real-world data is challenging or ethically problematic.

While generative AI has seen tremendous success, it is essential to consider the implications of its capabilities. For example, it can generate deepfakes that convincingly replace a person’s likeness and voice, or produce synthetic data that may inadvertently encode and perpetuate existing biases. Alongside the technological advancements, there is a parallel track of ethical and governance considerations that guide how this technology should be used responsibly.

## 1.2. Will your industry experience any impact?

Every industry will experience generative AI’s impact. The only question is how much. 

Obvious examples include industries like marketing, which could use generative AI for creating ad copy, social media posts, or even for strategy planning. In the legal industry, AI could help in drafting legal documents or contracts based on a set of user-defined parameters. This would speed up many administrative tasks, allowing lawyers to devote more time to complex legal issues. In HR, AI could assist in the initial stages of candidate screening by generating interview questions based on the specific needs and culture of the company, or even by assessing the suitability of applicants through automated analysis of resumes and cover letters.

What might this actually look like in a real-world scenario? Let’s walk through a brief case study to visualize some concrete effects.

The gaming industry stands to gain significantly from generative AI, too. We can think of an extreme example: a game that’s entirely customized for each player’s unique experience, generating assets on-the-spot. This could involve creating a personalized narrative, textures, and dialogue based on the player’s decisions, thereby offering a completely non-linear gaming experience. However, it’s probably safe to say that we’re still many, many years away from being able to pull something like this off on an AAA scale.

But what about indie games? Take Jussi Kemppainen from Dinosaurs Are Better as an example. He’s single-handedly [developing](https://echoesofsomewhere.com/category/devblog) an entire adventure game, using AI assistance in all aspects of game design, from character creation and coding to dialogue crafting and graphic design.

Let’s consider another application.

Imagine a detective game where you play as a police investigator trying to solve a murder. Your task is to interview numerous witnesses and suspects to discover the true culprit. Each character you interact with would be powered by a sophisticated model, each embodying a specific persona. These personas would have two stories: the one they willingly share with investigators, and the hidden truth that they’d prefer to keep secret for various reasons. As the player, your goal would be to interrogate each persona, gradually unraveling the truth to identify the real offender. To add an extra layer of complexity, we could introduce a judge persona. Instead of merely choosing who they think is guilty, players would need to build and present a compelling case to the judge, effectively discouraging random guesswork.

The user interface for such a game could be as simple as a chat window, with the main gameplay focusing primarily on dialogue-based interaction. I believe it’s feasible to create a prototype of such a game with our current technology. This could serve as a powerful proof of concept for the kind of non-linear, immersive experiences generative AI could eventually enable in the gaming world.

I trust these examples will ignite your imagination, inspiring you to consider how these AI models could reshape your field—even right now, let alone what they might achieve in the next five to ten years.

## 1.3. Types of generative AI

Now, let’s transition from this broad overview to dig into the nuts and bolts of generative AI. This book mainly focuses on two types of gen AI: large language models and text-to-image models. Let’s quickly go over the fundamentals of each.

### Large language models

Large language models are constructed using machine learning architectures, particularly deep learning, to process and generate natural language based on a given input. LLMs have grown to become more intricate and are now capable of generating text that often mirrors human-level understanding and syntax. They are a product of advancements in computational power, algorithmic optimization, and vast amounts of data.

These models are typically trained on a broad corpus of text data, encompassing everything from books and articles to websites and social media posts. This eclectic data gathering aims to equip the model with a generalized understanding of human language, including its nuances and subtleties.

One of the major breakthroughs that propelled LLMs into the spotlight is the transformer architecture, which excels at handling sequences and relationships between words. Due to their versatile capabilities, LLMs have been applied in a wide range of applications—ranging from chatbots and customer service to data analytics and automated journalism.

It is crucial to recognize that LLMs are not sentient beings. Their seemingly insightful output is a byproduct of statistical patterns learned during training rather than a manifestation of understanding or consciousness. While the technology is promising, it also comes with ethical considerations, such as data privacy, fairness, and the potential for misuse.

### Text-to-image models

Text-to-image models based on diffusion techniques represent another subset of generative models. Diffusion models, initially developed for tasks like denoising and inpainting, exploit the process of iteratively refining a random noise sample into a target output. In this context, the target is a realistic image that corresponds to a given textual description.

The adoption of diffusion techniques for text-to-image tasks capitalizes on their potential to capture intricate dependencies between textual input and visual output. This makes them adept at generating images with nuanced details that closely align with the accompanying text description. These diffusion-based approaches can be particularly effective when integrated with language models, combining the textual understanding of language models with the generative prowess of diffusion processes.

Figure 1 shows an example derived from the text below it.

 ![A surfer riding a wave in the sea while wearing a VR headset](https://read.kamil.fyi/u/1c7c84d259f871b58bd83746da3ec827-rrpJsU.jpeg)

>A surfer riding a wave in the sea while wearing a VR headset 

While they demonstrate compelling results, text-to-image models based on diffusion techniques are computationally intensive due to their iterative nature, and thus require efficient implementation and powerful hardware. Additionally, their performance is influenced by the quality of the textual description, posing challenges for ambiguous or abstract prompts.
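
To give a feel for how these models are used in practice, here’s a minimal sketch that generates an image from a text prompt with an open-source diffusion model via the Hugging Face `diffusers` library. The checkpoint name and the prompt are illustrative assumptions, not recommendations.

```python
# A minimal text-to-image sketch using the Hugging Face diffusers library.
# Assumes a CUDA GPU; the checkpoint and prompt are illustrative choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "A surfer riding a wave in the sea while wearing a VR headset"

# Under the hood, the pipeline starts from random noise and iteratively
# denoises it toward an image that matches the text prompt.
image = pipe(prompt).images[0]
image.save("surfer.png")
```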

## 1.4. What’s a prompt?

Prompts serve as the input queries that guide these models to generate a specific type of output. The function of a prompt varies slightly between these two kinds of generative models, but the core principle remains consistent: a prompt initiates the generation process and steers the model toward producing content that aligns with the user’s intention.

For large language models, prompts are typically text-based queries or statements. They can range from simple requests, such as “Tell me about the history of Rome,” to more complex or conditional queries, like “Write a persuasive essay arguing for renewable energy adoption.” The prompt is fed into the model, which then produces text that ideally satisfies the query, all while adhering to the grammar, tone, and context specified.

In the case of text-to-image models, prompts still serve as input queries, but the output is visual rather than textual. Here, the text-based prompts may include descriptions or features that the user wants to see in the generated image. For example, a prompt like “a serene beach at sunset” would guide the model to generate an image of a beach with qualities that could be described as *serene* and with lighting conditions consistent with a sunset.

Naturally, various models yield distinct outcomes, but prompts always shape the contours of the output—whether that output is a block of text or a visual image.
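
In product code, a prompt is usually nothing more than a string handed to a model’s API. Here’s a minimal sketch using OpenAI’s Python SDK; the model name and the prompt are placeholder assumptions.

```python
# A minimal sketch of sending a prompt to a large language model.
# Uses OpenAI's Python SDK; the model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Tell me about the history of Rome."},
    ],
)
print(response.choices[0].message.content)
```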

## 1.5. Foundational models

While these models come in many forms, they all share a common characteristic: they’re large. *Exceptionally* large.

The size of these models is quantified in terms of parameters. A model parameter is an internal configuration variable whose value is learned from training data. Essential for making predictions, these parameters determine the model’s effectiveness in solving specific problems. Parameters are fundamental to the operation of machine learning algorithms, and some of these models boast hundreds of billions of them.

The sheer scale of these models precludes individual developers or small organizations from training them, due to prohibitive computational and financial costs. As a result, they are typically developed by major tech companies or heavily-funded startups. This is why these models are often termed “foundational,” as they serve as the base upon which other applications and services are built.

We’ll explore some of the most prominent foundational models available at the time this book was written.

### GPT

OpenAI introduced the GPT model in 2018, featuring a 12-layer transformer decoder equipped with a self-attention mechanism. It was trained on the BookCorpus dataset, containing over 11,000 freely available novels.

Following this, GPT-2 was unveiled in 2019, boasting 1.5 billion parameters—a significant increase from GPT-1’s 117 million. GPT-3, launched later, employs a neural network with 96 layers and an astounding 175 billion parameters. It was trained on a dataset of roughly 500 billion tokens, drawn largely from Common Crawl.

The most recent iteration, the GPT-4 family of models, debuted in March 2023 and made headlines by passing the Uniform Bar Examination with a score of 297 out of 400. As of the time this book was written, ChatGPT, arguably the world’s most popular generative AI tool, runs on the GPT-4o architecture for both its free version and the premium plan.

### Gemini

Google Gemini is a family of AI models, similar to OpenAI’s GPT, designed to be multimodal. This means they can understand and generate text like LLMs and also natively process images, audio, videos, and code. Unlike some models that add these capabilities later, Gemini integrates them from the start. 

One significant feature Google highlights is Gemini’s “long context window.” This allows a prompt to include extensive information, improving the model’s response quality and resource access. Gemini 1.5 Pro currently supports a context window of up to a million tokens, with plans to expand to two million tokens soon. This capacity can accommodate a 1,500-page PDF, enabling users to upload large documents and query Gemini about their contents.

### Claude

Claude 3.5 Sonnet, the latest model from Anthropic, is designed to enhance performance in reasoning, coding, and safety. It not only surpasses GPT-4o and Gemini 1.5 Pro in several benchmarks but also introduces an impressive new feature called Artifacts. 

These Artifacts are unique windows within the Claude interface that provide detailed, standalone content in response to user requests. Unlike typical chatbot replies, Artifacts are interactive and editable, offering a variety of content types. This innovation marks a significant shift, transforming Claude from a simple conversational AI into a versatile collaborative work tool.

### Llama

Unlike many high-capacity language models that are typically restricted to limited API access, Meta made Llama’s model weights available to the research community under a noncommercial license. The weights were leaked to the public via 4chan and BitTorrent within a week of the release.

By July 2023, Meta rolled out Llama 2 with models featuring 7, 13, and 70 billion parameters. The unique aspect of Meta’s approach is its near open-source nature; the license only prohibits using Llama 2 for training other language models and mandates a special license for applications or services exceeding 700 million monthly users.

Later, Meta introduced Llama 3.1 405B, another open-source language model. Experimental evaluations indicate that it competes effectively with top closed models such as GPT-4, GPT-4o, and Claude 3.5 Sonnet across a wide range of tasks—the first open-source model to do so.

### Stable Diffusion

Stable Diffusion, launched in 2022, is a text-to-image model capable of generating high-definition images that appear remarkably realistic. Utilizing both noising and denoising techniques, its diffusion model learns how to craft images. Unlike larger counterparts such as DALL-E, Stable Diffusion is more compact, alleviating the need for extensive computational resources. Remarkably, it can operate on a standard graphics card or even a smartphone equipped with a Snapdragon 8 Gen 2 platform.

### DALL-E

OpenAI’s DALL-E 3 is an advanced text-to-image generator that converts textual prompts into striking visual content. An upgrade from its earlier version, this newest iteration features an integration with ChatGPT. This synergy allows users to effortlessly generate top-notch visuals either by inputting descriptive text or sourcing prompt ideas from ChatGPT. Access to DALL-E 3 is provided through ChatGPT Plus and is also available to Enterprise customers who subscribe to the paid version of the chatbot platform.

### Midjourney

Midjourney stands out as a generative AI capable of transforming natural language prompts into images. While it’s among several recent machine learning-driven image generators, it has carved a niche for itself, joining the ranks of prominent AI names like DALL-E and Stable Diffusion.

Using Midjourney, you can produce high-quality images from textual prompts. It operates solely through the Discord chat app, eliminating the need for specialized hardware or software. However, a notable drawback is its cost—unlike many competitors that offer initial free image generations, Midjourney requires a payment from the outset.

### Firefly

Adobe Firefly, developed by the software company behind Photoshop and Illustrator, is a collection of generative AI tools. Like top-tier AI art generators, it employs a model trained to discern links between text and visuals, enabling users to craft images using text descriptions.

Distinctive features differentiate Adobe Firefly from competitors like Midjourney, Stable Diffusion, and DALL-E. Notably, its emphasis on ethical practices stands out. While many AI models have been trained on indiscriminately sourced internet images, often disregarding copyright, Firefly's training exclusively utilized open-source images, out-of-copyright content, and Adobe Stock materials.

### Hugging Face

Hugging Face serves as a platform that provides open-source resources for constructing and deploying machine learning models. It functions as a communal hub where developers can exchange and discover both models and datasets. While individual membership is complimentary, enhanced access is available through paid subscriptions. The platform grants public access to an extensive collection of nearly 200,000 models and 30,000 datasets.

### Which AI model should you use?

Here’s a concise overview of the top-performing models for specific tasks as of the time of writing:

- Reasoning → GPT-4o
- Writing and brainstorming → Claude
- Translation and large context → Gemini
- Private and sensitive data → Llama
- Self-hosted tasks → Llama
- Images → DALL-E 3 
- Complex document understanding → Claude Opus
- Coding → Claude 3.5 Sonnet
- Web search → GPT-4o

Selecting an appropriate model involves weighing factors such as result quality for distinct tasks, cost, speed, and availability. The decision is multifaceted. Chapter 4 will dig deeper into the intricacies of LLM economics, drawing parallels and distinctions with other cloud-based services.

## 1.6. Real-world product categories

There are a few categories of products in which these models are already being used.

The leading category in this space is AI assistants, which have evolved significantly beyond earlier versions like Apple’s Siri or Amazon’s Alexa. The key distinction lies in how interactions are programmed. Earlier assistants relied on hand-coded responses, excelling in narrow tasks like weather reporting but falling short in complex assignments. In contrast, LLMs can easily draft a creative homework assignment but are, in turn, prone to factual inaccuracies due to a phenomenon known as “hallucinations.” We’ll discuss it in the next chapter. So while these models represent the most comprehensive aggregation of human knowledge to date, they lack an inherent understanding of truth.

AI assistants are typically designed for general-purpose use, capable of performing a wide range of tasks. However, a burgeoning subcategory focuses on AI companionship. These are specialized AI personas that serve as your digital friends, peers, or acquaintances. For example, apps like Character.AI allow you to engage in conversations with characters from your favorite books and movies just for entertainment. Meanwhile, companies like Meta are developing AI personas intended to act as, say, your personal trainer, motivating you to hit the gym.

Another category includes AI utilities, which can be integrated either into the front end or the back end of existing products. On the front end, these utilities could include features like document summarization in platforms like Google Docs, voice transcription in Zoom, or automated thumbnail generation for WordPress articles. On the back end, large language models can be employed for tasks like optical character recognition or tagging unstructured text data at scale. While these capabilities offer substantial utility, they usually need to be incorporated into broader products to be truly effective, as they provide limited standalone value.

The most innovative yet least validated category in the realm of generative AI consists of autonomous agents. While it’s debatable whether these models can truly think like humans, they have demonstrated the ability to mimic certain aspects of human thought. For instance, they can tackle logical problems, ranging from simple to complex, and some can even ace university exams. 

This has led to their use in creating task-solving loops. Essentially, you can instruct a model to devise its own action plan, then have it execute the steps it outlined. A rudimentary example would involve asking a large language model to generate a table of contents for a book, then having it create subsequent chapters based on that framework until the book is complete. While potentially revolutionary for multiple industries, these autonomous agents are still in early stages, delivering results that are too inconsistent for reliable production use.
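
To make the loop concrete, here’s a rough sketch of the book-writing example. It assumes a hypothetical `complete()` helper that sends a prompt to a language model and returns its text response; real agents add error handling, state, and tool use on top of this skeleton.

```python
# A rudimentary task-solving loop: plan first, then execute the plan.
# Assumes a hypothetical complete() helper that sends a prompt to an
# LLM and returns its text response.
def write_book(topic: str, complete) -> str:
    # Step 1: ask the model to devise its own action plan.
    outline = complete(
        f"Write a numbered table of contents for a short book about {topic}."
    )
    # Step 2: execute the plan it produced, one chapter at a time.
    chapters = []
    for line in outline.splitlines():
        if not line.strip():
            continue
        chapters.append(
            complete(
                f"Book topic: {topic}\nTable of contents:\n{outline}\n\n"
                f"Write the full chapter for this entry: {line.strip()}"
            )
        )
    return "\n\n".join(chapters)
```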

*Image prompt: Newspaper production process captured in a series of high-resolution photographs, workers diligently operating machinery, stacks of blank newsprint rolling off printing presses, sheets of printed paper traveling through automatic collating systems, graphic designers reviewing copy for the next edition, light flooding in from skylights onto the bustling factory floor, ambient sound of machines in motion, warm hues of brown newsprint contrasting with the crisp, white manila folders*

# 2. Inputs and outputs

Like many others, my interest in text-based AI was sparked when I started using Siri on my iPhone.

Siri, a chatbot developed before the advent of modern generative AI, has interactions that are predefined rather than limitless. Its capabilities, such as providing weather updates, were anticipated and manually programmed by Apple engineers; answering a weather question, for instance, involves querying a weather API.

But if you ask Siri to write a poem about a recent event, it won’t be able to assist, as its responses to such specific requests aren't predefined. It’ll probably direct you to a Google search instead. In contrast, modern AI assistants can autonomously create a poem, albeit of varying quality, by filling in the details themselves.

These questions and commands—they’re called prompts. When prompted, a large language model generates text, while a diffusion model creates images. In this chapter, we’re going to examine the types of prompts these models can handle, the underlying mechanics that drive their responses, the structure of prompts, and the ways in which they can be creatively applied in product design.


## 2.1. Simple prompts

Let’s begin with something straightforward. I’ve compiled a selection of good prompts I’ve used with ChatGPT in the past 30 days. If you haven’t had the chance to interact with a large language model, this list should provide you with an overview of the fundamental capabilities of models like GPT-4.

“Can you provide 10 alternative names for this piece of code?” This could be for a method, variable, or constant.

“I’m not a native English speaker. Can you help me rewrite this sentence or paragraph to sound more like one?” Sometimes I’m just fine with sacrificing some of my personal voice for the sake of clearer and more professional English.

I often request translations, too. GPT-4 has already proven to be much more accurate and reliable than Google Translate in this aspect.

“Could you interpret this code for me? I’d appreciate a step-by-step explanation.” If the explanation is unclear, then it’s likely that my fellow engineers might also find it hard to understand, indicating a need for refactoring.

“Let’s discuss the new feature I’m developing. I’d like you to act as a team member and review my proposed implementation method step by step.” Alternatively: “I’m faced with a particular issue or feature to be developed. Could you suggest how you might approach it?” This is particularly helpful when I hit a creative roadblock.

Conventionality checks. ChatGPT typically defaults to a safe, conventional response. This can be limiting when looking for creative inputs, but in certain contexts, such as some engineering decisions, a conventional response might be ideal as it’s likely to be more widely understood.

Pros and cons. Ask ChatGPT for arguments in favor of and against a certain concept. The responses can help gauge how your thoughts align with or deviate from common viewpoints.

Finding synonyms. With ChatGPT readily available, it’s quicker to ask it for synonyms than to look them up in a dictionary.

Tackling anything I’m below average at. Take, for instance, naming characters in the short stories I write. I used to struggle with this, so now I ask ChatGPT for similar character names based on those I like. Even if none of the suggestions strike a chord, I can ask for the reasoning behind each name and use that as a springboard for further ideas.

“I need to write a performance review, create a blog article, or draft a long email, and so on—but I’m finding it hard to start. Can you ask me questions about the person or topic until we’ve gathered enough information to form a comprehensive review?” This approach helps overcome writer’s block, as conversations with ChatGPT are informal and free-flowing.

We can identify some basic patterns in these prompts. As a programmer, I often use ChatGPT like a peer to help solve coding problems. I also write a lot, so it serves as a writing partner for me. This approach reflects how many others interact with these models and the trend in early modern AI products. They were job-focused, leading to the emergence of AI roles like programmers, lawyers, security consultants, writers, and more.

### Case study: Creating useful products with simple prompts

A year and a half later, after the hype subsided just a bit, I think of large language models as on-demand intelligence accessible through APIs. Whenever my software needs to use fuzzy logic or analyze unstructured data, I can just send an HTTP request and get the insights I need.

Let’s consider a real-world scenario. Imagine an applicant tracking system that filters candidates using specific keywords. Each resume must be reviewed by a recruiter to see if it matches their criteria. For example, they might be searching for a candidate with seven years of experience in a particular technology.

The top companies often get hundreds—if not thousands—of applications, resulting in a lot of repetitive manual work. Moreover, recruiters are usually not experts in technology. To compensate, they come up with their own heuristics, which might lead to false positives or false negatives.

With large language models, rather than depending on these filters, we can tap into AI’s intelligence by making an API call. (Shoutout to [Patrick McKenzie on Twitter](https://twitter.com/patio11/status/1768645785162289616) for the inspiration behind this prompt.)

> Suppose you’re looking for someone with at least seven years of Python experience. A developer’s resume indicates they’ve been using Django since version 1.8. Explain why you’d decide to include or exclude this candidate.

…and get a response:

> Include them. Django 1.8’s release in 2015 implies over 7 years of Python experience.

I wasn’t sure myself since I’m a Ruby developer. But after looking into it, I confirmed this is the right decision. Great!

Building an applicant tracking system like that without AI would require engineers to either integrate a complex expert system directly into their product, or allow recruiters to set up a rigid job application workflow on their own. But LLMs can easily read the job listing and the resume by using their extensive knowledge—on demand.
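
As a sketch of what “intelligence through an API call” might look like in code, here’s the screening example above wrapped in a small helper. It uses OpenAI’s Python SDK; the model choice and the exact prompt wording are my assumptions.

```python
# A sketch of resume screening as an on-demand API call.
# Uses OpenAI's Python SDK; model choice and wording are assumptions.
from openai import OpenAI

client = OpenAI()

def screen_candidate(requirement: str, resume_excerpt: str) -> str:
    prompt = (
        f"Suppose you're looking for {requirement}. "
        f"A developer's resume indicates: {resume_excerpt}. "
        "Explain why you'd decide to include or exclude this candidate."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(screen_candidate(
    "someone with at least seven years of Python experience",
    "they've been using Django since version 1.8",
))
```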

>**AI bias in recruitment**
>
>A friend reached out after reading an earlier draft of this chapter and suggested that HR might not be the best example. That's true! Research indicates that built-in biases can influence candidate evaluations made by AI. 
>
>However, my point is that automated filters should be straightforward enough to minimize this issue. We don’t want the LLM to evaluate the entire profile and make any decisions on its own; it should just process the unstructured text, extract metadata at scale, and make very simple filters based on **your** existing criteria.

So here’s the best way to understand how these models are useful today, despite not yet being smarter than humans: if you assign them specific, concrete tasks, they can help you achieve good results efficiently. Turns out, my perspective on AI systems hasn’t changed much over the past decade: they can deliver what a thousand interns would, but much faster and at a lower cost.

The major difference this time is that the tech stack has become much simpler to use due to commoditization. What used to be accessible only to Big Tech has now become available to everyone.

## 2.2. Hallucinations

So, does that mean the task is complete? Just slap *AI* on every white-collar job and voilà, a fresh product idea emerges, and product design becomes a solved discipline. Well, not exactly. Despite their skill, these models are not all-powerful.

Because they hallucinate.

Hallucinations in large language models refer to instances where the output is coherent and grammatically correct, yet factually inaccurate or nonsensical. In this context, a “hallucination” is the creation of false or misleading information. Such errors can emerge from several causes, including limited training data, biases within the model, or the complex nature of language itself.

For example, in February 2023, Google’s chatbot, Bard, mistakenly stated that the James Webb Space Telescope captured the first image of a planet outside our solar system. This was inaccurate; NASA confirmed that the first images of an exoplanet were actually captured in 2004. Furthermore, the James Webb Space Telescope wasn’t launched until 2021.

And in June 2023, there was a case where a New York attorney used ChatGPT to draft a motion that contained fabricated judicial opinions and legal citations. The attorney faced sanctions and fines, claiming that he was unaware that ChatGPT had the capability to generate fictitious legal cases.

The key issue with hallucinations is that they are not merely a glitch; they are inherent to the design. These models are generative in nature. Without the ability to deviate from their training data, they would be restricted to replicating what they’ve previously encountered, limiting their usefulness. They would be reduced from reasoning tools to mere search algorithms—a problem search engines like Google have already addressed. Fundamentally, these models don’t possess definitive answers to the questions posed to them. Their core function is to predict the next word based on probabilities. When given a prompt, they identify the most likely word sequence that resembles responses to similar inputs in their datasets.

## 2.3. Prompt engineering

Is it possible to solve the problem of hallucinations in generative AI models? And if so, how? Considering the probabilistic nature of these models, complete eradication of hallucinations isn’t feasible. However, techniques have been developed to mitigate the issue. Since LLMs are trained for next-word prediction, the solution partly lies in the formulation of the question itself. By refining our prompts or questions, we can achieve more accurate and reliable outcomes.

That’s the reason behind the emergence of prompt engineering. This new field focuses on crafting and optimizing prompts to maximize the efficiency of language models across diverse applications and research areas. Developers engage in prompt engineering to create strong and effective techniques for interfacing with LLMs and other tools, aiming to minimize issues like hallucinations.

To grasp this concept in action, we’ll analyze advanced prompting methods like chain-of-thought, tree-of-thought, and reflexion. These techniques enhance the model’s ability for deeper and more nuanced reasoning, narrowing the gap between user expectations and the model’s actual capabilities. When building AI-powered products, engineered prompts are likely to be the ones you’ll predominantly use. The simpler prompts previously mentioned? They might work in B2C apps. But for B2B solutions? Forget it. Given the high stakes, would you risk having your enterprise product generate unreliable outputs? Yeah, I thought so.

## 2.4. Chain-of-thought

Let’s start with chain-of-thought. This is a novel strategy that prompts the language model to articulate its thought process. By exposing the model to a handful of examples where the reasoning is clearly spelled out, the model learns to similarly outline its reasoning when responding to new prompts.

To illustrate the chain-of-thought approach with a couple of fresh examples, let’s consider a scenario involving a bookstore and another involving a school’s sports equipment inventory.

> **Model input:**
>
> Question: A bookstore has 200 novels in stock. It receives 4 more boxes of novels, each box containing 25 novels. How many novels are in stock now?
>
> Answer: Initially, the bookstore has 200 novels. Each of the 4 boxes contains 25 novels, so 4 boxes total 4 × 25 = 100 novels. Adding these to the original stock, 200 novels + 100 novels = 300 novels. The answer is 300 novels.
>
> Question: The school’s sports department has 15 basketballs and 10 volleyballs. If they purchase 3 new sets of sports equipment, each set containing 5 basketballs and 4 volleyballs, how many of each type of ball do they have now?
>
>
> **Model output:**
>
> Answer: The sports department starts with 15 basketballs and 10 volleyballs. Each set of new equipment contains 5 basketballs and 4 volleyballs. So, 3 sets would have 3 × 5 = 15 basketballs and 3 × 4 = 12 volleyballs. Adding these to the original numbers, the department now has 15 + 15 = 30 basketballs and 10 + 12 = 22 volleyballs. The answer is 30 basketballs and 22 volleyballs.

As models advanced in intelligence, it became less crucial to provide examples of reasoning. The training datasets already contain a vast array of reasoning examples! Depending on the size of the model, you can activate chain-of-thought processing in almost any prompt by using a magic phrase: “Let’s think step by step.”

> **Model input:**
>
> Q: The school’s sports department has 15 basketballs and 10 volleyballs. If they purchase 3 new sets of sports equipment, each set containing 5 basketballs and 4 volleyballs, how many of each type of ball do they have now? Let's think step by step.
>
>
> **Model output:**
>
> A: The sports department starts with 15 basketballs and 10 volleyballs. Each set of new equipment contains 5 basketballs and 4 volleyballs. So, 3 sets would have 3 × 5 = 15 basketballs and 3 × 4 = 12 volleyballs. Adding these to the original numbers, the department now has 15 + 15 = 30 basketballs and 10 + 12 = 22 volleyballs. The answer is 30 basketballs and 22 volleyballs.

By outlining the thought process, the AI’s responses become more transparent and easier to inspect. Users can see how the AI arrived at its conclusion, which helps in verifying the accuracy and reliability of the response. By seeing the intermediate steps in the AI’s reasoning, users can more easily identify where the LLM might have gone wrong and provide targeted feedback for improvement.

For tasks that require sequential steps or reasoning, chain-of-thought prompting can guide the AI to follow a structured approach, leading to more accurate and relevant answers. When the AI explains its reasoning step-by-step, it can also be an educational tool, helping users learn how to approach similar problems in the future.
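
In application code, the zero-shot variant can be as simple as appending the trigger phrase to whatever the user asked. Here’s a minimal sketch with OpenAI’s Python SDK; the model name is an assumption.

```python
# Zero-shot chain-of-thought: append the trigger phrase to the question.
# Uses OpenAI's Python SDK; the model name is an assumption.
from openai import OpenAI

client = OpenAI()

def answer_step_by_step(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"{question} Let's think step by step."},
        ],
    )
    return response.choices[0].message.content
```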

## 2.5. Plan-and-solve

To further enhance the precision of the output, we may sometimes consider giving the model an outline of the specific steps we anticipate it will handle. This is called the plan-and-solve approach. 

This method involves two steps: first, creating a plan to break down the whole task into smaller subtasks, and then executing these subtasks as per the plan. Plan-and-solve (PaS) prompting significantly enhances the quality of the generated reasoning process. 

For example, let’s consider our sports inventory scenario:

> **Model input:**
>
> Q: The school’s sports department has 15 basketballs and 10 volleyballs. If they purchase 3 new sets of sports equipment, each set containing 5 basketballs and 4 volleyballs, how many of each type of ball do they have now? First, let’s break down the problem and identify key variables and their values. Next, we’ll formulate a plan. Then, we’ll proceed with the plan, computing intermediate results while being mindful of both calculation accuracy and common sense. We’ll solve the problem step by step and present the final answer.
>
>
> **Model output:**
>
> A: To solve this problem, we’ll first establish the key variables and their initial values. The school currently has 15 basketballs and 10 volleyballs. They plan to purchase 3 new sets of sports equipment, with each set containing 5 basketballs and 4 volleyballs.
>
> Our goal is to determine the total number of basketballs and volleyballs after these purchases. To do this, we’ll calculate the number of balls in the new sets and then add these to the existing totals.
>
> For the basketballs, each set contains 5. Since 3 sets are being purchased, we multiply 5 basketballs by 3, giving us the total number of basketballs in the new sets. Similarly, for the volleyballs, each set contains 4 volleyballs. Multiplying this by 3 gives us the total number of new volleyballs.
>
> Adding these numbers to the existing totals will give us our final count for each type of ball. We'll perform these calculations with care, ensuring accuracy and applying common sense to check our results.

As you can see, this is an overly complicated approach for a straightforward calculation. However, that doesn’t imply the technique is without merit. The neat part? You can guide the model to learn various planning methods as long as you can steer it towards a sensible sequence of steps to execute. Using plan-and-solve prompts enables us to mimic human-performed workflows, which an unprompted AI might struggle with.

As we analyze more complex scenarios, particularly business tasks or workflows like the upcoming example, we’ll notice that these models tend to get more easily confused compared to dealing with elementary school math formulas.

### Case study: The smart reactivity of LLM-based apps

You might be wondering: *This is nice and all, but how can I apply what I’ve learned? Am I supposed to create a calculator app? That’s not my goal!* And I would agree with you. Now that we’ve grasped this piece of theory, let’s apply it to a real, though not overly complex, project.

There’s a limit to what you can achieve using just tags, keywords, likes, votes, and other “simple” metadata.

The first wave of metadata-driven products emerged with the advent of social networks. Platforms encouraged users to “like” various items and operated on the naive assumption that if individuals within your network appreciated something, you would likely enjoy it as well. And so we got Digg, Facebook, Twitter, YouTube, and many, many more…

The second generation of smarter reactivity leveraged classification algorithms. Consider TikTok, for example. It cleverly employs AI to determine the content you engage with, then curates more of what might appeal to you, bypassing your social connections. Just watch the stuff you like—we’ll figure out the rest on our own. While this was groundbreaking at a large scale, it’s only scratching the surface of what’s next.

Enter large language models.

The upcoming wave will pivot from mere classification to reasoning and cognition. Even though LLMs sometimes err and glitch, they exhibit a semblance of reasoning in many straightforward scenarios. The debate on whether this mirrors human-level thought is ongoing, but for many applications, even current capabilities suffice. Let me illustrate with a personal example.

As a proof of concept, I built [Changepack,](https://github.com/changepack/changepack) an open-source changelog tool integrated with ChatGPT. Changepack syncs with your GitHub activity, streamlining progress tracking. Every month, Changepack selects the most noteworthy updates, crafting a release note draft for your perusal and dissemination.

This selection process harnesses ChatGPT. In essence, I task the algorithm with sifting through recent changes, selecting the most pertinent ones, and justifying its choices for optimized outcomes. This allows me to proactively compose a draft for release notes without any human input, compelling the AI to scrutinize the core content and deduce conclusions for me, sidestepping the need for behavior-based metadata.

This process involves two steps. Initially, AI needs to assess each change:

> As an AI language model, your assignment is to evaluate an outstanding task related to a product called (name). The task’s explanation is written in technical jargon and is geared toward the organization’s in-house departments.
>
> (Introducing the product here…)
>
> (Introducing a description of the product’s target audience here…)
>
> Your task has two aspects:
>
> 1. Assess the task, underlining parts that could be unclear to the target audience. Propose changes to enhance readability and improve understanding, while maintaining a professional yet accessible tone. Identify any mentions of specific staff members or proprietary tools including but not limited to feature management platforms like LaunchDarkly, customer messaging tools like Intercom, user behavior analytics platforms like FullStory, project management software like Jira, and others.
>
> 2. Craft a clear and succinct summary of this task. This summary should not exceed 600 characters and should not include any reference to specific staff members or proprietary tools such as LaunchDarkly, Intercom, FullStory, Jira, and the likes. The objective is to convey the essential alterations and updates to the product. Remove all URLs, regardless of whether they are in HTML or Markdown format.
>
> Now, please evaluate and summarize the following task…

Following the cleanup phase, where the AI summarizes all tasks, we then instruct it to select the most critical ones:

> Based on the following updates provided, identify and summarize the most impactful changes for (name)’s customers. As you select each update, please provide a brief rationale for its inclusion. It's crucial that you do not reveal names of any specific users, clients, accounts, or organizations.

Typically, we would consider various metadata indicators, such as keywords, the number of lines of code altered, the number of contributors to the feature, or the volume of added comments, to gauge the significance of a change. However, we take a different approach in this case. We direct the AI to assess the changes solely from the perspective of the target audience. This approach compels the AI to evaluate the importance of these features based on customer perception. Next, we instruct it to prioritize the list according to what would matter most to the customer.
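
Here’s a sketch of the shape of that two-step pipeline. It isn’t Changepack’s actual implementation, just an illustration assuming a hypothetical `complete()` helper that calls a chat model and returns its text response.

```python
# A sketch of the two-step release-notes pipeline described above:
# first summarize each change for the target audience, then ask the
# model to pick the most impactful ones. Not Changepack's actual code;
# assumes a hypothetical complete() helper that calls a chat model.
def draft_release_notes(changes: list[str], product: str, complete) -> str:
    # Step 1: clean up and summarize every change individually.
    summaries = [
        complete(
            f"Evaluate and summarize the following task for {product}'s "
            "customers in under 600 characters, removing internal tool "
            f"names and URLs:\n\n{change}"
        )
        for change in changes
    ]
    # Step 2: let the model select and justify the most impactful updates.
    return complete(
        "Based on the following updates, identify and summarize the most "
        f"impactful changes for {product}'s customers, with a brief "
        "rationale for each selection:\n\n" + "\n\n".join(summaries)
    )
```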

Before implementing this plan, Changepack struggled to consistently deliver outcomes that met my expectations. The quality varied because the model lacked a clear understanding of what I wanted from it. It became clear that the model needed precise instructions and a well-defined objective to perform at its best. I began to refine my approach, carefully outlining my needs and what I anticipated. This change in strategy proved to be a turning point. Slowly, the results started to better match my vision.

I anticipate many apps will tread this path in the coming years, as we push the boundaries of LLMs beyond routine tasks.

## 2.6. Reflexion

Reflexion is another technique designed to curb the issue of hallucination in current generative models. It employs a feedback loop that corrects errors autonomously, creating a “model-in-the-loop” framework as opposed to the traditional “human-in-the-loop” system. In essence, one language model reviews and refines the output of another.

I also stumbled upon this method while I was developing [Changepack.](https://github.com/changepack/changepack) One fascinating discovery was when I tried to make ChatGPT consistently spit out changelogs in HTML, but to no avail. Sometimes it would give me Markdown, and other times, the results wouldn’t have any formatting. Despite trying multiple variations, I just couldn’t pin it down. Admitting defeat, I turned to ChatGPT to fix its own prompt. To my surprise, it rewrote it… and it worked. Consistently. I suppose it explained to itself what to do in a way it could comprehend, which I still find amusing.

This is the model-in-the-loop approach. Instead of manually correcting my errors, I provided one neural network with the output of another neural network’s work. This approach yielded improved results because, as we observed with chain-of-thought processing when we instructed models to focus on single tasks instead of multitasking, we typically achieve better outcomes.
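
A minimal model-in-the-loop sketch, again assuming a hypothetical `complete()` helper: one call drafts the output, a second critiques it, and a third revises it based on the critique.

```python
# A minimal model-in-the-loop sketch: draft, critique, revise.
# Assumes a hypothetical complete() helper that calls a chat model.
def reflexion(task: str, complete, rounds: int = 2) -> str:
    draft = complete(task)
    for _ in range(rounds):
        critique = complete(
            f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
            "Review this draft. List factual errors, formatting problems, "
            "and unclear passages."
        )
        draft = complete(
            f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
            f"Reviewer feedback:\n{critique}\n\n"
            "Rewrite the draft, addressing every point of feedback."
        )
    return draft
```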

### Case study: Self-reviewing agents

Around the same time, I had the chance to test out [Sweep,](https://sweep.dev/) an AI-driven junior programmer designed to tackle issues like bug fixing, implementing simple features, writing tests, completing documentation, and so forth. What struck me during this process was that Sweep is essentially a self-reviewing entity—and that’s quite intriguing.

We’re all fairly familiar now with the fact that results from large language models like ChatGPT can fluctuate quite a bit. The output depends heavily on the given prompt. At times, the model might produce wholly imagined answers, make errors, display human-like cognitive biases, or generate text that statistically seems plausible, but isn’t accurate. There are ways to mitigate these issues, some of which closely resemble strategies we instill in our education system. For instance, asking a model to break down its response step by step often yields superior results, much like how humans often catch their own mistakes when asked to explain their thought process.

Sweep pushes this concept further. It employs dual prompts: the committer, which prompts the model to play the role of a coder writing the script, and the reviewer, which prompts the model to act as a code reviewer providing feedback. The coder then revises and enhances their code based on this feedback, and the cycle continues, yielding better code.

I find this approach fascinating because, in my experience, models, despite having a wealth of knowledge ingrained in them, often don’t put that knowledge to use unless explicitly asked to do so. This is somewhat like how humans can fall victim to certain cognitive biases unless we take a moment to slow down and apply a more scientific approach to our own thinking. We also have to consciously engage our internal reviewer!

Philosophically speaking, this leads me to wonder if such behavior is learned—essentially mimicked—from human inputs given to the model, or if it’s inherent to the cognitive process itself. It might indicate a feature common to all intelligent beings. Well, maybe not all, but those with structurally networked minds. (Can minds exist without such networking? It’s anyone’s guess.) It could also be tied to their method of information retrieval or generation.

Intriguing thoughts to toy with. Don’t expect me to give you answers, though.

## 2.7. Tree-of-thought

When tackling intricate problems that necessitate foresight or exploratory reasoning, the prompting approaches we’ve already explored may fall short. This is where the tree-of-thought method becomes invaluable. Essentially, the framework operates on several key principles, which the code sketch after this list puts into practice:

- It structures the reasoning process as a branching tree, where each node signifies an intermediate line of thought or logical step on the path to a solution. In the realm of math problems, for instance, each node might represent an equation.

- Instead of linearly generating a single line of reasoning as seen in chain-of-thought methods, this approach produces multiple potential thoughts at every node. This enhances the model’s ability to examine a range of possible reasoning trajectories.

- The model itself evaluates the merit of each thought or node by determining its validity or likelihood of success. This acts as a heuristic, offering guidance on which branches of the tree to pursue further.

- Advanced search techniques like breadth-first or depth-first algorithms are deployed to navigate the tree systematically. This diverges from simpler approaches by offering lookahead, backtracking, and branching to explore various scenarios.

- The architecture of the system is highly adaptable and modular. Whether it’s the representation of thoughts, the process of generating them, their subsequent evaluation, or even the search algorithm used, all components can be tailored to fit specific problem sets. Moreover, there’s no requirement for additional training of the models.
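
Here’s the promised sketch: a compact illustration of the search loop, not a production implementation. The `propose` and `score` functions are hypothetical stand-ins for the two LLM calls, one generating candidate thoughts, the other rating partial reasoning chains.

```
def propose(chain):
    # Hypothetical LLM call: return a few candidate next thoughts
    return [f"{chain[-1]} -> idea {i}" for i in range(3)]

def score(chain):
    # Hypothetical LLM call: rate the chain's promise from 0 to 1
    return 1.0 / len(chain)  # placeholder heuristic

def tree_of_thought(question, depth=3, beam_width=5):
    frontier = [[question]]  # each element is a chain of thoughts
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            for thought in propose(chain):
                candidates.append(chain + [thought])
        # The model's own evaluation acts as the heuristic for pruning
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)
```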

Though it may sound complicated on paper, the everyday application is far more straightforward. Let’s dig into a hands-on prompt.

> Three brilliant minds are using the “tree of thoughts” methodology to collaboratively tackle a question. Each expert will thoughtfully build upon the prior contributions of others, acknowledge any mistakes, and enhance the collective understanding. They will iterate on each other’s insights and give due recognition. This iterative process will continue until they arrive at a definitive solution. The entire discussion will be organized in a markdown table for easy reference. The question under consideration is…

In essence, the approach makes the AI act as if it’s hosting a roundtable of experts from various fields, all coming together to unpack a single concept. I was unfamiliar with this method until I came across it in a report, and I found it enlightening enough to want to share this newfound knowledge with you all.

Here’s an example, generated by GPT-4.

### Should a startup take as little or as much funding as possible?

**Expert 1**

> Opinion: Startups that take as much funding as possible can more aggressively invest in growth and gain a competitive edge. They can hire top talent, invest in R&D, and enter new markets more rapidly.

Expert 1 thinks it’s important to acknowledge that taking more money often comes with increased pressure for fast growth and high returns, which may not align with every startup’s vision or operational reality.

**Expert 2**

> Opinion: While investing in growth is essential, taking too much funding could lead to inefficiencies and wasteful spending. It also dilutes the ownership, which could demotivate the original team and even lead to a loss of control over the company’s direction.

Expert 2 agrees with Expert 1 on the advantages of ample funding for rapid growth. Startups with less funding often adopt a lean approach, focusing on profitability and sustainability. This can make them more resilient to market downturns and reduce the reliance on further funding rounds.

**Expert 3**

> Opinion: Both approaches have merit but depend heavily on contextual factors such as the competitive landscape, the maturity of the market, and the startup’s unique strengths and weaknesses.

Expert 3 acknowledges points from both Expert 1 and Expert 2. An optimal funding strategy could involve a balanced approach: raising enough capital to achieve key milestones but not so much that it leads to inefficiencies or excessive dilution. This would require careful planning and a deep understanding of the startup’s business model and market conditions.

## 2.8. Chain-of-density

The next method aims to demonstrate the extent of detail you can put into your prompts—and it’s probably significantly more than you imagine! The Salesforce AI team rolled out a fresh approach to LLM-based text summarization, dubbed [chain-of-density.](https://arxiv.org/abs/2309.04269) The researchers recognized the fine line between detail and essential ideas when summarizing text. To address this, they developed a new prompt that lets you tweak the summary’s density to your liking.

Here's the prompt:

> Article: [Paste the article here…]
> 
> You will generate increasingly concise, entity-dense summaries of the above article.

> Repeat the following 2 steps 5 times.
>
> 1. Identify 1-3 informative entities (“;” delimited) from the article which are missing from the previously generated summary.
> 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.
>
> A missing entity is:
>
> • relevant to the main story
>
> • specific yet concise (5 words or fewer)
>
> • novel (not in the previous summary)
>
> • faithful (present in the article)
>
> • anywhere (can be located anywhere in the article)
>
> Guidelines:
>
> • The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., “this article discusses”) to reach ~80 words.
>
> • Make every word count: rewrite the previous summary to improve flow and make space for additional entities.
>
> • Make space with fusion, compression, and removal of uninformative phrases like “the article discusses.”
>
> • The summaries should become highly dense and concise yet self-contained, i.e., easily understood without the article.
>
> • Missing entities can appear anywhere in the new summary.
>
> • Never drop entities from the previous summary. If space cannot be made, add fewer new entities.
>
> Remember, use the exact same number of words for each summary. Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are “Missing_Entities” and “Denser_Summary.”


As you can see, this prompt is highly specific. I used it on an article from my newsletter:

```
[
  {
    "Missing_Entities": "productive fictions; Pascal’s Wager; societal issues",
    "Denser_Summary": "This piece elaborates on the concept of 'productive fictions', beliefs not necessarily true but adopted for their beneficial outcomes. The article discusses instances like Pascal’s Wager, proposing belief in God as beneficial, and societal issues. It emphasizes how these fictions, though not universally accurate, offer pragmatic frameworks for navigating life and maintaining societal balance."
  },
  {
    "Missing_Entities": "Blaise Pascal; meritocracy; trust your gut",
    "Denser_Summary": "The article explores 'productive fictions', including Blaise Pascal's Pascal’s Wager, suggesting the benefits of belief in God and illustrating societal constructs like meritocracy and 'trust your gut'. These are not strictly true but are embraced for positive outcomes and societal equilibrium, offering practical viewpoints for life navigation and balance maintenance."
  },
  {
    "Missing_Entities": "pragmatic optimism; legal system; eternal life",
    "Denser_Summary": "The text explores 'productive fictions' like Pascal’s Wager by Blaise Pascal, and societal norms like meritocracy, depicting the pragmatic optimism derived from such beliefs. It illuminates how the legal system and notions like 'eternal life' are not empirically true but foster societal balance and constructive life perspectives, providing practical pathways for life and societal harmony."
  },
  {
    "Missing_Entities": "17th-century philosopher; presumption of innocence; external factors",
    "Denser_Summary": "Exploring 'productive fictions', the text cites 17th-century philosopher Blaise Pascal’s Wager and societal ideologies like meritocracy and presumption of innocence, reflecting pragmatic optimism and the equilibrium derived from adopting such beneficial, though unproven, beliefs. These constructs, intertwined with external factors, guide life, fostering societal harmony and offering practical, balanced life perspectives."
  },
  {
    "Missing_Entities": "moral stances; human traits; market demand",
    "Denser_Summary": "The article delves into 'productive fictions' like 17th-century philosopher Blaise Pascal’s Wager and societal values like meritocracy, illustrating moral stances, human traits, and the impact of market demand. It underscores the pragmatic optimism and societal equilibrium emanating from these unverified but beneficial beliefs, which serve as practical guides, balancing life and societal constructs."
  }
]
```

Given that summarization is often why people turn to large language models, the new CoD technique’s favorable ratings in human preference studies make it particularly valuable. Even more noteworthy is that this approach works seamlessly with the baseline GPT-4, no fine-tuning required, demonstrating there’s still a lot to uncover in the realm of effective prompting strategies.

## 2.9. Chaining outputs

The inputs and outputs we discussed were relatively brief. Even if the prompts were complex, they consisted of a few paragraphs, and the model’s responses were similar. But what if we wanted the AI to write an entire book? When I asked ChatGPT to do just that, I got nine paragraphs of something the model considers a “book,” which it obviously is not.

So the question is, is there any way to guide the model into writing a book, even though it can’t do it on its own?

It turns out the answer is yes. If you, as a prompt engineer, know how to write a book, you can hand-hold the model step by step. First, you ask it to generate a few topics and choose the best one. Next, you have it write an elevator pitch for the book to determine the approach to the topic. Then, you instruct it to create an outline based on this pitch. For each chapter from the outline, you have it develop a plan. Finally, for each bullet point, it writes a few paragraphs. The model might not be able to hold everything it wrote in memory, so to reduce the length of the prompt, each time you prompt it to write a new chapter, you provide the outlines and a summary of what it previously wrote. And yes, it writes the summaries on its own, too.

As a result, at every step, it will generate a few paragraphs, then use these paragraphs to generate more paragraphs, and so on. The outcome might not be a gripping read, since LLMs are not highly creative, but you will have guided the model through an extremely complex process.

The technique we just used is called chaining: generating a desired output by following a series of prompts. Each prompt addresses a subtask defined by the prompt engineer, and the prompts are executed in sequence, each one feeding into the next, continuing down the chain until we arrive at the final result.
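
In code, a chain is little more than a series of function calls where each output becomes part of the next input. A minimal sketch, assuming the openai Python client and following the book-writing steps above:

```
from openai import OpenAI

client = OpenAI()

def complete(prompt):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each step feeds into the next, just like the book-writing walkthrough
topic = complete("Suggest one compelling topic for a nonfiction book.")
pitch = complete(f"Write an elevator pitch for a book about: {topic}")
outline = complete(f"Create a chapter outline based on this pitch:\n\n{pitch}")
```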

Time for a practical example.

We'll take the result of the summarization prompt from the previous section and build upon it by chaining it with a couple of other prompts.

>**Model input:** 
>Take the last summary you wrote and prepare a list of ideas for different social media sites based on it. Each idea should be tailored to the type of content that works best on each platform.

>**Model output:**
>Based on the summary of the blog post, here are tailored ideas for different social media platforms:
> 
>_Twitter:_ Create a thread explaining "productive fictions" with examples like Pascal's Wager and meritocracy, and how these unverified beliefs positively impact societal values and behaviors. Include engaging questions to encourage discussions.
> 
>_Instagram:_ Post a carousel of visually appealing graphics that illustrate the concepts of Pascal's Wager, meritocracy, and their benefits. Use concise text overlays and captions to explain how these ideas contribute to societal balance and pragmatic optimism.
> 
>_TikTok:_ Make a short video explaining the concept of "productive fictions" using simple terms and relatable examples. Use visual aids, animations, and on-screen text to make the content engaging. End with a question prompting viewers to share their own examples of beneficial beliefs.
> 
>(More ideas follow…)

You may have noticed that depending on the platform, the AI suggested different content forms. On Twitter, it’s text; on Instagram, it’s images; on TikTok, it’s videos. This is another reason why chaining can be extremely useful. In this example, we can take the ideas generated by the LLM and feed them to a visual model or an audio model.

For now, let's focus on just one modality. We’ll feed the model with its own instructions to write a Twitter thread based on the summary it wrote.

>**Model input:** 
>Create a thread explaining "productive fictions" with examples like Pascal's Wager and meritocracy, and how these unverified beliefs positively impact societal values and behaviors. Include engaging questions to encourage discussions.

>**Model output:**
>_1/10_ 🌟 Ever heard of "productive fictions"? These are beliefs that, while unverified, bring positive impacts to our lives and society. Let's dive into some examples and see how they work!
> 
>_2/10_ 📜 One classic example is Blaise Pascal's Wager. Pascal argued that it's rational to believe in God, even without proof, because the potential benefits (eternal happiness) outweigh the potential costs.
> 
>(More tweets follow…)

Great! We could use this simple chain, for example, in an app meant for content creators to help them automate processes related to self-promotion. As the app designer, you set up the process step by step by writing the prompts in a chainable way, teaching the model to act almost like you would instruct a human. The AI then takes care of the results.

Also, you can chain prompts with a human-in-the-loop approach for verification or corrections in mission-critical processes where hallucinations are unacceptable. For example, if you want the model to generate simple legal contracts, you might add a step that requires a human to approve the AI-generated content at a certain point in the chain. This ensures that only verified information feeds into the next prompt.
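
A human-in-the-loop gate can be as simple as pausing the chain until someone signs off. Here’s a toy sketch reusing the complete() helper from the chaining sketch earlier in this section; a real product would use a review queue rather than a terminal prompt.

```
def approved(draft):
    # In production this would be a review queue, not a terminal prompt
    print(draft)
    return input("Approve this clause? (y/n) ").strip().lower() == "y"

clause = complete("Draft a simple confidentiality clause for a contract.")
while not approved(clause):
    clause = complete(f"Revise this clause; a reviewer rejected it:\n\n{clause}")

# Only human-approved text ever reaches the next prompt in the chain
summary = complete(f"Summarize this clause in plain English:\n\n{clause}")
```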

## 2.10. Beyond prompts

Having covered all these techniques, I must confess something to you, my reader, with a bit of embarrassment: I don’t even like prompts.

It’s true that the basics are straightforward and open to everyone. However, when a task involves edge cases, vague preferences that need clear expression, or a precise understanding of LLM behavior, writing prompts can be difficult. Take the tree-of-thought technique we discussed earlier: that wasn’t an easy prompt to write.

What about images, though? I found a prompt for a generated image on my feed in [Playground](https://playground.com), an AI image app:

> Digital realistic art of a Plymouth Road Runner 426 Hemi, tinted windows, speeding through the city streets of Miami, the sun shining on the bright orange body and chrome parts. Professional digital painting made with alcohol inks and acrylic, in the style of WLOP, RHADS, APK, vibrant colors, sharp focus, vanishing point, three-point perspective. High quality.

The author uses very specific phrases to guide the model to their desired outcome. I wonder how much experimentation it took to achieve that. I doubt I could produce something similar without further training, even though I’ve been working with generative AI for the past year.

It seems likely to me that prompts will become increasingly less important over time. This trend is already evident in cutting-edge tools. OpenAI’s DALL-E 3 integrates natively with ChatGPT, allowing you to use ChatGPT as a collaborator to refine your prompts. You can describe what you want to see using the chat, ranging from a simple sentence to a detailed paragraph.

However, ChatGPT doesn’t directly use this description for the image prompt. Instead, it creates its own command: a case of one AI system instructing another.

Let me show you an example. This is an image prompt created from a very simple command I gave to ChatGPT:

> **Kamil to ChatGPT:** Draw a cat, please.
>
> *ChatGPT to DALL-E 3:* A realistic image of a domestic cat sitting peacefully. The cat has a fluffy coat, with distinct tabby markings in shades of grey and black. Its eyes are bright green, and it is gazing curiously at something out of frame. The background is simple, with a soft focus to keep the emphasis on the cat. There’s a hint of a comfortable living room setting, with a cozy couch and a warm, soft light that enhances the cat’s fur texture.

[Research shows](https://arxiv.org/pdf/2310.11589.pdf) that GPT-4 is, in fact, more effective at eliciting user preferences through questioning than users are at conveying their needs through prompts—and that it requires less mental effort.

I’ve crafted a simple prompt demonstrating how to get a similar [outcome:](https://chat.openai.com/share/fd3318ec-eae3-4e02-9b9e-22fcc9ad280f)

> As a language model, your role involves examining and refining prompts for systems similar to yourself. When a user presents a prompt, break down its purpose, considering the user’s objectives and motivations. Reflect on the constraints and optimal strategies for engaging with large language models in scenarios like the user’s. Your task is to reshape the prompt into a directive for another neural network to maximize effectiveness. Recognize that the user may not have expertise in prompt engineering, so your assistance is key in enhancing their interaction.
>
> The prompt is: “Write a short story about knights.”

I believe this approach will become the norm: experts in various fields will create task elicitation systems specific to their domains. These systems will then help casual users engage with generative AI more easily and achieve better results, all without needing to learn prompt engineering.

As models start to take over the simpler aspects of prompt engineering before the field fully matures, AI engineers will focus on the more complex tasks. These include versioning, testing, fine-tuning, and deploying prompts and models. They will use a variety of advanced skills required to develop applications with this new generation of models.

*Image prompt: Two gladiators, mid-combat in the Colosseum, dust rising under their feet, a crowd blurring in the background, shields and swords clashing with sparks flying, tension palpable, action shot, high dynamic range, motion blur capturing the intensity of the moment, digital painting.*

# 3. Technological design constraints

In Chapter 1, we explored the vast potential of generative AI in revolutionizing various industries. Chapter 2 talked about the nooks and crannies of prompting, highlighting its role in creating a new wave of products highly responsive to human interaction. However, generative AI is still emerging in terms of widespread adoption. While promising, the technology is not without its hurdles and constraints. It isn’t all-powerful.

In this chapter, we will address challenges that product professionals need to overcome for effective use of generative AI. We’ll begin with technical constraints, one of which—hallucinations—you’re already familiar with from Chapter 2. There are other similar challenges to explore. Next, we will transition to issues tied to product and design.

Let’s dive in.


## 3.1. Context windows

Current generative AI models, whether textual or graphical, still face significant challenges when it comes to maintaining context. While they can remember things within a single session, these models forget everything once the session is lost or a new one starts, forcing you to prompt them with the relevant details all over again.

For instance, if I’m using GPT-4 to brainstorm marketing strategies for the launch of a new feature but fail to save the conversation or simply lose it, ChatGPT will not recall the context I’ve already shared. Details such as my business description or my target audience profiles will need to be re-provided when I return for help with launching another feature.

[Custom instructions](https://openai.com/blog/custom-instructions-for-chatgpt) *do* offer some level of control over how ChatGPT responds, allowing you to set your preferences and have them remembered for future conversations. I’m not particularly fond of this feature, given that it requires you to prepare in advance by laying out as much context as possible up-front. This feels like a chore, detracting from the overall magic of the experience. It would be far more beneficial, I believe, if the model could summarize key points at the end of a conversation and store them as “core memories.” Ideally, it should learn from our conversations, but this seems infeasible, given the immense context window needed for the model to retain everything we discuss and the impracticality of retraining a personalized model for each user.

Another possible solution to the context problem is the implementation of integrations. For instance, an AI-driven sales manager could learn about your customers and product by linking it to your Google account and accessing your past emails. And integrations will be a huge game-changer in the world of large language models, opening up incredible opportunities for startups. When a company like Google launches an AI assistant, that assistant is limited to operating solely within Google. But imagine an independent assistant that could seamlessly integrate with other platforms. It could pull brand assets from your cloud storage, whip up UI mockups in Figma, and then send them over to an AI programmer to code the frontend.

This kind of cross-functionality is currently unachievable with the walled garden approach favored by large corporations, but it represents a significant potential avenue for disruption by generative AI. OpenAI knows this, too. You can tell by their announcement of the [ChatGPT Enterprise](https://openai.com/blog/introducing-chatgpt-enterprise) offering. One notable feature under development is customization, which will allow companies to securely augment ChatGPT's knowledge base with their own data through integration with existing applications. I think this is a sound strategic direction.

## 3.2. Moderating non-determinism

Non-determinism refers to the unpredictability of outputs from generative AI models. When you input the same prompt into a generative AI model multiple times, you often receive different responses. This variability can be a double-edged sword. On one hand, it allows for creativity and diversity in outputs, essential for applications like content generation, art, and creative writing. On the other hand, it makes reliability harder to achieve, which is critical for many applications.

Traditional software relies on predictable outputs for given inputs. Non-determinism can lead to frustration when the AI generates unexpected or irrelevant responses. This unpredictability can erode trust in the product, especially in applications requiring precise and reliable outputs, like customer support. 

Need an example? Air Canada lost a small claims case brought by a grieving passenger after trying, unsuccessfully, to disavow its AI-powered chatbot. The passenger argued they were misled about the airline’s bereavement fare policies when the chatbot provided incorrect information. British Columbia’s Civil Resolution Tribunal, which handles small claims, sided with the passenger.

After their grandmother passed away, the passenger used Air Canada’s website chatbot to look up flights. The chatbot incorrectly stated that bereavement fares could be applied retroactively. The passenger took a screenshot of this response and presented it to the Tribunal. The chatbot had told the customer: 

> Air Canada offers reduced bereavement fares if you need to travel because of an imminent death or a death in your immediate family… If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.

The passenger later found out from Air Canada employees that the airline did not accept retroactive bereavement applications. However, they still pursued the refund, stating that they had relied on the chatbot’s advice—a non-deterministic hallucination—according to case records.

In regulated industries, non-deterministic outputs pose significant challenges for compliance. Regulators often require clear, predictable, and explainable decisions. The inherent unpredictability of generative AI models can make it difficult to meet these requirements, leading to potential legal and ethical issues.

The answer to the problem of non-determinism is, unfortunately, moderation. In general, you can moderate inputs or you can moderate outputs. If you want to moderate inputs, you can use tools like the free [Perspective API.](https://perspectiveapi.com/) OpenAI offers a similar [endpoint.](https://platform.openai.com/docs/guides/moderation/overview) These APIs employ machine learning to identify toxic content, making it easier to moderate. Comparable tools exist for detecting NSFW content or inappropriate images. If unwanted content is detected, it’s best to prevent the chatbot from responding, as the response could be unpredictable.
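
As an illustration, here’s roughly what input moderation with OpenAI’s endpoint looks like in Python. The helper name and the canned refusal are mine; the moderations endpoint itself is part of the official client.

```
from openai import OpenAI

client = OpenAI()

def safe_to_answer(user_message):
    # The moderation endpoint flags toxic or otherwise disallowed input
    result = client.moderations.create(input=user_message)
    return not result.results[0].flagged

message = "How do I reset my password?"
if safe_to_answer(message):
    ...  # hand the message to the chatbot as usual
else:
    ...  # return a canned refusal instead of an unpredictable reply
```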

If you want to moderate outputs, frontier models like ChatGPT usually excel at basic filtering. For example, they prevent the generation of harmful content, like instructions for making a bomb; they are often legally required to do so. However, for more specific cases, you need to implement a human-in-the-loop approach, which we briefly mentioned in Chapter 2. 

Human-in-the-loop involves integrating human oversight into the AI's decision-making process. Before the AI-generated content is finalized, it can be reviewed by human moderators. These moderators can approve, reject, or edit the output to ensure it meets quality and safety standards. This is particularly useful in applications where accuracy and appropriateness are critical, such as medical advice.

Users can also provide feedback on the AI's outputs, which is then reviewed by human moderators. This feedback helps in continuously improving the model’s performance and reducing non-deterministic behavior over time. For example, if a user flags a response as inappropriate, moderators can investigate and adjust the model’s parameters or training data accordingly. 

An example of tackling non-determinism in a practical application can be seen with [Nibble:](https://www.nibbletechnology.com/) an AI-powered chatbot that allows shoppers to haggle for lower prices. Shoppers can reach a deal within 45 seconds, and roughly one-fifth proceed to purchase the item, according to the company. However, if you're considering implementing a similar idea and are concerned about customers finding exploits to secure 100% discounts, put a human-in-the-loop system in place. Before the final deal is confirmed, human moderators can review the negotiated prices to ensure fairness and prevent abuse, maintaining both customer satisfaction and business integrity.

### Case study: The delicate balance between medical education and medical advice

Medyk.ai is one of the chatbots available on [Czat.ai,](https://czat.ai/) an AI platform developed by Michał Jaskólski, the founder of one of Poland’s top real estate search websites. The platform aims to offer around-the-clock support from AI through a variety of specialized characters, each with its own unique personality.

The product faced scrutiny after journalists [investigated](https://www-rynekzdrowia-pl.translate.goog/E-zdrowie/Lekarz-obok-wrozki-i-ogrodnika-dzieki-sztucznej-inteligencji-Rzecznik-Praw-Pacjenta-ma-watpliwosci,260788,7.html?mp=promo&_x_tr_sl=pl&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp) it and sought opinions from legal authorities and experts about its regulatory compliance. In Poland, as in many other countries, healthcare is a highly regulated sector. The journalists reached out to the Patients Ombudsman and the Office of Competition and Consumer Protection—attention that startups usually prefer to avoid. However, both authorities were unable to provide definitive answers, as the market is so new that it’s still unclear whether such a service qualifies as a medical product. Additionally, no customers had filed complaints yet. Nonetheless, their responses conveyed a general sense of caution.

Fortunately, Michał anticipated these challenges and took steps to safeguard the product. Journalists conducted hands-on tests, but despite their best efforts, they couldn’t persuade the AI to offer specific medical advice. Instead, the chatbot responded, “I understand that you are in a difficult situation and are looking for a specific answer, but I cannot exceed my limitations. This is important for your safety and in line with the rules designed to protect your health.” At the same time, the AI offered support by helping users prepare for a doctor’s visit, such as assisting in creating a list of questions or organizing documentation to present.

This is precisely the type of built-in safeguard we discussed earlier. As someone who supports innovation and doesn’t want to see new ideas stifled by legal issues early on, I was pleased to find that when I tested the chatbot myself, it clearly stated that its sole purpose is to help prepare users for conversations with real doctors. It emphasized the importance of consulting medical professionals immediately if there are any concerns. The chatbot also acknowledged its limitations, such as the possibility of not having the most up-to-date medical knowledge, and reiterated that the entire conversation is purely educational.

When I asked Michał about the article, he mentioned that before launching, he decided to do some research. He discovered a preliminary [ruling](https://curia.europa.eu/juris/document/document.jsf;jsessionid=8F459A96910D13A247396ADC828E02C2?text=&docid=197527&pageIndex=0&doclang=EN&mode=lst&dir=&occ=first&part=1&cid=4505257) by the Court of Justice of the European Union regarding national legislation that requires drug prescription assistance software to undergo a certification process by a national authority. The ruling offers a reasonably clear definition:

>“Medical device” means any instrument, apparatus, appliance, software, material or other article […] used specifically for diagnostic [or] therapeutic purposes […], intended by the manufacturer to be used for human beings for the purpose of: diagnosis, prevention, monitoring, treatment or alleviation of disease; diagnosis, monitoring, treatment, alleviation of or compensation for an injury or handicap; investigation, replacement or modification of the anatomy or of a physiological process; control of conception. […] Software for general use, even if used in healthcare, and software related to lifestyle and well-being are not considered medical devices.

Michał believes that as long as the app is intended for general use—which it clearly is, given that it’s a horizontal platform with multiple personas—and the chatbot remains transparent about its limitations without attempting to diagnose users, it should be in the clear. In the end, no one found any significant issues with the platform, except for—surprise, surprise—the usual calls for more and clearer regulation. So, it seems his assessment was correct.

Additionally, the product is designed so that no personal information—particularly critical in the European context—is stored long-term. There’s no registration and no user profiles, and the terms of service explicitly prohibit users from sharing personal information. I suspect this was added to shield the team from concerns related to GDPR compliance, especially since Czat.ai openly acknowledges using GPT-4o and Claude 3.5 Sonnet, models built by US-based companies, which makes data processing tricky. We’ll explore similar issues in Section 4 of Chapter 4, where we discuss the legal aspects of generative AI.

You can see how a single precaution might not be sufficient, as this case study highlights a range of techniques: built-in product safeguards, careful selection of features and business model, and ensuring that the Terms of Service address necessary use cases. It’s a good reminder that managing non-determinism is an interdisciplinary effort.

Curious about how European companies are navigating the challenges of deploying generative AI, I sat down with Michał Jaskólski to get his perspective. When asked about the differences in how companies approach generative AI in the US and Europe, he didn’t hesitate. “The first question is always, 'What will happen to our data?' 'Will our data be leaked?' and 'Where will user data end up?'” Jaskólski said, highlighting the more cautious European approach. 

He noted that, especially in large corporations, projects are meticulously reviewed. “I’ve already encountered situations where, for example, the compliance department of one bank blocked a marketing campaign based on generative AI because no one could guarantee 100% that the chatbot wouldn’t say something it shouldn’t.”

His comments reflect the regulatory complexity that companies face in Europe, particularly with regulations like the GDPR, DMA, and AIA. “Fortunately,” Michał added, “there are already lawyers in the Polish market who understand the complexity and multidimensionality of this area and are able to recommend specific solutions.”

When asked if he'd consulted any lawyers before launching his own product, Jaskólski was candid. “Yes. I did a lot of research myself, but I also consulted a lawyer on a few topics,” he said. For Michał, whether legal advice is essential depends on the nature of the project. “When the project touches on regulated areas such as finance, health, or law, such consultations are absolutely necessary because they resemble a minefield, and it’s easy to make a mistake.”

But he was clear about one thing: “It’s worth being very specific in asking questions and not being afraid to discuss possible solutions,” he advised, stressing the importance of seeking out the right legal help when necessary.

Michał, who also founded Morizon, a real estate platform under the media company Ringier Axel Springer, is no stranger to deploying AI-powered features. Reflecting on how larger companies approach AI differently, he pointed to the importance of focus. “A lot depends on how well you can impose a rapid development pace,” he said. “If a topic is covered by OKRs, then you can do a lot in 2-3 months. If not, the production launch of a solution can be delayed by as much as a year.”

Despite these challenges, he saw the advantage of working within a large company. “The main advantage is multi-level feedback from various sides, including experts in given areas,” he explained. “Thanks to this, the products are simply better.” He emphasized that larger teams often have the ability to spot areas for improvement that might go unnoticed in smaller projects.


## 3.3. Async

Unlike traditional software systems that operate mostly by giving you instant answers, current generative AI systems are fundamentally asynchronous.

For example, when using a language model like ChatGPT, users type their queries into a text box. This input is then sent to the back-end server for processing and streaming. Streaming enables real-time data flow between the client and server, providing users with incremental updates. In a chat application, for instance, a generative AI can stream responses as they are being generated. This creates a more dynamic and engaging user experience, as users can see partial responses and start interpreting them even before the entire output is completed.

Standard UI elements like buttons and sliders are also used to trigger AI actions. These triggers can initiate complex back-end processes, such as generating an image, composing a piece of music, or performing data analysis. The asynchronous nature of these interactions allows the front-end interface to remain responsive, providing users with immediate feedback while the back-end handles the heavy lifting.

The back-end of these systems is generally where the bulk of processing occurs. For example, a user might upload a large PDF document and request a summary. The AI system will process this document asynchronously, allowing the user to continue other activities. Once the summary is ready, the system can notify the user, ensuring a seamless experience without requiring them to wait for the task to complete.

The ability to process tasks in the background offers several key benefits. One significant advantage is the control over concurrency. This is particularly important for services that rely on external APIs, especially those involving paid models. By regulating the number of requests made to these APIs, systems can optimize their usage and control costs effectively. This is more difficult on the front-end where we can't always control how often users interact with particular features.
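
For instance, on the back-end you can throttle how many model calls run at once with a simple semaphore. A minimal sketch using Python’s asyncio and the async OpenAI client; the limit of five is arbitrary:

```
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
semaphore = asyncio.Semaphore(5)  # at most five concurrent API calls

async def generate(prompt):
    async with semaphore:  # excess requests queue up here
        response = await client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

async def main(prompts):
    # Fire off all tasks; the semaphore caps how many hit the API at once
    return await asyncio.gather(*(generate(p) for p in prompts))
```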

The size of generative AI models plays a crucial role in determining their performance and accuracy. Larger models, while often more accurate and capable of generating higher-quality outputs, tend to be slower. This slowness is the trade-off for their greater accuracy and depth of understanding. On the other hand, smaller models, though less capable of nuanced and complex responses, are typically faster. This speed is particularly evident when running smaller models on hardware designed for larger ones, as the hardware's capabilities are underutilized, resulting in quicker response times.

Another permutation of this idea is edge AI. It involves deploying artificial intelligence directly onto peripheral devices, making each device a miniature decision-making hub. This contrasts with relying on the cloud for computational power. Essentially, it’s like having a small, independent brain in your smart gadgets, eliminating the need to constantly communicate with a more powerful, distant brain. That said, edge AI often lags behind its cloud-based counterpart because local devices like laptops lack the computing prowess of centralized servers. However, if you have a small model fine-tuned for a very specific, simple task that runs directly on your laptop or phone, it can deliver an answer without needing to communicate with the cloud, eliminating the time needed to handle server requests.

### Case study: OpenAI o1

In September 2024, OpenAI advanced the concept of asynchronous AI with the release of the o1 model. This model was built to take more time to think before responding, allowing it to reason through complex tasks and solve harder problems in areas like science, coding, and math. When I tested it, the model took 12 to 20 seconds to begin generating text, and then a few more seconds to complete the response once it started.

![o1’s initial reaction to the prompt](https://read.kamil.fyi/u/screenshot-2024-09-12-at-18-21-55-JQwIbG.jpeg)

When it finished, it showed me the chain of thought that led to its answer.

![o1’s chain-of-thought](https://read.kamil.fyi/u/screenshot-2024-09-12-at-18-30-58-G6F0tT.jpeg)

OpenAI’s o1 represents a significant shift not just in research, but also in how products are developed. We’re now faced with the challenge of building around a model that takes time to think. For the past two decades, we’ve focused on shrinking feedback loops to mere milliseconds—whether it’s media, communication, computing, or hardware. Everything became so fast that we trained people to expect immediate responses. But generative AI is changing that dynamic. OpenAI has even mentioned that future versions of o1 might think for hours, days, or even weeks! This is a radical departure from what today’s fast-paced, dopamine-driven culture is used to. However, if the end result is so valuable that it’s worth the wait, who’s to say this isn’t where software development is headed in the next decade?

Now, product designers face a whole new set of challenges with this evolving interaction model. Can we even call something that takes hours or weeks to process a chatbot anymore? Is text input still the best way for users to engage with it? Are chat bubbles the right format to display results? If tasks take that long, should there be a progress pipeline that lets users review results step by step? And what if you catch an error early in the process—can you intervene, or are you stuck waiting for the final output to correct mistakes? Any data analyst knows the frustration of waiting for a lengthy and costly query to complete, only to realize you made an error, making the results useless and forcing you to start all over. These are critical questions designers must now grapple with.

If you’re a designer working on an agentic AI app right now, you likely need to look to other fields for inspiration on addressing these challenges. My guess is that the most relevant design patterns will come from industries that manage long-running, often physical, processes. Beyond data science, you could look at areas like shipping, which developed tools like email notifications, text updates, and live maps to show users where their package is and when it will arrive. Project management tools are another great example—they allow stakeholders to easily track the progress of long-term projects. What if your AI agent used something like a hill chart to visualize its progress, giving users a clear, intuitive view of how far along it is and what’s left to be done? These industries provide valuable lessons on keeping users informed and engaged during lengthy processes.

![Basecamp's hill charts](https://read.kamil.fyi/u/hill-charts-bcb208f51d8388753d0425a01e1c97619715353da29468852011d94292ba145d-sTE2Sq.webp) 



## 3.4. Discovery

When exploring AI-powered apps to write case studies for this book, I stumbled upon a fascinating solution to a classic design problem: When you have the power to instruct AI to create anything, what do you specifically ask for?

Since generative AI is inherently unpredictable, no two outputs will be identical if different seeds are used. As a result, achieving the desired outcome often involves a process of iteration, brimming with experimentation, due to either ambiguous instructions or the unpredictable nature of the results.

Take this for instance: I use [Playground](http://playground.com/) to conjure feature images for my articles. I aimed to produce an image of a semi-robotic cat licking its paw. It took an arduous 50 attempts before I was content with an image. Yet, despite my best efforts, the algorithm fell short of depicting the cat in the specific act of licking. And believe me, my early days experimenting with image-centric models were even more challenging.

 ![I tried to create an image of a semi-robotic cat licking its paw. After trying for a while, I finally achieved an image I was satisfied with.](https://read.kamil.fyi/u/screenshot-2023-08-09-at-02-11-57-hTC1Jh.jpeg) 

So, from a product design perspective, how do you streamline a user’s learning curve? Expecting them to navigate the maze of trial and error might deter many. This might be why many AI-driven platforms, including Playground AI, incorporate features such as galleries, community feeds, remix options, and collaborative modes.

Observing other users’ approaches provides invaluable insights. You can mimic their prompts, tweak them to your preference, and hasten the learning process. While seasoned users might outgrow the need for such galleries, they serve as a pivotal guide for novices.

![Playground offers a gallery of user-generated content, allowing you to emulate their progress and accelerate your own learning](https://read.kamil.fyi/u/screenshot-2023-08-09-at-02-21-45-QmFGyV.jpeg) 

Clever.

## 3.5. Articulation barriers

As we discussed in Chapter 2, the latest surge in generative AI is driven by the use of prompts—instructions or queries that you feed into a model to guide its responses. As multi-modal AIs gain traction, the role of prompts is expanding beyond text to encompass vision and voice, yet under the hood, virtually everything is still driven by text. Take DALL-E 3, for example, which produces images based on the messages you exchange with ChatGPT.

This approach has its advantages. For one, it fosters a more conversational interaction with the product, making the technology more approachable. Using verbal prompts is often more intuitive than navigating a complex user interface, too—after all, you already know how to express yourself. You learned it in school!

Or did you, really?

### Language proficiency

That’s not always obvious. To start, users need to be eloquent enough to craft the necessary textual prompts effectively. Also, since most of these models are primarily trained on English data, their performance in other languages can be subpar. This puts non-English speakers at a disadvantage. Prominent UX researcher Jakob Nielsen refers to this issue as the [“articulation barrier.”](https://www.uxtigers.com/post/ai-articulation-barrier)

While it’s true that a GUI may not always be available in your native language, when it is, the quality of its output isn’t compromised by that fact. Also, translating a user interface doesn’t cost millions of dollars—unlike retraining a machine learning model.

Language proficiency isn’t the sole barrier to effective use of language models. Research indicates that in affluent countries like the United States and Germany, up to half of the population are considered low-literacy users. Although literacy rates may be higher in countries like Japan and potentially other Asian nations, the situation deteriorates significantly in middle-income and likely even more so in developing countries.

Even for those with high levels of literacy, conveying your requirements in written form can be challenging. In my book, [“Writing Great Specifications,”](https://www.amazon.com/Writing-Great-Specifications-Specification-Example/dp/1617294101) I talk in depth about the complexities of drafting specifications for software development teams. This task is not unlike instructing LLMs like GPT-4 to create an app for you. In particular, two major pitfalls I discuss are information asymmetry and the under-documentation pattern.

>**Information asymmetry**
>A situation that arises when one party possesses more or better information than the other, leading to an imbalance in understanding.

>**Under-documentation**
>A common mistake which involves neglecting to provide adequate information, whether due to errors, miscommunication, or even laziness.

It’s not hard to see how these issues often intersect: we may have a clear vision of what we want the app to do, but fail to communicate this adequately to the model. These pitfalls are not theoretical; they manifest in real-world scenarios every day, even among well-educated, well-intentioned professionals—and with intelligent humans on both ends of the process.

### Communication complexity—for algorithm designers

That covers the challenges of prompting, but there’s also the matter of the generated content to consider. Analysis reveals that the output from these models is typically crafted at a reading level of 12th grade or higher, making it problematic for low-literacy users. Usability research focusing on such users has long recommended that online text be written at an 8th-grade level to be more inclusive of a broader consumer base.

As [measured](https://www.uxtigers.com/post/ai-complex-text) by Nielsen:

- Bing Chat’s response was calibrated at a 13th-grade reading level, similar to what a university freshman might face
- ChatGPT responded at an astonishing 16th-grade reading level

Intriguingly, both of these applications are built on the same foundational model: GPT-4. This implies that it’s possible to prompt these models to produce simpler responses through system prompts, or to fine-tune them for that purpose. Each development team needs to determine the level of complexity that both they and their target audience are comfortable with.
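
One way to do that is a system prompt that pins the reading level for the whole session. A sketch, with wording of my own invention:

```
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            # Pin the reading level for every reply in the session
            "content": "Answer at an 8th-grade reading level. Use short "
                       "sentences and common words. Avoid jargon unless "
                       "the user uses it first.",
        },
        {"role": "user", "content": "What is two-factor authentication?"},
    ],
)

print(response.choices[0].message.content)
```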

My own experiences align with this perspective. I often rely on GPT-4 to assist me in editing [my newsletter.](https://kamil.fyi) Although I’m proficient in English, it’s not my native language—perfecting a newsletter issue to a high standard on my own is time-consuming. For example, crafting an article like this one used to take me between one and two days before I started using ChatGPT. I’d complete the initial draft fairly quickly, but then spend a considerable amount of time fine-tuning the text—agonizing over idiomatic expressions, searching for synonyms, and the like.

GPT-4 has dramatically cut my editing time to just 30 minutes to an hour per article, allowing me to concentrate more on articulating my thoughts rather than perfecting their presentation. The trade-off? I often find myself having to simplify the model’s language choices. It just loves these complex, four- or five-syllable words. Yuck!

## 3.6. The economics of LLMs

As AI startups grow, there’s a trend of sharing memes on Twitter about massive bills from OpenAI. Some companies are posting about receiving bills of $8,000 or even $25,000, which can amount to about 10% of a startup’s monthly recurring revenue.

In the past decade, we’ve seen similar situations with cloud service bills. Back then, teams didn’t worry too much because if their services gained popularity, they had access to almost unlimited venture capital. However, in today’s climate, with the end of the zero interest rate policy era, companies need to be much more mindful of costs right from the start.

So, the big question is, how can we reduce costs? Naturally, the main solutions include developing more efficient models and improving hardware. However, we can also apply software engineering and prompt engineering techniques to cut expenses. This section explores the following strategies:

- Trimming prompts and responses to minimize token usage
- Implementing caching, including both exact matches and semantic caching for approximate matches
- Optimizing models through fine-tuning and deploying smaller models through the AI router design pattern

This section leans more towards the technical side. I’m deeply interested in practical implementation techniques, and I want to ensure this ebook doesn’t turn into a purely theoretical business text disconnected from real-world practices.

Let’s dive in.

### Condensing prompts and responses

We’ll begin with the basics. Since the primary cost from LLM cloud providers comes from the tokens used, reducing the number of tokens in each request can lower expenses. Because we can’t always manage user input, it makes sense to look for efficiencies in the system’s prompts and ChatGPT's responses.

System prompts can be shortened manually, or we can use a tool like ChatGPT to do it for us. As I explained in the previous chapter, the LLM itself can often rephrase its own prompts in a way it follows more reliably, and the same technique is effective at reducing their length.

We can also use summaries. For example, we can summarize a document once, incurring the full cost, and then use the summary for further processing. This approach reduces the number of tokens used while preserving the most important information.

For the model’s responses, we can ask the model to be less verbose or give it instructions such as replying in just a single sentence.

While these strategies might seem simple, they’re not trivial. If you look into the leaked system prompt for ChatGPT, you’ll discover that its developers have explicitly instructed it to conserve computing resources. This includes directives to avoid verbosity, such as the guideline to “never write a summary with more than 80 words” in the prompt. If OpenAI sees savings opportunities in commands like this, you can benefit from them as well.

If you’re really looking for something more advanced, there’s [LLMLingua](https://github.com/microsoft/LLMLingua) by Microsoft. This tool uses a compact, well-trained language model such as GPT-2-small or LLaMA-7B to pinpoint and eliminate unnecessary tokens in prompts. This allows for efficient processing, achieving up to 20x compression while keeping performance loss to a minimum.
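
Usage is pleasantly compact. Here’s a sketch based on the project’s documented interface; the example strings and the token target are mine:

```
from llmlingua import PromptCompressor

# Loads a local compression model on first use
compressor = PromptCompressor()

context = ["...a long document, split into a list of strings..."]

result = compressor.compress_prompt(
    context,
    instruction="Summarize the key decisions.",
    question="What did the team agree on?",
    target_token=200,  # squeeze the context down to roughly 200 tokens
)

print(result["compressed_prompt"])
```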

To me, investing in such frameworks really pays off when you’re handling highly complex prompts or when doing stuff like retrieval-augmented generation. However, as the tech evolves, we’re seeing new features, like Google Gemini’s 1 million token context window, enabling users to literally put entire books into these models. If history from the past decade has shown us anything, it’s that people will continue to push the boundaries in unexpected ways with these technologies. So, approaches like these could become increasingly valuable as well.

### Exact caching

Caching is a technique familiar to programmers across many fields, not just those working with AI. If you’re using a framework like [LangChain,](https://www.langchain.com) which is optimized for developing applications powered by language models, you might find caching features already built in. This means you can easily incorporate it into your app without much hassle.

Here’s an example.

```
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the caching really obvious, let's use a slower model
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
```

```
%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, the input is not yet in cache, so request should take longer
llm.predict("What's GitHub?")
```

```
CPU times: user 13.7 ms, sys: 6.54 ms, total: 20.2 ms
Wall time: 330 ms
```

```
%%time
# The second time it is, so we go faster
llm.predict("What's GitHub?")
```

```
CPU times: user 436 µs, sys: 921 µs, total: 1.36 ms
Wall time: 1.36 ms
```

When the framework accesses the cache the second time, it skips connecting to your provider’s API and fetches the same answer from the data store. This not only reduces costs but also cuts CPU time by roughly 15 times, and wall-clock time by far more.

However, there are downsides, such as increased complexity, but I won’t go into more detail on that—every engineer knows how caching can generate problems. And to be fair, you don’t necessarily need LangChain to set up exact caching. It’s easy to implement in any programming language or framework; the effort would be similar even in Ruby on Rails, which is my usual coding environment.
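
To illustrate just how little code exact caching needs, here’s a framework-free sketch in Python; call_model stands in for whatever function actually hits the paid API:

```
import hashlib

cache = {}  # swap for Redis or your database in production

def cached_completion(prompt, call_model):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        cache[key] = call_model(prompt)  # the only call that costs money
    return cache[key]
```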

But there are some unique downsides to caching with LLMs that many might find new. One major issue is that the response from the model will remain unchanged until the cache expires. This might work well for certain AI products, but it’s less than ideal for others—particularly those focused on content generation. For example, if you ask an LLM to write a blog post and it produces the same one every time, it clearly is not very good at its job. However, in the case of a customer support chatbot, this might not be a concern at all.

### Semantic caching

The second issue becomes visible soon after implementing exact caching. One user might say “Tell me a joke,” while another asks “Do you know any jokes?” Because these sentences don’t match exactly, the cache will be bypassed.

This is where semantic caching and tools like [GPTCache](https://github.com/zilliztech/GPTCache) become valuable. GPTCache uses embedding algorithms to transform queries into embeddings, employing a vector store for similarity searches on these embeddings. Through this method, GPTCache can recognize and fetch similar or related queries from the cache, enhancing efficiency.

We can integrate GPTCache with LangChain to enhance our previous example.

```
import hashlib

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache

def get_hashed_name(name):
    # Hash the model name so each model gets its own cache directory
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    # Configure GPTCache for similarity search instead of exact matching
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

# set_llm_cache was imported from langchain.globals in the earlier example
set_llm_cache(GPTCache(init_gptcache))
```

```
%%time
# This is an exact match, so it finds it in the cache
llm("What's GitHub?")
```

```
"GitHub is a developer platform that allows developers to create, store, manage and share their code."
```

```
%%time
# This is not an exact match, but semantically within distance so it hits!
llm("Explain what GitHub is.")
```

```
"GitHub is a developer platform that allows developers to create, store, manage and share their code."
```

This time, even though our second query wasn’t identical to the first, we still managed to hit the cache successfully.

This solution has its drawbacks, too. With a semantic cache, you might face false positives during cache hits and false negatives during cache misses. So, not only have we added a caching system that increases complexity, but we’ve also introduced a particularly complex type of cache. Hopefully, when we weigh these challenges against potential savings, they will justify the effort involved.

Now you can see why opting for a dedicated framework like LangChain might be preferable to querying external APIs directly. Both GPTCache and LLMLingua, which we discussed earlier, are available as integrations within LangChain’s framework, allowing for seamless chaining. The more complex your required chains are, the more it makes sense to invest in a solid foundation to support them.

### Fine-tuning and model-swapping

If you prefer not to use caching, there’s another strategy to consider. We’re in the middle of the AI boom; with the tech improving quickly, everyone wants to use the latest, state-of-the-art models. However, it can sometimes be more practical to opt for a less advanced LLM and tailor it to your specific needs through fine-tuning.

Fine-tuning is a method where a pre-trained model undergoes additional training on a smaller, specialized dataset. This process adjusts the model’s parameters to improve its performance on tasks related to this new data. It’s like an experienced chef refining a new recipe by tweaking their methods. This approach enables the model to become more specialized, boosting its effectiveness on specific tasks without having to be developed from the ground up.

For example, if we assign a task to GPT-4, it might perform well 80% of the time, while GPT-3.5 might only succeed in 60% of cases for the same task. However, by fine-tuning GPT-3.5 with sufficient specific examples demonstrating how to complete that task, it can eventually match the performance of its newer counterpart.

Research shows that fewer than 1000 data points can be enough for effective fine-tuning. Just 100 data points led to a 96% improvement in GPT-3.5’s ability to answer questions in JSON format, and 1000 data points were enough to surpass GPT-4 in generating raw responses. While GPT-4’s pricing is $0.03 per 1000 input tokens and $0.06 per 1000 output tokens, GPT-3.5’s costs are much lower: just $0.0005 per 1000 input tokens and $0.0015 per 1000 output tokens. That’s a 60x cost improvement on inputs and 40x on outputs!

If you're interested, here’s a 4-step playbook you can follow.

**Step 1.** Begin with the most advanced model required for your application’s needs. For 95% of companies, this would be GPT-4, but probably not Turbo, as you’re aiming for the highest quality outputs. These will serve as the basis for fine-tuning a smaller model.

**Step 2.** Keep a record of your requests and responses in a format that allows for easy export.

**Step 3.** After gathering approximately 1000 request and response pairs, export and refine the data to ensure both inputs and outputs are of high quality.

**Step 4.** Using your cleaned dataset, [fine-tune](https://platform.openai.com/docs/guides/fine-tuning) GPT-3.5-Turbo or deploy a self-hosted open-source model. Replace GPT-4 with your fine-tuned model and start enjoying the cost savings.
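Here’s a rough sketch of what Steps 2 and 3 might look like in practice. The `log_training_pair` helper is hypothetical, but the JSONL chat format it writes is the one OpenAI documents for fine-tuning GPT-3.5-Turbo.

```
import json

def log_training_pair(system_prompt, user_prompt, assistant_reply,
                      path="finetune_dataset.jsonl"):
    # Append one GPT-4 request/response pair in OpenAI's fine-tuning format
    record = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ]
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Once you’ve accumulated around 1000 pairs, review the file, drop the low-quality examples, and upload the rest to the fine-tuning endpoint.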

Startups like [OpenPipe](https://openpipe.ai) simplify this process a lot. You can use the OpenPipe SDK as a direct substitute for the standard OpenAI package. Calls made via the SDK are automatically recorded for future training. You’ll also use this SDK to access your own fine-tuned models once they’re up and running.

### Using the AI router pattern

If we push this strategy to its limits, we might imagine multiple small language models that have been fine-tuned to excel at particular tasks. These LLMs can be selected based on the performance and cost-effectiveness of both the base models and their fine-tuned variants. We could reserve the use of a cutting-edge model like GPT-4 only for unfamiliar or new tasks that our more economical, smaller language models are unable to handle.

The diagram below showcases the router pattern, a design strategy [adopted](https://tomtunguz.com/ai-design-patterns) from Tomasz Tunguz.

 ![The AI router pattern](https://read.kamil.fyi/u/router-OyAr8D.jpeg) 

The router functions similarly to our caching layer from the earlier example. It transforms queries into embeddings and employs a vector store for similarity searches on these embeddings, which are then matched with specific models.

A recognized query is directed to a small language model, which is usually more accurate, more responsive, and less costly to run. If the query is not recognized, it’s handled by a large model. Large models are more expensive to operate but can successfully answer a wider range of queries. This approach allows an AI product to strike a balance between cost, performance, and user experience.
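To make the pattern concrete, here’s a minimal sketch of such a router, assuming OpenAI’s embeddings API and an in-memory similarity search. The route names, example queries, and threshold are all invented for illustration; a production version would use a proper vector store with many examples per route.

```
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical routes: each fine-tuned small model is paired with an example query
ROUTES = {
    "ft:gpt-3.5-turbo:acme:summaries": "Summarize this article for me.",
    "ft:gpt-3.5-turbo:acme:json-extraction": "Extract the invoice fields as JSON.",
}
FALLBACK_MODEL = "gpt-4"  # expensive but general-purpose
THRESHOLD = 0.85  # similarity cutoff; tune it on real traffic

def embed(text):
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

route_vectors = {model: embed(example) for model, example in ROUTES.items()}

def pick_model(query):
    q = embed(query)
    # Cosine similarity between the query and each route's example embedding
    scores = {
        model: float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
        for model, vec in route_vectors.items()
    }
    best_model, best_score = max(scores.items(), key=lambda item: item[1])
    # Recognized queries go to a cheap specialist; everything else goes to GPT-4
    return best_model if best_score >= THRESHOLD else FALLBACK_MODEL
```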

Some startups are already implementing patterns like this in their production. For example, Ramp’s multi-LLM strategy uses OpenAI’s GPT-4 for scenarios where output quality is paramount and speed is less critical, Anthropic’s Claude for synchronous tasks needing quick responses, and local models for straightforward tasks where both speed and cost are important.

If you’re looking to adopt this strategy, you can find some tips for selecting the most suitable model for any particular task in Chapter 1. The key is to avoid vendor lock-in with a single provider and to design your AI architecture to be modular and adaptable right from the start.

### Summary

These ideas represent the most important of what I discovered when analyzing the economics of LLMs. As we continue to deploy this technology across various production settings, it’s likely we’ll uncover even more optimization strategies, and this section of the book will have to be updated.

And as you can see, while LLMs might lower the entry barriers to programming jobs, programmers will still have plenty to do. Prompt engineering might replace some tasks traditionally done through manual coding, but this chapter highlights how important developer-centric practices like instrumentation, deployment, logging, and monitoring remain. Many of the concepts discussed here can seem like dark magic to those without a technical background. This is precisely why, as an engineer, I find these techniques interesting.

To wrap up, let’s summarize the key points.

- Since large language models operate on a token-based system, controlling the length of repeatable prompts and responses generated by the model can lead to cost savings.
- When working with inputs that are fully under your control and often repeatable, you can use exact caching to enhance speed and reduce costs.
- For a more sophisticated approach that handles non-exact matches, implementing semantic caching can improve your cache hit rate.
- Fine-tuning is also an effective strategy for reducing costs. Research shows that fewer than 1000 data points can lead to significant improvements through fine-tuning.
- By fine-tuning small language models and carefully orchestrating them, we can create complex systems that work effectively alongside cutting-edge large language models, thanks to the AI router pattern.


# 4. Business challenges

In this chapter, we’ll explain how AI impacts businesses and why adopting new technologies is crucial for staying competitive. However, _truly_ adopting AI comes with challenges. It’s not just about adding a feature and slapping AI on top of it. Generative models will become truly transformative only if they enable new business models that wouldn’t be possible otherwise.

So, what does AI mean in your field? How can it impact your revenue and cost structure? Are there new legal requirements that come with using it? We’ll cover all this and more.


## 4.1. Product-market fit

Broadly speaking, the latest surge of AI-driven products can be grouped into two categories.

The first includes AI features integrated into a broader service, supplementing its existing value. For instance, consider Box enhancing its platform with natural language search, Zoom introducing transcription services, or Notion integrating an AI assistant to facilitate content creation. Here, even without the AI element, these products would still function.

The second category represents entirely new products, with AI serving as the cornerstone. Without it, these products cease to exist. ChatGPT and Playground, an online AI image creator we’ve already mentioned, are examples.

This stands in contrast to the 2015 influx of products built on natural language processing, which largely remained at the tech-demo stage. But I’ve noticed a tendency to overstate the product-market fit of generative AI because of the first category of products. Many argue that AI’s product-market fit is clearer than, say, that of cryptocurrency, given the surge in companies adopting LLMs or Stable Diffusion. I find this argument superficial. While it’s true that AI is increasingly incorporated into every service, often even when it’s not all that beneficial, it’s rarely the fundamental component.

In my opinion, we’re nowhere near a consensus on the product-market fit of AI products creating novel value propositions or business models. They’re in the nascent stages, and it’s unclear whether their current business models can survive. I anticipate seeing as many rises and falls among fully-AI companies as we’ve observed in the cryptocurrency realm. (And much like cryptocurrency, many of the current winners appear to be infrastructure companies.)

The market seems to confirm that. Sequoia recently followed up on its one-year-old hypothesis regarding the game-changing potential of generative AI. The firm’s primary insight? While generative AI has no shortage of use cases or customer interest, it’s struggling to maintain user retention and daily engagement.

In terms of one-month mobile app retention, AI-centric apps lag behind established companies. Even when it comes to daily active users as a percentage of monthly active users, generative AI apps have a median ratio of just 14%, well below the 60-65% seen in top consumer companies and WhatsApp’s 85%. (The exception lies in the “AI Companionship” category, represented by apps like Character.) In essence, the real challenge for generative AI isn’t creating demand; it’s in proving sustained value to convert users into daily members. 

There’s no doubt about it—this is still the wild west.

That’s why experiments like [Intercom’s Fin](https://www.intercom.com/fin) are particularly intriguing. Fin is an AI-powered customer service bot. At first glance, it seems to merely complement Intercom’s traditional value proposition, but it actually proposes an entirely new business model. While Intercom operates on a per-seat SaaS model, Fin’s pricing is based on usage: customers pay 99 cents per resolved conversation. This suggests that, should Fin prove successful, Intercom is prepared to cannibalize its non-AI SaaS operations, believing the new model will become a better business.

It’s a riskier venture than adding another text summarization feature to an existing app.

## 4.2. Chatbot or not?

Have you asked yourself whether a chatbot is truly the best solution for the problem you’re tackling?

Simply adding a chatbot to a startup focused on, say, finding trendy pubs, bars, and restaurants doesn’t necessarily make it any better. Most applications won’t gain any real advantages by transitioning to text-based interfaces. This is especially true for tools geared toward data processing, such as management systems or spreadsheets. And while graphical user interfaces excel in many scenarios and text-based ones have their own set of strengths, each type comes with tradeoffs, too.

Determining the optimal conditions for utilizing chatbots is increasingly critical, especially given the current buzz surrounding large language models. And while ChatGPT has become extremely popular, that raises a question: is emulating it necessarily the right move for everyone?

### Cognitive ergonomics

ChatGPT presents an interesting paradox: On one hand, it transcends traditional GUIs by allowing users to make open-ended requests instead of being confined to predefined features with buttons. On the other hand, it reintroduces the complexity of command-line interfaces, as users find themselves needing to remember specific incantations—or prompts—to get the results they want.

Enter cognitive ergonomics. It’s the discipline that focuses on optimizing mental workflows so users can comfortably assimilate new information under specific conditions. From this domain, we can adopt the notion of cognitive efficiency—essentially, the fewer steps needed to accomplish a task, the more efficient and comfortable the experience for the user.

For example, if you’re looking to schedule a new meeting in a calendar app, you’d typically need to unlock your phone, open the app, input the meeting’s date and time, and invite attendees. This process could take around a minute. On the other hand, if you’re near an Amazon Echo speaker that’s always on, you could simply tell Alexa to set up a meeting with your friends for after work tomorrow. This reduces the entire procedure to just a matter of seconds.

In scenarios like these, the development of chatbots is entirely warranted. However, if a voice or text interface doesn’t enhance the efficiency of a specific task, its implementation could be called into question. This basic guideline can serve as a useful principle when you’re designing your own products.

### Hybrid solutions

A more pragmatic approach might involve hybrid solutions, seamlessly transitioning users between conversational and graphical interfaces based on the requirements of the specific task.

This method was employed at my previous startup, which focused on AI for the real estate industry. We used a chatbot to gather apartment or location criteria from users with text conversations. This allowed buyers to succinctly convey all relevant details in one brief message, avoiding the complexity of configuring multiple advanced filters. However, we displayed search results in a web application where users could also manage their meetings, as this aspect would have been cumbersome to navigate using a text interface, especially for multiple appointments.

If, after employing a similar evaluation approach, your chatbot concept remains viable, you’re likely heading in the right direction—well done! However, it’s important not to adopt specific technologies merely because they’re en vogue. The key is to understand the conditions that make a given solution most effective and to apply that knowledge judiciously. This principle holds true for chatbots as well.

### Case study: Turn rough notes into content with AI

Every new wave of technology seems to bring with it a fresh note-taking app that captures the public’s imagination. In its time, Evernote revolutionized the field with its unique approach to note-taking and information organization. Then came Notion, which won people over with its intuitive design and versatile features.

Most recently, I’ve come across [Strut,](https://strut.so) an AI-powered notebook designed for creators, writers, and teams. Utilizing LLM technology, Strut transforms hastily written notes into polished content through the power of natural language processing.

Two compelling questions emerge when examining this case study.

1. First, there’s the matter of Strut’s hybrid interface, which seamlessly blends graphical and chat-based user interfaces. Interestingly, the principles behind this design choice align well with the concepts discussed in this chapter, even though the app didn’t specifically inspire them.

2. Second, we need to ask whether Strut qualifies as a feature, a product, or, potentially, a full-fledged business. This is a question that virtually every app in this category has grappled with in the past.

For a meaningful analysis, we’ll weigh Strut—and indeed, any emerging note-taking app—against the current market leader when it comes to design: Notion.

Beginning with the user interface, Notion’s success was initially built on its block-based UI. As a result, even though Notion has introduced an LLM-based AI feature, it remains a peripheral component, tucked away in the shortcut menu. I often overlook its presence and rarely make use of it.

It’s a hybrid UI, but an ineffective one. Not bold enough. The feature comes across as an add-on rather than an integral part of the platform, largely because the original architecture of the app was not centered around AI capabilities. In contrast, Strut employs a dual-panel layout: a text editor on the left and a chat interface on the right, with a command palette acting as a mediator between the two. If you prefer manual note-taking, you can stick to the left panel.

However, the right panel provides the option to interact with Strut’s AI, allowing you to pose questions about your notes or generate additional content. Once you’re satisfied with the AI-generated suggestions, you can easily incorporate them into your main text. It's convenient, user-friendly, and integral to the overall user experience.

This serves as a good illustration of a hybrid UI designed to make the most of LLM technology. The text editor offers all the advanced controls users are accustomed to, including text formatting, sections, and—potentially—comments or blocks. Meanwhile, the chat interface provides the freedom of free-form content generation and is always available, though it can be turned off when unnecessary. Essentially, it offers the best features of both traditional and AI-based interfaces.

The cognitive ergonomics of Strut’s hybrid interface score highly as well. Anyone who has interacted with generative AI knows it’s a deeply iterative process. Given that the output isn’t entirely under your control, the AI requires step-by-step guidance, corrections for inaccuracies, and specific instructions for wording changes. A chat-based UI is particularly well-suited for these tasks. In contrast, some products I’ve encountered use comment-thread systems that necessitate resolution before the proposed text can be incorporated, making the process painstakingly slow and diminishing the sense of seamless interaction. Strut’s right-hand pane functions like a rapid, informal sketchbook, allowing results to be either discarded or carefully integrated into the main text on the left-hand pane.

### A feature, a product, or a business?

Turning to the second question—whether Strut is a feature, a product, or a business—this has significant implications for the company’s future trajectory. Unlike Notion AI, which—as mentioned—feels more like an appended feature, Strut is built entirely around a generative paradigm, positioning it firmly as a product. The lingering question, however, is whether it has the potential to evolve into a sustainable business.

Even though it’s what keeps the AI feature buried, Notion’s block-based structure has made it a viable business. It allows for limitless functionality—from articles and specifications to image galleries, spreadsheets, and databases. This versatility explains its widespread adoption across diverse startup ecosystems and use-cases. Notion’s multifaceted utility also allows it to replace multiple specialized apps, offering users a way to consolidate expenses. Bye-bye Trello, Asana, Google Docs, Scrivener, iOS Notes, Todoist, Evernote, Airtable, and Google Sheets!

While one could debate the longevity of Notion in the face of competition from giants like Google Docs and Microsoft 365, there’s no denying its capability to compete on that level. The same cannot be readily said for Strut. Instead, it’s far easier to envision Strut being absorbed by these larger organizations and integrated into their broader suites of services. While this isn’t necessarily a negative outcome, it sidesteps the billion-dollar question: how does a newcomer disrupt the market in such a way that it’s not easily overshadowed by established giants?

In Strut’s case, there doesn’t appear to be a definitive answer to that question—at least, not yet.

There are examples that challenge the norm, with Linear being a popular one. Its rise in popularity can be attributed to its design innovation, quality, user-friendliness, and developer-centric approach. Linear was a trailblazer in incorporating the command palette pattern in web design—a feature also found in Strut. The app excels at doing one thing exceptionally well: task and project management. Even when faced with competitors boasting extensive product suites, like Atlassian with Jira, Linear stands out. Its strong design focus and streamlined functionality appeal directly to individual contributors rather than middle managers, allowing it to thrive in a competitive landscape.

But it’s important to note that Linear targets businesses and scales its revenue by adding new user seats, adding a layer of sustainability to its model. There’s also the challenge of vendor lock-in: transitioning a few dozen team members to a new platform, re-establishing all integrations, migrating data, resetting permissions, and reinitiating ongoing projects is no small feat. Strut, on the other hand, is still in its beta phase and currently free, leaving its long-term revenue strategy a matter of speculation for now.

## 4.3. Successes and failures of early platforms

We’ve discussed new business models, but there’s also a lot happening with new AI-powered platforms. In this section, we’ll review a few of them, analyzing their strengths and weaknesses. Hopefully, we’ll gain insight into what works and what doesn't in the realm of generative AI.

### Meta Smart Glasses

In September 2023, Meta, in collaboration with Ray-Ban, [unveiled the successor to its two-year-old smart glasses.](https://about.fb.com/news/2023/09/new-ray-ban-meta-smart-glasses/?ref=kamil.fyi) The updated version continues to be promoted as an everyday wearable, designed to capture photos and videos from a first-person perspective. Like its predecessor, the new smart glasses come equipped with built-in speakers and microphones.

So, what’s new? Meta has announced that in an upcoming release next year, the smart glasses will gain multimodal capabilities. This will enable users to engage with their environment using Meta AI. During the event, the company demoed features like asking the glasses, mid-tennis match, whether an out-of-bounds ball was a fault.

This is a basic example of a new software and hardware paradigm: _ambient computing._ Ambient computing focuses on the unobtrusive incorporation of technology into our environment to automate tasks and improve our daily lives. Imagine communicating with an assistant like ChatGPT that sees what you see and hears what you hear in real-time, continuously. This approach to information is fundamentally different from using apps on your phone and even diverges from the user experience with stationary smart speakers like Alexa, which have limited access to real-world context.

I brought up Alexa for a reason. Although I’m keen to try them, I haven’t yet experienced Meta’s smart glasses firsthand, so my thoughts are speculative. I suspect that even with the integration of a multi-modal large language model, this product may face challenges similar to those encountered by Amazon. I own an Echo smart speaker and mainly use it for basic tasks like setting alarms, reminders, playing music, and checking the weather—nothing transformative. This limited scope of use is one reason why Alexa hasn’t established a sustainable business model, incurring an annual loss of about $10 billion. It was only with the advent of ChatGPT that a mass-market product of this genre truly took off, rapidly becoming the fastest-growing consumer app ever. This raises an intriguing question: Will smart glasses follow the trajectory of Alexa or that of ChatGPT?

### Custom GPTs

Speaking of ChatGPT… Do you use custom GPTs?

In November 2023, OpenAI [introduced](https://openai.com/index/introducing-gpts/?ref=kamil.fyi) the ability to customize ChatGPT with specific instructions, additional knowledge, and various skills. These custom GPTs can assist in learning board game rules, teaching math to children, or designing stickers. Following this, OpenAI [launched](https://openai.com/index/introducing-the-gpt-store/?ref=kamil.fyi) the GPT Store, making it accessible to ChatGPT Plus, Team, and Enterprise users. This store offers a selection of popular and helpful GPTs.

I haven't talked much about the store yet, but I did have some thoughts on GPTs themselves at their launch:

- Some successful bots were attracting as many as 8,000 users on ChatGPT’s platform. Creators benefited from SEO, as OpenAI’s public catalog ranks high on Google, too.

- Some users felt the new features weren’t very useful, believing they could create similar prompts themselves. This mirrored early views on Dropbox, where tech-savvy users felt they could replicate its services. In my opinion, the challenge lay in making GPTs’ advanced features more accessible to those with less technical expertise.

- I wasn’t certain whether GPTs were apps, chatbots, or autonomous agents. The concept might have evolved from plugins—but the original plugins weren’t highly successful.

- Some started using custom GPTs to integrate company documents, showing potential as knowledge bases.

- GPTs might be evolving into something like Character AI, focusing on artificial personas. Their potential to become platforms for autonomous agents remained uncertain, though Actions, which let GPTs interact with the real world through APIs, could eventually turn them into platforms that perform tasks independently.

Did any of this happen?

OpenAI reports that users have created more than 3 million custom versions of ChatGPT. However, I haven’t come across any that have gone viral, say, by taking over Twitter in a single night. It seems that these customizations are primarily used for internal workflows—which is exactly how I use this feature myself. Let me show you.

I've developed three GPTs for my personal use: Summarize, Rewrite, and Density.

- The first two aren't overly complicated. Summarize does just that—it summarizes articles into bullet points for busy, intelligent readers. I use it to assist in drafting Bits for this newsletter.

- Rewrite was also straightforward to create: it rewrites text to sound as if it were written by a native English speaker. I draft all my articles by hand, but editing takes up a significant amount of time because English is not my first language. It’s not that my English skills are lacking, but for some reason, when I edit on my own, I spend hours tweaking and adjusting, never quite satisfied with the outcome. Rewrite solves this.

- Density is the most intricate of the three. It implements chain-of-density, a technique developed by the Salesforce AI team that offers a new method for summarizing text using LLMs. Given that many people use LLMs for summarization, the chain-of-density method stands out due to its strong performance in human preference studies, highlighting its value. Remarkably, this approach integrates smoothly with the standard GPT-4 without any need for fine-tuning, underscoring the potential for discovering effective prompting strategies. I turn to it when the basic Summarize doesn’t work very well.

But they’re not apps, chatbots, or autonomous agents as I anticipated. They are shortcuts. That’s precisely how I created them for my use—I integrated them into my custom instructions:

> Treat “/rewrite” as a shortcut for “Rewrite as a native speaker would:”
>
> Treat “/summarize” as a shortcut for “Summarize the following article using bullet points. Keep in mind I have limited time and need a concise, intelligent overview.”

Now, I don’t even have to type the command; I can simply select a custom GPT from the sidebar or, if I'm already in a conversation with ChatGPT, summon any specific GPT using @, similar to mentioning someone in a group chat. This feature is cool and useful since custom instructions are capped at 1500 characters—yet this approach isn’t exactly revolutionary.

I’ve discussed ChatGPT with my friends who use it for various purposes—some for coding as technical users, and others for more casual tasks. None of them use custom GPTs, likely because they don’t deal with highly repetitive tasks often enough to feel the need—and see the benefit. For example, if you’re a programmer, you don’t really need a specialized GPT; chatting with the base model or using your text editor’s Copilot does the job well enough. (And if you’re a casual user, you’ll use ChatGPT to help you draft emails or do homework for you, which the base model handles well, too.)

This leads me to believe that custom GPTs may carve out a niche in the enterprise market. Picture a typical company where every team has highly repetitive workflows or tasks they’re looking to automate. These could be shared internally, making them accessible to all employees. Some of these GPTs might also function as knowledge bases. For example, the HR department could upload frequently asked questions about company policies to the platform. This seems like a practical application. While not groundbreaking, it’s a solid product that OpenAI could successfully offer to many companies.

However, regarding consumer-oriented apps, I’m not as convinced.

- Low customer awareness remains a challenge. ChatGPT, being a general tool, and GPT-4, currently the top model globally, are so effective—even GPT-3.5 handles simple tasks well—that many individuals don’t see a need for a custom GPT. This presents a conflict of interest for OpenAI: maintaining the quality of the base model is crucial to keep users engaged.

- The ability to market effectively is constrained. Text does not serve as an effective user interface for sales, impacting various e-commerce sectors that are unlikely to see significant benefits from adopting the GPT Store. From my experience—I’ve given it a shot. Not with ChatGPT, but I attempted to sell real estate using the Messenger platform. It was unsuccessful because chat platforms don’t offer a better UI for browsing inventory.

- The limited ability to deep-link presents a significant hurdle. Everything that is written using ChatGPT stays in ChatGPT. However, developers aim to leverage platforms for user acquisition, trying to then guide users towards their own apps. This introduces another conflict of interest, as OpenAI will prefer to retain user engagement within its own ecosystem. And unlike Apple, which doesn’t make all the apps for iOS itself, OpenAI’s main product can already do most of what third-party GPTs can do!

- The absence of analytics is another notable limitation. For example, a significant area poised for development is the attribution of media, specifically crediting the underlying content that fuels AI queries. This involves determining how revenue should be allocated among publishers. However, we have yet to reach this level. In fact, GPT Store apps feature hardly any analytics!

It appears that ChatGPT as a platform struggles to match the success of ChatGPT as a product. Though the product remains highly useful, the platform doesn’t seem as appealing—not just to me, but likely to the broader audience as well.

### Rewind Pendant

Another AI-powered device is [Rewind Pendant,](https://www.limitless.ai/pendant) a wearable designed to capture and transcribe real-world conversations. The transcriptions are encrypted and stored solely on your phone, aligning with the company’s privacy-first ethos. They also provide features that aim to ensure that individuals are not recorded without their explicit consent.

Rewind is a macOS app designed as a searchable archive for your personal and professional life. It monitors your laptop, so if you’re in a meeting, the app can summarize it for you by listening and watching along with you. Importantly, it avoids becoming a privacy concern by performing most of the analysis locally on your device, without relying on cloud storage. When paired with the Pendant wearable, Rewind becomes a personalized AI that includes everything you’ve seen, heard, or spoken, even when you’re on the go. This represents another example of ambient computing, similar to Meta’s smart glasses, as these sensory inputs are eventually processed by a large language model.

The concept is divisive. Your stance will largely depend on your perspective on privacy; it could either be seen as invasive or beneficial. This paradox echoes the dilemma that Google Glass encountered years ago: while the device offered utility, its nerdy design deterred users wary of social judgment. The Rewind Pendant looks sleeker, but its core function as a recording machine means you may still face social scrutiny for choosing to wear it.

The Pendant is currently available for pre-order, so the verdict remains pending until real-world use provides definitive answers. The Rewind app is already accessible, though, so I downloaded it and put it through a week-long trial.

If you look at the use-cases outlined on their website, Rewind appears to be most beneficial for individuals who frequently switch between contexts. For example, if you’re a manager juggling Zoom meetings, Notion comments, Linear issues, and emails throughout your workday, it’s easy to lose track of what was said or done where—making a universal, cross-app search engine valuable. If you’re a salesperson managing multiple daily meetings with different prospects, each requiring note-taking and follow-ups, Rewind can automate this process for you and make your life easier, too.

If you’re neither a salesperson nor a manager, as is my case, the utility diminishes. As an engineer, I minimize context-switching when possible and my work is concentrated in fewer platforms. Despite being technologically impressive, Rewind didn’t prove particularly useful for my specific needs. Drafting status updates for stand-up meetings doesn’t justify a $19-per-month price tag for me.

Based on my experience with the app, I can make some educated guesses about the potential utility of the Rewind Pendant, even though Rewind also promotes personal use-cases on their website. 

If you’re a salesperson who frequently engages in face-to-face meetings, the Pendant could be quite advantageous. Similarly, if you’re a manager coordinating multiple teams and need to keep track of numerous interactions, commitments, and insights from meetings, and you operate from a physical office, the Pendant might suit your needs. Attending a conference? The device will likely prove valuable, too. This could also mitigate the social acceptability issue that plagued Google Glass; using the Pendant in a professional setting might gain tacit approval—especially if all meeting participants see the benefit of automated note-taking.

Perhaps long-term use will change my mind. At the moment, its modest utility hardly seems worth the discomfort of constant surveillance.

### Humane AI Pin and Rabbit R1

The [Humane AI Pin](https://humane.com/aipin) is a wearable, voice-controlled device with a digital assistant designed to replace your smartphone for various common tasks. Priced at $700, it has received some of the worst reviews in recent memory.

The [Rabbit R1](https://www.rabbit.tech/) is a standalone pocket-sized device that allows you to interact with AI without the need for a smartphone. Designed like a walkie-talkie, it features a touchscreen, a 360° camera, and an analog scroll wheel. Its main draw is the rabbits, a series of automated scripts designed to handle your everyday mundane tasks. It, too, got eviscerated by reviewers.

The appeal of both devices lay not in their hardware but primarily in their cloud-based AI capabilities. For example, Rabbit utilizes what it terms a Large Action Model, trained to understand graphical user interfaces, including the functionality of various buttons and the layout of website content. This means when a user requests something like “order pizza from my favorite restaurant,” the cloud application actually interacts with the Uber Eats interface—provided the user has logged in through OAuth on Rabbit’s website—and places the order.

Spoiler alert: users report that this feature often doesn't work as expected. While Meta's smart glasses may not be as ambitious, they perform their tasks reliably. In contrast, both of these devices promised too much and delivered too little.

Intriguing concepts aside, Rabbit and Humane raise some questions. Do they need to be standalone devices? The AI Pin was supposedly justified as an always-on device, constantly observing and listening to your environment. The R1, however, operates through button inputs. The key here might be its operating system: if it’s designed to interact with various user interfaces directly, rather than through API calls, then this approach could be challenging to implement on iOS and Android due to the extensive app security permissions required. While the technical aspect is impressive, the success of such devices hinges on their practical applications. Previous AI assistants like Google Assistant, Cortana, and Alexa have shown that to persuade people to move away from their smartphones, these devices need exceptionally strong and compelling use cases.

### Character AI

Which platform leads in monthly visits from both desktop and mobile in the realm of generative AI? Is it ChatGPT? Not quite. Bing? Think again. Perhaps Google’s Bard? Almost, but not there yet. The crown goes to [Character AI.](https://character.ai/)

Character AI lets users design and converse with virtual characters. Teenagers are using the platform to craft personas of fictional entities, such as Aragorn from The Lord of the Rings, or renowned figures like LeBron James or Elon Musk. Even though ChatGPT might hold more brand recognition, Character AI takes the trophy for user engagement. Reports suggest that users spend an average of two hours daily on the platform.

Recall the Sequoia numbers from earlier in this chapter: generative AI apps post a median DAU/MAU ratio of just 14%, well below the 60-65% seen in top consumer companies and WhatsApp’s 85%. The exception lies in the “AI Companionship” category, represented by apps like Character.

The real challenge for generative AI isn’t creating demand; it’s proving sustained value that converts users into daily members. Character’s impressive retention suggests that the AI Companionship category might possess one of the most compelling product-market fits within the generative AI industry. In a world captivated by new models, this often goes unnoticed. My personal opinion? Character, flying under the radar of tech nerds, is poised to become one of the biggest winners of this early AI era. It has the potential to be the TikTok of gen AI.


## 4.4. Legislation

To analyze the legal aspects of generative AI, I had a conversation with my friend [Maciej Mańturz](https://www.linkedin.com/in/maciej-ma%C5%84turz-b934bb1a2) about the upcoming European Union legislation aimed at regulating AI. We discussed how the new law will impact startups looking to incorporate AI, and what entrepreneurs need to know to stay ahead.

Maciej is a lawyer and a specialist in privacy. The common branches of law never truly resonated with him, and he never envisioned himself in a courtroom setting. Eventually, he joined a major corporation, which opened his eyes to the intersection of technology and the business world. He came to see the lawyer’s role not as an obstacle but as a facilitator of business initiatives.

Since then, he’s pursued further education and earned certifications in privacy and broader tech law, covering areas like contracts, intellectual property, and a touch of cybersecurity. He’s even worked on AI, culminating in a postgraduate thesis on the EU’s proposed AI framework, which I read in preparation for this discussion.

**Kamil Nicieja: We studied together, so you know I left law school behind. I’ve often reflected on what initially attracted me to law. Once I shifted from law to coding and then to business—which isn’t that different from law—it became clear to me: lawyers are like coders, but they deal with incredibly complex “syntax” and “run” their code in a slow and unpredictable system: the courts.**

**Now, with the emergence of advanced language models, coding feels more and more like drafting laws. You type in a command, and really, it’s anyone’s guess what the outcome will be. Do you see the parallels?**

Maciej Mańturz: It’s intriguing to see you make those connections, given that they’re both apparent and commonly acknowledged. This year, I went to a series of talks with lawyers well-versed in tech and legal intersections, covering fields like Intellectual Property, Cybersecurity, and AI. One specialist had launched a postgraduate course teaching tech-centric law alongside coding fundamentals: understanding how an app works, some programming skills, and the software development lifecycle. It’s said to draw inspiration from Western European models and others worldwide.

I read an article which proposed that lawyers, aside from math-heavy tasks, are naturally fit for coding classes, too. The logical nature of both professions likely supports this view. It’s also becoming evident that modern lawyers should stay updated with tech trends, highlighting the value of cross-disciplinary expertise. I only wish such a mindset had been mainstream when we started our studies.

**We’re meeting at a significant time. It’s been five years since the EU introduced GDPR. Now, they’re working on another major tech regulation named the AIA, the AI Act. For those not closely following European legislation, could you explain the main goals of this new bill? Do you know when it might be implemented?**

The final EU Artificial Intelligence Act is expected to be adopted near the end of 2023. It’s clear that AI is no longer just a theoretical idea. Nowadays, you can’t browse social media without coming across news about AI, whether it’s a new advancement or a tool set to change our lives.

There are genuine concerns about the potential negative effects of this technology on the average person. Deepfakes are a prime example of potential misuse. Training AI models, especially using deep learning methods, requires vast amounts of data. Personal information can be as valuable as gold, which is why many social media platforms can seem invasive to our privacy.

There’s a tug-of-war between laws protecting our privacy and businesses looking to profit while offering us these innovative services. There’s a trade-off with our rights. Overly strict regulations might drive businesses away, but overly lax ones could jeopardize our rights and security.

**Just like with GDPR, this new regulation isn’t just for companies based in the EU—if you’ve got users in Europe, you’ve got to comply. It’s pretty clear the target is largely U.S. and Chinese tech giants. I’ve come across some in tech circles saying the EU is essentially in a cold war with other major powers, using regulation as their weapon of choice because they can’t go toe-to-toe on tech innovation. Can you break down why, when the EU rolls out a big tech regulation, it becomes a global must-follow? What’s stopping U.S. and Chinese companies from just giving it the cold shoulder?**

GDPR surprisingly set a global precedent but, you know, it kinda worked, right? Many new privacy regulations are modeled after it, even if it’s a challenge for businesses. In the realm of data privacy, there’s ongoing tension about transferring personal data to the US. In essence, every few years, NGOs, led by Max Schrems, prompt the ECJ to declare the US framework incompatible with GDPR. Then, governments negotiate until the EU bodies approve a decision. This back-and-forth often revolves around US laws related to accessing data for national security reasons.

The same is seen with companies like Meta, who continue to operate despite GDPR-related fines. Perhaps the EU’s global influence and potential profits are compelling reasons to keep pushing boundaries. It seems the EU is willing to compromise on GDPR to ensure data transfers, which is also a strategic decision in the global tech race. From a company’s standpoint, it’s simpler to comply with these standards if they aim to operate within the EU. And the EU is a significant market, so nobody can just drop it. But then again, I’m no business guru.

**The bill sorts AI systems into two major camps: those that pose a “high risk” and everything else. So what does the EU mean by significant risk when it comes to AI? Does this mean you can basically snooze on this bill if you're developing just another text-summarization app, but you’d better pay attention if your AI could potentially harm people or be used to discriminate against them on a large scale? Like in HR or finance?**

The AI Act adopts a risk-based approach, considering various points in the product chain, from creators to deployers. While it’s up to you to categorize your product’s risk level, disagreements with regulators could result in hefty penalties. Given these potential consequences, as a lawyer I would advise not to dismiss the AIA’s requirements.

Even products deemed lower risk must adhere to the AI Act’s foundational principles, like explainability, privacy, and security. As you mentioned, there are certain activities classified as “high risk” or outright prohibited in the Act. These should be immediate red flags for any company. Businesses should consult legal experts to ensure they aren’t inadvertently falling into these high-risk categories.

**Let’s talk about some tangible scenarios. Suppose I aim to create a foundational model on par with GPT-4. For instance, consider Europe’s prominent AI enterprise, Mistral, and its new LLM. How does the AIA assess the risk associated with such a venture?**

When evaluating the requirements of such a system, a layered approach is essential. The initial step involves determining its placement within the risk classification spectrum. The “high risk” category relates to systems that could significantly jeopardize a person’s health, safety, or fundamental rights.

**Perhaps that’s not a bad choice given that researchers found that Mistral’s model can provide information on topics like bomb construction, suggesting a potential shortfall in their safety measures.**

Yup. And similar to GDPR, the penalties under the AIA are significant. I think they can reach up to 40 million euros or 7% of global annual revenue. This makes the cost of non-compliance even steeper than in the realm of personal data.

**OK, next example: a B2B product that uses AI to monitor daily activities of employees across platforms like Slack, Outlook, Teams, Jira, and compiles a daily company-wide summary. Would this be classified as high-risk? What are the reasons for or against this classification?**

Certainly, this situation could be viewed as high risk since employment is explicitly labeled as such in the AIA annexes. Surveillance also poses additional challenges from a GDPR standpoint.

**Wow, color me surprised. I personally didn’t think this would be a huge deal. Let’s consider one final scenario: a chat application where 500 million users can converse with virtual representations of celebrities like, say, LeBron James about basketball. Would this be deemed high-risk or low-risk?**

I’d argue that this sounds like a low-risk situation. Generally, chatbots don’t fall under the high-risk category. The intent behind these regulations is to prevent misuse or exploitation in areas of public interest, such as welfare, employment, and safety.

However, it’s important to note that creating a virtual likeness of someone must respect intellectual property rights. It’s also now widely understood that users should be informed when they’re interacting with a machine, not a human. The AIA would actually qualify that as a deepfake. Additionally, there will be another EU legal act addressing civil liability for damages, which should also be kept in mind.

**What guidance would you offer to a standard AI startup in Europe? Given that such companies are often small, their financial landscape can be challenging. They might have secured some funding, but it’s equally likely they haven’t. On top of engineering expenses, they’re also faced with the potential costs of legal counsel. Is it wiser for them to address legal matters upfront and, if so, how can they do so affordably? Or should they prioritize gaining traction, securing external investment, and then allocating funds for legal guidance?**

As someone specialized in privacy, I’d stress that any solution aiming for a European launch should adhere to the privacy-by-design principle, especially if it involves personal data. The financial repercussions for not complying with EU regulations are steep and can be quite daunting.

Currently, it seems plausible that only major entities could significantly impact the AI sector due to these regulatory hurdles. It’s uncertain whether they’d even choose the EU as a base for AI development given these challenges. If startups are to thrive in this environment—and I hope they can—it’s wise to seek at least basic legal counsel early on. While a full legal team might not be necessary initially, gaining a foundational understanding of expected requirements is crucial. A good starting point might be recommendations from regulatory bodies like the UK’s ICO.

**Gotcha. I personally believe VC investors can play a significant role here by providing startups in their portfolio with complimentary legal consultations as a value-add. It’s far more efficient for a VC fund to employ lawyers who can assist several startups simultaneously rather than each startup seeking individual legal counsel. Some VCs did that with GDPR.**

**Oh, and while we’re at that… If I’m already compliant with GDPR, does that mean I'm in the clear with AIA as well?**

No, not really.

**If I ask ChatGPT to elaborate on who Kamil Nicieja is, am I making it non-compliant with GDPR? And what if I build an app that uses OpenAI's API and ChatGPT as the engine—am I skirting dangerous legal territory too?**

For a casual request from an ordinary individual, it likely isn’t a major concern, potentially falling under the household exemption for personal data processing. However, if you develop an app, the situation becomes more complex as it enters the realm of business data usage. You’d need to establish a framework between companies, determine the legal basis for processing, and address other details. In essence, while it’s possible to navigate this legally, it would require effort and careful planning.

**I’d imagine that if I strictly adhere to the GDPR, I should be compliant when using user data for AI training, especially if I’ve secured processing consent and taken similar measures.**

While there are similarities between the AIA and GDPR, the current landscape isn’t as straightforward as businesses might hope. Common principles like transparency and security are present in both regulations, but their interpretations might differ. Some aspects of the two regulations even seem contradictory, even though the GDPR was crafted to be tech-neutral.

Generally speaking, the GDPR’s standards are stricter. So, starting with GDPR compliance can provide a solid foundation for meeting some of AIA’s essential requirements, including privacy. Fundamentally, any solution should be designed with privacy as a central focus, in line with the privacy-by-design and privacy-by-default principles, which are integral to the GDPR, regardless of AI involvement.

Furthermore, the GDPR’s provisions on automated decision-making involving personal data introduce a stricter set of requirements when these decisions might impact an individual’s rights. For example, explicit consent is needed instead of just standard consent.

**Where does generative AI fit into this whole equation?**

It depends. Generally, the AIA does distinctly define generative AI and imposes additional stipulations. Like all tools, its risk must be assessed, and it must adhere to general principles. However, the AIA also mandates particular transparency measures, such as disclosing AI-generated content or sharing summaries of copyrighted training data.

**Isn’t this a bit premature? Generative AI has only been mainstream for less than a year, and the EU is already keen on regulating it. Regulation naturally curtails innovation. While the EU claims it aims to regulate only the large, high-risk models, leaving space for research and startups with smaller models, what if this approach is flawed? The larger models are the epicenters of innovation at this moment.**

The issue is undeniably influenced by geopolitical factors, and Europe seems to be trailing. Currently, the US and China lead the AI race. China has already implemented some AI regulations, and its unique standing might enable faster advancements in AI sectors. Given this, Europe’s AIA might already be lagging.

However, I’d be concerned if the regulation hinders technological growth. Scientific research seems better positioned, as the AIA indicates certain exceptions for such work. Startups might face more challenges, but from the EU’s viewpoint, the primary goal is to stay competitive while protecting its customers—and markets.

**Japan’s made a decision that using datasets for training AI models does not infringe upon copyright law. Therefore, model trainers can now access publicly available data without the need for licenses or permissions from the data owners. Does the AIA provide any direction on handling copyright issues in training data?**

As I mentioned before, the AIA explicitly outlines transparency requirements for generative AI and its associated training data. While there are general provisions for intellectual property rights protection within the regulation, it isn’t the primary focus.

In the EU, actions related to this are mainly governed by two exceptions for text and data mining (TDM) in the 2019 Copyright in the Digital Single Market (CDSM) Directive. These exceptions address TDM for scientific research, covered in Article 3, and what’s sometimes termed “commercial” TDM, highlighted in Article 4. For AI models like Midjourney, DALL-E, or Firefly, the relevant regulation to reference is the commercial TDM exception. So if you’re looking for guidance, I’d look there.

**Generative AI has really heated things up in the realm of automated agents—agents are AI-driven characters that use large language models to mimic basic autonomous reasoning. I read in your article about “automatic influence on the individual’s situation,” which apparently flags an AI system as high-risk. That sounds a lot like what I described earlier. Does this mean the EU’s gonna put the brakes on developing autonomous agents?**

Not necessarily. The term “automatic decision making” originates from the GDPR. While there’s a higher standard for processing personal data in this manner, it’s still achievable with the explicit consent of the individual involved. Typically, companies avoid this approach since maintaining such consent can be challenging. They often introduce human intervention to sidestep fully automated decision-making processes.

Even where building a human element into an AI system isn’t feasible, the GDPR doesn’t prohibit autonomous agents, provided there’s specific consent or a power of attorney. Based on my understanding of the upcoming regulation, the AIA doesn’t present obstacles either. There are safety measures and conditions to meet for any AI solution, some of them tailored to autonomous agents, but I’m unaware of any explicit restrictions imposed by EU regulators on such initiatives.
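
As an aside for product teams: the human-in-the-loop pattern Maciej describes is straightforward to express in code. Below is a minimal, hypothetical Python sketch; the names (`Decision`, `route_decision`, `request_human_review`) are invented for illustration, not any real library’s API. A model-generated decision is delivered automatically only when explicit consent is on record, and is otherwise routed to a human reviewer so the process is no longer solely automated.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    subject_id: str
    outcome: str
    rationale: str

def request_human_review(decision: Decision) -> Decision:
    # Placeholder: a real product would enqueue the case for a human
    # reviewer in an internal tool and wait for confirmation or override.
    print(f"Queued for human review: {decision.subject_id} -> {decision.outcome}")
    return decision

def route_decision(decision: Decision, has_explicit_consent: bool) -> Decision:
    """Deliver a model-generated decision automatically only when the
    individual has given explicit consent; otherwise add a human reviewer."""
    if has_explicit_consent:
        # Fully automated path, gated on the consent that's on record.
        return decision
    # Human-in-the-loop path: the decision is no longer solely automated.
    return request_human_review(decision)

# Usage: a hypothetical loan pre-screening decision without consent on record.
route_decision(
    Decision("user-42", "declined", "insufficient credit history"),
    has_explicit_consent=False,
)
```

The branching logic is the whole pattern; everything else, like the review queue and the consent records, lives in ordinary product infrastructure.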

**Your article talks about how the AIA mandates that high-risk systems be “transparent and understandable to users,” but let’s be real: most AI systems are black boxes. Stuff goes in, stuff comes out, and what happens in between is anyone’s guess.**

**Now, I get the sense that making tech companies clarify their AI’s decision-making process has been a major sticking point. But you’re saying the AIA might not actually require a crystal-clear explanation, just that companies need to give users the lowdown on touchy subjects like how hallucinations work and be upfront about the training data, right?**

Certainly, this appears to be a central and somewhat paradoxical issue from a business standpoint. Given AI’s nature, it’s often challenging to pinpoint precisely how a system arrives at a specific output from a given set of inputs. Yet EU regulators advocate for the explainability principle, which presumes exactly that kind of understanding.

The silver lining is that the AIA acknowledges this dilemma and, as you noted, doesn’t demand the impossible from developers. It primarily requires that entities involved in the AI system’s lifecycle be able to articulate its foundational principles in layman’s terms and describe the kind of data that leads to specific outcomes. This might also encompass offering an alternate prediction or providing the context behind a decision. In this sense, it bears similarities to the transparency and fairness principles.

Ultimately, the interpretation will hinge on regulatory guidance. History has shown that some interpretations can be more stringent than necessary. However, the latest version of the AIA offers some protection, and over time, a unified approach should emerge across the EU. It’s evident that regulators are actively engaged in the AI evolution, with some already providing guidance on AI’s interplay with the GDPR. Again, I’d specifically point to the insights from the UK’s ICO as particularly valuable.
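
To make the idea of “offering an alternate prediction” more concrete: one common technique is the counterfactual explanation, which answers the question “what minimal change to the inputs would have flipped the outcome?” Here’s a minimal, hypothetical Python sketch; the toy `approve` rule stands in for an opaque model, and all names are invented for illustration.

```python
from typing import Optional

def approve(income: float, debt: float) -> bool:
    # Toy scoring rule standing in for an opaque AI model.
    return income - 2 * debt > 50_000

def counterfactual_income(
    income: float, debt: float, step: float = 1_000, max_steps: int = 100
) -> Optional[float]:
    """Smallest income increase (searched in fixed steps) that would flip
    a rejection into an approval; None if already approved or not found."""
    if approve(income, debt):
        return None
    for i in range(1, max_steps + 1):
        if approve(income + i * step, debt):
            return i * step
    return None

delta = counterfactual_income(income=60_000, debt=10_000)
if delta is not None:
    # The kind of context a user can act on.
    print(f"An income {delta:,.0f} higher would have led to approval.")
```

Real systems search over many features with more principled methods, but the shape of the explanation, a minimal actionable change, is the same.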

**Thank you for this discussion! I’ve gained a lot from it, and I’m confident our readers will benefit as well.**

You’re welcome. Thank you as well!

> A few days after my chat with Maciej, President Joe Biden signed an executive order outlining guidelines for generative AI.
> 
> The approach this executive order adopts differs from the European Union’s. Instead of directly addressing risk, it zeroes in on the computational power of the machines used to train foundation models. The underlying belief is that the real threat comes from massive supercomputers with hefty price tags, not small startups operating out of garages.
> 
> Many on Twitter have criticized this view as short-sighted. They argue that, before long, the computational capabilities now exclusive to these supercomputers will be within reach of garage startups. Critics also note that regulations are hard to reverse, and this particular one may become outdated more quickly than most.