From Zero to Product with Generative AI
How Today’s Builders Develop Apps Powered by LLMs and Other Models
Kamil Nicieja


    about this book

    In most industries, I’d be considered middle-aged or even young by some standards, having just passed 30. However, in tech—I’m old. (Can’t imagine how people who remember the dot-com bubble must feel.)

    This means I experienced the earlier wave of AI firsthand, which eventually became known as machine learning. I witnessed attempts to create products similar to ChatGPT using technology that we now recognize as a dead end. Archaic.

    But it also means I’m uniquely equipped for this new generative age. Many topics that are new to those who recently discovered the benefits and challenges of, say, chatbots are familiar to me. I learned them the hard way. In 2015, I tried to build a company around this technology. But it was too early, and we failed. Now, I read the same pitches I once made on the landing pages of others and think, what if.

    Thankfully, every failure brings gifts too, known as experience. The goal of this book is to share that experience with you, my reader, and equip y


    1. Introduction to generative AI

    When it comes to AI, we’re still in the early days. The majority of people haven’t yet adopted these tools for their work, hobbies, or day-to-day lives.

    Some have experimented with early versions and found them lacking, abandoning ship before the current wave of tools, with their significantly improved capabilities, came into wider use. Even those who have found some applications often don’t venture beyond their starting point, opting to use these models for basic outputs.

Just recently, I helped a friend who runs a one-person business employ AI as a virtual marketing consultant. We used the tool to generate content and strategic ideas for the company. Yet, like all things, the assistant had its limitations. It needed many guiding questions and iterative refinements to produce truly valuable output. If you just skim the surface, it will, too. But the outcomes were still notably better than what you’d expect from someone with no marketing experience.


    1.1. What’s generative AI?

    Generative AI is a class of artificial intelligence models designed with the specific goal of creating new data samples that resemble a given dataset.

    Unlike discriminative models, which focus on classifying or distinguishing between existing data points, generative models aim to understand the underlying data distribution in order to produce novel instances of data. The concept originates from probability theory and statistics but has seen a vast expansion in scope and complexity due to advancements in machine learning techniques and computational resources.

    The most prominent architectures in the realm of generative AI include generative adversarial networks, variational autoencoders, and more recently, diffusion models. These architectures serve various purposes: GANs are excellent at generating high-quality and realistic images; VAEs are well-suited for generating new samples while also offering a structured latent space; and diffusion models have found success in a


    1.2. Will your industry experience any impact?

    Every industry will experience generative AI’s impact. The only question is how much.

    Obvious examples include industries like marketing, which could use generative AI for creating ad copy, social media posts, or even for strategy planning. In the legal industry, AI could help in drafting legal documents or contracts based on a set of user-defined parameters. This would speed up many administrative tasks, allowing lawyers to devote more time to complex legal issues. In HR, AI could assist in the initial stages of candidate screening by generating interview questions based on the specific needs and culture of the company, or even by assessing the suitability of applicants through automated analysis of resumes and cover letters.

    What might this actually look like in a real-world scenario? Let’s walk through a brief case study to visualize some concrete effects.

    The gaming industry stands to gain significantly from generative AI, too. We can think of


    1.3. Types of generative AI

    Now, let’s transition from this broad overview to dig into the nuts and bolts of generative AI. This book mainly focuses on two types of gen AI: large language models and text-to-image models. Let’s quickly go over the fundamentals of each.

    Large language models

    Large language models are constructed using machine learning architectures, particularly deep learning, to process and generate natural language based on a given input. LLMs have grown to become more intricate and are now capable of generating text that often mirrors human-level understanding and syntax. They are a product of advancements in computational power, algorithmic optimization, and vast amounts of data.

    These models are typically trained on a broad corpus of text data, encompassing everything from books and articles to websites and social media posts. This eclectic data gathering aims to equip the model with a generalized understanding of human language, including its nuances and subtleties.


    1.4. What’s a prompt?

    Prompts serve as the input queries that guide these models to generate a specific type of output. The function of a prompt varies slightly between these two kinds of generative models, but the core principle remains consistent: a prompt initiates the generation process and steers the model toward producing content that aligns with the user’s intention.

    For large language models, prompts are typically text-based queries or statements. They can range from simple requests, such as “Tell me about the history of Rome,” to more complex or conditional queries, like “Write a persuasive essay arguing for renewable energy adoption.” The prompt is fed into the model, which then produces text that ideally satisfies the query, all while adhering to the grammar, tone, and context specified.

    In the case of text-to-image models, prompts still serve as input queries, but the output is visual rather than textual. Here, the text-based prompts may include descriptions or features that the user w
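To ground the language-model case in practice, here is a minimal sketch of how a text prompt reaches a model through an API. It assumes the OpenAI Python SDK (openai>=1.0); the client setup and model name are illustrative choices rather than anything the book prescribes.

```python
# Minimal sketch: sending a text prompt to a large language model.
# Assumes the OpenAI Python SDK (openai>=1.0); the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model would work here
    messages=[
        {"role": "user", "content": "Tell me about the history of Rome."},
    ],
)

print(response.choices[0].message.content)
```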


    1.5. Foundational models

    While these models come in many forms, they all share a common characteristic: they’re large. Exceptionally large.

The size of these models is quantified in terms of parameters. A model parameter is an internal configuration variable whose value is learned from training data. Essential for making predictions, these parameters determine the model’s effectiveness in solving specific problems. Parameters are fundamental to the operation of machine learning algorithms, and some of these models boast hundreds of billions of them.

The sheer scale of these models precludes individual developers or small organizations from training them, due to prohibitive computational and financial costs. As a result, they are typically developed by major tech companies or heavily funded startups. This is why these models are often termed “foundational,” as they serve as the base upon which other applications and


    1.6. Real-world product categories

There are a few categories of products in which these models are already being used.

    The leading category in this space is AI assistants, which have evolved significantly beyond earlier versions like Apple’s Siri or Amazon’s Alexa. The key distinction lies in how interactions are programmed. Earlier assistants relied on hand-coded responses, excelling in narrow tasks like weather reporting but falling short in complex assignments. In contrast, LLMs can easily draft a creative homework assignment, but are, in exchange, prone to factual inaccuracies due to a phenomenon known as “hallucinations.” We’ll discuss it in the next chapter. So while these models represent the most comprehensive aggregation of human knowledge to date, they lack an inherent understanding of truth.

    AI assistants are typically designed for general-purpose use, capable of performing a wide range of tasks. However, a burgeoning subcategory focuses on AI companionship. These are specialized AI pe


    2. Inputs and outputs

    Like many others, my interest in text-based AI was sparked when I started using Siri on my iPhone.

Siri, a chatbot developed before the advent of modern generative AI, has interactions that are predefined rather than open-ended. Its capabilities, such as providing weather updates, were anticipated and manually programmed by Apple engineers; answering a weather question simply means querying a weather API.

    But if you ask Siri to write a poem about a recent event, it won’t be able to assist, as its responses to such specific requests aren't predefined. It’ll probably direct you to a Google search instead. In contrast, modern AI assistants can autonomously create a poem, albeit of varying quality, by filling in the details themselves.

    These questions and commands—they’re called prompts. When prompted, a large language model generates text, while a diffusion model creates images. In this chapter, we’re going to examine the types of prompts these models can handle, the


    2.1. Simple prompts

    Let’s begin with something straightforward. I’ve compiled a selection of good prompts I’ve used with ChatGPT in the past 30 days. If you haven’t had the chance to interact with a large language model, this list should provide you with an overview of the fundamental capabilities of models like GPT-4.

    “Can you provide 10 alternative names for this piece of code?” This could be for a method, variable, or constant.

    “I’m not a native English speaker. Can you help me rewrite this sentence or paragraph to sound more like one?” Sometimes I’m just fine with sacrificing some of my personal voice for the sake of clearer and more professional English.

    I often request translations, too. GPT-4 has already proven to be much more accurate and reliable than Google Translate in this aspect.

    “Could you interpret this code for me? I’d appreciate a step-by-step explanation.” If the explanation is unclear, then it’s likely that my fellow engineers might also find it hard to understand, indicat


    2.2. Hallucinations

    So, does that mean the task is complete? Just slap AI on every white-collar job and voilà, a fresh product idea emerges, and product design becomes a solved discipline. Well, not exactly. Despite their skill, these models are not all-powerful.

    Because they hallucinate.

    Hallucinations in large language models refer to instances where the output is coherent and grammatically correct, yet factually inaccurate or nonsensical. In this context, a “hallucination” is the creation of false or misleading information. Such errors can emerge from several causes, including limited training data, biases within the model, or the complex nature of language itself.

    For example, in February 2023, Google’s chatbot, Bard, mistakenly stated that the James Webb Space Telescope captured the first image of a planet outside our solar system. This was inaccurate; NASA confirmed that the first images of an exoplanet were actually captured in 2004. Furthermore, the James Webb Space Telescope wasn’t l


    2.3. Prompt engineering

Is it possible to solve the problem of hallucinations in generative AI models? And if so, how? Given the probabilistic nature of these models, completely eradicating hallucinations isn’t feasible. However, techniques have been developed to mitigate the issue. Since LLMs are trained for next-word prediction, the solution partly lies in the formulation of the question itself. By refining our prompts, we can achieve more accurate and reliable outcomes.

    That’s the reason behind the emergence of prompt engineering. This new field focuses on crafting and optimizing prompts to maximize the efficiency of language models across diverse applications and research areas. Developers engage in prompt engineering to create strong and effective techniques for interfacing with LLMs and other tools, aiming to minimize issues like hallucinations.

    To grasp this concept in action, we’ll analyze advanced prompting methods like chain-of-thought, tree-of-thought, and reflexion.


    2.4. Chain-of-thought

    Let’s start with chain-of-thought. This is a novel strategy that prompts the language model to articulate its thought process. By exposing the model to a handful of examples where the reasoning is clearly spelled out, the model learns to similarly outline its reasoning when responding to new prompts.

To illustrate the chain-of-thought approach with a fresh set of examples, let’s consider a scenario involving a bookstore and another involving a school’s sports equipment inventory.

    Model input:

    Question: A bookstore has 200 novels in stock. It receives 4 more boxes of novels, each box containing 25 novels. How many novels are in stock now?

    Answer: Initially, the bookstore has 200 novels. Each of the 4 boxes contains 25 novels, so 4 boxes total 4 × 25 = 100 novels. Adding these to the original stock, 200 novels + 100 novels = 300 novels. The answer is 300 novels.

    Question: The school’s sports department has 15 basketballs and 10 volleyballs.
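To show how such a few-shot prompt might be assembled programmatically, here is a minimal sketch. It assumes the OpenAI Python SDK; the model name and the `chain_of_thought` helper are illustrative, not part of the technique itself.

```python
# Few-shot chain-of-thought: a worked example with explicit reasoning is
# prepended to the new question, nudging the model to spell out its own steps.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COT_EXAMPLE = (
    "Question: A bookstore has 200 novels in stock. It receives 4 more boxes "
    "of novels, each box containing 25 novels. How many novels are in stock now?\n"
    "Answer: Initially, the bookstore has 200 novels. Each of the 4 boxes "
    "contains 25 novels, so 4 boxes total 4 × 25 = 100 novels. Adding these to "
    "the original stock, 200 + 100 = 300 novels. The answer is 300 novels.\n"
)

def chain_of_thought(question: str, model: str = "gpt-4o-mini") -> str:
    prompt = f"{COT_EXAMPLE}\nQuestion: {question}\nAnswer:"
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```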


    2.5. Plan-and-solve

    To further enhance the precision of the output, we may sometimes consider giving the model an outline of the specific steps we anticipate it will handle. This is called the plan-and-solve approach.

    This method involves two steps: first, creating a plan to break down the whole task into smaller subtasks, and then executing these subtasks as per the plan. PaS prompting significantly enhances the quality of the reasoning process generated.

    For example, let’s consider our sports inventory scenario:

    Model input:

    Q: The school’s sports department has 15 basketballs and 10 volleyballs. If they purchase 3 new sets of sports equipment, each set containing 5 basketballs and 4 volleyballs, how many of each type of ball do they have now? First, let’s break down the problem and identify key variables and their values. Next, we’ll formulate a plan. Then, we’ll proceed with the plan, computing intermediate results while being mindful of both calculation accuracy and common se
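A minimal sketch of the same idea in code follows. The trigger sentence appended to the question paraphrases plan-and-solve-style instructions; the SDK usage and model name are assumptions for illustration.

```python
# Plan-and-solve prompting: ask the model to devise a plan first, then execute
# it step by step before giving the final answer.
from openai import OpenAI

client = OpenAI()

PAS_TRIGGER = (
    "First, let's understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan, solving the problem step by step, "
    "paying attention to calculation accuracy and common sense."
)

def plan_and_solve(question: str, model: str = "gpt-4o-mini") -> str:
    prompt = f"Q: {question}\n{PAS_TRIGGER}\nA:"
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```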


    2.6. Reflexion

    Reflexion is another technique designed to curb the issue of hallucination in current generative models. It employs a feedback loop that corrects errors autonomously, creating a “model-in-the-loop” framework as opposed to the traditional “human-in-the-loop” system. In essence, one language model reviews and refines the output of another.

    I also stumbled upon this method while I was developing Changepack. One fascinating discovery was when I tried to make ChatGPT consistently spit out changelogs in HTML, but to no avail. Sometimes it would give me Markdown, and other times, the results wouldn’t have any formatting. Despite trying multiple variations, I just couldn’t pin it down. Admitting defeat, I turned to ChatGPT to fix its own prompt. To my surprise, it rewrote it… and it worked. Consistently. I suppose it explained to itself what to do in a way it could comprehend, which I still find amusing.

    This is the model-in-the-loop approach. Ins
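As a rough illustration of the model-in-the-loop idea, the sketch below has one call draft an answer and a second call critique and rewrite it. This is a simplified feedback loop rather than the full Reflexion algorithm; the SDK usage, model name, and helper names are illustrative.

```python
# One model call drafts, a second call reviews the draft against the original
# task and returns a corrected version: a minimal model-in-the-loop cycle.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def generate_with_review(task: str) -> str:
    draft = ask(task)
    critique = (
        f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
        "Review the draft. Note any way it fails the task (wrong format, "
        "missing details, factual slips), then output only a corrected version."
    )
    return ask(critique)
```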


    2.7. Tree-of-thought

When tackling intricate problems that necessitate foresight or exploratory reasoning, the prompting approaches we’ve already explored may fall flat. This is where the tree-of-thought method becomes invaluable. Essentially, the framework operates on several key principles (a simplified code sketch follows this list):

    • It structures the reasoning process as a branching tree, where each node signifies an intermediate line of thought or logical step on the path to a solution. In the realm of math problems, for instance, each node might represent an equation.

    • Instead of linearly generating a single line of reasoning as seen in chain-of-thought methods, this approach produces multiple potential thoughts at every node. This enhances the model’s ability to examine a range of possible reasoning trajectories.

    • The model itself evaluates the merit of each thought or node by determining its validity or likelihood of success. This acts as a heuristic, offering guidance on which branches of the tree to pursue further.

    • Adva
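The sketch below is a deliberately simplified rendering of these principles: at each depth the model proposes a few candidate next steps, scores them itself, and only the most promising branches are expanded. It illustrates the idea rather than the published algorithm; the SDK usage, model name, and helper functions are assumptions.

```python
# Simplified tree-of-thought search: propose several candidate reasoning steps
# per node, let the model score each branch, and keep only the best few.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def propose(problem: str, partial: str, k: int = 3) -> list[str]:
    prompt = (
        f"Problem: {problem}\nReasoning so far:\n{partial or '(none)'}\n"
        f"Suggest {k} distinct possible next reasoning steps, one per line."
    )
    lines = [line.strip() for line in ask(prompt).splitlines() if line.strip()]
    return lines[:k]

def score(problem: str, partial: str) -> float:
    prompt = (
        f"Problem: {problem}\nReasoning so far:\n{partial}\n"
        "On a scale from 0 to 10, how promising is this line of reasoning? "
        "Reply with a single number."
    )
    words = ask(prompt).strip().split()
    try:
        return float(words[0]) if words else 0.0
    except ValueError:
        return 0.0

def tree_of_thought(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [""]  # each entry accumulates a chain of reasoning steps
    for _ in range(depth):
        candidates = [
            f"{partial}\n{step}".strip()
            for partial in frontier
            for step in propose(problem, partial)
        ]
        # Evaluate every candidate and keep only the top `beam` branches.
        scored = sorted(candidates, key=lambda c: score(problem, c), reverse=True)
        frontier = scored[:beam] or frontier
    return frontier[0]
```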


    2.8. Chain-of-density

    The next method aims to demonstrate the extent of detail you can put into your prompts—and it’s probably significantly more than you imagine! The Salesforce AI team rolled out a fresh approach for LLM-based text summarization, dubbed chain-of-density. The researchers recognized the fine line between detail and essential ideas when summarizing text. To address this, they developed a new prompt that lets you tweak the summary’s density to your liking.

    Here's the prompt:

    Article: [Paste the article here…]

    You will generate increasingly concise, entity-dense summaries of the above article.

    Repeat the following 2 steps 5 times.

    1. Identify 1-3 informative entities (“;” delimited) from the article which are missing from the previously generated summary.
    2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.

    A missing entity is:


    2.9. Chaining outputs

The inputs and outputs we’ve discussed were relatively brief. Even if the prompts were complex, they consisted of a few paragraphs, and the model’s responses were similar. But what if we wanted the AI to write an entire book? When I typed that prompt into ChatGPT, I got just nine paragraphs of something the model considers a “book,” which it obviously is not.

    So the question is, is there any way to guide the model into writing a book, even though it can’t do it on its own?

    It turns out the answer is yes. If you, as a prompt engineer, know how to write a book, you can hand-hold the model step by step. First, you ask it to generate a few topics and choose the best one. Next, you have it write an elevator pitch for the book to determine the approach to the topic. Then, you instruct it to create an outline based on this pitch. For each chapter from the outline, you have it develop a plan. Finally, for each bullet point, it writes a few paragraphs. The model might not
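Here is a rough sketch of what that hand-holding could look like in code, with each call’s output embedded in the next prompt. The stage prompts, helper names, and SDK usage are illustrative assumptions; the point is the chaining pattern, not the specific wording.

```python
# Chaining outputs: topic -> pitch -> outline -> chapter plans -> chapter text,
# with every step's result fed into the next prompt.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def draft_book(subject: str) -> str:
    topic = ask(f"Propose five book topics about {subject} and name the single best one.")
    pitch = ask(f"Write a one-paragraph elevator pitch for a book on:\n{topic}")
    outline = ask(f"Based on this pitch, outline the book as eight chapter titles, one per line:\n{pitch}")
    chapters = []
    for title in filter(str.strip, outline.splitlines()):
        plan = ask(f"Book pitch:\n{pitch}\n\nWrite a bullet-point plan for the chapter: {title}")
        chapters.append(ask(f"Write a few paragraphs for each bullet in this plan:\n{plan}"))
    return "\n\n".join(chapters)
```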


    2.10. Beyond prompts

    Having covered all these techniques, I must confess something to you, my reader, with a bit of embarrassment: I don’t even like prompts.

    It’s true that the basics are straightforward and open to everyone. However, when it comes to tasks with edge cases, the need for clear expression of vague preferences, or tasks that require a precise understanding of LLM behavior, writing prompts can be difficult. Take the tree-of-thought technique we’ve discussed before. That wasn’t an easy prompt to write.

    What about images, though? I found a prompt for a generated image on my feed in Playground, an AI image app:

    Digital realistic art of a Plymouth Road Runner 426 Hemi, tinted windows, speeding through the city streets of Miami, the sun shining on the bright orange body and chrome parts. Professional digital painting made with alcohol inks and acrylic, in the style of WLOP, RHADS, APK, vibrant colors, sharp focus, vanishing point, three-point perspective. Hi


    3. Technological design constraints

    In Chapter 1, we explored the vast potential of generative AI in revolutionizing various industries. Chapter 2 talked about the nooks and crannies of prompting, highlighting its role in creating a new wave of products highly responsive to human interaction. However, generative AI is still emerging in terms of widespread adoption. While promising, the technology is not without its hurdles and constraints. It isn’t all-powerful.

    In this chapter, we will address challenges that product professionals need to overcome for effective use of generative AI. We’ll begin with technical constraints, one of which—hallucinations—you’re already familiar with from Chapter 2. There are other similar challenges to explore. Next, we will transition to issues tied to product and design.

    Let’s dive in.


    3.1. Context windows

Current generative AI models, whether textual or graphical, still face significant challenges when it comes to maintaining context. While they can remember things within a single session, these models are quick to forget once the session is lost or a new one starts, requiring you to prompt them with the relevant details all over again.

For instance, if I’m using GPT-4 to brainstorm marketing strategies for the launch of a new feature but fail to save the conversation or simply lose it, ChatGPT will not recall the context I’ve already shared. Details such as my business description or my target audience profiles will need to be re-provided when I return for help with launching another feature.

    Custom instructions do offer some level of control over how ChatGPT responds, allowing you to set your preferences and have them remembered for future conversations. I’m not particularly fond of this feature, given
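A common workaround is to persist the background you care about yourself and replay it at the start of every new session, for example as a system message. The sketch below assumes the OpenAI Python SDK; the file format, field names, and model are illustrative.

```python
# Re-injecting saved context into a fresh session via a system message.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()
CONTEXT_FILE = Path("business_context.json")  # e.g. {"description": "...", "audience": "..."}

def load_context() -> str:
    data = json.loads(CONTEXT_FILE.read_text())
    return f"Business: {data['description']}\nTarget audience: {data['audience']}"

def new_session(question: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": load_context()},  # restored background
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```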


    3.2. Moderating non-determinism

Non-determinism refers to the unpredictability of outputs from generative AI models. When you input the same prompt into a generative AI model multiple times, you often receive different responses. This variability can be a double-edged sword. On one hand, it allows for creativity and diversity in outputs, essential for applications like content generation, art, and creative writing. On the other hand, it makes reliability and consistency harder to achieve, which are critical for many applications.

    Traditional software relies on predictable outputs for given inputs. Non-determinism can lead to frustration when the AI generates unexpected or irrelevant responses. This unpredictability can erode trust in the product, especially in applications requiring precise and reliable outputs, like customer support.

    Need an example? Air Canada lost a small claims court case against a grieving passenger after trying unsuccessfully to disavow its AI-powered chatbot. The passenger argued they we
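When predictability matters more than creativity, the usual levers are sampling parameters. The sketch below assumes the OpenAI Python SDK: a temperature of zero makes sampling effectively greedy, and the seed parameter requests best-effort reproducibility (a beta feature at the time of writing); neither fully guarantees identical outputs.

```python
# Reducing (not eliminating) output variability with temperature and seed.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    temperature=0,        # always favor the most likely next token
    seed=42,              # best-effort determinism across identical requests
    messages=[
        {"role": "user", "content": "Summarize our refund policy in two sentences."}
    ],
)
print(response.choices[0].message.content)
```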


    3.3. Async

    Unlike traditional software systems that operate mostly by giving you instant answers, current generative AI systems are fundamentally asynchronous.

For example, when using a language model like ChatGPT, users type their queries into a text box. This input is then sent to the back-end server for processing and streaming. Streaming enables real-time data flow between the client and server, providing users with incremental updates. In a chat application, for instance, a generative AI can stream responses as they are being generated. This creates a more dynamic and engaging user experience, as users can see partial responses and start interpreting them even before the entire output is completed.

    Standard UI elements like buttons and sliders are also used to trigger AI actions. These triggers can initiate complex back-end processes, such as generating an image, composing a piece of music, or performing data analysis. The asynchronous nature of these interactions allows the front-end interfac
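In code, streaming usually means consuming the response chunk by chunk instead of waiting for the whole completion. A minimal sketch, assuming the OpenAI Python SDK with its streaming flag enabled (model name illustrative):

```python
# Stream a completion and render tokens as they arrive.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,  # the server sends incremental chunks instead of one response
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)  # partial output
```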


    3.4. Discovery

    When exploring AI-powered apps to write case studies for this book, I stumbled upon a fascinating solution to a classic design problem: When you have the power to instruct AI to create anything, what do you specifically ask for?

    Since generative AI is inherently unpredictable, no two outputs will be identical if different seeds are used. As a result, achieving the desired outcome often involves a process of iteration, brimming with experimentation, due to either ambiguous instructions or the unpredictable nature of the results.

    Take this for instance: I use Playground to conjure feature images for my articles. I aimed to produce an image of a semi-robotic cat licking its paw. It took an arduous 50 attempts before I was content with an image. Yet, despite my best efforts, the algorithm fell short of depicting the cat in the specific act of licking. And believe me, my early days experimenting with image-centric models were even more challenging.


    3.5. Articulation barriers

As we discussed in Chapter 2, the latest surge in generative AI is driven by prompts—instructions or queries that you feed into a model to guide its responses. As multi-modal AIs gain traction, the role of prompts is expanding beyond text to encompass vision and voice, yet virtually everything remains driven by text underneath. Take DALL-E 3, for example, which produces images influenced by the messages you exchange with ChatGPT.

    This approach has its advantages. For one, it fosters a more conversational interaction with the product, making the technology more approachable. Using verbal prompts is often more intuitive than navigating a complex user interface, too—after all, you already know how to express yourself. You learned it in school!

    Or did you, really?

    Language proficiency

    That’s not always obvious. To start, users need to be eloquent enough to craft the necessary textual prompts effectively. Also, since most of these models are primarily trained on English


    3.6. The economics of LLMs

    As AI startups grow, there’s a trend of sharing memes on Twitter about massive bills from OpenAI. Some companies are posting about receiving bills of $8,000 or even $25,000, which can amount to about 10% of a startup’s monthly recurring revenue.

    In the past decade, we’ve seen similar situations with cloud service bills. Back then, teams didn’t worry too much because if their services gained popularity, they had access to almost unlimited venture capital. However, in today’s climate, with the end of the zero interest rate policy era, companies need to be much more mindful of costs right from the start.

So, the big question is, how can we reduce costs? Naturally, the main solutions include developing more efficient models and improving hardware. However, we can also apply software engineering or prompt engineering techniques to cut expenses. This section explores the following strategies (a minimal caching sketch follows the list):

    • Trimming prompts and responses to minimize token usage
    • Implementing caching,
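To make the caching idea concrete, here is a minimal sketch that stores responses in memory keyed on a hash of the model and prompt, so repeated identical prompts do not trigger repeated API charges. The in-memory dict, hashing scheme, and SDK usage are illustrative; a production system might use Redis or a database, and caching only helps where reusing an earlier answer is acceptable.

```python
# Cache LLM responses keyed on (model, prompt) so identical requests cost nothing.
import hashlib

from openai import OpenAI

client = OpenAI()
_cache: dict[str, str] = {}  # in-memory; swap for Redis or a database in production

def cached_complete(prompt: str, model: str = "gpt-4o-mini") -> str:
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]
```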

    4. Business challenges

In this chapter, we’ll explain how AI impacts businesses and why adopting new technologies is crucial for staying competitive. However, truly adopting AI comes with challenges. It’s not just about adding a feature and slapping AI on top of it. Generative models will become truly transformative only if they enable new business models that wouldn’t be possible otherwise.

    So, what does AI mean in your field? How can it impact your revenue and cost structure? Are there new legal requirements that come with using it? We’ll cover all this and more.


    4.1. Product-market fit

Broadly speaking, the latest surge of AI-driven products can be grouped into two categories.

    The first includes AI features integrated into a broader service, supplementing its existing value. For instance, consider Box enhancing its platform with natural language search, Zoom introducing transcription services, or Notion integrating an AI assistant to facilitate content creation. Here, even without the AI element, these products would still function.

    The second category represents entirely new products, with AI serving as the cornerstone. Without it, these products cease to exist. ChatGPT and Playground, an online AI image creator we’ve already mentioned, are examples.

This stands in contrast to the 2015 influx of natural language processing products, which largely remained at the tech demo stage. But I’ve noticed a tendency to overstate the product-market fit of generative AI because of the first category of products. Many argue that AI’s product-market fit is cl


    4.2. Chatbot or not?

Have you asked yourself whether a chatbot is truly the best solution for the problem you’re tackling?

Simply adding a chatbot to a startup focused on, say, finding trendy pubs, bars, and restaurants doesn’t necessarily make it better. Most applications won’t gain any real advantages by transitioning to text-based interfaces. This is especially true for tools geared toward data processing, such as management systems or spreadsheets. And while graphical user interfaces excel in many scenarios and text-based ones have their own set of strengths, each type comes with tradeoffs, too.

Determining the optimal conditions for utilizing chatbots is increasingly critical, especially given the current buzz surrounding large language models. And while ChatGPT has become extremely popular, it raises the question: is emulating it necessarily the right move for everyone?

    Cognitive ergonomics

    ChatGPT presents an interesting paradox: On one hand, it transcends traditional GUIs by allowin


    4.3. Successes and failures of early platforms

    We’ve discussed new business models, but there’s also a lot happening with new AI-powered platforms. In this section, we’ll review a few of them, analyzing their strengths and weaknesses. Hopefully, we’ll gain insight into what works and what doesn't in the realm of generative AI.

    Meta Smart Glasses

    In September 2023, Meta, in collaboration with Ray-Ban, unveiled the successor to its two-year-old smart glasses. The updated version continues to be promoted as an everyday wearable, designed to capture photos and videos from a first-person perspective. Like its predecessor, the new smart glasses come equipped with built-in speakers and microphones.

    So, what’s new? Meta has announced that in an upcoming release next year, the smart glasses will get multimodal capabilities. This will enable users to engage with their environment using Meta AI. During the event, the co


    4.4. Legislation

    To analyze the legal aspects of generative AI, I had a conversation with my friend Maciej Mańturz about the upcoming European Union legislation aimed at regulating AI. We discussed how the new law will impact startups looking to incorporate AI, and what entrepreneurs need to know to stay ahead.

Maciej is a lawyer and a specialist in privacy. The common branches of law never truly resonated with him, and he never envisioned himself in a courtroom setting. Eventually, he joined a major corporation, which opened his eyes to the intersection of technology and the business world. He came to believe that a lawyer should be not an obstacle but a facilitator of business initiatives.

Since then, he’s pursued further education and earned certifications in privacy and broader tech law, covering areas like contracts, intellectual property, and a touch of cybersecurity. He has even worked on AI, culminating in a postgraduate thesis on the EU
