2.5. Plan-and-solve

To further improve the precision of the output, we can give the model an outline of the specific steps we expect it to follow. This is called the plan-and-solve (PaS) approach.

The method involves two steps: first, devising a plan that breaks the whole task into smaller subtasks, and then executing those subtasks according to the plan. PaS prompting significantly improves the quality of the reasoning the model generates.

For example, let’s consider our sports inventory scenario:

Model input:

Q: The school’s sports department has 15 basketballs and 10 volleyballs. If they purchase 3 new sets of sports equipment, each set containing 5 basketballs and 4 volleyballs, how many of each type of ball do they have now? First, let’s break down the problem and identify key variables and their values. Next, we’ll formulate a plan. Then, we’ll proceed with the plan, computing intermediate results while being mindful of both calculation accuracy and common sense. We’ll solve the problem step by step and present the final answer.

Model output:

A: To solve this problem, we’ll first establish the key variables and their initial values. The school currently has 15 basketballs and 10 volleyballs. They plan to purchase 3 new sets of sports equipment, with each set containing 5 basketballs and 4 volleyballs.

Our goal is to determine the total number of basketballs and volleyballs after these purchases. To do this, we’ll calculate the number of balls in the new sets and then add these to the existing totals.

For the basketballs, each set contains 5. Since 3 sets are being purchased, we multiply 5 basketballs by 3, giving us the total number of basketballs in the new sets. Similarly, for the volleyballs, each set contains 4 volleyballs. Multiplying this by 3 gives us the total number of new volleyballs.

Adding these numbers to the existing totals gives us our final count for each type of ball: 15 + 15 = 30 basketballs and 10 + 12 = 22 volleyballs. We've performed these calculations with care, ensuring accuracy and applying common sense to check our results.

As you can see, this is an overly complicated approach for a straightforward calculation. That doesn't mean the technique is without merit, though. The neat part? You can guide the model through virtually any planning method, as long as you can steer it toward a sensible sequence of steps to execute. Plan-and-solve prompts let us mimic workflows that humans perform, and that an unprompted AI might struggle with.
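If you're calling a model through an API rather than a chat window, the whole trick boils down to appending the trigger phrase to whatever question comes in. Here's a minimal sketch in Python, assuming the OpenAI chat completions API; the function name and model are placeholders of mine, not a fixed recipe:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The plan-and-solve trigger from the example above: instead of a bare
# "let's think step by step", we ask the model to extract variables,
# devise a plan, and then carry it out.
PLAN_AND_SOLVE_SUFFIX = (
    " First, let's break down the problem and identify key variables and "
    "their values. Next, we'll formulate a plan. Then, we'll proceed with "
    "the plan, computing intermediate results while being mindful of both "
    "calculation accuracy and common sense. We'll solve the problem step "
    "by step and present the final answer."
)


def plan_and_solve(question: str, model: str = "gpt-4o") -> str:
    """Append the plan-and-solve trigger to a question and return the answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question + PLAN_AND_SOLVE_SUFFIX}],
    )
    return response.choices[0].message.content


print(plan_and_solve(
    "The school's sports department has 15 basketballs and 10 volleyballs. "
    "If they purchase 3 new sets of sports equipment, each set containing "
    "5 basketballs and 4 volleyballs, how many of each type of ball do "
    "they have now?"
))
```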

As we turn to more complex scenarios, particularly business tasks and workflows like the upcoming example, we'll notice that these models get confused far more easily than they do with elementary-school math.

Case study: The smart reactivity of LLM-based apps

You might be wondering: This is nice and all, but how can I apply what I’ve learned? Am I supposed to create a calculator app? That’s not my goal! And I would agree with you. Now that we’ve grasped this piece of theory, let’s apply it to a real, though not overly complex, project.

There’s a limit to what you can achieve using just tags, keywords, likes, votes, and other “simple” metadata.

The first wave of metadata-driven products emerged with the advent of social networks. Platforms encouraged users to “like” various items and operated on the naive assumption that if individuals within your network appreciated something, you would likely enjoy it as well. And so we got Digg, Facebook, Twitter, YouTube, and many, many more…

The second generation of smarter reactivity leveraged classification algorithms. Consider TikTok, for example. It cleverly employs AI to determine the content you engage with, then curates more of what might appeal to you, bypassing your social connections. Just watch the stuff you like—we’ll figure out the rest on our own. While this was groundbreaking at a large scale, it’s only scratching the surface of what’s next.

Enter large language models.

The upcoming wave will pivot from mere classification to reasoning and cognition. Even though LLMs sometimes err and glitch, they exhibit a semblance of reasoning in many straightforward scenarios. The debate on whether this mirrors human-level thought is ongoing, but for many applications, even current capabilities suffice. Let me illustrate with a personal example.

As a proof of concept, I built Changepack, an open-source changelog tool integrated with ChatGPT. Changepack syncs with your GitHub activity, streamlining progress tracking. Every month, Changepack selects the most noteworthy updates, crafting a release note draft for your perusal and dissemination.

This selection process harnesses ChatGPT. In essence, I task the model with sifting through recent changes, selecting the most pertinent ones, and justifying its choices. This lets me compose a release-notes draft without any human input: the AI scrutinizes the actual content and draws conclusions for me, sidestepping the need for behavior-based metadata.

This process involves two steps. First, the AI needs to assess each change:

As an AI language model, your assignment is to evaluate an outstanding task related to a product called (name). The task’s description is written in technical jargon and is geared toward the organization’s in-house teams.

(Introducing the product here…)

(Introducing the target audience's description of the product here…)

Your task has two aspects:

  1. Assess the task, underlining parts that could be unclear to the target audience. Propose changes to enhance readability and improve understanding, while maintaining a professional yet accessible tone. Identify any mentions of specific staff members or proprietary tools including but not limited to feature management platforms like LaunchDarkly, customer messaging tools like Intercom, user behavior analytics platforms like FullStory, project management software like Jira, and others.

  2. Craft a clear and succinct summary of this task. This summary should not exceed 600 characters and should not include any reference to specific staff members or proprietary tools such as LaunchDarkly, Intercom, FullStory, Jira, and the like. The objective is to convey the essential alterations and updates to the product. Remove all URLs, regardless of whether they are in HTML or Markdown format.

Now, please evaluate and summarize the following task…
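Wired into code, this first pass might look something like the following Python sketch, again assuming the OpenAI chat completions API. The function name, model, and abbreviated prompt template are illustrative, not Changepack's actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Abbreviated version of the evaluation prompt above; the product and
# target-audience introductions would be interpolated where the ellipsis sits.
EVALUATE_PROMPT = """As an AI language model, your assignment is to evaluate \
an outstanding task related to a product called {name}.

... (product and target-audience introductions go here) ...

Now, please evaluate and summarize the following task:

{task}"""


def summarize_task(task: str, name: str, model: str = "gpt-4o") -> str:
    """First pass: turn a raw, jargon-heavy task into a customer-friendly summary."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": EVALUATE_PROMPT.format(name=name, task=task)}
        ],
    )
    return response.choices[0].message.content
```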

Following the cleanup phase, where the AI summarizes all tasks, we then instruct it to select the most critical ones:

Based on the following updates provided, identify and summarize the most impactful changes for (name)’s customers. As you select each update, please provide a brief rationale for its inclusion. It's crucial that you do not reveal names of any specific users, clients, accounts, or organizations.

Typically, we would gauge the significance of a change through metadata indicators: keywords, the number of lines of code altered, the number of contributors to the feature, or the volume of comments. Here we take a different approach. We direct the AI to assess the changes solely from the perspective of the target audience, evaluating each feature's importance based on how customers would perceive it, and then instruct it to prioritize the list according to what would matter most to them.
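Continuing the sketch from the first pass, the selection step might look like this (again, an illustration of the idea rather than the project's real code):

```python
def select_highlights(summaries: list[str], name: str, model: str = "gpt-4o") -> str:
    """Second pass: pick the most impactful changes from the cleaned-up summaries."""
    updates = "\n\n".join(f"- {summary}" for summary in summaries)
    prompt = (
        f"Based on the following updates provided, identify and summarize the "
        f"most impactful changes for {name}'s customers. As you select each "
        f"update, please provide a brief rationale for its inclusion. It's "
        f"crucial that you do not reveal names of any specific users, clients, "
        f"accounts, or organizations.\n\n{updates}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Plan, then solve: summarize each change, then let the model pick the highlights.
# (recent_tasks is a hypothetical list of raw changes from your GitHub sync.)
# summaries = [summarize_task(task, "Changepack") for task in recent_tasks]
# draft = select_highlights(summaries, "Changepack")
```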

Before implementing this plan, Changepack struggled to consistently deliver outcomes that met my expectations. The quality varied because the model lacked a clear understanding of what I wanted from it. It became clear that the model needed precise instructions and a well-defined objective to perform at its best. I began to refine my approach, carefully outlining my needs and what I anticipated. This change in strategy proved to be a turning point. Slowly, the results started to better match my vision.

I anticipate many apps will tread this path in the coming years, as we push the boundaries of LLMs beyond routine tasks.