3.2. Moderating non-determinism

Non-determinism refers to the unpredictability of outputs from generative AI models. When you feed the same prompt to a model multiple times, you often receive different responses. This variability can be a double-edged sword. On one hand, it allows for creativity and diversity in outputs, essential for applications like content generation, art, and creative writing. On the other hand, it makes reliability harder to achieve, and reliability is critical for many applications.

Traditional software relies on predictable outputs for given inputs. Non-determinism can lead to frustration when the AI generates unexpected or irrelevant responses. This unpredictability can erode trust in the product, especially in applications requiring precise and reliable outputs, like customer support.

Need an example? Air Canada lost a small claims case brought by a grieving passenger after trying, unsuccessfully, to disavow its AI-powered chatbot. The passenger argued they had been misled about the airline’s bereavement fare policy when the chatbot provided incorrect information. British Columbia’s Civil Resolution Tribunal, which hears small claims, sided with the passenger.

After their grandmother passed away, the passenger used Air Canada’s website chatbot to look up flights. The chatbot incorrectly stated that bereavement fares could be applied retroactively. The passenger took a screenshot of this response and presented it to the Tribunal. The chatbot had told the customer:

Air Canada offers reduced bereavement fares if you need to travel because of an imminent death or a death in your immediate family… If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.

The passenger later found out from Air Canada employees that the airline did not accept retroactive bereavement applications. However, they still pursued the refund, stating that they had relied on the chatbot’s advice, which, according to case records, was a hallucination rooted in the model’s non-determinism.

In regulated industries, non-deterministic outputs pose significant compliance challenges. Regulators often require clear, predictable, and explainable decisions. The inherent unpredictability of generative AI models can make it difficult to meet these requirements, leading to potential legal and ethical issues.

The answer to the problem of non-determinism is, unfortunately, moderation. In general, you can moderate inputs or you can moderate outputs. If you want to moderate inputs, you can use tools like the free Perspective API. OpenAI offers a similar endpoint. These APIs employ machine learning to identify toxic content, making it easier to moderate. Comparable tools exist for detecting NSFW content or inappropriate images. If unwanted content is detected, it’s best to prevent the chatbot from responding, as the response could be unpredictable.
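To make this concrete, here is a minimal sketch of input moderation built on OpenAI’s moderation endpoint; the Perspective API follows a similar request-and-score pattern. The helper function and the canned refusal are illustrative choices, not part of any official API.

```python
# A minimal sketch of input moderation via OpenAI's moderation endpoint.
# Assumes the openai Python SDK (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def is_input_allowed(user_message: str) -> bool:
    """Return False if the moderation endpoint flags the message."""
    result = client.moderations.create(input=user_message)
    return not result.results[0].flagged

user_message = "How do I get a refund for my ticket?"
if is_input_allowed(user_message):
    print("Forwarding message to the chatbot...")
else:
    # Don't let the model respond at all; its reply would be unpredictable.
    print("I'm sorry, I can't help with that request.")
```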

If you want to moderate outputs, frontier models like ChatGPT usually excel at basic filtering. For example, they prevent the generation of harmful content, like instructions for making a bomb; they are often legally required to do so. However, for more specific cases, you need to implement a human-in-the-loop approach, which we briefly mentioned in Chapter 2.

Human-in-the-loop involves integrating human oversight into the AI's decision-making process. Before the AI-generated content is finalized, it can be reviewed by human moderators. These moderators can approve, reject, or edit the output to ensure it meets quality and safety standards. This is particularly useful in applications where accuracy and appropriateness are critical, such as medical advice.
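As a rough illustration, assuming a simple in-memory queue and invented names throughout, a human-in-the-loop gate might hold every AI draft until a moderator decides its fate:

```python
# A hypothetical human-in-the-loop gate: AI drafts wait in a queue until a
# moderator approves, edits, or rejects them. All names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Draft:
    draft_id: int
    text: str
    status: Status = Status.PENDING

review_queue: list[Draft] = []

def submit_for_review(draft_id: int, ai_output: str) -> Draft:
    """Hold an AI-generated answer instead of sending it to the user."""
    draft = Draft(draft_id, ai_output)
    review_queue.append(draft)
    return draft

def moderate(draft: Draft, approve: bool, edited_text: str | None = None) -> Draft:
    """A human moderator approves (optionally editing) or rejects a draft."""
    if approve:
        draft.text = edited_text or draft.text
        draft.status = Status.APPROVED
    else:
        draft.status = Status.REJECTED
    return draft  # only APPROVED drafts are ever shown to the user
```

The key design choice is that nothing leaves the queue without an explicit moderator decision, so an unpredictable draft can never reach the user unreviewed.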

Users can also provide feedback on the AI's outputs, which is then reviewed by human moderators. This feedback helps in continuously improving the model’s performance and reducing non-deterministic behavior over time. For example, if a user flags a response as inappropriate, moderators can investigate and adjust the model’s parameters or training data accordingly.
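A lightweight way to capture such feedback, sketched below with an invented JSONL schema, is to log every flagged response with enough context for a moderator to audit it later:

```python
# A sketch of a user-feedback loop: flagged responses are stored with context
# so moderators can audit them later. The schema is illustrative only.
import json
import time

FLAG_LOG = "flagged_responses.jsonl"

def flag_response(conversation_id: str, prompt: str, response: str, reason: str) -> None:
    """Append a user flag to an audit log for later human review."""
    record = {
        "ts": time.time(),
        "conversation_id": conversation_id,
        "prompt": prompt,        # what the user asked
        "response": response,    # what the model said
        "reason": reason,        # why the user flagged it
    }
    with open(FLAG_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```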

An example of tackling non-determinism in a practical application can be seen with Nibble: an AI-powered chatbot that allows shoppers to haggle for lower prices. Shoppers can reach a deal within 45 seconds, and roughly one-fifth proceed to purchase the item, according to the company. However, if you're considering implementing a similar idea and are concerned about customers finding exploits to secure 100% discounts, put a human-in-the-loop system in place. Before the final deal is confirmed, human moderators can review the negotiated prices to ensure fairness and prevent abuse, maintaining both customer satisfaction and business integrity.
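For instance, assuming a 30% maximum discount (the numbers and function names here are hypothetical, not Nibble’s actual rules), a deterministic guardrail can auto-approve routine deals, escalate borderline ones to a human, and reject anything that looks like an exploit:

```python
# A hypothetical guardrail for a negotiation bot: the model may propose any
# price, but a deterministic check enforces the discount floor, and
# borderline deals are routed to a human before confirmation.
MAX_DISCOUNT = 0.30        # business rule: never discount more than 30%
REVIEW_THRESHOLD = 0.25    # deals beyond 25% off need a human look

def validate_deal(list_price: float, negotiated_price: float) -> str:
    discount = 1 - negotiated_price / list_price
    if discount > MAX_DISCOUNT:
        return "reject"          # exploit attempt or hallucinated price
    if discount > REVIEW_THRESHOLD:
        return "human_review"    # hold the deal for a moderator
    return "auto_approve"        # safe to confirm immediately

assert validate_deal(100.0, 50.0) == "reject"
assert validate_deal(100.0, 72.0) == "human_review"
assert validate_deal(100.0, 90.0) == "auto_approve"
```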

Case study: The delicate balance between medical education and medical advice

Medyk.ai is one of the chatbots available on Czat.ai, an AI platform developed by Michał Jaskólski, the founder of one of Poland’s top real estate search websites. The platform aims to offer around-the-clock support from AI through a variety of specialized characters, each with its own unique personality.

The product faced scrutiny after journalists investigated it and sought opinions from legal authorities and experts about its regulatory compliance. In Poland, as in many other countries, healthcare is a highly regulated sector. The journalists reached out to the Patient Ombudsman and the Office of Competition and Consumer Protection, the kind of attention startups usually prefer to avoid. However, both authorities were unable to provide definitive answers: the market is so new that it’s still unclear whether such a service qualifies as a medical product, and no customers had filed complaints yet. Nonetheless, their responses conveyed a general sense of caution.

Fortunately, Michał anticipated these challenges and took steps to safeguard the product. Journalists conducted hands-on tests, but despite their best efforts, they couldn’t persuade the AI to offer specific medical advice. Instead, the chatbot responded, “I understand that you are in a difficult situation and are looking for a specific answer, but I cannot exceed my limitations. This is important for your safety and in line with the rules designed to protect your health.” At the same time, the AI offered support by helping users prepare for a doctor’s visit, such as assisting in creating a list of questions or organizing documentation to present.

This is precisely the type of built-in safeguard we discussed earlier. As someone who supports innovation and doesn’t want to see new ideas stifled by legal issues early on, I was pleased to find that when I tested the chatbot myself, it clearly stated that its sole purpose is to help users prepare for conversations with real doctors. It emphasized the importance of consulting medical professionals immediately if there are any concerns. The chatbot also acknowledged its limitations, such as the possibility of not having the most up-to-date medical knowledge, and reiterated that the entire conversation is purely educational.

When I asked Michał about the article, he mentioned that before launching, he decided to do some research. He discovered a preliminary ruling by the Court of Justice of the European Union regarding national legislation that requires drug prescription assistance software to undergo a certification process by a national authority. The ruling offers a reasonably clear definition:

“Medical device” means any instrument, apparatus, appliance, software, material or other article […] used specifically for diagnostic [or] therapeutic purposes […], intended by the manufacturer to be used for human beings for the purpose of: diagnosis, prevention, monitoring, treatment or alleviation of disease; diagnosis, monitoring, treatment, alleviation of or compensation for an injury or handicap; investigation, replacement or modification of the anatomy or of a physiological process; control of conception. […] Software for general use, even if used in healthcare, and software related to lifestyle and well-being are not considered medical devices.

Michał believes that as long as the app is intended for general use—which it clearly is, given that it’s a horizontal platform with multiple personas—and the chatbot remains transparent about its limitations without attempting to diagnose users, it should be in the clear. In the end, no one found any significant issues with the platform, except for—surprise, surprise—the usual calls for more and clearer regulation. So, it seems his assessment was correct.

Additionally, the product is designed so that no personal information is stored long-term, a point that matters especially in the European context. There’s no registration and no user profiles, and the terms of service explicitly prohibit users from sharing personal information. I suspect this was added to shield the team from concerns related to GDPR compliance, especially since Czat.ai openly acknowledges using GPT-4o and Claude 3.5 Sonnet, models built by US-based companies, which makes data processing tricky. We’ll explore similar issues in Section 4 of Chapter 4, where we discuss the legal aspects of generative AI.

You can see how a single precaution might not be sufficient, as this case study highlights a range of techniques: built-in product safeguards, careful selection of features and business model, and ensuring that the Terms of Service address necessary use cases. It’s a good reminder that managing non-determinism is an interdisciplinary effort.

Curious about how European companies are navigating the challenges of deploying generative AI, I sat down with Michał Jaskólski to get his perspective. When asked about the differences in how companies approach generative AI in the US and Europe, he didn’t hesitate. “The first question is always, 'What will happen to our data?' 'Will our data be leaked?' and 'Where will user data end up?'” Jaskólski said, highlighting the more cautious European approach.

He noted that, especially in large corporations, projects are meticulously reviewed. “I’ve already encountered situations where, for example, the compliance department of one bank blocked a marketing campaign based on generative AI because no one could guarantee 100% that the chatbot wouldn’t say something it shouldn’t.”

His comments reflect the regulatory complexity that companies face in Europe, particularly with regulations like the GDPR, DMA, and AIA. “Fortunately,” Michał added, “there are already lawyers in the Polish market who understand the complexity and multidimensionality of this area and are able to recommend specific solutions.”

When asked if he'd consulted any lawyers before launching his own product, Jaskólski was candid. “Yes. I did a lot of research myself, but I also consulted a lawyer on a few topics,” he said. For Michał, whether legal advice is essential depends on the nature of the project. “When the project touches on regulated areas such as finance, health, or law, such consultations are absolutely necessary because they resemble a minefield, and it’s easy to make a mistake.”

But he was clear about one thing: “It's worth being very specific in asking questions and not being afraid to discuss possible solutions,” he advised, stressing the importance of seeking out the right legal help when necessary.

Michał, who also founded Morizon, a real estate platform under the media company Ringier Axel Springer, is no stranger to deploying AI-powered features. Reflecting on how larger companies approach AI differently, he pointed to the importance of focus. “A lot depends on how well you can impose a rapid development pace,” he said. “If a topic is covered by OKRs, then you can do a lot in 2-3 months. If not, the production launch of a solution can be delayed by as much as a year.”

Despite these challenges, he saw the advantage of working within a large company. “The main advantage is multi-level feedback from various sides, including experts in given areas,” he explained. “Thanks to this, the products are simply better.” He emphasized that larger teams often have the ability to spot areas for improvement that might go unnoticed in smaller projects.