Teaser
Hallucinations, IP loss and no data privacy – why making a new AI-powered app is not as easy as integrating ChatGPT makes it seem.
Introduction
With the constant stream of articles exclaiming the miracle power of ChatGPT for everything from querying anything anywhere in natural language to assistants for coding, writing, and personal organization, it’s easy to believe that making these new apps is super simple. Applications that were once far-fetched are now possible, all in a few hours’ work and without enormous teams of developers anymore, or so we are led to believe.
Looking at some of these new products powered by ChatGPT, it certainly seems like a new era of point-and-click NLP is here. It is true that writing aids are increasingly good at generating interesting ideas that can unblock authors or at producing engaging content from just a few bullet points. Equally, coding assistants can generate working snippets of code in any language from just a few comments about a function’s expected behaviour. Meanwhile, querying agents like those doing natural language to SQL can find the right answers in a database with data analyst-like precision, or like those ingesting a whole website’s content can find the needle-in-a-haystack answer in ways that weren’t possible before.
As Easy As 1-2-3?
Adjacent open-source tools and libraries have also cropped up rapidly, making it look even easier to integrate LLMs straight away. With these tools, a lot of the headaches around coding ML applications like data structuration, model serving and output processing appear to have disappeared, making it faster to inject Large Language Models straight into a product to make it AI-driven in record time. LangChain is an example of this. An emerging fan favourite for developing applications with LLMs including ChatGPT, it can do many things from flexibly processing data for building vector stores, prompt management and chaining as well as interacting generically with any LLM. Even using OpenAI’s API and SDK is a win for rapid ML development since it removed the need for model-serving infrastructure and made fine-tuning simpler.

Not So Easy For All Applications
However, as ‘easy’ as it is to make these new applications, they often have certain aspects in common that make them achievable in the first place. Typically, they have outputs that are more fuzzy and lenient in terms of accuracy. For example, with writing aids, what constitutes a ‘good’ or ‘helpful’ idea is subjective so the user can try multiple times before picking something. Additionally, these applications might be low stakes like Q&A on documentation or websites where the wrong answer is annoying but will likely not destroy a whole business. The thing to note here is that by definition of being easy, they typically mean no business differentiation.
Harder use cases, on the other hand, are less frequent because they mean skills like prompt engineering, chaining, understanding how these models work and proper experimentation become vital. Prompt engineering goes beyond blind trial and error where knowing the right or most useful ways to ‘whisper’ a machine are still in flux and not always well known.
What sets a good NLP engineer apart now is experience across different use cases in NLP, plus learning and understanding why the model might behave the way it does. The latter often requires an understanding of generative AI, its limitations and typical behaviour within the scope of its architecture.
Hallucinations, Data Privacy and IP Loss Add Complexity
The other aspect that differentiates more difficult use cases from the more commonly seen, popular ones is the sensitivity and tolerance to hallucinations. Given one of these high-stakes applications, the best-case scenario is that GPT’s propensity to hallucinate adds to the complexity of prompt engineering, requiring additional guardrails for prevention and methods for detection which in turn means having an idea of what ‘right answers’ are. The worst case however is that no amount of ‘possibly making up the wrong answer’ is acceptable, as is often the case in insurance, law and healthcare, so these models can’t be used. For example, SQL in data-driven organizations with wrong answers can cost millions so in the end, they either don’t use LLMs or another NLP approach is needed instead.
On a similar note, data privacy and the potential loss of IP are two more elements that make a use case more difficult. Making LLM prompts work for NL to code or database search language, like SQL or NoSQL, often requires some data such as database schema or actual examples from columns. However, the fact that many LLMs are third-party-hosted and posting data to them means losing control of it means that some companies cannot allow that because their data is proprietary so prompting is likely to work poorly or not at all. This is especially the case with applications involving personally identifiable information. It ends up not being possible to use such data with third-party LLMs because it’s then out of the control of the company that is supposed to be safeguarding it.
Equally, for those companies requiring tight control of the IP they develop, using LLMs also means now having to contend with prompt injection attacks that can reverse engineer the hard work of getting the AI to do what they need in their product. In both cases, other NLP methods are safer, especially if data privacy, IP and differentiability are important.
Conclusion
Many of the applications coming out these days make it seem like ChatGPT is an easy and foolproof solution for integrating AI into a product. While that may be the truth for use cases where accuracy, privacy and IP security are less important, that is not necessarily true for everyone. For others, there is disappointment that the potential for IP loss or data privacy concerns aren’t mitigated in the current iterations of LLMs. Additionally, some might be excited by the prospect of using LLMs but need assurance that hallucinations won’t cause untold damage. Lastly, there is certainly frustration at not being able to make GPT work for a use case despite it being ‘so easy’, typically because greater expertise in prompt engineering and wider NLP techniques is needed.
If any of these scenarios is something you’ve experienced at your organization, reach out to see how we can help!