A SWOT Analysis Of AI
AI is good. The whole spectrum of Artificial Intelligence (AI), from predictive to reactive to prescriptive to generative AI, and the Machine Learning (ML) functions that power it are generally regarded as evolutionary technical developments likely, as a whole, to benefit society if we apply them carefully.
However, there is an if and a but (and perhaps even an occasional maybe) in that proposition.
The various misgivings associated with AI that need to be analyzed are not a question of which job roles and workplace functions might soon be completely robot-automated and driven by AI. The general panic in that regard is over; most people understand that some menial jobs will go, that more high-value jobs can be created and that existing roles can now be augmented and positively accelerated by AI to make our lives better.
All that said, a Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis of the state of AI today would not go amiss. For the sake of the storytelling narrative here, let’s reorder that analysis to opportunities, strengths, weaknesses and the essential care and consideration ground of threats (OSWT).
Opportunities
There is so much we can do with AI and Large Language Models (LLMs) if we take the opportunity to really understand how they work. If we ask ChatGPT to describe Einstein’s general theory of relativity, we get a fairly accurate answer. But ultimately, ChatGPT is still ‘just’ a computer program (as are all other LLMs) that is blindly executing its instruction set. It understands Einstein’s general theory of relativity no better than your favorite pet does.
“Unfortunately, we use ‘human-like’ words to describe the techniques engineers use to create AI models and functions. For example, we talk about ‘machine learning’ and ‘training’ in the context of the way we are working with LLMs in the AI arena. This is misleading because an LLM does not have a mind like a human,” clarified Keith Pijanowski, senior technologist & AI/ML SME at MinIO, a company known for its work in open source high-performance object storage for cloud-native workloads such as those now being executed for AI.
There is a certain irony here, says Pijanowski: how can a non-thinking chatbot correctly summarize the findings of the smartest man ever to live? If we can understand more about the essentially contradictory nature of LLMs, we may be able to uncover opportunities to use these new intelligence functions in ways that have not yet even been considered.
Strengths
The strength of LLMs is that they are trained to understand the probability distribution of words in the training set used to create them. If the training set is sufficiently large (e.g. a corpus of Wikipedia articles or public code on GitHub), then the models will have a vocabulary and a corresponding probability distribution that will make their results appear as if they have a real-world understanding of the text they output.
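To make that idea concrete, here is a deliberately tiny sketch in Python of what a next-word probability distribution is. The toy corpus and function names are purely illustrative; a real LLM works over tokens with a neural network rather than raw word counts, but the statistical principle is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for a real training set (Wikipedia, GitHub, etc.)
corpus = "i think therefore i am . i think so . therefore i am here .".split()

# Count how often each word follows each preceding word (a bigram model)
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word_distribution(word):
    """Turn raw follow-counts into a probability distribution."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("i"))
# -> {'think': 0.5, 'am': 0.5}: the 'knowledge' is just co-occurrence statistics
```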
If we move to an example drawn from philosophy and ask ChatGPT the question, “What does ‘cogito, ergo sum’ mean and who wrote it?” the result is something similar to the text below:
“Cogito, ergo sum” is a Latin philosophical proposition that translates to “I think, therefore I am” in English. This statement is famously associated with René Descartes, a French philosopher, mathematician and scientist. Descartes expressed this idea in his work “Discourse on the Method,” published in 1637. The phrase reflects Descartes’ attempt to establish a foundational truth that cannot be doubted – the certainty of one’s own existence as a thinking being.
“So we’re looking at the strengths element here and, as stated previously, LLMs produce results like this using probability distributions,” explained Pijanowski. “It works something like this: the model starts by looking at the text in the question and determines that the word ‘cogito’ has the highest probability of being the first word of the answer. From there, it looks at the question and the first word of the answer to determine the word that has the highest probability of being next. This goes on and on until a special ‘end of answer’ character is determined to be of the highest probability.”
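A hedged sketch of that decoding loop appears below. The hand-made probability table and the <start>/<end> markers are illustrative inventions, and the toy conditions only on the previous word for brevity, whereas a real LLM recomputes the distribution from the entire question and answer so far at every step.

```python
# Illustrative greedy decoding loop: repeatedly pick the highest-probability
# next word until a special end-of-answer marker wins. The probability table
# is hand-made for illustration; a real LLM computes these values with a
# neural network over its whole vocabulary at every step.
NEXT_WORD_PROBS = {
    "<start>": {"cogito": 0.9, "the": 0.1},
    "cogito":  {",": 0.8, "ergo": 0.2},
    ",":       {"ergo": 0.95, "<end>": 0.05},
    "ergo":    {"sum": 0.99, "<end>": 0.01},
    "sum":     {"<end>": 0.7, "means": 0.3},
}

def generate(context="<start>", max_steps=10):
    answer = []
    word = context
    for _ in range(max_steps):
        # Pick the single most probable continuation (greedy decoding)
        word = max(NEXT_WORD_PROBS[word], key=NEXT_WORD_PROBS[word].get)
        if word == "<end>":
            break
        answer.append(word)
    return " ".join(answer)

print(generate())  # -> "cogito , ergo sum"
```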
Pijanowski explains that this ability to generate a natural language response based on billions of probabilities is not something to be feared; rather, it is something that should be exploited for business value. The results get even better with modern techniques: using Retrieval Augmented Generation (RAG) and fine-tuning, we can teach an LLM about a specific business. Achieving these human-like results requires data, and your infrastructure will need a strong data storage solution to hold it.
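As a rough illustration of the RAG idea (retrieve the business documents most relevant to a question, then hand them to the model as context), here is a minimal sketch. The document list, the word-overlap scoring and the call_llm stub are all placeholders rather than any specific product’s API; production systems retrieve via vector embeddings from a document store.

```python
# Minimal Retrieval Augmented Generation (RAG) sketch. Real systems use
# vector embeddings and an object store to hold documents; here retrieval
# is naive word overlap purely to show the control flow.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(DOCUMENTS,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"(model answer grounded in: {prompt!r})"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(rag_answer("What is the refund policy?"))
```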
Now that we understand what LLMs are good at and why, let’s investigate what LLMs cannot do.
Weaknesses
For Pijanowski and team, the weaknesses are relatively clear to see… and this is a reality drawn from experience working with MinIO customers. We know that LLMs cannot think, understand or reason and this is the fundamental limitation of these models.
“Language models lack the ability to reason about a user’s question. They are probability machines that produce a really good guess in answer to a user’s question. No matter how good a guess is, it is still a guess and whatever creates these guesses will eventually produce something that is not true. In generative AI, this is known as a hallucination,” proposed Pijanowski. “When a model is trained correctly, hallucinations can be kept to a minimum. Fine-tuning and RAG also greatly cut down on hallucinations. The bottom line: to train a model correctly, fine-tune it and give it relevant context (RAG) requires data and the infrastructure to store it at scale and serve it in a performant manner.”
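One crude way to operationalize that point is a groundedness check that flags answer sentences with little overlap with the retrieved context. The sketch below is a toy heuristic of our own, not an industry-standard method; real systems tend to use entailment models for this.

```python
# Crude illustration of a hallucination check: flag answer sentences whose
# content words barely appear in the retrieved context. Real groundedness
# checks use entailment models; this overlap heuristic is only a sketch.
STOPWORDS = {"the", "a", "an", "is", "was", "in", "of", "and", "to"}

def content_words(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def flag_ungrounded(answer: str, context: str, threshold: float = 0.5) -> list[str]:
    ctx = content_words(context)
    flagged = []
    for sentence in answer.split("."):
        words = content_words(sentence)
        if words and len(words & ctx) / len(words) < threshold:
            flagged.append(sentence.strip())
    return flagged

context = "Descartes published Discourse on the Method in 1637."
answer = ("Descartes published Discourse on the Method in 1637. "
          "He also invented the telescope.")
print(flag_ungrounded(answer, context))  # -> ['He also invented the telescope']
```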
Threats
The most popular use of LLMs is of course generative AI. Generative AI does not produce a specific answer that can be compared to a known result. This is in contrast to other AI use cases, which make a specific prediction that can be easily tested.
“It is straightforward to test models for image detection, categorization and regression. But how do you test LLMs used for generative AI in a way that is impartial, fact-faithful and scalable? How can you be sure that the complex answers LLMs generate are correct if you are not an expert yourself? Even if you are an expert, human reviewers cannot be a part of the automated testing that occurs in a CI/CD pipeline,” explained Pijanowski, highlighting what could be one of the most pertinent threat factors in this space.
He laments the fact that there are only a few benchmarks in the industry that can help. GLUE (General Language Understanding Evaluation) is used to evaluate and measure the performance of LLMs. It consists of a set of tasks that assess the ability of models to process human language. SuperGLUE is an extension of the GLUE benchmark that introduces more challenging language tasks involving coreference resolution, question answering and more complex linguistic phenomena.
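For those who want to try such a benchmark, the sketch below scores a placeholder model on one GLUE task (SST-2 sentiment classification), assuming the Hugging Face datasets and evaluate libraries are installed; my_model_predict is an assumed stand-in for whatever model is under test.

```python
# Sketch of scoring a model on one GLUE task (SST-2 sentiment) using the
# Hugging Face 'datasets' and 'evaluate' libraries
# (pip install datasets evaluate).
from datasets import load_dataset
import evaluate

dataset = load_dataset("glue", "sst2", split="validation")
metric = evaluate.load("glue", "sst2")

def my_model_predict(sentence: str) -> int:
    """Placeholder: return 1 (positive) or 0 (negative) for the sentence."""
    return 1

predictions = [my_model_predict(ex["sentence"]) for ex in dataset]
references = [ex["label"] for ex in dataset]
print(metric.compute(predictions=predictions, references=references))
# -> {'accuracy': ...}
```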
“While the benchmarks above are helpful, a big part of the solution should be an organization’s own data collection procedures. Consider logging all questions and answers and creating your own tests based on those custom findings. This will also require a data infrastructure built to scale and perform,” concluded Pijanowski. “When we look at the strengths, opportunities, weaknesses and threats of LLMs (now rearranged as SOWT), if we want to exploit the first two and mitigate the latter two, then we will need data and a storage solution that can handle lots of it.”
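A minimal sketch of that log-and-test idea follows. The file path, record schema and helper names are illustrative; at the scale Pijanowski describes, these records would live in an object store rather than a local JSONL file.

```python
import json
import time

LOG_PATH = "qa_log.jsonl"  # illustrative; at scale this lives in object storage

def log_interaction(question: str, answer: str) -> None:
    """Append each production question/answer pair for later review."""
    record = {"ts": time.time(), "question": question, "answer": answer}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

def build_regression_suite(approved: dict[str, str]):
    """Turn human-approved answers into repeatable tests for a CI/CD pipeline."""
    def run(model_answer_fn) -> list[str]:
        failures = []
        for question, expected in approved.items():
            got = model_answer_fn(question)
            if expected.lower() not in got.lower():
                failures.append(question)
        return failures
    return run

# Usage: log live traffic, curate the good answers, then re-test every release.
log_interaction("Who wrote 'cogito, ergo sum'?", "Rene Descartes, in 1637.")
suite = build_regression_suite({"Who wrote 'cogito, ergo sum'?": "Descartes"})
print(suite(lambda q: "Rene Descartes wrote it."))  # -> [] means no regressions
```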
Although a SWOT (in any order) analysis of AI is arguably somewhat simplistic, prone to generalization and deserving of a subsequent fact or fiction audit in and of itself, these technologies are currently moving very quickly and this is surely a prudent evaluation exercise that we should be applying on an ongoing basis.
Don’t forget, SWOT also stands for Success WithOut Tears.