Is LLM emergent?

Posted at — Aug 15, 2024

How could words pulled from an LLM possibly reason like a human, and what does that have to do with Explainable AI? This article is super duper interesting. The author guides us toward a definition of “Explainable AI” through the world of science: complex systems, reductionism, emergence, chemical reactions and physical interactions. How are these going to help?

Reductionism states that a complex system can be studied by examining its individual constituents. In other words, the system’s behavior can be predicted by understanding its fundamental building blocks.

A question arises: is this true for every system? For many, including LLMs, not quite, the author argues. For example, reductionism cannot explain why sodium and chlorine in combination form salt, an edible crystalline structure with a distinctive taste, rather than reacting explosively and poisonously as their constituent atoms do.

The same method can be used to examine the properties of an LLM. From a reductionist perspective, an LLM is technically a giant statistical machine equipped with huge compute power, designed to generate text by predicting the next statistically probable token based on the previous ones. This well-known autoregressive language model, on its own, does not perform any structured reasoning or database lookup against real-world knowledge beyond the massive corpus it has seen. The response it provides to a query is sometimes correct, sometimes not, and often difficult to evaluate. Hence, this kind of ability is not considered “explainable” at all.
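To see what “predicting the next statistically probable token” means mechanically, here is a toy sketch of my own, not the article’s: a bigram model that generates text purely from co-occurrence counts. Real LLMs replace the counting with a deep transformer over subword tokens, but the autoregressive loop has the same shape.

```python
import random
from collections import Counter, defaultdict

# Toy illustration of autoregressive generation: a bigram "language model"
# that predicts the next token purely from co-occurrence statistics.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each other token.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_token(prev: str) -> str:
    """Sample the next token in proportion to its observed frequency."""
    counts = following[prev]
    tokens, weights = zip(*counts.items())
    return random.choices(tokens, weights=weights)[0]

# Autoregressive loop: each prediction is fed back in as context.
token = "the"
output = [token]
for _ in range(8):
    token = next_token(token)
    output.append(token)

print(" ".join(output))  # e.g. "the cat sat on the rug . the dog"
```

There is no grammar, no lookup, and no reasoning anywhere in this loop; everything it “knows” is frequency.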

However, that does not mean there is no other way to understand such properties of a complex system. Emergence, if accepted, does help. It makes the signs of reasoning ability in a language model far less surprising than viewing it as nothing more than next-token prediction in a different form. Directed by training and prompting techniques, the model is able to generate human-like reasoning, many times even better than what a randomly picked human would give.
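To make “prompting technique” a bit more concrete, here is a tiny sketch of zero-shot chain-of-thought prompting, one common way to steer a model toward step-by-step reasoning. The template wording is my own assumption; the article does not prescribe one.

```python
# A minimal sketch of zero-shot chain-of-thought prompting. Appending a
# "think step by step" cue often elicits more structured reasoning than
# asking for the answer directly. The exact wording is an assumption.
question = "A train leaves at 3 pm and travels for 2 hours. When does it arrive?"

bare_prompt = question
cot_prompt = f"{question}\nLet's think step by step, then state the final answer."

# Both strings would go to the same model; only the framing differs.
print(bare_prompt)
print("---")
print(cot_prompt)
```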

The author also provides a few examples of LLM outputs along with good explanations. One of them is “What weighs more — 1 kg of feathers, or 0.5 kg of steel. Please explain your reasoning and how this relates to the nature of the two materials.”, and guess what, the response is more than impressive. Another is an LLM that helped save someone’s dog by identifying possible issues; how does predicting the next token become so powerful? Another one is simple math, 7 x 6 = 42, suggested by the author and prompted by me. The answer is also correct, but keep in mind that the LLM again does not apply any mathematical principles; rather, it statistically retrieves or recalls what it has already seen, once if not many times, in its training material. When that recall fails, the LLM is likely to respond with a wrong answer, hence the phenomenon of hallucination.
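For anyone who wants to reproduce the first example, here is roughly how I would send that prompt myself. This is my own sketch, not the author’s code; it assumes the OpenAI Python SDK with an API key in the environment, and the model name is a placeholder for whatever you have access to.

```python
# A minimal sketch, assuming the OpenAI Python SDK (pip install openai)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute your own
    messages=[
        {
            "role": "user",
            "content": (
                "What weighs more - 1 kg of feathers, or 0.5 kg of steel? "
                "Please explain your reasoning and how this relates to the "
                "nature of the two materials."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```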

While some of the readings supplied in this article seem to support the existence of emergence, others not mentioned would support the opposite. With a few simple contextual questions, they were able to show how unreliable an LLM can be. Anyone with access to an LLM can try it themselves. However, this does not mean that LLMs are not helpful at all. They are.

So, are there any other ways to reduce hallucination besides prompting? Fine-tuning trains the LLM on additional domain-specific, high-quality data to steer its behavior. RAG supplies the LLM with external knowledge and keeps it up to date, as sketched below. GraphRAG, as a backbone, helps the LLM grasp facts and the relationships between data points. RLHF incorporates human feedback into the fine-tuning process. NLEP complements the model’s nondeterministic nature, improving interpretability and problem-solving ability by enabling the model to generate and execute code. To name a few. Since each of these has a different set of pros and cons, a combination of these techniques may lead to a better performance gain. A massive amount of high-quality corpus, compute power, model size, and a proper architecture are crucial to a foundation LLM. However, building one can be very challenging.
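Of these, RAG is the easiest to picture in code. Here is a toy sketch of the retrieval step only, with bag-of-words vectors standing in for real learned embeddings; the documents, names, and scoring are mine, purely for illustration.

```python
import math
from collections import Counter

# Toy retrieval step of RAG: rank documents by cosine similarity to the
# query. Real systems use learned embeddings and a vector database; crude
# bag-of-words vectors keep this sketch self-contained.
documents = [
    "Sodium reacts explosively with water.",
    "Chlorine is a poisonous yellow-green gas.",
    "Table salt is an edible crystalline solid of sodium and chlorine.",
]

def vectorize(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

# The retrieved passage is prepended to the prompt ("grounding"), so the
# model can answer from supplied facts instead of recall alone.
question = "Why is salt safe to eat?"
context = retrieve(question)[0]
print(f"Context: {context}\nQuestion: {question}")
```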

The best is yet to come. Developing an alternative system that is more capable of reasoning is also an option. The field of AI is huge and, together with advances in computing, continually evolving; there is much more to discover.
