
Detecting AI

The Turing Test

Let’s discuss what is arguably the foundational test for detecting AI: the Turing Test. Alan Turing proposed a game in which an interrogator asks questions of two entities, one a human and the other a machine. If the interrogator cannot determine from their responses which entity is the machine, the machine has passed the test.

The communication was originally suggested to take place over text channels only, so that the machine is judged on its responses rather than on its ability to produce convincing speech. We will tackle the subject, at first, with this condition applied.

You’re on YouTube. It may be a bit cynical to point out, but most comments, especially on high-traffic videos, could plausibly have been generated by a language model, even an early one such as GPT-3.

This is no fault of our dear commenters, however. These comments are short and usually express basic support for the video or make a funny, first-come-first-served joke. You would be in the same boat; sometimes you don’t need to say much. This narrows the possibility space, allowing a statistical machine to imitate more accurately.

Statistical Machines

Statistical? What happened to Artificial Intelligence? The best way to think of these models, even newer, more groundbreaking models like ChatGPT, is as statistical machines. They don’t generate thoughts, statements, lists, etc. on their own. They predict what a human would say in response.

Language models are trained on innumerable human outputs: text corpora, books, blogs, social media posts, and so on. The model learns to connect these words, building up larger and larger context. When you prompt a Large Language Model (LLM), you are asking for a probable answer to your question; the model is not aiming for an accurate response, simply an appropriate one.

This is how all Machine Learning algorithms work. They are trained to statistically predict a value based on previous values.
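To make “predict a value based on previous values” concrete, here is a toy sketch in Python. The two-sentence corpus, the bigram counts, and the sampling loop are purely illustrative, not how any production LLM is implemented, but the principle is the same one described above: count what tends to follow what, then emit whatever is statistically likely to come next.

```python
import random
from collections import Counter, defaultdict

# Toy "training corpus" -- real models are trained on billions of documents.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each other word (a bigram model).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    counts = next_counts[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a "probable" continuation one word at a time, like an LLM does
# with tokens: no understanding, just statistics over what came before.
text = ["the"]
for _ in range(6):
    text.append(predict_next(text[-1]))
print(" ".join(text))  # e.g. "the cat sat on the mat ."
```

Scale this up to trillions of tokens, with a neural network in place of the count table, and you have the essence of an LLM.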

Becoming Human

We can see that the true aim of an LLM is to respond, in every case, exactly as a human would.

This does not imply that the model would provide accurate responses. There are cases where providing inaccurate responses would, indeed, give the machine away in a Turing Test. But to be a “perfect” LLM, it need only approximate the sum total of average human knowledge.

If you wanted to use LLMs as a tool for disseminating human knowledge, you would train them more heavily on scholarly and technical information. Such a model may fail the Turing Test, however… Imagine you are the interrogator: ask for in-depth descriptions of problems and theories in quantum field theory, fluid dynamics, and chaos theory. Ask for summaries of philosophy, classical and ancient literature, and economic schools of thought. Ask about building codes, tax law, and so on. Soon you will exhaust even the most impressive polymath’s knowledge and identify the machine.

With an LLM trained representatively on all the forms of knowledge an average person receives, the task will be much more difficult. A polymath may actually be mis-selected in a test against such a machine. As an LLM approaches its limit (being trained on all human output, proportionately), it will become indistinguishable from a human.

What good would such a model be to us? Better to train LLMs on different sorts of knowledge so that they can extend human capabilities instead of mimicking them. As long as LLMs are being used this way, and given long enough periods of interrogation and enough responses, AI-generated text content should remain distinguishable.

Average Case

However, you’re not always sitting in a white, sterile room interrogating two entities over a terminal to determine which is a machine. You pull up a blog for an answer to a question you had. What are the chances it is AI-generated? What are the chances it is inaccurate? How can you tell?

Since an ML model is just predicting the answer to the questions asked of it, why wouldn’t an AI-generated blog post simply contain a prediction of what its average reader would expect to read? For now we have heuristics that work even on advanced models like LLMs, but as LLMs become better there really seems to be no reliable way to tell.
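One such heuristic, used by several public detectors, is to score how “unsurprising” a passage is to a language model: AI-generated text tends to have lower perplexity than human writing. Below is a rough sketch of that idea using the Hugging Face transformers library and GPT-2. The threshold is an illustrative placeholder rather than a calibrated value, and, as the paragraph above notes, the signal gets weaker as the models improve.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """How surprised GPT-2 is by the text; lower = more 'model-like'."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return torch.exp(loss).item()

# Illustrative (uncalibrated) threshold: flag suspiciously predictable text.
THRESHOLD = 25.0
sample = "The quick brown fox jumps over the lazy dog."
ppl = perplexity(sample)
print(f"perplexity={ppl:.1f}, flagged as AI-like: {ppl < THRESHOLD}")
```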

There are obviously serious implications of indistinguishable AI-generated text content (misleading the public, misuse in academia, propaganda generation, etc.) that I am glossing over here; we are looking at the more abstract side in this article. However, these concerns are motivating researchers to develop ways of marking AI-generated text content before it even goes out the door.

Watermarking

In an interesting development, OpenAI claims to be working on a watermark for its LLM, ChatGPT. By using pseudorandom determination for certain, equally probable tokens in an output, a watermark could be cryptographically embedded in the text. With the right key, the content could then be identified as AI-generated. This seems a plausible way of identifying AI content, especially if the large corporations providing these models (the only entities that can afford development in this space) agree to adopt such a technique across the board.
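To illustrate the idea (this is a simplified sketch of the “green list” scheme from the research literature, not OpenAI’s actual, unpublished implementation; the vocabulary, hashing scheme, and detection threshold are all placeholders): a secret key plus the preceding token seeds a pseudorandom generator, which nudges the choice among otherwise equally probable next tokens. Anyone holding the key can later check whether a passage contains statistically too many of the favoured tokens.

```python
import hashlib
import random

VOCAB = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]
SECRET_KEY = "shared-secret"  # held by the model provider

def green_set(prev_token: str) -> set[str]:
    """Pseudorandomly pick half the vocabulary, seeded by the key and context."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, k=len(VOCAB) // 2))

def generate(length: int = 20) -> list[str]:
    """Among equally probable candidates, always prefer a 'green' token."""
    tokens = [random.choice(VOCAB)]
    for _ in range(length - 1):
        greens = green_set(tokens[-1])
        tokens.append(random.choice(sorted(greens)))
    return tokens

def looks_watermarked(tokens: list[str], threshold: float = 0.8) -> bool:
    """With the key, count how often tokens fall in their context's green set."""
    hits = sum(t in green_set(prev) for prev, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1) >= threshold

text = generate()
print(looks_watermarked(text))                          # True: watermark found
print(looks_watermarked(random.choices(VOCAB, k=20)))   # usually False
```

Unwatermarked text lands in the “green” set only about half the time by chance, so a long enough passage that keeps hitting it is very unlikely to be human-written.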

Other Forms of AI Content

I have a truly marvelous commentary on this subject which this margin is too narrow to contain! Maybe in another article. :)

Thanks for reading!


This post is licensed under CC BY 4.0 by the author.