
AI & Information Density

Introduction

Does it ever feel like an AI is speaking an alien dialect, even when all the words in the output are in the expected language? Somehow, the output feels hollow, like it's missing the depth and nuance that makes language useful. It could very well be because the AI's output is missing a crucial element of language utility: information density. Understanding information density and its counterpart, surprisal, can improve your prompts.

What is information density?

First, a loose definition of "information density": roughly how much information is encoded in a particular unit of language, often a word or sentence. For the purposes of this post, we'll interpret "information density" as how much knowledge the use of a particular word imparts to the listener (or reader). An adjacent and useful concept is "surprisal", which measures how unexpected a word is given the words that precede it: the less likely a word is to appear in its context, the higher its surprisal. Taken together, this means that a rarely-spoken sentence is likely to convey more information than an often-spoken one.
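To make the relationship concrete, here's a minimal Python sketch of surprisal as information theory defines it; the probabilities below are made up for illustration rather than measured from any model:

```python
import math

def surprisal_bits(probability: float) -> float:
    """Information carried by an outcome with the given probability, in bits: -log2(p)."""
    return -math.log2(probability)

# A highly predictable next word carries little information...
print(surprisal_bits(0.5))    # 1.0 bit
# ...while a rare, unexpected next word carries much more.
print(surprisal_bits(0.001))  # ~9.97 bits
```

The lower a word's probability in context, the more bits it delivers, which is why rare, precise phrasing tends to be more informative than boilerplate.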

It may come as no surprise that AIs are not very good at generating surprising sentences unless they're explicitly prompted to. GPT-class AIs optimize for relevance, not meaning; they're built to predict which word is most likely to show up next. By our definition, this is inherently less informative, because the information made available to the recipient is inversely related to the likelihood that the next word follows its predecessor. Left to its own devices, the AI is most likely to produce generic, predictable text, because it's optimized for predictable relevance rather than the rich density human speech aims for.

So, what does this mean for prompt writing? To me, it outlines two approaches: relevance optimization and context optimization.

Relevance optimization is the practice of using prompt vocabulary that is more likely to be correlated with the vocabulary of your desired output. Put differently, precise language - like technical jargon - is more likely to elicit the desired language in the output than generic vocabulary (i.e., buzzwords).

As an example, here is the kind of prompt comparison I run with Claude:
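What follows is a minimal sketch of that kind of comparison using the Anthropic Python SDK; the prompts, cheese theme, and model name are illustrative assumptions rather than the exact examples:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A generic, buzzword-level prompt versus a precise, jargon-rich one.
generic_prompt = "Tell me about cheese."
precise_prompt = (
    "Compare the affinage of a clothbound cheddar with that of a washed-rind "
    "Époisses: cover rind development, moisture loss, and how each process "
    "shapes the final flavor."
)

for prompt in (generic_prompt, precise_prompt):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; substitute whichever model you use
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.content[0].text)
    print("---")
```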

As you can see when you run comparisons like this, increasing the complexity and nuance of the prompt increases the complexity and depth of the response.

Context optimization is similar, but requires less careful wording: I supply the context from which I want the AI to source its answer, such as a technical manual. In the example here, that could mean including a cheese tasting menu or a similar artifact to encourage the AI to stay within the theme.
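Here's a comparable sketch of context optimization; the tasting menu is a made-up stand-in for whatever source material you want the AI to ground itself in, and the model name is again a placeholder:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical source material the model should stay grounded in.
tasting_menu = """\
Course 1: Loire Valley chèvre with buckwheat honey
Course 2: 18-month cave-aged Alpine Gruyère
Course 3: Colston Bassett Stilton with membrillo
"""

prompt = (
    "Using only the tasting menu below, suggest a wine pairing for each course "
    "and explain each choice in one sentence.\n\n" + tasting_menu
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```

Because the answer has to be drawn from the supplied menu, the output tends to stay on theme and inherit the menu's specific vocabulary.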

Either of these strategies alone is likely to help increase the utility of an AI’s output. Using both of them can enhance prompt creation even further, guiding the AI towards richer, more informative, and more useful outputs. If you'd like to learn more about writing effective prompts that reflect your unique context, request a consultation here.