The Statistics and The Word

For someone who has always had trouble with words, knowing that my computer has had the same trouble is a great relief. English is a confusing language; the rules bend and break depending on the context. It’s hard enough for people to learn the context behind the language they are using. A computer, which arguably doesn’t understand the semantic meaning of a word, has to suggest and predict based on the rules we give it and what has happened before.

How does this work? Why is Google Docs becoming so good at guessing the next word in your sentence? The answer is statistics.

As we write, information flows out of our fingertips toward what we expect to be the eventual end of a sentence. Each sentence has some meaning, which usually arrives in a predictable order. (Subject – Predicate is how I learned it.) That means that as I finish off a sentence, each step I take should make the rest of it easier to anticipate. We do this all the time when we guess how a talk will end, or what someone is likely to say next. We can do it because we have built a repertoire of already constructed sentences that gives us a good idea of what is likely to come next, and computers do much the same. The computer builds models from millions of lines of text, evaluating the ways words have been put together, taking into account what has already been written, and then generating the suggestion with the highest likelihood of being correct.
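To make the idea concrete, here is a toy sketch of the simplest version of this kind of statistics: counting which word most often follows another. It is only an illustration, assuming a three-sentence made-up corpus. The function names `build_bigram_counts` and `suggest_next` are hypothetical, and real products such as Google Docs use far larger and more sophisticated models than this.

```python
from collections import Counter, defaultdict


def build_bigram_counts(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest_next(counts, word):
    """Return (most likely next word, estimated probability).

    Returns None if the word was never seen with a follower.
    """
    followers = counts.get(word.lower())
    if not followers:
        return None
    best, times_seen = followers.most_common(1)[0]
    return best, times_seen / sum(followers.values())


# A tiny made-up corpus (an assumption for illustration only).
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

counts = build_bigram_counts(corpus)
print(suggest_next(counts, "the"))
print(suggest_next(counts, "sat"))
```

In this corpus, "the" is followed by "cat" in two of its six appearances, so "cat" is the suggestion, with an estimated probability of one in three. "Sat" is always followed by "on", so that suggestion comes with a probability of 1. With millions of lines of text instead of three, and more than one word of context, the same counting idea starts to produce useful guesses.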

We can see this in action in Google Docs: as we write, it makes in-the-moment suggestions as to what will come next. This does two things. First, it trains the model in real time on the language we expect it to use. If I choose to write what the AI suggests, then it knows it was correct in how it structured the sentence, and that it chose the right model for the future. Second, it shoehorns the user into using more predictable language, which the AI can then better predict. The more language the AI knows, and the more it knows how you write, the better it is at predicting how you will compose a document.

Writing is an arduous task; to do it well takes a tremendous amount of effort and time. I imagine that as these AI systems advance, writing will become easier, to the point where all we will need to do is give the AI the subject of our writing and the context of why it’s being written, and the AI will be able to give us a decent first draft.

Statistical probabilities are an interesting thing: as models get better, we (people) become more predictable. This lends an eerie feeling that someone knows what we will do. Though this is a topic for another time, the predictability of how you write and speak is critical to how these systems work. AI can write only because what we write, and perhaps how we think, comes in a predictable way. Think about that the next time you decide to compose a document: does your writing foreshadow what is to come next? For now, we can start to rely on machines for that next step.