Why Spellcheck Is So Good and Grammar Check Is So Bad

It's much easier to program software to check spelling than it is to check grammar. (Image: bubaone/Getty Images)

There's an old saying in robotics: Anything a human being learns to do after age 5 is easy to teach a machine. Everything we learn before 5, not so easy. That unwritten law of machine learning might explain why there are computers that can beat the world's best chess and Go masters, but we've yet to build a robot that can walk like a human. (Don't try to tell me that ASIMO walks like a human.)

This might also explain why the spellchecker on your computer works so brilliantly, but the grammar checker doesn't. We learn how to spell only when we're old enough to go to school, but the basics of language development can start as early as in the womb.

Inference and Context

Spelling is a finite task with discrete right or wrong answers. English grammar, on the other hand, contains a near-infinite number of possibilities, and whether something is grammatically correct can depend largely on subtle clues like context and inference.
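To make the "finite task" point concrete, here is a minimal Python sketch of a spellchecker. It assumes a tiny made-up word list (`WORDS`) standing in for a real dictionary, and the helper name `check_spelling` is hypothetical. Real spellcheckers are more elaborate, but the core test is still a yes-or-no lookup.

```python
import difflib

# Hypothetical miniature dictionary; a real spellchecker would load many thousands of words.
WORDS = {"the", "car", "was", "parked", "by", "john", "curb"}

def check_spelling(text):
    """Return a dict mapping each unknown word to a list of close dictionary matches."""
    report = {}
    for raw in text.lower().split():
        word = raw.strip('.,;:!?"')  # drop surrounding punctuation
        if word and word not in WORDS:
            # Suggest dictionary words that look similar to the unknown word.
            report[word] = difflib.get_close_matches(word, sorted(WORDS), n=3)
    return report

print(check_spelling("The car was parkd by the curb."))
```

Every word is either in the list or it isn't, which is exactly the kind of question a computer answers well.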

That's why certain English sentences are such a pain in the neck for automated grammar checkers. Les Perelman, a retired MIT professor and former associate dean of undergraduate education who ran the university's writing program, gave me this one: "The car was parked by John."

My admittedly dated version of Microsoft Word (Word for Mac 2011) is programmed to recognize and correct passive voice, a no-no in most grammar circles. When I type this sentence into Word, the program dutifully underlines it in green and suggests: "John parked the car." That would be fine if John had parked the car, but what if I meant that the car was physically parked near John?

Simple mistake, you might say, but look what happens when I change the sentence to "The car was parked by the curb." Word underlines it and suggests: "The curb parked the car." That's downright goofy, even for a computer.
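I don't know how Word's passive-voice rule is actually written, but a naive pattern-matching rule is enough to reproduce the same blunder. The Python sketch below is an illustration under that assumption; the pattern `PASSIVE` and the function `to_active` are hypothetical names, not anything from Word.

```python
import re

# Naive pattern: "<subject> was <verb ending in -ed> by <agent>."
# It looks only at word order and has no idea what "by" means in context.
PASSIVE = re.compile(r"^(?P<subject>.+?) was (?P<verb>\w+ed) by (?P<agent>.+?)\.?$")

def to_active(sentence):
    """Rewrite a 'was ...ed by' sentence in active voice, or return None if no match."""
    match = PASSIVE.match(sentence.strip())
    if match is None:
        return None
    agent = match.group("agent")
    subject = match.group("subject")
    # Swap the two noun phrases and fix capitalization at the sentence start.
    agent = agent[0].upper() + agent[1:]
    subject = subject[0].lower() + subject[1:]
    return f"{agent} {match.group('verb')} {subject}."

print(to_active("The car was parked by John."))      # John parked the car.
print(to_active("The car was parked by the curb."))  # The curb parked the car.
```

The rule can't tell an agent ("by John") from a location ("by the curb"), because that distinction lives in the reader's knowledge of the world, not in the word order.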

"So much of English grammar involves inference and something called mutual contextual beliefs," says Perelman. "When I make a statement, I believe that you know what I know about this. Machines aren't that smart. You can train the machine for a specific situation, but when you talk about transactions in human language, there's actually a huge number of inferences like that going on all the time."

Perelman has a beef with grammar checkers, which he claims simply do not work. He cites previous research showing that grammar checkers correctly identified errors in student papers only 50 percent of the time. Even worse, they often flagged perfectly good prose as a mistake, an error known as a false positive.

In one exercise, Perelman plugged 5,000 words of a famous Noam Chomsky essay into the e-rater scoring engine by ETS, the company that produces (and grades) the GRE and TOEFL exams. The grammar checker found 62 errors — including 14 instances of a sentence starting with a coordinating conjunction ("and," "but," "or") and nine missing commas — all but one of which Perelman classified as "perfectly grammatical prose."

A Little History

The first automated spell checker shipped with an early version of WordPerfect in 1983, and the first computerized grammar checkers soon followed in both WordPerfect and Microsoft Word.

Mar Ginés Marín is a principal program manager at Microsoft who's been tinkering with the Office grammar editor for the past 17 years. She says that in the early days, the best Word could do was parse a sentence into its component parts of speech and identify simple grammar errors like subject-verb agreement. Then engineers figured out how to parse a sentence into smaller "chunks" of two or three words to target things like "a/an" agreement. Techniques like these belong to the field known as natural language processing, or NLP.
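A two-word "chunk" check of the kind described above can be sketched in a few lines of Python. This is an illustration only, not Microsoft's code: the function name `check_articles` is hypothetical, and the rule looks at spelling rather than pronunciation, so it would wrongly flag phrases like "an hour" or "a university."

```python
VOWELS = "aeiou"

def check_articles(sentence):
    """Return (found, suggested) pairs for a/an mismatches, judged by the next word's first letter."""
    words = sentence.split()
    problems = []
    # Slide over the sentence two words at a time.
    for article, following in zip(words, words[1:]):
        lower = article.lower()
        if lower not in ("a", "an"):
            continue
        expected = "an" if following[0].lower() in VOWELS else "a"
        if lower != expected:
            problems.append((f"{article} {following}", f"{expected} {following}"))
    return problems

print(check_articles("She ate a apple and an banana"))
```

Checks like this work because the answer depends only on the neighboring word, not on what the sentence means.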

The next step was to introduce machine learning. Susan Hendrich is a group program manager at Microsoft in charge of the natural language processing teams working on Office. With machine learning, Microsoft engineers could go beyond programming each and every grammar rule into the software. Instead, they could train the machine on a huge dataset of correct English usage and let it learn from the patterns it discovers.
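As a toy illustration of learning from examples rather than hand-written rules, the Python sketch below counts which word pairs appear in a small body of correct sentences and then flags pairs it has never seen. Counting word pairs is my own stand-in technique, far simpler than whatever Microsoft actually uses, and the three-sentence `CORPUS` and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical training data; a real system would learn from a huge dataset.
CORPUS = [
    "the car was parked by the curb",
    "john parked the car",
    "the dog sat by the car",
]

def train(sentences):
    """Count every adjacent word pair that appears in the training sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts

def unseen_pairs(model, sentence):
    """Return the adjacent word pairs in a sentence that never occurred in training."""
    words = sentence.lower().split()
    return [pair for pair in zip(words, words[1:]) if model[pair] == 0]

model = train(CORPUS)
print(unseen_pairs(model, "the curb parked the car"))  # [('curb', 'parked')]
```

Nobody told this model that curbs can't park cars. It only noticed that, in the text it has seen, "curb" is never followed by "parked."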

Hendrich says that algorithms developed by Microsoft through machine learning are what drive Word's decisions about whether or not a sentence needs a question mark, or what types of clauses require a comma (pretty tricky stuff, even for us human writers).

But did it work? Daniel Kies, an English professor at the College of DuPage in Glen Ellyn, Illinois, once conducted a head-to-head test of various grammar checkers ranging from WordPerfect 8, released in the late 1990s, up to Word 2007. When checked against 20 sentences containing the most common writing errors, all the grammar checkers performed fairly miserably. No version of Word after 2000 caught any of the mistakes (oddly, Word 97 scored better), and WordPerfect identified only 40 percent of the errors.

While those numbers don't represent the latest versions of grammar checkers, they do point to one of the biggest challenges in creating a powerful and precise grammar engine that's built into a piece of software: space.

"We can make these big beautiful models that have a high precision accuracy, but they're too big to ship in the box with the product," says Hendrich at Microsoft. "So we have to slim our model down, and as we slim our model down we lose precision accuracy. So we have this balance point that we're willing to ship with."

Ginés Marín defends Word's precision but admits that space constraints affected the level of "coverage" that Microsoft's grammar checker provided. When the model was slimmed down to fit into the software, it also needed to be dialed back in breadth so that it didn't flag lots of good text as mistakes.
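The trade-off between precision and coverage can be shown with a little arithmetic. The Python sketch below assumes, purely for illustration, that a checker attaches a confidence score to each flag and only shows flags above a cutoff; the scores in `FLAGS` are invented, and nothing here describes how Word actually decides what to show.

```python
# Hypothetical flags: (checker's confidence, whether the flag is a real error).
FLAGS = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False),
]

def precision_and_coverage(flags, threshold):
    """Return (precision, coverage) when only flags at or above the threshold are shown."""
    shown = [is_error for confidence, is_error in flags if confidence >= threshold]
    total_errors = sum(is_error for _, is_error in flags)
    # Precision: share of shown flags that are real errors.
    precision = sum(shown) / len(shown) if shown else 1.0
    # Coverage: share of all real errors that get shown.
    coverage = sum(shown) / total_errors if total_errors else 0.0
    return precision, coverage

print(precision_and_coverage(FLAGS, 0.50))  # (0.6, 0.75)
print(precision_and_coverage(FLAGS, 0.85))  # (1.0, 0.5)
```

Raising the cutoff gets rid of the false positives, but the checker then catches only half of the real errors. That is the balance point Hendrich describes.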

The Golden Squiggle

What's changed since the days of Word 2007 is the rise of web-based software applications. Now engineers don't have to cram a large grammar engine into a package small enough to live on the user's hard drive. The grammar algorithms can live in the cloud and be accessed over the internet in real time.

Hendrich says that the web-based versions of Office already rely on robust grammar engines that are hosted in the cloud, and her team is in the process of moving all the old built-in critiques and grammar models to the cloud, too. The challenge going forward, she says, is to decide how much functionality to keep "in the box" and how much to deliver "through the service," as she calls Microsoft's cloud-based, software-as-a-service model.

The issue is cost. Every time Word calls up to the cloud for grammar advice, it costs a fraction of a penny.

"If you're writing a 10-page document, do you call up to the service on every keystroke?" Hendrich asks. "When you start looking at the cost models, it can be quite large."

The latest version of Microsoft's grammar editor is far more robust than its predecessors. Errors come with multiple correction suggestions plus explanations for the grammar rules behind them. There's a built-in read-aloud function that's particularly helpful for people with dyslexia and for non-native speakers. And there's a new type of suggestion that Hendrich calls the "golden squiggle" that addresses writing style more than basic grammar.

If you write that the committee is looking for a new "chairman," for example, the golden squiggle will suggest that you use a gender-neutral term like "chairperson." If you're writing a memo to your boss that requires a certain degree of formality, the golden squiggle will flag words that seem too casual, like "comfy."

One question that's important to ask is whether grammar checkers really need to be perfect. If Word suggests that the sentence should read "The curb parked the car," you can just ignore it. No big deal, right?

For native English speakers, a not-so-perfect grammar checker is a mild irritation. Even if you're not a grammar whiz, you can hear it when something sounds wrong. The real problem, says former MIT writing professor Perelman, occurs when English language learners rely on these tools to correct their writing.

"It really depends who the user is," says Perelman. "If the user is a native speaker, false positives aren't as dangerous as they are to a non-native speaker."

If Word tells an English language learner that "the curb parked the car," not only will their writing not make any sense, but they'll be learning bad grammar. Now that English has become the lingua franca of science and technology, Perelman says, businesses around the world are desperate for a truly reliable and accurate English grammar checker. That's why you see the rise of third-party, web-based grammar tools like Grammarly and Ginger, all trying to meet this international demand.

The good news is that the latest version of Word (2016) passes the "curb" test. Grammarly, however, flagged it as passive voice.