awk Spell Checking

Spell checking

We create an AWK program for spell checking.

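BEGIN {
    count = 0

    # Load the dictionary, one word per line, into the dict array.
    # getline returns 1 on success, 0 at end of file and -1 on error,
    # so we test for > 0 to avoid looping forever on an unreadable file.
    i = 0
    while ((getline myword < "/usr/share/dict/words") > 0) {
        dict[i] = myword
        i++
    }
}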

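{
    # Collect every whitespace-separated field of the current line.
    for (i=1; i<=NF; i++) {

        field = $i

        # Strip one trailing punctuation character, if present.
        # AWK strings are indexed from 1.
        if (match(field, /[[:punct:]]$/)) {
            field = substr(field, 1, RSTART-1)
        }

        mywords[count] = field
        count++
    }
}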

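END {

    # Remove every word that matches a dictionary entry, either as
    # written or in lowercase.
    for (w_i in mywords) {
        for (w_j in dict) {
            if (mywords[w_i] == dict[w_j] ||
                        tolower(mywords[w_i]) == dict[w_j]) {
                delete mywords[w_i]
                break    # the word was found; stop scanning the dictionary
            }
        }
    }

    # Whatever is left was not found in the dictionary.
    for (w_i in mywords) {
        if (mywords[w_i] != "") {
            print mywords[w_i]
        }
    }
}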

The script compares the words of the provided text file against a dictionary. On many Unix-like systems, an English word list is available at /usr/share/dict/words, with one word per line.

BEGIN {
    count = 0
    
    i = 0
    while ((getline myword < "/usr/share/dict/words") > 0) {
        dict[i] = myword
        i++
    }
}

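Inside the BEGIN block, we read the words from the dictionary into the dict array. The getline myword < file form reads one record from the named file and stores it in the myword variable; $0 and NF are left unchanged. getline returns 1 when a record was read, 0 at the end of the file, and -1 on error.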
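
As a small illustration of this form of getline, the following sketch reads a tiny sample word list and counts the lines it has read. The temporary file and the names dictfile, line, words and read_count are assumptions made for the demonstration; they are not part of the spell checker itself.

```shell
# Create a tiny sample word list (a stand-in for /usr/share/dict/words).
dictfile=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$dictfile"

read_count=$(awk -v dictfile="$dictfile" '
BEGIN {
    n = 0
    # getline line < file returns 1 for each line read, 0 at end of file
    # and -1 on error; only the variable "line" is set.
    while ((getline line < dictfile) > 0) {
        words[n++] = line
    }
    close(dictfile)
    print n
}')

echo "$read_count"
rm -f "$dictfile"
```

The parentheses around the getline expression keep the redirection and the comparison apart, and the explicit comparison with 0 ends the loop if the file cannot be opened.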

{
    for (i=1; i<=NF; i++) {
    
        field = $i
    
        if (match(field, /[[:punct:]]$/)) {
            field = substr(field, 1, RSTART-1)
        }
    
        mywords[count] = field
        count++
    }
}

In the main part of the program, we place the words of the file that we are spell checking into the mywords array. We remove a trailing punctuation mark (such as a comma or a period) from the end of each word.
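
The script removes a single trailing character. As a variation (an assumption about what one might want, not what the script above does), the sketch below uses sub() to strip whole runs of punctuation from both ends of a word, which also handles quoted words. The sample sentence and the names word and stripped are made up for the illustration.

```shell
stripped=$(printf 'Hello, world. "quoted" end!\n' | awk '
{
    for (i = 1; i <= NF; i++) {
        word = $i
        # Remove a run of punctuation at the start of the word ...
        sub(/^[[:punct:]]+/, "", word)
        # ... and a run of punctuation at the end of the word.
        sub(/[[:punct:]]+$/, "", word)
        if (word != "") {
            print word
        }
    }
}')

echo "$stripped"
```

Punctuation inside a word, such as the apostrophe in "don't", is left alone because both patterns are anchored to an end of the string.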

END {

    for (w_i in mywords) { 
        for (w_j in dict) { 
            if (mywords[w_i] == dict[w_j] || 
                        tolower(mywords[w_i]) == dict[w_j]) {
                delete mywords[w_i]
                break
            }
        }
    }
...
}

We compare the words from the mywords array against the dict array. If a word is in the dictionary, it is removed with the delete statement. Words that begin a sentence start with an uppercase letter; therefore, we also check the lowercase form of each word, using the tolower() function.

for (w_i in mywords) { 
    if (mywords[w_i] != "") {
        print mywords[w_i]        
    }
}

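The remaining words were not found in the dictionary; they are printed to the console. The comparison with the empty string skips empty entries, such as fields that consisted only of a punctuation mark.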

$ awk -f spellcheck.awk text
consciosness
finaly

We ran the program on a text file, and it reported two misspelled words. Note that the program takes some time to finish, because the words of the text are compared against the dictionary entries one by one.
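
One possible way to shorten the running time, sketched here as an alternative rather than as the method of the script above, is to store each dictionary word as an array key and test membership with the in operator. A lookup then no longer scans the whole dictionary. The two temporary files and their contents are assumptions made so that the example runs on its own; a real run would read /usr/share/dict/words and the text to be checked.

```shell
# Tiny sample dictionary and text (stand-ins for the real files).
dictfile=$(mktemp)
textfile=$(mktemp)
printf 'the\nmind\nfinally\nawakes\n' > "$dictfile"
printf 'The mind finaly awakes.\n' > "$textfile"

misspelled=$(awk -v dictfile="$dictfile" '
BEGIN {
    # Use each dictionary word as an array key.
    while ((getline line < dictfile) > 0) {
        dict[line] = 1
    }
    close(dictfile)
}
{
    for (i = 1; i <= NF; i++) {
        word = $i
        sub(/[[:punct:]]+$/, "", word)   # drop trailing punctuation
        if (word == "") {
            continue
        }
        # Report the word if neither it nor its lowercase form is a key.
        if (!(word in dict) && !(tolower(word) in dict)) {
            print word
        }
    }
}' "$textfile")

echo "$misspelled"
rm -f "$dictfile" "$textfile"
```

Unlike the original script, this version prints each unknown word as soon as it is seen, so the output follows the order of the text.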


Nishant Munjal
