Natural Stemming Derived from Porter's
Computational Linguistics Department
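The traditional Porter stemming algorithm is very effective, but it also has two problems: (1) the stems it produces are often not natural word forms (for example, a final -y becomes -i), and (2) it does not handle irregular words.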
To solve the second problem, we designed sets of word pairs to take care of irregular words (mainly irregular verbs). Because English has relatively few irregular words, the sets are small. We also changed the basic rule form of Porter's algorithm from:
(If a word ends with -A, then delete -A or change -A to -B.)
to:
(If a word ends with -A, then delete -A or change -A to -B, unless the word is in an exception set S.)
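The modified rule form can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `Rule` type, the word pairs in `IRREGULAR`, and the exception sets are illustrative stand-ins, not the actual sets used in the derived algorithm. The sketch also omits the measure conditions that real Porter rules carry.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    suffix: str                  # the "-A" ending to match
    replacement: str             # "" deletes -A, otherwise the "-B" ending
    exceptions: FrozenSet[str]   # the set S of words the rule must skip


# Illustrative irregular word pairs (assumed, not taken from the source).
IRREGULAR = {"went": "go", "ran": "run", "children": "child"}

# Illustrative rules with small exception sets (assumed).
RULES = [
    Rule("ing", "", frozenset({"sing", "ring", "king", "thing"})),
    Rule("ed", "", frozenset({"red", "bed", "need"})),
]


def stem(word: str) -> str:
    """Apply the irregular-pair lookup, then the first matching suffix rule."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for rule in RULES:
        if word.endswith(rule.suffix):
            if word in rule.exceptions:
                return word  # word is in S: leave it untouched
            return word[: len(word) - len(rule.suffix)] + rule.replacement
    return word


print(stem("walking"), stem("sing"), stem("went"))
```

Keeping the exception set on each rule, rather than in one global list, means a word is only protected from the specific suffix that would damage it.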
Some more rules for adjectives were added to the algorithm.
To solve the first problem, we added a post-processing step to the stemmer. One example is changing -i back to -y in some cases. The post-processing brings the stemmed forms of words closer to their natural forms. The derived algorithm gives better stemming results and is still very effective.
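As a rough sketch of this kind of post-processing, the Python function below restores a final -y. The source does not say in which cases the change applies, so the condition used here is an assumption: the stem is longer than two letters and the final -i follows a consonant. The `KEEP_I` list of words that legitimately end in -i is also hypothetical.

```python
VOWELS = set("aeiou")

# Hypothetical list of words whose final -i is genuine and must be kept.
KEEP_I = frozenset({"taxi", "ski", "alibi"})


def restore_y(stem: str) -> str:
    """Change a trailing consonant + 'i' back to 'y' (assumed condition)."""
    if stem in KEEP_I:
        return stem
    if len(stem) > 2 and stem.endswith("i") and stem[-2] not in VOWELS:
        return stem[:-1] + "y"
    return stem


print(restore_y("happi"), restore_y("carri"), restore_y("taxi"))
```

The step would run after all suffix rules have been applied, so it only affects the final surface form of the stem.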