I’ve recently been working with Java Lucene and its Analyzers, and on one project the client needed the Porter stemming algorithm. I used the SnowballAnalyzer, but unfortunately I found out that, as someone before me said, the Porter stemmer works right in 90% of cases, but when it fails, it fails hard! Here is an example: consider the words “organic”, “organ” and “organization”. The three words have little in common apart from their prefix, and they do not mean the same thing, but Porter (and therefore the SnowballAnalyzer) stems all of them to “organ”. The Lucene 3.1.x release will bring plenty of new features that let programmers control and fine-tune each stemming algorithm.
So, what can I do, since I must use the 3.0.3 release? Well, I created a new PlingStemmerFilter using the YAGO Java Pling stemmer implementation, following the instructions found here.
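To give an idea of why a plural-only stemmer avoids the “organ” problem, here is a minimal, self-contained sketch. It is not the YAGO Pling stemmer and not my actual filter: the class name `PluralStemSketch` and its handful of suffix rules are made up for illustration and are far cruder than the real thing. The point is only that a stemmer which reduces plurals to singulars and touches nothing else keeps “organic”, “organ” and “organization” apart, while still matching “organs” with “organ”. In Lucene, this per-token logic would live inside a TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a plural-to-singular stemmer (NOT the YAGO implementation).
public class PluralStemSketch {

    // Reduce a plural noun to its singular form using a few crude suffix rules.
    // Anything that does not look like a plural is returned unchanged (lowercased).
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ies")) {
            // "studies" -> "study"
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.length() > 4 && (w.endsWith("sses") || w.endsWith("xes"))) {
            // "classes" -> "class", "boxes" -> "box"
            return w.substring(0, w.length() - 2);
        }
        if (w.length() > 3 && w.endsWith("s")
                && !w.endsWith("ss") && !w.endsWith("us") && !w.endsWith("is")) {
            // "organs" -> "organ"; words like "class", "status", "analysis" are left alone
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Apply the stemmer to every token, the way a filter would do token by token.
    static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            out.add(stem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"organic", "organ", "organs", "organization", "organizations"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

With these rules only the plural forms change, so a search for “organ” no longer drags in documents about “organic” food or an “organization”.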