ecologylab.generic
Class NewPorterStemmer

java.lang.Object
  extended by ecologylab.generic.Debug
      extended by ecologylab.generic.NewPorterStemmer

public class NewPorterStemmer
extends Debug

Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form. The input word can be provided a character at a time (by calling add()), or all at once by calling stem(String). This is a new version of PorterStemmer, implemented with StringBuffer.
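Most rules of the Porter algorithm only fire when the remaining stem has a large enough *measure* m, where every word is viewed as [C](VC)^m[V] (C a run of consonants, V a run of vowels). The sketch below implements Porter's definitions of "consonant" and "measure" as a standalone helper. The class PorterMeasure and its methods are hypothetical and are not part of ecologylab.generic; the input is assumed to be lower case.

```java
// Hypothetical helper illustrating two definitions from Porter's paper.
// It is not taken from NewPorterStemmer's source. Expects lower-case input.
final class PorterMeasure {

    private PorterMeasure() {}

    // A consonant is a letter other than a, e, i, o, u, and other than
    // a 'y' that is preceded by a consonant.
    static boolean isConsonant(CharSequence w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // A leading 'y' is a consonant; otherwise 'y' is a vowel
                // exactly when the previous letter is a consonant.
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Measure m of [C](VC)^m[V]: the number of places where a vowel is
    // immediately followed by a consonant.
    static int measure(CharSequence w) {
        int m = 0;
        for (int i = 1; i < w.length(); i++) {
            if (!isConsonant(w, i - 1) && isConsonant(w, i)) {
                m++;
            }
        }
        return m;
    }
}
```

For example, measure("tree") is 0, measure("trouble") is 1 and measure("oaten") is 2, matching the examples Porter gives for each value of m.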


Constructor Summary
NewPorterStemmer()
           
 
Method Summary
 void add(char ch)
          Add a character to the word being stemmed.
 void add(char[] w, int wLen)
          Adds wLen characters, taken from the char[] array w, to the word being stemmed.
 void add(java.lang.String s)
           
 java.lang.StringBuffer getResultBuffer()
          Returns a reference to a character buffer containing the results of the stemming process.
 int getResultLength()
          Returns the length of the word resulting from the stemming process.
static void main(java.lang.String[] args)
          Test program for demonstrating the PorterStemmer.
 void reset()
          reset() resets the stemmer so it can stem another word.
 void stem()
          Stem the word placed into the Stemmer buffer through calls to add().
 java.lang.String stem(java.lang.String s)
           
 java.lang.String toString()
          After a word has been stemmed, it can be retrieved by toString(); alternatively, a reference to the internal buffer can be retrieved with getResultBuffer() and getResultLength(), which is generally more efficient.
 
Methods inherited from class ecologylab.generic.Debug
classSimpleName, closeLoggingFile, debug, debug, debug, debug, debugA, debugA, debugA, debugI, debugI, debugI, error, error, getClassName, getClassName, getInteractive, getPackageName, getPackageName, getPackageName, initialize, level, level, level, logToFile, print, print, println, println, println, println, println, println, printlnA, printlnA, printlnA, printlnI, printlnI, printlnI, printlnI, setLoggingFile, show, show, superString, toggleInteractive, toString, warning, warning, weird, weird
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NewPorterStemmer

public NewPorterStemmer()
Method Detail

reset

public void reset()
reset() resets the stemmer so it can stem another word. If you invoke the stemmer by calling add(char) and then stem(), you must call reset() before starting another word.
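If the stemmer accumulates characters in a StringBuffer, as the class description suggests, the characters of a second word are simply appended to the first unless the buffer is cleared in between. The hypothetical WordBuffer class below illustrates that add()/reset() protocol in isolation; it does no stemming, and the assumption that reset() clears the buffer with something like setLength(0) is about NewPorterStemmer's internals, not something this page states.

```java
// Hypothetical sketch of the add()/reset() protocol on a StringBuffer-backed
// word buffer. It does not stem anything; it only shows why reset() matters.
final class WordBuffer {
    private final StringBuffer word = new StringBuffer();

    // Append one character to the word being built.
    void add(char ch) {
        word.append(ch);
    }

    // Convenience: feed every character of s through add(char).
    void addAll(String s) {
        for (int i = 0; i < s.length(); i++) {
            add(s.charAt(i));
        }
    }

    // Clear the buffer so the next word starts empty.
    void reset() {
        word.setLength(0);
    }

    @Override
    public String toString() {
        return word.toString();
    }
}
```

Adding "cats" and then "dogs" without a reset() in between leaves "catsdogs" in the buffer; calling reset() first leaves only "dogs".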


add

public void add(char ch)
Add a character to the word being stemmed. When you are finished adding characters, you can call stem() to stem the word.


add

public void add(char[] w,
                int wLen)
Adds wLen characters, taken from the char[] array w, to the word being stemmed. This is like repeated calls of add(char ch), but faster.
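One plausible reason the array form is faster, assuming the word is kept in a StringBuffer and that add(char[], int) reads w[0] through w[wLen - 1]: a single bulk append replaces wLen separate calls, and because StringBuffer is synchronized it also acquires its lock once instead of wLen times. The hypothetical BulkAdd class below contrasts the two approaches; it is not NewPorterStemmer's code.

```java
// Hypothetical sketch: two ways of copying the first wLen characters of w
// into a StringBuffer. Assumes add(char[], int) reads w[0..wLen-1].
final class BulkAdd {
    private BulkAdd() {}

    // One append per character, like repeated calls to add(char).
    static void addOneByOne(StringBuffer b, char[] w, int wLen) {
        for (int i = 0; i < wLen; i++) {
            b.append(w[i]);
        }
    }

    // A single bulk append of the same range. StringBuffer methods are
    // synchronized, so this takes the buffer's lock once, not wLen times.
    static void addBulk(StringBuffer b, char[] w, int wLen) {
        b.append(w, 0, wLen);
    }
}
```

Both methods leave the same characters in the buffer; only the number of calls differs.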


add

public void add(java.lang.String s)

toString

public java.lang.String toString()
After a word has been stemmed, it can be retrieved by toString(); alternatively, a reference to the internal buffer can be retrieved with getResultBuffer() and getResultLength(), which is generally more efficient.

Overrides:
toString in class Debug

stem

public java.lang.String stem(java.lang.String s)

getResultLength

public int getResultLength()
Returns the length of the word resulting from the stemming process.


getResultBuffer

public java.lang.StringBuffer getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
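Reading the result through getResultBuffer() and getResultLength() lets a caller inspect the stem without building a new String, which toString() presumably has to do. The hypothetical helper below compares the result region of a buffer against an expected word character by character; it assumes that only the first getResultLength() characters of the buffer belong to the result.

```java
// Hypothetical helper: compares the stemming result held in the first len
// characters of buf with a String, without allocating a new String.
final class ResultCompare {
    private ResultCompare() {}

    static boolean resultEquals(StringBuffer buf, int len, String expected) {
        // Lengths must agree, and len must not run past the buffer.
        if (len != expected.length() || len > buf.length()) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf.charAt(i) != expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

A call would look like resultEquals(stemmer.getResultBuffer(), stemmer.getResultLength(), "caress"), where stemmer is a NewPorterStemmer on which stem() has already been called.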


stem

public void stem()
Stem the word placed into the Stemmer buffer through calls to add(). You can retrieve the result with getResultLength()/getResultBuffer() or toString().
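stem() runs the steps of Porter's algorithm over the buffered word in sequence. As an illustration of what one such step does to a StringBuffer, the sketch below implements Porter's Step 1a (sses → ss, ies → i, ss → ss, s → removed). Because stem() is declared void, the sketch also shows one way a caller can tell whether the word changed: compare the buffer before and after. The class Step1a is hypothetical and covers only this single step, not NewPorterStemmer.stem().

```java
// Hypothetical sketch of Step 1a of Porter's algorithm, applied in place to
// a StringBuffer. It is only one step of the algorithm, not the full stem().
final class Step1a {
    private Step1a() {}

    // True if b ends with suffix.
    private static boolean endsWith(StringBuffer b, String suffix) {
        int start = b.length() - suffix.length();
        return start >= 0 && b.indexOf(suffix, start) == start;
    }

    // sses -> ss, ies -> i, ss -> ss, s -> (removed).
    // Step 1a only ever shortens the word, so a change in length means
    // the word was changed; the method reports that as its result.
    static boolean apply(StringBuffer b) {
        int len = b.length();
        if (endsWith(b, "sses") || endsWith(b, "ies")) {
            b.setLength(len - 2);            // caresses -> caress, ponies -> poni
        } else if (!endsWith(b, "ss") && endsWith(b, "s")) {
            b.setLength(len - 1);            // cats -> cat; caress is left alone
        }
        return b.length() != len;
    }
}
```

For example, Step1a.apply(new StringBuffer("ponies")) leaves "poni" in the buffer and returns true, while "caress" is left unchanged and the call returns false.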


main

public static void main(java.lang.String[] args)
Test program for demonstrating the PorterStemmer. It reads text from a list of files, stems each word, and writes the result to standard output. Note that the word being stemmed is expected to be in lower case: forcing lower case must be done outside the PorterStemmer class. Usage: PorterStemmer file-name file-name ...
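The loop such a test program needs can be sketched as follows: split the input into runs of letters, lower-case each run before it reaches the stemmer, and copy every other character through unchanged. This is a minimal sketch under stated assumptions, not the actual main(): the class StemDriver and its methods are hypothetical, and the stemmer is passed in as a function so the sketch stays self-contained. A real driver would open each named file, for example with a FileReader, and pass it to process().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the driver loop described for main(). The stemmer
// is supplied as a function; lower-casing happens here, outside the stemmer.
final class StemDriver {
    private StemDriver() {}

    static void process(Reader in, Writer out, UnaryOperator<String> stemmer)
            throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetter(ch)) {
                word.append(Character.toLowerCase(ch));   // force lower case
            } else {
                flush(word, out, stemmer);                // end of a word
                out.write(ch);                            // copy separators as-is
            }
        }
        flush(word, out, stemmer);   // the input may end in the middle of a word
        out.flush();
    }

    // Stem and write the pending word, if any, then clear it.
    private static void flush(StringBuilder word, Writer out,
                              UnaryOperator<String> stemmer) throws IOException {
        if (word.length() > 0) {
            out.write(stemmer.apply(word.toString()));
            word.setLength(0);
        }
    }
}
```

Passing the stemmer as a function keeps tokenization and lower-casing separate from stemming, which matches the note above that lower-casing belongs outside the stemmer class.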