Class WordNGrammer

  extended by org.egothor.core.Filter
      extended by org.egothor.parser.filter.WordNGrammer
All Implemented Interfaces:
Sequence<Token>

public class WordNGrammer
extends Filter

This class produces N-grams of words. From a sentence of W words w1, w2, w3, ..., wW it produces W - N + 1 N-grams. N is determined by the Constants.WORDNGRAMS_LENGHT constant. The filter treats all types of input tokens equally.

Output tokens:

Example (N=3):
Sentence: "the dog smelled like a skunk"
N-grams: "<NGRAM>the dog smelled", "<NGRAM>dog smelled like", "<NGRAM>smelled like a", "<NGRAM>like a skunk"
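
The windowing in the example above can be sketched in plain Java. This is only an illustration, not the Egothor implementation: the class name WordNGramSketch, the method ngrams, and the use of plain strings instead of Token objects are all assumptions, and returning an empty list for a sentence shorter than N is a guess that follows from the W - N + 1 formula rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Egothor: builds word N-grams from a list of words.
class WordNGramSketch {
    // Marker copied from the example output; assumed to be a fixed prefix.
    static final String PREFIX = "<NGRAM>";

    // Returns the W - N + 1 N-grams of a sentence, or an empty list if W < N.
    static List<String> ngrams(List<String> words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        // Slide a window of n words over the sentence, one word at a time.
        for (int i = 0; i + n <= words.size(); i++) {
            out.add(PREFIX + String.join(" ", words.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the dog smelled like a skunk".split(" "));
        // Six words with N = 3 give 6 - 3 + 1 = 4 N-grams.
        for (String gram : ngrams(words, 3)) {
            System.out.println(gram);
        }
    }
}
```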

Typically, the ParagraphPunctFilter filter should be applied to the token sequence before this filter. If it is not, then the whole document is treated as one sentence and N-grams are produced at the document level. This filter should be the last filter applied. Because the filter keeps internal state, it cannot be shared within a filtering chain.
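
The effect of sentence boundaries, and the reason a stateful filter cannot be shared, can be illustrated with a small sliding-window sketch. Everything here is hypothetical: SlidingWindowSketch, push, endSentence and run are illustration names, words are plain strings, and how ParagraphPunctFilter actually signals a sentence boundary is not described in this page, so the explicit endSentence call is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not the Egothor implementation.
class SlidingWindowSketch {
    private final int n;
    // Internal state: the last (up to) n words seen. Two token streams feeding
    // the same instance would mix their words in this window.
    private final Deque<String> window = new ArrayDeque<>();

    SlidingWindowSketch(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    // Feeds one word; returns an N-gram once the window holds n words, else null.
    String push(String word) {
        window.addLast(word);
        if (window.size() > n) {
            window.removeFirst();
        }
        return window.size() == n ? "<NGRAM>" + String.join(" ", window) : null;
    }

    // Clears the window so that no N-gram spans two sentences (assumed boundary signal).
    void endSentence() {
        window.clear();
    }

    // Produces N-grams either per sentence or across the whole document.
    static List<String> run(List<List<String>> sentences, int n, boolean respectBoundaries) {
        SlidingWindowSketch filter = new SlidingWindowSketch(n);
        List<String> out = new ArrayList<>();
        for (List<String> sentence : sentences) {
            for (String word : sentence) {
                String gram = filter.push(word);
                if (gram != null) {
                    out.add(gram);
                }
            }
            if (respectBoundaries) {
                filter.endSentence();
            }
        }
        return out;
    }
}
```

With two three-word sentences and N = 3, run yields one N-gram per sentence when boundaries are respected, and additionally the N-grams that straddle the boundary when they are not.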

Author:
Kateřina Dufková

Field Summary
Fields inherited from class org.egothor.core.Filter
Constructor Summary
WordNGrammer(Sequence<Token> prev)
          Constructor for the WordNGrammer object
Method Summary
 Token next()
          Return the next token.
Methods inherited from class org.egothor.core.Filter
action, getPrevTokenizer, setPrevTokenizer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public WordNGrammer(Sequence<Token> prev)
Constructor for the WordNGrammer object

Parameters:
prev - tokenizer used by the filter
Method Detail


public Token next()
Return the next token.

Specified by:
next in interface Sequence<Token>
next in class Filter
Returns:
the next token
See Also: