org.egothor.parser.filter
Class WordNGrammer

java.lang.Object
  extended by org.egothor.core.Filter
      extended by org.egothor.parser.filter.WordNGrammer
All Implemented Interfaces:
Sequence<Token>

public class WordNGrammer
extends Filter

This class produces N-grams of words. From a sentence with W words w1, w2, w3, ..., wW it produces W - N + 1 N-grams. N is determined by the Constants.WORDNGRAMS_LENGHT constant. The filter treats all types of input tokens equally.

Output tokens:


Example (N=3):
Sentence: "the dog smelled like a skunk"
N-grams: "<NGRAM>the dog smelled", "<NGRAM>dog smelled like", "<NGRAM>smelled like a", "<NGRAM>like a skunk"
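The sliding-window idea behind the example above can be sketched in plain Java. This is only an illustration, not the Egothor implementation: the class WordNGramSketch, the method wordNGrams and the constant NGRAM_PREFIX are hypothetical names, the sketch works on plain strings rather than Token objects, and returning an empty list for a sentence shorter than N is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Egothor: builds word N-grams from one sentence.
public class WordNGramSketch {

    // Prefix copied from the documented example output.
    static final String NGRAM_PREFIX = "<NGRAM>";

    // Returns the W - N + 1 N-grams of a sentence with W words.
    // Assumption of this sketch: fewer than N words yields an empty list.
    static List<String> wordNGrams(List<String> words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        // Slide a window of n words over the sentence, one word at a time.
        for (int start = 0; start + n <= words.size(); start++) {
            String gram = String.join(" ", words.subList(start, start + n));
            result.add(NGRAM_PREFIX + gram);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentence =
                Arrays.asList("the", "dog", "smelled", "like", "a", "skunk");
        // Prints the four N-grams listed in the example (W = 6, N = 3).
        for (String gram : wordNGrams(sentence, 3)) {
            System.out.println(gram);
        }
    }
}
```

With W = 6 and N = 3 the loop runs for start = 0..3, which gives the 6 - 3 + 1 = 4 N-grams shown in the example.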

Typically, the ParagraphPunctFilter filter should be applied to the token sequence before this filter. If it is not, then the whole document is treated as one sentence and N-grams are produced at the document level. This filter should be the last filter applied. The filter keeps internal state, so an instance cannot be shared in a filtering chain.

Author:
Kateřina Dufková

Field Summary
 
Fields inherited from class org.egothor.core.Filter
prev
 
Constructor Summary
WordNGrammer(Sequence<Token> prev)
          Creates a WordNGrammer that reads its input tokens from prev.
 
Method Summary
 Token next()
          Return the next token.
 
Methods inherited from class org.egothor.core.Filter
action, getPrevTokenizer, setPrevTokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordNGrammer

public WordNGrammer(Sequence<Token> prev)
Creates a WordNGrammer that reads its input tokens from prev.

Parameters:
prev - the preceding token sequence (a tokenizer or another filter) from which this filter reads its input
Method Detail

next

public Token next()
Return the next token.

Specified by:
next in interface Sequence<Token>
Overrides:
next in class Filter
Returns:
the next token
See Also:
Filter.next()