WordNGrammer

org.egothor.parser.filter
Class WordNGrammer

java.lang.Object
  org.egothor.core.Filter
      org.egothor.parser.filter.WordNGrammer

All Implemented Interfaces: Sequence<Token>

public class WordNGrammer
extends Filter

This class produces N-grams of words. From a sentence with W words w1, w2, w3, ..., wW it produces W - N + 1 N-grams. N is determined by the Constants.WORDNGRAMS_LENGHT constant. The filter treats all types of input tokens equally.

Output tokens:

name - <NGRAM>
text - concatenation of texts of tokens belonging to the N-gram, with space separator
weight - arithmetic mean of the weights of all tokens belonging to the N-gram
colS, lineS - taken from the first token belonging to the N-gram
colE, lineE - taken from the last token belonging to the N-gram
sentence, paragraph, sentenceInParagraph - taken from the first token; these are equal for all tokens belonging to the N-gram
reloffset - taken from the first token belonging to the N-gram

Example (N=3):
Sentence: "the dog smelled like a skunk"
N-grams: "<NGRAM>the dog smelled", "<NGRAM>dog smelled like", "<NGRAM>smelled like a", "<NGRAM>like a skunk"
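
The sliding-window rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Egothor implementation: `WordNGramSketch`, `SimpleToken` and `ngrams` are hypothetical names, the stand-in token models only the text and weight fields, and N is passed as a parameter instead of being read from Constants.WORDNGRAMS_LENGHT.

```java
import java.util.ArrayList;
import java.util.List;

public class WordNGramSketch {

    /** Hypothetical stand-in for a token; only text and weight are modelled. */
    static final class SimpleToken {
        final String text;
        final double weight;

        SimpleToken(String text, double weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    /**
     * Builds the W - N + 1 word N-grams of one sentence with a sliding window.
     * Returns an empty list when the sentence has fewer than n tokens.
     */
    static List<SimpleToken> ngrams(List<SimpleToken> sentence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<SimpleToken> out = new ArrayList<>();
        for (int start = 0; start + n <= sentence.size(); start++) {
            StringBuilder text = new StringBuilder();
            double weightSum = 0.0;
            for (int i = start; i < start + n; i++) {
                if (i > start) {
                    text.append(' '); // space separator between word texts
                }
                text.append(sentence.get(i).text);
                weightSum += sentence.get(i).weight;
            }
            // The N-gram weight is the arithmetic mean of the member weights.
            out.add(new SimpleToken(text.toString(), weightSum / n));
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleToken> sentence = new ArrayList<>();
        for (String word : "the dog smelled like a skunk".split(" ")) {
            sentence.add(new SimpleToken(word, 1.0)); // assumed uniform weights
        }
        for (SimpleToken t : ngrams(sentence, 3)) {
            System.out.println(t.text + " (weight " + t.weight + ")");
        }
    }
}
```

For the six-word example sentence and N=3, the sketch yields 6 - 3 + 1 = 4 N-grams. Position fields such as colS, lineS, colE and lineE are left out; following the list above, they would be copied from the first and last token of each window.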

Typically, the ParagraphPunctFilter filter should be applied to the token sequence before this filter. If it is not, then the whole document is treated as one sentence and N-grams are produced at the document level. This filter should be the last filter applied. Because the filter keeps internal state, an instance cannot be shared within a filtering chain.

Author: Kateřina Dufková

Field Summary

Fields inherited from class org.egothor.core.Filter
`prev`

Constructor Summary
`WordNGrammer(Sequence<Token> prev)` Constructor for the WordNGrammer object

Method Summary
`Token`	`next()` Return the next token.

Methods inherited from class org.egothor.core.Filter
`action, getPrevTokenizer, setPrevTokenizer`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

WordNGrammer

public WordNGrammer(Sequence<Token> prev)

Constructor for the WordNGrammer object

Parameters: prev - tokenizer used by the filter

Method Detail

public Token next()

Return the next token.

Specified by: next in interface Sequence<Token>
Overrides: next in class Filter

Returns: the next token
See Also: Filter.next()
