org.egothor.parser.filter
Class DupWithoutDiacritics

java.lang.Object
  extended by org.egothor.core.Filter
      extended by org.egothor.parser.filter.DupWithoutDiacritics
All Implemented Interfaces:
Sequence<Token>

public final class DupWithoutDiacritics
extends Filter

This filter produces a non-diacritical (ASCII) variant of every Latin-script word while keeping the original token. The ASCII variant of a token (if any) is placed with reloffset=0 immediately after the original token.
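
The ordering described above can be illustrated with a minimal, self-contained sketch. It does not use the Egothor classes: Tok, stripDiacritics and duplicate are hypothetical names, the relOffset of 1 for original tokens is an assumption, and the stripping relies on java.text.Normalizer rather than on whatever mapping the filter itself uses. Letters without a canonical decomposition (for example "ł") pass through this sketch unchanged, so the real filter may behave differently for them.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

final class DupSketch {
    /** Hypothetical stand-in for a token: text plus offset relative to the previous token. */
    static final class Tok {
        final String text;
        final int relOffset;

        Tok(String text, int relOffset) {
            this.text = text;
            this.relOffset = relOffset;
        }

        @Override
        public String toString() {
            return text + "@" + relOffset;
        }
    }

    /** Decomposes accented letters (NFD) and drops the combining marks. */
    public static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Keeps every token and appends an ASCII variant with relOffset 0 when one exists. */
    public static List<Tok> duplicate(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t); // the original token is always kept
            String ascii = stripDiacritics(t.text);
            if (!ascii.equals(t.text)) {
                out.add(new Tok(ascii, 0)); // variant placed right after the original
            }
        }
        return out;
    }
}
```

Under these assumptions, the input tokens "café" and "bar" (each with relOffset 1) yield café@1, cafe@0, bar@1: only the word that actually contains a diacritic gets a duplicate.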

Author:
Leo Galambos

Field Summary
 
Fields inherited from class org.egothor.core.Filter
prev
 
Constructor Summary
DupWithoutDiacritics(Sequence<Token> prev)
          Constructor for the DupWithoutDiacritics object.
 
Method Summary
 Token action(Token t)
          If the name/type of the token is <WORD> then transform the text of the token to its non-diacritical (ASCII) form.
 Token next()
          Return the next token.
 
Methods inherited from class org.egothor.core.Filter
getPrevTokenizer, setPrevTokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DupWithoutDiacritics

public DupWithoutDiacritics(Sequence<Token> prev)
Constructor for the DupWithoutDiacritics object.

Parameters:
prev - the preceding token sequence (this filter's Tokenizer) from which tokens are read
Method Detail

action

public Token action(Token t)
If the name/type of the token is <WORD> then transform the text of the token to its non-diacritical (ASCII) form. In all other cases the token remains untouched.

Overrides:
action in class Filter
Parameters:
t - the Token
Returns:
the Token without diacritics

next

public Token next()
Return the next token.

Specified by:
next in interface Sequence<Token>
Overrides:
next in class Filter
Returns:
the next token
See Also:
org.egothor.core.Tokenizer#next()
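
How a pull-style next() could interleave originals and their ASCII variants is sketched below. This is an illustration only, not the Egothor implementation: Source, DupFilter and of are hypothetical names, tokens are reduced to plain strings, and returning null at the end of the stream is an assumption. The idea is a one-slot buffer: when a word has an ASCII variant, the variant is held back and returned by the following call.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Iterator;

final class NextSketch {
    /** Hypothetical pull-style source; returns null when exhausted (an assumption). */
    interface Source {
        String next();
    }

    /** Wraps a source and emits each word followed by its ASCII variant, if one exists. */
    static final class DupFilter implements Source {
        private final Source prev;
        private String pending; // ASCII variant waiting to be returned

        DupFilter(Source prev) {
            this.prev = prev;
        }

        @Override
        public String next() {
            if (pending != null) {
                // Deliver the buffered variant before reading further input.
                String variant = pending;
                pending = null;
                return variant;
            }
            String word = prev.next();
            if (word == null) {
                return null; // upstream is exhausted
            }
            String ascii = Normalizer.normalize(word, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            if (!ascii.equals(word)) {
                pending = ascii; // remember the variant for the next call
            }
            return word;
        }
    }

    /** Convenience factory that turns a fixed list of words into a Source. */
    static Source of(String... words) {
        Iterator<String> it = Arrays.asList(words).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Because the buffer holds at most one variant, the wrapper never needs to look ahead in the upstream sequence, which keeps it usable in a chain of filters that each pull from the previous one.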