Class Generator

java.lang.Object
  extended by org.egothor.text.Generator
All Implemented Interfaces:
    Sequence<Token>

public class Generator
extends java.lang.Object
implements Sequence<Token>

This class generates Tokenizers (documents) that follow Zipf's law. Words are represented as numbers. A document may repeat each word k times, where k is a random number from 1 to 9, so a word appears about 5 times on average. To generate documents with an average length of L words, the Tokenizer is prepared in two steps: 1) L/5 unique words are chosen according to Zipf's law; 2) the duplicates are generated and the words are shuffled.
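The two steps above can be illustrated with a small standalone sketch. It is not the Generator source and does not use the Egothor API. The class name ZipfDocumentSketch, its methods, the 1/rank weighting, the uniform choice of k, and the redraw-until-distinct strategy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not part of org.egothor.text.
public class ZipfDocumentSketch {

    // Draws a rank in [1, cumulative.length] with probability proportional
    // to 1/rank (assumed form of Zipf's law), using the cumulative weights.
    static int sampleRank(double[] cumulative, Random rnd) {
        double u = rnd.nextDouble() * cumulative[cumulative.length - 1];
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {                      // first index with cumulative > u
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo + 1;                         // ranks are 1-based
    }

    // Builds one document whose expected length is about avgLength words.
    static List<Integer> generate(int vocabulary, int avgLength, Random rnd) {
        int unique = Math.max(1, avgLength / 5);
        if (unique > vocabulary) {
            throw new IllegalArgumentException("vocabulary too small for L/5 unique words");
        }

        // Cumulative Zipf weights: cumulative[i] = 1/1 + 1/2 + ... + 1/(i+1).
        double[] cumulative = new double[vocabulary];
        double sum = 0.0;
        for (int rank = 1; rank <= vocabulary; rank++) {
            sum += 1.0 / rank;
            cumulative[rank - 1] = sum;
        }

        // Step 1: pick L/5 distinct words. Redrawing on collisions is simple,
        // but slow when unique is close to vocabulary.
        Set<Integer> words = new LinkedHashSet<>();
        while (words.size() < unique) {
            words.add(sampleRank(cumulative, rnd));
        }

        // Step 2: repeat each word k times, k in 1..9 (assumed uniform,
        // mean 5), then shuffle the whole document.
        List<Integer> document = new ArrayList<>();
        for (int word : words) {
            int k = 1 + rnd.nextInt(9);
            for (int i = 0; i < k; i++) {
                document.add(word);
            }
        }
        Collections.shuffle(document, rnd);
        return document;
    }

    public static void main(String[] args) {
        List<Integer> document = generate(1000, 50, new Random());
        System.out.println(document.size() + " words: " + document);
    }
}
```

With L = 50 the sketch picks 10 distinct words, so a single document holds between 10 and 90 words and about 50 on average.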

Constructor Summary
Generator(int words, int L)
Method Summary
static void main(java.lang.String[] args)
 Token next()
          Return the next item in the iteration.
 void refresh()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public Generator(int words,
                 int L)
Method Detail


public void refresh()


public Token next()
Description copied from interface: Sequence
Return the next item in the iteration.

Specified by:
    next in interface Sequence<Token>
Returns:
    the item


public static void main(java.lang.String[] args)