org.egothor.text
Class Generator

java.lang.Object
  extended by org.egothor.text.Generator
All Implemented Interfaces:
Sequence<Token>

public class Generator
extends java.lang.Object
implements Sequence<Token>

This class generates Tokenizers (documents) whose word frequencies follow Zipf's law. Words are represented as numbers. A document may repeat a word k times, where k is a random number from 1 to 9, so each word appears about 5 times on average. To generate documents with an average length of L words, the Tokenizer is prepared in two steps: 1) L/5 unique words are chosen according to Zipf's law; 2) the duplicates are generated and the words are shuffled.
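
The two-step procedure described above can be sketched as follows. This is a minimal, standalone illustration and not the actual implementation of this class: the class name ZipfDocumentSketch, the methods sampleZipf and generate, the 1/rank weighting, the uniform choice of k, and the draw-until-distinct strategy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; none of these names come from org.egothor.text.
public class ZipfDocumentSketch {

    /** Draws a rank in 1..n with probability proportional to 1/rank (assumed Zipf exponent 1). */
    public static int sampleZipf(double[] cumulative, Random rnd) {
        double u = rnd.nextDouble() * cumulative[cumulative.length - 1];
        int lo = 0;
        int hi = cumulative.length - 1;
        // Binary search for the first index whose cumulative weight exceeds u.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo + 1; // ranks are 1-based; the rank itself serves as the "word"
    }

    /** Builds one document of roughly avgLength words over a vocabulary of the given size. */
    public static List<Integer> generate(int vocabulary, int avgLength, Random rnd) {
        int unique = Math.max(1, avgLength / 5);
        if (unique > vocabulary) {
            throw new IllegalArgumentException("vocabulary too small for requested length");
        }

        // Cumulative weights 1/1, 1/1 + 1/2, ... used for inverse-CDF sampling.
        double[] cumulative = new double[vocabulary];
        double sum = 0.0;
        for (int r = 1; r <= vocabulary; r++) {
            sum += 1.0 / r;
            cumulative[r - 1] = sum;
        }

        // Step 1: draw Zipf-distributed words until L/5 distinct ones are collected.
        Set<Integer> chosen = new LinkedHashSet<>();
        while (chosen.size() < unique) {
            chosen.add(sampleZipf(cumulative, rnd));
        }

        // Step 2: repeat each word k times, k uniform in 1..9 (mean 5), then shuffle.
        List<Integer> document = new ArrayList<>();
        for (int word : chosen) {
            int k = 1 + rnd.nextInt(9);
            for (int i = 0; i < k; i++) {
                document.add(word);
            }
        }
        Collections.shuffle(document, rnd);
        return document;
    }

    public static void main(String[] args) {
        // Example: vocabulary of 1000 words, documents of about 50 words.
        System.out.println(generate(1000, 50, new Random()));
    }
}
```

Because k is drawn independently for each word, the length of a single document varies between L/5 and 9L/5 words; only the average is close to L.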


Constructor Summary
Generator(int words, int L)
           
 
Method Summary
static void main(java.lang.String[] args)
           
 Token next()
          Return the next item in the iteration.
 void refresh()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Generator

public Generator(int words,
                 int L)
Method Detail

refresh

public void refresh()

next

public Token next()
Description copied from interface: Sequence
Return the next item in the iteration.

Specified by:
next in interface Sequence<Token>
Returns:
the item

main

public static void main(java.lang.String[] args)