org.egothor
Class Constants

java.lang.Object
  extended by org.egothor.Constants

public class Constants
extends java.lang.Object

Major constant values of the core engine.

Author:
Leo Galambos

Nested Class Summary
static class Constants.CheckDuplicityLevel
          Enum for the possible levels of the duplicate checking algorithm.
 
Field Summary
static java.lang.String BITTOKEN
          Tag used for bitmaps stored in the index.
static boolean CHECK_DUPLICITY
          Whether the duplicate checking algorithm is used by default.
static java.lang.String CHECK_DUPLICITY_DIR
          Name of the directory for the duplicate checking algorithm's files.
static java.lang.String CHECK_DUPLICITY_INDEX_DIR
          Name of the directory for the duplicate checking algorithm's reports.
static Constants.CheckDuplicityLevel CHECK_DUPLICITY_LEVEL
          Current level of the duplicate checking algorithm.
static boolean CHECK_DUPLICITY_ON_NGRAMS
          Whether the duplicate checking algorithm works on word N-grams.
static int CHECK_DUPLICITY_PERM_CHUNK_BITS
          Number of bits in each permutation chunk used by the duplicate checking algorithm.
static int CHECK_DUPLICITY_PERM_CHUNKS
          Number of permutation chunks that together form one logical permutation in the duplicate checking algorithm.
static int CHECK_DUPLICITY_PERM_NUM
          Number of permutations used by the duplicate checking algorithm.
static java.lang.String CHECK_DUPLICITY_REPORT_DIR
          Name of the directory for the duplicate checking algorithm's reports.
static java.lang.String CHECK_DUPLICITY_TEMP_DIR
          Name of the directory for the duplicate checking algorithm's temporary files.
static boolean CHECK_PARAGRAPHS
          Whether ParagraphPunctFilter is used.
static java.lang.String CONST_FILE_BEGINNING_POSTFIX
          Postfix of const files that specify the time at which constancy of the index was requested.
static java.lang.String CONST_FILE_PREFIX
           
static java.lang.String DEADBARRELS_FILENAME
          Name of the file where directory numbers of dead barrels are saved.
static int DEFAULTMODEL
          Default model used for querying.
static int DOCINVSIZE
          Expected number of terms in a document.
static int DOCSCACHE
          Number of documents cached in each barrel during the querying phase.
static double DUPLICATE_TRESHOLD
          Threshold for the mean Jaccard coefficient (divided by the number of permutations; see CHECK_DUPLICITY_PERM_NUM) over all textual units of the document.
static int FIRSTPARAGRAPH
          Number of the first paragraph in a document.
static int FIRSTSENTENCE
          Number of the first sentence in a document.
static long FIRSTUID
          Number of the first document in a collection (barrel/tanker).
static java.lang.String FS
          File separator.
static int IOSIZE
          Size of the I/O buffers.
static int ITEM_LENGTH_IN_TRANSACTION_LISTENER
          Length of a sorted item in transaction listener log.
static java.lang.String JACCARD_COEFICIENTS_FILE_NAME
          The filename for the JaccardCoeficientsFile.
static java.lang.String LOCAL_TANKER_COMMIT_TO_GLOBAL_LOG_FILENAME
          Prefix of the state file that signals that a local tanker is in its commit phase.
static java.lang.String LOCAL_TANKER_DIRECTORY_PREFIX
          Prefix of all local tanker directories.
static long LOCK_RESERVATION_REFRESH_PERIOD
          Refresh period of a lock reservation; after this time the reservation may expire.
static java.lang.String LOCK_SERVER_DEFAULT_CONFIG_FILENAME
          Full filename of the default lock server configuration file.
static java.lang.String LS
          Line separator.
static double MINIDF
          Minimal value of an inverse document frequency.
static double MINVALIDIDF
          Terms with an idf lower than this value are excluded automatically.
static java.lang.String MODIFIER_STATE_FILENAME_PREFIX
          Prefix of all modifier active-state filenames.
static long MODIFY_STATE_REFRESH_PERIOD
          Period after which a modifier's state file is refreshed.
static long NO_RESERVATION_ID
          Lock ID returned by the lock server when no reservation was created.
static int NORMFACTOR
          Value to which vectors are normalized.
static int OCCURENCIESTOSCAN
          Maximum number of positions scanned in the occurrences of each participating term during phrase queries.
static char PARAGRAPH_SEPARATOR
          Special character that marks a paragraph separator.
static int PARAGRAPH_SEPARATOR_WEIGHT
          Special weight that marks a paragraph separator.
static java.lang.String PERMUTATED_MINS_FILE_PREFIX
          The prefix of filename for the PermutatedMinsFile.
static int PRECOMPCACHE
          Number of values precomputed for an inverted list during the search phase.
static java.lang.String READ_LOCK_FILENAME_PREFIX
          Prefix of all read lock filenames.
static long READ_LOCK_PERIOD
          Default read lock refresh period.
static boolean REQUIREDMODEBYDEF
          Whether required mode is used in queries by default (true = behave like Google).
static int SECOND
          Time period of one second.
static java.lang.String SEPFILESEXT
          File extension used in ThickBarrel for separated inverted lists.
static java.lang.String SEPTOKEN
          Tag(s) used for separated inverted lists; defines the prefix.
static java.lang.String SIMILAR_UNIT_PAIRS_FILE_PREFIX
          The prefix of filename for the SimilarUnitPairsFile.
static double SIMILARITY_RELEVANT_TRESHOLD
          Threshold for the Jaccard coefficient (divided by the number of permutations; see CHECK_DUPLICITY_PERM_NUM) above which a textual unit of the document appears as a suspect in the duplicate checking report.
static int SKIPFACTOR
          Preferred skip factor.
static boolean SUPPORTHTDIG
          Whether the HTML parser supports HTdig.
static int TANKERCACHE
          Size of a cache in the TankerImpl.
static java.lang.String TEMPDIR
          Temporary directory.
static int TERMSCACHE
          Number of words cached in each barrel during the querying phase.
static int TITLELEN
          Title length.
static java.lang.String TRANSACTION_LISTENER_FILENAME_PREFIX
          Prefix of all transaction listeners' filenames.
static java.lang.String UNKNOWNCONTENTTYPE
          Unknown content type (used by robot or indexers when they cannot obtain a valid content-type).
static int WORDNGRAMS_LENGHT
          Length of the N-grams produced by the WordNGrammer filter.
static java.lang.String WRITE_LOCK_FILENAME_PREFIX
          Prefix of all write lock filenames.
static long WRITE_LOCK_PERIOD
          Default write lock refresh period.
 
Constructor Summary
Constants()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NORMFACTOR

public static final int NORMFACTOR
Value to which vectors are normalized.

See Also:
Constant Field Values

PRECOMPCACHE

public static final int PRECOMPCACHE
Number of values precomputed for an inverted list during the search phase. If this number is set equal to NORMFACTOR, all possible values are precomputed.

See Also:
Constant Field Values
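
As a hypothetical illustration of how PRECOMPCACHE relates to NORMFACTOR, the sketch below precomputes weights for the first cacheSize normalized values and computes any larger value on demand. It assumes that normalized values are integers in [0, normFactor) and that the caller supplies the weight function. The class PrecomputedWeights and its parameters are invented for this example and are not part of Egothor.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch: a table of precomputed weights for normalized values.
 * Values below cacheSize come from the table; the rest are computed on demand.
 * When cacheSize == normFactor, every possible value is precomputed.
 */
final class PrecomputedWeights {
    private final double[] table;
    private final IntToDoubleFunction weightFn;
    private final int normFactor;

    PrecomputedWeights(int normFactor, int cacheSize, IntToDoubleFunction weightFn) {
        if (cacheSize < 0 || cacheSize > normFactor) {
            throw new IllegalArgumentException("cacheSize must be in [0, normFactor]");
        }
        this.normFactor = normFactor;
        this.weightFn = weightFn;
        this.table = new double[cacheSize];
        for (int v = 0; v < cacheSize; v++) {
            table[v] = weightFn.applyAsDouble(v); // computed once, up front
        }
    }

    /** Returns the weight for a normalized value in [0, normFactor). */
    double weight(int normalizedValue) {
        if (normalizedValue < 0 || normalizedValue >= normFactor) {
            throw new IllegalArgumentException("value out of range: " + normalizedValue);
        }
        return normalizedValue < table.length
                ? table[normalizedValue]                    // cache hit
                : weightFn.applyAsDouble(normalizedValue);  // computed on demand
    }

    /** True if every possible normalized value is served from the table. */
    boolean fullyPrecomputed() {
        return table.length == normFactor;
    }
}
```

Passing the same value for normFactor and cacheSize corresponds to the "all possible values are precomputed" case. This trades memory for the table against repeated evaluation of the weight function.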

DEFAULTMODEL

public static final int DEFAULTMODEL
Default model used for querying.

See Also:
Query.MODEL_VECTOR, Query.MODEL_FUZZY_M, Query.MODEL_BOOLEAN, Query.setModel(int), Constant Field Values

DOCINVSIZE

public static final int DOCINVSIZE
Expected number of terms in a document. Terms are stored in a hash map, and this value is used as its initial capacity.

See Also:
Constant Field Values
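
A minimal sketch of the pattern DOCINVSIZE describes: a per-document term map whose initial capacity comes from a constant. EXPECTED_TERMS is a hypothetical stand-in for DOCINVSIZE with an invented value, and DocumentTermCounter is not an Egothor class.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentTermCounter {
    // Hypothetical stand-in for Constants.DOCINVSIZE; the real value may differ.
    static final int EXPECTED_TERMS = 256;

    /** Counts term frequencies of one document in a map pre-sized with EXPECTED_TERMS. */
    static Map<String, Integer> countTerms(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>(EXPECTED_TERMS);
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum); // increment, starting from 1
        }
        return counts;
    }
}
```

The `java.util.HashMap` constructor argument is an initial capacity, not a size limit. With the default load factor of 0.75, the map resizes once it holds more than 0.75 times its capacity. A document with close to the expected number of terms may therefore still trigger one resize.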

FIRSTUID

public static final long FIRSTUID
Number of the first document in a collection (barrel/tanker).

See Also:
Constant Field Values

FIRSTSENTENCE

public static final int FIRSTSENTENCE
Number of the first sentence in a document.

See Also:
Constant Field Values

FIRSTPARAGRAPH

public static final int FIRSTPARAGRAPH
Number of the first paragraph in a document.

See Also:
Constant Field Values

TERMSCACHE

public static final int TERMSCACHE
Number of words cached in each barrel during the querying phase.

See Also:
Terms, Constant Field Values

DOCSCACHE

public static final int DOCSCACHE
Number of documents cached in each barrel during the querying phase.

See Also:
Documents, Constant Field Values

SEPFILESEXT

public static final java.lang.String SEPFILESEXT
File extension used in ThickBarrel for separated inverted lists. Nonfunctional.

See Also:
Constant Field Values

SEPTOKEN

public static final java.lang.String SEPTOKEN
Tag(s) used for separated inverted lists; defines the prefix. Nonfunctional.

See Also:
Constant Field Values

BITTOKEN

public static final java.lang.String BITTOKEN
Tag used for bitmaps stored in the index.

See Also:
Constant Field Values

OCCURENCIESTOSCAN

public static final int OCCURENCIESTOSCAN
Maximum number of positions scanned in the occurrences of each participating term during phrase queries.

See Also:
PhraseScan, Constant Field Values
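
A hypothetical sketch of how a cap such as OCCURENCIESTOSCAN can bound phrase matching for two adjacent query terms. It assumes that each term's occurrences are available as a sorted array of word positions and uses a merge-style scan. This is not the PhraseScan implementation; the class CappedPhraseScan is invented for illustration.

```java
/**
 * Hypothetical sketch of a capped phrase check for two adjacent query terms.
 * Both arrays hold sorted word positions of one term within a document.
 * At most maxScan positions of each list are examined; matches that lie
 * beyond the cap are missed in exchange for bounded work.
 */
final class CappedPhraseScan {
    static boolean adjacentWithin(int[] first, int[] second, int maxScan) {
        int limitA = Math.min(first.length, maxScan);
        int limitB = Math.min(second.length, maxScan);
        int i = 0;
        int j = 0;
        while (i < limitA && j < limitB) {
            int wanted = first[i] + 1; // the second term must follow immediately
            if (second[j] == wanted) {
                return true;
            } else if (second[j] < wanted) {
                j++; // this occurrence of the second term is too early
            } else {
                i++; // no match for this occurrence of the first term
            }
        }
        return false;
    }
}
```

For example, positions {1, 5} and {3, 6} match at 5 and 6 when at least two positions per list are scanned. With a cap of 1, the same match is not found.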

MINIDF

public static final double MINIDF
Minimal value of an inverse document frequency.

See Also:
CWI, Constant Field Values

FS

public static final java.lang.String FS
File separator.


LS

public static final java.lang.String LS
Line separator.


SUPPORTHTDIG

public static final boolean SUPPORTHTDIG
Whether the HTML parser supports HTdig.

See Also:
HTMLParser, Constant Field Values

TITLELEN

public static final int TITLELEN
Title length.

See Also:
HTMLParser, Constant Field Values

IOSIZE

public static final int IOSIZE
Size of the I/O buffers.

See Also:
FileLocal, Constant Field Values

REQUIREDMODEBYDEF

public static final boolean REQUIREDMODEBYDEF
Whether required mode is used in queries by default (true = behave like Google).

See Also:
Constant Field Values

MINVALIDIDF

public static final double MINVALIDIDF
Terms with an idf lower than this value are excluded automatically.

See Also:
QTerm.applyCWI(org.egothor.core.CWI), Constant Field Values
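
A minimal sketch of the exclusion rule MINVALIDIDF describes: query terms whose idf falls below a threshold are dropped. It assumes the classic formula idf = ln(N / df), which may differ from the weighting Egothor computes through CWI. IdfFilter and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IdfFilter {
    /** Classic idf = ln(N / df); an assumed formula, not necessarily Egothor's. */
    static double idf(long totalDocs, long docFreq) {
        return Math.log((double) totalDocs / docFreq);
    }

    /** Keeps only the terms whose idf is at least minValidIdf. */
    static List<String> keepInformativeTerms(Map<String, Long> docFreqs, long totalDocs, double minValidIdf) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
            if (idf(totalDocs, e.getValue()) >= minValidIdf) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

A term that occurs in every document has an idf of ln(1) = 0, so any positive threshold excludes it automatically.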

SKIPFACTOR

public static final int SKIPFACTOR
Preferred skip factor.

See Also:
IListMetadataWrite, Constant Field Values

TANKERCACHE

public static final int TANKERCACHE
Size of a cache in the TankerImpl.

See Also:
TankerImpl, Constant Field Values

TEMPDIR

public static final java.lang.String TEMPDIR
Temporary directory.

See Also:
Directory

UNKNOWNCONTENTTYPE

public static final java.lang.String UNKNOWNCONTENTTYPE
Unknown content type (used by robot or indexers when they cannot obtain a valid content-type).

See Also:
Constant Field Values

CONST_FILE_PREFIX

public static final java.lang.String CONST_FILE_PREFIX
See Also:
Constant Field Values

CONST_FILE_BEGINNING_POSTFIX

public static final java.lang.String CONST_FILE_BEGINNING_POSTFIX
Postfix of const files that specify the time at which constancy of the index was requested.

See Also:
Constant Field Values

LOCAL_TANKER_DIRECTORY_PREFIX

public static final java.lang.String LOCAL_TANKER_DIRECTORY_PREFIX
Prefix of all local tanker directories.

See Also:
Constant Field Values

DEADBARRELS_FILENAME

public static final java.lang.String DEADBARRELS_FILENAME
Name of the file where directory numbers of dead barrels are saved.

See Also:
Constant Field Values

READ_LOCK_FILENAME_PREFIX

public static final java.lang.String READ_LOCK_FILENAME_PREFIX
Prefix of all read lock filenames.

See Also:
Constant Field Values

WRITE_LOCK_FILENAME_PREFIX

public static final java.lang.String WRITE_LOCK_FILENAME_PREFIX
Prefix of all write lock filenames.

See Also:
Constant Field Values

TRANSACTION_LISTENER_FILENAME_PREFIX

public static final java.lang.String TRANSACTION_LISTENER_FILENAME_PREFIX
Prefix of all transaction listeners' filenames.

See Also:
Constant Field Values

MODIFIER_STATE_FILENAME_PREFIX

public static final java.lang.String MODIFIER_STATE_FILENAME_PREFIX
Prefix of all modifier active-state filenames.

See Also:
Constant Field Values

LOCAL_TANKER_COMMIT_TO_GLOBAL_LOG_FILENAME

public static final java.lang.String LOCAL_TANKER_COMMIT_TO_GLOBAL_LOG_FILENAME
Prefix of the state file that signals that a local tanker is in its commit phase.

See Also:
Constant Field Values

ITEM_LENGTH_IN_TRANSACTION_LISTENER

public static final int ITEM_LENGTH_IN_TRANSACTION_LISTENER
Length of a sorted item in transaction listener log.

See Also:
Constant Field Values

LOCK_SERVER_DEFAULT_CONFIG_FILENAME

public static final java.lang.String LOCK_SERVER_DEFAULT_CONFIG_FILENAME
Full filename of the default lock server configuration file.

See Also:
Constant Field Values

READ_LOCK_PERIOD

public static final long READ_LOCK_PERIOD
Default read lock refresh period.

See Also:
Constant Field Values

WRITE_LOCK_PERIOD

public static final long WRITE_LOCK_PERIOD
Default write lock refresh period.

See Also:
Constant Field Values

NO_RESERVATION_ID

public static final long NO_RESERVATION_ID
Lock ID returned by the lock server when no reservation was created.

See Also:
Constant Field Values

SECOND

public static final int SECOND
Time period of one second.

See Also:
Constant Field Values

MODIFY_STATE_REFRESH_PERIOD

public static final long MODIFY_STATE_REFRESH_PERIOD
Period after which a modifier's state file is refreshed. It must be expressed in seconds, because a Unix file system may record a file's last-modified attribute only to the nearest second.

See Also:
Constant Field Values
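
Because the refresh period relies on file modification times, a reader checking whether a modifier is still alive compares timestamps in whole seconds. The sketch below assumes a caller-chosen grace period in seconds (for example, some multiple of MODIFY_STATE_REFRESH_PERIOD). It uses only standard `java.nio.file` calls; StateFileFreshness and the grace-period rule are hypothetical, not Egothor's logic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

final class StateFileFreshness {
    /**
     * Hypothetical staleness test: the state file counts as stale if it was
     * last modified more than graceSeconds before nowSeconds. Both times are
     * whole seconds, because some file systems store mtime only to the second.
     */
    static boolean isStale(Path stateFile, long nowSeconds, long graceSeconds) throws IOException {
        long modifiedSeconds = Files.getLastModifiedTime(stateFile).to(TimeUnit.SECONDS);
        return nowSeconds - modifiedSeconds > graceSeconds;
    }
}
```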

LOCK_RESERVATION_REFRESH_PERIOD

public static final long LOCK_RESERVATION_REFRESH_PERIOD
Refresh period of a lock reservation; after this time the reservation may expire.

See Also:
Constant Field Values

PARAGRAPH_SEPARATOR

public static final char PARAGRAPH_SEPARATOR
Special character that marks a paragraph separator.

See Also:
Constant Field Values

PARAGRAPH_SEPARATOR_WEIGHT

public static final int PARAGRAPH_SEPARATOR_WEIGHT
Special weight that marks a paragraph separator.

See Also:
Constant Field Values

WORDNGRAMS_LENGHT

public static final int WORDNGRAMS_LENGHT
Length of the N-grams produced by the WordNGrammer filter.

See Also:
Constant Field Values
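
A minimal sketch of word N-gram generation of the kind WORDNGRAMS_LENGHT parameterizes. It assumes overlapping (sliding-window) N-grams whose words are joined by single spaces. Whether the WordNGrammer filter emits exactly this form is an assumption; the class WordNGrams is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class WordNGrams {
    /** Returns every run of n consecutive words, joined with a single space. */
    static List<String> of(List<String> words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }
}
```

With n = 3, the words "a b c d" yield "a b c" and "b c d". Input shorter than n yields no N-grams.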

PERMUTATED_MINS_FILE_PREFIX

public static final java.lang.String PERMUTATED_MINS_FILE_PREFIX
The prefix of filename for the PermutatedMinsFile.

See Also:
Constant Field Values

SIMILAR_UNIT_PAIRS_FILE_PREFIX

public static final java.lang.String SIMILAR_UNIT_PAIRS_FILE_PREFIX
The prefix of filename for the SimilarUnitPairsFile.

See Also:
Constant Field Values

JACCARD_COEFICIENTS_FILE_NAME

public static final java.lang.String JACCARD_COEFICIENTS_FILE_NAME
The filename for the JaccardCoeficientsFile.

See Also:
Constant Field Values

CHECK_DUPLICITY_DIR

public static final java.lang.String CHECK_DUPLICITY_DIR
Name of the directory for the duplicate checking algorithm's files.


CHECK_DUPLICITY_TEMP_DIR

public static final java.lang.String CHECK_DUPLICITY_TEMP_DIR
Name of the directory for the duplicate checking algorithm's temporary files.


CHECK_DUPLICITY_REPORT_DIR

public static final java.lang.String CHECK_DUPLICITY_REPORT_DIR
Name of the directory for the duplicate checking algorithm's reports.


CHECK_DUPLICITY_INDEX_DIR

public static final java.lang.String CHECK_DUPLICITY_INDEX_DIR
Name of the directory for the duplicate checking algorithm's reports.


CHECK_DUPLICITY_ON_NGRAMS

public static final boolean CHECK_DUPLICITY_ON_NGRAMS
Whether the duplicate checking algorithm works on word N-grams.

See Also:
Constant Field Values

CHECK_DUPLICITY_LEVEL

public static Constants.CheckDuplicityLevel CHECK_DUPLICITY_LEVEL
Current level of the duplicate checking algorithm.


CHECK_DUPLICITY_PERM_CHUNK_BITS

public static final int CHECK_DUPLICITY_PERM_CHUNK_BITS
Number of bits in each permutation chunk used by the duplicate checking algorithm.

See Also:
Constant Field Values

CHECK_DUPLICITY_PERM_CHUNKS

public static final int CHECK_DUPLICITY_PERM_CHUNKS
Number of permutation chunks that together form one logical permutation in the duplicate checking algorithm. CHECK_DUPLICITY_PERM_CHUNK_BITS * CHECK_DUPLICITY_PERM_CHUNKS must be less than 64.

See Also:
Constant Field Values
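
The constraint that chunk width times chunk count stays below 64 suggests that one logical permutation value is packed into a single 64-bit long. The sketch below shows such packing under that assumption; PermutationPacking and its methods are hypothetical and not Egothor's code.

```java
final class PermutationPacking {
    /**
     * Packs chunks.length values of chunkBits bits each into one long,
     * most significant chunk first. Requiring chunkBits * chunks.length < 64
     * keeps the result within the non-negative range of a signed long.
     */
    static long pack(int[] chunks, int chunkBits) {
        if (chunkBits < 1 || (long) chunkBits * chunks.length >= 64) {
            throw new IllegalArgumentException("need chunkBits * chunks < 64");
        }
        long mask = (1L << chunkBits) - 1;
        long packed = 0L;
        for (int chunk : chunks) {
            if ((chunk & ~mask) != 0) { // also rejects negative chunks
                throw new IllegalArgumentException("chunk does not fit in " + chunkBits + " bits");
            }
            packed = (packed << chunkBits) | chunk;
        }
        return packed;
    }

    /** Extracts chunk index (0 = most significant) from a packed value. */
    static int unpack(long packed, int index, int chunkCount, int chunkBits) {
        int shift = (chunkCount - 1 - index) * chunkBits;
        return (int) ((packed >>> shift) & ((1L << chunkBits) - 1));
    }
}
```

For instance, three 4-bit chunks {1, 2, 3} pack to 0x123. Four 16-bit chunks are rejected because 16 * 4 = 64.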

CHECK_DUPLICITY_PERM_NUM

public static int CHECK_DUPLICITY_PERM_NUM
Number of permutations used by the duplicate checking algorithm.


CHECK_DUPLICITY

public static final boolean CHECK_DUPLICITY
Whether the duplicate checking algorithm is used by default.

See Also:
Constant Field Values

CHECK_PARAGRAPHS

public static final boolean CHECK_PARAGRAPHS
Whether ParagraphPunctFilter is used.

See Also:
Constant Field Values

DUPLICATE_TRESHOLD

public static double DUPLICATE_TRESHOLD
Threshold for the mean Jaccard coefficient (divided by the number of permutations; see CHECK_DUPLICITY_PERM_NUM) over all textual units of the document. Above this value, the document is considered a duplicate.

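A Jaccard coefficient "divided by the number of permutations" matches the MinHash estimate: the fraction of permutations on which two min-hash signatures agree. The sketch below computes that estimate and applies a mean-over-units threshold such as DUPLICATE_TRESHOLD. It assumes each textual unit already has one estimate against its best match. The signature layout and the class MinHashDuplicateCheck are hypothetical.

```java
final class MinHashDuplicateCheck {
    /** Fraction of permutations on which two min-hash signatures agree (Jaccard estimate). */
    static double jaccardEstimate(long[] sigA, long[] sigB) {
        if (sigA.length != sigB.length || sigA.length == 0) {
            throw new IllegalArgumentException("signatures must be non-empty and equally long");
        }
        int matches = 0;
        for (int i = 0; i < sigA.length; i++) {
            if (sigA[i] == sigB[i]) {
                matches++;
            }
        }
        return (double) matches / sigA.length; // divided by the number of permutations
    }

    /** True if the mean of the per-unit estimates exceeds the threshold. */
    static boolean isDuplicate(double[] unitEstimates, double threshold) {
        if (unitEstimates.length == 0) {
            return false;
        }
        double sum = 0.0;
        for (double e : unitEstimates) {
            sum += e;
        }
        return sum / unitEstimates.length > threshold;
    }
}
```

Two signatures that agree on 3 of 4 permutations give an estimate of 0.75.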

SIMILARITY_RELEVANT_TRESHOLD

public static final double SIMILARITY_RELEVANT_TRESHOLD
Threshold for the Jaccard coefficient (divided by the number of permutations; see CHECK_DUPLICITY_PERM_NUM) above which a textual unit of the document appears as a suspect in the duplicate checking report. For each such textual unit, a diff against the most similar document in the index is computed.

See Also:
Constant Field Values
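
A minimal sketch of the per-unit selection that SIMILARITY_RELEVANT_TRESHOLD drives. Units whose Jaccard estimate exceeds the threshold are returned as suspects; these are the units that would be reported and diffed. It assumes the per-unit estimates are already computed. The diff step is not shown, and the class SuspectUnits is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SuspectUnits {
    /** Returns indices of textual units whose Jaccard estimate exceeds the threshold. */
    static List<Integer> select(double[] unitEstimates, double threshold) {
        List<Integer> suspects = new ArrayList<>();
        for (int i = 0; i < unitEstimates.length; i++) {
            if (unitEstimates[i] > threshold) {
                suspects.add(i); // candidate for the report and a diff
            }
        }
        return suspects;
    }
}
```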

Constructor Detail

Constants

public Constants()