Represents the merge of the "similar unit pairs" files for all permutations, as
used by the duplicate-checking algorithm.
The file contains instances of the UnitPair class.
That is, it contains pairs {first, second}, where first and second are identifiers of the units
being checked for duplicates (a unit can be a document, a paragraph or a sentence).
The file is sorted: the primary key is the first field, and ties are broken by the second field.
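The ordering described above can be sketched as follows. This is a minimal illustration, not the project's code: the Pair class is a hypothetical stand-in for UnitPair, and the use of long identifiers is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class UnitPairOrderSketch {

    /** Hypothetical stand-in for UnitPair: two unit identifiers (long is an assumption). */
    static final class Pair {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "{" + first + ", " + second + "}";
        }
    }

    /** Primary key: the first field. Tie-breaker: the second field. */
    static final Comparator<Pair> ORDER =
            Comparator.<Pair>comparingLong(p -> p.first).thenComparingLong(p -> p.second);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Pair> sorted(List<Pair> pairs) {
        List<Pair> copy = new ArrayList<>(pairs);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
                new Pair(2, 5), new Pair(1, 9), new Pair(2, 3), new Pair(1, 4));
        // prints [{1, 4}, {1, 9}, {2, 3}, {2, 5}]
        System.out.println(sorted(pairs));
    }
}
```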
The file is used as a temporary file, from which the
JaccardCoeficientsFile is created.
Currently, the file is never written to the filesystem and is only an abstraction,
because the JaccardCoeficientsFile is created directly
during merging. The out field of the file is used for the created
JaccardCoeficientsFile.
The file should be used as follows.
First, create it with the constructor, giving a location. You must specify
whether this instance will produce a regular JaccardCoeficientsFile
or a temporary one (in the temp directory).
public AllSimilarUnitPairsFile(java.lang.String location, boolean createTemp) throws java.io.IOException
Initializes the file.
Sets the permID (to an invalid value), location and createTemp fields.
Whether a regular or a temporary file will be created depends on the createTemp field.
The filename has the form returned by getFilename()
and is looked for in the location directory.
Merges given similar unit pairs files and produces
the JaccardCoeficientsFile
by aggregating the unit pairs.
The JaccardCoeficientsFile will be created on the filesystem in the location
specified when creating this AllSimilarUnitPairsFile instance.
Whether a regular or a temporary file is created also depends on
the createTemp parameter specified on creation.
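The source does not show how the aggregation turns unit pairs into coefficients. One plausible reading, assumed here and not confirmed by the source, is that each permutation contributes a pair at most once, so the number of times a pair occurs in the merged, sorted stream divided by the number of permutations estimates its Jaccard coefficient. The sketch below collapses runs of equal pairs under that assumption; the class, the method names and the long[2] pair representation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PairAggregationSketch {

    /** Hypothetical output row: a unit pair and its estimated coefficient. */
    static final class Coefficient {
        final long first;
        final long second;
        final double value;

        Coefficient(long first, long second, double value) {
            this.first = first;
            this.second = second;
            this.value = value;
        }

        @Override
        public String toString() {
            return "{" + first + ", " + second + "} -> " + value;
        }
    }

    /**
     * Collapses runs of equal pairs in an already sorted stream.
     * Each pair is a long[2] holding {first, second}.
     * ASSUMPTION: coefficient = occurrences / permutations.
     */
    static List<Coefficient> aggregate(List<long[]> sortedPairs, int permutations) {
        if (permutations <= 0) {
            throw new IllegalArgumentException("permutations must be positive");
        }
        List<Coefficient> out = new ArrayList<>();
        int i = 0;
        while (i < sortedPairs.size()) {
            long[] current = sortedPairs.get(i);
            int run = 0;
            // Equal pairs are adjacent because the input is sorted.
            while (i < sortedPairs.size()
                    && sortedPairs.get(i)[0] == current[0]
                    && sortedPairs.get(i)[1] == current[1]) {
                run++;
                i++;
            }
            out.add(new Coefficient(current[0], current[1], (double) run / permutations));
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> merged = Arrays.asList(
                new long[] {1, 2}, new long[] {1, 2}, new long[] {1, 2},
                new long[] {1, 3},
                new long[] {2, 3}, new long[] {2, 3});
        // prints [{1, 2} -> 0.75, {1, 3} -> 0.25, {2, 3} -> 0.5]
        System.out.println(aggregate(merged, 4));
    }
}
```

Because the input is sorted, one pass with constant extra memory is enough, which fits the statement that the merged file itself never has to be written out.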
Returns the filename corresponding to the JaccardCoeficientsFile
to be created by this instance.
The location field MUST already be set.
The filename is created in the directory given in the location field
and has the form Constants.JACCARD_COEFICIENTS_FILE_NAME.
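A minimal sketch of how such a filename could be assembled is shown below. The value of the constant is hypothetical (the real Constants.JACCARD_COEFICIENTS_FILE_NAME is not shown here), and filenameIn is an illustrative helper, not the actual getFilename() implementation.

```java
import java.nio.file.Paths;

final class JaccardFilenameSketch {

    /** Hypothetical value; the real constant is defined in Constants and is not shown in this text. */
    static final String JACCARD_COEFICIENTS_FILE_NAME = "jaccard_coeficients";

    /** Joins the location directory and the constant file name using the platform separator. */
    static String filenameIn(String location) {
        if (location == null) {
            // Mirrors the documented precondition that location must already be set.
            throw new IllegalStateException("location must be set before building the filename");
        }
        return Paths.get(location, JACCARD_COEFICIENTS_FILE_NAME).toString();
    }

    public static void main(String[] args) {
        System.out.println(filenameIn("work"));
    }
}
```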
Merges multiple files, passing the output to the
JaccardCoeficientsFileProducer.
The files do not need to correspond to the same permutation.
The method does not write anything to the filesystem.
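Merging several sorted inputs without materializing the result can be sketched as a k-way merge over a priority queue. This is an illustration under assumptions, not the project's implementation: in-memory lists stand in for the input files, a java.util.function.Consumer stands in for the JaccardCoeficientsFileProducer, and pairs are represented as long[2].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

final class SortedPairMergeSketch {

    /** One input cursor: the current pair plus the iterator it came from. */
    private static final class Cursor {
        long[] head;
        final Iterator<long[]> rest;

        Cursor(Iterator<long[]> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    /** Orders pairs by the first field, then by the second field. */
    static int compare(long[] a, long[] b) {
        int byFirst = Long.compare(a[0], b[0]);
        return byFirst != 0 ? byFirst : Long.compare(a[1], b[1]);
    }

    /**
     * Streams the pairs of all inputs to the sink in sorted order.
     * Each input must already be sorted; nothing is written to disk.
     */
    static void merge(List<List<long[]>> inputs, Consumer<long[]> sink) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((x, y) -> compare(x.head, y.head));
        for (List<long[]> input : inputs) {
            Iterator<long[]> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            sink.accept(smallest.head);
            if (smallest.rest.hasNext()) {
                // Advance the cursor and put it back into the heap.
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
    }

    public static void main(String[] args) {
        List<List<long[]>> inputs = new ArrayList<>();
        inputs.add(Arrays.asList(new long[] {1, 2}, new long[] {2, 3}));
        inputs.add(Arrays.asList(new long[] {1, 2}, new long[] {1, 3}));
        StringBuilder seen = new StringBuilder();
        merge(inputs, p -> seen.append(p[0]).append(',').append(p[1]).append(';'));
        // prints 1,2;1,2;1,3;2,3;
        System.out.println(seen);
    }
}
```

Equal pairs coming from different inputs leave the merge next to each other, so a downstream consumer can aggregate them in a single pass.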