org.egothor.duplicity.file
Class AllSimilarUnitPairsFile

java.lang.Object
  extended by org.egothor.duplicity.file.DuplicityCheckingFile
      extended by org.egothor.duplicity.file.CommonSimilarUnitPairsFile
          extended by org.egothor.duplicity.file.MergeableSimilarUnitPairsFile
              extended by org.egothor.duplicity.file.AllSimilarUnitPairsFile

public class AllSimilarUnitPairsFile
extends MergeableSimilarUnitPairsFile

Represents the merge of the "similar unit pairs" files for all permutations used by the duplicity checking algorithm. The file contains instances of the UnitPair class, that is, pairs {first, second}, where first and second are identifiers of the units being checked for duplicity (a unit can be a document, a paragraph, or a sentence). The file is sorted: the primary criterion is the first field, and ties are broken by the second field. The file serves as a temporary file from which the JaccardCoeficientsFile is created. Currently the file is never written to the filesystem and is only an abstraction, because the JaccardCoeficientsFile is created directly during merging. The out field of this file is used for the created JaccardCoeficientsFile.
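The sort order described above can be illustrated with a minimal sketch. The nested Pair class below is a hypothetical stand-in for UnitPair (only the first and second ids are modelled), and the names PairOrderSketch, FILE_ORDER and sorted are assumptions made for this example, not part of the library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PairOrderSketch {
    // Hypothetical stand-in for the library's UnitPair; only the two ids matter here.
    static final class Pair {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "{" + first + ", " + second + "}";
        }
    }

    // Primary key: first field; tie-breaker: second field.
    static final Comparator<Pair> FILE_ORDER =
            Comparator.comparingLong((Pair p) -> p.first)
                      .thenComparingLong(p -> p.second);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Pair> sorted(List<Pair> pairs) {
        List<Pair> copy = new ArrayList<>(pairs);
        copy.sort(FILE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair(7, 2));
        pairs.add(new Pair(3, 9));
        pairs.add(new Pair(3, 4));
        System.out.println(sorted(pairs));
    }
}
```

Keeping every input in this one order is what would allow several files to be merged in a single sequential pass.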

The file should be used as follows.

  1. Create it with the constructor, giving a location. You must specify whether this instance will produce a regular JaccardCoeficientsFile or a temporary one (in the temp directory).
  2. Call the mergeToJaccardCoeficientsFile(java.util.ArrayList) method to merge files corresponding to any permutations into a JaccardCoeficientsFile.

Author:
Kateřina Dufková

Nested Class Summary
 
Nested classes/interfaces inherited from class org.egothor.duplicity.file.DuplicityCheckingFile
DuplicityCheckingFile.TempFile
 
Field Summary
 
Fields inherited from class org.egothor.duplicity.file.MergeableSimilarUnitPairsFile
lastLoading
 
Fields inherited from class org.egothor.duplicity.file.CommonSimilarUnitPairsFile
permID
 
Fields inherited from class org.egothor.duplicity.file.DuplicityCheckingFile
location, out
 
Constructor Summary
AllSimilarUnitPairsFile(java.lang.String location, boolean createTemp)
          Initializes the file.
 
Method Summary
protected  void checkPermutation(long permID1, long permID2)
          Checks if files with given permutations can be merged.
protected  void createOut()
          Creates a temporary file and sets the out field.
 void delete()
          Deletes the file from filesystem.
 java.lang.String getFilename()
          Returns the filename corresponding to the JaccardCoeficientsFile to be created by this instance.
protected  void mergeAll(JaccardCoeficientsFileProducer mergeTo, java.util.ArrayList<CommonSimilarUnitPairsFile> supfs)
          Merges multiple files, passing the output to the JaccardCoeficientsFileProducer.
protected  void mergeAll(JaccardCoeficientsFileProducer jcfp, java.util.ArrayList<DataInputStream> diss, java.util.ArrayList<UnitPair> ups)
           
 JaccardCoeficientsFile mergeToJaccardCoeficientsFile(java.util.ArrayList<CommonSimilarUnitPairsFile> supfs)
          Merges given similar unit pairs files and produces the JaccardCoeficientsFile by aggregating the unit pairs.
 java.lang.String toString()
           
 
Methods inherited from class org.egothor.duplicity.file.MergeableSimilarUnitPairsFile
mergeAll, mergeAll, openInputs
 
Methods inherited from class org.egothor.duplicity.file.CommonSimilarUnitPairsFile
dump, getPermID, hasTheSameContent, remove
 
Methods inherited from class org.egothor.duplicity.file.DuplicityCheckingFile
createPermOut, createTempOut, dump, getLocation, getNewTempFile, getOut, hasTheSameContent, initFromProducer, openOut, remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AllSimilarUnitPairsFile

public AllSimilarUnitPairsFile(java.lang.String location,
                               boolean createTemp)
                        throws java.io.IOException
Initializes the file. Sets the permID (to an invalid value), location and createTemp fields. Whether a regular or a temporary file is created depends on the createTemp field. The filename has the form returned by getFilename() and is looked up in the location directory.

Parameters:
location - path of the directory in which to create the JaccardCoeficientsFile
createTemp - flag indicating whether the JaccardCoeficientsFile should be created in the temp directory and overwritten if it already exists
Throws:
java.io.FileNotFoundException - if the file could not be created
java.io.IOException
Method Detail

mergeToJaccardCoeficientsFile

public JaccardCoeficientsFile mergeToJaccardCoeficientsFile(java.util.ArrayList<CommonSimilarUnitPairsFile> supfs)
                                                     throws java.io.IOException
Merges the given similar unit pairs files and produces the JaccardCoeficientsFile by aggregating the unit pairs. The JaccardCoeficientsFile is created on the filesystem in the location specified when this AllSimilarUnitPairsFile instance was created. Whether a regular or a temporary file is created depends on the createTemp parameter given to the constructor.

Parameters:
supfs - list of files to be merged. The list can contain regular (SimilarUnitPairsFile) or temporary (SimilarUnitPairsTempFile) files.
Returns:
instance of JaccardCoeficientsFile written to filesystem
Throws:
java.io.IOException - on error while reading from this file, or creating result file, or writing to result file

delete

public void delete()
Description copied from class: DuplicityCheckingFile
Deletes the file from filesystem.

Overrides:
delete in class DuplicityCheckingFile

toString

public java.lang.String toString()
Overrides:
toString in class CommonSimilarUnitPairsFile

getFilename

public java.lang.String getFilename()
Returns the filename corresponding to the JaccardCoeficientsFile to be created by this instance. The location field must already be set. The returned filename lies in the directory given by the location field and has the form Constants.JACCARD_COEFICIENTS_FILE_NAME.

Specified by:
getFilename in class DuplicityCheckingFile

createOut

protected void createOut()
                  throws java.io.IOException
Creates a temporary file and sets the out field. Uses the DuplicityCheckingFile.createTempOut() method.

Specified by:
createOut in class DuplicityCheckingFile
Throws:
java.io.IOException - if the file already exists or could not be created
See Also:
DuplicityCheckingFile.createTempOut()

checkPermutation

protected void checkPermutation(long permID1,
                                long permID2)
                         throws MergeException
Checks whether files with the given permutations can be merged. This implementation does nothing, because this file is merged from all permutations.

Specified by:
checkPermutation in class MergeableSimilarUnitPairsFile
Parameters:
permID1 - id of first permutation
permID2 - id of second permutation
Throws:
MergeException - on attempt to merge files corresponding to different permutations

mergeAll

protected void mergeAll(JaccardCoeficientsFileProducer jcfp,
                        java.util.ArrayList<DataInputStream> diss,
                        java.util.ArrayList<UnitPair> ups)
                 throws java.io.IOException
Throws:
java.io.IOException

mergeAll

protected void mergeAll(JaccardCoeficientsFileProducer mergeTo,
                        java.util.ArrayList<CommonSimilarUnitPairsFile> supfs)
                 throws java.io.IOException
Merges multiple files, passing the output to the JaccardCoeficientsFileProducer. The files do not need to correspond to the same permutation. The method does not write anything to the filesystem.

Parameters:
mergeTo - JaccardCoeficientsFileProducer that receives the results
supfs - list of files to be merged. The files can be regular (SimilarUnitPairsFile) or temporary (SimilarUnitPairsTempFile).
Throws:
java.io.IOException
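
The merging described for mergeAll can be pictured as a k-way merge over inputs that are already sorted by (first, second). The sketch below is not the library's implementation: it works on in-memory lists rather than DataInputStream instances, the Pair, Counted and Cursor classes are hypothetical stand-ins, and it assumes that "aggregating" means counting how often each pair occurs across the inputs. How the real JaccardCoeficientsFileProducer turns such counts into coefficients is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MergeCountSketch {
    // Hypothetical stand-in for UnitPair, ordered by first, then second.
    static final class Pair implements Comparable<Pair> {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Pair o) {
            int c = Long.compare(first, o.first);
            return c != 0 ? c : Long.compare(second, o.second);
        }
    }

    // One aggregated record: a pair and its number of occurrences across all inputs.
    static final class Counted {
        final Pair pair;
        final int count;

        Counted(Pair pair, int count) {
            this.pair = pair;
            this.count = count;
        }

        @Override
        public String toString() {
            return "{" + pair.first + ", " + pair.second + "} x" + count;
        }
    }

    // Read position inside one sorted input; compared by its current head element.
    private static final class Cursor implements Comparable<Cursor> {
        final List<Pair> source;
        int pos;

        Cursor(List<Pair> source) {
            this.source = source;
        }

        Pair head() {
            return source.get(pos);
        }

        boolean advance() {
            return ++pos < source.size();
        }

        @Override
        public int compareTo(Cursor o) {
            return head().compareTo(o.head());
        }
    }

    // Merges the sorted inputs in one pass and counts equal pairs.
    // If each input holds a pair at most once, the count equals the
    // number of inputs that contain the pair.
    static List<Counted> mergeAndCount(List<List<Pair>> sortedInputs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<Pair> input : sortedInputs) {
            if (!input.isEmpty()) {
                heap.add(new Cursor(input));
            }
        }
        List<Counted> out = new ArrayList<>();
        Pair current = null;
        int count = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Pair p = c.head();
            if (current != null && current.compareTo(p) == 0) {
                count++;
            } else {
                if (current != null) {
                    out.add(new Counted(current, count));
                }
                current = p;
                count = 1;
            }
            // The cursor is only modified while it is outside the heap.
            if (c.advance()) {
                heap.add(c);
            }
        }
        if (current != null) {
            out.add(new Counted(current, count));
        }
        return out;
    }
}
```

Because the output is produced in sorted order and consumed record by record, a merge of this shape never needs to hold the merged pair list itself, which fits the statement that the merged file is only an abstraction.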