org.egothor.duplicity.visualization
Class DocumentDuplicities

java.lang.Object
  extended by org.egothor.duplicity.visualization.DocumentDuplicities
Direct Known Subclasses:
DocumentDuplicitiesDocumentLevel, DocumentDuplicitiesParagraphLevel, DocumentDuplicitiesSentenceLevel

public abstract class DocumentDuplicities
extends java.lang.Object

Represents the duplicities of a document and allows them to be visualized against other documents in so-called reports. Reports are files that contain basic information about the document and then show which parts of the document are similar to other documents. The visualization is based on a diff algorithm over document words, implemented by the java-diff package (org.incava.util.diff). The reports are produced using the iText package (com.lowagie.text) and can be generated in PDF or HTML format.
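
To illustrate what a word-level diff yields, here is a minimal sketch that finds the words two documents share in the same order. It uses a plain longest-common-subsequence table as a stand-in and does not call java-diff; the class WordDiffSketch and the method commonWords are hypothetical names, and the real package may compute and report differences differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.egothor or org.incava.
class WordDiffSketch {

    // Returns the words of a that lie on one longest common subsequence with b.
    static List<String> commonWords(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to recover the shared words.
        List<String> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++; // skip a word that only a has
            } else {
                j++; // skip a word that only b has
            }
        }
        return common;
    }
}
```

Words that end up in the returned list correspond to the parts a report would mark as similar; everything else in either document is a difference.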

The chapter that visualizes differences between text units is created as follows. There is a global chapter object into which paragraphs are inserted once they are complete. The insertion is specific to the duplicity checking level and is done in the subclasses, in the methods DocumentDuplicitiesDocumentLevel.flush(), DocumentDuplicitiesParagraphLevel.flush() and DocumentDuplicitiesSentenceLevel.flush(). These methods are called from the printTokens(java.util.List, org.egothor.duplicity.visualization.Printer) method of each subclass as soon as the first token of the next paragraph is reached. While a paragraph is being created, its content is stored in another global object, phrase, and the ordinal number of the paragraph is stored in paragraphID.
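
The buffer-and-flush flow described above can be sketched roughly as follows. This is a simplified model, not the actual implementation: ParagraphBufferSketch and SketchToken are hypothetical names, plain strings replace the iText objects, and the assumption that each token carries its paragraph number is made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the buffering described above; not the real class.
class ParagraphBufferSketch {

    // Assumed token shape: a word plus the ordinal number of its paragraph.
    static class SketchToken {
        final int paragraph;
        final String word;

        SketchToken(int paragraph, String word) {
            this.paragraph = paragraph;
            this.word = word;
        }
    }

    // Completed paragraphs; plays the role of the global "chapter" object.
    final List<String> chapter = new ArrayList<>();
    // Content of the paragraph being built; plays the role of "phrase".
    private final StringBuilder phrase = new StringBuilder();
    // Ordinal number of the paragraph being built; plays the role of "paragraphID".
    private int paragraphID = 0;

    // Appends tokens; the first token of a new paragraph triggers flush().
    void printTokens(List<SketchToken> tokens) {
        for (SketchToken token : tokens) {
            if (token.paragraph != paragraphID) {
                flush();
                paragraphID = token.paragraph;
            }
            if (phrase.length() > 0) {
                phrase.append(' ');
            }
            phrase.append(token.word);
        }
    }

    // Moves the finished paragraph into the chapter and clears the buffer.
    void flush() {
        if (phrase.length() == 0) {
            return;
        }
        chapter.add(paragraphID + ": " + phrase);
        phrase.setLength(0);
    }
}
```

In this sketch the caller still has to invoke flush() once after the last token, because no following paragraph arrives to trigger it.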

Author:
Kateřina Dufková

Constructor Summary
protected DocumentDuplicities(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker)
           
 
Method Summary
 void createCsv(java.lang.String dirname)
          Creates a CSV file with the Jaccard coefficients for this document in the given directory.
static DocumentDuplicities createNew(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker)
          The recommended way to create a new instance of a DocumentDuplicities subclass.
 void createReport(java.lang.String dirname, boolean producePDF, boolean produceHTML, double coef)
          Creates duplicity checking report files for this document in the given directory and in the requested formats.
static java.util.List<java.util.List<Token>> getDocumentUnits(Sequence<Token> words)
          Takes the sequence of document words and, depending on Constants.CHECK_DUPLICITY_LEVEL, splits it into the appropriate text units: documents, paragraphs or sentences.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentDuplicities

protected DocumentDuplicities(DocumentUnitID docID,
                              DocumentData docMeta,
                              JaccardCoeficientsFile jcf,
                              TankerImplSecure tanker)
Method Detail

createNew

public static DocumentDuplicities createNew(DocumentUnitID docID,
                                            DocumentData docMeta,
                                            JaccardCoeficientsFile jcf,
                                            TankerImplSecure tanker)
The recommended way to create a new instance of a DocumentDuplicities subclass. Always returns the appropriate subclass according to the Constants.CHECK_DUPLICITY_LEVEL constant.

Returns:
an instance of a DocumentDuplicities subclass
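
A level-based factory of this kind could look like the following miniature. It is a sketch under assumptions: the type and values of Constants.CHECK_DUPLICITY_LEVEL are not documented here, so the level is modelled as a hypothetical enum passed as a parameter, and all class names ending in Sketch are invented for the example.

```java
// Hypothetical miniature of a level-based factory; not the real class hierarchy.
abstract class DuplicitiesSketch {

    // Assumed stand-in for the values Constants.CHECK_DUPLICITY_LEVEL can take.
    enum Level { DOCUMENT, PARAGRAPH, SENTENCE }

    abstract String describe();

    // Callers get the abstract type back and never name a subclass directly.
    static DuplicitiesSketch createNew(Level level) {
        switch (level) {
            case DOCUMENT:
                return new DocumentLevelSketch();
            case PARAGRAPH:
                return new ParagraphLevelSketch();
            case SENTENCE:
                return new SentenceLevelSketch();
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}

class DocumentLevelSketch extends DuplicitiesSketch {
    String describe() { return "document"; }
}

class ParagraphLevelSketch extends DuplicitiesSketch {
    String describe() { return "paragraph"; }
}

class SentenceLevelSketch extends DuplicitiesSketch {
    String describe() { return "sentence"; }
}
```

Keeping the choice inside one static method means code that creates reports does not change when the checking level changes, which is presumably why the constructor is protected and createNew is the recommended entry point.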

createCsv

public void createCsv(java.lang.String dirname)
               throws java.io.IOException
Creates a CSV file with the Jaccard coefficients for this document in the given directory.

Parameters:
dirname - name of the directory where the CSV file will be stored
Throws:
java.io.IOException - if CSV file could not be created due to input/output error
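
For readers unfamiliar with the term, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below computes it for two sets of words. JaccardSketch is a hypothetical name, and which sets the library actually compares, as well as the layout of the CSV file, are not specified in this documentation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the standard Jaccard coefficient; not library code.
class JaccardSketch {

    // Returns |a ∩ b| / |a ∪ b|, a value between 0.0 and 1.0.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen for this sketch; the ratio is undefined here
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

For the sets {a, b, c} and {b, c, d} the intersection has 2 elements and the union has 4, so the coefficient is 0.5.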

createReport

public void createReport(java.lang.String dirname,
                         boolean producePDF,
                         boolean produceHTML,
                         double coef)
                  throws com.lowagie.text.DocumentException,
                         java.io.IOException,
                         DuplicityCheckingException
Creates duplicity checking report files for this document in the given directory and in the requested formats.

Parameters:
dirname - name of the directory where the report files will be stored
producePDF - if true, a PDF report file will be produced
produceHTML - if true, an HTML report file will be produced
Throws:
com.lowagie.text.DocumentException - if the report files could not be created due to an error during document creation
java.io.IOException - if the report files could not be created due to an input/output error
DuplicityCheckingException

getDocumentUnits

public static java.util.List<java.util.List<Token>> getDocumentUnits(Sequence<Token> words)
Takes the sequence of document words and, depending on Constants.CHECK_DUPLICITY_LEVEL, splits it into the appropriate text units: documents, paragraphs or sentences.

Parameters:
words - sequence of words of the document
Returns:
a list of the document's text units, where each text unit is represented by the list of its words
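
The shape of the returned value can be illustrated with a small sketch that splits a word list into units. It rests on several assumptions: words are plain strings rather than Token objects, the level is a hypothetical enum parameter instead of Constants.CHECK_DUPLICITY_LEVEL, a "\n" entry marks a paragraph break, and a sentence ends at a word ending in '.', '!' or '?'. The real method may detect boundaries quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting words into text units; not library code.
class UnitSplitSketch {

    enum Level { DOCUMENT, PARAGRAPH, SENTENCE }

    // Assumed marker for a paragraph boundary in the word list.
    static final String PARAGRAPH_BREAK = "\n";

    static List<List<String>> getUnits(List<String> words, Level level) {
        List<List<String>> units = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : words) {
            if (word.equals(PARAGRAPH_BREAK)) {
                // Markers never become part of a unit; they close one unless
                // the whole document is a single unit.
                if (level != Level.DOCUMENT) {
                    current = close(units, current);
                }
                continue;
            }
            current.add(word);
            if (level == Level.SENTENCE && endsSentence(word)) {
                current = close(units, current);
            }
        }
        close(units, current);
        return units;
    }

    // Stores a non-empty unit and returns a fresh list for the next one.
    private static List<String> close(List<List<String>> units, List<String> current) {
        if (!current.isEmpty()) {
            units.add(current);
        }
        return new ArrayList<>();
    }

    private static boolean endsSentence(String word) {
        return word.endsWith(".") || word.endsWith("!") || word.endsWith("?");
    }
}
```

With the input words "Hello", "world.", "Bye.", "\n", "Next", "one.", the sketch returns one unit at document level, two at paragraph level and three at sentence level.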