java.lang.Object org.egothor.duplicity.visualization.DocumentDuplicities
public abstract class DocumentDuplicities
Represents the duplicities of a document and allows them to be visualized
against other documents in so-called reports.
Reports are files that contain basic information about the document
and then show which parts of the document are similar to other documents.
The visualization is based on a diff algorithm applied to the document's words,
implemented by the java-diff package (org.incava.util.diff).
Reports are produced using the iText package (com.lowagie.text)
and can be produced in PDF or HTML format.
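To illustrate what a word-level diff yields, here is a minimal sketch that finds the words two documents share, in order, using a longest common subsequence. The class WordDiffSketch and its method are hypothetical and written only for this illustration; they are not the java-diff API, and the algorithm used by org.incava.util.diff may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of org.egothor or java-diff.
class WordDiffSketch {

    /** Returns the words shared by both sequences, in order (one longest common subsequence). */
    static List<String> commonWords(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to recover one common subsequence.
        List<String> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++; // skip a word that only the first document has
            } else {
                j++; // skip a word that only the second document has
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        List<String> other = Arrays.asList("a", "quick", "red", "fox", "jumps", "high");
        System.out.println(commonWords(doc, other)); // prints [quick, fox, jumps]
    }
}
```

The words outside the common subsequence are the ones a report would show as differing; the shared runs are the candidates for highlighted similarity.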
The chapter that visualizes the differences between text units is created
as follows. There is a global chapter object, into which paragraphs are inserted
once they are complete. The insertion is specific to the duplicity checking
level and is done in the subclasses, in the methods
DocumentDuplicitiesDocumentLevel.flush(), DocumentDuplicitiesParagraphLevel.flush()
and DocumentDuplicitiesSentenceLevel.flush(). These methods are called from the
printTokens(java.util.List
method of each subclass, once the first token of the next paragraph is reached.
While a paragraph is being created, its content is stored in another global
object, phrase, and the ordinal number of the paragraph being created is stored
in paragraphID.
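The buffering scheme described above can be sketched roughly as follows. Everything here is a hypothetical stand-in: the Tok class, the string-based chapter and the body of flush() are assumptions made for illustration, since the real Token type, the iText chapter object and the level-specific flush() implementations are not shown on this page. The final flush after the loop is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real chapter, phrase and flush() are not shown on this page.
class FlushSketch {

    /** Simplified token: a word plus the ordinal of the paragraph it belongs to. */
    static final class Tok {
        final int paragraph;
        final String text;

        Tok(int paragraph, String text) {
            this.paragraph = paragraph;
            this.text = text;
        }
    }

    private final List<String> chapter = new ArrayList<>(); // completed paragraphs
    private StringBuilder phrase = new StringBuilder();     // paragraph under construction
    private int paragraphID = 0;                             // ordinal of that paragraph

    // Level-specific in the real subclasses; here it only moves the buffer into the chapter.
    void flush() {
        if (phrase.length() > 0) {
            chapter.add(paragraphID + ": " + phrase);
            phrase = new StringBuilder();
        }
    }

    List<String> printTokens(List<Tok> tokens) {
        for (Tok t : tokens) {
            // The first token of the next paragraph completes the current one.
            if (t.paragraph != paragraphID) {
                flush();
                paragraphID = t.paragraph;
            }
            if (phrase.length() > 0) {
                phrase.append(' ');
            }
            phrase.append(t.text);
        }
        flush(); // assumption: the last paragraph is flushed explicitly, as nothing follows it
        return chapter;
    }
}
```

The point of the pattern is that a paragraph only reaches the chapter once it is known to be complete, which is detected by the arrival of a token from the following paragraph.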
Constructor Summary | |
---|---|
protected | DocumentDuplicities(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker) |
Method Summary | |
---|---|
void | createCsv(java.lang.String dirname) Creates a CSV file with Jaccard coefficients for this document in the given directory. |
static DocumentDuplicities | createNew(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker) The recommended way to create a new instance of a DocumentDuplicities subclass. |
void | createReport(java.lang.String dirname, boolean producePDF, boolean produceHTML, double coef) Creates duplicity checking report files for this document in the given directory in the given formats. |
static java.util.List<java.util.List<Token>> | getDocumentUnits(Sequence<Token> words) Takes the sequence of document words and, depending on Constants.CHECK_DUPLICITY_LEVEL, splits it into the appropriate text units: documents, paragraphs or sentences. |
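For readers unfamiliar with the Jaccard coefficients that createCsv writes out, the following minimal sketch computes the standard Jaccard coefficient of two sets, the size of the intersection divided by the size of the union. The class JaccardSketch is hypothetical, and treating each document as a set of words is an assumption; this page does not say which sets the library actually compares.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of org.egothor.
class JaccardSketch {

    /** Returns |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty (a convention chosen here). */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |a ∪ b| = |a| + |b| - |a ∩ b|
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> second = new HashSet<>(Arrays.asList("b", "c", "d"));
        System.out.println(jaccard(first, second)); // prints 0.5
    }
}
```

A coefficient of 1.0 means the two sets are identical, and 0.0 means they share nothing.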
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
protected DocumentDuplicities(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker)
Method Detail |
---|
public static DocumentDuplicities createNew(DocumentUnitID docID, DocumentData docMeta, JaccardCoeficientsFile jcf, TankerImplSecure tanker)
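The recommended way to create a new instance of a DocumentDuplicities subclass.
The concrete subclass is chosen according to the Constants.CHECK_DUPLICITY_LEVEL constant.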
public void createCsv(java.lang.String dirname) throws java.io.IOException
Creates a CSV file with Jaccard coefficients for this document in the given directory.
Parameters:
dirname - name of the directory where the report files will be stored
Throws:
java.io.IOException - if the CSV file could not be created due to an input/output error
public void createReport(java.lang.String dirname, boolean producePDF, boolean produceHTML, double coef) throws com.lowagie.text.DocumentException, java.io.IOException, DuplicityCheckingException
Creates duplicity checking report files for this document in the given directory in the given formats.
Parameters:
dirname - name of the directory where the report files will be stored
producePDF - if true, a PDF file report will be produced
produceHTML - if true, an HTML file report will be produced
Throws:
com.lowagie.text.DocumentException - if the report files could not be created due to an error in document creation
java.io.IOException - if the report files could not be created due to an input/output error
DuplicityCheckingException
public static java.util.List<java.util.List<Token>> getDocumentUnits(Sequence<Token> words)
Takes the sequence of document words and, depending on Constants.CHECK_DUPLICITY_LEVEL,
splits it into the appropriate text units: documents, paragraphs or sentences.
Parameters:
words - sequence of words of the document
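The splitting into text units might look roughly like the sketch below. It is an assumption-laden illustration: the Level enum stands in for Constants.CHECK_DUPLICITY_LEVEL, the Tok class with paragraph and sentence ordinals stands in for Token, and a plain List replaces Sequence<Token>. None of these reflect the actual types or the real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of org.egothor.
class UnitSplitSketch {

    /** Hypothetical stand-in for the values of Constants.CHECK_DUPLICITY_LEVEL. */
    enum Level { DOCUMENT, PARAGRAPH, SENTENCE }

    /** Simplified token: a word with the ordinals of its paragraph and sentence. */
    static final class Tok {
        final String text;
        final int paragraph;
        final int sentence;

        Tok(String text, int paragraph, int sentence) {
            this.text = text;
            this.paragraph = paragraph;
            this.sentence = sentence;
        }
    }

    /** Groups consecutive tokens into units; a new unit starts at each boundary for the level. */
    static List<List<Tok>> split(List<Tok> words, Level level) {
        List<List<Tok>> units = new ArrayList<>();
        List<Tok> current = null;
        Tok prev = null;
        for (Tok t : words) {
            boolean boundary = prev == null
                    || (level == Level.PARAGRAPH && t.paragraph != prev.paragraph)
                    || (level == Level.SENTENCE
                        && (t.paragraph != prev.paragraph || t.sentence != prev.sentence));
            if (boundary) {
                current = new ArrayList<>();
                units.add(current);
            }
            current.add(t);
            prev = t;
        }
        return units;
    }

    public static void main(String[] args) {
        List<Tok> words = Arrays.asList(
                new Tok("A", 0, 0), new Tok("b", 0, 0), new Tok("C", 0, 1), new Tok("D", 1, 2));
        System.out.println(split(words, Level.DOCUMENT).size());  // prints 1
        System.out.println(split(words, Level.PARAGRAPH).size()); // prints 2
        System.out.println(split(words, Level.SENTENCE).size());  // prints 3
    }
}
```

At the document level the whole sequence forms a single unit, which matches the return type: a list of units, each of which is itself a list of tokens.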