Egothor provides a mechanism that makes it easy to implement your own inverted lists efficiently. An example can be found in the core Egothor package.

Egothor stores special inverted lists for HTML documents. When an HTML document comes from server ABC.XX, Egothor creates a special term that records which server the document came from. For ABC.XX the term is <SRC>XX.CBA: the tag <SRC> followed by the server name reversed character by character. You can also use this special term in your queries, which is a very effective way to restrict a query to a single server.
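The construction of the special term can be sketched in a few lines of Java. This is only an illustration of the string format described above; the class name SourceTerm and the method of are hypothetical and are not part of the Egothor API.

```java
/**
 * Illustrative helper (hypothetical, not an Egothor class) that builds
 * the special server term: the <SRC> tag followed by the reversed name.
 */
public final class SourceTerm {
    // Tag prefix as given in the text above.
    private static final String TAG = "<SRC>";

    private SourceTerm() {}

    /** Returns the special term for a server name, e.g. ABC.XX -> <SRC>XX.CBA. */
    public static String of(String serverName) {
        // Reverse the server name character by character.
        String reversed = new StringBuilder(serverName).reverse().toString();
        return TAG + reversed;
    }

    public static void main(String[] args) {
        System.out.println(of("ABC.XX")); // prints <SRC>XX.CBA
    }
}
```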

The inverted list built for the special terms described above is not a classic inverted list with properties such as FREQ, W, or HANDLE; it has neither proximity lists nor frequencies. To handle this case, a subclass of the classic inverted list has been implemented. You can find the implementation by studying org.egothor.indexer.html.HTMLDocument.
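To make the difference concrete, here is a minimal sketch of a list that records only which documents contain a term, with no frequencies and no proximity data. It does not reproduce Egothor's actual subclass; the class DocOnlyList and its methods are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical document-only inverted list: it stores document IDs
 * and nothing else (no frequency, weight, or position information).
 */
public final class DocOnlyList {
    // Sorted, duplicate-free set of document IDs containing the term.
    private final SortedSet<Integer> docs = new TreeSet<>();

    /** Records that the given document contains the term. */
    public void add(int docId) {
        docs.add(docId);
    }

    /** Returns true if the given document was recorded. */
    public boolean contains(int docId) {
        return docs.contains(docId);
    }

    /** Returns the number of distinct documents in the list. */
    public int size() {
        return docs.size();
    }

    /** Returns a read-only view of the document IDs in ascending order. */
    public SortedSet<Integer> docIds() {
        return Collections.unmodifiableSortedSet(docs);
    }
}
```

Because a server term either applies to a document or does not, adding the same document twice leaves the list unchanged.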

This approach is also helpful when indexing complex documents (e.g., XML). You can write your own implementation that determines how the inverted list is constructed, its format, and its compression method, and adapt all of these to your needs. A good starting point is org.egothor.store.disc.IListMeta. This class controls which compression format is used for an inverted list and which inverted list is constructed for a given term.
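The idea of choosing a list type per term can be sketched as follows. This is not the interface of org.egothor.store.disc.IListMeta, whose methods are not shown here; the class ListPolicy, the enum Kind, and the prefix rule are hypothetical and only illustrate the kind of decision such a class makes.

```java
/**
 * Hypothetical per-term policy: decides which kind of inverted list
 * to build for a term. Not the Egothor IListMeta API.
 */
public final class ListPolicy {
    /** The list variants this sketch distinguishes. */
    public enum Kind {
        CLASSIC,   // frequencies and proximity data are kept
        DOC_ONLY   // only document membership is kept
    }

    private ListPolicy() {}

    /** Special <SRC> terms get a document-only list; all others are classic. */
    public static Kind kindFor(String term) {
        return term.startsWith("<SRC>") ? Kind.DOC_ONLY : Kind.CLASSIC;
    }

    public static void main(String[] args) {
        System.out.println(kindFor("<SRC>XX.CBA")); // prints DOC_ONLY
        System.out.println(kindFor("search"));      // prints CLASSIC
    }
}
```

A choice of compression format could be attached to each variant in the same way, for example by returning a small object that carries both the list kind and a codec.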

© 2003-2004 Egothor Developers