Google is widely considered a leader in search technology, and the speed and accuracy of its search system are well known. It achieves this through Information Retrieval (IR), the science of indexing and searching information in documents. Our search system uses the same IR approach: it looks up keywords in an index rather than opening every email to find the words a user is searching for.
Research on IR started around the 1960s. Its goal is to quickly identify useful information in a large number of documents. In 1998, Google introduced its PageRank algorithm, successfully applied IR to the World Wide Web, and turned it into a big business.
The basic idea of IR is to create an inverted index file for a large document set. The index file contains many (key, value) pairs, where "key" is any word in the document set and "value" holds information such as the ID of each document that contains the key, the length of that document, and the frequency of the key in that document. When you search for a word, the index file can therefore tell you exactly which documents contain it, and the document length and term frequency indicate how relevant each document is to your query.
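The (key, value) structure described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`InvertedIndex`, `Posting`, `addDocument`, `search`) are hypothetical, the tokenizer is a naive split on non-word characters, and the relevance score (term frequency divided by document length) is a simplified stand-in, not the formula used by Google, Lucene, or our search system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal in-memory inverted index for illustration only.
class InvertedIndex {

    // The "value" side of a (key, value) pair: one entry per document containing the word.
    static final class Posting {
        final int docId;         // ID of the document that contains the key
        final int docLength;     // number of words in that document
        final int termFrequency; // how often the key occurs in that document

        Posting(int docId, int docLength, int termFrequency) {
            this.docId = docId;
            this.docLength = docLength;
            this.termFrequency = termFrequency;
        }
    }

    // key = word, value = list of postings for that word
    private final Map<String, List<Posting>> index = new HashMap<>();

    // Tokenize a document and record one posting per distinct word.
    void addDocument(int docId, String text) {
        Map<String, Integer> counts = new HashMap<>();
        int length = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            counts.merge(token, 1, Integer::sum);
            length++;
        }
        final int docLength = length;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                 .add(new Posting(docId, docLength, e.getValue()));
        }
    }

    // Assumed toy relevance score: share of the document taken up by the word.
    private static double score(Posting p) {
        return (double) p.termFrequency / p.docLength;
    }

    // Return IDs of documents containing the word, most relevant first.
    List<Integer> search(String word) {
        List<Posting> postings = new ArrayList<>(
                index.getOrDefault(word.toLowerCase(), Collections.<Posting>emptyList()));
        postings.sort((a, b) -> Double.compare(score(b), score(a)));
        List<Integer> ids = new ArrayList<>();
        for (Posting p : postings) {
            ids.add(p.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "Lucene is a search library");
        idx.addDocument(2, "Search the mail archive, then search again");
        System.out.println(idx.search("search"));
    }
}
```

Note that a lookup touches only the posting list for the queried word; no document text is reopened at search time, which is the property that makes this approach fast on large mail stores.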
Lucene is a powerful open-source IR library written in Java. It allows you to add indexing and search capabilities to your own application. It is not a standalone indexing/searching application; it is a Java package that provides APIs to meet your search needs.