Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/07/03 23:20:36 UTC
[Lucene-java Wiki] Update of "ConceptsAndDefinitions" by RenaudWaldura
The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions
------------------------------------------------------------------------------
This page contains concepts and definitions related to Lucene. It is not a substitute for knowledge in InformationRetrieval.
-
- == Concepts ==
-
- FILL IN HERE: Basic ideas behind indexing, searching, Lucene in general, important classes, etc.
-
== Definitions ==
- '''Please keep in alphabetical order when editing'''
+ ''Please keep in alphabetical order when editing''.
'''[http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html Analyzer]''' - Lucene class used for preparing text for indexing. Most applications can use the [http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/StandardAnalyzer.html StandardAnalyzer] for English and Latin-based languages.
@@ -21, +16 @@
'''Stemmer''' - From [http://en.wikipedia.org/wiki/Stemmer Wikipedia Stemmer]: "A stemming algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes derived) words to their stem, base or root form — generally a written word form." Stemmers are often used to reduce the search space and index size. A user searching for "widgets" is often also interested in documents that contain the term "widget".
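
To make the "widgets" / "widget" example concrete, here is a minimal sketch of suffix stripping. The class `ToyStemmer` and its single rule are hypothetical and exist only for illustration; this is not the Porter algorithm and not a Lucene class.

```java
import java.util.Locale;

// Toy suffix-stripping stemmer (hypothetical; NOT the Porter algorithm
// and not part of Lucene). It only illustrates mapping an inflected
// form and its base form to the same stem.
class ToyStemmer {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Strip a trailing plural "s", but leave short words and
        // words ending in "ss" (such as "class") untouched.
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // Both calls yield the same stem, so a query for either form
        // would match documents indexed under the other.
        System.out.println(stem("widgets"));
        System.out.println(stem("widget"));
    }
}
```

If the same stemming is applied at index time and at query time, both forms collapse to one index entry, which is where the reduction in index size comes from.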
- '''[http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector]''' - A Term Frequency Vector (aka Term Vector) is a data structure containing a given Document's term and frequency information and can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader] only when Term Vectors are stored during indexing.
+ == Core Classes ==
+ === Document ===
+
+ A Lucene [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Document.html Document] is a record in the index. A Document has a list of fields.
+
+ === Term ===
+
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html Term] is Lucene's unit of indexing. In Western languages, a Term is often a word.
+
+ === TermEnum ===
+
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermEnum.html TermEnum] is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).
+
+ Some Query subclasses are implemented by enumerating the terms that match a pattern and building a large OR query from the enumeration, for example WildcardQuery, PrefixQuery, and RangeQuery.
+
+ See the ["LuceneFAQ"] entry ''How do I retrieve all the values of a particular field that exists within an index, across all documents?'', which also includes sample code.
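
The idea of expanding a pattern into an OR over enumerated terms can be sketched without Lucene. The class `ToyTermDictionary` and its methods below are hypothetical stand-ins, not Lucene's API; the sketch assumes only that the terms of a field are kept in sorted order, so all terms sharing a prefix are adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy sorted term dictionary (hypothetical; not Lucene's TermEnum).
class ToyTermDictionary {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Enumerate terms in sorted order starting at the prefix and stop
    // at the first term that no longer matches. Terms sharing a prefix
    // are contiguous in a sorted set, so nothing is missed.
    List<String> expandPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Render the enumerated terms as one OR expression (display only).
    static String asOrQuery(List<String> matches) {
        return String.join(" OR ", matches);
    }

    public static void main(String[] args) {
        ToyTermDictionary dict = new ToyTermDictionary();
        for (String t : new String[] {"apple", "wide", "widget", "widgets", "window"}) {
            dict.add(t);
        }
        System.out.println(asOrQuery(dict.expandPrefix("widg")));
    }
}
```

The size of the resulting OR grows with the number of matching terms, which is why a very short prefix can produce a very large query.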
+
+ === TermDocs ===
+
+ Unlike TermEnum (see above), [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html TermDocs] is used to identify which documents contain a given Term. It also gives the frequency of the term in each of those documents.
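
As a rough illustration of the term-to-documents direction, the sketch below keeps, for each term, the ids of the documents containing it together with the in-document frequency. `ToyPostings`, its whitespace tokenization, and its method names are assumptions made for this example and are not Lucene's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy postings store (hypothetical; not Lucene's TermDocs).
// Maps term -> (document id -> frequency of the term in that document).
class ToyPostings {
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Index a document by splitting on whitespace (a simplification).
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Documents containing the term, in ascending id order, each with
    // the number of times the term occurs in it.
    Map<Integer, Integer> termDocs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(docs);
    }

    public static void main(String[] args) {
        ToyPostings index = new ToyPostings();
        index.addDocument(0, "widget widget gadget");
        index.addDocument(1, "gadget");
        System.out.println(index.termDocs("widget"));
        System.out.println(index.termDocs("gadget"));
    }
}
```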
+
+ === TermFreqVector ===
+
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector] (aka Term Frequency Vector or just Term Vector) is a data structure containing a given Document's term and frequency information. It can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader] only if Term Vectors were stored during indexing.
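
Where TermDocs goes from a term to its documents, a term vector goes from one document to its terms. The sketch below shows that per-document view as two parallel arrays. `ToyTermVector` and its whitespace tokenization are hypothetical and only approximate the shape of the data; this is not Lucene's implementation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy per-document term vector (hypothetical; not Lucene's TermFreqVector).
// terms[i] occurs freqs[i] times in the document; terms are sorted.
class ToyTermVector {
    final String[] terms;
    final int[] freqs;

    ToyTermVector(String text) {
        // Count occurrences per token; TreeMap keeps the terms sorted.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
    }

    public static void main(String[] args) {
        ToyTermVector v = new ToyTermVector("the cat and the hat");
        for (int i = 0; i < v.terms.length; i++) {
            System.out.println(v.terms[i] + " " + v.freqs[i]);
        }
    }
}
```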
+
+ === Directory ===
+
+ === IndexReader ===
+
+ === IndexSearcher ===
+