Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/07/03 23:20:36 UTC
[Lucene-java Wiki] Update of "ConceptsAndDefinitions" by RenaudWaldura

The following page has been changed by RenaudWaldura:
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions

------------------------------------------------------------------------------
  
  This page contains concepts and definitions related to Lucene.  It is not a substitute for knowledge in InformationRetrieval.
  
- 
- == Concepts ==
- 
- FILL IN HERE:  Basic ideas behind indexing, searching, Lucene in general, important classes, etc.
- 
  == Definitions ==
  
- '''Please keep in alphabetical order when editing'''
+ ''Please keep in alphabetical order when editing''.
  
  '''[http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html Analyzer]''' - Lucene class used for preparing text for indexing.  Most applications can use the [http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/StandardAnalyzer.html StandardAnalyzer] for English and Latin-based languages.
  
@@ -21, +16 @@

  
  '''Stemmer''' - From [http://en.wikipedia.org/wiki/Stemmer Wikipedia Stemmer]: "A stemming algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes derived) words to their stem, base or root form — generally a written word form."  Stemmers are often used to reduce the search space and index size.  Often, a user searching for "widgets" is also interested in documents that contain the term "widget".
  
- '''[http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector]''' - A Term Frequency Vector (aka Term Vector) is a data structure containing a given Document's term and frequency information and can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader] only when Term Vectors are stored during indexing.
+ == Core Classes ==
  
+ === Document ===
+ 
+ A Lucene [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Document.html Document] is a record in the index. A Document has a list of fields.
+ 
+ === Term ===
+ 
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html Term] is Lucene's unit of indexing: a field name paired with a piece of text. In Western languages, the text of a Term is often a word.
+ 
+ === TermEnum ===
+ 
+ [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermEnum.html TermEnum] is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).
+ 
+ Some query subclasses are implemented by enumerating the terms that match a pattern and building a large OR query from the enumeration. Examples include WildcardQuery, PrefixQuery, and RangeQuery.
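
The enumerate-then-OR idea can be sketched in plain Java. This is a toy model and not the Lucene API: a sorted set of strings stands in for the term dictionary of one field, and the class, method, and sample terms are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only (assumption): a sorted set of strings plays the role of
// the term dictionary for a single field. Nothing here is a Lucene class.
class PrefixExpansionSketch {

    // Seeks to the first term >= prefix, then collects every term that
    // starts with the prefix. The result is the list of terms a prefix
    // query could combine into one large OR query.
    static List<String> expandPrefix(SortedSet<String> fieldTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : fieldTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    // Hypothetical sample terms for the sketch.
    static SortedSet<String> sampleTerms() {
        return new TreeSet<>(Arrays.asList("apple", "widget", "widgets", "wild", "zebra"));
    }
}
```

With the sample terms, expanding the prefix "wid" yields "widget" and "widgets"; the scan stops at "wild" without visiting "zebra".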
+ 
+ See the ["LuceneFAQ"] entry ''How do I retrieve all the values of a particular field that exists within an index, across all documents?'', which also includes sample code.
+ 
+ === TermDocs ===
+ 
+ Unlike TermEnum (see above), [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html TermDocs] is used to identify which documents contain a given Term. TermDocs also gives the frequency of the term in the document.
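
To make the contrast with term enumeration concrete, here is a minimal sketch of the term-to-documents lookup described above. It is a toy in-memory structure, not the Lucene API; the class name, the method names, and the whitespace tokenization are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (assumption): maps each term to the documents that
// contain it, together with the term's frequency in each document.
class PostingsSketch {

    // term -> (document number -> frequency of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Splits the text on whitespace and counts each token for this document.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns document number -> frequency for the term, in document order;
    // the map is empty when no document contains the term.
    Map<Integer, Integer> docsFor(String term) {
        return postings.getOrDefault(term, new TreeMap<>());
    }
}
```

After adding document 0 with "widget widget gadget", document 1 with "gadget", and document 2 with "a widget", a lookup of "widget" returns documents 0 and 2 with frequencies 2 and 1.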
+ 
+ === TermFreqVector ===
+ 
+ A [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermFreqVector.html TermFreqVector] (aka Term Frequency Vector or just Term Vector) is a data structure containing a given Document's term and frequency information. It can be retrieved from the [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html IndexReader] only when Term Vectors are stored during indexing.
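
A term vector runs in the opposite direction from the term-to-documents lookup: it starts from one document and lists that document's terms with their counts. The sketch below is a toy model under stated assumptions, not the Lucene class: the class name is hypothetical, tokens are split on whitespace, and the two parallel arrays are simply one convenient way to hold the data.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only (assumption): the terms of ONE document field and how
// often each occurs, held as two parallel arrays.
class TermVectorSketch {

    final String[] terms; // distinct terms, in sorted order
    final int[] freqs;    // freqs[i] is the number of occurrences of terms[i]

    TermVectorSketch(String fieldText) {
        // TreeMap keeps the distinct terms sorted while counting.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : fieldText.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
    }
}
```

For the text "to be or not to be", the terms are be, not, or, to and the matching frequencies are 2, 1, 1, 2.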
+ 
+ === Directory ===
+ 
+ === IndexReader ===
+ 
+ === IndexSearcher ===
+