Posted to java-user@lucene.apache.org by Seid Mohammed <se...@gmail.com> on 2009/02/19 12:17:15 UTC

lucene index details

I am new to Lucene and am reading the Lucene in Action book.
Sometimes I understand better when someone tells me an answer than
when I read it in a book.
My question is:
When indexing, what is Lucene actually doing?
If I have a file called test.txt with the contents "lucene is used to
index files" and I apply Lucene indexing, what is the content of the
index, and what is the structure of the index?

And if I apply a Lucene search, for example the query "index files",
where does Lucene search: in the index or in the test.txt file?

thanks a lot
seid m

-- 
"RABI ZIDNI ILMA"

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene index details

Posted by Matt Ronge <mr...@mronge.com>.
On Feb 19, 2009, at 5:17 AM, Seid Mohammed wrote:

> I am new to Lucene and am reading the Lucene in Action book.
> Sometimes I understand better when someone tells me an answer than
> when I read it in a book.
> My question is:
> When indexing, what is Lucene actually doing?
> If I have a file called test.txt with the contents "lucene is used to
> index files" and I apply Lucene indexing, what is the content of the
> index, and what is the structure of the index?
>
> And if I apply a Lucene search, for example the query "index files",
> where does Lucene search: in the index or in the test.txt file?


If you're interested in the structure of the Lucene index, the best
resource is the file formats documentation for the current release:
http://lucene.apache.org/java/2_4_0/fileformats.html

It lays out the structure in great detail. Skim it over a bit; it will
greatly help in understanding how Lucene works.
--
Matt Ronge
mronge@mronge.com
http://www.mronge.com




Re: lucene index details

Posted by Erick Erickson <er...@gmail.com>.
You have to look at Analyzers a bit here because that's what
controls what is in the index. The simplest case is a WhitespaceAnalyzer
that breaks the input stream up into tokens on any whitespace.

So, in your example and using a WhitespaceAnalyzer, you'd get
the following tokens:
lucene, is, used, to, index, files
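
To make the tokenization step concrete, here is a minimal sketch in plain
Java. It is a toy stand-in, not Lucene's WhitespaceAnalyzer class: the
class name WhitespaceTokens and the method tokenize are hypothetical, and
the sketch assumes that splitting on runs of whitespace is all that is
needed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for whitespace analysis; this is NOT Lucene's WhitespaceAnalyzer.
class WhitespaceTokens {
    // Split text into tokens on runs of whitespace, dropping empty tokens.
    // Case is left untouched, so "Lucene" and "lucene" stay distinct tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, is, used, to, index, files]
        System.out.println(tokenize(" lucene is used to index files"));
    }
}
```

Other analyzers would produce different tokens from the same text (for
example by lowercasing or dropping common words), which is why the choice
of Analyzer decides what ends up in the index.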

Let's put these into a field called "text" in a document. A document is
a little like a row in a database table. So you could have fields
"text", "filename", and so on. In this example, "filename" is empty
(and, in fact, doesn't even need to be present in this particular doc).

Now, parsing the query against the text field (see the query syntax)
essentially asks "does the document have the word 'index' OR the
word 'files' in the 'text' field"? (OR is the default operator).
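
The idea can be sketched with a toy inverted index in plain Java. This is
only an illustration under stated assumptions: ToyIndex, addDocument and
searchAny are hypothetical names, the structure is an in-memory map rather
than Lucene's on-disk format, and there is no scoring or ranking. The point
it shows is that a search consults the term-to-document map built at
indexing time, not the original file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for a single "text" field; not Lucene's format or API.
class ToyIndex {
    // term -> sorted ids of the documents whose text contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Tokenize on whitespace and record each term against a new document id.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // OR semantics: a document matches if it contains at least one query term.
    List<Integer> searchAny(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : query.trim().split("\\s+")) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("lucene is used to index files");
        index.addDocument("a row in a database table");
        // prints [0]: only the first document contains "index" or "files"
        System.out.println(index.searchAny("index files"));
    }
}
```

Once the documents have been added, the original text is never read again
in this sketch; every lookup goes through the postings map.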

But note that there's no magic involved here. Lucene, for instance,
doesn't know about indexing files. The examples in the book
have underlying code that opens the files, reads the data and
feeds that data through an Analyzer for indexing. That's code you
have to write yourself.
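
As a rough sketch of that "code you have to write yourself", the plain
Java below opens a file and turns it into named fields. It deliberately
stops short of calling Lucene: FileToFields and toFields are hypothetical
names, a Map stands in for a Lucene document, and UTF-8 is an assumed
encoding. In a real program the resulting "text" value is what you would
hand to Lucene, together with an Analyzer, for indexing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads a file and builds field name -> field value pairs.
class FileToFields {
    // Read the whole file (assumed to be UTF-8) into a "text" field and
    // keep the file name in a "filename" field.
    static Map<String, String> toFields(Path file) {
        try {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("filename", file.getFileName().toString());
            fields.put("text", new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file so the example runs anywhere.
        Path file = Files.createTempFile("test", ".txt");
        Files.write(file, "lucene is used to index files".getBytes(StandardCharsets.UTF_8));
        System.out.println(toFields(file).get("text"));
        Files.delete(file);
    }
}
```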


Anyway, I'd examine the examples carefully. Also, get a copy of
Luke, a program that allows you to examine the index and see what
various query parsers do. It's invaluable.

As for the internal structure of the index, I just treat it as a black
box, but the Wiki has links to various explanations.

Best
Erick
