You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by farouk alhassan <os...@yahoo.com> on 2010/11/07 08:18:10 UTC

Newbie Question

Hi All,

I'm new to Lucene and have picked up the Lucene in Action book to get started. Really enjoying it but I have a small nagging question.

Is the index stored in the same "physical document" as the fields and values? If not, where is it stored and how is it linked.

This is because of this statement in the book

When you retrieve a document from the index, only stored fields will be present. For example, fields that were indexed but not stored won't be in the document. This behavior
is frequently a source of confusion

Thanks
Farouk A


      

Re: Newbie Question

Posted by farouk alhassan <os...@yahoo.com>.
Hi,

Thanks for your cobntribution. very much welcomed. I like the way you relate it to a relational database. we are all familiar with databases so it makes it even much more clearer.

I also like the indexing details ...
Hoping to be a Lucene guru very sooon!!

Tx

--- On Sun, 7/11/10, Israel Tsadok <it...@gmail.com> wrote:

From: Israel Tsadok <it...@gmail.com>
Subject: Re: Newbie Question
To: java-user@lucene.apache.org
Date: Sunday, 7 November, 2010, 9:54

(If I may)
In Lucene terminology, an "index" is what would be a "database" in RDBMS
terminology. It's the whole thing.
A document is akin to a row in a table. Most of the interesting stuff in
lucene revolves around locating the document, not retrieving the data
actually stored inside it. This is done using Term Vectors, Norms, Term
Frequencies, Document Frequencies etc. These are not stored per document,
but are rather properties of the whole index, and they are therefore similar
to the concept of "index" in an RDBMS.

I hope I didn't make to much of a mess trying to clear things up. I probably
missed some parts and may have misrepresented others, but this is roughly
how I look at it.

Israel

On Sun, Nov 7, 2010 at 11:09 AM, farouk alhassan <os...@yahoo.com>wrote:

> Thanks for your response.
> I have already done that and understand the code perfectly.
>
> Just to rephrase my question
>
> What is the relationship between an index and a document at the conceptual
> level.
>
> Does an index include a document or an index is a collection of documents?
>
> Also is index == document if there is only one document?
>
> Thanks
>
> --- On Sun, 7/11/10, Senthil <se...@gmail.com> wrote:
>
> From: Senthil <se...@gmail.com>
> Subject: Re: Newbie Question
> To: java-user@lucene.apache.org
> Date: Sunday, 7 November, 2010, 8:30
>
> Hi,
>   I recommend you to try simple indexer and searcher code from book which
> clear the confusion.
>
>   You need to specify the indexing folder and all the fields and values
> selected for indexing will stored in that folder. And during search, it
> searches from index and get the reference file path for search result too.
>
> regards
> Senthil
>
>
> On Sun, Nov 7, 2010 at 8:18 PM, farouk alhassan <osbert252003@yahoo.com
> >wrote:
>
> > Hi All,
> >
> > I'm new to Lucene and have picked up the Lucene in Action book to get
> > started. Really enjoying it but I have a small nagging question.
> >
> > Is the index stored in the same "physical document" as the fields and
> > values? If not, where is it stored and how is it linked.
> >
> > This is because of this statement in the book
> >
> > When you retrieve a document from the index, only stored fields will be
> > present. For example, fields that were indexed but not stored won't be in
> > the document. This behavior
> > is frequently a source of confusion
> >
> > Thanks
> > Farouk A
> >
> >
> >
>
>
>
>
>



      

Re: Newbie Question

Posted by Israel Tsadok <it...@gmail.com>.
(If I may)
In Lucene terminology, an "index" is what would be a "database" in RDBMS
terminology. It's the whole thing.
A document is akin to a row in a table. Most of the interesting stuff in
lucene revolves around locating the document, not retrieving the data
actually stored inside it. This is done using Term Vectors, Norms, Term
Frequencies, Document Frequencies etc. These are not stored per document,
but are rather properties of the whole index, and they are therefore similar
to the concept of "index" in an RDBMS.

I hope I didn't make to much of a mess trying to clear things up. I probably
missed some parts and may have misrepresented others, but this is roughly
how I look at it.

Israel

On Sun, Nov 7, 2010 at 11:09 AM, farouk alhassan <os...@yahoo.com>wrote:

> Thanks for your response.
> I have already done that and understand the code perfectly.
>
> Just to rephrase my question
>
> What is the relationship between an index and a document at the conceptual
> level.
>
> Does an index include a document or an index is a collection of documents?
>
> Also is index == document if there is only one document?
>
> Thanks
>
> --- On Sun, 7/11/10, Senthil <se...@gmail.com> wrote:
>
> From: Senthil <se...@gmail.com>
> Subject: Re: Newbie Question
> To: java-user@lucene.apache.org
> Date: Sunday, 7 November, 2010, 8:30
>
> Hi,
>   I recommend you to try simple indexer and searcher code from book which
> clear the confusion.
>
>   You need to specify the indexing folder and all the fields and values
> selected for indexing will stored in that folder. And during search, it
> searches from index and get the reference file path for search result too.
>
> regards
> Senthil
>
>
> On Sun, Nov 7, 2010 at 8:18 PM, farouk alhassan <osbert252003@yahoo.com
> >wrote:
>
> > Hi All,
> >
> > I'm new to Lucene and have picked up the Lucene in Action book to get
> > started. Really enjoying it but I have a small nagging question.
> >
> > Is the index stored in the same "physical document" as the fields and
> > values? If not, where is it stored and how is it linked.
> >
> > This is because of this statement in the book
> >
> > When you retrieve a document from the index, only stored fields will be
> > present. For example, fields that were indexed but not stored won't be in
> > the document. This behavior
> > is frequently a source of confusion
> >
> > Thanks
> > Farouk A
> >
> >
> >
>
>
>
>
>

Re: Newbie Question

Posted by farouk alhassan <os...@yahoo.com>.
Thanks for your response.
I have already done that and understand the code perfectly.

Just to rephrase my question

What is the relationship between an index and a document at the conceptual level.

Does an index include a document or an index is a collection of documents?

Also is index == document if there is only one document?

Thanks

--- On Sun, 7/11/10, Senthil <se...@gmail.com> wrote:

From: Senthil <se...@gmail.com>
Subject: Re: Newbie Question
To: java-user@lucene.apache.org
Date: Sunday, 7 November, 2010, 8:30

Hi,
  I recommend you to try simple indexer and searcher code from book which
clear the confusion.

  You need to specify the indexing folder and all the fields and values
selected for indexing will stored in that folder. And during search, it
searches from index and get the reference file path for search result too.

regards
Senthil


On Sun, Nov 7, 2010 at 8:18 PM, farouk alhassan <os...@yahoo.com>wrote:

> Hi All,
>
> I'm new to Lucene and have picked up the Lucene in Action book to get
> started. Really enjoying it but I have a small nagging question.
>
> Is the index stored in the same "physical document" as the fields and
> values? If not, where is it stored and how is it linked.
>
> This is because of this statement in the book
>
> When you retrieve a document from the index, only stored fields will be
> present. For example, fields that were indexed but not stored won't be in
> the document. This behavior
> is frequently a source of confusion
>
> Thanks
> Farouk A
>
>
>



      

Re: Newbie Question

Posted by Senthil <se...@gmail.com>.
Hi,
  I recommend you to try simple indexer and searcher code from book which
clear the confusion.

  You need to specify the indexing folder and all the fields and values
selected for indexing will stored in that folder. And during search, it
searches from index and get the reference file path for search result too.

regards
Senthil


On Sun, Nov 7, 2010 at 8:18 PM, farouk alhassan <os...@yahoo.com>wrote:

> Hi All,
>
> I'm new to Lucene and have picked up the Lucene in Action book to get
> started. Really enjoying it but I have a small nagging question.
>
> Is the index stored in the same "physical document" as the fields and
> values? If not, where is it stored and how is it linked.
>
> This is because of this statement in the book
>
> When you retrieve a document from the index, only stored fields will be
> present. For example, fields that were indexed but not stored won't be in
> the document. This behavior
> is frequently a source of confusion
>
> Thanks
> Farouk A
>
>
>