You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Joel Halbert <jo...@su3analytics.com> on 2009/09/15 10:19:50 UTC

Displaying search result data - stored fields vs external source

Hi,

When using Lucene I always consider two approaches to displaying search
result data to users:

1. Store any fields that we index and display to users in the Lucene
Documents themselves. When we perform a search simply retrieve the data
to be displayed from the Lucence documents themselves.

or

2. Index fields in Lucene but reference data to be displayed from
another source, such as a database. So, when searching I would search
for documents then use a (stored) reference key on the documents to then
lookup the display fields to display from another source e.g. a
database.

With regards to the number and size of stored fields I am looking at
indexing and displaying approximately 4 relatively small fields for each
document (e.g.  name, age, short description, URL ~ approx 500bytes in
total). In any query about 10 hits will be displayed to the user.
Approximately 10 million documents to index and search.

I am interested the differences in both approaches with regards to:

1) Indexing time performance (how long it might take to index with and
without stored fields)
2) Search time performance (total time taken to search for matching
documents and then display fields to users)

I am less interested in differences arising from
maintainability/increased storage requirements.

I would be interested to see what others  think of using each approach.

Cheers,
Joel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Displaying search result data - stored fields vs external source

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.
Hello ,

I would also prefer to store the content in the index because, as Erick
points out this leads to a more simplified design but also because it allows
me to preserve the relevance sort.



If you store only  the item id in the index then when extracting all the
other required data from  supposedly a database, you will probably execute a
“*select * from item where id in (id_1,id_2,id_3...)*”

which will probably not retain your relevance sort. So unless you sort by a
business field or apply some kind of convoluted sort strategy which maps to
your original Lucene ResultSet you will have lost your ranking.



Cheers,

savvas


2009/9/15 Erick Erickson <er...@gmail.com>

> Categorically I store everything in the index unless/until I *know* it
> doesn'twork. With some things, it's easy to know from the outset, like if I
> have
> 20T of data to store.
>
> First, storing fields has minimal impact on the search speed, the stored
> text
> isn't interleaved with the search tokens, so they're pretty much disjoint.
>
> Second, any scheme storing data separately is inherently more complex
> and difficult to maintain. From the eXtreme Programming folks "Do the
> simplest thing that could possibly work".
>
> Third, there isn't much work in trying it and seeing. I mean you have to
> write
> the retrieval code, and if you encapsulate fetching the data you can switch
> it out later if it comes to that pretty easily. So you don't lose much at
> all
> by "just trying it" <G>..
>
> HTH
> Erick
>
> On Tue, Sep 15, 2009 at 4:19 AM, Joel Halbert <jo...@su3analytics.com>
> wrote:
>
> > Hi,
> >
> > When using Lucene I always consider two approaches to displaying search
> > result data to users:
> >
> > 1. Store any fields that we index and display to users in the Lucene
> > Documents themselves. When we perform a search simply retrieve the data
> > to be displayed from the Lucence documents themselves.
> >
> > or
> >
> > 2. Index fields in Lucene but reference data to be displayed from
> > another source, such as a database. So, when searching I would search
> > for documents then use a (stored) reference key on the documents to then
> > lookup the display fields to display from another source e.g. a
> > database.
> >
> > With regards to the number and size of stored fields I am looking at
> > indexing and displaying approximately 4 relatively small fields for each
> > document (e.g.  name, age, short description, URL ~ approx 500bytes in
> > total). In any query about 10 hits will be displayed to the user.
> > Approximately 10 million documents to index and search.
> >
> > I am interested the differences in both approaches with regards to:
> >
> > 1) Indexing time performance (how long it might take to index with and
> > without stored fields)
> > 2) Search time performance (total time taken to search for matching
> > documents and then display fields to users)
> >
> > I am less interested in differences arising from
> > maintainability/increased storage requirements.
> >
> > I would be interested to see what others  think of using each approach.
> >
> > Cheers,
> > Joel
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Re: Displaying search result data - stored fields vs external source

Posted by Erick Erickson <er...@gmail.com>.
Categorically I store everything in the index unless/until I *know* it
doesn'twork. With some things, it's easy to know from the outset, like if I
have
20T of data to store.

First, storing fields has minimal impact on the search speed, the stored
text
isn't interleaved with the search tokens, so they're pretty much disjoint.

Second, any scheme storing data separately is inherently more complex
and difficult to maintain. From the eXtreme Programming folks "Do the
simplest thing that could possibly work".

Third, there isn't much work in trying it and seeing. I mean you have to
write
the retrieval code, and if you encapsulate fetching the data you can switch
it out later if it comes to that pretty easily. So you don't lose much at
all
by "just trying it" <G>..

HTH
Erick

On Tue, Sep 15, 2009 at 4:19 AM, Joel Halbert <jo...@su3analytics.com> wrote:

> Hi,
>
> When using Lucene I always consider two approaches to displaying search
> result data to users:
>
> 1. Store any fields that we index and display to users in the Lucene
> Documents themselves. When we perform a search simply retrieve the data
> to be displayed from the Lucence documents themselves.
>
> or
>
> 2. Index fields in Lucene but reference data to be displayed from
> another source, such as a database. So, when searching I would search
> for documents then use a (stored) reference key on the documents to then
> lookup the display fields to display from another source e.g. a
> database.
>
> With regards to the number and size of stored fields I am looking at
> indexing and displaying approximately 4 relatively small fields for each
> document (e.g.  name, age, short description, URL ~ approx 500bytes in
> total). In any query about 10 hits will be displayed to the user.
> Approximately 10 million documents to index and search.
>
> I am interested the differences in both approaches with regards to:
>
> 1) Indexing time performance (how long it might take to index with and
> without stored fields)
> 2) Search time performance (total time taken to search for matching
> documents and then display fields to users)
>
> I am less interested in differences arising from
> maintainability/increased storage requirements.
>
> I would be interested to see what others  think of using each approach.
>
> Cheers,
> Joel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>