Posted to java-user@lucene.apache.org by senthil kumaran <ku...@gmail.com> on 2007/03/06 10:37:01 UTC

Indexing & search?

Hi,
    I've indexed 4 of my 5 fields with Field.Store.YES and Field.Index.NO.
The remaining one, say its field name is *content*, I indexed with
Field.Store.YES and Field.Index.TOKENIZED (its value is the combined value
of the other 4 fields plus some more text). So my searches are always based
on the *content* field.
    I've indexed 2 documents. The 1st doc has f1:mybook, f2:contains, f3:all,
f4:information, content:"mybook contains all information that you need";
the 2nd has f1:somebody, f2:want, f3:search, f4:information,
content:"somebody want search information of mybook".
    I want to get search results of all docs where field1's value is
"mybook". My query is content:mybook, but it returns 2 matching documents
instead of 1.
    Is there a filter I can use for this?
    Is there any way other than changing f1 to Field.Index.TOKENIZED? I want
to avoid duplication in the index.

Re: Indexing & search?

Posted by Antony Bowesman <ad...@teamware.com>.

Hi,

The example shows the first 4 words of each 'content' value being stored as
f1, f2, f3 and f4. If that is your intention, then you can use a
SpanFirstQuery to find words that were in f1 (a match that ends at or before
position 1). It can also be used to find hits in words 2-4, but because it
only tells you the term occurred somewhere in the first N positions, you will
have to test each hit to find out which position matched.
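To make the positional idea concrete, here is a toy model in plain Java (not Lucene code). It assumes the content field is split on whitespace with no stop-word removal, so 0-based position 0 is f1, position 3 is f4, and so on; a real analyzer may assign positions differently. The hypothetical spanFirst method mimics the SpanFirstQuery rule that a match must end at or before a given position (in the Lucene 2.x API this corresponds roughly to new SpanFirstQuery(new SpanTermQuery(new Term("content", "mybook")), 1)), and the hypothetical firstPosition method is the extra "test the hits" step needed once end is larger than 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not Lucene): each document's "content" is split on whitespace,
// and a term's position is its 0-based index in that token list.
// spanFirst(term, end) keeps documents where the term occurs at a position p
// with p + 1 <= end, mirroring how SpanFirstQuery(match, end) accepts a
// single-term span [p, p + 1) only if it ends at or before `end`.
final class SpanFirstSketch {

    static List<Integer> spanFirst(List<String> contents, String term, int end) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < contents.size(); doc++) {
            String[] tokens = contents.get(doc).split("\\s+");
            for (int p = 0; p < tokens.length; p++) {
                if (tokens[p].equals(term) && p + 1 <= end) {
                    hits.add(doc);
                    break; // one qualifying occurrence per document is enough
                }
            }
        }
        return hits;
    }

    // Returns the 0-based position of the first occurrence of `term`, or -1.
    // With end = 4, spanFirst only says the term is somewhere in words 1-4;
    // this check tells you which one (i.e. which of f1..f4).
    static int firstPosition(String content, String term) {
        return Arrays.asList(content.split("\\s+")).indexOf(term);
    }

    public static void main(String[] args) {
        List<String> contents = List.of(
            "mybook contains all information that you need",
            "somebody want search information of mybook");
        System.out.println(spanFirst(contents, "mybook", 1));               // [0]
        System.out.println(spanFirst(contents, "information", 4));          // [0, 1]
        System.out.println(firstPosition(contents.get(1), "information"));  // 3
    }
}
```

With end = 1 only the first document matches "mybook"; with end = 4, "information" matches both documents, and only the position check shows it was word 4 (f4) in each.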

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Indexing & search?

Posted by Steven Rowe <sa...@syr.edu>.

Hi senthil,

Your query is behaving as it should - since the "content" field in both
docs contains "mybook", they both match.
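A toy inverted index in plain Java (not Lucene's implementation) shows why: a term query only asks which documents contain the term in that field, not which of the original values it came from. The class and method names are hypothetical, and tokenization is assumed to be a lowercase whitespace split.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (not Lucene's implementation): maps each term of the
// tokenized "content" field to the set of document numbers containing it.
final class ContentIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    void add(int docId, String content) {
        for (String token : content.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Same idea as the single-term query content:<term>; returns a copy.
    Set<Integer> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term, Set.of()));
    }

    public static void main(String[] args) {
        ContentIndexSketch index = new ContentIndexSketch();
        index.add(0, "mybook contains all information that you need");
        index.add(1, "somebody want search information of mybook");
        // Both documents contain "mybook" somewhere in content, so both match.
        System.out.println(index.search("mybook")); // [0, 1]
    }
}
```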

Although you say you want to avoid duplication in the index, I think you
already know what to do (you wrote "I want to get search results of all
docs where field1's value is 'mybook'") - index "field1" to make it
directly queryable.  If the information really needs to be distinct to
query properly, then make it so.

And if the index gets too large, you can try removing the duplication
from the "content" field, and include the other fields in your queries.
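Both options can be sketched with a toy multi-field index in plain Java (not Lucene's API), where postings are keyed by field and term, much like a Lucene Term pairs a field name with its text. Option 1 makes f1 searchable on its own (in Lucene 2.x terms, adding it with Field.Index.UN_TOKENIZED or TOKENIZED instead of NO) alongside the existing content. Option 2 assumes a hypothetical split where content keeps only the extra words ("that you need", "of mybook") and the query ORs the same term over f1 to f4 and content. All names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy multi-field inverted index (not Lucene). Keys are "field:term".
final class FieldIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addField(int docId, String field, String value) {
        for (String token : value.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Single-field term query, e.g. f1:mybook. Returns a sorted copy.
    Set<Integer> search(String field, String term) {
        return new TreeSet<>(postings.getOrDefault(field + ":" + term, Set.of()));
    }

    // OR of the same term over several fields, e.g.
    // f1:mybook OR f2:mybook OR ... OR content:mybook.
    Set<Integer> searchAny(String term, String... fields) {
        Set<Integer> hits = new TreeSet<>();
        for (String field : fields) {
            hits.addAll(search(field, term));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Option 1: keep "content" as it is, but also index f1.
        FieldIndexSketch dup = new FieldIndexSketch();
        dup.addField(0, "f1", "mybook");
        dup.addField(0, "content", "mybook contains all information that you need");
        dup.addField(1, "f1", "somebody");
        dup.addField(1, "content", "somebody want search information of mybook");
        System.out.println(dup.search("f1", "mybook"));      // [0]
        System.out.println(dup.search("content", "mybook")); // [0, 1]

        // Option 2: index f1..f4, keep only the extra words in "content"
        // (hypothetical split), and OR the fields at query time.
        FieldIndexSketch noDup = new FieldIndexSketch();
        String[][] docs = {
            {"mybook", "contains", "all", "information", "that you need"},
            {"somebody", "want", "search", "information", "of mybook"}};
        String[] fields = {"f1", "f2", "f3", "f4", "content"};
        for (int doc = 0; doc < docs.length; doc++) {
            for (int i = 0; i < fields.length; i++) {
                noDup.addField(doc, fields[i], docs[doc][i]);
            }
        }
        System.out.println(noDup.search("f1", "mybook"));                                 // [0]
        System.out.println(noDup.searchAny("mybook", "f1", "f2", "f3", "f4", "content")); // [0, 1]
    }
}
```

Either way, "documents where f1 is mybook" becomes a direct f1:mybook query instead of something reconstructed from the combined field.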

Steve

Re: Indexing & search?

Posted by Erick Erickson <er...@gmail.com>.

You could analyze all the documents returned by your query to see
if the "other fields" match. That is, you could cycle through each
document returned in, say, a Hits object to see if f1 actually matches.
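A minimal sketch of that post-filter, in plain Java rather than Lucene calls: candidates stands in for the document numbers returned by content:mybook, and stored for each document's stored fields, which Field.Store.YES makes available even though Field.Index.NO makes them unsearchable. With the Lucene 2.x Hits class, the per-hit read would be something like hits.doc(i).get("f1"). The class and method names here are hypothetical. Note that this reads back every hit, so its cost grows with the number of matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy post-filter (not Lucene's API): `stored` stands in for the stored
// fields you would read back for each hit, `candidates` for the document
// numbers returned by the content:mybook query.
final class PostFilterSketch {

    static List<Integer> filterByStoredField(List<Integer> candidates,
                                             List<Map<String, String>> stored,
                                             String field, String wanted) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidates) {
            // Keep the hit only if its stored value equals the wanted value;
            // a missing field yields null, which never equals `wanted`.
            if (wanted.equals(stored.get(doc).get(field))) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> stored = List.of(
            Map.of("f1", "mybook", "f2", "contains", "f3", "all", "f4", "information"),
            Map.of("f1", "somebody", "f2", "want", "f3", "search", "f4", "information"));
        List<Integer> candidates = List.of(0, 1); // both match content:mybook
        System.out.println(filterByStoredField(candidates, stored, "f1", "mybook")); // [0]
    }
}
```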

This is almost certainly NOT what you want to do. Do you have any
clue whether saving the space is actually worth it? How big do
you expect your index to be? Disk space is cheap, and Lucene
handles pretty big indexes well. For instance, I've found that
searching a 4G index is maybe 10-15% faster than searching an 8G
index. So unless and until you *know* there's a problem, you should
index all the fields you want to search on, keeping the design as
simple as possible. Only after you *know* there's a problem should
you consider efficiencies...

Best
Erick
