Posted to java-user@lucene.apache.org by Daniel Cortes <dc...@fib.upc.edu> on 2004/12/27 11:50:32 UTC

index question

I'd like to know: when you use Lucene to index files for a general
search engine, which fields (or keys) do you index?
For example, in my case the files are HTML, PDF, DOC, PPT and TXT, and
I'm thinking of using the fields author, title, url, content and
modification date.
Anything else? Any recommendations?
Thanks,
and Merry Christmas to all.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index question

Posted by Daniel Cortes <dc...@fib.upc.edu>.
Thanks a lot, Nader. I'll try it now and let you know the results.
Thanks


Re: index question

Posted by Nader Henein <ns...@bayt.net>.
OK, so you can index the whole document in one shot, but you should
also store certain fields in the index, such as whatever you display in
the search results, to avoid a round trip to the DB.

So, for example, you would store "title", "synopsis", "link", "doc_id"
and "date", and then just index whatever you want to be searchable. The
reason for storing the title in one field and indexing it again in
another is that if you stem that field, it becomes useless for display
purposes. So the logical representation of your index would look
something like this:

<document>
    <id>                      stored / indexed
    <title>                   stored / un-indexed
    <synopsis>                stored / un-indexed
    <date>                    stored / indexed
    <full document stemmed>   indexed / unstored
</document>

Enjoy

Nader Henein
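
The logical schema above can be modelled without Lucene at all. The
sketch below is a toy in-memory index, not Lucene's API: the class
names, the `add`/`search` methods and the crude `stem` function (a
stand-in for a real stemmer such as the Porter algorithm) are
assumptions made for illustration. It shows the two halves of the
design: matching happens against stemmed terms that are indexed but
never displayed, while results are rendered only from stored, unstemmed
values, so no database lookup is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SchemaSketch {

    // Stored part of a document: the values shown on the results page.
    static final class StoredDoc {
        final String id, title, synopsis, date;

        StoredDoc(String id, String title, String synopsis, String date) {
            this.id = id;
            this.title = title;
            this.synopsis = synopsis;
            this.date = date;
        }
    }

    // Stored values keyed by id; kept verbatim, never stemmed.
    private final Map<String, StoredDoc> stored = new HashMap<>();
    // Indexed but unstored body: stemmed term -> ids of matching documents.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Hypothetical, deliberately crude stemmer: lower-case, drop a trailing "s".
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 3 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    // Stores the display fields, then indexes the title plus full text, stemmed.
    void add(StoredDoc doc, String fullText) {
        stored.put(doc.id, doc);
        for (String tok : (doc.title + " " + fullText).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                postings.computeIfAbsent(stem(tok), k -> new TreeSet<>()).add(doc.id);
            }
        }
    }

    // Matches against stemmed terms; builds each hit from stored fields only.
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (String id : postings.getOrDefault(stem(word), Collections.<String>emptySet())) {
            StoredDoc d = stored.get(id);
            hits.add(d.title + " | " + d.synopsis + " | " + d.date);
        }
        return hits;
    }

    public static void main(String[] args) {
        SchemaSketch index = new SchemaSketch();
        index.add(new StoredDoc("1", "La casa roja", "Una casa roja con ventanas", "2004-12-25"),
                "Una casa roja con muchas ventanas en el pueblo");
        index.add(new StoredDoc("2", "Red houses", "Notes on painted houses", "2004-12-26"),
                "Painted houses in the old town");
        System.out.println(index.search("casas")); // the query "casas" is stemmed to "casa"
        System.out.println(index.search("house"));
    }
}
```

Searching for "house" returns the second document, and its title comes
back as "Red houses" because it is read from the stored field; the
index itself only holds the stemmed term "house".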



Re: index question

Posted by Daniel Cortes <dc...@fib.upc.edu>.
Thanks, Nader.
I need a general search over the documents; that's why I'm asking for
your recommendations, since the fields are only there to show
information in the search results. Typically, like a Google search, for
example:

search: casa

La casa roja  (The red house)
...había una vez una casa roja que tenía...  (...once upon a time there was a red house that had...)
http://go.to/casa    Modification date: 25-12-04

To do this, which fields and options (Keyword, Text, UnIndexed,
UnStored) should I use?

Thanks
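
For reference, the four options mentioned above correspond to the
`Field` factory methods of the Lucene 1.x API of the time
(`Field.Keyword`, `Field.Text`, `Field.UnIndexed` and `Field.UnStored`).
Each one fixes three flags: whether the value is stored (retrievable for
display), indexed (searchable) and tokenized (passed through the
analyzer). The sketch below does not call Lucene; it is a self-contained
model of those flags, and the enum `FieldOption` and the field names in
`exampleSchema` are hypothetical, chosen only to show one possible
assignment for the Google-style result above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldOptions {

    // Model of the four Lucene 1.x Field factory methods (not Lucene itself).
    // stored: value can be retrieved for display; indexed: searchable;
    // tokenized: split into terms by the analyzer before indexing.
    enum FieldOption {
        KEYWORD(true, true, false),    // Field.Keyword: ids, URLs, dates
        TEXT(true, true, true),        // Field.Text(String, String): e.g. an author name
        UNINDEXED(true, false, false), // Field.UnIndexed: display-only values
        UNSTORED(false, true, true);   // Field.UnStored: large body text, searched only

        final boolean stored;
        final boolean indexed;
        final boolean tokenized;

        FieldOption(boolean stored, boolean indexed, boolean tokenized) {
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
        }
    }

    // One possible assignment for a result like "La casa roja" above.
    // The field names are illustrative; Lucene imposes none.
    static Map<String, FieldOption> exampleSchema() {
        Map<String, FieldOption> schema = new LinkedHashMap<>();
        schema.put("title", FieldOption.UNINDEXED);   // shown; its words also go into contents
        schema.put("summary", FieldOption.UNINDEXED); // snippet shown under the title
        schema.put("author", FieldOption.TEXT);       // shown and searchable by name
        schema.put("url", FieldOption.KEYWORD);       // shown, matched exactly
        schema.put("modified", FieldOption.KEYWORD);  // e.g. "20041225", sorts as a string
        schema.put("contents", FieldOption.UNSTORED); // full extracted text, never shown
        return schema;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, FieldOption> e : exampleSchema().entrySet()) {
            FieldOption o = e.getValue();
            System.out.printf("%-9s %-10s stored=%b indexed=%b tokenized=%b%n",
                    e.getKey(), o, o.stored, o.indexed, o.tokenized);
        }
    }
}
```

In this assignment the title and snippet are only stored, and their
words are found through the unstored `contents` field, which follows
the stored/un-indexed versus indexed/unstored split suggested in this
thread.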


Re: index question

Posted by Nader Henein <ns...@bayt.net>.
It comes down to your search needs: do you need your documents to be
searchable by these fields, or do you need a general search over the
whole document? Your decisions will affect the size of the index and the
speed of indexing and searching, so give it due thought. Start from your
GUI requirements and design the index that best meets your users' needs.

Nader
