Posted to java-user@lucene.apache.org by Daniel Cortes <dc...@fib.upc.edu> on 2004/12/27 11:50:32 UTC
index question
When you use Lucene to index files for a general-purpose search, which
fields (or keys) do you index?
In my case the files are HTML, PDF, DOC, PPT and TXT, and I am thinking
of using these fields: author, title, URL, content and modification
date.
Should I add anything else? Any recommendations?
Thanks,
and Merry Christmas to all.
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: index question
Posted by Daniel Cortes <dc...@fib.upc.edu>.
Thanks a lot, Nader. I will try it now and tell you the results.
Thanks
Re: index question
Posted by Nader Henein <ns...@bayt.net>.
OK, so you can index the whole document in one shot, but you should
store in the index the fields you display in the search results, to
avoid a round trip to the DB.
So, for example, you would store "title", "synopsis", "link", "doc_id"
and "date", and then index only what you want to be searchable. The
reason to store the title in one field and index it again in another is
that a stemmed field becomes useless for display purposes. So the
logical representation of your index would look something like this:
<document>
<id> stored / indexed
<title> stored / un-indexed
<synopsis> stored / un-indexed
<date> stored / indexed
<full document stemmed> indexed / un-stored
</document>
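As a rough illustration of the layout above, here is a minimal plain-Java
sketch of the difference between "stored" and "indexed". It is not Lucene
code: the FieldLayoutSketch class, its Field type, the "contents" field name
and the sample values are all hypothetical, and the full text is only
lower-cased here, not stemmed. A stored value can be read back for display;
an indexed value can be matched by a query; the two flags are independent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model of the stored/indexed layout. This is NOT the Lucene API.
class FieldLayoutSketch {

    // One field of a document: a value plus two independent flags.
    static final class Field {
        final String name;
        final String value;
        final boolean stored;   // kept verbatim so it can be displayed
        final boolean indexed;  // broken into terms so it can be searched

        Field(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Stored values per document, used only to render search results.
    private final List<Map<String, String>> storedDocs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing the term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Adds one document and returns its internal id.
    int add(Field... fields) {
        int docId = storedDocs.size();
        Map<String, String> stored = new LinkedHashMap<>();
        for (Field f : fields) {
            if (f.stored) {
                stored.put(f.name, f.value);
            }
            if (f.indexed) {
                // Stand-in for real analysis: lower-case and split on
                // anything that is not a letter or a digit. No stemming.
                String[] terms = f.value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
                for (String term : terms) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + term, k -> new TreeSet<>()).add(docId);
                    }
                }
            }
        }
        storedDocs.add(stored);
        return docId;
    }

    // Returns the stored fields of every document whose field contains the term.
    List<Map<String, String>> search(String field, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        TreeSet<Integer> ids = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (int id : ids) {
                hits.add(storedDocs.get(id));
            }
        }
        return hits;
    }

    // Builds an index holding one hypothetical document with the layout above.
    static FieldLayoutSketch sample() {
        FieldLayoutSketch index = new FieldLayoutSketch();
        index.add(
            new Field("id", "42", true, true),                                // stored / indexed
            new Field("title", "La casa roja", true, false),                  // stored / un-indexed
            new Field("synopsis", "..una casa roja ...", true, false),        // stored / un-indexed
            new Field("date", "20041225", true, true),                        // stored / indexed
            new Field("contents", "La casa roja: una casa roja", false, true) // indexed / un-stored
        );
        return index;
    }

    public static void main(String[] args) {
        // Search the full text, then display only what was stored.
        for (Map<String, String> hit : sample().search("contents", "casa")) {
            System.out.println(hit.get("title"));
            System.out.println(hit.get("synopsis"));
            System.out.println("Modification date: " + hit.get("date"));
        }
    }
}
```

In this sketch a query on "contents" finds the document and the result page
is built from the stored title, synopsis and date, while a query on "title"
finds nothing because that field is stored but not indexed. In the Lucene API
of the time, these combinations correspond roughly to Field.Keyword (stored
and indexed), Field.UnIndexed (stored only) and Field.UnStored (indexed only).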
Enjoy
Nader Henein
Re: index question
Posted by Daniel Cortes <dc...@fib.upc.edu>.
Thanks, Nader.
I need a general search over documents, which is why I am asking for
your recommendations; the fields are only there to show information in
the search results.
A typical Google search, for example:
search: casa
La casa roja
..había una vez una casa roja que tenia ....
http://go.to/casa Modification date: 25-12-04
To do this, which fields and options (Keyword, Text, UnIndexed,
UnStored) should I use?
Thanks
Re: index question
Posted by Nader Henein <ns...@bayt.net>.
It comes down to your searching needs: do you need your documents to be
searchable by these fields, or do you need a general search of the whole
document? Your decisions will affect the size of the index and the speed
of indexing and searching, so give it due thought. Start from your GUI
requirements and design the index that best responds to your users'
needs.
Nader