Posted to java-user@lucene.apache.org by Anshum <an...@gmail.com> on 2008/08/01 20:51:10 UTC

Re: The best strategy to "How store multiple fields of same document"

Hey Sergey,
At that scale, I think you can work with multiple fields. I have tried it
with more than twenty fields across more than 10 million documents. It works
fine if implemented neatly.
Will you be doing anything beyond plain search?
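
For what it's worth, here is a rough sketch of how the two layouts look at
query time, using Erick's foo1/foo2 example from the quoted mail. It is plain
Java string building, not Lucene API code: the class and method names are
made up for illustration, and it assumes the names and values contain no
characters that the query parser treats as special.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Lucene): builds the two query strings
// discussed in this thread from the same name/value pairs.
// Assumption: names and values need no query-syntax escaping.
final class FieldLayoutQueries {

    // Option 1: one field per property, e.g. +foo1:val1 +foo2:val2
    static String multiFieldQuery(Map<String, String> props) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add("+" + e.getKey() + ":" + e.getValue());
        }
        return query.toString();
    }

    // Option 2: one multi-valued field holding "name@value" terms,
    // e.g. +PROPERTIES:foo1@val1 +PROPERTIES:foo2@val2
    static String combinedFieldQuery(String field, Map<String, String> props) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add("+" + field + ":" + e.getKey() + "@" + e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the clauses come out
        // in the order the properties were added.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("foo1", "val1");
        props.put("foo2", "val2");
        System.out.println(multiFieldQuery(props));
        System.out.println(combinedFieldQuery("PROPERTIES", props));
    }
}
```

Both strings have the same number of required clauses, which is Erick's
point 2. Whether a "name@value" term survives indexing and query parsing as a
single token depends on the analyzer you use, which is his point 1.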
--
Anshum Gupta
Naukri Labs!

On Thu, Jul 31, 2008 at 8:59 PM, Sergey Kabashnyuk <ks...@gmail.com> wrote:

> Thank you Erick.
>
> I'm talking about more than 10,000 documents, 95% of which have fewer than
> 10 fields. The maximum number of fields per document is unlimited,
> but in practice it's no more than 20.
>
>
> I'm interested: does Lucene have any internal optimization
> that depends on the field count or field sizes, as databases do?
> I mean, for example, to determine the position of row X in the index:
>
> positionX = sum(fieldsize[1]+...fieldsize[i])*(X-1)
>
>
> Sergey Kabashnyuk
> eXo Platform SAS
>
>
>> I'd go with option 1 unless and until you could demonstrate performance
>> problems. Speaking of which, you'd get a more informed answer if you
>> provided a bit more data, like how many fields are we talking, how many
>> documents, etc. If you're indexing 10,000 documents, go with the simplest.
>> If you're indexing 1,000,000,000 documents, more thought is required <G>..
>> Do you expect 3 fields/doc or 30,000 fields/doc?
>>
>> But the reason I'd go with <1> is that your second option has some issues.
>> 1> how to tokenize? You'll probably have to write a custom one or risk
>>    getting tokens "name" "value" rather than "name@value".
>> 2> Forming queries is, I believe, equally complex in both cases, so
>>    choose the conceptually simplest one. Let's say you have
>>    to search on foo1:val1  and foo2:val2. In the first case this is
>>    simple +foo1:val1 +foo2:val2. For your second case, you get
>>    +bigfield:foo1@val1 +bigfield:foo2@val2. There's not much
>>    difference between the two.
>> 3> Back to my initial comment about resource usage: we don't
>>    have enough data to answer whether it makes any difference.
>>    But even if we did, you'd find the response a variation of
>>    "you'll have to try it and see" since there are so many
>>    variables.
>>
>> But I'll repeat that I always go with the simplest approach unless and
>> until I'm certain there's a problem...
>>
>> Best
>> Erick
>>
>> On Thu, Jul 31, 2008 at 10:36 AM, Sergey Kabashnyuk <ksmmlist@gmail.com> wrote:
>>
>>> The best strategy.
>>>
>>> Hello.
>>> I want to ask your opinion about how to
>>> store multiple fields of the same document.
>>>
>>> I see two possibilities:
>>> 1. Multiple fields in the document
>>> 2. One field, for example named PROPERTIES, with multiple instances,
>>>    and values combined with the name, for example "name@value"
>>>
>>> Which choice is best for search speed and resource usage?
>>>
>>> Thanks.
>>>
>>> Sergey Kabashnyuk
>>> eXo Platform SAS
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>


-- 
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............