Posted to java-user@lucene.apache.org by darren <da...@ontrenet.com> on 2008/04/15 16:51:45 UTC

Which will be faster?

Hi,
Pardon the noob question, but which approach is going to be faster
over extremely large document sets, A or B?

A) Multiple field values, Field.Store.NO, Field.Index.TOKENIZED
word: one
word: two
word: three

B) Single field value, Field.Store.NO, Field.Index.TOKENIZED
word: one two three

Thanks for the tip.
Darren


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Which will be faster?

Posted by Michael McCandless <lu...@mikemccandless.com>.
The index should be identical in these two cases as long as the  
single string yields the same tokens during analysis as the  
concatenation of the tokens from the separate strings.

So index size & search speed would be the same.
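
To illustrate the condition above, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class ToyPostings, its whitespace "analyzer", and the term-to-positions map are all assumptions made for illustration. It only shows that when both approaches yield the same token sequence, the resulting postings are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT Lucene code. It mimics a whitespace analyzer
// and a per-document postings map (term -> token positions).
class ToyPostings {

    // Split one field value into lowercase tokens on whitespace.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Build term -> positions for one field, given its values in order.
    // Positions continue from one value to the next (no gap between values).
    static Map<String, List<Integer>> index(String... values) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int position = 0;
        for (String value : values) {
            for (String token : analyze(value)) {
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
                position++;
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> a = index("one", "two", "three"); // approach A
        Map<String, List<Integer>> b = index("one two three");       // approach B
        System.out.println(a.equals(b)); // prints true
    }
}
```

If the analyzer in use splits the single string differently from the separate values, the two maps would no longer match, which is the caveat stated above.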

Mike

Darren Govoni wrote:
> I guess I meant searching the index, size of index etc.
>
> So they would search essentially the same?
>
> Sorry that wasn't clear from my original email.
>
> Darren




Re: Which will be faster?

Posted by Darren Govoni <da...@ontrenet.com>.
I guess I meant searching the index, size of index etc.

So they would search essentially the same?

Sorry that wasn't clear from my original email.

Darren

----- Original Message ----- 
From: "Erick Erickson" <er...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Tuesday, April 15, 2008 1:15 PM
Subject: Re: Which will be faster?


> I wouldn't worry about it too much, since there'll be overhead for you
> building up the string in the first place as well. I suspect that the
> time difference will be dwarfed by the indexing process. So I'd do what's
> easiest first.
> 
> Erick




Re: Which will be faster?

Posted by Erick Erickson <er...@gmail.com>.
I wouldn't worry about it too much, since there'll be overhead for you
building up the string in the first place as well. I suspect that the
time difference will be dwarfed by the indexing process. So I'd do what's
easiest first.

Erick


Re: Which will be faster?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Most likely B will be somewhat faster.

There is some small overhead to each field instance.
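
As a rough sketch of the difference in plain Java (not Lucene code): FieldInstance and FieldLayouts below are hypothetical stand-ins, used only to show that approach A creates one field instance per word while approach B joins the words first and creates a single instance. In Lucene itself each instance would be a Field added to the Document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one field instance (name plus one value).
// This is not a Lucene class.
class FieldInstance {
    final String name;
    final String value;

    FieldInstance(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class FieldLayouts {

    // Approach A: one field instance per word.
    static List<FieldInstance> perWord(String name, List<String> words) {
        List<FieldInstance> fields = new ArrayList<>();
        for (String word : words) {
            fields.add(new FieldInstance(name, word));
        }
        return fields;
    }

    // Approach B: join the words with spaces, then create a single instance.
    static List<FieldInstance> joined(String name, List<String> words) {
        List<FieldInstance> fields = new ArrayList<>();
        fields.add(new FieldInstance(name, String.join(" ", words)));
        return fields;
    }
}
```

For the three words in the question, perWord returns three instances and joined returns one whose value is "one two three"; the per-instance overhead is paid once instead of three times.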

Mike


