Posted to java-user@lucene.apache.org by Yura Smolsky <in...@altervision.biz> on 2005/02/24 19:01:49 UTC

sorted search

Hello, lucene-user.

I have an index with more than 40 million documents.
Each document has a DateField (the document's timestamp).

I only need the most recent results. I use a single instance of IndexSearcher.
When I perform a sorted search on this index:
      // Sort on the "modified" field as a STRING, newest first (reverse = true)
      Sort sort = new Sort();
      sort.setSort( new SortField[] { new SortField ("modified", SortField.STRING, true) } );
      Hits hits =
        searcher.search(QueryParser.parse("good", "content",
                                          new StandardAnalyzer()), sort);

the search is slow.

Today I tried searching without sorting by "modified", sorting by
relevance instead. It was much faster!

I think that sorting by DateField is very slow. Am I doing something
wrong with this kind of sorted search? Can you give me some advice?

Thanks.
                                          
Yura Smolsky.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re[2]: sorted search

Posted by Yura Smolsky <in...@altervision.biz>.
Hello, Erik.

About memory usage: DateField stores each date as a 9-character string
(e.g. '000ic64p7'). How much memory will such a string take?

And how much memory will an integer take?
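A rough way to compare the two is to count bytes per sorted field. The sketch below assumes that a string sort keeps one int ordinal per document plus one String per unique term (roughly the layout of Lucene's `FieldCache.StringIndex`), while an integer sort keeps just one int per document. Only the 4 bytes per Java int is fixed; the per-object sizes (`REF_BYTES`, `STRING_OVERHEAD`) are assumptions that vary by JVM, and "one unique term per document" is a hypothetical worst case for a millisecond DateField.

```java
/**
 * Back-of-the-envelope sort-cache memory estimate.
 * All per-object sizes except INT_BYTES and CHAR_BYTES are ASSUMPTIONS;
 * real values depend on the JVM (32- vs 64-bit, compressed pointers, ...).
 */
public class SortCacheEstimate {
    static final long INT_BYTES = 4;         // a Java int is always 4 bytes
    static final long CHAR_BYTES = 2;        // Java chars are UTF-16 code units
    static final long REF_BYTES = 4;         // assumed object reference size
    static final long STRING_OVERHEAD = 40;  // assumed String + char[] header/field cost

    /** Integer sort: one int per document. */
    static long intSortBytes(long maxDoc) {
        return maxDoc * INT_BYTES;
    }

    /** String sort: one int ordinal per document plus one String per unique term. */
    static long stringSortBytes(long maxDoc, long uniqueTerms, int charsPerTerm) {
        long ordinals = maxDoc * INT_BYTES;
        long lookupRefs = uniqueTerms * REF_BYTES;
        long strings = uniqueTerms * (STRING_OVERHEAD + CHAR_BYTES * charsPerTerm);
        return ordinals + lookupRefs + strings;
    }

    public static void main(String[] args) {
        long maxDoc = 40_000_000L;
        long uniqueTerms = maxDoc; // hypothetical worst case: every timestamp distinct
        System.out.println("int sort:    ~" + intSortBytes(maxDoc) / 1_000_000 + " MB");
        System.out.println("string sort: ~" + stringSortBytes(maxDoc, uniqueTerms, 9) / 1_000_000 + " MB");
    }
}
```

Under these assumptions the string sort comes out more than ten times larger than the int sort for 40 million documents; the actual ratio depends on the JVM and on how many distinct timestamps the index really holds.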

EH> Sorting by String uses up lots more RAM than a numeric sort.  If you
EH> use a numeric (yet lexicographically orderable) date format (e.g. 
EH> YYYYMMDD) you'll see better performance most likely.

EH>         Erik


Yura Smolsky.





Re[2]: sorted search

Posted by Yura Smolsky <in...@altervision.biz>.
Hello, Erik.

If I also need to store the hour and minute, should I encode the date
as an integer in the format YYYYMMDDHHII?
Will that be faster than my current solution?
And will I still be able to do range queries (from date A to date B)?
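One catch with YYYYMMDDHHII: a 12-digit value such as 200502241901 is larger than `Integer.MAX_VALUE` (2147483647), so it does not fit in a Java int (as a string it still orders correctly; the problem is the integer sort). A minimal sketch of one alternative, assuming minute precision is enough: count minutes since 1970-01-01 UTC, which fits comfortably in an int, and zero-pad it to a fixed width when indexing, since range queries compare term text as strings. The class and helper names are hypothetical, and the sketch uses `java.time` (Java 8+) for brevity.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class MinuteEncoding {
    /** Hypothetical helper: the 12-digit YYYYMMDDHHmm value; needs a long. */
    static long toYyyyMmDdHhMm(LocalDateTime t) {
        return t.getYear() * 100_000_000L
             + t.getMonthValue() * 1_000_000L
             + t.getDayOfMonth() * 10_000L
             + t.getHour() * 100L
             + t.getMinute();
    }

    /** Hypothetical helper: minutes since 1970-01-01T00:00 UTC; throws if it overflows an int. */
    static int toEpochMinutes(LocalDateTime t) {
        return Math.toIntExact(t.toEpochSecond(ZoneOffset.UTC) / 60);
    }

    /** Zero-pads to 10 digits so string (term) order matches numeric order. */
    static String toTerm(int epochMinutes) {
        return String.format("%010d", epochMinutes);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2005, 2, 24, 19, 1);
        long packed = toYyyyMmDdHhMm(t);
        System.out.println(packed + " fits in an int? " + (packed <= Integer.MAX_VALUE));
        int minutes = toEpochMinutes(t);
        System.out.println(minutes + " -> term " + toTerm(minutes));
    }
}
```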

EH> Sorting by String uses up lots more RAM than a numeric sort.  If you
EH> use a numeric (yet lexicographically orderable) date format (e.g. 
EH> YYYYMMDD) you'll see better performance most likely.

EH>         Erik


Yura Smolsky.





Re: sorted search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Sorting by String uses a lot more RAM than a numeric sort. If you
use a numeric (yet lexicographically orderable) date format (e.g.
YYYYMMDD), you'll most likely see better performance.

	Erik
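A minimal sketch of producing a YYYYMMDD key from a timestamp. It assumes UTC as the time zone (the thread does not say which zone the timestamps use), and the class and method names are hypothetical. Because every key has exactly eight digits, string order and numeric order agree, so the same value can work as a term for range queries and as an integer for sorting, e.g. with `new SortField("modified", SortField.INT, true)`.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DayKey {
    /** Hypothetical helper: formats a timestamp as an 8-digit YYYYMMDD string (UTC assumed). */
    static String dayKey(Date date) {
        // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    /** The same key as an int; 8 digits always fit (99991231 < Integer.MAX_VALUE). */
    static int dayInt(Date date) {
        return Integer.parseInt(dayKey(date));
    }

    public static void main(String[] args) {
        Date posted = new Date(1109271709000L); // 2005-02-24T19:01:49Z
        Date nextDay = new Date(posted.getTime() + 24L * 60 * 60 * 1000);
        System.out.println(dayKey(posted) + " " + dayInt(posted));
        // For fixed-width keys, lexicographic and numeric comparisons agree.
        System.out.println(dayKey(posted).compareTo(dayKey(nextDay)) < 0);
        System.out.println(dayInt(posted) < dayInt(nextDay));
    }
}
```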






Re: sorted search

Posted by Daniel Naber <da...@t-online.de>.
On Thursday 24 February 2005 19:01, Yura Smolsky wrote:

>       sort.setSort( new SortField[] { new SortField ("modified",
> SortField.STRING, true) } );

You should store the date as a number, e.g. "days since 1970" (or weeks, if
that is precise enough), and then tell the sort that it's an integer.
DateField always stores the date with millisecond precision, which leads to
a large number of unique terms, and it also turns the date into a string.
Both make searching, and especially sorting, slower.

Regards
 Daniel
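To illustrate why coarser units shrink the number of terms, the sketch below counts distinct values for the same timestamps at millisecond, day and week granularity. The data (one hypothetical document per minute for 30 days from 2005-01-01 UTC) is synthetic and the helper names are made up; weeks are counted from the epoch, so they start on Thursdays. The resulting int would be indexed as the field's term and sorted with `SortField.INT`.

```java
import java.util.HashSet;
import java.util.Set;

public class TermGranularity {
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Hypothetical helper: whole days since 1970-01-01 UTC. */
    static int daysSinceEpoch(long millis) {
        return (int) Math.floorDiv(millis, MS_PER_DAY);
    }

    /** Hypothetical helper: whole weeks since the epoch (weeks begin on Thursday). */
    static int weeksSinceEpoch(long millis) {
        return Math.floorDiv(daysSinceEpoch(millis), 7);
    }

    /** Counts distinct ms, day and week values for n timestamps spaced `step` ms apart. */
    static int[] distinctCounts(long start, long step, int n) {
        Set<Long> ms = new HashSet<>();
        Set<Integer> days = new HashSet<>();
        Set<Integer> weeks = new HashSet<>();
        for (int i = 0; i < n; i++) {
            long t = start + i * step;
            ms.add(t);
            days.add(daysSinceEpoch(t));
            weeks.add(weeksSinceEpoch(t));
        }
        return new int[] { ms.size(), days.size(), weeks.size() };
    }

    public static void main(String[] args) {
        long start = 12784L * MS_PER_DAY; // 2005-01-01T00:00Z
        int[] c = distinctCounts(start, 60_000L, 30 * 24 * 60); // one doc per minute, 30 days
        System.out.println("ms terms: " + c[0] + ", day terms: " + c[1] + ", week terms: " + c[2]);
    }
}
```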
