Posted to java-user@lucene.apache.org by Yura Smolsky <in...@altervision.biz> on 2005/02/24 19:01:49 UTC
sorted search
Hello, lucene-user.
I have an index with many documents, more than 40 million.
Each document has a DateField (the document's timestamp).
I need only the most recent results. I use a single instance of IndexSearcher.
When I perform a sorted search on this index:
Sort sort = new Sort();
// Sort on the "modified" field as a string, reversed so the newest documents come first
sort.setSort( new SortField[] { new SortField ("modified", SortField.STRING, true) } );
Hits hits =
searcher.search(QueryParser.parse("good", "content",
new StandardAnalyzer()), sort);
then the search is slow.
Today I tried the search without "sort by modified", sorting by
relevance instead. It was much faster!
I think that sorting by DateField is very slow. Maybe I am doing something
wrong with this kind of sorted search? Can you give me any advice about
this?
Thanks.
Yura Smolsky.
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re[2]: sorted search
Posted by Yura Smolsky <in...@altervision.biz>.
Hello, Erik.
About memory usage:
DateField takes a 9-character string in memory ('000ic64p7').
How much memory will this string take?
How much memory would an integer take?
EH> Sorting by String uses up lots more RAM than a numeric sort. If you
EH> use a numeric (yet lexicographically orderable) date format (e.g.
EH> YYYYMMDD) you'll see better performance most likely.
EH> Erik
Yura Smolsky.
Re[2]: sorted search
Posted by Yura Smolsky <in...@altervision.biz>.
Hello, Erik.
If I need to store the hour and minute as well, do I need to put the date into
the following integer format?
YYYYMMDDHHII
Will it be faster than the current solution?
And will I still be able to do range queries (from date A to date B)?
EH> Sorting by String uses up lots more RAM than a numeric sort. If you
EH> use a numeric (yet lexicographically orderable) date format (e.g.
EH> YYYYMMDD) you'll see better performance most likely.
EH> Erik
Yura Smolsky.
Re: sorted search
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Sorting by String uses up lots more RAM than a numeric sort. If you
use a numeric (yet lexicographically orderable) date format (e.g.
YYYYMMDD) you'll see better performance most likely.
Erik
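A minimal sketch of the YYYYMMDD suggestion above, in plain Java with no Lucene dependency. The class and method names (DateKey, toYyyyMmDd, toTerm) are hypothetical, and java.time is used only for brevity. It illustrates what "numeric yet lexicographically orderable" means: the zero-padded 8-digit text sorts the same way as the number, so the same value should serve both an integer sort and a string-based range comparison. It also shows that a 12-digit value with hours and minutes appended no longer fits in a 32-bit int.

```java
import java.time.LocalDate;

public class DateKey {
    // Encode a calendar date as the integer YYYYMMDD (2005-02-24 -> 20050224).
    public static int toYyyyMmDd(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Fixed-width, zero-padded term text, so string order equals numeric order.
    public static String toTerm(LocalDate date) {
        return String.format("%08d", toYyyyMmDd(date));
    }

    public static void main(String[] args) {
        LocalDate newer = LocalDate.of(2005, 2, 24);
        LocalDate older = LocalDate.of(2004, 12, 31);

        System.out.println(toYyyyMmDd(newer));                          // 20050224
        System.out.println(toYyyyMmDd(newer) > toYyyyMmDd(older));      // true
        System.out.println(toTerm(newer).compareTo(toTerm(older)) > 0); // true

        // A 12-digit YYYYMMDDHHMM value exceeds Integer.MAX_VALUE (2147483647).
        System.out.println(200502241901L > Integer.MAX_VALUE);          // true
    }
}
```

With day resolution there is at most one distinct term per calendar day, which is far fewer than with millisecond timestamps.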
Re: sorted search
Posted by Daniel Naber <da...@t-online.de>.
On Thursday 24 February 2005 19:01, Yura Smolsky wrote:
> sort.setSort( new SortField[] { new SortField ("modified",
> SortField.STRING, true) } );
You should store the date as a number, e.g. "days since 1970" (or weeks if
that is precise enough) and then tell the sort that it's an integer.
DateField always stores the date in milliseconds, which leads to a large
number of terms. It also turns the date into a string. Both make searching
and especially sorting slower.
Regards
Daniel
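A minimal sketch of the "days since 1970" idea above, in plain Java with no Lucene dependency. The class and method names (EpochDays, daysSinceEpoch, minutesSinceEpoch) are hypothetical, and UTC is assumed. The minute-resolution variant is an addition for the case where hours and minutes matter. Both values fit in a 32-bit int, so they could presumably be sorted with SortField.INT in place of SortField.STRING.

```java
import java.time.Instant;

public class EpochDays {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
    static final long MILLIS_PER_MINUTE = 60L * 1000;

    // Whole days since 1970-01-01 (UTC) for a millisecond timestamp.
    public static int daysSinceEpoch(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    // Whole minutes since 1970-01-01 (UTC); throws ArithmeticException if it overflows int.
    public static int minutesSinceEpoch(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_MINUTE));
    }

    public static void main(String[] args) {
        // Timestamp of the original post, taken as UTC for illustration.
        long ts = Instant.parse("2005-02-24T19:01:49Z").toEpochMilli();
        System.out.println(daysSinceEpoch(ts));    // 12838
        System.out.println(minutesSinceEpoch(ts)); // 18487861
    }
}
```

Coarser units mean fewer distinct terms in the index, which is the trade-off Daniel describes: choose the coarsest resolution that is still precise enough.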