Posted to solr-user@lucene.apache.org by Eloi Rocha <el...@melontech.com.br> on 2010/08/05 15:32:40 UTC

Process entire result set

Hi everybody,

I would like to know whether it makes sense to use Solr in the following
scenario:
  - search for a large amount of data (like 1000, 10000, or 100000 records)
  - each record contains four or five fields (strings and integers)
  - every request asks for the entire result set (I can paginate the
results, but it would be much better to get all results at once)
  - we need to process the entire set in order to decide which records will be
returned
  - this kind of request will happen frequently, from several machines (several
transactions per second)
  - Solr machines and requesting machines will be in the same cluster
  - we would like to get the entire result set in less than 500 ms.

Thanks in advance,

Eloi

Re: Process entire result set

Posted by Eloi Rocha <el...@melontech.com.br>.
Thanks Jonathan!

We decided to generate the results offline and store them in a non-SQL store
(HBase), so we can answer each request by selecting one of the
offline-generated results. These offline results are regenerated every day.
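
A rough sketch of that idea, only to illustrate the shape of it: a batch
step filters and groups the records once, and the request path becomes a
plain key lookup. Everything named here is hypothetical. The function
names, the sample records, and the grouping key are made up, and a Python
dict stands in for HBase; nothing below uses the real HBase client API.

```python
import datetime


def build_offline_results(records, group_key, keep):
    """Batch step: keep the records that pass `keep`, grouped by `group_key`."""
    results = {}
    for record in records:
        if keep(record):
            results.setdefault(record[group_key], []).append(record)
    return results


def answer_request(results_by_day, day, key):
    """Request step: a plain lookup in that day's precomputed results."""
    return results_by_day.get(day, {}).get(key, [])


# Hypothetical sample data; the dict `store` stands in for the HBase table.
records = [
    {"id": 1, "category": "books", "price": 12},
    {"id": 2, "category": "books", "price": 90},
    {"id": 3, "category": "music", "price": 8},
]
today = datetime.date(2010, 8, 6)
store = {
    today: build_offline_results(records, "category", lambda r: r["price"] < 50)
}

books = answer_request(store, today, "books")
```

The expensive "process the entire set" work happens once per day in
build_offline_results, so the request-time cost no longer depends on the
size of the result set being filtered.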

Thanks!

Eloi

On Thu, Aug 5, 2010 at 8:59 PM, Jonathan Rochkind <ro...@jhu.edu> wrote:

> Eloi Rocha wrote:
>
>> Hi everybody,
>>
>> I would like to know whether it makes sense to use Solr in the following
>> scenario:
>>  - search for a large amount of data (like 1000, 10000, or 100000 records)
>>  - each record contains four or five fields (strings and integers)
>>  - every request asks for the entire result set (I can paginate the
>> results, but it would be much better to get all results at once) [...]
>>
>>
>
> Depends on what kinds of searching you're doing. Are you doing searching
> that needs an indexer like Solr?  Then Solr is a good tool for your job.
>  Are you not, and you can do what you want just as easily in an rdbms or
> non-sql store like MongoDB? Then I wouldn't use Solr.
>
> Assuming you really do need Solr, I think this should work, but I would not
> store the actual stored fields in Solr, I'd store those fields in an
> external store (key-value store, rdbms, whatever).   You store only what you
> need to index in Solr, you do your search, you get IDs back.  You ask for
> the entire result set back, why not.  If you give Solr enough RAM, and set
> your cache settings appropriately (really big document and related caches),
> then I _think_ it should perform okay. One way to find out.
>
> What you'd get back is just IDs, then you'd look up each ID in your
> external store to get the actual fields you want to operate on. _May_ not
> be necessary, maybe you could do it with Solr stored fields, but making Solr
> do only exactly what you really need from it (an index) will maximize its
> ability to do what you need in available RAM.
>
> If you don't need Solr/Lucene indexing/faceting behavior, and you can do
> just fine with an rdbms or non-sql store, use that.
>
> Jonathan
>



-- 
Eloi Rocha Neto
Melon Tech - http://melontech.com.br
+55 83 8868-7025

Re: Process entire result set

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Eloi Rocha wrote:
> Hi everybody,
>
> I would like to know whether it makes sense to use Solr in the following
> scenario:
>   - search for a large amount of data (like 1000, 10000, or 100000 records)
>   - each record contains four or five fields (strings and integers)
>   - every request asks for the entire result set (I can paginate the
> results, but it would be much better to get all results at once) [...]
>   

Depends on what kinds of searching you're doing. Are you doing searching 
that needs an indexer like Solr?  Then Solr is a good tool for your job. 
  Are you not, and you can do what you want just as easily in an rdbms 
or non-sql store like MongoDB? Then I wouldn't use Solr.

Assuming you really do need Solr, I think this should work, but I would 
not store the actual stored fields in Solr, I'd store those fields in an 
external store (key-value store, rdbms, whatever).   You store only what 
you need to index in Solr, you do your search, you get IDs back.  You 
ask for the entire result set back, why not.  If you give Solr enough 
RAM, and set your cache settings appropriately (really big document and 
related caches), then I _think_ it should perform okay. One way to find 
out.

What you'd get back is just IDs, then you'd look up each ID in your 
external store to get the actual fields you want to operate on. _May_ 
not be necessary, maybe you could do it with Solr stored fields, but 
making Solr do only exactly what you really need from it (an index) will 
maximize its ability to do what you need in available RAM.
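
Roughly, the IDs-only flow could look like the sketch below. It is untested 
against a real server and makes several assumptions: the unique key field 
is called "id", the base URL is a local default, the response is requested 
as JSON (wt=json), and a plain dict stands in for the external store. The 
function names are made up for illustration. The q, fl, rows and wt 
parameters are standard Solr query parameters.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, max_rows):
    """Build a Solr /select URL that asks for the `id` field only."""
    params = {
        "q": query,
        "fl": "id",        # return IDs only, no other stored fields
        "rows": max_rows,  # upper bound on documents in a single response
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def extract_ids(response_body):
    """Pull the document IDs out of a Solr JSON response body."""
    payload = json.loads(response_body)
    return [doc["id"] for doc in payload["response"]["docs"]]


def fetch_records(ids, store):
    """Look up each ID in the external store, skipping IDs it does not hold."""
    return [store[i] for i in ids if i in store]


# Hypothetical stand-ins: a canned response instead of a live Solr call,
# and a dict instead of a real key-value store or rdbms.
sample_response = json.dumps({
    "response": {"numFound": 2, "start": 0, "docs": [{"id": "a1"}, {"id": "b2"}]}
})
external_store = {
    "a1": {"name": "first", "score": 10},
    "b2": {"name": "second", "score": 7},
}

url = build_select_url("http://localhost:8983/solr", "name:fi*", 100000)
records = fetch_records(extract_ids(sample_response), external_store)
```

In a real setup the body passed to extract_ids would come from an HTTP GET 
on the built URL, and whether this fits in 500 ms for 100000 rows is 
something only a test on your own data will tell you.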

If you don't need Solr/Lucene indexing/faceting behavior, and you can do 
just fine with an rdbms or non-sql store, use that.

Jonathan