Posted to solr-user@lucene.apache.org by Jame Vaalet <jv...@capitaliq.com> on 2011/07/05 10:37:07 UTC

searching a subset of SOLR index

Hi,
Let's say I have 10^10 documents in an index, with the unique id being a document id assigned to each of them from 1 to 10^10.
Now I want to run a particular query string against a subset of these documents, say document ids 100 to 1000.

The question is: will Solr be able to search just this set of documents rather than the entire index? If yes, what should the query be to limit the search to this subset?

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ


RE: searching a subset of SOLR index

Posted by Pierre GOSSE <pi...@arisem.com>.
It is redundancy. You have to balance the cost of that redundancy against the performance cost of having your Windows service query the same index that serves your website. If your Windows service is not too aggressive in its requests, go for shards.

Pierre


Re: searching a subset of SOLR index

Posted by Erik Hatcher <er...@gmail.com>.
I wouldn't share the same index across two Solr webapps, as they could step on each other's toes.

In this scenario, I think having two Solr instances replicating from the same master is the way to go, since it lets you scale the load from each application separately.

	Erik
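
A minimal sketch of what this could look like from the client side, assuming each application is pointed at its own replica. The host names, the application keys and the select_url helper are hypothetical and not from this thread; setting up the replication itself is not shown.

```python
from urllib.parse import urlencode

# Hypothetical replica addresses; both are assumed to replicate from the same master.
REPLICAS = {
    "website": "http://solr-web:8983/solr",
    "windows_service": "http://solr-batch:8983/solr",
}

def select_url(app, params):
    """Build a select URL against the replica dedicated to the given application."""
    return f"{REPLICAS[app]}/select?{urlencode(params)}"

# Each application only ever talks to its own instance, so heavy batch
# traffic from the service never reaches the instance serving the website.
web_url = select_url("website", {"q": "body:lucene"})
batch_url = select_url("windows_service", {"q": "body:lucene"})
print(web_url)
print(batch_url)
```

Because the two applications never share an instance, each replica can be sized and tuned for its own load.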





RE: searching a subset of SOLR index

Posted by Jame Vaalet <jv...@capitaliq.com>.
But in case the website docs make up around 50% of all the docs, why recreate the indexes? Don't you think that is redundant?
Can two web apps (Solr instances) share a single index file and search it without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ



RE: searching a subset of SOLR index

Posted by Pierre GOSSE <pi...@arisem.com>.
From what you tell us, I guess a separate index for the website docs would be best. If you fear that requests from the Windows service would cripple your website's performance, why not have a totally separate index on another server, and index your website documents in both indexes?

Pierre


RE: searching a subset of SOLR index

Posted by Jame Vaalet <jv...@capitaliq.com>.
I have two applications:

1. Website
	The website lets any user search the document repository; the set they search is known as "website presentable".
2. Windows service
	The Windows service searches all the documents in the repository for a fixed set of keywords and stores the results in a database. This set is the universal set of documents in the doc repository, including the website-presentable ones.


The website is a high-priority app that should work smoothly without any interference, whereas the Windows service should run continuously all day long to save results from incoming docs.
The problem is that the website set is predefined, and I don't want the Windows service's requests to Solr to slow down website requests.

Suppose I segregate the website-presentable docs into one core and the rest into a different core; will that solve the problem?
I have also read about using multiple ports to listen for requests from different apps. Can this be used?



Regards,
JAME VAALET



RE: searching a subset of SOLR index

Posted by Pierre GOSSE <pi...@arisem.com>.
The limit will always be logical if you have all documents in the same index. But filters are very efficient when working with a subset of your index, especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same, maybe you could use shards. But we would need to know more about what you intend to do in order to point you to an adequate solution.

Pierre
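
To illustrate the filter approach, here is a minimal sketch that sends the subset restriction in Solr's fq (filter query) parameter, separate from the main q query, so the same cached filter can be reused across different searches. The base URL, the body field and the filtered_select_url helper are assumptions for illustration, and the id field is assumed to be indexed as a numeric type so the range compares numerically.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://localhost:8983/solr"  # assumed host and path
SUBSET_FILTER = "id:[100 TO 1000]"       # assumes a numeric id field

def filtered_select_url(text_query):
    """Build a select URL whose subset restriction travels in fq, not in q."""
    # q drives matching and scoring; fq only restricts the result set,
    # and Solr can cache the set of documents matching each distinct fq.
    params = [("q", text_query), ("fq", SUBSET_FILTER)]
    return f"{BASE_URL}/select?{urlencode(params)}"

# Two different searches reuse exactly the same filter string.
urls = [filtered_select_url(q) for q in ("body:lucene", "body:solr")]
for u in urls:
    print(parse_qs(urlparse(u).query))
```

Keeping the filter string identical from one request to the next is what allows the cache to be hit.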


RE: searching a subset of SOLR index

Posted by Jame Vaalet <jv...@capitaliq.com>.
Thanks.
But does this range query limit the universe only logically, or does it have a mechanism to limit it physically as well? Do we save any time by using the range query?

Regards,
JAME VAALET



Re: searching a subset of SOLR index

Posted by Shashi Kant <sk...@sloan.mit.edu>.
Range query
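
As a rough sketch of what such a query might look like for the example in the question: the base URL, the body field name and the range_query_url helper below are assumptions for illustration, and the id field is assumed to be indexed as a numeric type (on a string field the range would compare lexicographically).

```python
from urllib.parse import parse_qs, urlencode, urlparse

def range_query_url(base_url, text_query, id_field, low, high):
    """Build a Solr select URL restricting text_query to an inclusive id range."""
    # Square brackets make both ends of the range inclusive in the query syntax.
    q = f"({text_query}) AND {id_field}:[{low} TO {high}]"
    return f"{base_url}/select?{urlencode({'q': q})}"

url = range_query_url("http://localhost:8983/solr", "body:lucene", "id", 100, 1000)
print(url)

# Decode the URL again to show the query string Solr would receive.
decoded_q = parse_qs(urlparse(url).query)["q"][0]
print(decoded_q)
```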

