Posted to solr-user@lucene.apache.org by rajan chandi <ch...@gmail.com> on 2009/09/02 11:47:48 UTC

A very complex search problem.

Hi All,

We are dealing with a very complex problem: person-specific search.

We're building a social network where people will post stuff, and other users
should be able to see content only from their contacts.

E.g. there are 10,000 users in the system and only 150 users in my
network.
I should be able to search across only those 150 users' content.

Is there an easy way to approach this problem?

We've come up with a few approaches:


   - Storing the relationship in each document.
   - A huge ORed query with the IDs of all the people whose content needs to
   be searched.
   - Running the query and then filtering the results against the list of
   contacts.

None of these approaches sounds plausible.

We have already gone through the recently released book on Solr 1.4 Enterprise
Search. The book doesn't seem to have any pointers either.

Any good approach/pointers will help.

Thanks and regards
Rajan Chandi

Re: A very complex search problem.

Posted by rajan chandi <ch...@gmail.com>.
Gérard and Birger, thank you for your quick responses.

In our situation, users will tend to upload content more often than they find
new friends.

We are currently considering ORing the contacts on the fly as
part of the search query.

Please correct me if I am wrong, but here is what I understand from Birger:

e.g. if a user uploads a document, we need to put all her contact IDs on the
document as tags. This will boost search performance.


Thanks and regards
Rajan Chandi


RE: A very complex search problem.

Posted by "Lie, Birger" <Bi...@expert.no>.
Hi,
If you store all mutual relations in a database, many of the relations will overlap. The overlapping values are easy to find using DISTINCT clauses in SQL. Use them as tags on the documents. That way you gain a great deal of performance at search time; the obvious cost is that updating documents becomes slower.

This works well if users tend to upload more often than they find new "friends"...




Re: A very complex search problem.

Posted by Gérard Dupont <ge...@gmail.com>.
Hi,

The big OR query should be the easiest way, and it may work up to ~1000 users
(i.e. by default you can specify 1024 boolean clauses, so up to N users in the
OR, where N = 1024 - (number of boolean clauses in the rest of your query)).
You can increase this limit on boolean clauses in the configuration, but I
guess raising it too far gets painful. I know that colleagues of mine worked
on Lucene with boolean queries of up to ~500 clauses, under strict response
time constraints and with many-GB indexes, and it was working fine. I guess
Solr will work in the same way.
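
A minimal sketch of the clause budget described above, assuming a
hypothetical schema field named author_id that holds the ID of the user who
posted each document. The function name and its parameters are illustrative
and do not come from the thread; 1024 is the default limit cited above.

```python
from typing import Iterable

# Default clause limit cited in the thread (Solr's maxBooleanClauses setting).
DEFAULT_MAX_BOOLEAN_CLAUSES = 1024


def contact_clause(contact_ids: Iterable[int],
                   field: str = "author_id",
                   other_clauses: int = 0,
                   limit: int = DEFAULT_MAX_BOOLEAN_CLAUSES) -> str:
    """Build 'field:(id1 OR id2 ...)', refusing lists that exceed the limit.

    field is a hypothetical schema field holding the author's user ID.
    other_clauses is the number of boolean clauses already used by the rest
    of the query, so the budget is N = limit - other_clauses.
    """
    ids = sorted(set(contact_ids))  # de-duplicate and keep a stable order
    if not ids:
        raise ValueError("contact list is empty")
    budget = limit - other_clauses
    if len(ids) > budget:
        raise ValueError(
            f"{len(ids)} contacts exceed the budget of {budget} clauses")
    joined = " OR ".join(str(i) for i in ids)
    return f"{field}:({joined})"


# Example request parameters: the user's text goes in q and the contact
# restriction goes in fq, so it filters results without affecting scores.
params = {"q": "holiday photos", "fq": contact_clause([42, 7, 19])}
```

Sending the restriction as a filter query (fq) rather than appending it to q
is a choice made for this sketch, not something suggested in the thread.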




-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document & Learning team - LITIS Laboratory

Re: A very complex search problem.

Posted by rajan chandi <ch...@gmail.com>.
Many thanks for your input, Aakash!
We'll do some research and possibly run some benchmarks before we move
forward.

Regards
Rajan


Re: A very complex search problem.

Posted by Aakash Dharmadhikari <aa...@gmail.com>.
hi Rajan,

  More knowledgeable people might be able to provide better insight into
the performance issues, but I have doubts about this ORing business.

  The best option I see is storing all my friends' IDs on my documents in a
multi-valued field. In contrast to OR queries, this would make querying very
fast, because the number of terms to match is reduced to one per document.
With ORing, if I have 150 friends, there would be 150 terms to match against
every document that is not one of my friends' documents.

It would certainly increase the size of your index a bit, but compared with
the query-time effort of ORing, this might be far more efficient.
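
A rough sketch of the multi-valued field idea, under the assumption of a
hypothetical field called visible_to (it would have to be declared
multiValued="true" in the schema). All names here are illustrative. The last
helper shows the cost side: when an author's friend list changes, every
document by that author has to be re-indexed.

```python
from typing import Dict, Iterable, List


def build_document(doc_id: str, author_id: int, text: str,
                   friend_ids: Iterable[int]) -> Dict[str, object]:
    """Index time: copy the author's friend IDs onto the document."""
    return {
        "id": doc_id,
        "author_id": author_id,
        "text": text,
        # Hypothetical multi-valued field listing who may see the document.
        "visible_to": sorted(set(friend_ids)),
    }


def visibility_filter(searcher_id: int) -> str:
    """Query time: a single term, however many friends the searcher has."""
    return f"visible_to:{searcher_id}"


def documents_to_reindex(docs: List[Dict[str, object]],
                         author_id: int) -> List[str]:
    """IDs of the documents that go stale when this author's friends change."""
    return [str(d["id"]) for d in docs if d["author_id"] == author_id]


docs = [
    build_document("d1", 7, "holiday photos", [42, 19]),
    build_document("d2", 7, "new job", [42, 19]),
    build_document("d3", 19, "concert tickets", [7]),
]
```

With these documents, user 42 would search with the filter visible_to:42,
and a change to user 7's friend list would mean re-indexing d1 and d2.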

regards,
aakash




Re: A very complex search problem.

Posted by rajan chandi <ch...@gmail.com>.
Thank you Birger for the pointer to HBase.

HBase sounds interesting. We will consider it for the "people you may
know" feature.

We are trying to address a different problem: searching within a
well-defined list of contacts.
A huge ORed query sounds like a good solution at this point.

Thanks and regards
Rajan Chandi



RE: A very complex search problem.

Posted by "Lie, Birger" <Bi...@expert.no>.
Hi,
I might have been unclear about what I meant.


Usually people have friends in common, so if you
1) create and store a relationship between users x and y, and give that
an id, and
2) x knows z, then there is a probability that y knows z as well.

If that is the case, then add z to the relation, and you don't need to
update documents.


The important thing is to create some sort of relationship concept so
you don't end up with N users and N relations...
In that case, when you search, you only have 1 AND clause
instead of 3.
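
One possible reading of the relationship concept above, sketched under loud
assumptions: documents are tagged with relation IDs rather than user IDs, in
a hypothetical field called relation_ids, and a searcher filters on the
relations he or she belongs to. The data layout and every name below are
made up for illustration; the thread does not spell out the design.

```python
from typing import Dict, List, Set


def relations_of(user: str, relations: Dict[str, Set[str]]) -> List[str]:
    """IDs of the relations the user is a member of, in sorted order."""
    return sorted(rid for rid, members in relations.items() if user in members)


def tag_document(doc_id: str, author: str,
                 relations: Dict[str, Set[str]]) -> Dict[str, object]:
    """Index time: tag the document with the author's relation IDs."""
    return {"id": doc_id, "author": author,
            "relation_ids": relations_of(author, relations)}


def relation_filter(user: str, relations: Dict[str, Set[str]]) -> str:
    """Query time: one clause per relation the searcher belongs to."""
    rids = relations_of(user, relations)
    if not rids:
        raise ValueError(f"{user} belongs to no relation")
    return "relation_ids:(" + " OR ".join(rids) + ")"


# Relation r1 initially links x and y; x posts a document.
relations: Dict[str, Set[str]] = {"r1": {"x", "y"}}
doc = tag_document("d1", "x", relations)

# z joins the relation: only the relation changes, the document does not.
relations["r1"].add("z")
z_filter = relation_filter("z", relations)
```

The trade-off in this reading is that visibility becomes group-based: once z
is in r1, z sees everything tagged r1, so it only fits groups whose members
really do all know each other.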


I think Hadoop running HBase is ideal for this application. Facebook is
using HBase (I think). CouchDB is also excellent...


-Birger







Re: A very complex search problem.

Posted by rajan chandi <ch...@gmail.com>.
Hi Gwk,

Thanks for the pointers.
My only concern is relevance.
Lucene has the best relevance capability so far. CouchDB does sound
interesting, though.

Maybe we'll try to find some benchmarks on CouchDB's relevance scoring.

Thanks and Regards
Rajan Chandi


Re: A very complex search problem.

Posted by gwk <gi...@eyefi.nl>.
Hello Rajan,

I might be mistaken, but isn't CouchDB or a similar map/reduce database 
ideal for situations like this?

Regards,

gwk
