Posted to solr-user@lucene.apache.org by Michael Tyler <mi...@gmail.com> on 2013/10/13 16:34:10 UTC

SolrDocumentList - bitwise operation

Hello,

    I have two different Solr indexes, each returning its own
SolrDocumentList. The document ID is the foreign key that relates them.

After obtaining both lists, I want to perform an "AND" operation between
them and then return the results to the user. Can you tell me how to do
this? I am using Solr 4.3.

 SolrDocumentList results1 = responseA.getResults();
 SolrDocumentList results2 = responseB.getResults();

results1 : d1, d2, d3
results2 : d1, d2, d4

Return   : d1, d2

Regards,
Michael

Re: SolrDocumentList - bitwise operation

Posted by Liu Bo <di...@gmail.com>.
A join query might be helpful: http://wiki.apache.org/solr/Join

A join can work across indexes, but it probably won't work in SolrCloud.

Be aware that only the "to" documents are retrievable; if you want content
from both sets of documents, a join query won't work. Also, in Lucene the
join query doesn't quite work with multiple join conditions; I haven't
tested that in Solr yet.

I had a join case similar to yours, and eventually I chose to denormalize
our data into one set of documents.
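
For reference, a minimal sketch of what such a cross-core join request
might look like, built as a plain URL so it runs without SolrJ. The core
names coreA and coreB, the base URL, and the class and method names are
hypothetical. It assumes both cores live in the same Solr instance, since
as far as I know fromIndex names another core on the same node, and that
both cores use "id" as the key. The field queries (text:test,
content:test) are only sample values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Solr join query string and a /select URL.
final class JoinQuerySketch {
    private JoinQuerySketch() {}

    // Produces e.g. {!join from=id to=id fromIndex=coreB}text:test
    static String joinQuery(String fromField, String toField,
                            String fromIndex, String innerQuery) {
        return "{!join from=" + fromField + " to=" + toField
                + " fromIndex=" + fromIndex + "}" + innerQuery;
    }

    // Assembles the request URL with URL-encoded parameter values.
    static String selectUrl(String baseUrl, String q, String fq, String fl) {
        return baseUrl + "/select?q=" + enc(q) + "&fq=" + enc(fq) + "&fl=" + enc(fl);
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Documents in coreA whose id matches a coreB document hitting text:test,
        // further restricted to coreA documents matching content:test.
        String q = joinQuery("id", "id", "coreB", "text:test");
        System.out.println(
                selectUrl("http://localhost:8080/solr/coreA", q, "content:test", "id,name,type"));
    }
}
```

The request is sent to the "to" core (coreA here), which is why only its
fields can be returned.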


On 13 October 2013 22:34, Michael Tyler <mi...@gmail.com> wrote:

> Hello,
>
>     I have 2 different solr indexes returning 2 different sets of
> SolrDocumentList. Doc Id is the foreign key relation.
>
> After obtaining them, I want to perform "AND" operation between them and
> then return results to user. Can you tell me how do I get this? I am using
> solr 4.3
>
>  SolrDocumentList results1 = responseA.getResults();
>  SolrDocumentList results2 = responseB.getResults();
>
> results1  : d1, d2, d3
> results2  :  d1,d2, d4
>
> Return : d1, d2
>
> Regards,
> Michael
>



-- 
All the best

Liu Bo

Re: SolrDocumentList - bitwise operation

Posted by Michael Tyler <mi...@gmail.com>.
Hi,

   Apologies, I was confused about bitsets. I already have Shawn's
suggested approach in my system. I want to try other approaches and
compare performance.

How can I use a join? I have two different Solr indexes:
localhost:8080/solr_1/select?q=content:test&fl=id,name,type
localhost:8081/solr_1_1/select?q=text:test&fl=id

After getting the results, I want to join them by id.

How do I do this? Please suggest other ways to do it. The current
method is taking a lot of time.

Thanks
Michael.

On Tue, Oct 15, 2013 at 11:41 PM, Erick Erickson <er...@gmail.com> wrote:

> Why do you think a bitset would help? Bitsets have
> a bit set on for every document that matches
> based on the _internal_ Lucene document ID, it
> has nothing to do with the <uniqueKey> you have
> defined. Nor does it have anything to do with the
> foreign key relationship.
>
> So either I don't understand the problem at all or
> pursuing bitsets is a red herring.
>
> You might be substantially faster by sorting the
> results and then doing a skip-list sort of thing.
>
> FWIW,
> Erick

Re: SolrDocumentList - bitwise operation

Posted by Erick Erickson <er...@gmail.com>.
Why do you think a bitset would help? Bitsets have
a bit set on for every document that matches
based on the _internal_ Lucene document ID, it
has nothing to do with the <uniqueKey> you have
defined. Nor does it have anything to do with the
foreign key relationship.

So either I don't understand the problem at all or
pursuing bitsets is a red herring.

You might be substantially faster by sorting the
results and then doing a skip-list sort of thing.

FWIW,
Erick
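
The sort-then-skip idea above can be sketched as a sorted merge: sort both
id lists, then walk them with two cursors, advancing whichever side is
smaller. This is a plain two-pointer intersection rather than a real skip
list, the class and method names are made up for illustration, and it
assumes the ids are Strings compared in natural order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: intersects two id lists by sorting and merging.
final class SortedIdIntersection {
    private SortedIdIntersection() {}

    static List<String> intersectSorted(List<String> idsA, List<String> idsB) {
        // Sort copies so the callers' lists are left untouched.
        List<String> a = new ArrayList<>(idsA);
        List<String> b = new ArrayList<>(idsB);
        Collections.sort(a);
        Collections.sort(b);

        List<String> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                i++;            // a is behind, skip ahead in a
            } else if (cmp > 0) {
                j++;            // b is behind, skip ahead in b
            } else {
                String id = a.get(i);
                // Emit each shared id once, even if a list repeats it.
                if (common.isEmpty() || !common.get(common.size() - 1).equals(id)) {
                    common.add(id);
                }
                i++;
                j++;
            }
        }
        return common;
    }
}
```

If both queries already return their results ordered by id in the same
order that compareTo uses, the two sort calls could presumably be dropped.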


On Mon, Oct 14, 2013 at 1:47 PM, Michael Tyler <mi...@gmail.com> wrote:

> Hi Shawn,
>
>   This is time consuming operation. I already have this in my application .
> I was pondering whether I can get bit set from both the solr indexes ,
> bitset.and  then retrieve only those matched? I don't know how do I
> retrieve bitset. - wanted to try this and test the performance.
>
>
> Regards
> Michael

Re: SolrDocumentList - bitwise operation

Posted by Michael Tyler <mi...@gmail.com>.
Hi Shawn,

  This is a time-consuming operation, and I already have it in my
application. I was wondering whether I can get a bitset from each of the
Solr indexes, apply bitset.and, and then retrieve only the documents that
match. I don't know how to retrieve the bitsets, though; I wanted to try
this and test the performance.


Regards
Michael


On Sun, Oct 13, 2013 at 8:54 PM, Shawn Heisey <so...@elyograg.org> wrote:

> The SolrDocumentList class extends ArrayList<SolrDocument>, which means
> that it inherits all ArrayList functionality.  Unfortunately, there's no
> built-in way of eliminating duplicates with a java List.  It's very easy
> to combine the two results into another object, but that object will
> contain both of the d1 and both of the d2 SolrDocument objects.
>
> The following code is a reasonably fast way to handle this.  It assumes
> that results1 is the list that should win when there are duplicates, so
> it gets added first.  It assumes that the uniqueKey field is named "id"
> and that it contains a String value.  If these are incorrect
> assumptions, you can adjust the code accordingly.
>
> SolrDocumentList results1 = responseA.getResults();
> SolrDocumentList results2 = responseB.getResults();
> List<SolrDocumentList> tmpList = new ArrayList<SolrDocumentList>();
> tmpList.add(results1);
> tmpList.add(results2);
>
> Set<String> tmpSet = new HashSet<String>();
> SolrDocumentList newList = new SolrDocumentList();
> for (SolrDocumentList l : tmpList)
> {
>         for (SolrDocument d : l)
>         {
>                 String id = (String) d.get("id");
>                 if (tmpSet.contains(id)) {
>                         continue;
>                 }
>                 tmpSet.add(id);
>                 newList.add(d);
>         }
> }
>
> Thanks,
> Shawn
>
>

Re: SolrDocumentList - bitwise operation

Posted by Shawn Heisey <so...@elyograg.org>.
On 10/13/2013 8:34 AM, Michael Tyler wrote:
> Hello,
> 
>     I have 2 different solr indexes returning 2 different sets of
> SolrDocumentList. Doc Id is the foreign key relation.
> 
> After obtaining them, I want to perform "AND" operation between them and
> then return results to user. Can you tell me how do I get this? I am using
> solr 4.3
> 
>  SolrDocumentList results1 = responseA.getResults();
>  SolrDocumentList results2 = responseB.getResults();
> 
> results1  : d1, d2, d3
> results2  :  d1,d2, d4

The SolrDocumentList class extends ArrayList<SolrDocument>, which means
that it inherits all ArrayList functionality.  Unfortunately, there's no
built-in way of eliminating duplicates with a Java List.  It's very easy
to combine the two results into another object, but that object will
contain both of the d1 and both of the d2 SolrDocument objects.

The following code is a reasonably fast way to handle this.  It assumes
that results1 is the list that should win when there are duplicates, so
it gets added first.  It assumes that the uniqueKey field is named "id"
and that it contains a String value.  If these are incorrect
assumptions, you can adjust the code accordingly.

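import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// responseA and responseB are the responses from the two indexes
// (presumably SolrJ QueryResponse objects).
SolrDocumentList results1 = responseA.getResults();
SolrDocumentList results2 = responseB.getResults();
// results1 goes first so that its documents win when ids collide.
List<SolrDocumentList> tmpList = new ArrayList<SolrDocumentList>();
tmpList.add(results1);
tmpList.add(results2);

// Ids that have already been copied into newList.
Set<String> tmpSet = new HashSet<String>();
SolrDocumentList newList = new SolrDocumentList();
for (SolrDocumentList l : tmpList)
{
	for (SolrDocument d : l)
	{
		String id = (String) d.get("id");
		// Set.add returns false when the id was already present.
		if (tmpSet.add(id)) {
			newList.add(d);
		}
	}
}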

Thanks,
Shawn
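
Note that the loop above merges the two lists and drops duplicate ids, so
for the sample data it would yield d1, d2, d3, d4. If the goal is the "AND"
from the original question (d1, d2), a minimal sketch of an intersection by
id could look like the following. To stay runnable without SolrJ it works
on any list of Map<String, Object>; as far as I know SolrDocument
implements that interface, so a SolrDocumentList should be usable directly,
but treat that as an assumption. The class and method names are made up
for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps the documents of one list whose id occurs in another.
final class IdIntersection {
    private IdIntersection() {}

    // Returns the documents of 'first' whose id also appears in 'second',
    // in the order of 'first'. Documents without an id are ignored.
    static <D extends Map<String, Object>> List<D> intersectById(
            List<D> first, List<D> second, String idField) {
        // Collect the ids of the second list for constant-time lookups.
        Set<Object> idsInSecond = new HashSet<>();
        for (D doc : second) {
            Object id = doc.get(idField);
            if (id != null) {
                idsInSecond.add(id);
            }
        }

        List<D> result = new ArrayList<>();
        Set<Object> alreadyAdded = new HashSet<>();
        for (D doc : first) {
            Object id = doc.get(idField);
            // Keep a document only if the other list has its id and
            // the same id has not been emitted before.
            if (id != null && idsInSecond.contains(id) && alreadyAdded.add(id)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

With SolrJ on the classpath the call would presumably be
intersectById(results1, results2, "id"). This is a single pass over each
list, so it should cost about the same as the de-duplication loop.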