Posted to user@accumulo.apache.org by Heath Abelson <HA...@netcentricinc.com> on 2013/10/01 21:35:29 UTC

Trouble with IntersectingIterator

I am attempting to get a very simple example working with the IntersectingIterator. I made up some dummy objects to work with:

A scan on the "Mail" table looks like this:

m1 mail:body [U&(USA)]    WTF?
m1 mail:receiver [U&(USA)]    mgiordano
m1 mail:sender [U&(USA)]    habelson
m1 mail:sentTime [U&(USA)]    1380571500
m1 mail:subject [U&(USA)]    Lunch
m2 mail:body [U&(USA)]    I know right?
m2 mail:receiver [U&(USA)]    jmarcolla
m2 mail:sender [U&(USA)]    habelson
m2 mail:sentTime [U&(USA)]    1380571502
m2 mail:subject [U&(USA)]    Lunch
m3 mail:body [U&(USA)]    exactly!
m3 mail:receiver [U&(USA)]    habelson
m3 mail:sender [U&(USA)]    mgiordano
m3 mail:sentTime [U&(USA)]    1380571504
m3 mail:subject [U&(USA)]    Lunch
m4 mail:body [U&(USA)]    Dude!
m4 mail:receiver [U&(USA)]    mcross
m4 mail:sender [U&(USA)]    habelson
m4 mail:sentTime [U&(USA)]    1380571506
m4 mail:subject [U&(USA)]    Lunch
m5 mail:body [U&(USA)]    Yeah
m5 mail:receiver [U&(USA)]    habelson
m5 mail:sender [U&(USA)]    mcross
m5 mail:sentTime [U&(USA)]    1380571508
m5 mail:subject [U&(USA)]    Lunch

A scan on the "MailIndex" table looks like this:

receiver habelson:m3 []    habelson
receiver habelson:m5 []    habelson
receiver jmarcolla:m2 []    jmarcolla
receiver mcross:m4 []    mcross
receiver mgiordano:m1 []    mgiordano
sender habelson:m1 []    habelson
sender habelson:m2 []    habelson
sender habelson:m4 []    habelson
sender mcross:m5 []    mcross
sender mgiordano:m3 []    mgiordano
sentTime 1380571500:m1 []    1380571500
sentTime 1380571502:m2 []    1380571502
sentTime 1380571504:m3 []    1380571504
sentTime 1380571506:m4 []    1380571506
sentTime 1380571508:m5 []    1380571508
subject Lunch:m1 []    Lunch
subject Lunch:m2 []    Lunch
subject Lunch:m3 []    Lunch
subject Lunch:m4 []    Lunch
subject Lunch:m5 []    Lunch

If I use an IntersectingIterator with a BatchScanner and pass it the terms "habelson","mgiordano" (or seemingly any pair of distinct terms), I get zero results. If I instead pass the same value for both terms (e.g. "habelson","habelson"), I correctly get back the records that contain that value.

My code is almost identical to the user guide example, and I am using Accumulo 1.4.3.

Any help would be appreciated.





Heath Abelson
NetCentric Technology, Inc.
3349 Route 138, Building A
Wall, NJ  07719
Phone: 732-544-0888 x159
Email:  habelson@netcentricinc.com


Re: Trouble with IntersectingIterator

Posted by Adam Fuchs <af...@apache.org>.
Heath,

In your case, the question that you are effectively asking is "within each
partition, which documents' index entries include all of the given terms?"
Since your partitions are aligned by field, and each document has only a
single index entry per field, you will not get any matches for queries with
more than one distinct term.
You can't ask a question that correlates index entries that cross a
partition boundary with the IntersectingIterator. For example, document
"m1" has the index entry for "habelson" in the "sender" partition, but the
index entry for "mgiordano" is in the "receiver" partition.
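
To make the point above concrete, here is a rough plain-Python model of the
per-partition intersection. It is only a sketch, not Accumulo code: the
function name intersect_within_rows is made up for illustration, and it
assumes each index entry is a (row, column family, column qualifier) triple
in which the row is the partition, the family is the term, and the qualifier
is the document id, as in the MailIndex scan.

```python
from collections import defaultdict

# (row, column family, column qualifier) triples copied from the MailIndex scan.
MAIL_INDEX = [
    ("receiver", "habelson", "m3"), ("receiver", "habelson", "m5"),
    ("receiver", "jmarcolla", "m2"), ("receiver", "mcross", "m4"),
    ("receiver", "mgiordano", "m1"),
    ("sender", "habelson", "m1"), ("sender", "habelson", "m2"),
    ("sender", "habelson", "m4"), ("sender", "mcross", "m5"),
    ("sender", "mgiordano", "m3"),
    ("sentTime", "1380571500", "m1"), ("sentTime", "1380571502", "m2"),
    ("sentTime", "1380571504", "m3"), ("sentTime", "1380571506", "m4"),
    ("sentTime", "1380571508", "m5"),
    ("subject", "Lunch", "m1"), ("subject", "Lunch", "m2"),
    ("subject", "Lunch", "m3"), ("subject", "Lunch", "m4"),
    ("subject", "Lunch", "m5"),
]


def intersect_within_rows(entries, terms):
    """Hypothetical helper: for each row, find doc ids that carry every term.

    Terms are only ever compared inside a single row (partition); nothing is
    correlated across rows.
    """
    if not terms:
        return {}
    by_row = defaultdict(lambda: defaultdict(set))  # row -> term -> doc ids
    for row, term, doc_id in entries:
        by_row[row][term].add(doc_id)
    hits = {}
    for row, by_term in by_row.items():
        common = set.intersection(*(by_term.get(t, set()) for t in terms))
        if common:
            hits[row] = sorted(common)
    return hits


# Two different terms never share a doc id inside one field-aligned row.
print(intersect_within_rows(MAIL_INDEX, ["habelson", "mgiordano"]))
# The same term twice trivially intersects with itself.
print(intersect_within_rows(MAIL_INDEX, ["habelson", "habelson"]))
```

With the field-aligned rows, "habelson" and "mgiordano" each appear in the
"sender" and "receiver" rows, but never on the same document within one row,
so the first query comes back empty.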

Another thing you might try is to partition by field within the document
partitions. You can hack this together by building something like the
following, with p1 = {m1,m2,m3} and p2 = {m4,m5}:

p1 receiver_habelson:m3 []    habelson
p1 receiver_jmarcolla:m2 []    jmarcolla
p1 receiver_mgiordano:m1 []    mgiordano
p1 sender_habelson:m1 []    habelson
p1 sender_habelson:m2 []    habelson
p1 sender_mgiordano:m3 []    mgiordano
p1 sentTime_1380571500:m1 []    1380571500
p1 sentTime_1380571502:m2 []    1380571502
p1 sentTime_1380571504:m3 []    1380571504
p1 subject_Lunch:m1 []    Lunch
p1 subject_Lunch:m2 []    Lunch
p1 subject_Lunch:m3 []    Lunch
p2 receiver_habelson:m5 []    habelson
p2 receiver_mcross:m4 []    mcross
p2 sender_habelson:m4 []    habelson
p2 sender_mcross:m5 []    mcross
p2 sentTime_1380571506:m4 []    1380571506
p2 sentTime_1380571508:m5 []    1380571508
p2 subject_Lunch:m4 []    Lunch
p2 subject_Lunch:m5 []    Lunch

Here terms are prefixed by field_, and you can do queries for things like
{"sender_habelson", "receiver_mgiordano"}.
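
As a rough illustration of that layout, the sketch below rebuilds the p1/p2
index from the mail records in plain Python and runs the same kind of
per-partition intersection over it. The names build_index and query are
hypothetical helpers, not Accumulo API, and the body field is left out of
the index just as in the listing above.

```python
# Indexed fields of the five mail records from the "Mail" table scan.
MAIL = {
    "m1": {"receiver": "mgiordano", "sender": "habelson",
           "sentTime": "1380571500", "subject": "Lunch"},
    "m2": {"receiver": "jmarcolla", "sender": "habelson",
           "sentTime": "1380571502", "subject": "Lunch"},
    "m3": {"receiver": "habelson", "sender": "mgiordano",
           "sentTime": "1380571504", "subject": "Lunch"},
    "m4": {"receiver": "mcross", "sender": "habelson",
           "sentTime": "1380571506", "subject": "Lunch"},
    "m5": {"receiver": "habelson", "sender": "mcross",
           "sentTime": "1380571508", "subject": "Lunch"},
}

# Document partitions as given above: p1 = {m1,m2,m3}, p2 = {m4,m5}.
PARTITIONS = {"p1": ["m1", "m2", "m3"], "p2": ["m4", "m5"]}


def build_index(mail, partitions):
    """Hypothetical helper: emit (partition, field_value, doc id) entries."""
    entries = []
    for partition, doc_ids in partitions.items():
        for doc_id in doc_ids:
            for field, value in mail[doc_id].items():
                entries.append((partition, f"{field}_{value}", doc_id))
    return sorted(entries)


def query(entries, terms):
    """Hypothetical helper: doc ids that have every term within one partition."""
    if not terms:
        return []
    hits = set()
    for partition in {row for row, _, _ in entries}:
        per_term = [
            {doc for row, term, doc in entries if row == partition and term == t}
            for t in terms
        ]
        hits |= set.intersection(*per_term)
    return sorted(hits)


index = build_index(MAIL, PARTITIONS)
print(query(index, ["sender_habelson", "receiver_mgiordano"]))
```

Because every entry for a given document now lives in that document's
partition, terms from different fields can intersect on the same document id.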

Adam





RE: Trouble with IntersectingIterator

Posted by Heath Abelson <HA...@netcentricinc.com>.
Looking at this example, the index and record do not occur in the same row. That seems to be more related to the IndexedDocIterator.

If we take my "mail" object as my document, and think of it as being partitioned by field name rather than by some hash, it seems to me that this iterator could still apply.


Re: Trouble with IntersectingIterator

Posted by William Slacum <wi...@accumulo.net>.
That iterator is designed to be used with a sharded table format, wherein
the index and record both occur within the same row. See the Accumulo
examples page: http://accumulo.apache.org/1.4/examples/shard.html

