Posted to user@accumulo.apache.org by David O'Gwynn <do...@acm.org> on 2014/04/12 19:59:20 UTC

Thrift proxy: Python WholeRowIterator behavior

Hi all,

I'm working with the Python Thrift API for the Accumulo proxy service,
and I have a bit of odd behavior happening. I'm using Accumulo 1.5
(the standard one from the Accumulo website).

Whenever I use the WholeRowIterator with a Scanner, I cannot configure
the Range for that Scanner to correctly return the start row for the
Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
row], it returns zero entries. For Range('row0',true,'row1\0',true),
it returns only "row1".

From the WholeRowIterator documentation, this behavior implies that
the startInclusive bit was set to False, which it clearly wasn't.

I've been able to hack around this issue by setting the start key to

Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)

but I'd really rather understand the correct way of using a Range
object in conjunction with a WholeRowIterator.

Thanks much,

David
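
For reference, the workaround above can be written out as a small helper. This is only a sketch of the string manipulation: preceding_exclusive_start is a hypothetical name, rows are assumed to be non-empty str values, and the result is meant to be used as an exclusive start row.

```python
def preceding_exclusive_start(row):
    """Return a row string that sorts just below `row`.

    Mirrors the expression row[:-1] + chr(ord(row[-1]) - 1) + '\\0'.
    """
    if not row:
        raise ValueError('row must be non-empty')
    last = ord(row[-1])
    if last == 0:
        # chr(-1) is invalid, so the trick cannot step below a NUL character
        raise ValueError('last character of row cannot be decremented')
    return row[:-1] + chr(last - 1) + '\0'


start = preceding_exclusive_start('row0')
print(repr(start))  # 'row/\x00'
```

Note that this is not the tightest possible bound: any row that sorts between the returned string and the target row (for example 'row/\x00a' when the target is 'row0') would also fall inside the scan.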

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

Lol. Search JIRA before banging head against wall. Noted. :-D

On Sun, Apr 13, 2014 at 11:32 PM, Sean Busbey <bu...@cloudera.com> wrote:
> Oh! that's ACCUMULO-1994. The ticket explains the root problem. In the 1.5.0
> Proxy, the timestamp defaults to 0. Since columns sort on timestamp in
> reverse order, that means nothing in the start row gets included (unless the
> rest of the key components are 0).
>
> [1]: https://issues.apache.org/jira/browse/ACCUMULO-1994
>
>
> On Sun, Apr 13, 2014 at 8:17 PM, David O'Gwynn <do...@acm.org> wrote:
>>
>> Urgh. If I run your code with the 1.5.1 Thrift interface, behavior's
>> not there. If I run it with the one I downloaded, I get the behavior.
>> I diff'ed the ttypes.py from the old and new and got this:
>>
>> <     (5, TType.I64, 'timestamp', None, None, ), # 5
>> ---
>> >     (5, TType.I64, 'timestamp', None, 9223372036854775807, ), # 5
>> 228c228
>> <   def __init__(self, row=None, colFamily=None, colQualifier=None,
>> colVisibility=None, timestamp=None,):
>> ---
>> >   def __init__(self, row=None, colFamily=None, colQualifier=None,
>> > colVisibility=None, timestamp=thrift_spec[5][4],):
>>
>> I hand-jammed that timestamp default (sys.maxint) into my codebase and
>> presto: fixed. So, I was obviously working off of an old Thrift
>> interface version. Fail.
>>
>> Still not sure why that would fix it. The skip condition in WRI's seek
>> method requires both that the timestamp be Long.MAX_VALUE and
>> Range.isStartKeyInclusive() be false. The only thing that I can think
>> is that the versioning iterator somehow resets the Range to the
>> correct defaults. [/shrug]
>>
>> Regardless, I think the issue is actually settled. Again, thanks,
>> Josh, for the patience.
>>
>> On Sun, Apr 13, 2014 at 10:00 PM, Josh Elser <jo...@gmail.com> wrote:
>> > On 4/13/14, 9:13 PM, David O'Gwynn wrote:
>> >>
>> >> If the versioning iterator is attached, and the WRI's priority is <=
>> >> the versioning iterator's priority, then you see this behavior (the
>> >> first row of a WRI scan gets dropped). If you change the priority for
>> >> the WRI in your code to <=20, then you'll see it, Josh.
>> >>
>> >> Still not sure why this would be the case; seems an odd behavior.
>> >> Anyway, thanks for taking the time to help me suss this out.:-)
>> >
>> >
>> > Hrm, interesting.
>> >
>> > What you described with the versioning iterator doesn't make much sense,
>> > but
>> > I tried it out anyways. Setting WRI below versioning didn't change the
>> > results. Also, completely removing the versioning iterator didn't change
>> > anything.
>> >
>> > For fun, I inserted some new values for the same key and ensured that I
>> > got
>> > both values when versioning was configured below.
>> >
>> > I'm still not sure what exactly the root of your problem.
>
>
>
>
> --
> Sean

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Josh Elser <jo...@gmail.com>.

On 4/13/14, 11:32 PM, Sean Busbey wrote:
> Oh! that's ACCUMULO-1994. The ticket explains the root problem. In the
> 1.5.0 Proxy, the timestamp defaults to 0. Since columns sort on
> timestamp in reverse order, that means nothing in the start row gets
> included (unless the rest of the key components are 0).
>
> [1]: https://issues.apache.org/jira/browse/ACCUMULO-1994
>

Bingo! That's entirely it, Sean.


> On Sun, Apr 13, 2014 at 8:17 PM, David O'Gwynn <do...@acm.org> wrote:
>
>     Urgh. If I run your code with the 1.5.1 Thrift interface, behavior's
>     not there. If I run it with the one I downloaded, I get the behavior.
>     I diff'ed the ttypes.py from the old and new and got this:
>
>     <     (5, TType.I64, 'timestamp', None, None, ), # 5
>     ---
>      >     (5, TType.I64, 'timestamp', None, 9223372036854775807, ), # 5
>     228c228
>     <   def __init__(self, row=None, colFamily=None, colQualifier=None,
>     colVisibility=None, timestamp=None,):
>     ---
>      >   def __init__(self, row=None, colFamily=None, colQualifier=None,
>     colVisibility=None, timestamp=thrift_spec[5][4],):
>
>     I hand-jammed that timestamp default (sys.maxint) into my codebase and
>     presto: fixed. So, I was obviously working off of an old Thrift
>     interface version. Fail.
>
>     Still not sure why that would fix it. The skip condition in WRI's seek
>     method requires both that the timestamp be Long.MAX_VALUE and
>     Range.isStartKeyInclusive() be false. The only thing that I can think
>     is that the versioning iterator somehow resets the Range to the
>     correct defaults. [/shrug]
>
>     Regardless, I think the issue is actually settled. Again, thanks,
>     Josh, for the patience.
>

As always, HTH David. This is certainly one of those quirky bugs. I'm 
glad we could get to the bottom of it :D

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Sean Busbey <bu...@cloudera.com>.

Oh! that's ACCUMULO-1994. The ticket explains the root problem. In the
1.5.0 Proxy, the timestamp defaults to 0. Since columns sort on timestamp
in reverse order, that means nothing in the start row gets included (unless
the rest of the key components are 0).

[1]: https://issues.apache.org/jira/browse/ACCUMULO-1994
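
A rough way to see the effect is to model the key ordering in Python. The sketch below is not Accumulo code: sort_key and in_range are hypothetical helpers, the delete flag is ignored, and it assumes the WholeRowIterator emits each encoded row under a key that carries only the row and the maximum timestamp.

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def sort_key(row, cf=b'', cq=b'', cv=b'', ts=LONG_MAX):
    """Tuple that sorts like a key: fields ascending, timestamp descending."""
    return (row, cf, cq, cv, -ts)


def in_range(key, start, start_inclusive=True):
    """True if key is not before the start of the range."""
    return key >= start if start_inclusive else key > start


# Assumed key of the encoded row emitted by the WholeRowIterator.
wri_key = sort_key(b'row0')
# Start keys with the old default (timestamp 0) and the new one (max timestamp).
old_start = sort_key(b'row0', ts=0)
new_start = sort_key(b'row0', ts=LONG_MAX)
# An ordinary cell in row0, as seen by a scan without the iterator.
cell = sort_key(b'row0', cf=b'col1', ts=1397241795)

print(in_range(cell, old_start))     # True
print(in_range(wri_key, old_start))  # False
print(in_range(wri_key, new_start))  # True
```

Under those assumptions, a plain scan still sees the cells of row0 because their column family sorts after the empty one, while the whole-row key sorts ahead of a start key with timestamp 0 and so falls outside an inclusive range.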


On Sun, Apr 13, 2014 at 8:17 PM, David O'Gwynn <do...@acm.org> wrote:

> Urgh. If I run your code with the 1.5.1 Thrift interface, behavior's
> not there. If I run it with the one I downloaded, I get the behavior.
> I diff'ed the ttypes.py from the old and new and got this:
>
> <     (5, TType.I64, 'timestamp', None, None, ), # 5
> ---
> >     (5, TType.I64, 'timestamp', None, 9223372036854775807, ), # 5
> 228c228
> <   def __init__(self, row=None, colFamily=None, colQualifier=None,
> colVisibility=None, timestamp=None,):
> ---
> >   def __init__(self, row=None, colFamily=None, colQualifier=None,
> colVisibility=None, timestamp=thrift_spec[5][4],):
>
> I hand-jammed that timestamp default (sys.maxint) into my codebase and
> presto: fixed. So, I was obviously working off of an old Thrift
> interface version. Fail.
>
> Still not sure why that would fix it. The skip condition in WRI's seek
> method requires both that the timestamp be Long.MAX_VALUE and
> Range.isStartKeyInclusive() be false. The only thing that I can think
> is that the versioning iterator somehow resets the Range to the
> correct defaults. [/shrug]
>
> Regardless, I think the issue is actually settled. Again, thanks,
> Josh, for the patience.
>
> On Sun, Apr 13, 2014 at 10:00 PM, Josh Elser <jo...@gmail.com> wrote:
> > On 4/13/14, 9:13 PM, David O'Gwynn wrote:
> >>
> >> If the versioning iterator is attached, and the WRI's priority is <=
> >> the versioning iterator's priority, then you see this behavior (the
> >> first row of a WRI scan gets dropped). If you change the priority for
> >> the WRI in your code to <=20, then you'll see it, Josh.
> >>
> >> Still not sure why this would be the case; seems an odd behavior.
> >> Anyway, thanks for taking the time to help me suss this out.:-)
> >
> >
> > Hrm, interesting.
> >
> > What you described with the versioning iterator doesn't make much sense,
> but
> > I tried it out anyways. Setting WRI below versioning didn't change the
> > results. Also, completely removing the versioning iterator didn't change
> > anything.
> >
> > For fun, I inserted some new values for the same key and ensured that I
> got
> > both values when versioning was configured below.
> >
> > I'm still not sure what exactly the root of your problem.
>



-- 
Sean

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

Urgh. If I run your code with the 1.5.1 Thrift interface, behavior's
not there. If I run it with the one I downloaded, I get the behavior.
I diff'ed the ttypes.py from the old and new and got this:

<     (5, TType.I64, 'timestamp', None, None, ), # 5
---
>     (5, TType.I64, 'timestamp', None, 9223372036854775807, ), # 5
228c228
<   def __init__(self, row=None, colFamily=None, colQualifier=None,
colVisibility=None, timestamp=None,):
---
>   def __init__(self, row=None, colFamily=None, colQualifier=None, colVisibility=None, timestamp=thrift_spec[5][4],):

I hand-jammed that timestamp default (sys.maxint) into my codebase and
presto: fixed. So, I was obviously working off of an old Thrift
interface version. Fail.

Still not sure why that would fix it. The skip condition in WRI's seek
method requires both that the timestamp be Long.MAX_VALUE and
Range.isStartKeyInclusive() be false. The only thing that I can think
is that the versioning iterator somehow resets the Range to the
correct defaults. [/shrug]

Regardless, I think the issue is actually settled. Again, thanks,
Josh, for the patience.
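
For anyone stuck on the older bindings, one option is to pass the timestamp explicitly rather than patching ttypes.py. The sketch below uses stand-in classes, not the generated Thrift code: Key and Range here are hypothetical dataclasses that borrow the Key field names from the diff above, the Range field names are assumed, and single_row_range is a made-up helper.

```python
from dataclasses import dataclass
from typing import Optional

LONG_MAX = 9223372036854775807  # default used by the 1.5.1 bindings


@dataclass
class Key:
    """Stand-in for the generated Key; timestamp has no default, as in 1.5.0."""
    row: bytes
    colFamily: Optional[bytes] = None
    colQualifier: Optional[bytes] = None
    colVisibility: Optional[bytes] = None
    timestamp: Optional[int] = None


@dataclass
class Range:
    """Stand-in for the generated Range (field names are assumptions)."""
    start: Key
    startInclusive: bool
    stop: Key
    stopInclusive: bool


def single_row_range(row):
    """Build [row, row + '\\0') with the timestamp set explicitly on both keys."""
    start = Key(row=row, timestamp=LONG_MAX)
    stop = Key(row=row + b'\x00', timestamp=LONG_MAX)
    return Range(start, True, stop, False)


r = single_row_range(b'row0')
print(r.start.timestamp == 2**63 - 1)  # True
```

Spelling the constant out avoids relying on sys.maxint, which does not exist in Python 3 and is not 2**63 - 1 on every Python 2 build.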

On Sun, Apr 13, 2014 at 10:00 PM, Josh Elser <jo...@gmail.com> wrote:
> On 4/13/14, 9:13 PM, David O'Gwynn wrote:
>>
>> If the versioning iterator is attached, and the WRI's priority is <=
>> the versioning iterator's priority, then you see this behavior (the
>> first row of a WRI scan gets dropped). If you change the priority for
>> the WRI in your code to <=20, then you'll see it, Josh.
>>
>> Still not sure why this would be the case; seems an odd behavior.
>> Anyway, thanks for taking the time to help me suss this out.:-)
>
>
> Hrm, interesting.
>
> What you described with the versioning iterator doesn't make much sense, but
> I tried it out anyways. Setting WRI below versioning didn't change the
> results. Also, completely removing the versioning iterator didn't change
> anything.
>
> For fun, I inserted some new values for the same key and ensured that I got
> both values when versioning was configured below.
>
> I'm still not sure what exactly the root of your problem.

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Josh Elser <jo...@gmail.com>.

On 4/13/14, 9:13 PM, David O'Gwynn wrote:
> If the versioning iterator is attached, and the WRI's priority is <=
> the versioning iterator's priority, then you see this behavior (the
> first row of a WRI scan gets dropped). If you change the priority for
> the WRI in your code to <=20, then you'll see it, Josh.
>
> Still not sure why this would be the case; seems an odd behavior.
> Anyway, thanks for taking the time to help me suss this out.:-)

Hrm, interesting.

What you described with the versioning iterator doesn't make much sense, 
but I tried it out anyways. Setting WRI below versioning didn't change 
the results. Also, completely removing the versioning iterator didn't 
change anything.

For fun, I inserted some new values for the same key and ensured that I 
got both values when versioning was configured below.

I'm still not sure what exactly the root of your problem is.

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

Ok, so I went back to my IPython console to rerun my scan to prove to
myself that I wasn't crazy. Well, I ran it and it worked like you just
said, contra to my original point. Started to think I was on the crazy
train.

Then I remembered that the table I'd been working on, I'd removed the
versioning iterator for some other tests. Then I started checking the
priorities of my iterators. Turns out, the issue was the priority of
my WRI.

If the versioning iterator is attached, and the WRI's priority is <=
the versioning iterator's priority, then you see this behavior (the
first row of a WRI scan gets dropped). If you change the priority for
the WRI in your code to <=20, then you'll see it, Josh.

Still not sure why this would be the case; seems an odd behavior.
Anyway, thanks for taking the time to help me suss this out. :-)

On Sun, Apr 13, 2014 at 8:24 PM, Josh Elser <jo...@gmail.com> wrote:
> David,
>
> Not quite sure what you're seeing. Using the "plain" python bindings, I
> think I emulated what you described. I created a table with the following
> data:
>
> 1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3:
> [] 1397241800 => val3']
> 2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3:
> [] 1397241808 => val3']
>
> I then modified the start and end Key (really just row) for the Range with
> the following code:
>
> https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py
>
> I got the results I would expect (just row1, just row2, and both row1 and
> row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- HTH if
> you have more info.
>
>
> On 4/13/14, 5:02 PM, David O'Gwynn wrote:
>>
>> Hi Russ,
>>
>> I ported it:
>>
>> def decode_row(cell):
>>      value = StringIO.StringIO(cell.value)
>>      numCells = struct.unpack('!i',value.read(4))[0]
>>      key = cell.row
>>      for i in range(numCells):
>>          if value.pos == value.len:
>>              raise Exception(
>>                  'Reached the end of the parsable string without'
>>                  ' having finished unpacking. Likely an error'
>>                  ' of passing a cell that is not from a'
>>                  ' WholeRowIterator.'
>>                  )
>>          cf = value.read(struct.unpack('!i',value.read(4))[0])
>>          cq = value.read(struct.unpack('!i',value.read(4))[0])
>>          cv = value.read(struct.unpack('!i',value.read(4))[0])
>>          cts = struct.unpack('!q',value.read(8))[0]/1000.
>>          val = value.read(struct.unpack('!i',value.read(4))[0])
>>
>> You'll want the check at the beginning of the for loop; I found out
>> how fast Python can fill my available memory before I put that in.
>>
>> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rw...@newbrightidea.com>
>> wrote:
>>>
>>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>>> over to Python, or is that functionality available somewhere in the
>>> pyaccumulo API and I just missed it?
>>>
>>> -Russ
>>>
>>>
>>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <do...@acm.org> wrote:
>>>>
>>>>
>>>> 1.5.0
>>>>
>>>> Btw, the pyaccumulo library:
>>>>
>>>> https://github.com/accumulo/pyaccumulo
>>>>
>>>> is the basis of my codebase. You should be able to use that to
>>>> replicate the issue.
>>>>
>>>> Thanks for looking into this!
>>>>
>>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <jo...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Ah, gotcha.
>>>>>
>>>>> That definitely does not seem right. I'll see if I can poke around at
>>>>> this
>>>>> today.
>>>>>
>>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>>>>>
>>>>>
>>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>>
>>>>>>
>>>>>> Hi Josh,
>>>>>>
>>>>>> I guess I misspoke, the Range I'm passing is this:
>>>>>>
>>>>>> Range('row0', true, 'row0\0',true)
>>>>>>
>>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>>>>>> this:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>>
>>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>>> returns nothing.
>>>>>>
>>>>>> Furthermore, if I were to pass the following:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>>
>>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>>> is clearly not the case.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Looks like you're just mis-using the Range here.
>>>>>>>
>>>>>>> If you create a range that is ["row0", "row0"] as you denote below,
>>>>>>> that
>>>>>>> will only include Keys that have a rowId of "row0" with an empty
>>>>>>> colfam,
>>>>>>> colqual, etc. Since you want to use the WholeRowIterator, I can
>>>>>>> assume
>>>>>>> you
>>>>>>> want all columns in "row0". As such, ["row0", "row0\0") would be the
>>>>>>> best
>>>>>>> range to fetch all of the columns in that single row.
>>>>>>>
>>>>>>>
>>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy
>>>>>>>> service,
>>>>>>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>>>>>>>> (the standard one from the Accumulo website).
>>>>>>>>
>>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>>>>>>>> configure
>>>>>>>> the Range for that Scanner to correctly return the start row for the
>>>>>>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a singe
>>>>>>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>>>>>>>> it returns only "row1".
>>>>>>>>
>>>>>>>>    From the WholeRowIterator documentation, this behavior implies
>>>>>>>> that
>>>>>>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>>>>>>
>>>>>>>> I've been able to hack around this issue by setting the start key to
>>>>>>>>
>>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>>
>>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>>> object in conjunction with a WholeRowIterator.
>>>>>>>>
>>>>>>>> Thanks much,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Josh Elser <jo...@gmail.com>.

David,

Not quite sure what you're seeing. Using the "plain" python bindings, I 
think I emulated what you described. I created a table with the 
following data:

1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 
'col3: [] 1397241800 => val3']
2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 
'col3: [] 1397241808 => val3']

I then modified the start and end Key (really just row) for the Range 
with the following code:

https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py

I got the results I would expect (just row1, just row2, and both row1 
and row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- 
HTH if you have more info.

On 4/13/14, 5:02 PM, David O'Gwynn wrote:
> Hi Russ,
>
> I ported it:
>
> def decode_row(cell):
>      value = StringIO.StringIO(cell.value)
>      numCells = struct.unpack('!i',value.read(4))[0]
>      key = cell.row
>      for i in range(numCells):
>          if value.pos == value.len:
>              raise Exception(
>                  'Reached the end of the parsable string without'
>                  ' having finished unpacking. Likely an error'
>                  ' of passing a cell that is not from a'
>                  ' WholeRowIterator.'
>                  )
>          cf = value.read(struct.unpack('!i',value.read(4))[0])
>          cq = value.read(struct.unpack('!i',value.read(4))[0])
>          cv = value.read(struct.unpack('!i',value.read(4))[0])
>          cts = struct.unpack('!q',value.read(8))[0]/1000.
>          val = value.read(struct.unpack('!i',value.read(4))[0])
>
> You'll want the check at the beginning of the for loop; I found out
> how fast Python can fill my available memory before I put that in.
>
> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rw...@newbrightidea.com> wrote:
>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>> over to Python, or is that functionality available somewhere in the
>> pyaccumulo API and I just missed it?
>>
>> -Russ
>>
>>
>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <do...@acm.org> wrote:
>>>
>>> 1.5.0
>>>
>>> Btw, the pyaccumulo library:
>>>
>>> https://github.com/accumulo/pyaccumulo
>>>
>>> is the basis of my codebase. You should be able to use that to
>>> replicate the issue.
>>>
>>> Thanks for looking into this!
>>>
>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <jo...@gmail.com> wrote:
>>>> Ah, gotcha.
>>>>
>>>> That definitely does not seem right. I'll see if I can poke around at
>>>> this
>>>> today.
>>>>
>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>>>>
>>>>
>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>
>>>>> Hi Josh,
>>>>>
>>>>> I guess I misspoke, the Range I'm passing is this:
>>>>>
>>>>> Range('row0', true, 'row0\0',true)
>>>>>
>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>>>>> this:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>
>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>> returns nothing.
>>>>>
>>>>> Furthermore, if I were to pass the following:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>
>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>> is clearly not the case.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Looks like you're just mis-using the Range here.
>>>>>>
>>>>>> If you create a range that is ["row0", "row0"] as you denote below,
>>>>>> that
>>>>>> will only include Keys that have a rowId of "row0" with an empty
>>>>>> colfam,
>>>>>> colqual, etc. Since you want to use the WholeRowIterator, I can assume
>>>>>> you
>>>>>> want all columns in "row0". As such, ["row0", "row0\0") would be the
>>>>>> best
>>>>>> range to fetch all of the columns in that single row.
>>>>>>
>>>>>>
>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy
>>>>>>> service,
>>>>>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>>>>>>> (the standard one from the Accumulo website).
>>>>>>>
>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>>>>>>> configure
>>>>>>> the Range for that Scanner to correctly return the start row for the
>>>>>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a singe
>>>>>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>>>>>>> it returns only "row1".
>>>>>>>
>>>>>>>    From the WholeRowIterator documentation, this behavior implies that
>>>>>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>>>>>
>>>>>>> I've been able to hack around this issue by setting the start key to
>>>>>>>
>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>
>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>> object in conjunction with a WholeRowIterator.
>>>>>>>
>>>>>>> Thanks much,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>
>>
>>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

Hi Russ,

I ported it:

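import struct
import StringIO

def decode_row(cell):
    # cell.value holds the WholeRowIterator-encoded row (big-endian fields)
    value = StringIO.StringIO(cell.value)
    # 4-byte count of the cells packed into this row
    numCells = struct.unpack('!i',value.read(4))[0]
    key = cell.row
    for i in range(numCells):
        if value.pos == value.len:
            raise Exception(
                'Reached the end of the parsable string without'
                ' having finished unpacking. Likely an error'
                ' of passing a cell that is not from a'
                ' WholeRowIterator.'
                )
        # each field is a 4-byte length followed by that many bytes
        cf = value.read(struct.unpack('!i',value.read(4))[0])
        cq = value.read(struct.unpack('!i',value.read(4))[0])
        cv = value.read(struct.unpack('!i',value.read(4))[0])
        # 8-byte timestamp; dividing by 1000 assumes it is in milliseconds
        cts = struct.unpack('!q',value.read(8))[0]/1000.
        val = value.read(struct.unpack('!i',value.read(4))[0])
        # hand each decoded cell back to the caller
        yield key, cf, cq, cv, cts, val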

You'll want the check at the beginning of the for loop; I found out
how fast Python can fill my available memory before I put that in.
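
In case it helps anyone on Python 3, here is a sketch of the same decoding with io.BytesIO, plus a matching encoder so the round trip can be checked without a running proxy. The names encode_row and decode_row_py3 are hypothetical, the byte layout is taken from the function above rather than from the Java source, and timestamps are left as raw integers.

```python
import io
import struct


def encode_row(cells):
    """Pack (cf, cq, cv, timestamp, value) byte tuples into one blob."""
    out = io.BytesIO()
    out.write(struct.pack('!i', len(cells)))
    for cf, cq, cv, ts, val in cells:
        for field in (cf, cq, cv):
            out.write(struct.pack('!i', len(field)))
            out.write(field)
        out.write(struct.pack('!q', ts))
        out.write(struct.pack('!i', len(val)))
        out.write(val)
    return out.getvalue()


def _read_exact(buf, n):
    """Read exactly n bytes or fail loudly instead of looping on bad input."""
    data = buf.read(n)
    if len(data) != n:
        raise ValueError('truncated input: not a WholeRowIterator value?')
    return data


def _read_field(buf):
    """Read a 4-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack('!i', _read_exact(buf, 4))
    if length < 0:
        raise ValueError('negative field length')
    return _read_exact(buf, length)


def decode_row_py3(encoded):
    """Return the list of (cf, cq, cv, timestamp, value) tuples in encoded."""
    buf = io.BytesIO(encoded)
    (num_cells,) = struct.unpack('!i', _read_exact(buf, 4))
    cells = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        (ts,) = struct.unpack('!q', _read_exact(buf, 8))
        val = _read_field(buf)
        cells.append((cf, cq, cv, ts, val))
    return cells


sample = [(b'col1', b'', b'', 1397241795000, b'val1')]
print(decode_row_py3(encode_row(sample)) == sample)  # True
```

Checking every read for a short result plays the same role as the end-of-buffer check in the loop above: a value that did not come from a WholeRowIterator fails fast instead of being read as a huge cell count.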

On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rw...@newbrightidea.com> wrote:
> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
> over to Python, or is that functionality available somewhere in the
> pyaccumulo API and I just missed it?
>
> -Russ
>
>
> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <do...@acm.org> wrote:
>>
>> 1.5.0
>>
>> Btw, the pyaccumulo library:
>>
>> https://github.com/accumulo/pyaccumulo
>>
>> is the basis of my codebase. You should be able to use that to
>> replicate the issue.
>>
>> Thanks for looking into this!
>>
>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <jo...@gmail.com> wrote:
>> > Ah, gotcha.
>> >
>> > That definitely does not seem right. I'll see if I can poke around at
>> > this
>> > today.
>> >
>> > Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>> >
>> >
>> > On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>> >>
>> >> Hi Josh,
>> >>
>> >> I guess I misspoke, the Range I'm passing is this:
>> >>
>> >> Range('row0', true, 'row0\0',true)
>> >>
>> >> Keeping in mind that the Thrift interface only exposes one Range
>> >> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>> >> this:
>> >>
>> >> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>> >>
>> >> If I scan for all entries (without WholeRowIterator), I get the full
>> >> contents of "row0". However, when I add the WholeRowIterator, it
>> >> returns nothing.
>> >>
>> >> Furthermore, if I were to pass the following:
>> >>
>> >> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>> >>
>> >> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>> >> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>> >> interpreting my Range as having startKeyInclusive set to false, which
>> >> is clearly not the case.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >>
>> >> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi David,
>> >>>
>> >>> Looks like you're just mis-using the Range here.
>> >>>
>> >>> If you create a range that is ["row0", "row0"] as you denote below,
>> >>> that
>> >>> will only include Keys that have a rowId of "row0" with an empty
>> >>> colfam,
>> >>> colqual, etc. Since you want to use the WholeRowIterator, I can assume
>> >>> you
>> >>> want all columns in "row0". As such, ["row0", "row0\0") would be the
>> >>> best
>> >>> range to fetch all of the columns in that single row.
>> >>>
>> >>>
>> >>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>> >>>>
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I'm working with the Python Thrift API for the Accumulo proxy
>> >>>> service,
>> >>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>> >>>> (the standard one from the Accumulo website).
>> >>>>
>> >>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>> >>>> configure
>> >>>> the Range for that Scanner to correctly return the start row for the
>> >>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a singe
>> >>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>> >>>> it returns only "row1".
>> >>>>
>> >>>>   From the WholeRowIterator documentation, this behavior implies that
>> >>>> the startInclusive bit was set to False, which it clearly wasn't.
>> >>>>
>> >>>> I've been able to hack around this issue by setting the start key to
>> >>>>
>> >>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>> >>>>
>> >>>> but I'd really rather understand the correct way of using a Range
>> >>>> object in conjunction with a WholeRowIterator.
>> >>>>
>> >>>> Thanks much,
>> >>>>
>> >>>> David
>> >>>>
>> >>>
>> >
>
>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Russ Weeks <rw...@newbrightidea.com>.

Just curious, David, did you port the logic of WholeRowIterator.decodeRow
over to Python, or is that functionality available somewhere in the
pyaccumulo API and I just missed it?

-Russ


On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <do...@acm.org> wrote:

> 1.5.0
>
> Btw, the pyaccumulo library:
>
> https://github.com/accumulo/pyaccumulo
>
> is the basis of my codebase. You should be able to use that to
> replicate the issue.
>
> Thanks for looking into this!
>
> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <jo...@gmail.com> wrote:
> > Ah, gotcha.
> >
> > That definitely does not seem right. I'll see if I can poke around at
> this
> > today.
> >
> > Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
> >
> >
> > On 4/12/14, 4:13 PM, David O'Gwynn wrote:
> >>
> >> Hi Josh,
> >>
> >> I guess I misspoke, the Range I'm passing is this:
> >>
> >> Range('row0', true, 'row0\0',true)
> >>
> >> Keeping in mind that the Thrift interface only exposes one Range
> >> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
> >> this:
> >>
> >> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
> >>
> >> If I scan for all entries (without WholeRowIterator), I get the full
> >> contents of "row0". However, when I add the WholeRowIterator, it
> >> returns nothing.
> >>
> >> Furthermore, if I were to pass the following:
> >>
> >> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
> >>
> >> not only do I get both "row0" and "row1" without the WRI, I get "row1"
> >> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
> >> interpreting my Range as having startKeyInclusive set to false, which
> >> is clearly not the case.
> >>
> >> Thanks,
> >> David
> >>
> >>
> >> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com>
> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Looks like you're just mis-using the Range here.
> >>>
> >>> If you create a range that is ["row0", "row0"] as you denote below,
> that
> >>> will only include Keys that have a rowId of "row0" with an empty
> colfam,
> >>> colqual, etc. Since you want to use the WholeRowIterator, I can assume
> >>> you
> >>> want all columns in "row0". As such, ["row0", "row0\0") would be the
> best
> >>> range to fetch all of the columns in that single row.
> >>>
> >>>
> >>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
> >>>>
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I'm working with the Python Thrift API for the Accumulo proxy service,
> >>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
> >>>> (the standard one from the Accumulo website).
> >>>>
> >>>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
> >>>> the Range for that Scanner to correctly return the start row for the
> >>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a singe
> >>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
> >>>> it returns only "row1".
> >>>>
> >>>>   From the WholeRowIterator documentation, this behavior implies that
> >>>> the startInclusive bit was set to False, which it clearly wasn't.
> >>>>
> >>>> I've been able to hack around this issue by setting the start key to
> >>>>
> >>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
> >>>>
> >>>> but I'd really rather understand the correct way of using a Range
> >>>> object in conjunction with a WholeRowIterator.
> >>>>
> >>>> Thanks much,
> >>>>
> >>>> David
> >>>>
> >>>
> >
>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

I'm using 1.5.0.

Btw, the pyaccumulo library:

https://github.com/accumulo/pyaccumulo

is the basis of my codebase. You should be able to use that to
replicate the issue.

Thanks for looking into this!

On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <jo...@gmail.com> wrote:
> Ah, gotcha.
>
> That definitely does not seem right. I'll see if I can poke around at this
> today.
>
> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>
>
> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>
>> Hi Josh,
>>
>> I guess I misspoke; the Range I'm passing is this:
>>
>> Range('row0', true, 'row0\0',true)
>>
>> Keeping in mind that the Thrift interface only exposes one Range
>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>> this:
>>
>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>
>> If I scan for all entries (without WholeRowIterator), I get the full
>> contents of "row0". However, when I add the WholeRowIterator, it
>> returns nothing.
>>
>> Furthermore, if I were to pass the following:
>>
>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>
>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>> interpreting my Range as having startKeyInclusive set to false, which
>> is clearly not the case.
>>
>> Thanks,
>> David
>>
>>
>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com> wrote:
>>>
>>> Hi David,
>>>
>>> Looks like you're just mis-using the Range here.
>>>
>>> If you create a range that is ["row0", "row0"] as you denote below, that
>>> will only include Keys that have a rowId of "row0" with an empty colfam,
>>> colqual, etc. Since you want to use the WholeRowIterator, I can assume you
>>> want all columns in "row0". As such, ["row0", "row0\0") would be the best
>>> range to fetch all of the columns in that single row.
>>>
>>>
>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I'm working with the Python Thrift API for the Accumulo proxy service,
>>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>>>> (the standard one from the Accumulo website).
>>>>
>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
>>>> the Range for that Scanner to correctly return the start row for the
>>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
>>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>>>> it returns only "row1".
>>>>
>>>> From the WholeRowIterator documentation, this behavior implies that
>>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>>
>>>> I've been able to hack around this issue by setting the start key to
>>>>
>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>
>>>> but I'd really rather understand the correct way of using a Range
>>>> object in conjunction with a WholeRowIterator.
>>>>
>>>> Thanks much,
>>>>
>>>> David
>>>>
>>>
>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Josh Elser <jo...@gmail.com>.

Ah, gotcha.

That definitely does not seem right. I'll see if I can poke around at 
this today.

Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)

On 4/12/14, 4:13 PM, David O'Gwynn wrote:
> Hi Josh,
>
> I guess I misspoke; the Range I'm passing is this:
>
> Range('row0', true, 'row0\0',true)
>
> Keeping in mind that the Thrift interface only exposes one Range
> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
> this:
>
> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>
> If I scan for all entries (without WholeRowIterator), I get the full
> contents of "row0". However, when I add the WholeRowIterator, it
> returns nothing.
>
> Furthermore, if I were to pass the following:
>
> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>
> not only do I get both "row0" and "row1" without the WRI, I get "row1"
> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
> interpreting my Range as having startKeyInclusive set to false, which
> is clearly not the case.
>
> Thanks,
> David
>
>
> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com> wrote:
>> Hi David,
>>
>> Looks like you're just mis-using the Range here.
>>
>> If you create a range that is ["row0", "row0"] as you denote below, that
>> will only include Keys that have a rowId of "row0" with an empty colfam,
>> colqual, etc. Since you want to use the WholeRowIterator, I can assume you
>> want all columns in "row0". As such, ["row0", "row0\0") would be the best
>> range to fetch all of the columns in that single row.
>>
>>
>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>
>>> Hi all,
>>>
>>> I'm working with the Python Thrift API for the Accumulo proxy service,
>>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>>> (the standard one from the Accumulo website).
>>>
>>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
>>> the Range for that Scanner to correctly return the start row for the
>>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
>>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>>> it returns only "row1".
>>>
>>> From the WholeRowIterator documentation, this behavior implies that
>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>
>>> I've been able to hack around this issue by setting the start key to
>>>
>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>
>>> but I'd really rather understand the correct way of using a Range
>>> object in conjunction with a WholeRowIterator.
>>>
>>> Thanks much,
>>>
>>> David
>>>
>>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by David O'Gwynn <do...@acm.org>.

Hi Josh,

I guess I misspoke; the Range I'm passing is this:

Range('row0', true, 'row0\0',true)

Keeping in mind that the Thrift interface only exposes one Range
constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
this:

Range( Key('row0',null,...), true, Key('row0\0',null,...), true )

If I scan for all entries (without WholeRowIterator), I get the full
contents of "row0". However, when I add the WholeRowIterator, it
returns nothing.

Furthermore, if I were to pass the following:

Range( Key('row0',null,...), true, Key('row1\0',null,...), true )

not only do I get both "row0" and "row1" without the WRI, I get "row1"
as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
interpreting my Range as having startKeyInclusive set to false, which
is clearly not the case.
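
Elsewhere in this thread the behavior is traced to ACCUMULO-1994: the
1.5.0 proxy defaults the key timestamp to 0, and timestamps sort in
reverse order. The sketch below only models that sort order with plain
Python tuples; sort_key and LONG_MAX are hypothetical names, not part of
the proxy or pyaccumulo API, and it assumes a row-only key normally
carries the largest possible timestamp (Java's Long.MAX_VALUE). It does
not model how the WholeRowIterator reacts to such a start key.

```python
# Illustrative model only; these names are not part of any Accumulo API.
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def sort_key(row, cf="", cq="", cv="", ts=LONG_MAX):
    """Tuple that sorts the way keys do: fields ascending, timestamp descending."""
    return (row, cf, cq, cv, -ts)


start_ts_max = sort_key("row0")         # row-only key, timestamp = Long.MAX_VALUE
start_ts_zero = sort_key("row0", ts=0)  # row-only key, timestamp = 0

# The MAX_VALUE key sorts before the timestamp-0 key in the same row,
# so a start key with timestamp 0 is not the first possible key of that row.
print(start_ts_max < start_ts_zero)
```

Under this model, a start key built with timestamp 0 does not sit at the
very beginning of "row0", even with startInclusive set to true.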

Thanks,
David


On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <jo...@gmail.com> wrote:
> Hi David,
>
> Looks like you're just mis-using the Range here.
>
> If you create a range that is ["row0", "row0"] as you denote below, that
> will only include Keys that have a rowId of "row0" with an empty colfam,
> colqual, etc. Since you want to use the WholeRowIterator, I can assume you
> want all columns in "row0". As such, ["row0", "row0\0") would be the best
> range to fetch all of the columns in that single row.
>
>
> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>
>> Hi all,
>>
>> I'm working with the Python Thrift API for the Accumulo proxy service,
>> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
>> (the standard one from the Accumulo website).
>>
>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
>> the Range for that Scanner to correctly return the start row for the
>> Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
>> row], it returns zero entries. For Range('row0',true,'row1\0',true),
>> it returns only "row1".
>>
>> From the WholeRowIterator documentation, this behavior implies that
>> the startInclusive bit was set to False, which it clearly wasn't.
>>
>> I've been able to hack around this issue by setting the start key to
>>
>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>
>> but I'd really rather understand the correct way of using a Range
>> object in conjunction with a WholeRowIterator.
>>
>> Thanks much,
>>
>> David
>>
>

Re: Thrift proxy: Python WholeRowIterator behavior

Posted by Josh Elser <jo...@gmail.com>.

Hi David,

Looks like you're just mis-using the Range here.

If you create a range that is ["row0", "row0"] as you denote below, that 
will only include Keys that have a rowId of "row0" with an empty colfam, 
colqual, etc. Since you want to use the WholeRowIterator, I can assume 
you want all columns in "row0". As such, ["row0", "row0\0") would be the 
best range to fetch all of the columns in that single row.
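
The difference between the two ranges above can be sketched with a small
stand-alone model. This is an illustration under stated assumptions, not
Accumulo code: keys are reduced to (row, colfam, colqual) tuples compared
lexicographically, and in_range is a hypothetical helper, not a proxy or
pyaccumulo function.

```python
# Illustrative model only: keys are (row, colfam, colqual) tuples compared
# lexicographically, mirroring how the row and column parts of a key sort.
# None of these names come from the Accumulo or pyaccumulo APIs.

def in_range(key, start, start_inclusive, end, end_inclusive):
    """Return True if key falls inside the range described by the bounds."""
    after_start = key >= start if start_inclusive else key > start
    before_end = key <= end if end_inclusive else key < end
    return after_start and before_end


entries = [
    ("row0", "cf1", "cq1"),
    ("row0", "cf2", "cq1"),
    ("row1", "cf1", "cq1"),
]

# ["row0", "row0"]: both bounds are the row0 key with empty columns,
# so entries with a non-empty column family fall past the end bound.
exact = [k for k in entries
         if in_range(k, ("row0", "", ""), True, ("row0", "", ""), True)]

# ["row0", "row0\0"): every column in row0, nothing from later rows.
whole_row = [k for k in entries
             if in_range(k, ("row0", "", ""), True, ("row0\0", "", ""), False)]

print(exact)
print(whole_row)
```

In this model the first range matches none of the sample entries, while
the second matches both "row0" entries and excludes "row1".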

On 4/12/2014 1:59 PM, David O'Gwynn wrote:
> Hi all,
>
> I'm working with the Python Thrift API for the Accumulo proxy service,
> and I have a bit of odd behavior happening. I'm using Accumulo 1.5
> (the standard one from the Accumulo website).
>
> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
> the Range for that Scanner to correctly return the start row for the
> Range. E.g. for the Range('row0',true,'row0',true) [to pull a single
> row], it returns zero entries. For Range('row0',true,'row1\0',true),
> it returns only "row1".
>
> From the WholeRowIterator documentation, this behavior implies that
> the startInclusive bit was set to False, which it clearly wasn't.
>
> I've been able to hack around this issue by setting the start key to
>
> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>
> but I'd really rather understand the correct way of using a Range
> object in conjunction with a WholeRowIterator.
>
> Thanks much,
>
> David
>