Posted to user@accumulo.apache.org by Peter Rainer <ra...@gmail.com> on 2013/10/25 18:59:40 UTC

BatchScanner sort question

Hi,

in the BatchScanner JavaDoc it says "Also only use this *when you do not
care about the returned data being in sorted order*. If you want to
lookup a few ranges and expect those ranges to contain a lot of data, then
use the Scanner instead. Also, the Scanner will return data in sorted
order, this will not."

I'm not 100% sure how to interpret this, so I was wondering if any of
you could help me clarify it:

*Option 1)*
Rows are not sorted, but all Key/Value Pairs with the same Row Key are in
sequence

Example:
Format: Key:CF:CQ:Value
A:CF1:CQ1:1
A:CF2:CQ2:2
C:CF1:CQ1:1
B:CF1:CQ1:1

*Option 2)*
Rows are not sorted and not even Key/Value Pairs with the same Row Key are
in sequence

Example:
Format: Key:CF:CQ:Value
A:CF1:CQ1:1
C:CF1:CQ1:1
A:CF2:CQ2:2
B:CF1:CQ1:1


Thanks,
Peter

Re: BatchScanner sort question

Posted by Peter Rainer <ra...@gmail.com>.
Thanks John, that helps a lot.


On Fri, Oct 25, 2013 at 7:03 PM, John Vines <vi...@apache.org> wrote:

> The batch scanner works by fetching batches from all tablets in the scan.
> Consecutive batches therefore typically arrive in no particular sorted
> order. Because batch boundaries are based solely on individual key-value
> pairs, a batch can end mid-row, and the next key you receive can belong to
> a completely different row, possibly also mid-row. If you need entire rows
> to stay together, you can use the WholeRowIterator.
>
> tl;dr: Option 2 is accurate, but you can force Option 1.
>
>

Re: BatchScanner sort question

Posted by John Vines <vi...@apache.org>.
The batch scanner works by fetching batches from all tablets in the scan.
Consecutive batches therefore typically arrive in no particular sorted
order. Because batch boundaries are based solely on individual key-value
pairs, a batch can end mid-row, and the next key you receive can belong to
a completely different row, possibly also mid-row. If you need entire rows
to stay together, you can use the WholeRowIterator.

tl;dr: Option 2 is accurate, but you can force Option 1.
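
To make the batching behaviour described above concrete, here is a small plain-Java sketch. It does not use the Accumulo API at all. The class name BatchOrderSketch, the helper methods, the round-robin fetch order and the batch size of 1 are all assumptions made for illustration; a real BatchScanner returns batches in whatever order the tablet servers respond. The sketch only shows why rows can be split when batches are cut per key-value pair, and why packing each row into a single entry (which is roughly what the WholeRowIterator does on the server side) keeps rows together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchOrderSketch {

    // Hypothetical model of a batch scan: visit the tablets round-robin and
    // take up to batchSize entries from each one per turn. Real batch arrival
    // order is nondeterministic; round-robin is just a simple stand-in.
    public static List<String> interleave(List<List<String>> tablets, int batchSize) {
        List<String> out = new ArrayList<>();
        int[] pos = new int[tablets.size()];
        boolean more = true;
        while (more) {
            more = false;
            for (int t = 0; t < tablets.size(); t++) {
                List<String> tablet = tablets.get(t);
                int end = Math.min(pos[t] + batchSize, tablet.size());
                out.addAll(tablet.subList(pos[t], end));
                pos[t] = end;
                if (end < tablet.size()) {
                    more = true;
                }
            }
        }
        return out;
    }

    // Collapse all entries of a row ("Row:CF:CQ:Value") into one entry, so a
    // batch boundary can no longer fall in the middle of a row.
    public static List<String> packRows(List<String> tablet) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        for (String entry : tablet) {
            int sep = entry.indexOf(':');
            String row = entry.substring(0, sep);
            String rest = entry.substring(sep + 1);
            StringBuilder packed = rows.get(row);
            if (packed == null) {
                rows.put(row, new StringBuilder(row + " -> " + rest));
            } else {
                packed.append(", ").append(rest);
            }
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder packed : rows.values()) {
            result.add(packed.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Each tablet is sorted internally; rows A and B live in one tablet, C in another.
        List<String> tablet1 = Arrays.asList("A:CF1:CQ1:1", "A:CF2:CQ2:2", "B:CF1:CQ1:1");
        List<String> tablet2 = Arrays.asList("C:CF1:CQ1:1");

        System.out.println("Per key-value batches:");
        for (String e : interleave(Arrays.asList(tablet1, tablet2), 1)) {
            System.out.println("  " + e);
        }

        System.out.println("Whole-row batches:");
        for (String e : interleave(Arrays.asList(packRows(tablet1), packRows(tablet2)), 1)) {
            System.out.println("  " + e);
        }
    }
}
```

With these inputs the per key-value run yields exactly the Option 2 ordering from the question (row A is interrupted by row C), while the whole-row run yields rows in the order A, C, B with each row intact, as in Option 1. In real Accumulo code the equivalent step would be attaching the WholeRowIterator to the scanner through an IteratorSetting and unpacking each returned entry with WholeRowIterator.decodeRow; note that a whole row then has to fit in memory.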

