Posted to user@accumulo.apache.org by Lu Q <lu...@gmail.com> on 2017/02/10 02:50:31 UTC

Data missing when using RowIterator

I use Accumulo 1.8.0, and I am developing an ORM framework that converts scan results into objects.

Previously, I used RowIterator because it is faster than iterating over the scan directly:

RowIterator rows = new RowIterator(scan);
rows.forEachRemaining(rowIterator -> {
	while (rowIterator.hasNext()) {
		Map.Entry<Key, Value> entry = rowIterator.next();
		...
	}
});

It works fine until I query more than 1000 entries at once: when the range covers more than 1000 entries, some data goes missing.
I thought my conversion might be at fault, so I changed it to build a map with the row ID as the key and the remaining columns as the value, but the problem still exists.

When I drop the RowIterator and iterate over the scan directly, it works fine:
for (Map.Entry<Key, Value> entry : scan) {
	...
}


Is this a bug, or an error in my program?
Thanks.

Re: Data missing when using RowIterator

Posted by Lu Q <lu...@gmail.com>.
Thanks

> On Feb 10, 2017, at 12:39, Josh Elser <el...@apache.org> wrote:
> 
> Just to be clear, Lu, for now stick to using a Scanner with the RowIterator :)
> 
> It sounds like we might have to re-think how the RowIterator works with the BatchScanner...


Re: Data missing when using RowIterator

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Feb 9, 2017 at 11:39 PM, Josh Elser <el...@apache.org> wrote:
> Just to be clear, Lu, for now stick to using a Scanner with the RowIterator
> :)
>
> It sounds like we might have to re-think how the RowIterator works with the
> BatchScanner...

I opened: https://issues.apache.org/jira/browse/ACCUMULO-4586
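
Until that is resolved, if the parallelism of the BatchScanner is still wanted, one possible interim workaround, sketched here as an assumption rather than a recommendation from this thread, is the server-side WholeRowIterator, which packs each row into a single key/value pair so ordering between rows no longer matters. The table name, range, and thread count below are hypothetical placeholders, and each row must fit in memory:

import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.WholeRowIterator;
import org.apache.accumulo.core.security.Authorizations;

public class WholeRowsWithBatchScanner {
    static void scanRows(Connector connector) throws Exception {
        // hypothetical table name, authorizations, and query-thread count
        BatchScanner scanner = connector.createBatchScanner("mytable", Authorizations.EMPTY, 4);
        scanner.setRanges(Collections.singleton(new Range("row_0000", "row_9999"))); // hypothetical range

        // pack every row into one entry server-side
        scanner.addScanIterator(new IteratorSetting(50, "wholeRow", WholeRowIterator.class));

        for (Map.Entry<Key, Value> entry : scanner) {
            // each entry now holds an entire row; decode it back into its columns
            SortedMap<Key, Value> row = WholeRowIterator.decodeRow(entry.getKey(), entry.getValue());
            // convert the row's columns to an object here
        }
        scanner.close();
    }
}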


Re: Data missing when using RowIterator

Posted by Josh Elser <el...@apache.org>.
Just to be clear, Lu, for now stick to using a Scanner with the 
RowIterator :)

It sounds like we might have to re-think how the RowIterator works with 
the BatchScanner...
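
A minimal sketch of that suggestion, assuming a hypothetical table name, range, and empty authorizations (none of which come from this thread). A Scanner returns entries in sorted order, which is what RowIterator needs in order to group a row's columns together:

import java.util.Iterator;
import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class RowIteratorWithScanner {
    static void scanRows(Connector connector) throws Exception {
        // hypothetical table name and authorizations
        Scanner scanner = connector.createScanner("mytable", Authorizations.EMPTY);
        scanner.setRange(new Range("row_0000", "row_9999")); // hypothetical range

        // entries arrive sorted, so each inner iterator covers exactly one row
        RowIterator rows = new RowIterator(scanner);
        while (rows.hasNext()) {
            Iterator<Map.Entry<Key, Value>> row = rows.next();
            while (row.hasNext()) {
                Map.Entry<Key, Value> entry = row.next();
                // convert the entry to an object here
            }
        }
        scanner.close();
    }
}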

Christopher wrote:
> I suspected that was the case. BatchScanner does not guarantee ordering
> of entries, which is needed for the behavior you're expecting with
> RowIterator. This means that the RowIterator could see the same row
> multiple times with different subsets of the row's columns. This is
> probably affecting your count.

Re: Data missing when using RowIterator

Posted by Christopher <ct...@apache.org>.
I suspected that was the case. BatchScanner does not guarantee ordering of
entries, which is needed for the behavior you're expecting with
RowIterator. This means that the RowIterator could see the same row
multiple times with different subsets of the row's columns. This is
probably affecting your count.
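
To make that failure mode concrete, here is a small standalone sketch (an illustration added here, not code from the thread) showing how RowIterator, which only groups consecutive entries, reports the same row more than once when its entries arrive interleaved, the way a BatchScanner may return them:

import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;

public class RowIteratorOrderingDemo {
    public static void main(String[] args) {
        Value v = new Value("x".getBytes());
        // two distinct rows, but row1's entries are not adjacent
        List<Map.Entry<Key, Value>> unordered = Arrays.asList(
            new SimpleImmutableEntry<>(new Key("row1", "cf", "a"), v),
            new SimpleImmutableEntry<>(new Key("row2", "cf", "a"), v),
            new SimpleImmutableEntry<>(new Key("row1", "cf", "b"), v));

        RowIterator rows = new RowIterator(unordered.iterator());
        while (rows.hasNext()) {
            Iterator<Map.Entry<Key, Value>> row = rows.next();
            System.out.println("row group: " + row.next().getKey().getRow());
            row.forEachRemaining(e -> {}); // drain the rest of the group
        }
        // Expected output with 1.8.x's consecutive-grouping behavior:
        // row group: row1
        // row group: row2
        // row group: row1   <- the same row shows up twice
    }
}

With a plain Scanner the same three entries would come back sorted (row1, row1, row2), and the input would produce exactly two row groups.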

On Thu, Feb 9, 2017 at 10:29 PM Lu Q <lu...@gmail.com> wrote:

> I use BatchScanner
-- 
Christopher

Re: Data missing when using RowIterator

Posted by Lu Q <lu...@gmail.com>.
I use BatchScanner

> On Feb 10, 2017, at 11:24, Christopher <ct...@apache.org> wrote:
> 
> Does it matter if your scanner is a BatchScanner or a Scanner?
> I wonder if this is due to the way BatchScanner could split rows up.


Re: Data missing when using RowIterator

Posted by Christopher <ct...@apache.org>.
Does it matter if your scanner is a BatchScanner or a Scanner?
I wonder if this is due to the way BatchScanner could split rows up.

On Thu, Feb 9, 2017 at 9:50 PM Lu Q <lu...@gmail.com> wrote:

-- 
Christopher