Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2017/02/10 21:52:41 UTC

[jira] [Commented] (ACCUMULO-4586) Make rowiterator fail when unsorted data is observed

    [ https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861888#comment-15861888 ] 

Christopher Tubbs commented on ACCUMULO-4586:
---------------------------------------------

This is pretty much going to guarantee failure whenever the BatchScanner is used. At the same time, it seems like it might be overly restrictive.

As far as I can tell, the RowIterator doesn't necessarily need the data to be in sorted order; it just needs all the entries for a single row to be grouped together. RowIterator works just fine over single-entry rows (though it's a bit unnecessary at that point), or when wrapping a custom scanner or other source that provides this guarantee. It also works just fine if the user doesn't care whether a row is split into a few different objects, even if the source makes no such guarantees.
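
To illustrate the distinction between "sorted" and "grouped", here is a minimal sketch in plain Java. It is not the RowIterator implementation; the class name ConsecutiveRowGrouping, the method groupConsecutive, and the use of {row, value} string pairs in place of Accumulo Key/Value entries are all assumptions made for illustration. It starts a new group whenever the row ID changes from the previous entry, which is the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for row grouping; not Accumulo's RowIterator.
public class ConsecutiveRowGrouping {

    // Each entry is a {row, value} pair (an assumption for this sketch).
    // A new group starts whenever the row differs from the previous entry's row.
    static List<List<String[]>> groupConsecutive(List<String[]> entries) {
        List<List<String[]>> groups = new ArrayList<>();
        String currentRow = null;
        List<String[]> current = null;
        for (String[] entry : entries) {
            if (current == null || !entry[0].equals(currentRow)) {
                current = new ArrayList<>();
                groups.add(current);
                currentRow = entry[0];
            }
            current.add(entry);
        }
        return groups;
    }
}
```

With input that is unsorted but grouped (rows b, b, a), this yields two groups, one per row. With interleaved input (rows a, b, a), it yields three groups, so row a is split into two objects. That is harmless only if the caller doesn't care about the split.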

I think we should deprecate RowIterator and remove it in 2.0. The Java 8 streams API makes this class redundant, since collectors offer better options for grouping. The streams API also makes the cost and results of trying to group unsorted data a bit more obvious: they aren't hidden inside assumptions within RowIterator. Rather, the difficulty falls on the user trying to use the streams API for grouping, because grouping is just inherently hard to do on unsorted data.
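
As a rough sketch of what grouping with collectors might look like, the following uses Collectors.groupingBy from the Java 8 streams API. The class name StreamRowGrouping, the method groupByRow, and the {row, value} string pairs standing in for Accumulo entries are assumptions for illustration, not part of any Accumulo API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example; entries are {row, value} pairs rather than Accumulo Key/Value.
public class StreamRowGrouping {

    // Collects every value under its row ID, regardless of input order.
    // The whole result is buffered in memory before any row is available.
    static Map<String, List<String>> groupByRow(List<String[]> entries) {
        return entries.stream().collect(
            Collectors.groupingBy(
                entry -> entry[0],                 // classifier: the row ID
                TreeMap::new,                      // keep rows sorted in the result
                Collectors.mapping(entry -> entry[1], Collectors.toList())));
    }
}
```

Unlike grouping adjacent entries, this never splits a row, even on interleaved input. The cost is explicit, though: the collector must consume the entire stream and hold all of it in a map before the first complete row can be returned.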

> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4586
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.6, 1.7.1, 1.8.0
>            Reporter: Keith Turner
>             Fix For: 1.7.3, 1.8.2, 2.0.0
>
>
> A batchscanner was used as a row iterator data source.  The rowiterator expects data in sorted order and the batch scanner does not supply data in sorted order.  The row iterator should have a sanity check to ensure source data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)