Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2017/02/10 21:52:41 UTC
[jira] [Commented] (ACCUMULO-4586) Make rowiterator fail when unsorted data is observed
[ https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861888#comment-15861888 ]
Christopher Tubbs commented on ACCUMULO-4586:
---------------------------------------------
This is pretty much going to guarantee failure whenever the BatchScanner is used. At the same time, it seems like it might be overly restrictive.
As far as I can tell, the RowIterator doesn't necessarily need the data to be in sorted order... it just needs all the entries for a single row to be grouped together. RowIterator works just fine over single-entry rows (though it's a bit unnecessary at that point), or when wrapping a custom scanner or other source which provides this guarantee. It also works just fine if the user doesn't care whether a row is split into a few different objects, even if the source makes no such guarantees.
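To illustrate the distinction between "sorted" and "grouped together", here is a minimal sketch of contiguous grouping. It is not RowIterator's actual implementation: the class name ContiguousRowGrouping and its helpers are hypothetical, and plain String pairs (row ID, value) stand in for Accumulo's Key/Value entries. A new group starts whenever the row ID changes, so rows may arrive in any order as long as each row's entries are adjacent.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of Accumulo: the entry key plays the role of the row ID.
class ContiguousRowGrouping {

  // Groups consecutive entries that share a row ID. Only contiguity within a
  // row matters; no ordering between rows is assumed or checked.
  static List<List<Map.Entry<String,String>>> groupContiguous(
      Iterable<Map.Entry<String,String>> source) {
    List<List<Map.Entry<String,String>>> rows = new ArrayList<>();
    List<Map.Entry<String,String>> current = null;
    String currentRow = null;
    for (Map.Entry<String,String> e : source) {
      if (current == null || !e.getKey().equals(currentRow)) {
        // Row ID changed (or first entry): start a new group.
        current = new ArrayList<>();
        rows.add(current);
        currentRow = e.getKey();
      }
      current.add(e);
    }
    return rows;
  }

  static Map.Entry<String,String> entry(String row, String value) {
    return new SimpleImmutableEntry<>(row, value);
  }

  public static void main(String[] args) {
    // Rows out of order (r2 before r1), but each row is contiguous: 2 groups.
    System.out.println(groupContiguous(
        Arrays.asList(entry("r2", "a"), entry("r2", "b"), entry("r1", "c"))).size());
    // r1 is interleaved with r2, so r1 is split into two groups: 3 groups.
    System.out.println(groupContiguous(
        Arrays.asList(entry("r1", "a"), entry("r2", "b"), entry("r1", "c"))).size());
  }
}
```

The second call shows the "row split into a few different objects" case: nothing fails, but the caller sees r1 twice.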
I think we should deprecate RowIterator and remove it in 2.0. The Java 8 streams API makes this class redundant, since collectors offer better options for grouping. The streams API also makes the cost and results of grouping unsorted data more obvious, rather than hiding them inside assumptions within RowIterator. In fact, it imposes a level of difficulty on the user trying to group with streams, because grouping is just inherently hard to do on unsorted data.
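As a rough sketch of what grouping with collectors might look like, the example below uses Collectors.groupingBy from the Java 8 streams API. The class name StreamRowGrouping is hypothetical, and String pairs again stand in for Accumulo's Key/Value entries; a real scanner would first have to be adapted into a stream. Note where the cost shows up: the collector has to buffer every entry in a map before any row is complete, which is exactly the expense that unsorted input forces.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in, not part of Accumulo: the entry key plays the role of the row ID.
class StreamRowGrouping {

  // Collects all values per row ID, regardless of input order. The whole
  // input is held in memory; TreeMap makes the resulting rows sorted by ID.
  static Map<String,List<String>> groupByRow(List<Map.Entry<String,String>> entries) {
    return entries.stream().collect(
        Collectors.groupingBy(
            Map.Entry::getKey,
            TreeMap::new,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
  }

  static Map.Entry<String,String> entry(String row, String value) {
    return new SimpleImmutableEntry<>(row, value);
  }

  public static void main(String[] args) {
    // Interleaved input: r1 appears on both sides of r2.
    Map<String,List<String>> rows = groupByRow(
        Arrays.asList(entry("r1", "a"), entry("r2", "b"), entry("r1", "c")));
    System.out.println(rows); // prints {r1=[a, c], r2=[b]}
  }
}
```

Unlike contiguous grouping, this never splits a row, but it cannot emit anything until the entire input has been consumed.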
> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
> Key: ACCUMULO-4586
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.6, 1.7.1, 1.8.0
> Reporter: Keith Turner
> Fix For: 1.7.3, 1.8.2, 2.0.0
>
>
> A BatchScanner was used as a RowIterator data source. The RowIterator expects data in sorted order, but the BatchScanner does not supply data in sorted order. The RowIterator should have a sanity check to ensure source data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E