Posted to notifications@accumulo.apache.org by "Christopher Tubbs (Jira)" <ji...@apache.org> on 2022/11/02 19:03:00 UTC

[jira] [Resolved] (ACCUMULO-4562) Consider Adding Java 8 Stream support to scanners

     [ https://issues.apache.org/jira/browse/ACCUMULO-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs resolved ACCUMULO-4562.
-----------------------------------------
    Resolution: Duplicate

Done in https://github.com/apache/accumulo/pull/2636

> Consider Adding Java 8 Stream support to scanners
> -------------------------------------------------
>
>                 Key: ACCUMULO-4562
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4562
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Priority: Major
>
> For a test, I wanted to find the minimum and maximum timestamps in an Accumulo table.  I used Java 8 streams to do that as follows.  The expression {{StreamSupport.stream(scanner.spliterator(), false)}} is a standard way in Java 8 to create a sequential stream from an Iterable.
> {code:java}
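>     try (Scanner scanner = c.createScanner(table, Authorizations.EMPTY)) {
>       // Wrap the scanner's Iterable in a stream; false requests a sequential stream
>       Stream<Entry<Key,Value>> stream =
>             StreamSupport.stream(scanner.spliterator(), false);
>       // Reduce the timestamps to count, sum, min, average, and max in one pass
>       LongSummaryStatistics stats = stream
>          .mapToLong(e -> e.getKey().getTimestamp())
>          .summaryStatistics();
>       System.out.println(stats);
>     }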
> {code}
> In Java 8, collections have the {{stream()}} and {{parallelStream()}} methods.  If ScannerBase had those methods in Accumulo, then the following could be written without using {{StreamSupport}}:
> {code:java}
>     try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
>       LongSummaryStatistics stats = scanner.stream()
>          .mapToLong(e -> e.getKey().getTimestamp())
>          .summaryStatistics();
>       System.out.println(stats);
>     }
> {code}
> For the BatchScanner, I think we could implement a parallel stream.  One way to do this would be to create an internal batch scanner queue for each Java 8 spliterator.  Currently, the BatchScanner has one queue onto which all background threads put batches of key-value pairs.  With multiple queues, each background thread could break each batch into equal-sized subsets and put one subset on each queue.
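
The stream() method proposed in the quoted description could be sketched as a default method on an interface that extends Iterable. This is only an illustration: Streamable and timestampStats are hypothetical names, a list of longs stands in for a real scanner, and the code is not taken from Accumulo or from the linked pull request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class StreamableSketch {

  // Hypothetical stand-in for ScannerBase: any Iterable that also offers stream().
  public interface Streamable<T> extends Iterable<T> {
    // Same StreamSupport call as in the description; false requests a sequential stream.
    default Stream<T> stream() {
      return StreamSupport.stream(spliterator(), false);
    }
  }

  // Summarizes timestamps without the caller touching StreamSupport directly.
  public static LongSummaryStatistics timestampStats(Streamable<Long> source) {
    return source.stream().mapToLong(Long::longValue).summaryStatistics();
  }

  public static void main(String[] args) {
    // A plain list of timestamps plays the role of the scanner here (an assumption).
    List<Long> timestamps = Arrays.asList(30L, 10L, 20L);
    // Streamable has a single abstract method, iterator(), so a method reference suffices.
    Streamable<Long> scannerLike = timestamps::iterator;
    LongSummaryStatistics stats = timestampStats(scannerLike);
    System.out.println("min=" + stats.getMin() + " max=" + stats.getMax());
  }
}
```

Because the method is a default on the interface, existing implementations would pick it up without changes, which is presumably why a default method is attractive for this kind of addition.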

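The multi-queue idea in the last paragraph of the quoted description might look roughly like the following. This is a sketch under stated assumptions, not the BatchScanner's actual internals: MultiQueueSketch, partition, and distribute are hypothetical names, batches are modeled as plain lists, and each queue is assumed to feed one spliterator of a parallel stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MultiQueueSketch {

  // Splits a batch into `parts` contiguous slices whose sizes differ by at most one.
  public static <T> List<List<T>> partition(List<T> batch, int parts) {
    List<List<T>> slices = new ArrayList<>(parts);
    int base = batch.size() / parts;
    int extra = batch.size() % parts; // the first `extra` slices get one more element
    int start = 0;
    for (int i = 0; i < parts; i++) {
      int end = start + base + (i < extra ? 1 : 0);
      slices.add(new ArrayList<>(batch.subList(start, end)));
      start = end;
    }
    return slices;
  }

  // What one background thread would do with one batch: put one slice on each queue.
  public static <T> void distribute(List<T> batch, List<BlockingQueue<List<T>>> queues)
      throws InterruptedException {
    List<List<T>> slices = partition(batch, queues.size());
    for (int i = 0; i < queues.size(); i++) {
      if (!slices.get(i).isEmpty()) {
        queues.get(i).put(slices.get(i));
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Three queues, one per hypothetical spliterator.
    List<BlockingQueue<List<Integer>>> queues = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
    distribute(Arrays.asList(1, 2, 3, 4, 5, 6, 7), queues);
    for (BlockingQueue<List<Integer>> q : queues) {
      System.out.println(q.peek());
    }
  }
}
```

Spreading every batch across all queues keeps the consumers evenly loaded, at the cost of any ordering within a batch; that trade-off is an observation about this sketch, not a statement about how Accumulo behaves.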

