Posted to commits@heron.apache.org by GitBox <gi...@apache.org> on 2018/12/14 00:47:13 UTC

[GitHub] nwangtw opened a new pull request #3125: Add KVStreamlet and keyBy() operation

URL: https://github.com/apache/incubator-heron/pull/3125
 
 
   Currently, Streamlet is the only interface, and it carries a single data type R. Key-value data is fairly common in stream processing, especially in more complicated jobs, but with a single interface it is impossible to offer functions specific to key-value data.
   
   With a new KVStreamlet interface, the regular reduceByKeyAndWindow() function and future aggregation functions could have simpler signatures, and the operations could be chained. User code could also be cleaner and more readable.
   
   For example (assuming reduce, count and sum functions are available):
   
   Streamlet<KeyValue<Integer, Integer>> reduced = stream  // Streamlet<Integer>
     .reduceByKey((Integer x) -> x % 10, .....);
   
   reduced
     .countByKey((KeyValue<Integer, Integer> x) -> x.getKey())
     .log();
   
   reduced
     .sumByKey((KeyValue<Integer, Integer> x) -> x.getKey(),
               (KeyValue<Integer, Integer> x) -> x.getValue() * getWeight(x.getKey()))
     .log();
   
   can be rewritten as:
   
   KVStreamlet<Integer, Integer> reduced = stream  // Streamlet<Integer>
     .reduceByKey((Integer x) -> x % 10, .....);
   
   reduced.countByKey().log();
   reduced.sumByKey((Integer key, Integer value) -> value * getWeight(key)).log();
   
   
   Note that reduceByKey, countByKey and sumByKey could also be given shorter function names in KVStreamlet.
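   To make the idea concrete, here is a minimal in-memory sketch of how a key-value interface lets the aggregation calls drop their key extractors. Everything in it is an assumption made for illustration: SimpleStreamlet, SimpleKVStreamlet, the two-argument keyBy() and the Map return types are hypothetical stand-ins, not Heron's Streamlet API and not necessarily the signatures this pull request adds. The fixed weight of 2 stands in for getWeight(key).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical, in-memory illustration only; NOT Heron's Streamlet API.
public class KVStreamletSketch {

  /** Toy single-typed stream, standing in for Streamlet<R>. */
  public static final class SimpleStreamlet<R> {
    private final List<R> items;

    public SimpleStreamlet(List<R> items) {
      this.items = new ArrayList<>(items);
    }

    /** Assumed keyBy(): extracts a key and a value from every element once. */
    public <K, V> SimpleKVStreamlet<K, V> keyBy(Function<R, K> keyFn,
                                                Function<R, V> valueFn) {
      List<Map.Entry<K, V>> pairs = new ArrayList<>();
      for (R item : items) {
        pairs.add(new AbstractMap.SimpleImmutableEntry<>(
            keyFn.apply(item), valueFn.apply(item)));
      }
      return new SimpleKVStreamlet<>(pairs);
    }
  }

  /** Toy key-value stream, standing in for the proposed KVStreamlet<K, V>. */
  public static final class SimpleKVStreamlet<K, V> {
    private final List<Map.Entry<K, V>> pairs;

    public SimpleKVStreamlet(List<Map.Entry<K, V>> pairs) {
      this.pairs = pairs;
    }

    /** No key extractor needed: the key is already part of the element type. */
    public Map<K, Long> countByKey() {
      Map<K, Long> counts = new LinkedHashMap<>();
      for (Map.Entry<K, V> pair : pairs) {
        counts.merge(pair.getKey(), 1L, Long::sum);
      }
      return counts;
    }

    /** Sums a per-element contribution computed from (key, value). */
    public Map<K, Integer> sumByKey(BiFunction<K, V, Integer> contribution) {
      Map<K, Integer> sums = new LinkedHashMap<>();
      for (Map.Entry<K, V> pair : pairs) {
        sums.merge(pair.getKey(),
            contribution.apply(pair.getKey(), pair.getValue()), Integer::sum);
      }
      return sums;
    }
  }

  public static void main(String[] args) {
    SimpleStreamlet<Integer> stream =
        new SimpleStreamlet<>(Arrays.asList(1, 11, 2, 12, 22));

    // Key each integer by its last digit; keep the integer itself as the value.
    SimpleKVStreamlet<Integer, Integer> keyed = stream.keyBy(x -> x % 10, x -> x);

    System.out.println(keyed.countByKey());                       // {1=2, 2=3}
    System.out.println(keyed.sumByKey((key, value) -> value * 2)); // {1=24, 2=72}
  }
}
```

   The key is extracted once in keyBy(), so every later call works on (key, value) pairs directly. A real streaming implementation would return another streamlet rather than a Map; the Map is used here only to keep the sketch self-contained.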
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services