Posted to issues@beam.apache.org by "Brachi Packter (JIRA)" <ji...@apache.org> on 2019/04/04 12:40:00 UTC
[jira] [Commented] (BEAM-2728) Extension for sketch-based statistics
[ https://issues.apache.org/jira/browse/BEAM-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809803#comment-16809803 ]
Brachi Packter commented on BEAM-2728:
--------------------------------------
I want to save the sketch itself to BigQuery so that I can merge sketches there with the HLL functions [https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions]
I used this library: [https://github.com/apache/beam/tree/master/sdks/java/extensions/sketching]
and in the code:
{code:java}
.apply("hll-count", Combine.perKey(ApproximateDistinct.ApproximateDistinctFn.create(StringUtf8Coder.of())))
.apply("to-table-row", ParDo.of(new DoFn<ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>>, TableRow>() {
  @ProcessElement
  public void processElement(ProcessContext processContext) {
    ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>> windowed = processContext.element();
    KV<GroupByData, HyperLogLogPlus> keyData = windowed.getValue();
    GroupByData key = keyData.getKey();
    HyperLogLogPlus hllSketch = keyData.getValue();
    TableRow tableRow = new TableRow();
    tableRow.set("country_code", key.countryCode);
    tableRow.set("event", key.event);
    tableRow.set("profile", key.profile);
    // How can I get the HLL?
    tableRow.set("hll", hllSketch.getBytes());
{code}
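One part of the question, getting serialized sketch bytes into a row, can be illustrated in isolation. The following is a minimal sketch, not the extension's API: the class `SketchRowExample` and its methods are hypothetical, the `Map` is a stand-in for `TableRow`, and the hard-coded byte array is a stand-in for the result of `hllSketch.getBytes()` (which, in the stream-lib `HyperLogLogPlus` class, is declared to throw `IOException`). It assumes the BYTES column is written as a base64-encoded string, which is how BigQueryIO is documented to expect BYTES values in a `TableRow`.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Beam or stream-lib.
final class SketchRowExample {

  // Encodes serialized sketch bytes as base64 text for a BYTES column.
  static String encodeSketch(byte[] serializedSketch) {
    return Base64.getEncoder().encodeToString(serializedSketch);
  }

  // Reverses encodeSketch, returning the original serialized bytes.
  static byte[] decodeSketch(String encoded) {
    return Base64.getDecoder().decode(encoded);
  }

  // Builds a row keyed like the TableRow in the snippet above.
  // A plain Map is used here as a stand-in for TableRow (assumption).
  static Map<String, Object> toRow(
      String countryCode, String event, String profile, byte[] serializedSketch) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("country_code", countryCode);
    row.put("event", event);
    row.put("profile", profile);
    row.put("hll", encodeSketch(serializedSketch));
    return row;
  }

  public static void main(String[] args) {
    // Placeholder bytes; in the pipeline these would come from hllSketch.getBytes().
    byte[] serialized = {1, 2, (byte) 0xFF};
    Map<String, Object> row = toRow("US", "click", "default", serialized);
    byte[] restored = decodeSketch((String) row.get("hll"));
    System.out.println("round trip ok: " + Arrays.equals(serialized, restored));
  }
}
```

This only covers storing and restoring the bytes. Whether BigQuery's HLL_COUNT functions can merge them is a separate question: the bytes come from the stream-lib `HyperLogLogPlus` serialization, and it should not be assumed that this format matches the one BigQuery expects.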
> Extension for sketch-based statistics
> -------------------------------------
>
> Key: BEAM-2728
> URL: https://issues.apache.org/jira/browse/BEAM-2728
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-sketching
> Reporter: Arnaud Fournier
> Assignee: Arnaud Fournier
> Priority: Minor
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>
> Goal: Provide an extension library to compute approximate statistics on streams.
> Interest: Probabilistic data structures can build an approximation (sketch) of a stream's current state without storing every element. Each observation is processed quickly and folded into a summary from which useful statistical insights can be drawn.
> Implementation is here: https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> More info: https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit