You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2019/12/18 08:42:00 UTC
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql
using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998942#comment-16998942 ]
Ismaël Mejía commented on BEAM-5690:
------------------------------------
The PR on this one is already merged. can we resolve this one [~echauchot] ?
> Issue with GroupByKey in BeamSql using SparkRunner
> --------------------------------------------------
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
> Issue Type: Task
> Components: runner-spark
> Reporter: Kenneth Knowles
> Priority: Major
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window).
> Below is the pipeline:
>
> KafkaSource (KafkaIO)
> ---> Windowing (FixedWindow 1min)
> ---> BeamSql
> ---> KafkaSink (KafkaIO)
>
> We are using Spark Runner for this.
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9].
>
> The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0}
> {code}
> {quote}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)