You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2018/10/09 15:59:00 UTC

[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

Kenneth Knowles created BEAM-5690:
-------------------------------------

             Summary: Issue with GroupByKey in BeamSql using SparkRunner
                 Key: BEAM-5690
                 URL: https://issues.apache.org/jira/browse/BEAM-5690
             Project: Beam
          Issue Type: Task
          Components: runner-spark
            Reporter: Kenneth Knowles
            Assignee: Amit Sela


Reported on user@

{quote}We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window). 
Below is the pipeline:
  
   KafkaSource (KafkaIO) 
       ---> Windowing (FixedWindow 1min)
       ---> BeamSql
       ---> KafkaSink (KafkaIO)
                         
We are using Spark Runner for this. 
The BeamSql query is:
{code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}

We are grouping by Col3 which is a string. It can hold values string[0-9]. 
             
The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
Below is the output observed: (WST and WET are indicators for window start time and window end time)

{code}
{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0}
{code}
{quote}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)