Posted to user@beam.apache.org by Vishwas Bm <bm...@gmail.com> on 2018/10/09 10:17:56 UTC

Issue with GroupByKey in BeamSql using SparkRunner

We are trying to set up a pipeline using BeamSql; the trigger used is the
default (fires when the watermark passes the end of the window).
Below is the pipeline:

   KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql
---> KafkaSink (KafkaIO)

We are using Spark Runner for this.
The BeamSql query is:
             select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3

We are grouping by Col3, which is a string that can hold the values string0 through string9.
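For context, the anomaly can be reasoned about without Beam: within a single fixed window, GROUP BY only produces an output row for a key that appeared in at least one record, so count(*) for an emitted group should never be 0. A minimal sketch in plain Python (no Beam; the event times and keys below are made up for illustration):

```python
from collections import Counter

WINDOW_MS = 60_000  # 1-minute fixed windows, as in the pipeline above

def window_start(event_time_ms):
    """Assign a timestamp to the start of its fixed window."""
    return event_time_ms - (event_time_ms % WINDOW_MS)

def windowed_counts(events):
    """events: iterable of (event_time_ms, key) pairs. Returns
    {(window_start, key): count}, mimicking GROUP BY key per fixed window."""
    counts = Counter()
    for ts, key in events:
        counts[(window_start(ts), key)] += 1
    return dict(counts)

# Hypothetical events, all inside the first 1-minute window
events = [
    (5_000, "string7"), (10_000, "string7"), (15_000, "string7"),
    (20_000, "string8"), (25_000, "string8"),
    (30_000, "string5"), (40_000, "string2"), (50_000, "string6"),
]
result = windowed_counts(events)
# Every emitted (window, key) pair has count >= 1 by construction;
# a 0 count can only come from the runner, not from the grouping itself.
assert all(c >= 1 for c in result.values())
```

This mirrors the non-zero rows in the output below; the repeated 0-count rows for string6 have no counterpart in the grouping semantics.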

The records are emitted to the Kafka sink at the 1-minute window boundary, but
the output records in Kafka are not as expected.
Below is the output observed (WST and WET denote window start
time and window end time):

{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000
+0000","WET":"2018-10-09  09-56-00 0000  +0000"}

{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}

We ran the same pipeline with the direct and Flink runners, and we don't see
0-count entries for count_col1.

As per the Beam capability matrix (
https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
GroupByKey is not fully supported on every runner; is this one of those cases?

Thanks & Regards,

Vishwas

Re: Issue with GroupByKey in BeamSql using SparkRunner

Posted by Vishwas Bm <bm...@gmail.com>.
Hi,

I tried with Spark 2.3.2 in local mode and I see the same issue.

Regards,
Vishwas


On Wed, Oct 10, 2018, 2:47 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Are you trying this in a particular spark distribution or just locally ?
> I ask this because there was a data corruption issue with Spark 2.3.1
> (previous version used by Beam)
> https://issues.apache.org/jira/browse/SPARK-23243
>
> Current Beam master (and next release) moves Spark to version 2.3.2
> and that should fix some of the data correctness issues (maybe yours
> too).
> Can you give it a try and report back if this fixes your issue.
>
>
> On Tue, Oct 9, 2018 at 6:45 PM Vishwas Bm <bm...@gmail.com> wrote:
> >
> > Hi Kenn,
> >
> > We are using Beam 2.6 and using Spark_submit to submit jobs to Spark 2.2
> cluster on Kubernetes.
> >
> >
> > On Tue, Oct 9, 2018, 9:29 PM Kenneth Knowles <ke...@apache.org> wrote:
> >>
> >> Thanks for the report! I filed
> https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.
> >>
> >> Can you share what version of Beam you are using?
> >>
> >> Kenn
> >>
> >> On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm <bm...@gmail.com> wrote:
> >>>
> >>> We are trying to setup a pipeline with using BeamSql and the trigger
> used is default (AfterWatermark crosses the window).
> >>> Below is the pipeline:
> >>>
> >>>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) --->
> BeamSql ---> KafkaSink (KafkaIO)
> >>>
> >>> We are using Spark Runner for this.
> >>> The BeamSql query is:
> >>>              select Col3, count(*) as count_col1 from PCOLLECTION
> GROUP BY Col3
> >>>
> >>> We are grouping by Col3 which is a string. It can hold values
> string[0-9].
> >>>
> >>> The records are getting emitted out at 1 min to kafka sink, but the
> output record in kafka is not as expected.
> >>> Below is the output observed: (WST and WET are indicators for window
> start time and window end time)
> >>>
> >>> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> >>>
> >>> We ran the same pipeline using direct and flink runner and we dont see
> 0 entries for count_col1.
> >>>
> >>> As per beam matrix page (
> https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
> GroupBy is not fully supported,is this one of those cases ?
> >>> Thanks & Regards,
> >>> Vishwas
> >>>
>

Re: Issue with GroupByKey in BeamSql using SparkRunner

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
It may be related: I have a pipeline (streaming with sliding windows)
that works fine with the Direct and Flink runners, but I don't get any
results when using the Spark runner.

I'm going to investigate this using my beam-samples.

Regards
JB

On 10/10/2018 11:16, Ismaël Mejía wrote:
> Are you trying this in a particular spark distribution or just locally ?
> I ask this because there was a data corruption issue with Spark 2.3.1
> (previous version used by Beam)
> https://issues.apache.org/jira/browse/SPARK-23243
> 
> Current Beam master (and next release) moves Spark to version 2.3.2
> and that should fix some of the data correctness issues (maybe yours
> too).
> Can you give it a try and report back if this fixes your issue.
> 
> 
> On Tue, Oct 9, 2018 at 6:45 PM Vishwas Bm <bm...@gmail.com> wrote:
>>
>> Hi Kenn,
>>
>> We are using Beam 2.6 and using Spark_submit to submit jobs to Spark 2.2 cluster on Kubernetes.
>>
>>
>> On Tue, Oct 9, 2018, 9:29 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>> Thanks for the report! I filed https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.
>>>
>>> Can you share what version of Beam you are using?
>>>
>>> Kenn
>>>
>>> On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm <bm...@gmail.com> wrote:
>>>>
>>>> We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window).
>>>> Below is the pipeline:
>>>>
>>>>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO)
>>>>
>>>> We are using Spark Runner for this.
>>>> The BeamSql query is:
>>>>              select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3
>>>>
>>>> We are grouping by Col3 which is a string. It can hold values string[0-9].
>>>>
>>>> The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
>>>> Below is the output observed: (WST and WET are indicators for window start time and window end time)
>>>>
>>>> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>>
>>>> We ran the same pipeline using direct and flink runner and we dont see 0 entries for count_col1.
>>>>
>>>> As per beam matrix page (https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what), GroupBy is not fully supported,is this one of those cases ?
>>>> Thanks & Regards,
>>>> Vishwas
>>>>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Issue with GroupByKey in BeamSql using SparkRunner

Posted by Ismaël Mejía <ie...@gmail.com>.
Are you trying this on a particular Spark distribution or just locally?
I ask because there was a data corruption issue in Spark 2.3.1
(the version previously used by Beam):
https://issues.apache.org/jira/browse/SPARK-23243

The current Beam master (and the next release) moves Spark to version 2.3.2,
which should fix some of the data correctness issues (maybe yours
too).
Can you give it a try and report back whether it fixes your issue?


On Tue, Oct 9, 2018 at 6:45 PM Vishwas Bm <bm...@gmail.com> wrote:
>
> Hi Kenn,
>
> We are using Beam 2.6 and using Spark_submit to submit jobs to Spark 2.2 cluster on Kubernetes.
>
>
> On Tue, Oct 9, 2018, 9:29 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>> Thanks for the report! I filed https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.
>>
>> Can you share what version of Beam you are using?
>>
>> Kenn
>>
>> On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm <bm...@gmail.com> wrote:
>>>
>>> We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window).
>>> Below is the pipeline:
>>>
>>>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO)
>>>
>>> We are using Spark Runner for this.
>>> The BeamSql query is:
>>>              select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3
>>>
>>> We are grouping by Col3 which is a string. It can hold values string[0-9].
>>>
>>> The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
>>> Below is the output observed: (WST and WET are indicators for window start time and window end time)
>>>
>>> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>>
>>> We ran the same pipeline using direct and flink runner and we dont see 0 entries for count_col1.
>>>
>>> As per beam matrix page (https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what), GroupBy is not fully supported,is this one of those cases ?
>>> Thanks & Regards,
>>> Vishwas
>>>

Re: Issue with GroupByKey in BeamSql using SparkRunner

Posted by Vishwas Bm <bm...@gmail.com>.
Hi Kenn,

We are using Beam 2.6 and spark-submit to submit jobs to a Spark 2.2
cluster on Kubernetes.


On Tue, Oct 9, 2018, 9:29 PM Kenneth Knowles <ke...@apache.org> wrote:

> Thanks for the report! I filed
> https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.
>
> Can you share what version of Beam you are using?
>
> Kenn
>
> On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm <bm...@gmail.com> wrote:
>
>> We are trying to setup a pipeline with using BeamSql and the trigger used
>> is default (AfterWatermark crosses the window).
>> Below is the pipeline:
>>
>>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql
>> ---> KafkaSink (KafkaIO)
>>
>> We are using Spark Runner for this.
>> The BeamSql query is:
>>              select Col3, count(*) as count_col1 from PCOLLECTION GROUP
>> BY Col3
>>
>> We are grouping by Col3 which is a string. It can hold values
>> string[0-9].
>>
>> The records are getting emitted out at 1 min to kafka sink, but the
>> output record in kafka is not as expected.
>> Below is the output observed: (WST and WET are indicators for window
>> start time and window end time)
>>
>> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000
>> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000
>> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000
>> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000
>> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>>
>> We ran the same pipeline using direct and flink runner and we dont see 0
>> entries for count_col1.
>>
>> As per beam matrix page (
>> https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
>> GroupBy is not fully supported,is this one of those cases ?
>>
>> Thanks & Regards,
>>
>> Vishwas
>>
>>

Re: Issue with GroupByKey in BeamSql using SparkRunner

Posted by Kenneth Knowles <ke...@apache.org>.
Thanks for the report! I filed
https://issues.apache.org/jira/browse/BEAM-5690 to track the issue.

Can you share what version of Beam you are using?

Kenn

On Tue, Oct 9, 2018 at 3:18 AM Vishwas Bm <bm...@gmail.com> wrote:

> We are trying to setup a pipeline with using BeamSql and the trigger used
> is default (AfterWatermark crosses the window).
> Below is the pipeline:
>
>    KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql
> ---> KafkaSink (KafkaIO)
>
> We are using Spark Runner for this.
> The BeamSql query is:
>              select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY
> Col3
>
> We are grouping by Col3 which is a string. It can hold values string[0-9].
>
> The records are getting emitted out at 1 min to kafka sink, but the output
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start
> time and window end time)
>
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000
> +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
>
> We ran the same pipeline using direct and flink runner and we dont see 0
> entries for count_col1.
>
> As per beam matrix page (
> https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-what),
> GroupBy is not fully supported,is this one of those cases ?
>
> Thanks & Regards,
>
> Vishwas
>
>