Posted to user@beam.apache.org by Dennis Yung <de...@gmail.com> on 2021/03/18 02:58:21 UTC

Fwd: Source connector stopped without log on Dataflow (SqsIO)

Hi,

I am running a streaming job with the SqsIO source connector on Dataflow.
However, after running for a few hours, the source connector stopped
retrieving events, with no exception or reason logged.
I have confirmed that the connector is not making new requests to SQS:
1. I have checked on the SQS side that no "receive" requests were made after
2021-03-17 19:00:00 HKT
2. I set the Dataflow log level to DEBUG, and the logs for making requests
stopped at 2021-03-17 18:58:22 HKT (debug log as below)
[image: image.png]
As a result, the data latency kept increasing.
[image: image.png]

I have also checked the job log and all worker logs from all steps, but
there are no errors.
CPU utilization was around 22% on each worker before the incident.
The SDK is the Beam Java SDK 2.28.0.

Could someone advise what else I can check to investigate the issue?
Thanks!

Re: Source connector stopped without log on Dataflow (SqsIO)

Posted by Dennis Yung <de...@gmail.com>.
I consulted GCP support and found a solution.
The support engineer identified an error in the log:
“2021-03-19 10:49:01.661 JST Error unmounting disk
dev-tracking-03181836-11wi-harness-disk-0, error=UNKNOWN: Failed to execute
shell command=umount -v
/windmill/dev-tracking-03181836-11wi-harness-disk-0, exit code=8192,
errno=0, command output”

While the log does not point to an exact cause, the support engineer
suggested that I try enabling Streaming Engine (the enableStreamingEngine
pipeline option).
I have not encountered this problem since enabling it.
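
For anyone launching a Java pipeline from the command line, the option can
be passed as --enableStreamingEngine. Below is a minimal sketch, not taken
from the job above: StreamingEngineArgs and withStreamingEngine are
hypothetical names made up for illustration, and the only thing the code
does is make sure the flag is present in the launch arguments before they
are handed to the pipeline options parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam): forces Streaming Engine on in the launch args. */
final class StreamingEngineArgs {

    private static final String FLAG = "--enableStreamingEngine";

    /** Returns a copy of args ending with --enableStreamingEngine=true. */
    static String[] withStreamingEngine(String[] args) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            // Drop any earlier setting of the flag, bare or with a value.
            if (!arg.equals(FLAG) && !arg.startsWith(FLAG + "=")) {
                out.add(arg);
            }
        }
        out.add(FLAG + "=true");
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] launchArgs = withStreamingEngine(args);
        // In a real pipeline these would be passed to
        // PipelineOptionsFactory.fromArgs(launchArgs) instead of printed.
        System.out.println(String.join(" ", launchArgs));
    }
}
```

If the options are built in code rather than from arguments, setting the
same option on the Dataflow pipeline options object should have the same
effect; I have not shown that here because it needs the Beam dependencies
on the classpath.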


On Thu, Mar 25, 2021 at 1:41 AM Chamikara Jayalath <ch...@google.com>
wrote:

> Not exactly sure of the reason but is it possible that your pipeline is
> writing to a sink (or has an otherwise expensive downstream step) that
> could be stuck resulting in increased lag ?

Re: Source connector stopped without log on Dataflow (SqsIO)

Posted by Chamikara Jayalath <ch...@google.com>.
I'm not exactly sure of the reason, but is it possible that your pipeline is
writing to a sink (or has an otherwise expensive downstream step) that
could be stuck, resulting in increased lag?
