Posted to dev@beam.apache.org by Mikhail Gryzykhin <mi...@google.com> on 2019/07/01 17:53:55 UTC

BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

Hello everybody,

This question is regarding a user post on StackOverflow:
<https://stackoverflow.com/questions/56823629/gcp-dataflow-running-streaming-inserts-into-bigquery-gc-thrashing>

My understanding of the problem is that setting .withMethod(STREAMING_INSERTS)
on the BigQueryIO sink causes GC thrashing when writing a large number of
entries.
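
For reference, a minimal sketch of the kind of sink configuration I mean is
below. The project, dataset, table, schema, and element values are
hypothetical and not taken from the user's pipeline:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class StreamingInsertsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical single-column schema.
    TableSchema schema =
        new TableSchema()
            .setFields(
                Collections.singletonList(
                    new TableFieldSchema().setName("value").setType("STRING")));

    // Stand-in for whatever the real pipeline produces.
    PCollection<TableRow> rows =
        p.apply(
            "Rows",
            Create.of(new TableRow().set("value", "example"))
                .withCoder(TableRowJsonCoder.of()));

    rows.apply(
        "WriteToBQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // hypothetical table spec
            .withSchema(schema)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND)
            // Forces the streaming-inserts path even for a bounded (batch) input.
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

    p.run();
  }
}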

Is there a known issue, or any information on how to start triaging this?

A search on Jira showed me this ticket, but it is not directly connected to
the issue: https://issues.apache.org/jira/browse/BEAM-7666

Thank you,
Mikhail.

Re: BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

Posted by Lukasz Cwik <lc...@google.com>.
I think the BQ streaming write buffers data into batches and sends them
when used with STREAMING_INSERTS.

Have you been able to ask the user to get a heap dump to see what was using
the majority of memory?
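
If it helps, here is a rough sketch of one way to capture a dump on Dataflow.
The option names (dumpHeapOnOOM, saveHeapDumpsToGcsPath) are from
DataflowPipelineDebugOptions as I remember them, so treat them as assumptions
to double-check against the SDK version the user runs, and the GCS bucket is
hypothetical. Attaching jmap to the worker JVM over SSH is another way to get
a dump.

import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class HeapDumpOptionsSketch {
  public static void main(String[] args) {
    // Assumed option names on DataflowPipelineDebugOptions; verify they exist
    // on the SDK/runner version in use before relying on them.
    DataflowPipelineDebugOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineDebugOptions.class);
    options.setDumpHeapOnOOM(true); // write an .hprof if a worker hits OutOfMemoryError
    options.setSaveHeapDumpsToGcsPath("gs://my-bucket/heap-dumps"); // hypothetical bucket

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline as usual, then run it.
    p.run();
  }
}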

Re: BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

Posted by Lukasz Cwik <lc...@google.com>.
I think the Bq

Re: BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

Posted by Reuven Lax <re...@google.com>.
Streaming inserts are not really designed to be used from batch pipelines.
From batch, you will definitely overwhelm BQ's quota, causing all sorts of
problems.
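
If the pipeline really is batch, the simplest fix would be to drop the
override so BigQueryIO falls back to load jobs (the default for a bounded
input), or to set FILE_LOADS explicitly. A sketch, reusing the hypothetical
rows, schema, and table spec from the sketch earlier in this thread:

// Same sink as in the earlier sketch, but going through BigQuery load jobs
// instead of the streaming-inserts API.
rows.apply(
    "WriteToBQ",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table") // hypothetical table spec
        .withSchema(schema)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND)
        // Explicit FILE_LOADS; leaving withMethod() out also makes a bounded
        // PCollection default to load jobs.
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS));

Load jobs stage the rows as files and import them in bulk, which sidesteps
the streaming-inserts quota issue described above.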
