Posted to user@beam.apache.org by Unais Thachuparambil <un...@careem.com> on 2018/01/11 06:22:43 UTC

Request payload size exceeds the limit: 10485760 bytes

I wrote a Python Dataflow job that reads data from BigQuery, applies some
transforms, and saves the result to a BigQuery table.

I tested it with 8 days of data and it works fine, but when I scaled it to
180 days I got the error below:

```"message": "Request payload size exceeds the limit: 10485760 bytes.",```


```pitools.base.py.exceptions.HttpError: HttpError accessing <
https://dataflow.googleapis.com/v1b3/projects/careem-mktg-dwh/locations/us-central1/jobs?alt=json>:
response: <{'status': '400', 'content-length': '145', 'x-xss-protection':
'1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF',
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 10
Jan 2018 22:49:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc':
'hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338;
quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"',
'content-type': 'application/json; charset=UTF-8'}>, content <{
"error": {
"code": 400,
"message": "Request payload size exceeds the limit: 10485760 bytes.",
"status": "INVALID_ARGUMENT"
}

```


In short, this is what I'm doing:
1 - Reading data from a BigQuery table using ```beam.io.BigQuerySource```.
2 - Partitioning the data into one partition per day using ```beam.Partition```.
3 - Applying transforms to each partition and combining some of the output
PCollections.
4 - After the transforms, saving the results to a BigQuery date-partitioned
table.

A rough sketch of this structure follows.
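Since the thread never shows the actual code, here is a minimal sketch of the
pipeline shape described above, assuming the 2017/2018-era Beam Python SDK. The
table names, the date list, the ```event_date``` column, the per-row transform
and the partition-decorator write are all placeholders, not the original job.

```
# Hypothetical reconstruction of the pipeline shape described above; table
# names, the date list, the column name and the transform are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

DAYS = ['2018-01-01', '2018-01-02']  # the real job had ~180 entries here


def by_day(row, num_partitions):
    # Route each row to the index of its day in DAYS.
    return DAYS.index(row['event_date'])


def transform(row):
    # Stand-in for the per-row business logic.
    return row


with beam.Pipeline(options=PipelineOptions()) as p:
    rows = (p
            | 'Read' >> beam.io.Read(beam.io.BigQuerySource(
                query='SELECT * FROM `project.dataset.source`',
                use_standard_sql=True)))

    parts = rows | 'SplitByDay' >> beam.Partition(by_day, len(DAYS))

    for i, day in enumerate(DAYS):
        suffix = day.replace('-', '')
        (parts[i]
         | 'Transform_%s' % suffix >> beam.Map(transform)
         | 'Write_%s' % suffix >> beam.io.WriteToBigQuery(
             'project:dataset.daily_output$%s' % suffix,  # partition decorator
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
```

Note that the per-day loop adds a separate transform-and-write branch to the
job graph for every day, which is why the serialized pipeline grows with the
number of days; this is what the replies below are getting at.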

Re: Request payload size exceeds the limit: 10485760 bytes

Posted by Chamikara Jayalath <ch...@google.com>.
It's due to the size of the JSON-serialized Dataflow pipeline (the number of
transforms and the serialized size of those transforms).
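One way to see how large the serialized job actually is: a sketch below,
assuming the SDK's `--dataflow_job_file` debug option, which writes the job
description JSON to a local file; the project, bucket and pipeline body are
placeholders, and the exact behaviour may vary by SDK version.

```
# Sketch: dump the Dataflow job description locally and check how close it is
# to the 10 MB request limit. Assumes the --dataflow_job_file debug option;
# project, bucket and the pipeline body are placeholders. Depending on the SDK
# version the job may still be submitted, so point this at a test setup.
import os

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

JOB_FILE = '/tmp/dataflow_job.json'

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=your-project',
    '--temp_location=gs://your-bucket/tmp',
    '--dataflow_job_file=%s' % JOB_FILE,
])

p = beam.Pipeline(options=options)
_ = p | 'Placeholder' >> beam.Create([0])  # replace with the real pipeline
p.run()

print('Serialized job size: %.2f MB (limit: 10 MB)'
      % (os.path.getsize(JOB_FILE) / float(1 << 20)))
```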

On Wed, Jan 10, 2018 at 11:40 PM Unais Thachuparambil <
unais.thachuparambil@careem.com> wrote:

> Is it because my output is partitioned and written to 180 partitions, or
> because of the number of pipeline operations and transforms?

Re: Request payload size exceeds the limit: 10485760 bytes

Posted by Unais Thachuparambil <un...@careem.com>.
Is it because my output is partitioned and written to 180 partitions, or
because of the number of pipeline operations and transforms?

On Thu, Jan 11, 2018 at 10:48 AM, Chamikara Jayalath <ch...@google.com>
wrote:

> The Dataflow service has a 10 MB request size limit, and it seems like you
> are hitting it. See the following for more information:
> https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline
>
> It looks like you are hitting this because of the number of partitions. I
> don't think there is currently a good solution other than executing multiple
> jobs. We hope to introduce a dynamic destinations feature in the Python BQ
> sink in the near future, which will allow you to write this as a more
> compact pipeline.
>
> Thanks,
> Cham

Re: Request payload size exceeds the limit: 10485760 bytes

Posted by Chamikara Jayalath <ch...@google.com>.
The Dataflow service has a 10 MB request size limit, and it seems like you are
hitting it. See the following for more information:
https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

It looks like you are hitting this because of the number of partitions. I
don't think there is currently a good solution other than executing multiple
jobs. We hope to introduce a dynamic destinations feature in the Python BQ
sink in the near future, which will allow you to write this as a more compact
pipeline.

Thanks,
Cham
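
To make the "execute multiple jobs" workaround concrete, here is a hypothetical
driver, not code from this thread: it splits the 180-day range into chunks and
submits one Dataflow job per chunk, so each job's serialized graph stays well
under the 10 MB limit. The chunk size, dates, project settings and the body of
run_chunk are all placeholders.

```
# Hypothetical driver for the "multiple jobs" workaround: one Dataflow job per
# chunk of days, keeping each serialized job graph small. All names are
# placeholders.
from datetime import date, timedelta

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

CHUNK_DAYS = 30               # ~6 jobs for a 180-day backfill
START = date(2017, 7, 15)     # placeholder range
END = date(2018, 1, 11)


def run_chunk(chunk_start, chunk_end):
    """Build and submit one pipeline covering days in [chunk_start, chunk_end)."""
    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=your-project',
        '--temp_location=gs://your-bucket/tmp',
        '--job_name=backfill-%s' % chunk_start.strftime('%Y%m%d'),
    ])
    with beam.Pipeline(options=options) as p:
        # Replace with the real read / Partition-by-day / transform / write
        # structure, restricted to the days in this chunk.
        _ = p | 'Placeholder' >> beam.Create([str(chunk_start)])


day = START
while day < END:
    run_chunk(day, min(day + timedelta(days=CHUNK_DAYS), END))
    day += timedelta(days=CHUNK_DAYS)
```

For reference, later Python SDK versions added support for a callable
```table=``` argument to ```beam.io.WriteToBigQuery``` (for example
```table=lambda row: 'project:dataset.daily_output$' + row['event_date'].replace('-', '')```),
which is essentially the dynamic destinations feature mentioned above and
removes the need for a per-day branch, but it was not available at the time of
this thread.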
