Posted to user@beam.apache.org by Alan Krumholz <al...@betterup.co> on 2020/02/05 15:19:42 UTC

seems beam.util.GroupIntoBatches is not supported in DataFlow. Any alternative?

Hello, I'm having issues running beam.util.GroupIntoBatches() on Dataflow.

I get the following error message:

Exception: Requested execution of a stateful DoFn, but no user state
context is available. This likely means that the current runner does not
support the execution of stateful DoFns


Seems to be related to:
https://stackoverflow.com/questions/56403572/no-userstate-context-is-available-google-cloud-dataflow

Is there another way to achieve the same thing using another Beam function?

I basically want to batch rows into groups of 100, as it is a lot faster to
transform them all at once than one at a time.

I was also planning to use this function for a custom Snowflake sink (so I
could insert many rows at once).

I'm sure there must be another way to do this in Dataflow, but I'm not sure how.

Thanks so much!

Re: seems beam.util.GroupIntoBatches is not supported in DataFlow. Any alternative?

Posted by Robert Bradshaw <ro...@google.com>.
Yes, you should use BatchElements. Stateful DoFns are not yet
supported for Python Dataflow. (The difference is that
GroupIntoBatches has the capability to batch across bundles, which can
be important for streaming.)
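
For readers landing on this thread, here is a minimal sketch of the
BatchElements approach suggested above. It is only an illustration: the
transform_batch helper, the step labels, and the sample input are made up
for this example, and the batch size of 100 comes from the original
question. Because BatchElements only batches within a bundle, some batches
may come out smaller than 100.

```python
def transform_batch(rows):
    """Hypothetical batch transform: takes a list of rows, returns a list."""
    return [row * 2 for row in rows]


def run():
    # Imported here so transform_batch can be reused and tested without
    # Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateRows" >> beam.Create(range(250))
            # Equal min and max sizes request fixed-size batches; the last
            # batch in a bundle may still be smaller.
            | "Batch" >> beam.BatchElements(min_batch_size=100,
                                            max_batch_size=100)
            # Each element is now a list of rows; FlatMap emits the
            # transformed rows one by one again.
            | "TransformBatch" >> beam.FlatMap(transform_batch)
            | "Print" >> beam.Map(print)
        )
```

Calling run() executes the pipeline locally with the default runner; the
same transforms should work on Dataflow when the usual pipeline options are
passed to beam.Pipeline.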




Re: seems beam.util.GroupIntoBatches is not supported in DataFlow. Any alternative?

Posted by Alan Krumholz <al...@betterup.co>.
OK, it seems beam.BatchElements(max_batch_size=x) will do the trick for
me, and it runs fine on Dataflow!
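
For the custom sink mentioned in the original question, the per-batch write
could look roughly like the sketch below. This is an assumption-heavy
illustration, not a Snowflake integration: sqlite3 from the standard library
stands in for the real database connection, and the events table, its
columns, and the insert_batch helper are all hypothetical. The point is only
that each batch produced by BatchElements can be written with a single
executemany call instead of one insert per row.

```python
import sqlite3


def insert_batch(conn, rows):
    """Write a whole batch of (id, name) rows with one executemany call."""
    conn.executemany("INSERT INTO events (id, name) VALUES (?, ?)", rows)
    conn.commit()


# Stand-in for the real sink connection (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")

# One batch, as it might arrive from BatchElements.
insert_batch(conn, [(1, "a"), (2, "b"), (3, "c")])

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

In a pipeline, a function like this would typically be called from a DoFn
applied after BatchElements, with the connection opened once per worker
rather than once per batch.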


Re: seems beam.util.GroupIntoBatches is not supported in DataFlow. Any alternative?

Posted by Alan Krumholz <al...@betterup.co>.
Actually, beam.GroupIntoBatches() gives me the same error as
beam.util.GroupIntoBatches() :(
Back to square one.

Any other ideas?

Thank you!



Re: seems beam.util.GroupIntoBatches is not supported in DataFlow. Any alternative?

Posted by Alan Krumholz <al...@betterup.co>.
Never mind, there seems to be a beam.GroupIntoBatches() that I should have
used originally instead of beam.util.GroupIntoBatches().
