You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Juan Romero <js...@gmail.com> on 2023/12/04 18:24:56 UTC

Streaming management exception in the sink target.

Hi guys. I want to ask you about how to deal with the scenario when the
target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any reason
and i don't want to lost the message and create a bottleneck with many
errors due an hypothetical target sink problem,  and i want to use
with_excpetion_handling in order to get the message that failing to reach
the target and send the message to an other error topic. Any idea to solve
this scenario?

Re: Streaming management exception in the sink target.

Posted by John Casey via user <us...@beam.apache.org>.
Unfortunately, there's no way to leverage the existing cross language
connector in python.

Your options are somewhat limited, in my opinion.

Option 1 (My Recommendation) : Implement a DoFn that checks your data
quality before sending it to KafkaIO. If it fails the quality check, send
it to some alternate sink.
Option 2: Wait for the feature for KafkaIO. It will be available on the
Java side soon, but there may be some delay in supporting it via cross
language
Option 3 (Very not recommended): Implement a custom KafkaIO Sink with error
handling in python. This is hard to do, and it is very easy to introduce
bugs. I do not recommend this.

On Wed, Dec 6, 2023 at 11:25 AM Juan Romero <js...@gmail.com> wrote:

> Ok Jhon. But If i want to implement an alternative for myself. What do you
> recommend in order to get the message and send it to other target (you said
> is possible)? taking in mind that we re using the kafka connector which is
> a java transformation which is invoke for python
>
> El mié, 6 dic 2023 a las 11:23, John Casey (<th...@google.com>)
> escribió:
>
>> For the moment, yes.
>>
>> On Wed, Dec 6, 2023 at 11:21 AM Juan Romero <js...@gmail.com> wrote:
>>
>>> Thanks John. Is it the same case if i want to write in a postgres table
>>> with the sql connector?
>>>
>>> El mié, 6 dic 2023 a las 11:05, John Casey (<th...@google.com>)
>>> escribió:
>>>
>>>> It is, but it's not possible to to take an existing transform, and
>>>> simply configure it to do this.
>>>>
>>>> For example (and this is what I'm doing), it's possible to write a
>>>> transform that tries to write to kafka, and upon failure, emits the failure
>>>> to an alternate pcollection.
>>>>
>>>> It's not possible (yet) to take an existing PTransform that's part of
>>>> the library, and configure it to do something other than simply retrying
>>>> failures
>>>>
>>>> On Wed, Dec 6, 2023 at 10:44 AM Juan Romero <js...@gmail.com> wrote:
>>>>
>>>>> But , is it not possible to get the message that can't reach the
>>>>> target sink and put it in another target (eg: kafka error topic where we
>>>>> can verify which messages failed to be delivered to the target)?
>>>>>
>>>>>
>>>>> El mié, 6 dic 2023 a las 10:40, John Casey via user (<
>>>>> user@beam.apache.org>) escribió:
>>>>>
>>>>>> I'm currently implementing improvements on Kafka, File, Spanner, and
>>>>>> Bigtable IOs.
>>>>>>
>>>>>> I'm planning on tackling PubSub and BQ next year.
>>>>>>
>>>>>> All of this is still in progress though, so there aren't easy
>>>>>> workarounds for the moment.
>>>>>>
>>>>>> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Currently error handling is implemented on sinks in an ad-hoc basis
>>>>>>> (if at all) but John (cc'd) is looking at improving things here.
>>>>>>>
>>>>>>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi guys. I want to ask you about how to deal with the scenario
>>>>>>> when the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any
>>>>>>> reason and i don't want to lost the message and create a bottleneck with
>>>>>>> many errors due an hypothetical target sink problem,  and i want to use
>>>>>>> with_excpetion_handling in order to get the message that failing to reach
>>>>>>> the target and send the message to an other error topic. Any idea to solve
>>>>>>> this scenario?
>>>>>>>
>>>>>>

Re: Streaming management exception in the sink target.

Posted by Juan Romero <js...@gmail.com>.
Ok Jhon. But If i want to implement an alternative for myself. What do you
recommend in order to get the message and send it to other target (you said
is possible)? taking in mind that we re using the kafka connector which is
a java transformation which is invoke for python

El mié, 6 dic 2023 a las 11:23, John Casey (<th...@google.com>)
escribió:

> For the moment, yes.
>
> On Wed, Dec 6, 2023 at 11:21 AM Juan Romero <js...@gmail.com> wrote:
>
>> Thanks John. Is it the same case if i want to write in a postgres table
>> with the sql connector?
>>
>> El mié, 6 dic 2023 a las 11:05, John Casey (<th...@google.com>)
>> escribió:
>>
>>> It is, but it's not possible to to take an existing transform, and
>>> simply configure it to do this.
>>>
>>> For example (and this is what I'm doing), it's possible to write a
>>> transform that tries to write to kafka, and upon failure, emits the failure
>>> to an alternate pcollection.
>>>
>>> It's not possible (yet) to take an existing PTransform that's part of
>>> the library, and configure it to do something other than simply retrying
>>> failures
>>>
>>> On Wed, Dec 6, 2023 at 10:44 AM Juan Romero <js...@gmail.com> wrote:
>>>
>>>> But , is it not possible to get the message that can't reach the target
>>>> sink and put it in another target (eg: kafka error topic where we can
>>>> verify which messages failed to be delivered to the target)?
>>>>
>>>>
>>>> El mié, 6 dic 2023 a las 10:40, John Casey via user (<
>>>> user@beam.apache.org>) escribió:
>>>>
>>>>> I'm currently implementing improvements on Kafka, File, Spanner, and
>>>>> Bigtable IOs.
>>>>>
>>>>> I'm planning on tackling PubSub and BQ next year.
>>>>>
>>>>> All of this is still in progress though, so there aren't easy
>>>>> workarounds for the moment.
>>>>>
>>>>> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Currently error handling is implemented on sinks in an ad-hoc basis
>>>>>> (if at all) but John (cc'd) is looking at improving things here.
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi guys. I want to ask you about how to deal with the scenario when
>>>>>> the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any
>>>>>> reason and i don't want to lost the message and create a bottleneck with
>>>>>> many errors due an hypothetical target sink problem,  and i want to use
>>>>>> with_excpetion_handling in order to get the message that failing to reach
>>>>>> the target and send the message to an other error topic. Any idea to solve
>>>>>> this scenario?
>>>>>>
>>>>>

Re: Streaming management exception in the sink target.

Posted by John Casey via user <us...@beam.apache.org>.
For the moment, yes.

On Wed, Dec 6, 2023 at 11:21 AM Juan Romero <js...@gmail.com> wrote:

> Thanks John. Is it the same case if i want to write in a postgres table
> with the sql connector?
>
> El mié, 6 dic 2023 a las 11:05, John Casey (<th...@google.com>)
> escribió:
>
>> It is, but it's not possible to to take an existing transform, and simply
>> configure it to do this.
>>
>> For example (and this is what I'm doing), it's possible to write a
>> transform that tries to write to kafka, and upon failure, emits the failure
>> to an alternate pcollection.
>>
>> It's not possible (yet) to take an existing PTransform that's part of the
>> library, and configure it to do something other than simply retrying
>> failures
>>
>> On Wed, Dec 6, 2023 at 10:44 AM Juan Romero <js...@gmail.com> wrote:
>>
>>> But , is it not possible to get the message that can't reach the target
>>> sink and put it in another target (eg: kafka error topic where we can
>>> verify which messages failed to be delivered to the target)?
>>>
>>>
>>> El mié, 6 dic 2023 a las 10:40, John Casey via user (<
>>> user@beam.apache.org>) escribió:
>>>
>>>> I'm currently implementing improvements on Kafka, File, Spanner, and
>>>> Bigtable IOs.
>>>>
>>>> I'm planning on tackling PubSub and BQ next year.
>>>>
>>>> All of this is still in progress though, so there aren't easy
>>>> workarounds for the moment.
>>>>
>>>> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> Currently error handling is implemented on sinks in an ad-hoc basis
>>>>> (if at all) but John (cc'd) is looking at improving things here.
>>>>>
>>>>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
>>>>> >
>>>>> > Hi guys. I want to ask you about how to deal with the scenario when
>>>>> the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any
>>>>> reason and i don't want to lost the message and create a bottleneck with
>>>>> many errors due an hypothetical target sink problem,  and i want to use
>>>>> with_excpetion_handling in order to get the message that failing to reach
>>>>> the target and send the message to an other error topic. Any idea to solve
>>>>> this scenario?
>>>>>
>>>>

Re: Streaming management exception in the sink target.

Posted by Juan Romero <js...@gmail.com>.
Thanks John. Is it the same case if i want to write in a postgres table
with the sql connector?

El mié, 6 dic 2023 a las 11:05, John Casey (<th...@google.com>)
escribió:

> It is, but it's not possible to to take an existing transform, and simply
> configure it to do this.
>
> For example (and this is what I'm doing), it's possible to write a
> transform that tries to write to kafka, and upon failure, emits the failure
> to an alternate pcollection.
>
> It's not possible (yet) to take an existing PTransform that's part of the
> library, and configure it to do something other than simply retrying
> failures
>
> On Wed, Dec 6, 2023 at 10:44 AM Juan Romero <js...@gmail.com> wrote:
>
>> But , is it not possible to get the message that can't reach the target
>> sink and put it in another target (eg: kafka error topic where we can
>> verify which messages failed to be delivered to the target)?
>>
>>
>> El mié, 6 dic 2023 a las 10:40, John Casey via user (<
>> user@beam.apache.org>) escribió:
>>
>>> I'm currently implementing improvements on Kafka, File, Spanner, and
>>> Bigtable IOs.
>>>
>>> I'm planning on tackling PubSub and BQ next year.
>>>
>>> All of this is still in progress though, so there aren't easy
>>> workarounds for the moment.
>>>
>>> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> Currently error handling is implemented on sinks in an ad-hoc basis
>>>> (if at all) but John (cc'd) is looking at improving things here.
>>>>
>>>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
>>>> >
>>>> > Hi guys. I want to ask you about how to deal with the scenario when
>>>> the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any
>>>> reason and i don't want to lost the message and create a bottleneck with
>>>> many errors due an hypothetical target sink problem,  and i want to use
>>>> with_excpetion_handling in order to get the message that failing to reach
>>>> the target and send the message to an other error topic. Any idea to solve
>>>> this scenario?
>>>>
>>>

Re: Streaming management exception in the sink target.

Posted by John Casey via user <us...@beam.apache.org>.
It is, but it's not possible to to take an existing transform, and simply
configure it to do this.

For example (and this is what I'm doing), it's possible to write a
transform that tries to write to kafka, and upon failure, emits the failure
to an alternate pcollection.

It's not possible (yet) to take an existing PTransform that's part of the
library, and configure it to do something other than simply retrying
failures

On Wed, Dec 6, 2023 at 10:44 AM Juan Romero <js...@gmail.com> wrote:

> But , is it not possible to get the message that can't reach the target
> sink and put it in another target (eg: kafka error topic where we can
> verify which messages failed to be delivered to the target)?
>
>
> El mié, 6 dic 2023 a las 10:40, John Casey via user (<us...@beam.apache.org>)
> escribió:
>
>> I'm currently implementing improvements on Kafka, File, Spanner, and
>> Bigtable IOs.
>>
>> I'm planning on tackling PubSub and BQ next year.
>>
>> All of this is still in progress though, so there aren't easy workarounds
>> for the moment.
>>
>> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Currently error handling is implemented on sinks in an ad-hoc basis
>>> (if at all) but John (cc'd) is looking at improving things here.
>>>
>>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
>>> >
>>> > Hi guys. I want to ask you about how to deal with the scenario when
>>> the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any
>>> reason and i don't want to lost the message and create a bottleneck with
>>> many errors due an hypothetical target sink problem,  and i want to use
>>> with_excpetion_handling in order to get the message that failing to reach
>>> the target and send the message to an other error topic. Any idea to solve
>>> this scenario?
>>>
>>

Re: Streaming management exception in the sink target.

Posted by Juan Romero <js...@gmail.com>.
But , is it not possible to get the message that can't reach the target
sink and put it in another target (eg: kafka error topic where we can
verify which messages failed to be delivered to the target)?


El mié, 6 dic 2023 a las 10:40, John Casey via user (<us...@beam.apache.org>)
escribió:

> I'm currently implementing improvements on Kafka, File, Spanner, and
> Bigtable IOs.
>
> I'm planning on tackling PubSub and BQ next year.
>
> All of this is still in progress though, so there aren't easy workarounds
> for the moment.
>
> On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Currently error handling is implemented on sinks in an ad-hoc basis
>> (if at all) but John (cc'd) is looking at improving things here.
>>
>> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
>> >
>> > Hi guys. I want to ask you about how to deal with the scenario when the
>> target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any reason
>> and i don't want to lost the message and create a bottleneck with many
>> errors due an hypothetical target sink problem,  and i want to use
>> with_excpetion_handling in order to get the message that failing to reach
>> the target and send the message to an other error topic. Any idea to solve
>> this scenario?
>>
>

Re: Streaming management exception in the sink target.

Posted by John Casey via user <us...@beam.apache.org>.
I'm currently implementing improvements on Kafka, File, Spanner, and
Bigtable IOs.

I'm planning on tackling PubSub and BQ next year.

All of this is still in progress though, so there aren't easy workarounds
for the moment.

On Tue, Dec 5, 2023 at 5:56 PM Robert Bradshaw <ro...@google.com> wrote:

> Currently error handling is implemented on sinks in an ad-hoc basis
> (if at all) but John (cc'd) is looking at improving things here.
>
> On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
> >
> > Hi guys. I want to ask you about how to deal with the scenario when the
> target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any reason
> and i don't want to lost the message and create a bottleneck with many
> errors due an hypothetical target sink problem,  and i want to use
> with_excpetion_handling in order to get the message that failing to reach
> the target and send the message to an other error topic. Any idea to solve
> this scenario?
>

Re: Streaming management exception in the sink target.

Posted by Robert Bradshaw via user <us...@beam.apache.org>.
Currently error handling is implemented on sinks in an ad-hoc basis
(if at all) but John (cc'd) is looking at improving things here.

On Mon, Dec 4, 2023 at 10:25 AM Juan Romero <js...@gmail.com> wrote:
>
> Hi guys. I want to ask you about how to deal with the scenario when the target sink (eg: jdbc, kafka, bigquery, pubsub etc) fails for any reason and i don't want to lost the message and create a bottleneck with many errors due an hypothetical target sink problem,  and i want to use with_excpetion_handling in order to get the message that failing to reach the target and send the message to an other error topic. Any idea to solve this scenario?