Posted to user@flink.apache.org by Andrea Sella <an...@radicalbit.io> on 2016/05/06 10:45:00 UTC

OutputFormat in streaming job

Hi,

I created a custom OutputFormat to send data to a remote actor. Are there
any issues with using an OutputFormat in a streaming job? Or will it be
treated like a Sink?

I would prefer to use it so that I can create a custom ActorSystem per TM in
the configure method.

Cheers,
Andrea

Re: OutputFormat in streaming job

Posted by Andrea Sella <an...@radicalbit.io>.
Hi Fabian,

So I misunderstood the behaviour of configure(), thank you.

Andrea


Re: OutputFormat in streaming job

Posted by Fabian Hueske <fh...@gmail.com> on 2016/05/06 12:17 UTC.
Hi Andrea,

Actually, OutputFormat.configure() is also invoked once per task, so you
would also end up with 16 ActorSystems.
However, I think you can use a synchronized singleton object to start one
ActorSystem per TM (each TM and all of its tasks run in a single JVM).

So from the point of view of configure(), I think it makes no difference
whether you use an OutputFormat or a RichSinkFunction.
I would rather go for the SinkFunction, which is better suited for
streaming jobs.
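
A minimal sketch of the synchronized-singleton idea, in plain Java with no
Flink or Akka dependency. HeavyResource is a hypothetical stand-in for an
ActorSystem, and SharedResource and SingletonDemo are illustrative names, not
part of any library. The sketch assumes each task calls acquire() from its
setup hook (open() or configure()) and release() from close().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a heavy resource such as an ActorSystem.
final class HeavyResource {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean running = true;

    HeavyResource() { CREATED.incrementAndGet(); }
    boolean isRunning() { return running; }
    void shutdown() { running = false; }
}

// Holds at most one HeavyResource per JVM, shared by all tasks in that JVM.
final class SharedResource {
    private static HeavyResource instance; // guarded by the class lock
    private static int users;              // guarded by the class lock

    private SharedResource() {}

    // Intended to be called from each task's setup hook.
    static synchronized HeavyResource acquire() {
        if (instance == null) {
            instance = new HeavyResource();
        }
        users++;
        return instance;
    }

    // Intended to be called from each task's close(); the last user shuts down.
    static synchronized void release() {
        if (users == 0) {
            return; // unbalanced release, nothing to do
        }
        if (--users == 0) {
            instance.shutdown();
            instance = null;
        }
    }
}

final class SingletonDemo {
    // Simulates parallel tasks starting concurrently in one JVM and returns
    // how many HeavyResource instances were created for them.
    static int run(int parallelTasks) {
        int before = HeavyResource.CREATED.get();
        List<Thread> tasks = new ArrayList<>();
        for (int i = 0; i < parallelTasks; i++) {
            Thread t = new Thread(SharedResource::acquire);
            tasks.add(t);
            t.start();
        }
        try {
            for (Thread t : tasks) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int created = HeavyResource.CREATED.get() - before;
        for (int i = 0; i < parallelTasks; i++) {
            SharedResource.release();
        }
        return created;
    }

    public static void main(String[] args) {
        // 8 tasks per JVM, assuming parallelism 16 spread evenly over 2 TMs.
        System.out.println("instances created: " + run(8));
    }
}
```

Running main prints "instances created: 1". The reference count is one
possible way to decide when to shut the shared resource down; without it, the
first task to close would stop the resource while other tasks still use it.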

Cheers, Fabian


Re: OutputFormat in streaming job

Posted by Andrea Sella <an...@radicalbit.io> on 2016/05/06 12:10 UTC.
Hi Fabian,

At the moment I am not interested in guaranteeing exactly-once processing;
thank you for the clarification.

As far as I know, RichSinkFunction has no method similar to OutputFormat's
configure, correct? So I cannot instantiate one ActorSystem per TM; instead I
have to instantiate one ActorSystem per TaskSlot, which is unsuitable because
an ActorSystem is very heavy.

Example:
OutputFormat => 2 TMs => parallelism 16 => 2 ActorSystems (instantiated in
OutputFormat's configure method)
Sink => 2 TMs => parallelism 16 => 16 ActorSystems (instantiated in
RichSinkFunction's open method)

Am I wrong?

Thanks again,
Andrea


Re: OutputFormat in streaming job

Posted by Fabian Hueske <fh...@gmail.com> on 2016/05/06 11:47 UTC.
Hi Andrea,

You can use any OutputFormat to emit data from a DataStream using the
writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of
a failure, it might emit some records a second time, so the results will be
written at least once.
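
To illustrate why records can show up twice, here is a toy model in plain
Java. It is not Flink's checkpointing code. It assumes a source that is
rewound to the last checkpointed offset after a failure and a sink that
writes every record immediately. AtLeastOnceDemo and its parameters are
hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AtLeastOnceDemo {
    // Feeds input through a sink that writes each record immediately.
    // A checkpoint stores the source offset every checkpointInterval records.
    // One failure occurs right after the record at index failAfter is written,
    // and processing restarts from the last checkpointed offset.
    // Returns everything the sink wrote, duplicates included.
    static List<Integer> run(List<Integer> input, int checkpointInterval, int failAfter) {
        List<Integer> written = new ArrayList<>();
        int lastCheckpoint = 0;  // source offset saved by the last checkpoint
        boolean failed = false;  // the failure is injected only once
        int pos = 0;
        while (pos < input.size()) {
            written.add(input.get(pos)); // the write cannot be undone later
            boolean failNow = !failed && pos == failAfter;
            pos++;
            if (failNow) {
                failed = true;
                pos = lastCheckpoint;    // rewind the source and replay
                continue;
            }
            if (pos % checkpointInterval == 0) {
                lastCheckpoint = pos;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5, 6), 3, 4));
    }
}
```

Running main prints [1, 2, 3, 4, 5, 4, 5, 6]: records 4 and 5 were written
after the last checkpoint and before the failure, so the replay writes them
again. Every record appears at least once, but not exactly once.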

Hope this helps,
Fabian
