You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Himanshu Bari <hi...@gmail.com> on 2017/02/14 20:00:10 UTC

At-least once semantics for Kafka to Cassndra ingest

How to ensure that the Kafka to Cassandra ingestion pipeline in Apex will
guarantee exactly once processing semantics.
Eg. Message was read from Kafka but apex app died before it was written
successfully to Cassandra.

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Himanshu Bari <hi...@gmail.com>.
Great. Thats what I thought. But, I suspect we might be seeing otherwise in
our app. Is there anything that needs to be done wrt to the offset
management in the kafka reader app to ensure this works correctly? Eg Does
the developer need to decide when the kafka reader app moves the Kafka
topic offset forward to acknowledge that all the messages read to that
point were successfully written into Cassandra?

On Feb 15, 2017 1:32 AM, "Pramod Immaneni" <pr...@datatorrent.com> wrote:

> That is the default behavior Himanshu. The Kafka operator will re-read
> from the older checkpointed offset if there is a failure in the Kafka
> operator. If there is a failure in any of the downstream operators
> including Cassandra, they will read the previous messages from the
> buffer present in the upstream operator, so in effect you won't lose
> messages.
>
> Thanks
>
> > On Feb 15, 2017, at 10:34 AM, Himanshu Bari <hi...@gmail.com>
> wrote:
> >
> > Hey guys
> >
> > We are looking for at least once semantics and not exactly once. Sinxe
> the
> > sink is Cassandra it is ok if tge same record is written twice..iy will
> > just overwrite on any reprocessing...
> >
> > Himanshu
> >
> >> On Feb 14, 2017 9:01 PM, "Priyanka Gugale" <pr...@datatorrent.com>
> wrote:
> >>
> >> For this particular case kafka -> cassandra, you need not worry about
> >> partial windows. Cassandra output operator does batch processing i.e.
> all
> >> records received in a window will be written at end window. So IMO, if
> you
> >> set exactly once processing on Kafka Input operator, and choose
> >> transactional cassandra output operator you will achieve exactly once
> >> processing. If you have other operators in your dag you might want to
> make
> >> sure they are idempotent (please check blog shared by Sandesh for
> >> reference).
> >>
> >> -Priyanka
> >>
> >> On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sandesh@datatorrent.com
> >
> >> wrote:
> >>
> >>> Settings mentioned by Sanjay, will only guarantee exactly once for
> >> Windows,
> >>> but not for partial window processed by the operator, in a way that
> >> setting
> >>> is a misnomer.
> >>> To achieve Exactly once, there are some precoditions that need to be
> met
> >>> along with the support in the output operator. Here is a blog that
> gives
> >>> the idea about exactly once,
> >>> https://www.datatorrent.com/blog/end-to-end-exactly-once-
> >> with-apache-apex/
> >>>
> >>> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sa...@datatorrent.com>
> >>> wrote:
> >>>
> >>>> Have you taken a look at
> >>>> http://apex.apache.org/docs/apex/application_development/#
> exactly-once
> >> ?
> >>>> i.e. setting that processing mode on all the operators in the pipeline
> >> .
> >>>>
> >>>> Join us at Apex Big Data World-San Jose <
> >>>> http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
> >>>>
> >>>> http://www.apexbigdata.com/san-jose-register.html
> >>>>
> >>>>
> >>>> On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:
> >>>>
> >>>>    How to ensure that the Kafka to Cassandra ingestion pipeline in
> >> Apex
> >>>> will
> >>>>    guarantee exactly once processing semantics.
> >>>>    Eg. Message was read from Kafka but apex app died before it was
> >>> written
> >>>>    successfully to Cassandra.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>> *Join us at Apex Big Data World-San Jose
> >>> <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
> >>> [image: http://www.apexbigdata.com/san-jose-register.html]
> >>> <http://www.apexbigdata.com/san-jose-register.html>
> >>>
> >>
>

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Pramod Immaneni <pr...@datatorrent.com>.
That is the default behavior Himanshu. The Kafka operator will re-read
from the older checkpointed offset if there is a failure in the Kafka
operator. If there is a failure in any of the downstream operators
including Cassandra, they will read the previous messages from the
buffer present in the upstream operator, so in effect you won't lose
messages.

Thanks

> On Feb 15, 2017, at 10:34 AM, Himanshu Bari <hi...@gmail.com> wrote:
>
> Hey guys
>
> We are looking for at least once semantics and not exactly once. Sinxe the
> sink is Cassandra it is ok if tge same record is written twice..iy will
> just overwrite on any reprocessing...
>
> Himanshu
>
>> On Feb 14, 2017 9:01 PM, "Priyanka Gugale" <pr...@datatorrent.com> wrote:
>>
>> For this particular case kafka -> cassandra, you need not worry about
>> partial windows. Cassandra output operator does batch processing i.e. all
>> records received in a window will be written at end window. So IMO, if you
>> set exactly once processing on Kafka Input operator, and choose
>> transactional cassandra output operator you will achieve exactly once
>> processing. If you have other operators in your dag you might want to make
>> sure they are idempotent (please check blog shared by Sandesh for
>> reference).
>>
>> -Priyanka
>>
>> On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sa...@datatorrent.com>
>> wrote:
>>
>>> Settings mentioned by Sanjay, will only guarantee exactly once for
>> Windows,
>>> but not for partial window processed by the operator, in a way that
>> setting
>>> is a misnomer.
>>> To achieve Exactly once, there are some precoditions that need to be met
>>> along with the support in the output operator. Here is a blog that gives
>>> the idea about exactly once,
>>> https://www.datatorrent.com/blog/end-to-end-exactly-once-
>> with-apache-apex/
>>>
>>> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sa...@datatorrent.com>
>>> wrote:
>>>
>>>> Have you taken a look at
>>>> http://apex.apache.org/docs/apex/application_development/#exactly-once
>> ?
>>>> i.e. setting that processing mode on all the operators in the pipeline
>> .
>>>>
>>>> Join us at Apex Big Data World-San Jose <
>>>> http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
>>>>
>>>> http://www.apexbigdata.com/san-jose-register.html
>>>>
>>>>
>>>> On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:
>>>>
>>>>    How to ensure that the Kafka to Cassandra ingestion pipeline in
>> Apex
>>>> will
>>>>    guarantee exactly once processing semantics.
>>>>    Eg. Message was read from Kafka but apex app died before it was
>>> written
>>>>    successfully to Cassandra.
>>>>
>>>>
>>>>
>>>> --
>>> *Join us at Apex Big Data World-San Jose
>>> <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
>>> [image: http://www.apexbigdata.com/san-jose-register.html]
>>> <http://www.apexbigdata.com/san-jose-register.html>
>>>
>>

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Himanshu Bari <hi...@gmail.com>.
Hey guys

We are looking for at least once semantics and not exactly once. Sinxe the
sink is Cassandra it is ok if tge same record is written twice..iy will
just overwrite on any reprocessing...

Himanshu

On Feb 14, 2017 9:01 PM, "Priyanka Gugale" <pr...@datatorrent.com> wrote:

> For this particular case kafka -> cassandra, you need not worry about
> partial windows. Cassandra output operator does batch processing i.e. all
> records received in a window will be written at end window. So IMO, if you
> set exactly once processing on Kafka Input operator, and choose
> transactional cassandra output operator you will achieve exactly once
> processing. If you have other operators in your dag you might want to make
> sure they are idempotent (please check blog shared by Sandesh for
> reference).
>
> -Priyanka
>
> On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sa...@datatorrent.com>
> wrote:
>
> > Settings mentioned by Sanjay, will only guarantee exactly once for
> Windows,
> > but not for partial window processed by the operator, in a way that
> setting
> > is a misnomer.
> > To achieve Exactly once, there are some precoditions that need to be met
> > along with the support in the output operator. Here is a blog that gives
> > the idea about exactly once,
> > https://www.datatorrent.com/blog/end-to-end-exactly-once-
> with-apache-apex/
> >
> > On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sa...@datatorrent.com>
> > wrote:
> >
> > > Have you taken a look at
> > > http://apex.apache.org/docs/apex/application_development/#exactly-once
> ?
> > > i.e. setting that processing mode on all the operators in the pipeline
> .
> > >
> > > Join us at Apex Big Data World-San Jose <
> > > http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
> > >
> > > http://www.apexbigdata.com/san-jose-register.html
> > >
> > >
> > > On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:
> > >
> > >     How to ensure that the Kafka to Cassandra ingestion pipeline in
> Apex
> > > will
> > >     guarantee exactly once processing semantics.
> > >     Eg. Message was read from Kafka but apex app died before it was
> > written
> > >     successfully to Cassandra.
> > >
> > >
> > >
> > > --
> > *Join us at Apex Big Data World-San Jose
> > <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
> > [image: http://www.apexbigdata.com/san-jose-register.html]
> > <http://www.apexbigdata.com/san-jose-register.html>
> >
>

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Priyanka Gugale <pr...@datatorrent.com>.
For this particular case kafka -> cassandra, you need not worry about
partial windows. Cassandra output operator does batch processing i.e. all
records received in a window will be written at end window. So IMO, if you
set exactly once processing on Kafka Input operator, and choose
transactional cassandra output operator you will achieve exactly once
processing. If you have other operators in your dag you might want to make
sure they are idempotent (please check blog shared by Sandesh for
reference).

-Priyanka

On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sa...@datatorrent.com>
wrote:

> Settings mentioned by Sanjay, will only guarantee exactly once for Windows,
> but not for partial window processed by the operator, in a way that setting
> is a misnomer.
> To achieve Exactly once, there are some precoditions that need to be met
> along with the support in the output operator. Here is a blog that gives
> the idea about exactly once,
> https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/
>
> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sa...@datatorrent.com>
> wrote:
>
> > Have you taken a look at
> > http://apex.apache.org/docs/apex/application_development/#exactly-once ?
> > i.e. setting that processing mode on all the operators in the pipeline .
> >
> > Join us at Apex Big Data World-San Jose <
> > http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
> >
> > http://www.apexbigdata.com/san-jose-register.html
> >
> >
> > On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:
> >
> >     How to ensure that the Kafka to Cassandra ingestion pipeline in Apex
> > will
> >     guarantee exactly once processing semantics.
> >     Eg. Message was read from Kafka but apex app died before it was
> written
> >     successfully to Cassandra.
> >
> >
> >
> > --
> *Join us at Apex Big Data World-San Jose
> <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
> [image: http://www.apexbigdata.com/san-jose-register.html]
> <http://www.apexbigdata.com/san-jose-register.html>
>

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Sandesh Hegde <sa...@datatorrent.com>.
Settings mentioned by Sanjay, will only guarantee exactly once for Windows,
but not for partial window processed by the operator, in a way that setting
is a misnomer.
To achieve Exactly once, there are some precoditions that need to be met
along with the support in the output operator. Here is a blog that gives
the idea about exactly once,
https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/

On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sa...@datatorrent.com>
wrote:

> Have you taken a look at
> http://apex.apache.org/docs/apex/application_development/#exactly-once ?
> i.e. setting that processing mode on all the operators in the pipeline .
>
> Join us at Apex Big Data World-San Jose <
> http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
>
> http://www.apexbigdata.com/san-jose-register.html
>
>
> On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:
>
>     How to ensure that the Kafka to Cassandra ingestion pipeline in Apex
> will
>     guarantee exactly once processing semantics.
>     Eg. Message was read from Kafka but apex app died before it was written
>     successfully to Cassandra.
>
>
>
> --
*Join us at Apex Big Data World-San Jose
<http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
[image: http://www.apexbigdata.com/san-jose-register.html]
<http://www.apexbigdata.com/san-jose-register.html>

Re: At-least once semantics for Kafka to Cassndra ingest

Posted by Sanjay Pujare <sa...@datatorrent.com>.
Have you taken a look at http://apex.apache.org/docs/apex/application_development/#exactly-once ? i.e. setting that processing mode on all the operators in the pipeline .

Join us at Apex Big Data World-San Jose <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
 
http://www.apexbigdata.com/san-jose-register.html
 

On 2/14/17, 12:00 PM, "Himanshu Bari" <hi...@gmail.com> wrote:

    How to ensure that the Kafka to Cassandra ingestion pipeline in Apex will
    guarantee exactly once processing semantics.
    Eg. Message was read from Kafka but apex app died before it was written
    successfully to Cassandra.