Posted to dev@apex.apache.org by Ashwin Chandra Putta <as...@gmail.com> on 2016/03/29 01:47:46 UTC

proposal for reconciled JDBC output operator

There are many use cases in which we write tuples to an external system
using JDBC or a similar interface. The external system may be slow, or down
for some time. In those cases, the current implementation of the JDBC
output operators fails and restarts repeatedly until the external system is
up again. Meanwhile, the DAG is slowed down by this operator. To deal with
such scenarios, we should write the output in a reconciled fashion, where a
reconciler thread writes at the pace of the external system. We should also
provide the ability to spool the data to disk when the external system is
down or the output operator's queue is full.

Here are the proposed features for the output operator.

1. Write to the external system in a separate reconciler thread.
2. Queue the tuples in memory for the reconciler thread to consume.
3. Spool the incoming tuples to HDFS using a WAL (write-ahead log) when the
queue is full.
4. Read from the WAL and write to the queue as the queue is being consumed.
5. When the external system is able to consume as fast as the incoming
throughput, the WAL is not written. The queue just buffers the tuples
before they are written to the external system.
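
The queue and WAL interplay in points 2 to 5 could be sketched roughly as
below. This is only an illustration under stated assumptions: the class
name SpoolingBuffer and its methods are hypothetical, and an in-memory
deque stands in for the WAL on HDFS. It is not the proposed operator code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a bounded in-memory queue that spills to a WAL stand-in when full. */
class SpoolingBuffer<T> {
  private final int capacity;
  private final Deque<T> memoryQueue = new ArrayDeque<>();
  // Assumption: stand-in for the WAL on HDFS. A real WAL would append to durable storage.
  private final Deque<T> wal = new ArrayDeque<>();
  private long spooledCount = 0;

  SpoolingBuffer(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Called by the operator thread for every incoming tuple. */
  synchronized void offer(T tuple) {
    // Once anything is spooled, keep spooling so that tuples stay in arrival order.
    if (wal.isEmpty() && memoryQueue.size() < capacity) {
      memoryQueue.addLast(tuple);
    } else {
      wal.addLast(tuple);
      spooledCount++;
    }
  }

  /** Called by the reconciler thread. Returns null when nothing is buffered. */
  synchronized T poll() {
    T head = memoryQueue.pollFirst();
    // Refill the freed slot from the WAL, oldest spooled tuple first.
    if (head != null && !wal.isEmpty()) {
      memoryQueue.addLast(wal.pollFirst());
    }
    return head;
  }

  /** Number of tuples that ever went through the WAL stand-in. */
  synchronized long spooledCount() {
    return spooledCount;
  }
}
```

In this sketch, a consumer that keeps up with the producer never touches
the WAL stand-in, which matches point 5. Routing new tuples to the WAL
while it is non-empty is a design choice made here to preserve ordering;
the proposal itself does not state an ordering requirement.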

Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037

Please let me know if you have any feedback on the design.

-- 

Regards,
Ashwin.

Re: proposal for reconciled JDBC output operator

Posted by Chinmay Kolhatkar <ch...@apache.org>.
@Ashwin yes, it's fault tolerant. The variable waitingTuple gets
checkpointed and holds the tuples that still need to be processed. In case
of failure, it re-enqueues any tuples still left in this variable.
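
The recovery idea described above might look roughly like the following
sketch. It is not taken from the linked AbstractAsyncProcessor: the class
and method names are hypothetical, the code is single-threaded for
brevity, and Java serialization is used only as a stand-in for the
platform's checkpointing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch of checkpointed waiting tuples that are re-enqueued after recovery. */
class AsyncProcessorSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Checkpointed state: tuples received but not yet confirmed as written.
  private final List<String> waitingTuples = new ArrayList<>();
  // Not checkpointed: rebuilt in setup() after every (re)start.
  private transient Queue<String> workQueue;

  /** Called on start and after recovery; re-enqueues whatever was still waiting. */
  void setup() {
    workQueue = new ConcurrentLinkedQueue<>(waitingTuples);
  }

  void process(String tuple) {
    waitingTuples.add(tuple);
    workQueue.add(tuple);
  }

  /** Next tuple for the writer, or null if the queue is empty. */
  String nextToWrite() {
    return workQueue.poll();
  }

  /** Called once a tuple has been written to the external system. */
  void markDone(String tuple) {
    waitingTuples.remove(tuple);
  }

  /** Simulates a checkpoint followed by a restore (assumption: Java serialization as stand-in). */
  static AsyncProcessorSketch checkpointAndRestore(AsyncProcessorSketch op) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(op);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (AsyncProcessorSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("checkpoint round trip failed", e);
    }
  }
}
```

A tuple that was handed to the writer but not yet marked done is written
again after recovery in this sketch, so the external write would need to
be idempotent or otherwise tolerate duplicates.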


On Wed, Mar 30, 2016 at 2:33 AM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> @Chinmay I went through your code and related comments on pull request. I
> was originally thinking of using AbstractReconciler but seems like
> AbstractAsyncProcessor will be better because 1. it has multithreaded
> output 2. it does not wait for committed window to process queued tuples.
> One question though, is it fault tolerant?
>
> @Thomas Current design creates an abstract implementation, so it needs to
> be extended to create specific output writers. Need to think more about how
> to make this pluggable.
>
> @Others I am planning to get the common functionality out to make it
> reusable for outputs to any external system.
>
> Regards,
> Ashwin.
>
> On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
> > Can this be solved through a pluggable component of operator, similar to
> > window data manager?
> >
> >
> > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <si...@datatorrent.com>
> > wrote:
> >
> > > +1 on making it a feature for all output operators that depend on
> > external
> > > system
> > >
> > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh <
> > > sandeep@datatorrent.com>
> > > wrote:
> > >
> > > > +1 on making it apex feature.
> > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <tu...@datatorrent.com>
> > wrote:
> > > >
> > > > > +1 for the idea, love to have this feature available for operators
> > > > writing
> > > > > to external systems.
> > > > >
> > > > > - Tushar.
> > > > >
> > > > >
> > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <
> > > chinmay@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Ashwin,
> > > > > >
> > > > > > This is indeed a very required functionality.
> > > > > >
> > > > > > I think this is applicable for all three types of operators:
> Input
> > > > (e.g.
> > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output
> > > > > (JDBCOutput,
> > > > > > HDFSOutput, etc)
> > > > > >
> > > > > > I tried to do something similar:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> > > > > >
> > > > > > But this had limitation related extending another class other
> than
> > > this
> > > > > > abstract class etc....
> > > > > >
> > > > > > From what you have mentioned above, I see the functionality
> > basically
> > > > > > provides asynchronous processing of tuples.
> > > > > >
> > > > > > Considering this is a common functionality, Can this feature
> > instead
> > > be
> > > > > > added in platform (apex-core) itself?
> > > > > >
> > > > > > Thanks,
> > > > > > Chinmay.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <
> > > mohit@datatorrent.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Ashwin,
> > > > > > >
> > > > > > > The approach sounds good. I am assuming that this will be done
> > for
> > > > all
> > > > > > the
> > > > > > > output data stores and not limited to JDBC.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Regards,
> > > > > > > Mohit
> > > > > > >
> > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > > > > > > ashwinchandrap@gmail.com> wrote:
> > > > > > >
> > > > > > > > There are many use cases in which we are writing tuples to
> > > external
> > > > > > > system
> > > > > > > > using JDBC etc. There are instances when the external system
> > > might
> > > > be
> > > > > > > slow
> > > > > > > > and down for some time. In those cases, the current
> > > implementation
> > > > of
> > > > > > > jdbc
> > > > > > > > output operators fail and restart until the external system
> is
> > up
> > > > > > again.
> > > > > > > > Meanwhile, the DAG is slowed down by this operator. To deal
> > with
> > > > such
> > > > > > > > scenarios, we should write the output in a reconciled fashion
> > > where
> > > > > the
> > > > > > > > reconciler thread is writing at the pace of external system.
> We
> > > > > should
> > > > > > > also
> > > > > > > > provide an ability to spool the data to disk when the
> external
> > > > system
> > > > > > is
> > > > > > > > down or the output operators queue is full.
> > > > > > > >
> > > > > > > > Here are the proposed features for the output operator.
> > > > > > > >
> > > > > > > > 1. Write to external system in a separate reconciler thread.
> > > > > > > > 2. Queue the tuples in memory for reconciler thread to
> consume.
> > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL when the
> queue
> > > is
> > > > > > full.
> > > > > > > > 4. Read from WAL and write to queue as queue is being
> consumed.
> > > > > > > > 5. When external system is able to consume as fast as
> incoming
> > > > > > > throughput,
> > > > > > > > WAL is not written. The queue will just buffer the tuples
> > before
> > > > > > writing
> > > > > > > to
> > > > > > > > external system.
> > > > > > > >
> > > > > > > > Here is the JIRA:
> > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > > > > > >
> > > > > > > > Please let me know if you have any feedback on the design.
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Ashwin.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>

Re: proposal for reconciled JDBC output operator

Posted by Ashwin Chandra Putta <as...@gmail.com>.
This can be done as a pluggable component that can be reused in existing
output operators. The component will have a queue to buffer the regular
flow of tuples, backed by a WAL (APEXMALHAR-1965) when the external system
is slow or down. I will start a thread on the design soon.
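
To illustrate what "pluggable" could mean here, the sketch below has an
operator hold the buffering component as a field instead of extending an
abstract class. All names (AsyncWriteManager, JdbcLikeOutputOperator) are
hypothetical, the WAL backing is left out, and the actual design may
differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical pluggable component: owns the queue and the writer thread. */
class AsyncWriteManager<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final Consumer<T> writer; // supplied by the host operator
  private volatile boolean running;
  private Thread thread;

  AsyncWriteManager(Consumer<T> writer) {
    this.writer = writer;
  }

  void start() {
    running = true;
    thread = new Thread(() -> {
      // Keep draining until stop() was called and the queue is empty.
      while (running || !queue.isEmpty()) {
        try {
          T tuple = queue.poll(10, TimeUnit.MILLISECONDS);
          if (tuple != null) {
            writer.accept(tuple);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "async-writer");
    thread.start();
  }

  void enqueue(T tuple) {
    queue.add(tuple);
  }

  /** Signals the writer thread to finish and waits until it has drained the queue. */
  void stop() {
    running = false;
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

/** The operator keeps whatever superclass it already has and only holds the component. */
class JdbcLikeOutputOperator {
  // Stand-in for the external system: records what was "written".
  final List<String> written = Collections.synchronizedList(new ArrayList<>());
  private final AsyncWriteManager<String> manager = new AsyncWriteManager<>(written::add);

  void setup() {
    manager.start();
  }

  void process(String tuple) {
    manager.enqueue(tuple);
  }

  void teardown() {
    manager.stop();
  }
}
```

Because the operator only delegates to the component, the same manager
could in principle be plugged into other output operators by passing a
different writer callback.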

Regards,
Ashwin.

On Tue, Apr 5, 2016 at 12:07 AM, Yogi Devendra <devendra.vyavahare@gmail.com
> wrote:

> This would great value addition for other operators.
> +1 for making this operator plugin.
>
> ~ Yogi
>
> On 4 April 2016 at 12:40, Shubham Pathak <sh...@datatorrent.com> wrote:
>
> > +1 for having this as a platform feature.
> >
> > On Mon, Apr 4, 2016 at 12:27 PM, Bhupesh Chawda <bhupesh@datatorrent.com
> >
> > wrote:
> >
> > > +1 on making this an Apex feature. It would be good to have all output
> > > stores supported with this kind of functionality.
> > >
> > > ~Bhupesh
> > >
> > > On Thu, Mar 31, 2016 at 11:15 AM, Priyanka Gugale <
> > > priyanka@datatorrent.com>
> > > wrote:
> > >
> > > > +1 for this feature for all output operators. I have faced this issue
> > > while
> > > > working on CassandraOutput operator as well.
> > > >
> > > > -Priyanka
> > > >
> > > > On Wed, Mar 30, 2016 at 11:37 AM, Thomas Weise <
> thomas@datatorrent.com
> > >
> > > > wrote:
> > > >
> > > > > Ashwin, abstract class isn't suitable.
> > > > >
> > > > > The basic ingredient is WAL, from which the operator consumes as
> > > > > appropriate. We already have the WAL implementation.
> > > > >
> > > > > Incoming tuples are added to the WAL, consumption is async. WAL
> > offset
> > > is
> > > > > checkpointed.
> > > > >
> > > > > --
> > > > > sent from mobile
> > > > > On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" <
> > > > ashwinchandrap@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > @Chinmay I went through your code and related comments on pull
> > > > request. I
> > > > > > was originally thinking of using AbstractReconciler but seems
> like
> > > > > > AbstractAsyncProcessor will be better because 1. it has
> > multithreaded
> > > > > > output 2. it does not wait for committed window to process queued
> > > > tuples.
> > > > > > One question though, is it fault tolerant?
> > > > > >
> > > > > > @Thomas Current design creates an abstract implementation, so it
> > > needs
> > > > to
> > > > > > be extended to create specific output writers. Need to think more
> > > about
> > > > > how
> > > > > > to make this pluggable.
> > > > > >
> > > > > > @Others I am planning to get the common functionality out to make
> > it
> > > > > > reusable for outputs to any external system.
> > > > > >
> > > > > > Regards,
> > > > > > Ashwin.
> > > > > >
> > > > > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <
> > > thomas@datatorrent.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Can this be solved through a pluggable component of operator,
> > > similar
> > > > > to
> > > > > > > window data manager?
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <
> > > siyuan@datatorrent.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on making it a feature for all output operators that
> depend
> > on
> > > > > > > external
> > > > > > > > system
> > > > > > > >
> > > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh <
> > > > > > > > sandeep@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 on making it apex feature.
> > > > > > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <
> > > tushar@datatorrent.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for the idea, love to have this feature available for
> > > > > operators
> > > > > > > > > writing
> > > > > > > > > > to external systems.
> > > > > > > > > >
> > > > > > > > > > - Tushar.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <
> > > > > > > > chinmay@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ashwin,
> > > > > > > > > > >
> > > > > > > > > > > This is indeed a very required functionality.
> > > > > > > > > > >
> > > > > > > > > > > I think this is applicable for all three types of
> > > operators:
> > > > > > Input
> > > > > > > > > (e.g.
> > > > > > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment),
> > > Output
> > > > > > > > > > (JDBCOutput,
> > > > > > > > > > > HDFSOutput, etc)
> > > > > > > > > > >
> > > > > > > > > > > I tried to do something similar:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> > > > > > > > > > >
> > > > > > > > > > > But this had limitation related extending another class
> > > other
> > > > > > than
> > > > > > > > this
> > > > > > > > > > > abstract class etc....
> > > > > > > > > > >
> > > > > > > > > > > From what you have mentioned above, I see the
> > functionality
> > > > > > > basically
> > > > > > > > > > > provides asynchronous processing of tuples.
> > > > > > > > > > >
> > > > > > > > > > > Considering this is a common functionality, Can this
> > > feature
> > > > > > > instead
> > > > > > > > be
> > > > > > > > > > > added in platform (apex-core) itself?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Chinmay.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <
> > > > > > > > mohit@datatorrent.com
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Dear Ashwin,
> > > > > > > > > > > >
> > > > > > > > > > > > The approach sounds good. I am assuming that this
> will
> > be
> > > > > done
> > > > > > > for
> > > > > > > > > all
> > > > > > > > > > > the
> > > > > > > > > > > > output data stores and not limited to JDBC.
> > > > > > > > > > > >
> > > > > > > > > > > > +1
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Mohit
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra
> Putta <
> > > > > > > > > > > > ashwinchandrap@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > There are many use cases in which we are writing
> > tuples
> > > > to
> > > > > > > > external
> > > > > > > > > > > > system
> > > > > > > > > > > > > using JDBC etc. There are instances when the
> external
> > > > > system
> > > > > > > > might
> > > > > > > > > be
> > > > > > > > > > > > slow
> > > > > > > > > > > > > and down for some time. In those cases, the current
> > > > > > > > implementation
> > > > > > > > > of
> > > > > > > > > > > > jdbc
> > > > > > > > > > > > > output operators fail and restart until the
> external
> > > > system
> > > > > > is
> > > > > > > up
> > > > > > > > > > > again.
> > > > > > > > > > > > > Meanwhile, the DAG is slowed down by this operator.
> > To
> > > > deal
> > > > > > > with
> > > > > > > > > such
> > > > > > > > > > > > > scenarios, we should write the output in a
> reconciled
> > > > > fashion
> > > > > > > > where
> > > > > > > > > > the
> > > > > > > > > > > > > reconciler thread is writing at the pace of
> external
> > > > > system.
> > > > > > We
> > > > > > > > > > should
> > > > > > > > > > > > also
> > > > > > > > > > > > > provide an ability to spool the data to disk when
> the
> > > > > > external
> > > > > > > > > system
> > > > > > > > > > > is
> > > > > > > > > > > > > down or the output operators queue is full.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here are the proposed features for the output
> > operator.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Write to external system in a separate
> reconciler
> > > > > thread.
> > > > > > > > > > > > > 2. Queue the tuples in memory for reconciler thread
> > to
> > > > > > consume.
> > > > > > > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL
> when
> > > the
> > > > > > queue
> > > > > > > > is
> > > > > > > > > > > full.
> > > > > > > > > > > > > 4. Read from WAL and write to queue as queue is
> being
> > > > > > consumed.
> > > > > > > > > > > > > 5. When external system is able to consume as fast
> as
> > > > > > incoming
> > > > > > > > > > > > throughput,
> > > > > > > > > > > > > WAL is not written. The queue will just buffer the
> > > tuples
> > > > > > > before
> > > > > > > > > > > writing
> > > > > > > > > > > > to
> > > > > > > > > > > > > external system.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here is the JIRA:
> > > > > > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please let me know if you have any feedback on the
> > > > design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Ashwin.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Regards,
> > > > > > Ashwin.
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 

Regards,
Ashwin.

Re: proposal for reconciled JDBC output operator

Posted by Yogi Devendra <de...@gmail.com>.
This would be a great value addition for other operators.
+1 for making this an operator plugin.

~ Yogi

On 4 April 2016 at 12:40, Shubham Pathak <sh...@datatorrent.com> wrote:

> +1 for having this as a platform feature.
>
> On Mon, Apr 4, 2016 at 12:27 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > +1 on making this an Apex feature. It would be good to have all output
> > stores supported with this kind of functionality.
> >
> > ~Bhupesh
> >
> > On Thu, Mar 31, 2016 at 11:15 AM, Priyanka Gugale <
> > priyanka@datatorrent.com>
> > wrote:
> >
> > > +1 for this feature for all output operators. I have faced this issue
> > while
> > > working on CassandraOutput operator as well.
> > >
> > > -Priyanka
> > >
> > > On Wed, Mar 30, 2016 at 11:37 AM, Thomas Weise <thomas@datatorrent.com
> >
> > > wrote:
> > >
> > > > Ashwin, abstract class isn't suitable.
> > > >
> > > > The basic ingredient is WAL, from which the operator consumes as
> > > > appropriate. We already have the WAL implementation.
> > > >
> > > > Incoming tuples are added to the WAL, consumption is async. WAL
> offset
> > is
> > > > checkpointed.
> > > >
> > > > --
> > > > sent from mobile
> > > > On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" <
> > > ashwinchandrap@gmail.com>
> > > > wrote:
> > > >
> > > > > @Chinmay I went through your code and related comments on pull
> > > request. I
> > > > > was originally thinking of using AbstractReconciler but seems like
> > > > > AbstractAsyncProcessor will be better because 1. it has
> multithreaded
> > > > > output 2. it does not wait for committed window to process queued
> > > tuples.
> > > > > One question though, is it fault tolerant?
> > > > >
> > > > > @Thomas Current design creates an abstract implementation, so it
> > needs
> > > to
> > > > > be extended to create specific output writers. Need to think more
> > about
> > > > how
> > > > > to make this pluggable.
> > > > >
> > > > > @Others I am planning to get the common functionality out to make
> it
> > > > > reusable for outputs to any external system.
> > > > >
> > > > > Regards,
> > > > > Ashwin.
> > > > >
> > > > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <
> > thomas@datatorrent.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Can this be solved through a pluggable component of operator,
> > similar
> > > > to
> > > > > > window data manager?
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <
> > siyuan@datatorrent.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 on making it a feature for all output operators that depend
> on
> > > > > > external
> > > > > > > system
> > > > > > >
> > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh <
> > > > > > > sandeep@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on making it apex feature.
> > > > > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <
> > tushar@datatorrent.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for the idea, love to have this feature available for
> > > > operators
> > > > > > > > writing
> > > > > > > > > to external systems.
> > > > > > > > >
> > > > > > > > > - Tushar.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <
> > > > > > > chinmay@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ashwin,
> > > > > > > > > >
> > > > > > > > > > This is indeed a very required functionality.
> > > > > > > > > >
> > > > > > > > > > I think this is applicable for all three types of
> > operators:
> > > > > Input
> > > > > > > > (e.g.
> > > > > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment),
> > Output
> > > > > > > > > (JDBCOutput,
> > > > > > > > > > HDFSOutput, etc)
> > > > > > > > > >
> > > > > > > > > > I tried to do something similar:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> > > > > > > > > >
> > > > > > > > > > But this had limitation related extending another class
> > other
> > > > > than
> > > > > > > this
> > > > > > > > > > abstract class etc....
> > > > > > > > > >
> > > > > > > > > > From what you have mentioned above, I see the
> functionality
> > > > > > basically
> > > > > > > > > > provides asynchronous processing of tuples.
> > > > > > > > > >
> > > > > > > > > > Considering this is a common functionality, Can this
> > feature
> > > > > > instead
> > > > > > > be
> > > > > > > > > > added in platform (apex-core) itself?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Chinmay.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <
> > > > > > > mohit@datatorrent.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Ashwin,
> > > > > > > > > > >
> > > > > > > > > > > The approach sounds good. I am assuming that this will
> be
> > > > done
> > > > > > for
> > > > > > > > all
> > > > > > > > > > the
> > > > > > > > > > > output data stores and not limited to JDBC.
> > > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Mohit
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > > > > > > > > > > ashwinchandrap@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > There are many use cases in which we are writing
> tuples
> > > to
> > > > > > > external
> > > > > > > > > > > system
> > > > > > > > > > > > using JDBC etc. There are instances when the external
> > > > system
> > > > > > > might
> > > > > > > > be
> > > > > > > > > > > slow
> > > > > > > > > > > > and down for some time. In those cases, the current
> > > > > > > implementation
> > > > > > > > of
> > > > > > > > > > > jdbc
> > > > > > > > > > > > output operators fail and restart until the external
> > > system
> > > > > is
> > > > > > up
> > > > > > > > > > again.
> > > > > > > > > > > > Meanwhile, the DAG is slowed down by this operator.
> To
> > > deal
> > > > > > with
> > > > > > > > such
> > > > > > > > > > > > scenarios, we should write the output in a reconciled
> > > > fashion
> > > > > > > where
> > > > > > > > > the
> > > > > > > > > > > > reconciler thread is writing at the pace of external
> > > > system.
> > > > > We
> > > > > > > > > should
> > > > > > > > > > > also
> > > > > > > > > > > > provide an ability to spool the data to disk when the
> > > > > external
> > > > > > > > system
> > > > > > > > > > is
> > > > > > > > > > > > down or the output operators queue is full.
> > > > > > > > > > > >
> > > > > > > > > > > > Here are the proposed features for the output
> operator.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Write to external system in a separate reconciler
> > > > thread.
> > > > > > > > > > > > 2. Queue the tuples in memory for reconciler thread
> to
> > > > > consume.
> > > > > > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL when
> > the
> > > > > queue
> > > > > > > is
> > > > > > > > > > full.
> > > > > > > > > > > > 4. Read from WAL and write to queue as queue is being
> > > > > consumed.
> > > > > > > > > > > > 5. When external system is able to consume as fast as
> > > > > incoming
> > > > > > > > > > > throughput,
> > > > > > > > > > > > WAL is not written. The queue will just buffer the
> > tuples
> > > > > > before
> > > > > > > > > > writing
> > > > > > > > > > > to
> > > > > > > > > > > > external system.
> > > > > > > > > > > >
> > > > > > > > > > > > Here is the JIRA:
> > > > > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > > > > > > > > > >
> > > > > > > > > > > > Please let me know if you have any feedback on the
> > > design.
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Ashwin.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Ashwin.
> > > > >
> > > >
> > >
> >
>

Re: proposal for reconciled JDBC output operator

Posted by Shubham Pathak <sh...@datatorrent.com>.
+1 for having this as a platform feature.

On Mon, Apr 4, 2016 at 12:27 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> +1 on making this an Apex feature. It would be good to have all output
> stores supported with this kind of functionality.
>
> ~Bhupesh
>
> On Thu, Mar 31, 2016 at 11:15 AM, Priyanka Gugale <
> priyanka@datatorrent.com>
> wrote:
>
> > +1 for this feature for all output operators. I have faced this issue
> while
> > working on CassandraOutput operator as well.
> >
> > -Priyanka
> >
> > On Wed, Mar 30, 2016 at 11:37 AM, Thomas Weise <th...@datatorrent.com>
> > wrote:
> >
> > > Ashwin, abstract class isn't suitable.
> > >
> > > The basic ingredient is WAL, from which the operator consumes as
> > > appropriate. We already have the WAL implementation.
> > >
> > > Incoming tuples are added to the WAL, consumption is async. WAL offset
> is
> > > checkpointed.
> > >
> > > --
> > > sent from mobile
> > > On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" <
> > ashwinchandrap@gmail.com>
> > > wrote:
> > >
> > > > @Chinmay I went through your code and related comments on pull
> > request. I
> > > > was originally thinking of using AbstractReconciler but seems like
> > > > AbstractAsyncProcessor will be better because 1. it has multithreaded
> > > > output 2. it does not wait for committed window to process queued
> > tuples.
> > > > One question though, is it fault tolerant?
> > > >
> > > > @Thomas Current design creates an abstract implementation, so it
> needs
> > to
> > > > be extended to create specific output writers. Need to think more
> about
> > > how
> > > > to make this pluggable.
> > > >
> > > > @Others I am planning to get the common functionality out to make it
> > > > reusable for outputs to any external system.
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <
> thomas@datatorrent.com
> > >
> > > > wrote:
> > > >
> > > > > Can this be solved through a pluggable component of operator,
> similar
> > > to
> > > > > window data manager?
> > > > >
> > > > >
> > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <
> siyuan@datatorrent.com
> > >
> > > > > wrote:
> > > > >
> > > > > > +1 on making it a feature for all output operators that depend on
> > > > > external
> > > > > > system
> > > > > >
> > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh <
> > > > > > sandeep@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 on making it apex feature.
> > > > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <
> tushar@datatorrent.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 for the idea, love to have this feature available for
> > > operators
> > > > > > > writing
> > > > > > > > to external systems.
> > > > > > > >
> > > > > > > > - Tushar.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <
> > > > > > chinmay@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Ashwin,
> > > > > > > > >
> > > > > > > > > This is indeed a very required functionality.
> > > > > > > > >
> > > > > > > > > I think this is applicable for all three types of
> operators:
> > > > Input
> > > > > > > (e.g.
> > > > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment),
> Output
> > > > > > > > (JDBCOutput,
> > > > > > > > > HDFSOutput, etc)
> > > > > > > > >
> > > > > > > > > I tried to do something similar:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> > > > > > > > >
> > > > > > > > > But this had limitation related extending another class
> other
> > > > than
> > > > > > this
> > > > > > > > > abstract class etc....
> > > > > > > > >
> > > > > > > > > From what you have mentioned above, I see the functionality
> > > > > basically
> > > > > > > > > provides asynchronous processing of tuples.
> > > > > > > > >
> > > > > > > > > Considering this is a common functionality, Can this
> feature
> > > > > instead
> > > > > > be
> > > > > > > > > added in platform (apex-core) itself?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Chinmay.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <
> > > > > > mohit@datatorrent.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Dear Ashwin,
> > > > > > > > > >
> > > > > > > > > > The approach sounds good. I am assuming that this will be
> > > done
> > > > > for
> > > > > > > all
> > > > > > > > > the
> > > > > > > > > > output data stores and not limited to JDBC.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Mohit
> > > > > > > > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > >
> >
>

Re: proposal for reconciled JDBC output operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
+1 on making this an Apex feature. It would be good to have all output
stores supported with this kind of functionality.

~Bhupesh


Re: proposal for reconciled JDBC output operator

Posted by Priyanka Gugale <pr...@datatorrent.com>.
+1 for this feature for all output operators. I have faced this issue while
working on the CassandraOutput operator as well.

-Priyanka


Re: proposal for reconciled JDBC output operator

Posted by Thomas Weise <th...@datatorrent.com>.
Ashwin, an abstract class isn't suitable.

The basic ingredient is a WAL (write-ahead log), from which the operator
consumes as appropriate. We already have the WAL implementation.

Incoming tuples are added to the WAL, and consumption is asynchronous. The
WAL offset is checkpointed.
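
A minimal sketch of the idea above, not the existing Malhar WAL: the operator
thread only appends to the log, a separate drain step writes to the external
store at whatever pace it accepts, and the read offset is the only state that
needs to be checkpointed. The names SimpleWal and WalBackedWriter are
hypothetical, and the log is kept in memory here purely for illustration; a
real one would persist entries (for example to HDFS).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for a write-ahead log (assumption: a real WAL persists entries). */
final class SimpleWal<T> {
    private final List<T> entries = new ArrayList<>();

    /** Appends a tuple and returns the offset it was written at. */
    synchronized long append(T tuple) {
        entries.add(tuple);
        return entries.size() - 1;
    }

    /** Returns the entry at the offset, or null if nothing has been written there yet. */
    synchronized T read(long offset) {
        return offset < entries.size() ? entries.get((int) offset) : null;
    }
}

/** Hypothetical writer: process() appends to the WAL, drain() consumes from it. */
final class WalBackedWriter<T> {
    private final SimpleWal<T> wal;
    private final Consumer<T> externalStore;
    private long readOffset; // next WAL offset to send; this is the checkpointed state

    WalBackedWriter(SimpleWal<T> wal, Consumer<T> externalStore, long checkpointedOffset) {
        this.wal = wal;
        this.externalStore = externalStore;
        this.readOffset = checkpointedOffset;
    }

    /** Called on the operator thread; never blocks on the external store. */
    void process(T tuple) {
        wal.append(tuple);
    }

    /**
     * Sends up to max pending entries; intended to run on a separate thread.
     * Stops at the first failure and leaves the offset untouched so the
     * same entry is retried on the next call.
     */
    synchronized int drain(int max) {
        int sent = 0;
        while (sent < max) {
            T next = wal.read(readOffset);
            if (next == null) {
                break; // caught up with the log
            }
            try {
                externalStore.accept(next);
            } catch (RuntimeException e) {
                break; // store is slow or down: retry later
            }
            readOffset++;
            sent++;
        }
        return sent;
    }

    /** Value to save at checkpoint time and pass back in after recovery. */
    synchronized long checkpointOffset() {
        return readOffset;
    }
}
```

With this shape, entries sent after the last checkpoint are sent again after a
restart, so the sketch gives at-least-once delivery unless the external write
is idempotent.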

--
sent from mobile

Re: proposal for reconciled JDBC output operator

Posted by Ashwin Chandra Putta <as...@gmail.com>.
@Chinmay I went through your code and the related comments on the pull
request. I was originally thinking of using AbstractReconciler, but it seems
like AbstractAsyncProcessor will be better because (1) it has multithreaded
output and (2) it does not wait for the committed window to process queued
tuples. One question though: is it fault tolerant?

@Thomas The current design creates an abstract implementation, so it needs
to be extended to create specific output writers. I need to think more about
how to make this pluggable.

@Others I am planning to pull out the common functionality to make it
reusable for outputs to any external system.

Regards,
Ashwin.


Re: proposal for reconciled JDBC output operator

Posted by Thomas Weise <th...@datatorrent.com>.
Can this be solved through a pluggable component of the operator, similar to
the window data manager?
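
A rough sketch of what "pluggable" could mean here, as opposed to an abstract
base class: the buffering logic sits behind a small interface and the operator
holds it as a field, so the operator stays free to extend any other class.
OutputBuffer, InMemoryOutputBuffer, SomeExistingBase and ListOutputOperator
are all hypothetical names for illustration, not existing Apex or Malhar
types, and the sketch does not reproduce the window data manager API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical plug-in contract for buffering tuples before they are written out. */
interface OutputBuffer<T> {
    void enqueue(T tuple);

    /** Returns the next tuple to write, or null when the buffer is empty. */
    T poll();
}

/** One possible implementation: a plain in-memory FIFO queue. */
final class InMemoryOutputBuffer<T> implements OutputBuffer<T> {
    private final Queue<T> queue = new ArrayDeque<>();

    @Override
    public void enqueue(T tuple) {
        queue.add(tuple);
    }

    @Override
    public T poll() {
        return queue.poll();
    }
}

/** Stand-in for whatever base class the operator already needs to extend. */
class SomeExistingBase {
}

/** The operator composes the buffer instead of inheriting from an abstract processor. */
final class ListOutputOperator extends SomeExistingBase {
    private OutputBuffer<String> buffer = new InMemoryOutputBuffer<>();
    final List<String> written = new ArrayList<>(); // stands in for the external system

    /** Swap in a different buffering strategy (for example a WAL-backed one). */
    void setBuffer(OutputBuffer<String> buffer) {
        this.buffer = buffer;
    }

    void process(String tuple) {
        buffer.enqueue(tuple);
    }

    /** Writes everything currently buffered to the stand-in external system. */
    void flush() {
        String tuple;
        while ((tuple = buffer.poll()) != null) {
            written.add(tuple);
        }
    }
}
```

A different buffering strategy can then be set as an operator property without
touching the operator's class hierarchy.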



Re: proposal for reconciled JDBC output operator

Posted by Siyuan Hua <si...@datatorrent.com>.
+1 on making it a feature for all output operators that depend on an
external system.


Re: proposal for reconciled JDBC output operator

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
+1 on making it an Apex feature.
On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <tu...@datatorrent.com> wrote:

> +1 for the idea, love to have this feature available for operators writing
> to external systems.
>
> - Tushar.
>
>
> On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <ch...@apache.org>
> wrote:
>
> > Hi Ashwin,
> >
> > This is indeed a very required functionality.
> >
> > I think this is applicable for all three types of operators: Input (e.g.
> > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output
> (JDBCOutput,
> > HDFSOutput, etc)
> >
> > I tried to do something similar:
> >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> >
> > But this had limitation related extending another class other than this
> > abstract class etc....
> >
> > From what you have mentioned above, I see the functionality basically
> > provides asynchronous processing of tuples.
> >
> > Considering this is a common functionality, Can this feature instead be
> > added in platform (apex-core) itself?
> >
> > Thanks,
> > Chinmay.
> >
> >
> >
> > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <mo...@datatorrent.com>
> > wrote:
> >
> > > Dear Ashwin,
> > >
> > > The approach sounds good. I am assuming that this will be done for all
> > the
> > > output data stores and not limited to JDBC.
> > >
> > > +1
> > >
> > > Regards,
> > > Mohit
> > >
> > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > > ashwinchandrap@gmail.com> wrote:
> > >
> > > > There are many use cases in which we are writing tuples to external
> > > system
> > > > using JDBC etc. There are instances when the external system might be
> > > slow
> > > > and down for some time. In those cases, the current implementation of
> > > jdbc
> > > > output operators fail and restart until the external system is up
> > again.
> > > > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > > > scenarios, we should write the output in a reconciled fashion where
> the
> > > > reconciler thread is writing at the pace of external system. We
> should
> > > also
> > > > provide an ability to spool the data to disk when the external system
> > is
> > > > down or the output operators queue is full.
> > > >
> > > > Here are the proposed features for the output operator.
> > > >
> > > > 1. Write to external system in a separate reconciler thread.
> > > > 2. Queue the tuples in memory for reconciler thread to consume.
> > > > 3. Spool the incoming tuples to hdfs using a WAL when the queue is
> > full.
> > > > 4. Read from WAL and write to queue as queue is being consumed.
> > > > 5. When external system is able to consume as fast as incoming
> > > throughput,
> > > > WAL is not written. The queue will just buffer the tuples before
> > writing
> > > to
> > > > external system.
> > > >
> > > > Here is the JIRA:
> > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > >
> > > > Please let me know if you have any feedback on the design.
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > >
> >
>

Re: proposal for reconciled JDBC output operator

Posted by Tushar Gosavi <tu...@datatorrent.com>.
+1 for the idea. I would love to have this feature available for operators
writing to external systems.

- Tushar.


On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <ch...@apache.org>
wrote:

> Hi Ashwin,
>
> This is indeed a very required functionality.
>
> I think this is applicable for all three types of operators: Input (e.g.
> HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output (JDBCOutput,
> HDFSOutput, etc)
>
> I tried to do something similar:
>
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
>
> But this had limitation related extending another class other than this
> abstract class etc....
>
> From what you have mentioned above, I see the functionality basically
> provides asynchronous processing of tuples.
>
> Considering this is a common functionality, Can this feature instead be
> added in platform (apex-core) itself?
>
> Thanks,
> Chinmay.
>
>
>
> On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <mo...@datatorrent.com>
> wrote:
>
> > Dear Ashwin,
> >
> > The approach sounds good. I am assuming that this will be done for all
> the
> > output data stores and not limited to JDBC.
> >
> > +1
> >
> > Regards,
> > Mohit
> >
> > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > ashwinchandrap@gmail.com> wrote:
> >
> > > There are many use cases in which we are writing tuples to external
> > system
> > > using JDBC etc. There are instances when the external system might be
> > slow
> > > and down for some time. In those cases, the current implementation of
> > jdbc
> > > output operators fail and restart until the external system is up
> again.
> > > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > > scenarios, we should write the output in a reconciled fashion where the
> > > reconciler thread is writing at the pace of external system. We should
> > also
> > > provide an ability to spool the data to disk when the external system
> is
> > > down or the output operators queue is full.
> > >
> > > Here are the proposed features for the output operator.
> > >
> > > 1. Write to external system in a separate reconciler thread.
> > > 2. Queue the tuples in memory for reconciler thread to consume.
> > > 3. Spool the incoming tuples to hdfs using a WAL when the queue is
> full.
> > > 4. Read from WAL and write to queue as queue is being consumed.
> > > 5. When external system is able to consume as fast as incoming
> > throughput,
> > > WAL is not written. The queue will just buffer the tuples before
> writing
> > to
> > > external system.
> > >
> > > Here is the JIRA:
> https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > >
> > > Please let me know if you have any feedback on the design.
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>

Re: proposal for reconciled JDBC output operator

Posted by Chinmay Kolhatkar <ch...@apache.org>.
Hi Ashwin,

This is indeed much-needed functionality.

I think this is applicable to all three types of operators: Input (e.g.
HDFSInput, JDBCInput, etc.), Process (e.g. Enrichment), and Output (e.g.
JDBCOutput, HDFSOutput, etc.).

I tried to do something similar:
https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java

But this approach had limitations. For example, an operator that extends
this abstract class cannot extend any other class.

From what you have mentioned above, I see the functionality basically
provides asynchronous processing of tuples.

Considering this is common functionality, can this feature instead be
added to the platform (apex-core) itself?

Thanks,
Chinmay.



On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <mo...@datatorrent.com>
wrote:

> Dear Ashwin,
>
> The approach sounds good. I am assuming that this will be done for all the
> output data stores and not limited to JDBC.
>
> +1
>
> Regards,
> Mohit
>
> On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
> > There are many use cases in which we are writing tuples to external
> system
> > using JDBC etc. There are instances when the external system might be
> slow
> > and down for some time. In those cases, the current implementation of
> jdbc
> > output operators fail and restart until the external system is up again.
> > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > scenarios, we should write the output in a reconciled fashion where the
> > reconciler thread is writing at the pace of external system. We should
> also
> > provide an ability to spool the data to disk when the external system is
> > down or the output operators queue is full.
> >
> > Here are the proposed features for the output operator.
> >
> > 1. Write to external system in a separate reconciler thread.
> > 2. Queue the tuples in memory for reconciler thread to consume.
> > 3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
> > 4. Read from WAL and write to queue as queue is being consumed.
> > 5. When external system is able to consume as fast as incoming
> throughput,
> > WAL is not written. The queue will just buffer the tuples before writing
> to
> > external system.
> >
> > Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
> >
> > Please let me know if you have any feedback on the design.
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>

Re: proposal for reconciled JDBC output operator

Posted by Mohit Jotwani <mo...@datatorrent.com>.
Dear Ashwin,

The approach sounds good. I am assuming that this will be done for all the
output data stores and not limited to JDBC.

+1

Regards,
Mohit

On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> There are many use cases in which we are writing tuples to external system
> using JDBC etc. There are instances when the external system might be slow
> and down for some time. In those cases, the current implementation of jdbc
> output operators fail and restart until the external system is up again.
> Meanwhile, the DAG is slowed down by this operator. To deal with such
> scenarios, we should write the output in a reconciled fashion where the
> reconciler thread is writing at the pace of external system. We should also
> provide an ability to spool the data to disk when the external system is
> down or the output operators queue is full.
>
> Here are the proposed features for the output operator.
>
> 1. Write to external system in a separate reconciler thread.
> 2. Queue the tuples in memory for reconciler thread to consume.
> 3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
> 4. Read from WAL and write to queue as queue is being consumed.
> 5. When external system is able to consume as fast as incoming throughput,
> WAL is not written. The queue will just buffer the tuples before writing to
> external system.
>
> Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
>
> Please let me know if you have any feedback on the design.
>
> --
>
> Regards,
> Ashwin.
>

Re: proposal for reconciled JDBC output operator

Posted by Ashwin Chandra Putta <as...@gmail.com>.
Sandesh, can you give more details on what you mean by a bandwidth controller?

Regards,
Ashwin.

On Mon, Mar 28, 2016 at 4:58 PM, Sandesh Hegde <sa...@datatorrent.com>
wrote:

> Essentially we need bandwidth controller for output operators? not specific
> to JDBC?
>
> On Mon, Mar 28, 2016 at 4:47 PM Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
> > There are many use cases in which we are writing tuples to external
> system
> > using JDBC etc. There are instances when the external system might be
> slow
> > and down for some time. In those cases, the current implementation of
> jdbc
> > output operators fail and restart until the external system is up again.
> > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > scenarios, we should write the output in a reconciled fashion where the
> > reconciler thread is writing at the pace of external system. We should
> also
> > provide an ability to spool the data to disk when the external system is
> > down or the output operators queue is full.
> >
> > Here are the proposed features for the output operator.
> >
> > 1. Write to external system in a separate reconciler thread.
> > 2. Queue the tuples in memory for reconciler thread to consume.
> > 3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
> > 4. Read from WAL and write to queue as queue is being consumed.
> > 5. When external system is able to consume as fast as incoming
> throughput,
> > WAL is not written. The queue will just buffer the tuples before writing
> to
> > external system.
> >
> > Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
> >
> > Please let me know if you have any feedback on the design.
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>



-- 

Regards,
Ashwin.

Re: proposal for reconciled JDBC output operator

Posted by Sandesh Hegde <sa...@datatorrent.com>.
Essentially, we need a bandwidth controller for output operators, not
something specific to JDBC?

On Mon, Mar 28, 2016 at 4:47 PM Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> There are many use cases in which we are writing tuples to external system
> using JDBC etc. There are instances when the external system might be slow
> and down for some time. In those cases, the current implementation of jdbc
> output operators fail and restart until the external system is up again.
> Meanwhile, the DAG is slowed down by this operator. To deal with such
> scenarios, we should write the output in a reconciled fashion where the
> reconciler thread is writing at the pace of external system. We should also
> provide an ability to spool the data to disk when the external system is
> down or the output operators queue is full.
>
> Here are the proposed features for the output operator.
>
> 1. Write to external system in a separate reconciler thread.
> 2. Queue the tuples in memory for reconciler thread to consume.
> 3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
> 4. Read from WAL and write to queue as queue is being consumed.
> 5. When external system is able to consume as fast as incoming throughput,
> WAL is not written. The queue will just buffer the tuples before writing to
> external system.
>
> Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
>
> Please let me know if you have any feedback on the design.
>
> --
>
> Regards,
> Ashwin.
>
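
For reference, features 2 to 5 of the proposal quoted above could be sketched
roughly as follows. This is only an illustration under stated assumptions, not
code from Apex or Malhar: the class name SpoolingBuffer and its methods are
hypothetical, and the spool is a plain in-memory deque standing in for the
HDFS-backed WAL. The reconciler thread from feature 1 would be whatever thread
calls poll() and writes the result to the external system.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch, not an Apex or Malhar class.
 * Buffers tuples in a bounded queue and spills to a spool when the queue is full.
 */
class SpoolingBuffer<T> {
    // Feature 2: bounded in-memory queue consumed by the reconciler thread.
    private final BlockingQueue<T> queue;
    // Assumption: an in-memory deque stands in for the WAL on HDFS.
    private final Queue<T> spool = new ArrayDeque<>();
    // Number of tuples that ever went through the spool.
    private long spooledTotal = 0;

    SpoolingBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by the operator thread for every incoming tuple. */
    synchronized void add(T tuple) {
        // Feature 3: spool when the queue is full. While the spool is
        // non-empty, keep spooling so tuples are not reordered.
        // Feature 5: if the queue has room and nothing is spooled,
        // the spool is never touched.
        if (!spool.isEmpty() || !queue.offer(tuple)) {
            spool.add(tuple);
            spooledTotal++;
        }
    }

    /** Called by the reconciler thread; returns null when nothing is pending. */
    synchronized T poll() {
        T tuple = queue.poll();
        // Feature 4: refill the queue from the spool as space frees up.
        while (!spool.isEmpty() && queue.offer(spool.peek())) {
            spool.remove();
        }
        return tuple;
    }

    synchronized long spooledTotal() {
        return spooledTotal;
    }
}
```

A reconciler thread would loop on poll(), write each non-null tuple to the
external system at whatever pace that system allows, and back off briefly when
poll() returns null. Checkpointing and recovery of the queue and spool are left
out of this sketch.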