You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Hitesh Kapoor <hi...@datatorrent.com> on 2016/10/24 15:08:42 UTC

Exactly Once scenario in JdbcPOJOInsertOutput operator

Hi All,

Need your views and suggestions regarding JdbcPOJOInsertOutput operator.
This operator creates a transaction at the start of window, executes
batches of SQL updates, and closes the transaction at the end of the window.
Each tuple corresponds to an SQL insert statement. The operator groups the
inserts in a batch and submits them with one call to the database.
To write a tuple exactly once in the database, only when all the updates
are executed, the transaction is committed in the end window call.
For all this to function as per the expectation the underlying database (or
table) to which we are writing must have transaction capabilities.
For example the insert statements should not be auto committed, they should
only be committed when a commit is fired in endWindow(). If a commit is not
fired and the connection is closed (or roll backed) then there should not
be any inserts in the table.
This is important for exactly once to work correctly. For example consider
a batch size of 10000 and if the operator/container is killed after
inserting 4000 rows then when the operator comes back again these 4000
redundant rows will be inserted again.

So to handle the above scenario we can document and make it user's
responsibility to give the table/database supporting transactions, for
example if the database is MySQL then it is user's responsibility to give
the table with storage engine as InnoDB.
Please let me know if you have any other solution confirming to SQL
standards.

Regards,
Hitesh Kapoor

Re: Exactly Once scenario in JdbcPOJOInsertOutput operator

Posted by Hitesh Kapoor <hi...@datatorrent.com>.
Connection.getTransactionIsolation() can be used to determine if
transactions are supported at database level.
In MySQL, in the same database we can have tables supporting transactions
(e.g. InnoDB) and not supporting transactions.

Regards,
Hitesh

On Mon, Oct 24, 2016 at 9:12 PM, Munagala Ramanath <ra...@datatorrent.com>
wrote:

> Could Connection.getTransactionIsolation() be used to determine if
> transactions are supported ?
>
> Ram
>
> On Mon, Oct 24, 2016 at 8:08 AM, Hitesh Kapoor <hi...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > Need your views and suggestions regarding JdbcPOJOInsertOutput operator.
> > This operator creates a transaction at the start of window, executes
> > batches of SQL updates, and closes the transaction at the end of the
> > window.
> > Each tuple corresponds to an SQL insert statement. The operator groups
> the
> > inserts in a batch and submits them with one call to the database.
> > To write a tuple exactly once in the database, only when all the updates
> > are executed, the transaction is committed in the end window call.
> > For all this to function as per the expectation the underlying database
> (or
> > table) to which we are writing must have transaction capabilities.
> > For example the insert statements should not be auto committed, they
> should
> > only be committed when a commit is fired in endWindow(). If a commit is
> not
> > fired and the connection is closed (or roll backed) then there should not
> > be any inserts in the table.
> > This is important for exactly once to work correctly. For example
> consider
> > a batch size of 10000 and if the operator/container is killed after
> > inserting 4000 rows then when the operator comes back again these 4000
> > redundant rows will be inserted again.
> >
> > So to handle the above scenario we can document and make it user's
> > responsibility to give the table/database supporting transactions, for
> > example if the database is MySQL then it is user's responsibility to give
> > the table with storage engine as InnoDB.
> > Please let me know if you have any other solution confirming to SQL
> > standards.
> >
> > Regards,
> > Hitesh Kapoor
> >
>

Re: Exactly Once scenario in JdbcPOJOInsertOutput operator

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Could Connection.getTransactionIsolation() be used to determine if
transactions are supported ?

Ram

On Mon, Oct 24, 2016 at 8:08 AM, Hitesh Kapoor <hi...@datatorrent.com>
wrote:

> Hi All,
>
> Need your views and suggestions regarding JdbcPOJOInsertOutput operator.
> This operator creates a transaction at the start of window, executes
> batches of SQL updates, and closes the transaction at the end of the
> window.
> Each tuple corresponds to an SQL insert statement. The operator groups the
> inserts in a batch and submits them with one call to the database.
> To write a tuple exactly once in the database, only when all the updates
> are executed, the transaction is committed in the end window call.
> For all this to function as per the expectation the underlying database (or
> table) to which we are writing must have transaction capabilities.
> For example the insert statements should not be auto committed, they should
> only be committed when a commit is fired in endWindow(). If a commit is not
> fired and the connection is closed (or roll backed) then there should not
> be any inserts in the table.
> This is important for exactly once to work correctly. For example consider
> a batch size of 10000 and if the operator/container is killed after
> inserting 4000 rows then when the operator comes back again these 4000
> redundant rows will be inserted again.
>
> So to handle the above scenario we can document and make it user's
> responsibility to give the table/database supporting transactions, for
> example if the database is MySQL then it is user's responsibility to give
> the table with storage engine as InnoDB.
> Please let me know if you have any other solution confirming to SQL
> standards.
>
> Regards,
> Hitesh Kapoor
>