Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/02/19 03:35:42 UTC

Spark JDBC connection - data writing success or failure cases

Hi,
I have a Spark job which connects to an RDBMS (in my case it's Oracle).
How can we check that the complete data write was successful?
Can I use commit in case of success, or rollback in case of failure?



Thanks,
Divya

RE: Spark JDBC connection - data writing success or failure cases

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
OK, so as I understand it you mean pushing data from Spark to an Oracle database via JDBC, correct?

 

There are a number of ways to do so.

 

The most common way is to use Sqoop to move the data from an HDFS file or a Hive table into the Oracle database. With Spark you can also use that method, either by storing the data in a Hive table and letting Sqoop do the job, or by referencing the location where the data is stored directly.

 

Sqoop uses JDBC for this work and I believe it delivers data in batch transactions, configurable with "export.statements.per.transaction". Have a look at sqoop export --help.

 

With batch transactions, depending on the size of the batch, you may have a partial delivery of data in case of some issue such as a network failure or running out of space in the Oracle schema.
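To make the partial-delivery point concrete, here is a minimal sketch. This is an illustration only, not Sqoop or Oracle: Python's built-in sqlite3 stands in for the JDBC connection, and the table, rows, and batch size are made up. Because each batch is committed on its own, a failure mid-load leaves the earlier batches sitting in the target table.

```python
import sqlite3

# sqlite3 stands in for a JDBC connection to Oracle; the commit
# semantics of batched delivery are the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")

rows = [(i, "row-%d" % i) for i in range(10)]
batch_size = 4  # like statements-per-transaction in a batched export

try:
    for start in range(0, len(rows), batch_size):
        if start >= 8:
            raise IOError("simulated network failure mid-load")
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
        conn.commit()  # each batch is committed independently
except IOError:
    conn.rollback()  # only undoes the in-flight batch, not committed ones

loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(loaded)  # 8: the first two batches survived the failure
```

This is exactly the partial-delivery situation described above: the load fails, yet the target table is neither empty nor complete.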

 

It really boils down to the volume of the data and the way the load is going to happen; say your job runs as a cron job and you do parallel processing with multiple connections to the Oracle database.

 

In general it should work, and much like what we see loading data from an HDFS file to Oracle, it should follow JDBC protocols.

 

One interesting concept that I would like to try is loading data from RDD -> DF -> temporary table in Spark, then pushing the data to the Oracle DB via JDBC. I have done this the other way round with no problem.

 

Like any load, you are effectively doing an ETL from Spark to Oracle. You are better off loading the data into an Oracle staging table first; once it has all gone through and you have checked the job, push the data from the staging table to the main table in Oracle to reduce the risk of failure.
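The staging-table idea can be sketched like this, again with sqlite3 standing in for Oracle and with made-up table names: land everything in staging, validate the load, then promote to the main table in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

expected = [(1, 10.0), (2, 20.5), (3, 7.25)]

# Step 1: land everything in the staging table first.
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", expected)
conn.commit()

# Step 2: check the job, e.g. compare counts against what was sent.
staged = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
assert staged == len(expected), "incomplete load -- do not promote"

# Step 3: promote staging to main in one transaction, then clear staging.
with conn:  # commits on success, rolls back on any exception
    conn.execute("INSERT INTO sales SELECT * FROM staging_sales")
    conn.execute("DELETE FROM staging_sales")

main_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(main_count)  # 3
```

The point of the pattern is that the main table only ever sees a validated, complete load; a failed job leaves junk in staging, not in the table your consumers read.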

 

HTH

 

 

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

 

 

From: Divya Gehlot [mailto:divya.htconex@gmail.com] 
Sent: 21 February 2016 00:09
To: Mich Talebzadeh <mi...@peridale.co.uk>
Cc: user @spark <us...@spark.apache.org>; Russell Jurney <ru...@gmail.com>; Jörn Franke <jo...@gmail.com>
Subject: RE: Spark JDBC connection - data writing success or failure cases

 

Thanks for the input, everyone.
What I am trying to understand is: if I use Oracle to store my data after Spark job processing, and the Spark job fails halfway, what happens then?
Does a rollback happen, or do we have to handle this kind of situation programmatically in the Spark job itself?
How are transactions handled in Spark-to-Oracle storage?
My apologies for such a naive question.
Thanks,
Divya 


RE: Spark JDBC connection - data writing success or failure cases

Posted by Divya Gehlot <di...@gmail.com>.
Thanks for the input, everyone.
What I am trying to understand is: if I use Oracle to store my data after Spark job processing, and the Spark job fails halfway, what happens then?
Does a rollback happen, or do we have to handle this kind of situation programmatically in the Spark job itself?
How are transactions handled in Spark-to-Oracle storage?
My apologies for such a naive question.
Thanks,
Divya


RE: Spark JDBC connection - data writing success or failure cases

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
agreed

 

Dr Mich Talebzadeh

 


 


Re: Spark JDBC connection - data writing success or failure cases

Posted by Russell Jurney <ru...@gmail.com>.
Oracle is a perfectly reasonable endpoint for publishing data processed in Spark. I've got to assume he's using it that way and not as a stand-in for HDFS?

On Friday, February 19, 2016, Jörn Franke <jo...@gmail.com> wrote:


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com relato.io

Re: Spark JDBC connection - data writing success or failure cases

Posted by Jörn Franke <jo...@gmail.com>.
Generally an Oracle DB should not be used as a storage layer for Spark, for performance reasons. You should consider HDFS. This will also help you with fault tolerance.

> On 19 Feb 2016, at 03:35, Divya Gehlot <di...@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: Spark JDBC connection - data writing success or failure cases

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Sorry, where is the source of the data? Are you writing to an Oracle table or reading from one?

 

In general, JDBC messages will tell you about a connection failure halfway through, or any other message received, say, from Oracle via JDBC.
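On the original commit/rollback question: with a plain connection you can wrap the whole write in one transaction yourself, committing on success and rolling back when anything fails. A minimal sketch, using Python's sqlite3 as a stand-in for the JDBC driver (the function name, table, and rows are illustrative):

```python
import sqlite3

def write_all_or_nothing(conn, rows):
    """Insert every row in one transaction: commit if all succeed,
    roll back so the table is untouched if anything fails."""
    try:
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, v TEXT)")

ok = write_all_or_nothing(conn, [(1, "a"), (2, "b")])         # succeeds
bad = write_all_or_nothing(conn, [(3, "c", "extra-column")])  # fails

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(ok, bad, count)  # True False 2
```

Note this only works when a single connection performs the whole write; with Spark writing from many executors in parallel, each connection commits independently, which is why the staging-table pattern discussed elsewhere in this thread is the safer route.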

 

What batch size are you using for this transaction?

 

HTH

 

Dr Mich Talebzadeh

 


 

 

From: Divya Gehlot [mailto:divya.htconex@gmail.com] 
Sent: 19 February 2016 02:36
To: user @spark <us...@spark.apache.org>
Subject: Spark JDBC connection - data writing success or failure cases

 
