Posted to user@spark.apache.org by "tosaiganesh@gmail.com" <to...@gmail.com> on 2016/09/19 19:19:38 UTC

Spark Job not failing

Hi,

I have a primary key on the SQL table that I am trying to insert a DataFrame
into using insertIntoJDBC.

I can see the failure instances in the logs, but the Spark job still finishes
as successful. Do you know how we can handle this in code to make the job
fail?



16/09/19 18:52:51 INFO TaskSetManager: Starting task 0.99 in stage 82.0 (TID
5032, 10.0.0.24, partition 0,PROCESS_LOCAL, 11300 bytes)
16/09/19 18:52:52 INFO TaskSetManager: Lost task 0.99 in stage 82.0 (TID
5032) on executor 10.0.0.24: java.sql.BatchUpdateException (Violation of
PRIMARY KEY constraint 'pk_unique'. Cannot insert duplicate key in object
'table_name'. The duplicate key value is (2016-09-13 04:00, 2016-09-13
04:15, 5816324).) [duplicate 99]
16/09/19 18:52:52 ERROR TaskSetManager: Task 0 in stage 82.0 failed 100
times; aborting job
16/09/19 18:52:52 INFO YarnClusterScheduler: Removed TaskSet 82.0, whose
tasks have all completed, from pool 
16/09/19 18:52:52 INFO YarnClusterScheduler: Cancelling stage 82
16/09/19 18:52:52 INFO DAGScheduler: ResultStage 82 (insertIntoJDBC at
sparkjob.scala:143) failed in 9.440 s
16/09/19 18:52:52 INFO DAGScheduler: Job 19 failed: insertIntoJDBC at
sparkjob.scala:143, took 9.449118 s
16/09/19 18:52:52 INFO ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0
16/09/19 18:52:52 INFO SparkContext: Invoking stop() from shutdown hook
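
For reference, a rough sketch of where I would add the handling, assuming the
SparkException from the aborted job is currently being swallowed somewhere in
the driver (df, jdbcUrl and the table name are placeholders, not the actual
code at sparkjob.scala:143):

import org.apache.spark.SparkException

try {
  // Spark 1.x API shown in the log above; throws when the job aborts
  df.insertIntoJDBC(jdbcUrl, "table_name", overwrite = false)
} catch {
  case e: SparkException =>
    System.err.println(s"Insert failed: ${e.getMessage}")
    // rethrow so the driver exits non-zero and YARN marks the app FAILED
    throw e
}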


Regards,
Sai



-----
Sai Ganesh
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-failing-tp27756.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark Job not failing

Posted by Mich Talebzadeh <mi...@gmail.com>.
I am not sure a commit or rollback by the RDBMS is acknowledged by Spark, so
Spark does not know what happened on the database side. From my recollection
this is a known issue.

An alternative is to save the DataFrame as a CSV file and load it into the
RDBMS using a form of bulk copy.
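
A minimal sketch of that route, assuming Spark 2.x's built-in CSV writer
(earlier versions need the spark-csv package); df and the output path are
placeholders:

// dump the DataFrame to CSV on HDFS, then bulk-load it into the RDBMS
df.write
  .option("header", "true")
  .mode("overwrite")
  .csv("hdfs:///staging/table_name_csv")
// then load with the database's bulk loader, e.g. BULK INSERT or bcp on
// SQL Server, where duplicate-key handling can be controlled explicitly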

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 September 2016 at 21:00, sai ganesh <to...@gmail.com> wrote:

> yes.

Re: Spark Job not failing

Posted by sai ganesh <to...@gmail.com>.
yes.


Regards,
Sai

On Mon, Sep 19, 2016 at 12:29 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> As I understand it, you are inserting into an RDBMS from Spark and the
> insert is failing on the RDBMS side due to a duplicate primary key, but the
> failure is not acknowledged by Spark? Is this correct?

Re: Spark Job not failing

Posted by Mich Talebzadeh <mi...@gmail.com>.
As I understand it, you are inserting into an RDBMS from Spark and the insert
is failing on the RDBMS side due to a duplicate primary key, but the failure
is not acknowledged by Spark? Is this correct?

HTH



Dr Mich Talebzadeh


