Posted to dev@sqoop.apache.org by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org> on 2015/05/11 15:20:00 UTC

[jira] [Comment Edited] (SQOOP-2151) Sqoop2: Sqoop mapreduce job gets into deadlock when loader throws an exception

    [ https://issues.apache.org/jira/browse/SQOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537506#comment-14537506 ] 

Jarek Jarcec Cecho edited comment on SQOOP-2151 at 5/11/15 1:19 PM:
--------------------------------------------------------------------

Thx Gwen,

I was able to recreate the problem repeatedly.

The problem has two parts: one is a race condition, and the other is that the integration tests catch exceptions where the unit tests don't.

So first, the race condition. This is the sequence of events that produces the deadlock (a minimal sketch reproducing this interleaving follows the list):
- Thread 1 - Locks free
- Thread 2 - Exception
- Thread 2 - Releases free
- Thread 1 - Locks free
- Thread 1 - future.get
- Thread 1 - Exception thrown
- Thread 1 - Exception caught
- Thread 1 - close
- Thread 1 - Tries to lock free
- Stuck
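
For illustration, here is a minimal, self-contained sketch that reproduces this interleaving with a one-permit fair {{Semaphore}} standing in for {{free}}. The class, the latch, and the thread orchestration are assumptions made for the example; this is not the actual SqoopOutputFormatLoadExecutor source.

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class FreeSemaphoreDeadlockSketch {

  // One permit, fair ordering -- stands in for the executor's "free" semaphore.
  private static final Semaphore free = new Semaphore(1, true);

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch consumerDone = new CountDownLatch(1);

    // Thread 2 (consumer): fails and releases "free" on its error path, then dies.
    Runnable consumerTask = () -> {
      try {
        throw new RuntimeException("loader failed");    // Thread 2 - Exception
      } finally {
        free.release();                                 // Thread 2 - Releases free
        consumerDone.countDown();
      }
    };
    Future<?> consumerFuture = pool.submit(consumerTask);

    free.acquire();         // Thread 1 - Locks free (first write)
    consumerDone.await();   // pin down the interleaving listed above
    free.acquire();         // Thread 1 - Locks free again, consuming the released permit

    try {
      consumerFuture.get(); // Thread 1 - future.get -> Exception thrown
    } catch (ExecutionException e) {
      // Thread 1 - Exception caught; the fix discussed below belongs right here.
    }

    // close(): Thread 1 - Tries to lock free -> Stuck. No permits remain and the
    // consumer is gone, so this parks forever, matching the jstack in the issue.
    free.acquire();
    System.out.println("unreachable without the fix");
    pool.shutdown();
  }
}
{code}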

The solution is simple: just add the following to {{waitForConsumer()}} when you catch an exception:

{code}
      if (free.availablePermits() == 0) {
        free.release();
      }
{code}
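
Building on the sketch above (same illustrative names, not the actual Sqoop source), the catch block with the guard in place would look like this, and the final {{free.acquire()}} in close() then completes instead of parking:

{code}
    try {
      consumerFuture.get();
    } catch (ExecutionException e) {
      // The consumer released "free" on its error path and that permit has
      // already been consumed by the writer, so put one back; otherwise the
      // close() below blocks forever on free.acquire().
      if (free.availablePermits() == 0) {
        free.release();
      }
      // rethrow or wrap the cause here as appropriate
    }

    free.acquire();   // close(): now succeeds, so the map task can finish
{code}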

I also think we should add the try/catch to the unit test to better align with the integration tests. In short: we catch exceptions while writing and then, no matter what the exception is, we still try to close the writer (see the sketch after this paragraph).
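
A rough sketch of that test-side pattern, assuming a placeholder {{Writer}} interface rather than the actual Sqoop test API: keep writing records, catch whatever the write path throws once the consumer has failed, and always attempt to close the writer, which must not deadlock.

{code}
import java.io.Closeable;
import java.io.IOException;

public class WriteThenCloseSketch {

  // Placeholder for the record writer used in the tests.
  interface Writer extends Closeable {
    void write(Object record) throws IOException;
  }

  static void writeAllAndClose(Writer writer, Object[] records) throws IOException {
    try {
      for (Object record : records) {
        writer.write(record);   // may throw after the consumer thread has died
      }
    } catch (IOException | RuntimeException e) {
      // Expected once the loader fails; assert on it in the test if needed.
    } finally {
      writer.close();           // this is the call that used to hang
    }
  }
}
{code}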

Let me know what you think

Ted Malaska


> Sqoop2: Sqoop mapreduce job gets into deadlock when loader throws an exception
> ------------------------------------------------------------------------------
>
>                 Key: SQOOP-2151
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2151
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.5
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Ted Malaska
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> I'm working on Kite integration tests and I've noticed that there is a certain case where the Sqoop mapreduce job gets into a deadlock.
> I got there by running a Kite job after upgrading to Kite 1.0, but before fixing the temporary dataset problem covered by SQOOP-2150. Here is the log output from the mapper:
> {code}
> 2015-02-28 09:14:50,994 [OutputFormatLoader-consumer] INFO  org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor  - SqoopOutputFormatLoadExecutor consumer thread is starting
> 2015-02-28 09:14:51,021 [OutputFormatLoader-consumer] INFO  org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor  - Running loader class org.apache.sqoop.connector.kite.KiteLoader
> 2015-02-28 09:14:51,025 [main] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Starting progress service
> 2015-02-28 09:14:51,030 [main] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Running extractor class org.apache.sqoop.connector.jdbc.GenericJdbcExtractor
> 2015-02-28 09:14:51,306 [main] INFO  org.apache.sqoop.connector.jdbc.GenericJdbcExtractor  - Using query: SELECT * FROM FROMRDBMSTOKITETEST WHERE 1 <= "id" AND "id" <= 4
> 2015-02-28 09:14:51,627 [OutputFormatLoader-consumer] ERROR org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor  - Error while loading data out of MR job.
> org.kitesdk.data.ValidationException: Dataset name temp_9975e79a-7e5d-493a-b6d4-646f3452a51f is not alphanumeric (plus '_')
> 	at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
> 	at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103)
> 	at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:66)
> 	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
> 	at org.kitesdk.data.Datasets.create(Datasets.java:239)
> 	at org.kitesdk.data.Datasets.create(Datasets.java:307)
> 	at org.kitesdk.data.Datasets.create(Datasets.java:335)
> 	at org.apache.sqoop.connector.kite.KiteDatasetExecutor.createDataset(KiteDatasetExecutor.java:67)
> 	at org.apache.sqoop.connector.kite.KiteLoader.getExecutor(KiteLoader.java:51)
> 	at org.apache.sqoop.connector.kite.KiteLoader.load(KiteLoader.java:61)
> 	at org.apache.sqoop.connector.kite.KiteLoader.load(KiteLoader.java:36)
> 	at org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:250)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-02-28 09:14:51,633 [main] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Stopping progress service
> 2015-02-28 09:14:51,634 [main] INFO  org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor  - SqoopOutputFormatLoadExecutor::SqoopRecordWriter is about to be closed
> {code}
> But the mapper never finished; here is the relevant jstack:
> {code}
> "main" #1 prio=5 os_prio=31 tid=0x00007fedf180a800 nid=0xc07 waiting on condition [0x000000010b3f2000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000127399b50> (a java.util.concurrent.Semaphore$FairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> 	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
> 	at org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$SqoopRecordWriter.close(SqoopOutputFormatLoadExecutor.java:113)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
> 	at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)