You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by daniel voros <da...@gmail.com> on 2018/03/02 15:39:17 UTC

Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/
-----------------------------------------------------------

Review request for Sqoop.


Bugs: SQOOP-3291
    https://issues.apache.org/jira/browse/SQOOP-3291


Repository: sqoop-trunk


Description
-------

Job data is published to listeners (defined via sqoop.job.data.publish.class) in case of Hive and HCat imports. Currently this happens before the Hive import completes, so it gets reported even if Hive import fails.


Diffs
-----

  src/java/org/apache/sqoop/hive/HiveImport.java c272911 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
  src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
  src/java/org/apache/sqoop/tool/ImportTool.java e992005 
  src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 


Diff: https://reviews.apache.org/r/65884/diff/1/


Testing
-------

- created unit test
 - tested on a cluster with Atlas SqoopHook in place


Thanks,

daniel voros


Re: Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

Posted by daniel voros <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/#review198530
-----------------------------------------------------------



Patch #1 changes the following:
 - moves the reporting after Hive import
 - creats isolated classloader for Hive import, since Hive was polluting its classloader that caused trouble in Atlas' SqoopHook
 - changes PublishJobData#publishJobData to accept class name as its first argument instead of unnecessary Configuration object

- daniel voros


On March 2, 2018, 3:39 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65884/
> -----------------------------------------------------------
> 
> (Updated March 2, 2018, 3:39 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3291
>     https://issues.apache.org/jira/browse/SQOOP-3291
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Job data is published to listeners (defined via sqoop.job.data.publish.class) in case of Hive and HCat imports. Currently this happens before the Hive import completes, so it gets reported even if Hive import fails.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
>   src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 
> 
> 
> Diff: https://reviews.apache.org/r/65884/diff/1/
> 
> 
> Testing
> -------
> 
> - created unit test
>  - tested on a cluster with Atlas SqoopHook in place
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Re: Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/#review198933
-----------------------------------------------------------




src/java/org/apache/sqoop/hive/HiveImport.java
Lines 312-316 (patched)
<https://reviews.apache.org/r/65884/#comment279212>

    Hi Dani,
    
    I believe the original design that includes the testMode boolean flag, that determines whether we are in a test or not, is quite flawed. One thing you could consider is removing it and use some kind of mocking instead. 
    
    But, as it's used in 5 tests, this might be an issue with a bigger scope and best captured and solved in a different Jira? (Though that might just get forgotten and lost...)


- Fero Szabo


On March 2, 2018, 3:39 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65884/
> -----------------------------------------------------------
> 
> (Updated March 2, 2018, 3:39 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3291
>     https://issues.apache.org/jira/browse/SQOOP-3291
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Job data is published to listeners (defined via sqoop.job.data.publish.class) in case of Hive and HCat imports. Currently this happens before the Hive import completes, so it gets reported even if Hive import fails.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
>   src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 
> 
> 
> Diff: https://reviews.apache.org/r/65884/diff/1/
> 
> 
> Testing
> -------
> 
> - created unit test
>  - tested on a cluster with Atlas SqoopHook in place
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Re: Review Request 65884: SqoopJobDataPublisher is invoked before Hive/HCat imports succeed

Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65884/#review198887
-----------------------------------------------------------



Hi Daniel,

Nice catch! I took a look at your change and I have couple of comments, please find them below.

Also, PostgresqlExternalTableImportTest third party test fails for me with your patch:

[ERROR - org.apache.sqoop.tool.ImportTool.run(ImportTool.java:653)] Import failed: java.io.IOException: Hive exited with status 1
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:398)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:351)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:248)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:549)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:647)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
	at org.apache.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:227)
	at org.apache.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:242)
	at org.apache.sqoop.manager.postgresql.PostgresqlExternalTableImportTest.doImportAndVerify(PostgresqlExternalTableImportTest.java:251)
	at org.apache.sqoop.manager.postgresql.PostgresqlExternalTableImportTest.testJdbcBasedImport(PostgresqlExternalTableImportTest.java:285)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:38)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)

Could you please take a look?

Thank you,
Bogi


src/java/org/apache/sqoop/hive/HiveImport.java
Lines 312-316 (patched)
<https://reviews.apache.org/r/65884/#comment279159>

    I'm wondering of you could maybe mock an exception instead of introducing this test-only logic in the implementation?



src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java
Lines 115-116 (patched)
<https://reviews.apache.org/r/65884/#comment279137>

    It seema to be like some debugging lines have been left here.



src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java
Lines 173-178 (patched)
<https://reviews.apache.org/r/65884/#comment279138>

    Instead of this try-catch logic could you please use the ExpectedException rule like in TestIncrementalImport for example?


- Boglarka Egyed


On March 2, 2018, 3:39 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65884/
> -----------------------------------------------------------
> 
> (Updated March 2, 2018, 3:39 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3291
>     https://issues.apache.org/jira/browse/SQOOP-3291
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Job data is published to listeners (defined via sqoop.job.data.publish.class) in case of Hive and HCat imports. Currently this happens before the Hive import completes, so it gets reported even if Hive import fails.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/HiveImport.java c272911 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 6529bd2 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java fb5d054 
>   src/java/org/apache/sqoop/mapreduce/PublishJobData.java fc18188 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a0 
> 
> 
> Diff: https://reviews.apache.org/r/65884/diff/1/
> 
> 
> Testing
> -------
> 
> - created unit test
>  - tested on a cluster with Atlas SqoopHook in place
> 
> 
> Thanks,
> 
> daniel voros
> 
>