Posted to dev@sqoop.apache.org by Shuaishuai Nie <sh...@microsoft.com> on 2013/09/11 19:12:32 UTC
Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/
-----------------------------------------------------------
Review request for Sqoop.
Bugs: SQOOP-1192
https://issues.apache.org/jira/browse/SQOOP-1192
Repository: sqoop-trunk
Description
-------
Currently, Sqoop copies the jar files in the %SQOOP_HOME%\lib folder to the job cache every time a Sqoop job is launched. When Oozie launches a Sqoop job, this can be optimized by adding these jars to the Oozie Sqoop share lib instead. The jar files in the share lib then only need to be localized to each worker node once and can be reused by every Sqoop job that Oozie launches, which can substantially reduce disk I/O on the worker nodes. To enable this, Sqoop needs an option that lets a job skip adding the lib jars to the job cache. For now, this option should only be used for Sqoop jobs started by Oozie. The attached patch introduces the "--skip-dist-cache" option to enable this feature.
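The decision described above can be pictured with a minimal Java sketch: with the skip flag off, every jar from the lib folder is appended to the job's jar list; with it on, nothing is added and the jars are assumed to come from the Oozie share lib. This is not the code in the patch. The class DistCacheSketch, the method cacheLibJars, and the /opt/sqoop/lib paths are hypothetical; the job configuration is modeled as a plain Map, and the key "tmpjars" only mirrors the property that Hadoop's -libjars handling populates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the SQOOP-1192 patch: decides whether the jars from
// the Sqoop lib folder are added to the job's jar list.
public class DistCacheSketch {

  // Mirrors the property Hadoop's -libjars handling fills in; here it is
  // only a key in a plain Map standing in for the job configuration.
  static final String TMPJARS = "tmpjars";

  // Appends every *.jar entry of libJars to conf[TMPJARS], keeping anything
  // already there, unless skipDistCache is set.
  static void cacheLibJars(Map<String, String> conf, List<String> libJars,
                           boolean skipDistCache) {
    if (skipDistCache) {
      // Leave the jar list untouched: the jars are expected to be provided by
      // the Oozie share lib, localized once per worker node.
      return;
    }
    List<String> entries = new ArrayList<>();
    String existing = conf.get(TMPJARS);
    if (existing != null && !existing.isEmpty()) {
      entries.add(existing);
    }
    for (String jar : libJars) {
      if (jar.endsWith(".jar")) {  // skip non-jar files in the lib folder
        entries.add(jar);
      }
    }
    if (!entries.isEmpty()) {
      conf.put(TMPJARS, String.join(",", entries));
    }
  }

  public static void main(String[] args) {
    // Hypothetical contents of the lib folder.
    List<String> lib = List.of(
        "/opt/sqoop/lib/a.jar", "/opt/sqoop/lib/b.jar", "/opt/sqoop/lib/README");

    Map<String, String> plain = new HashMap<>();
    cacheLibJars(plain, lib, false);
    System.out.println("default:    " + plain.get(TMPJARS));

    Map<String, String> viaOozie = new HashMap<>();
    cacheLibJars(viaOozie, lib, true);
    System.out.println("skip cache: " + viaOozie.get(TMPJARS));
  }
}
```

Running main prints the comma-separated list of the two jars for the default case and null when the cache is skipped, since nothing is written to the configuration in that case.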
Diffs
-----
src/java/org/apache/sqoop/SqoopOptions.java 01805f9
src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c
src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587
src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857
Diff: https://reviews.apache.org/r/14085/diff/
Testing
-------
Tested the new option with an Oozie-Sqoop workflow to ensure that it does not break Sqoop's library dependencies when the job is launched by Oozie.
Thanks,
Shuaishuai Nie
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/#review26986
-----------------------------------------------------------
Ship it!
Hi sir, please just fix the following two nits and publish the updated patch on the JIRA; I'll be more than happy to commit it!
src/docs/user/import.txt
<https://reviews.apache.org/r/14085/#comment52554>
Nit: please use the Linux notation $SQOOP_HOME instead of the Windows notation %SQOOP_HOME% (the entire guide is written to be compatible with Linux, so it would be great to stay consistent).
src/docs/user/import.txt
<https://reviews.apache.org/r/14085/#comment52553>
Nit: s/massave/massive/
Jarcec
- Jarek Cecho
(In reply to the revision posted by Shuaishuai Nie on Oct. 14, 2013, 9:31 p.m.)
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/#review26987
-----------------------------------------------------------
Ship it!
- Jarek Cecho
(In reply to the revision posted by Shuaishuai Nie on Oct. 14, 2013, 10:20 p.m.)
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Shuaishuai Nie <sh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/
-----------------------------------------------------------
(Updated Oct. 14, 2013, 10:20 p.m.)
Diffs (updated)
-----
src/docs/user/import.txt 71b50d8
src/java/org/apache/sqoop/SqoopOptions.java 01805f9
src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c
src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587
src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857
src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504
Diff: https://reviews.apache.org/r/14085/diff/
Thanks,
Shuaishuai Nie
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Shuaishuai Nie <sh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/
-----------------------------------------------------------
(Updated Oct. 14, 2013, 9:31 p.m.)
Diffs (updated)
-----
src/docs/user/import.txt 71b50d8
src/java/org/apache/sqoop/SqoopOptions.java 01805f9
src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c
src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587
src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857
src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504
Diff: https://reviews.apache.org/r/14085/diff/
Thanks,
Shuaishuai Nie
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Shuaishuai Nie <sh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/
-----------------------------------------------------------
(Updated Oct. 14, 2013, 9:10 p.m.)
Changes
-------
Updated the documentation for the new option --skip-dist-cache
Diffs (updated)
-----
src/docs/user/import.txt 71b50d8
src/java/org/apache/sqoop/SqoopOptions.java 01805f9
src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c
src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587
src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857
src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504
Diff: https://reviews.apache.org/r/14085/diff/
Thanks,
Shuaishuai Nie
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/#review26949
-----------------------------------------------------------
Hi Shuaishuai,
thank you for working on this! The changes look good. I have just one question: would you mind updating the user guide with the new option? You can find the documentation in src/docs/user and build it using "ant docs".
Jarcec
- Jarek Cecho
(In reply to the revision posted by Shuaishuai Nie on Sept. 18, 2013, 6:53 p.m.)
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Shuaishuai Nie <sh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/
-----------------------------------------------------------
(Updated Sept. 18, 2013, 6:53 p.m.)
Diffs (updated)
-----
src/java/org/apache/sqoop/SqoopOptions.java 01805f9
src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c
src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587
src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857
src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504
Diff: https://reviews.apache.org/r/14085/diff/
Thanks,
Shuaishuai Nie
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Shuaishuai Nie <sh...@microsoft.com>.
> On Sept. 11, 2013, 6:01 p.m., Venkat Ranganathan wrote:
> > Thanks for working on this, Shuaishuai Nie. Can you add some unit tests (for example, checking that this option is recognized properly)? Also, the documentation could be updated to describe this feature.
Thanks, Venkat. Sorry for the delay; I have added a unit test for the new option. How can I update the documentation for this feature? Do you mean the "Sqoop User Guide" website?
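Venkat's request, a check that the new option is recognized properly, can be sketched in Java as follows. This is a hypothetical, self-contained stand-in rather than the patch's change to TestSqoopOptions.java: SkipDistCacheOptionSketch, SketchOptions, and parse are illustrative names, and the real Sqoop tools build their options with Apache Commons CLI, which the sketch avoids so that it runs without dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recognize a boolean "--skip-dist-cache" flag and check
// that it is picked up (and defaults to false when absent).
public class SkipDistCacheOptionSketch {

  static final String SKIP_DIST_CACHE_ARG = "--skip-dist-cache";

  // Minimal stand-in for an options holder; not Sqoop's SqoopOptions.
  static final class SketchOptions {
    private boolean skipDistCache = false;  // default: copy lib jars as before

    boolean isSkipDistCache() { return skipDistCache; }

    void setSkipDistCache(boolean value) { skipDistCache = value; }
  }

  // Sets the flag if it is present and returns the arguments it did not consume.
  static List<String> parse(String[] args, SketchOptions opts) {
    List<String> rest = new ArrayList<>();
    for (String arg : args) {
      if (SKIP_DIST_CACHE_ARG.equals(arg)) {
        opts.setSkipDistCache(true);
      } else {
        rest.add(arg);
      }
    }
    return rest;
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    SketchOptions withFlag = new SketchOptions();
    List<String> rest = parse(new String[] {"--table", "t1", SKIP_DIST_CACHE_ARG}, withFlag);
    check(withFlag.isSkipDistCache(), "flag should be recognized");
    check(rest.size() == 2, "other arguments should be left untouched");

    SketchOptions withoutFlag = new SketchOptions();
    parse(new String[] {"--table", "t1"}, withoutFlag);
    check(!withoutFlag.isSkipDistCache(), "flag should default to false");

    System.out.println("all checks passed");
  }
}
```

Running main prints "all checks passed"; a real test would express the same two cases (flag present, flag absent) as test-framework assertions.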
- Shuaishuai
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/#review26039
-----------------------------------------------------------
(In reply to the revision posted by Shuaishuai Nie on Sept. 18, 2013, 6:53 p.m.)
Re: Review Request 14085: Review request for SQOOP-1192 Add option
"--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib
folder when launched by Oozie and use Oozie share lib
Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14085/#review26039
-----------------------------------------------------------
Thanks for working on this, Shuaishuai Nie. Can you add some unit tests (for example, checking that this option is recognized properly)? Also, the documentation could be updated to describe this feature.
- Venkat Ranganathan
(In reply to the original request posted by Shuaishuai Nie on Sept. 11, 2013, 5:12 p.m.)