You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Venkat Ranganathan <n....@live.com> on 2013/04/14 06:58:11 UTC

Re: Review Request: Request to review patch for SQOOP-954: Create Sqoop runtime scripts to run Sqoop on Windows


> On March 29, 2013, 7:03 p.m., Venkat Ranganathan wrote:
> > Hi Ahmed
> > 
> > Thanks for the new patch.  It looks good.  I still have one issue and suggestion.  The powershell script to generate the jar file is very good!  You are generating a jar file everytime and the jar file is generated under SQOOP_HOME.   There may be installations for the SQOOP_HOME may not be writable by user.   Also, I think the main motivation is to overcome the environment strings limitation.   Since JDK 1.6, Java has the ability to provide an option to provide a shortcut for all jars in a file (This probably should be done for the Unix classpaths also).   Please see http://docs.oracle.com/javase/6/docs/technotes/tools/windows/classpath.html  
> > 
> > I am thinking whether this should be a simpler change to just add all jars in SQOOP_LIB.  We have to say %SQOOP_HOME%\lib\*.   Of course, this introduces dependency on 1.6+ versions of JDK, but given that 1.5 is EOLed this should be OK
> > 
> > Thanks
> 
> Ahmed El Baz wrote:
>     Thank you a lot Venkat for the valuable comments,
>     
>     I have considered the wildcard option, however, there are some limitations why it was not preferable to go this route, and using the referencing jar would give more flexibility:
>     1) The need to specify particular jars to include, or exclude some jars and not include all jars by default in a dorectory by using wildcard. For example, in configure-sqoop a list of dependency jars for HBase are returned by invoking "hbase classpath" which returns a list of jars. In this case using a wrapper Jar releases us from worrying about the length of jars returned, and it is not possible to use the * in this case, unless we do some logic to get common dirs.
>     2) As you can see also in configure-jar, Sqoop has dependency on other components rather than just SQOOP_HOME\lib, like HBase, SQOOP_CONF, ZOOCFGDIR.
>     3) Using the wrapper jar would scale regardless of how many directories we include. I understand it is hard the number of folders increases to the limit where we see the long command error, but even in this case the wrapper jar would work just fine.
>     
>     I would like to unederstand more about scenarios where we anticipate SQOOP_HOME would not be writable on Windows systems.
>     
>     Thank you again,
>     Ahmed

Thanks Ahmed for the explanation.

I thought we are primarily limited by the 8K limit in the command line so if we can potentially limit the large jar file dirs in this format, then it would be fit within the limit.
Good point of hbase -classpath option.  May be we can have improvement on Hbase to return the hbase classpath with jar dirs properly added

For example, when people install Hadoop on Windows and decide that Hadoop stack will be installed under a terminal server and this is shared across multiple users - or it  may be installed in a common location and mapped based on logon scripts.   And the directory can become inaccessible for people running sqoop jobs.   This is a scheme used by some  Hadoop distributions today.

Thanks


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10055/#review18523
-----------------------------------------------------------


On March 29, 2013, 3:01 a.m., Ahmed El Baz wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10055/
> -----------------------------------------------------------
> 
> (Updated March 29, 2013, 3:01 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> A patch implementing the Windows version of Sqoop run scripts. The scripts follow the same logic as there .sh counterparts.
> One difference is to create a Jar which references all classpath elements in its Manifest, and provide that jar as the single jar needed for Sqoop. The reason here is that in some cases if the number of classpath elements is large, HADOOP_CLASSPATH gets very long which causes failures in Windows since there is a limit to command lines.
> As a workaround, I added a step to wrap all jars in the classpath in a single jar, and then use that generated jar (this is also done in hadoop for Windows to handle similar issues)
> I did this in a utility script "BuildJar" which can be used for other components as well.
> This change is specific to Windows scripts, Linux scripts are not affected.
> 
> 
> This addresses bug SQOOP-954.
>     https://issues.apache.org/jira/browse/SQOOP-954
> 
> 
> Diffs
> -----
> 
>   bin/BuildJar.ps1 PRE-CREATION 
>   bin/configure-sqoop.cmd PRE-CREATION 
>   bin/sqoop.cmd PRE-CREATION 
>   conf/sqoop-env-template.cmd PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10055/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Ahmed El Baz
> 
>