You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Colin Ma <ju...@intel.com> on 2015/09/22 03:30:08 UTC

Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/
-----------------------------------------------------------

Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Update script for Jetty server


Diffs
-----

  dist/src/main/bin/sqoop-sys.sh 64dd0bf 
  dist/src/main/bin/sqoop.sh 707c3fc 

Diff: https://reviews.apache.org/r/38589/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Colin Ma <ju...@intel.com>.

> On Sept. 24, 2015, 10:17 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, lines 28-31
> > <https://reviews.apache.org/r/38589/diff/1/?file=1079624#file1079624line28>
> >
> >     I believe that with Hadoop 2 and various distributions such as BigTop, it's very likely that users will end up with different file structure then one single HADOOP_HOME. Hence I have few suggestions:
> >     
> >     1) Instead of using "our" own *_LIB directories, what about using *_HOME that are actualy referenced in the documentation (HADOOP_YARN_HOME for example):
> >     
> >     http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
> >     
> >     2) Rather then re-defining those properties from HADOOP_HOME, can we allow user to keep the original content if it exists. There is a bash expression for that, for example:
> >     
> >     CATALINA_BIN=${CATALINA_BIN:-${BASEDIR}/server/bin}
> >     
> >     Will either re-use the CATALINA_BIN if it was defined or use a "default" value of $BASEDIR/server/bin. E.g. user can set all those various locations for our dependencies and we are not insisting on the "right" directory structure.
> >     
> >     Jarcec

Thanks for the comments.
I update the patch and do the following changes:
1. remove the check for HADOOP_HOME, if user has the classpath problem when start the server, he can check the sqoop's document.
2. Update the environment variables from *_LIB* to *_HOME*, and supposed the necessary will be located in, eg, HADOOP_COMMON_HOME/*.jar, HADOOP_COMMON_HOME/*.jar.
3. For the environment variables, allow user to keep the original content if it exists. If no exist, the value will be, eg, ${HADOOP_HOME}/share/hadoop/common.
4. Add external jar for HIVE_HOME and exclude the jdbc jar for derby to avoid the sealing violation exception.

For the detail, please check the new patch, thanks.


- Colin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review100481
-----------------------------------------------------------


On Sept. 25, 2015, 7:10 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 25, 2015, 7:10 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review100481
-----------------------------------------------------------



dist/src/main/bin/sqoop.sh (lines 28 - 31)
<https://reviews.apache.org/r/38589/#comment157687>

    I believe that with Hadoop 2 and various distributions such as BigTop, it's very likely that users will end up with different file structure then one single HADOOP_HOME. Hence I have few suggestions:
    
    1) Instead of using "our" own *_LIB directories, what about using *_HOME that are actualy referenced in the documentation (HADOOP_YARN_HOME for example):
    
    http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
    
    2) Rather then re-defining those properties from HADOOP_HOME, can we allow user to keep the original content if it exists. There is a bash expression for that, for example:
    
    CATALINA_BIN=${CATALINA_BIN:-${BASEDIR}/server/bin}
    
    Will either re-use the CATALINA_BIN if it was defined or use a "default" value of $BASEDIR/server/bin. E.g. user can set all those various locations for our dependencies and we are not insisting on the "right" directory structure.
    
    Jarcec


- Jarek Cecho


On Sept. 22, 2015, 1:30 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2015, 1:30 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Colin Ma <ju...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/
-----------------------------------------------------------

(Updated Sept. 30, 2015, 1:42 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Update script for Jetty server


Diffs (updated)
-----

  dist/src/main/bin/sqoop-sys.sh 64dd0bf 
  dist/src/main/bin/sqoop.sh 707c3fc 

Diff: https://reviews.apache.org/r/38589/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review100960
-----------------------------------------------------------



dist/src/main/bin/sqoop.sh (lines 29 - 33)
<https://reviews.apache.org/r/38589/#comment158248>

    Wouldn't it make more sense to make this check after we define the variables as we're setting up defaults there? 
    
    Also rather then verifying that those variables are non empty, would it make sense to verify that those variables describes valid directories (e.g. -d instead of -n).


- Jarek Cecho


On Sept. 29, 2015, 8:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 8:18 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Colin Ma <ju...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/
-----------------------------------------------------------

(Updated Sept. 29, 2015, 8:18 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Update script for Jetty server


Diffs (updated)
-----

  dist/src/main/bin/sqoop-sys.sh 64dd0bf 
  dist/src/main/bin/sqoop.sh 707c3fc 

Diff: https://reviews.apache.org/r/38589/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Colin Ma <ju...@intel.com>.

> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > My last round of testing on real cluster:

hi Jareck,
   Thanks for your review, I updated the following things in the latest patch:
   1. Load the jar located in the SQOOP_SERVER_EXTRA_LIB.
   2. Add the environment check for the Hadoop related jar:  (HADOOP_COMMON_HOME && HADOOP_HDFS_HOME && HADOOP_MAPRED_HOME && HADOOP_YARN_HOME) || HADOOP_HOME
   3. Check if the server starts successfully, if not,  "Sqoop2 server started." won't be displayed.
   4. Fix a bug in function is_sqoop_server_running.
   5. Add validation for java command.
   6. Remove the pid file when stopping the server.


> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, lines 77-82
> > <https://reviews.apache.org/r/38589/diff/2/?file=1084669#file1084669line77>
> >
> >     From some reason on my cluster this method was returning "0" if the file didn't existed at all. I had to add a "return 1" at the end of the method to workaround it.

This should be a bug, updated.


> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, line 194
> > <https://reviews.apache.org/r/38589/diff/2/?file=1084669#file1084669line194>
> >
> >     This one I'm not clear about, but I'm looking for your thoughts. Should we also remove the the pid file when stopping the server?

Good suggestion, to get the server status, we can check pid file exist or not, if exist, validate the pid.


> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, line 150
> > <https://reviews.apache.org/r/38589/diff/2/?file=1084669#file1084669line150>
> >
> >     Can we add somewhere check that EXEC_JAVA is a valid command?
> >     
> >     I got this output of my cluster when "java" is not available on the path:
> >     
> >     [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# ./bin/sqoop2-server start
> >     Sqoop home directory: /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
> >     PID: /tmp/sqoop-root-jetty-server.pid
> >     Starting the Sqoop2 server...
> >     Sqoop2 server started.
> >     [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200/bin/sqoop.sh: line 167: java: command not found
> >     echo $?
> >     0
> >     
> >     Seems very deceiving as the scripts reports "0" (e.g. success) and prints out "Sqoop2 server started" even though that the server didn't started at all.

Add the validation for java command.


> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, lines 28-31
> > <https://reviews.apache.org/r/38589/diff/2/?file=1084669#file1084669line28>
> >
> >     Thank you for incorporating my suggestion here!
> >     
> >     Might I suggest to be more defensive and verify that those variables are defined? I've tried running without them and I got this output:
> >     
> >     Sqoop home directory: /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
> >     PID: /tmp/sqoop-root-jetty-server.pid
> >     Starting the Sqoop2 server...
> >     Sqoop2 server started.
> >     [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# 0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
> >     33   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
> >     Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> >     	at org.apache.sqoop.security.authentication.SimpleAuthenticationHandler.secureLogin(SimpleAuthenticationHandler.java:36)
> >     	at org.apache.sqoop.security.AuthenticationManager.initialize(AuthenticationManager.java:98)
> >     	at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:54)
> >     	at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:50)
> >     	at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:121)
> >     Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
> >     	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >     	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >     	at java.security.AccessController.doPrivileged(Native Method)
> >     	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >     	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >     	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >     	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >     	... 5 more
> >         
> >     Which is kind of confusing - the scripts exists with success even though that we immediately fail.

Add 2 checks: 1. for the environment variables, Hadoop lib related should be set.  2. Start the server, wait 5 seconds and check the result, if there has some exceptions, "Sqoop2 server started." won't be displayed.


> On Sept. 28, 2015, 4:29 p.m., Jarek Cecho wrote:
> > dist/src/main/bin/sqoop.sh, line 27
> > <https://reviews.apache.org/r/38589/diff/2/?file=1084669#file1084669line27>
> >
> >     In the old Tomcat world [1] we had default of SQOOP_HOME/lib (masked as ${catalina.home}/../lib/*.jar) that allowed user to create lib/ directory, put random jars (mostly JDBC drivers) and inject them easily to Sqoop 2 server classpath. I'm wondering whether we can provide similar functionality? Let's perhaps define variable SQOOP_SERVER_EXTRA_LIB that defaults to "lib/" that enables admin to easily inject the jars?
> >     
> >     I can see the variable SQOOP_SERVER_LIB pointing to our own directory, but I would prefer user not to mess with our own directories and have a dedicated directory for "inserted" jars. It's better for upgrade as you can simply point the new version to the same "extra" directory without need to see what jars belongs to Sqoop and what were added and re-add them to the new version.
> >     
> >     Links:
> >     1: https://github.com/apache/sqoop/blob/sqoop2/dist/src/main/server/conf/catalina.properties

Good suggestion, add the SQOOP_SERVER_EXTRA_LIB to the script.


- Colin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review100815
-----------------------------------------------------------


On Sept. 29, 2015, 8:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 8:18 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review100815
-----------------------------------------------------------


My last round of testing on real cluster:


dist/src/main/bin/sqoop.sh (line 27)
<https://reviews.apache.org/r/38589/#comment158068>

    In the old Tomcat world [1] we had default of SQOOP_HOME/lib (masked as ${catalina.home}/../lib/*.jar) that allowed user to create lib/ directory, put random jars (mostly JDBC drivers) and inject them easily to Sqoop 2 server classpath. I'm wondering whether we can provide similar functionality? Let's perhaps define variable SQOOP_SERVER_EXTRA_LIB that defaults to "lib/" that enables admin to easily inject the jars?
    
    I can see the variable SQOOP_SERVER_LIB pointing to our own directory, but I would prefer user not to mess with our own directories and have a dedicated directory for "inserted" jars. It's better for upgrade as you can simply point the new version to the same "extra" directory without need to see what jars belongs to Sqoop and what were added and re-add them to the new version.
    
    Links:
    1: https://github.com/apache/sqoop/blob/sqoop2/dist/src/main/server/conf/catalina.properties



dist/src/main/bin/sqoop.sh (lines 28 - 31)
<https://reviews.apache.org/r/38589/#comment158066>

    Thank you for incorporating my suggestion here!
    
    Might I suggest to be more defensive and verify that those variables are defined? I've tried running without them and I got this output:
    
    Sqoop home directory: /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
    PID: /tmp/sqoop-root-jetty-server.pid
    Starting the Sqoop2 server...
    Sqoop2 server started.
    [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# 0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
    33   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    	at org.apache.sqoop.security.authentication.SimpleAuthenticationHandler.secureLogin(SimpleAuthenticationHandler.java:36)
    	at org.apache.sqoop.security.AuthenticationManager.initialize(AuthenticationManager.java:98)
    	at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:54)
    	at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:50)
    	at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:121)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    	... 5 more
        
    Which is kind of confusing - the scripts exists with success even though that we immediately fail.



dist/src/main/bin/sqoop.sh (lines 77 - 82)
<https://reviews.apache.org/r/38589/#comment158061>

    From some reason on my cluster this method was returning "0" if the file didn't existed at all. I had to add a "return 1" at the end of the method to workaround it.



dist/src/main/bin/sqoop.sh (line 141)
<https://reviews.apache.org/r/38589/#comment158062>

    Can we add somewhere check that EXEC_JAVA is a valid command?
    
    I got this output of my cluster when "java" is not available on the path:
    
    [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# ./bin/sqoop2-server start
    Sqoop home directory: /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
    PID: /tmp/sqoop-root-jetty-server.pid
    Starting the Sqoop2 server...
    Sqoop2 server started.
    [root@streamsets-1 sqoop-2.0.0-SNAPSHOT-bin-hadoop200]# /root/sqoop-2.0.0-SNAPSHOT-bin-hadoop200/bin/sqoop.sh: line 167: java: command not found
    echo $?
    0
    
    Seems very deceiving as the scripts reports "0" (e.g. success) and prints out "Sqoop2 server started" even though that the server didn't started at all.



dist/src/main/bin/sqoop.sh (line 183)
<https://reviews.apache.org/r/38589/#comment158067>

    This one I'm not clear about, but I'm looking for your thoughts. Should we also remove the the pid file when stopping the server?


Jarcec

- Jarek Cecho


On Sept. 25, 2015, 7:10 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 25, 2015, 7:10 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Colin Ma <ju...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/
-----------------------------------------------------------

(Updated Sept. 25, 2015, 7:10 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Update script for Jetty server


Diffs (updated)
-----

  dist/src/main/bin/sqoop-sys.sh 64dd0bf 
  dist/src/main/bin/sqoop.sh 707c3fc 

Diff: https://reviews.apache.org/r/38589/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 38589: SQOOP-2563: Sqoop2: Jetty: Update script for Jetty server

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38589/#review99992
-----------------------------------------------------------

Ship it!


Reading the code it seems reasonable to me - I did not try running it yet though.

- Jarek Cecho


On Sept. 22, 2015, 1:30 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38589/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2015, 1:30 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Update script for Jetty server
> 
> 
> Diffs
> -----
> 
>   dist/src/main/bin/sqoop-sys.sh 64dd0bf 
>   dist/src/main/bin/sqoop.sh 707c3fc 
> 
> Diff: https://reviews.apache.org/r/38589/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>