Posted to dev@nutch.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2010/08/10 19:03:21 UTC

Hsqldb 2.0 conflicts with Hsqldb 1.8 in Hadoop

Hi,

I was trying to run Benchmark in trunk using MySQL, on a standalone 
Hadoop cluster. My conf/gora.properties has this:

gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?user=nutch&password=nutch

Jobs were failing though, with the following:

Exception in thread "main" java.lang.NoSuchMethodError: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties;
         at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
         at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
         at java.sql.DriverManager.getConnection(DriverManager.java:582)
         at java.sql.DriverManager.getConnection(DriverManager.java:207)
         at org.gora.sql.store.SqlStore.getConnection(SqlStore.java:712)
         at org.gora.sql.store.SqlStore.initialize(SqlStore.java:145)
         at org.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:64)
         at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:86)
         at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:98)
         at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:70)
         at org.apache.nutch.storage.StorageUtils.createDataStore(StorageUtils.java:25)
         at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:68)
         at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:237)
         at org.apache.nutch.tools.Benchmark.benchmark(Benchmark.java:190)
         at org.apache.nutch.tools.Benchmark.run(Benchmark.java:139)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
         at org.apache.nutch.tools.Benchmark.main(Benchmark.java:32)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


Isn't this puzzling... It turns out that java.sql.DriverManager will try
_all_ registered drivers in turn to see which one can handle the jdbcUrl.
The usual magic of Class.forName(jdbcDriver) doesn't mean we are going to
use jdbcDriver; it only makes sure the driver class was loaded and
registered itself on the list of available drivers.
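
A minimal sketch of that lookup behaviour, using two stub drivers rather
than HSQLDB or MySQL. Everything here (DriverOrderDemo, StubDriver, the
jdbc:demo-a: and jdbc:demo-b: URL prefixes) is hypothetical and only for
illustration. It assumes no other JDBC driver on the classpath interferes.
The first stub can be made to throw NoSuchMethodError from connect(), which
stands in for a driver whose classes come from two different versions.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical demo class; none of these names come from Nutch, Gora or HSQLDB.
final class DriverOrderDemo {
    // Records which stub drivers DriverManager asked, in order.
    static final List<String> asked = new ArrayList<>();

    static final class StubDriver implements Driver {
        private final String name;
        private final String prefix;
        private final boolean broken;

        StubDriver(String name, String prefix, boolean broken) {
            this.name = name;
            this.prefix = prefix;
            this.broken = broken;
        }

        @Override public Connection connect(String url, Properties info) {
            asked.add(name);
            if (broken) {
                // Stands in for a driver that fails with a linkage error.
                throw new NoSuchMethodError("simulated version mismatch in " + name);
            }
            if (!acceptsURL(url)) {
                return null; // JDBC contract: null means "not my kind of URL"
            }
            // Do-nothing Connection stub; no method is ever called on it here.
            return (Connection) Proxy.newProxyInstance(
                    DriverOrderDemo.class.getClassLoader(),
                    new Class<?>[] {Connection.class},
                    (proxy, method, args) -> null);
        }

        @Override public boolean acceptsURL(String url) {
            return url != null && url.startsWith(prefix);
        }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // Registers "first" then "second", asks DriverManager for a connection,
    // and returns the order in which the stubs were consulted.
    static List<String> probe(boolean firstBroken, String url) {
        asked.clear();
        Driver first = new StubDriver("first", "jdbc:demo-a:", firstBroken);
        Driver second = new StubDriver("second", "jdbc:demo-b:", false);
        try {
            DriverManager.registerDriver(first);
            DriverManager.registerDriver(second);
            try {
                DriverManager.getConnection(url);
            } catch (LinkageError e) {
                // DriverManager only catches SQLException, so an Error escapes.
                asked.add("error: " + e.getClass().getSimpleName());
            } finally {
                DriverManager.deregisterDriver(first);
                DriverManager.deregisterDriver(second);
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(asked);
    }

    public static void main(String[] args) {
        System.out.println(probe(false, "jdbc:demo-b:x"));
        System.out.println(probe(true, "jdbc:demo-b:x"));
    }
}
```

With a healthy first driver, both stubs are consulted and the second one
serves the URL. With the broken one, the lookup should stop at the first
stub even though the URL belongs to the second.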

Now, I know why this particular error occurred: Hadoop includes HSQLDB
1.8, and we use HSQLDB 2.0. When DriverManager tries each driver in
turn, HSQLDB unfortunately comes first on the classpath (it ships in
Hadoop/lib) and MySQL comes last, so the job bombs out before the
right driver is even tried...
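
One way to check which jar actually supplies a class is to ask for its
code source. The sketch below is a hypothetical helper (ClassOrigin and
whereIs are made-up names). The class names in main() are taken from the
stack trace and from gora.properties above. It assumes the helper runs
with the same classpath as the failing job. If the two HSQLDB classes
report different jars, that would be consistent with a 1.8/2.0 mix.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch, Gora or Hadoop.
final class ClassOrigin {

    // Returns the location (usually a jar URL) a class was loaded from.
    static String whereIs(String className) {
        try {
            // initialize=false: locate the class without running static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "no code source (JDK class?)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.hsqldb.jdbc.JDBCDriver",
            "org.hsqldb.DatabaseURL",
            "com.mysql.jdbc.Driver"
        };
        for (String name : names) {
            System.out.println(name + " <- " + whereIs(name));
        }
    }
}
```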

For now I changed my build.xml to this:

Index: build.xml
===================================================================
--- build.xml   (revision 983564)
+++ build.xml   (working copy)
@@ -123,7 +123,7 @@
                    excludes="nutch-default.xml,nutch-site.xml"/>
        <zipfileset dir="${conf.dir}" excludes="*.template,hadoop*.*"/>
        <zipfileset dir="${build.lib.dir}" prefix="lib"
-                  includes="**/*.jar" excludes="hadoop-*.jar"/>
+                  includes="**/*.jar" excludes="hadoop-*.jar,hsqldb*.jar"/>
        <zipfileset dir="${build.plugins}" prefix="plugins"/>
      </jar>
    </target>



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Hsqldb 2.0 conflicts with Hsqldb 1.8 in Hadoop

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Andrzej,

Interesting, and good sleuth work:


> 
> For now I changed my build.xml to this:
> 
> [...] 

Is it worth making that change permanent?

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++