Posted to hdfs-user@hadoop.apache.org by "Kartashov, Andy" <An...@mpac.ca> on 2012/10/23 21:17:01 UTC

Sqoop 1.4.1-cdh4.0.1 is not running in Hadoop 2.0.0-cdh4.1.1

Guys, I've tried for hours to resolve this error.

I am trying to import a table into Hadoop using Sqoop.

ERROR is:
Error: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties


I realise there is an issue with the versions of the hsqldb jar files: the error above is just the signature of a method that is missing at runtime, i.e. a NoSuchMethodError caused by a jar-version mismatch.

At first, Sqoop was spitting out the above error, until I realised that my /usr/lib/sqoop/lib folder had both versions, hsqldb-1.8.0.10.jar and plain hsqldb.jar (2.0, I suppose), and sqoop-conf was picking up the first (wrong) jar.

When I moved hsqldb-1.8.0.10.jar away, Sqoop stopped complaining, but then Hadoop began spitting out the same error. No matter what I tried, I could not get Hadoop to pick up the right jar.

I tried setting:

    export HADOOP_CLASSPATH="/usr/lib/sqoop/lib/hsqldb.jar"

and then:

    export HADOOP_USER_CLASSPATH_FIRST=true

without luck.
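
A quick way to check which hsqldb jar actually wins at runtime is to ask the JVM where it loaded the class from. A minimal sketch (WhichJar is just a hypothetical name; it has to be compiled and run with the same classpath the hadoop command builds):

    public class WhichJar {
        public static void main(String[] args) {
            // A NoSuchMethodError on parseURL(String, boolean, boolean) means the
            // JVM resolved org.hsqldb.DatabaseURL from the older 1.8 jar. This
            // prints the jar the class actually came from.
            System.out.println(org.hsqldb.DatabaseURL.class
                    .getProtectionDomain().getCodeSource().getLocation());
        }
    }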

Please help.

Thanks,
AK


From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
Sent: Tuesday, October 23, 2012 2:41 PM
To: user@hadoop.apache.org
Subject: Re: zlib does not uncompress gzip during MR run

Just to follow up on my own question...

I believe the problem is caused by the input split during the MR job. So my real question is: how do I handle input splits when the input is gzipped?

Is it even possible to have splits of a gzipped file?
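
From what I can tell, a plain gzip file is not splittable, so the whole file goes to a single mapper. A minimal sketch of the check an input format applies, mirroring what TextInputFormat.isSplitable() does (the input path here is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class SplitCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // The codec is picked by file extension (".gz" -> GzipCodec).
            CompressionCodec codec = new CompressionCodecFactory(conf)
                    .getCodec(new Path("/data/input.gz"));
            // Uncompressed files and splittable codecs (e.g. bzip2) can be split;
            // GzipCodec does not implement SplittableCompressionCodec.
            boolean splittable = codec == null
                    || codec instanceof SplittableCompressionCodec;
            System.out.println("splittable = " + splittable);
        }
    }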

Thanks,

Jon
On Tue, Oct 23, 2012 at 11:10 AM, Jonathan Bishop <jb...@gmail.com> wrote:
Hi,

My input files are gzipped, and I am using the built-in Java codecs successfully to uncompress them in a normal Java run:

        // fsplit is the FileSplit being read; config is the job Configuration.
        fileIn = fs.open(fsplit.getPath());
        // The codec is chosen from the file extension (".gz" -> GzipCodec).
        codec = compressionCodecs.getCodec(fsplit.getPath());
        // Wrap in a decompressing stream only when a codec was recognised.
        in = new LineReader(codec != null ? codec.createInputStream(fileIn) : fileIn, config);

But when I use the same piece of code in an MR job, I get:


12/10/23 11:02:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 11:02:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/23 11:02:25 INFO compress.CodecPool: Got brand-new compressor
12/10/23 11:02:26 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/10/23 11:02:26 INFO input.FileInputFormat: Total input paths to process : 3
12/10/23 11:02:27 INFO mapred.JobClient: Running job: job_201210221549_0014
12/10/23 11:02:28 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 11:02:49 INFO mapred.JobClient: Task Id : attempt_201210221549_0014_m_000003_0, Status : FAILED
java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)
So I am thinking there is some incompatibility between zlib and my gzip files. Is there a way to force Hadoop to use the built-in Java compression codecs?
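
One setting I have seen suggested for this, which I have not verified on every version, is io.native.lib.available: ZlibFactory appears to consult it before choosing the native implementation, falling back to the pure-Java gzip/zlib code when it is false. A minimal sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.zlib.ZlibFactory;

    public class ForceJavaZlib {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Assumption: with this set to false, ZlibFactory hands out the
            // built-in Java implementation instead of the native one.
            conf.setBoolean("io.native.lib.available", false);
            System.out.println("native zlib in use: "
                    + ZlibFactory.isNativeZlibLoaded(conf));
        }
    }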

Also, I would like to try LZO, which I hope will allow splitting of the input files (I recall reading this somewhere). Can someone point me to the best way to do this?
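
From what I recall, plain .lzo files only become splittable after an index is built next to them, and both the codec and the input format come from the third-party hadoop-lzo project rather than core Hadoop. A sketch of the job setup, with class names taken from that project (treat them as assumptions to check against the version you install):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class LzoJobSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "lzo-input sketch");
            // From hadoop-lzo, not core Hadoop: reads .lzo files and honours the
            // .lzo.index files produced by its DistributedLzoIndexer tool, which
            // is what makes the inputs splittable.
            job.setInputFormatClass(com.hadoop.mapreduce.LzoTextInputFormat.class);
        }
    }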

Thanks,

Jon


Re: Sqoop 1.4.1-cdh4.0.1 is not running in Hadoop 2.0.0-cdh4.1.1

Posted by Arun C Murthy <ac...@hortonworks.com>.
Please ask CDH questions on CDH lists. 

On Oct 23, 2012, at 3:17 PM, Kartashov, Andy wrote:

> Guys, I've tried for hours to resolve this error.
>
> I am trying to import a table into Hadoop using Sqoop.
>
> ERROR is:
> Error: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


