You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by Matthieu Labour <ma...@actionx.com> on 2013/03/15 22:16:45 UTC

Tips on finding the file that is causing sqoop to choke

Sometimes when processing lots of logs I get the following errors. Please
do you have any tips on where/ how i can find out the file / line that is
causing sqoop to choke?
Thank you

2013-03-15 21:01:53,619 INFO org.apache.hadoop.util.NativeCodeLoader
(main): Loaded the native-hadoop library
2013-03-15 21:01:54,301 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi
already exists!
2013-03-15 21:01:54,562 INFO org.apache.hadoop.mapred.MapTask (main): Host
name: domU-12-31-39-02-61-87.compute-1.internal
2013-03-15 21:01:54,618 INFO org.apache.hadoop.util.ProcessTree (main):
setsid exited with exit code 0
2013-03-15 21:01:54,651 INFO org.apache.hadoop.mapred.Task (main):  Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b7ae22
2013-03-15 21:01:55,123 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader
(main): Loaded native gpl library
2013-03-15 21:01:55,167 WARN com.hadoop.compression.lzo.LzoCodec (main):
Could not find build properties file with revision hash
2013-03-15 21:01:55,167 INFO com.hadoop.compression.lzo.LzoCodec (main):
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
UNKNOWN]
2013-03-15 21:01:55,178 WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native
library is available
2013-03-15 21:01:55,178 INFO
org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native
library loaded
2013-03-15 21:01:59,652 DEBUG
org.apache.sqoop.mapreduce.AsyncSqlOutputFormat (Thread-13): Committing
transaction of 100 statements
2013-03-15 21:01:59,874 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
(main): Instructing auto-progress thread to quit.
2013-03-15 21:01:59,875 INFO org.apache.sqoop.mapreduce.AutoProgressMapper
(Thread-14): Auto-progress thread is finished. keepGoing=false
2013-03-15 21:01:59,875 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
(main): Waiting for progress thread shutdown...
2013-03-15 21:01:59,875 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
(main): Progress thread shutdown detected.
2013-03-15 21:01:59,878 INFO org.apache.hadoop.mapred.TaskLogsTruncater
(main): Initializing logs' truncater with mapRetainSize=-1 and
reduceRetainSize=-1
2013-03-15 21:01:59,958 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
Initialized cache for UID to User mapping with a cache timeout of 14400
seconds.
2013-03-15 21:01:59,958 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
Got UserName hadoop for UID 106 from the native implementation
2013-03-15 21:01:59,961 WARN org.apache.hadoop.mapred.Child (main): Error
running child
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at
ml_ys_log_gmt_daily_experiment_2.__loadFromFields(ml_ys_log_gmt_daily_experiment_2.java:1290)
at
ml_ys_log_gmt_daily_experiment_2.parse(ml_ys_log_gmt_daily_experiment_2.java:1144)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-03-15 21:01:59,976 INFO org.apache.hadoop.mapred.Task (main): Runnning
cleanup for the task

Re: Tips on finding the file that is causing sqoop to choke

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Matthieu,
this looks to me as a corrupted data on first sight. It seems that Sqoop is trying to access more columns that are on the input line. We've significantly improved error logging for this kind of exceptions, so it would be great if you could upgrade to latest Sqoop 1.4.3 [1] and rerun your failing job.

Jarcec

Links:
1: http://www.apache.org/dist/sqoop/1.4.3/

On Fri, Mar 15, 2013 at 05:16:45PM -0400, Matthieu Labour wrote:
> Sometimes when processing lots of logs I get the following errors. Please
> do you have any tips on where/ how i can find out the file / line that is
> causing sqoop to choke?
> Thank you
> 
> 2013-03-15 21:01:53,619 INFO org.apache.hadoop.util.NativeCodeLoader
> (main): Loaded the native-hadoop library
> 2013-03-15 21:01:54,301 WARN
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi
> already exists!
> 2013-03-15 21:01:54,562 INFO org.apache.hadoop.mapred.MapTask (main): Host
> name: domU-12-31-39-02-61-87.compute-1.internal
> 2013-03-15 21:01:54,618 INFO org.apache.hadoop.util.ProcessTree (main):
> setsid exited with exit code 0
> 2013-03-15 21:01:54,651 INFO org.apache.hadoop.mapred.Task (main):  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b7ae22
> 2013-03-15 21:01:55,123 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader
> (main): Loaded native gpl library
> 2013-03-15 21:01:55,167 WARN com.hadoop.compression.lzo.LzoCodec (main):
> Could not find build properties file with revision hash
> 2013-03-15 21:01:55,167 INFO com.hadoop.compression.lzo.LzoCodec (main):
> Successfully loaded & initialized native-lzo library [hadoop-lzo rev
> UNKNOWN]
> 2013-03-15 21:01:55,178 WARN
> org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native
> library is available
> 2013-03-15 21:01:55,178 INFO
> org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native
> library loaded
> 2013-03-15 21:01:59,652 DEBUG
> org.apache.sqoop.mapreduce.AsyncSqlOutputFormat (Thread-13): Committing
> transaction of 100 statements
> 2013-03-15 21:01:59,874 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
> (main): Instructing auto-progress thread to quit.
> 2013-03-15 21:01:59,875 INFO org.apache.sqoop.mapreduce.AutoProgressMapper
> (Thread-14): Auto-progress thread is finished. keepGoing=false
> 2013-03-15 21:01:59,875 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
> (main): Waiting for progress thread shutdown...
> 2013-03-15 21:01:59,875 DEBUG org.apache.sqoop.mapreduce.AutoProgressMapper
> (main): Progress thread shutdown detected.
> 2013-03-15 21:01:59,878 INFO org.apache.hadoop.mapred.TaskLogsTruncater
> (main): Initializing logs' truncater with mapRetainSize=-1 and
> reduceRetainSize=-1
> 2013-03-15 21:01:59,958 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
> Initialized cache for UID to User mapping with a cache timeout of 14400
> seconds.
> 2013-03-15 21:01:59,958 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
> Got UserName hadoop for UID 106 from the native implementation
> 2013-03-15 21:01:59,961 WARN org.apache.hadoop.mapred.Child (main): Error
> running child
> java.util.NoSuchElementException
> at java.util.AbstractList$Itr.next(AbstractList.java:350)
> at
> ml_ys_log_gmt_daily_experiment_2.__loadFromFields(ml_ys_log_gmt_daily_experiment_2.java:1290)
> at
> ml_ys_log_gmt_daily_experiment_2.parse(ml_ys_log_gmt_daily_experiment_2.java:1144)
> at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
> at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at
> org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2013-03-15 21:01:59,976 INFO org.apache.hadoop.mapred.Task (main): Runnning
> cleanup for the task