Posted to dev@sqoop.apache.org by Danny Antonetti <da...@gmail.com> on 2012/04/06 02:22:03 UTC

Avro problem with merge

I sent this message to the Sqoop user list, but no one responded.

Is there a more appropriate list for this question?


I am using the Avro format.

I currently have Sqoop set up to do either an initial full dump, or an incremental
dump followed by a merge with the previous full dump.


But the merge fails with the following error:

>$ sqoop merge --verbose --jar-file=acdc-hadoop.jar \
    --class-name 'com.test.hadoop.sqoop.clients' \
    --merge-key client_id \
    --new-data /data/dev/inputs/incremental/clients/20120331/12/00/localhost \
    --target-dir /temp/data3/complete/clients/20120331/11/00 \
    --onto /data/dev/inputs/complete/clients/20120331/11/00 \
    --class-name clients
12/03/31 20:02:13 DEBUG tool.MergeTool: Enabled debug logging.
12/03/31 20:02:13 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Setting job jar to user-specified jar: acdc-hadoop.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/sqoop-1.3.0-cdh3u3.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/sqoop-1.3.0-cdh3u3.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/ant-eclipse-1.0-jvm1.2.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/avro-ipc-1.5.4.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/snappy-java-1.0.3.2.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/jackson-mapper-asl-1.7.3.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/avro-1.5.4.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/jopt-simple-3.2.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/mysql-connector-java-5.1.13-bin.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/jackson-core-asl-1.7.3.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/paranamer-2.3.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/avro-mapred-1.5.4.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/hadoop-mrunit-0.20.2-CDH3b2-SNAPSHOT.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/commons-io-1.4.jar
12/03/31 20:02:13 DEBUG mapreduce.JobBase: Adding to job classpath: file:/usr/lib/sqoop/lib/ant-contrib-1.0b3.jar
12/03/31 20:02:14 INFO input.FileInputFormat: Total input paths to process : 2
12/03/31 20:02:14 WARN snappy.LoadSnappy: Snappy native library is available
12/03/31 20:02:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/03/31 20:02:14 INFO snappy.LoadSnappy: Snappy native library loaded
12/03/31 20:02:14 INFO mapred.JobClient: Running job: job_201203311845_0057
12/03/31 20:02:15 INFO mapred.JobClient:  map 0% reduce 0%
12/03/31 20:02:21 INFO mapred.JobClient: Task Id : attempt_201203311845_0057_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "Obj avro.schema{""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:410)
    at java.lang.Long.valueOf(Long.java:525)
    at com.test.hadoop.sqoop.clients.__loadFromFields(clients.java:310)
    at com.test.hadoop.sqoop.clients.parse(clients.java:260)
    at com.cloudera.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:56)
    at com.cloudera.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child
12/03/31 20:02:21 INFO mapred.JobClient: Task Id : attempt_201203311845_0057_m_000001_0, Status : FAILED
java.io.FileNotFoundException: File does not exist: /data/dev/inputs/complete/clients/20120331/11/00/localhost
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:75)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.d
12/03/31 20:02:38 INFO mapred.JobClient: Job complete: job_201203311845_0057
12/03/31 20:02:38 INFO mapred.JobClient: Counters: 7
12/03/31 20:02:38 INFO mapred.JobClient:   Job Counters
12/03/31 20:02:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=41086
12/03/31 20:02:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/31 20:02:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/31 20:02:38 INFO mapred.JobClient:     Launched map tasks=8
12/03/31 20:02:38 INFO mapred.JobClient:     Data-local map tasks=4
12/03/31 20:02:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/31 20:02:38 INFO mapred.JobClient:     Failed map tasks=1
12/03/31 20:02:38 ERROR tool.MergeTool: MapReduce job failed!
>$ hadoop dfs -ls /data/dev/inputs/complete/clients/20120331/11/00/localhost
12/03/31 20:02:59 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Found 6 items
-rw-r--r--   1 mapred supergroup          1 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/_SUCCESS
drwxr-xr-x   - mapred supergroup          0 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/_logs
-rw-r--r--   1 mapred supergroup        623 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/part-m-00000.avro
-rw-r--r--   1 mapred supergroup        571 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/part-m-00001.avro
-rw-r--r--   1 mapred supergroup        571 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/part-m-00002.avro
-rw-r--r--   1 mapred supergroup        715 2012-03-31 19:36 /data/dev/inputs/complete/clients/20120331/11/00/localhost/part-m-00003.avro
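
For what it's worth, "Obj avro.schema{" is the start of an Avro container file
header, and the first stack trace goes through MergeTextMapper, so it looks
like the merge job is reading the Avro part files as plain text. As a sanity
check that the part files themselves are valid Avro, they can be dumped
locally (a sketch; it assumes an avro-tools jar is available, which is not
among the Sqoop jars in the classpath listing above):

>$ hadoop dfs -get /data/dev/inputs/complete/clients/20120331/11/00/localhost/part-m-00000.avro .
>$ java -jar avro-tools-1.5.4.jar getschema part-m-00000.avro   # print the embedded schema
>$ java -jar avro-tools-1.5.4.jar tojson part-m-00000.avro      # dump the records as JSON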


Here are the Sqoop commands I am running:

Full import:
sqoop import \
  --class-name "$TABLE" \
  --verbose \
  --as-avrodatafile \
  --null-non-string="" \
  --columns $COLUMNS \
  --incremental "append" \
  --null-string="" \
  --compress \
  --compression-codec=snappy \
  --check-column "object_version" \
  --append \
  --fields-terminated-by='\t' \
  --where "$WHERE" \
  --connect jdbc:mysql://$SERVER:3306/database \
  --password=$PASSWORD \
  --username=$USERNAME \
  --table=$TABLE \
  --target-dir $OUTPUTDIR


Incremental import:
sqoop import \
  --class-name "$TABLE" \
  --verbose \
  --as-avrodatafile \
  --null-non-string='' \
  --columns $COLUMNS \
  --incremental 'append' \
  --null-string='' \
  --compress \
  --compression-codec=snappy \
  --check-column 'object_version' \
  --append \
  --fields-terminated-by='\t' \
  --where "$WHERE" \
  --connect jdbc:mysql://$SERVER:3306/database \
  --password=$PASSWORD \
  --username=$USERNAME \
  --table=$TABLE \
  --target-dir $OUTPUTDIR \
  --last-value "$LAST_VALUE"
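
$LAST_VALUE is carried over from the previous incremental run; Sqoop prints
the next --last-value to use at the end of each incremental import, and it can
be captured from saved console output, roughly like this (sqoop-import.log is
a hypothetical file the wrapper script would save the run's output to):

LAST_VALUE=$(grep -o -- '--last-value [0-9]*' sqoop-import.log | tail -1 | awk '{print $2}')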


Merge:
sqoop merge \
  --verbose \
  --merge-key $MERGE_KEY \
  --new-data $SERVEROUTPUTDIRHOURLY \
  --target-dir $OUTPUTDIRCOMPLETE \
  --onto $OLD_OUPUTDIRCOMPLETE \
  --class-name $TABLE
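
For reference, in the failing merge transcribed at the top of this mail those
variables expand to the following (the transcript also passes
--jar-file=acdc-hadoop.jar and a second --class-name,
'com.test.hadoop.sqoop.clients'):

MERGE_KEY=client_id
SERVEROUTPUTDIRHOURLY=/data/dev/inputs/incremental/clients/20120331/12/00/localhost
OUTPUTDIRCOMPLETE=/temp/data3/complete/clients/20120331/11/00
OLD_OUPUTDIRCOMPLETE=/data/dev/inputs/complete/clients/20120331/11/00
TABLE=clients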

Thanks,

Danny