Posted to user@sqoop.apache.org by Adarsh Sharma <ed...@gmail.com> on 2012/09/13 16:53:44 UTC

Sqoop Writes Inaccurate Records in DB

Hi all,

I am using Sqoop 1.4.2 with Cloudera Hadoop and doing some testing. We need
to export some tables from CSVs in HDFS. Sqoop provides a staging-table
mechanism that writes data into the main tables only if all map tasks
succeed.

While executing a Sqoop job on Hadoop, if a map task fails and Hadoop
re-attempts it, finishing after 3 attempts, the re-runs leave duplicate
records in the staging table. The job finishes, but the number of records
inserted is higher than the number in the CSVs. Below is the output:

12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Exported 4071315 records.
12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Starting to migrate data
from staging table to destination.
12/09/13 14:47:29 INFO manager.SqlManager: Migrated 5391315 records from
table1_tmp to table

Is this a bug in Sqoop, and is there any fix or patch for it? Please let
me know.
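Since the staging-table migration copies whatever is in the staging table, one workaround until a fix lands would be to deduplicate during the migration step itself. The sketch below only illustrates that idea and is not Sqoop's actual migration code: the table and column names are hypothetical, and SQLite stands in for PostgreSQL.

```python
# Illustration only: Sqoop's migration effectively does INSERT ... SELECT
# from the staging table to the destination, so duplicates left behind by
# re-run map tasks get copied along. Migrating with SELECT DISTINCT keeps
# only unique rows. Table/column names are hypothetical; sqlite3 stands in
# for PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE daily_tmp (a TEXT, total_cost REAL)")
cur.execute("CREATE TABLE daily (a TEXT, total_cost REAL)")

# Simulate a re-run map task writing some of the same rows twice.
rows = [("r1", 1.0), ("r2", 2.0), ("r3", 3.0)]
cur.executemany("INSERT INTO daily_tmp VALUES (?, ?)", rows)
cur.executemany("INSERT INTO daily_tmp VALUES (?, ?)", rows[:2])  # duplicates

# A plain INSERT ... SELECT would copy 5 rows; DISTINCT copies the 3 unique.
cur.execute("INSERT INTO daily SELECT DISTINCT * FROM daily_tmp")
print(cur.execute("SELECT COUNT(*) FROM daily").fetchone()[0])  # 3
```

Note this only helps when the re-run rows are exact duplicates; rows that differ in any column would survive the DISTINCT.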


Thanks

Re: Sqoop Writes Inaccurate Records in DB

Posted by Adarsh Sharma <ed...@gmail.com>.
Would this help:

https://issues.apache.org/jira/browse/SQOOP-390

Thanks

On Fri, Sep 14, 2012 at 1:12 PM, Jarek Jarcec Cecho <ja...@apache.org> wrote:

> Hi Adarsh,
> it seems like a bug to me. Would you mind creating a JIRA issue for that?
>
> Jarcec
>
> On Fri, Sep 14, 2012 at 09:57:45AM +0530, Adarsh Sharma wrote:
> > Ya sure, please have a look at the commands below:
> >
> > bin/sqoop job  -- export --connect jdbc:postgresql://localhost/dbname
> > --export-dir /data/data.2012-09-08-00.csv --staging-table daily_tmp
> > --clear-staging-table --verbose  --table daily  --username abc --password
> > abc --input-fields-terminated-by '^A'
> >
> > Also attaching the output of the job, but the lines below explain everything:
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Transferred 396.1099 MB
> in
> > 237.2008 seconds (1.6699 MB/sec)
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Exported 4071315 records.
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Starting to migrate data
> > from staging table to destination.
> > 12/09/14 04:14:29 INFO manager.SqlManager: Migrated 5321315 records from
> > daily_tmp to daily
> >
> > Total records in CSV: 4071315, records inserted: 5321315 (due to failed
> > & rerun map tasks)
> > 12/09/14 04:11:57 INFO mapred.JobClient:  map 79% reduce 0%
> > 12/09/14 04:12:00 INFO mapred.JobClient:  map 80% reduce 0% ( Now map
> fails
> > )
> > 12/09/14 04:12:01 INFO mapred.JobClient:  map 75% reduce 0%
> > 12/09/14 04:12:12 INFO mapred.JobClient:  map 76% reduce 0%
> >
> > Please let me know if any other info is required.
> >
> > Thanks
> >
> > On Thu, Sep 13, 2012 at 10:12 PM, Kathleen Ting <ka...@apache.org>
> wrote:
> >
> > > Hi Adarsh, can you re-run with the --verbose option enabled? Also,
> > > please paste in the entire Sqoop command used.
> > >
> > > Thanks, Kathleen
> > >
> > > On Thu, Sep 13, 2012 at 7:53 AM, Adarsh Sharma <ed...@gmail.com>
> > > wrote:
> > > > Hi all,
> > > >
> > > > I am using Sqoop 1.4.2 with Cloudera Hadoop and doing some testing.
> > > > We need to export some tables from CSVs in HDFS. Sqoop provides a
> > > > staging-table mechanism that writes data into the main tables only
> > > > if all map tasks succeed.
> > > >
> > > > While executing a Sqoop job on Hadoop, if a map task fails and
> > > > Hadoop re-attempts it, finishing after 3 attempts, the re-runs leave
> > > > duplicate records in the staging table. The job finishes, but the
> > > > number of records inserted is higher than the number in the CSVs.
> > > > Below is the output:
> > > >
> > > > 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Exported 4071315
> records.
> > > > 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Starting to migrate
> data
> > > > from staging table to destination.
> > > > 12/09/13 14:47:29 INFO manager.SqlManager: Migrated 5391315 records
> from
> > > > table1_tmp to table
> > > >
> > > > Is this a bug in Sqoop, and is there any fix or patch for it?
> > > > Please let me know.
> > > >
> > > >
> > > > Thanks
> > >
>
> > adarsh@1002:~/sqoop-1.4.2.bin__hadoop-0.20$ bin/sqoop job -- export
> --connect jdbc:postgresql://localhost/dbname --export-dir
> /data/data.2012-09-08-00.csv --staging-table daily_tmp
> --clear-staging-table --verbose  --table daily  --username abc --password
> abc --input-fields-terminated-by '^A'
> > 12/09/14 04:09:53 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> > 12/09/14 04:09:53 WARN tool.BaseSqoopTool: Setting your password on the
> command-line is insecure. Consider using -P instead.
> > 12/09/14 04:09:53 DEBUG sqoop.ConnFactory: Loaded manager factory:
> com.cloudera.sqoop.manager.DefaultManagerFactory
> > 12/09/14 04:09:53 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
> com.cloudera.sqoop.manager.DefaultManagerFactory
> > 12/09/14 04:09:53 DEBUG manager.DefaultManagerFactory: Trying with
> scheme: jdbc:postgresql:
> > 12/09/14 04:09:53 INFO manager.SqlManager: Using default fetchSize of
> 1000
> > 12/09/14 04:09:53 DEBUG sqoop.ConnFactory: Instantiated ConnManager
> org.apache.sqoop.manager.PostgresqlManager@b6e39f
> > 12/09/14 04:09:53 INFO tool.CodeGenTool: Beginning code generation
> > 12/09/14 04:09:53 DEBUG manager.SqlManager: No connection paramenters
> specified. Using regular API for making connection.
> > 12/09/14 04:09:53 DEBUG manager.SqlManager: Using fetchSize for next
> query: 1000
> > 12/09/14 04:09:53 INFO manager.SqlManager: Executing SQL statement:
> SELECT t.* FROM "daily" AS t LIMIT 1
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter: selected columns:
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   a
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   b
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   total_cost
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   c
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   tpid
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   daily
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter:   e
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter: Writing source file:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter: Table name: daily
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter: Columns:  a:12,b:8,
> total_cost:8, c:4, tpid:12, daily:-5, e:12,
> > 12/09/14 04:09:53 DEBUG orm.ClassWriter: sourceFilename is daily.java
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager: Found existing
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/
> > 12/09/14 04:09:53 INFO orm.CompilationManager: HADOOP_HOME is
> /usr/lib/hadoop-0.20
> > 12/09/14 04:09:53 INFO orm.CompilationManager: Found hadoop core jar at:
> /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager: Adding source file:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager: Invoking javac with args:
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:   -sourcepath
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:   -d
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:   -classpath
> > 12/09/14 04:09:53 DEBUG orm.CompilationManager:
> /usr/lib/hadoop/conf/:/usr/lib/jvm/java-6-sun-1.6.0.24/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-lang-2.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/l
ib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../conf:/etc/zookeeper::/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/ant-contrib-1.0b3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-ipc-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-mapred-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/commons-io-1.4.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/hsqldb-1.8.0.10.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jackson-core-asl-1.7.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jackson-mapper-asl-1.7.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jopt-simple-3.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/paranamer-2.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/postgresql-9.1-902.jdbc3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../conf:/usr/lib/jvm/java-6-sun-1.6.0.24/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.90.4-cdh3u3.jar:/usr/lib/hbase/bin/../hbase-0.90.4-cdh3u3-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.4.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.4.jar:/usr/lib/hbase/bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-r06.jar:/usr/lib/hbase/bin/../lib/guava-r09-jarjar.jar:/usr/lib/hbase/bin/../lib/hadoop-core.jar:/usr/lib/hbase/bin/../lib/jackson-c
ore-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.5.5.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.5.5.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.1.12.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.4.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.0.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1.jar:/usr/lib/hbase/bin/../lib/jsr311-api-1.1.1.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.3.0.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5.jar:/usr/lib/hbase/bin/../lib/slf4j-api-1.5.8.jar:/usr/lib/hbase/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/thrift-0.2.0.jar:/usr/lib/hbase/bin/../lib/velocity-1.5.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/usr/lib/hadoop/conf/:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/co
mmons-lang-2.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../sqoop-1.4.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../sqoop-test-1.4.2.jar::/usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> > Note:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java uses
> or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 12/09/14 04:09:54 DEBUG orm.CompilationManager: Could not rename
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java to
> /home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/./daily.java
> > java.io.IOException: Destination
> '/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/./daily.java' already exists
> >       at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
> >       at
> org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
> >       at
> org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
> >       at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
> >       at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:97)
> >       at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> >       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >       at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> >       at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> >       at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> >       at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> >       at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
> > 12/09/14 04:09:54 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.jar
> > 12/09/14 04:09:54 DEBUG orm.CompilationManager: Scanning for .class
> files in directory:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227
> > 12/09/14 04:09:54 DEBUG orm.CompilationManager: Got classfile:
> /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.class ->
> daily.class
> > 12/09/14 04:09:54 DEBUG orm.CompilationManager: Finished writing jar
> file /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.jar
> > 12/09/14 04:09:54 INFO mapreduce.ExportJobBase: Data will be staged in
> the table: daily_tmp
> > 12/09/14 04:09:54 INFO mapreduce.ExportJobBase: Beginning export of daily
> > 12/09/14 04:09:54 INFO manager.SqlManager: Deleted 0 records from
> daily_tmp
> > 12/09/14 04:09:54 INFO security.UserGroupInformation: JAAS Configuration
> already set up for Hadoop, not re-installing.
> > 12/09/14 04:09:54 DEBUG mapreduce.JobBase: Using InputFormat: class
> org.apache.sqoop.mapreduce.ExportInputFormat
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/postgresql-9.1-902.jdbc3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jopt-simple-3.2.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/ant-eclipse-1.0-jvm1.2.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/paranamer-2.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/commons-io-1.4.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jackson-mapper-asl-1.7.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-ipc-1.5.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-1.5.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/ant-contrib-1.0b3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/hsqldb-1.8.0.10.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-mapred-1.5.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/postgresql-9.1-902.jdbc3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jackson-core-asl-1.7.3.jar
> > 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath:
> file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/snappy-java-1.0.3.2.jar
> > 12/09/14 04:09:56 INFO input.FileInputFormat: Total input paths to
> process : 1
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Total input
> bytes=414957434
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:
> maxSplitSize=103739358
> > 12/09/14 04:09:56 INFO input.FileInputFormat: Total input paths to
> process : 1
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Generated splits:
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:
> Paths:/data/data.2012-09-08-00.csv:0+134217728 Locations:localhost:;
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:
> Paths:/data/data.2012-09-08-00.csv:134217728+134217728 Locations:localhost:;
> > 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:
> Paths:/data/data.2012-09-08-00.csv:268435456+134217728
> Locations:10.1.2.3.4:;
> > 12/09/14 04:09:56 INFO mapred.JobClient: Running job:
> job_201209050808_49604
> > 12/09/14 04:09:57 INFO mapred.JobClient:  map 0% reduce 0%
> > 12/09/14 04:10:16 INFO mapred.JobClient:  map 14% reduce 0%
> > 12/09/14 04:10:19 INFO mapred.JobClient:  map 23% reduce 0%
> > 12/09/14 04:10:22 INFO mapred.JobClient:  map 31% reduce 0%
> > 12/09/14 04:10:23 INFO mapred.JobClient:  map 32% reduce 0%
> > 12/09/14 04:10:25 INFO mapred.JobClient:  map 33% reduce 0%
> > 12/09/14 04:10:26 INFO mapred.JobClient:  map 34% reduce 0%
> > 12/09/14 04:10:29 INFO mapred.JobClient:  map 36% reduce 0%
> > 12/09/14 04:10:31 INFO mapred.JobClient:  map 37% reduce 0%
> > 12/09/14 04:10:32 INFO mapred.JobClient:  map 38% reduce 0%
> > 12/09/14 04:10:34 INFO mapred.JobClient:  map 39% reduce 0%
> > 12/09/14 04:10:35 INFO mapred.JobClient:  map 40% reduce 0%
> > 12/09/14 04:10:37 INFO mapred.JobClient:  map 41% reduce 0%
> > 12/09/14 04:10:38 INFO mapred.JobClient:  map 42% reduce 0%
> > 12/09/14 04:10:41 INFO mapred.JobClient:  map 44% reduce 0%
> > 12/09/14 04:10:44 INFO mapred.JobClient:  map 46% reduce 0%
> > 12/09/14 04:10:47 INFO mapred.JobClient:  map 48% reduce 0%
> > 12/09/14 04:10:50 INFO mapred.JobClient:  map 50% reduce 0%
> > 12/09/14 04:10:53 INFO mapred.JobClient:  map 52% reduce 0%
> > 12/09/14 04:10:56 INFO mapred.JobClient:  map 55% reduce 0%
> > 12/09/14 04:10:59 INFO mapred.JobClient:  map 57% reduce 0%
> > 12/09/14 04:11:02 INFO mapred.JobClient:  map 59% reduce 0%
> > 12/09/14 04:11:05 INFO mapred.JobClient:  map 61% reduce 0%
> > 12/09/14 04:11:08 INFO mapred.JobClient:  map 63% reduce 0%
> > 12/09/14 04:11:11 INFO mapred.JobClient:  map 65% reduce 0%
> > 12/09/14 04:11:14 INFO mapred.JobClient:  map 67% reduce 0%
> > 12/09/14 04:11:17 INFO mapred.JobClient:  map 68% reduce 0%
> > 12/09/14 04:11:18 INFO mapred.JobClient:  map 69% reduce 0%
> > 12/09/14 04:11:20 INFO mapred.JobClient:  map 70% reduce 0%
> > 12/09/14 04:11:21 INFO mapred.JobClient:  map 71% reduce 0%
> > 12/09/14 04:11:23 INFO mapred.JobClient:  map 72% reduce 0%
> > 12/09/14 04:11:24 INFO mapred.JobClient:  map 74% reduce 0%
> > 12/09/14 04:11:27 INFO mapred.JobClient:  map 59% reduce 0%
> > 12/09/14 04:11:29 INFO mapred.JobClient:  map 60% reduce 0%
> > 12/09/14 04:11:30 INFO mapred.JobClient:  map 61% reduce 0%
> > 12/09/14 04:11:33 INFO mapred.JobClient:  map 62% reduce 0%
> > 12/09/14 04:11:35 INFO mapred.JobClient:  map 63% reduce 0%
> > 12/09/14 04:11:36 INFO mapred.JobClient:  map 65% reduce 0%
> > 12/09/14 04:11:39 INFO mapred.JobClient:  map 67% reduce 0%
> > 12/09/14 04:11:42 INFO mapred.JobClient:  map 69% reduce 0%
> > 12/09/14 04:11:44 INFO mapred.JobClient:  map 70% reduce 0%
> > 12/09/14 04:11:45 INFO mapred.JobClient:  map 71% reduce 0%
> > 12/09/14 04:11:48 INFO mapred.JobClient:  map 73% reduce 0%
> > 12/09/14 04:11:51 INFO mapred.JobClient:  map 75% reduce 0%
> > 12/09/14 04:11:54 INFO mapred.JobClient:  map 77% reduce 0%
> > 12/09/14 04:11:57 INFO mapred.JobClient:  map 79% reduce 0%
> > 12/09/14 04:12:00 INFO mapred.JobClient:  map 80% reduce 0%
> > 12/09/14 04:12:01 INFO mapred.JobClient:  map 75% reduce 0%
> > 12/09/14 04:12:12 INFO mapred.JobClient:  map 76% reduce 0%
> > 12/09/14 04:12:15 INFO mapred.JobClient:  map 77% reduce 0%
> > 12/09/14 04:12:21 INFO mapred.JobClient:  map 78% reduce 0%
> > 12/09/14 04:12:24 INFO mapred.JobClient:  map 79% reduce 0%
> > 12/09/14 04:12:30 INFO mapred.JobClient:  map 80% reduce 0%
> > 12/09/14 04:12:33 INFO mapred.JobClient:  map 81% reduce 0%
> > 12/09/14 04:12:36 INFO mapred.JobClient:  map 82% reduce 0%
> > 12/09/14 04:12:42 INFO mapred.JobClient:  map 83% reduce 0%
> > 12/09/14 04:12:45 INFO mapred.JobClient:  map 84% reduce 0%
> > 12/09/14 04:12:51 INFO mapred.JobClient:  map 85% reduce 0%
> > 12/09/14 04:12:54 INFO mapred.JobClient:  map 86% reduce 0%
> > 12/09/14 04:13:01 INFO mapred.JobClient:  map 87% reduce 0%
> > 12/09/14 04:13:04 INFO mapred.JobClient:  map 88% reduce 0%
> > 12/09/14 04:13:07 INFO mapred.JobClient:  map 89% reduce 0%
> > 12/09/14 04:13:13 INFO mapred.JobClient:  map 90% reduce 0%
> > 12/09/14 04:13:16 INFO mapred.JobClient:  map 91% reduce 0%
> > 12/09/14 04:13:22 INFO mapred.JobClient:  map 92% reduce 0%
> > 12/09/14 04:13:25 INFO mapred.JobClient:  map 93% reduce 0%
> > 12/09/14 04:13:28 INFO mapred.JobClient:  map 94% reduce 0%
> > 12/09/14 04:13:34 INFO mapred.JobClient:  map 95% reduce 0%
> > 12/09/14 04:13:37 INFO mapred.JobClient:  map 96% reduce 0%
> > 12/09/14 04:13:43 INFO mapred.JobClient:  map 97% reduce 0%
> > 12/09/14 04:13:46 INFO mapred.JobClient:  map 98% reduce 0%
> > 12/09/14 04:13:49 INFO mapred.JobClient:  map 99% reduce 0%
> > 12/09/14 04:13:52 INFO mapred.JobClient:  map 100% reduce 0%
> > 12/09/14 04:13:52 INFO mapred.JobClient: Job complete:
> job_201209050808_49604
> > 12/09/14 04:13:52 INFO mapred.JobClient: Counters: 16
> > 12/09/14 04:13:52 INFO mapred.JobClient:   Job Counters
> > 12/09/14 04:13:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=346917
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Total time spent by all
> maps waiting after reserving slots (ms)=0
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Rack-local map tasks=6
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Launched map tasks=6
> > 12/09/14 04:13:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1483
> > 12/09/14 04:13:52 INFO mapred.JobClient:   FileSystemCounters
> > 12/09/14 04:13:52 INFO mapred.JobClient:     HDFS_BYTES_READ=415351349
> > 12/09/14 04:13:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=243012
> > 12/09/14 04:13:52 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Map input records=4071315
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=2018009088
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Spilled Records=0
> > 12/09/14 04:13:52 INFO mapred.JobClient:     CPU time spent (ms)=240740
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=1919221760
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=5843054592
> > 12/09/14 04:13:52 INFO mapred.JobClient:     Map output records=4071315
> > 12/09/14 04:13:52 INFO mapred.JobClient:     SPLIT_RAW_BYTES=672
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Transferred 396.1099 MB
> in 237.2008 seconds (1.6699 MB/sec)
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Exported 4071315 records.
> > 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Starting to migrate data
> from staging table to destination.
> > 12/09/14 04:14:29 INFO manager.SqlManager: Migrated 5321315 records from
> daily_tmp to daily
> > adarsh@gs1002:~/sqoop-1.4.2.bin__hadoop-0.20$
>
>

usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../conf:/etc/zookeeper::/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/ant-contrib-1.0b3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-ipc-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/avro-mapred-1.5.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/commons-io-1.4.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/hsqldb-1.8.0.10.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jackson-core-asl-1.7.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jackson-mapper-asl-1.7.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/jopt-simple-3.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/paranamer-2.3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/postgresql-9.1-902.jdbc3.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../conf:/usr/lib/jvm/java-6-sun-1.6.0.24/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.90.4-cdh3u3.jar:/usr/lib/hbase/bin/../hbase-0.90.4-cdh3u3-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.4.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.4.jar:/usr/lib/hbase/bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-r06.jar:/usr/lib/hbase/bin/../lib/guava-r09-jarjar.jar:/usr/lib/hbase/bin/../lib/h
adoop-core.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.5.5.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.5.5.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.1.12.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.4.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.4.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.0.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1.jar:/usr/lib/hbase/bin/../lib/jsr311-api-1.1.1.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.3.0.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5.jar:/usr/lib/hbase/bin/../lib/slf4j-api-1.5.8.jar:/usr/lib/hbase/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/thrift-0.2.0.jar:/usr/lib/hbase/bin/../lib/velocity-1.5.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/usr/lib/hadoop/conf/:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/comm
ons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-lang-2.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u3.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../sqoop-1.4.2.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/bin/../sqoop-test-1.4.2.jar::/usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> Note: /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 12/09/14 04:09:54 DEBUG orm.CompilationManager: Could not rename /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.java to /home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/./daily.java
> java.io.IOException: Destination '/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/./daily.java' already exists
> 	at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
> 	at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
> 	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
> 	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
> 	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:97)
> 	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> 	at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
> 12/09/14 04:09:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.jar
> 12/09/14 04:09:54 DEBUG orm.CompilationManager: Scanning for .class files in directory: /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227
> 12/09/14 04:09:54 DEBUG orm.CompilationManager: Got classfile: /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.class -> daily.class
> 12/09/14 04:09:54 DEBUG orm.CompilationManager: Finished writing jar file /tmp/sqoop-adarsh/compile/70af23436d0ed0f9d7f1b6bb713f1227/daily.jar
> 12/09/14 04:09:54 INFO mapreduce.ExportJobBase: Data will be staged in the table: daily_tmp
> 12/09/14 04:09:54 INFO mapreduce.ExportJobBase: Beginning export of daily
> 12/09/14 04:09:54 INFO manager.SqlManager: Deleted 0 records from daily_tmp
> 12/09/14 04:09:54 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> 12/09/14 04:09:54 DEBUG mapreduce.JobBase: Using InputFormat: class org.apache.sqoop.mapreduce.ExportInputFormat
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/postgresql-9.1-902.jdbc3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/sqoop-1.4.2.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jopt-simple-3.2.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/ant-eclipse-1.0-jvm1.2.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/paranamer-2.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/commons-io-1.4.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jackson-mapper-asl-1.7.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-ipc-1.5.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-1.5.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/ant-contrib-1.0b3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/hsqldb-1.8.0.10.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/avro-mapred-1.5.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/postgresql-9.1-902.jdbc3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/jackson-core-asl-1.7.3.jar
> 12/09/14 04:09:55 DEBUG mapreduce.JobBase: Adding to job classpath: file:/home/adarsh/sqoop-1.4.2.bin__hadoop-0.20/lib/snappy-java-1.0.3.2.jar
> 12/09/14 04:09:56 INFO input.FileInputFormat: Total input paths to process : 1
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Target numMapTasks=4
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Total input bytes=414957434
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: maxSplitSize=103739358
> 12/09/14 04:09:56 INFO input.FileInputFormat: Total input paths to process : 1
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat: Generated splits:
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:   Paths:/data/data.2012-09-08-00.csv:0+134217728 Locations:localhost:; 
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:   Paths:/data/data.2012-09-08-00.csv:134217728+134217728 Locations:localhost:; 
> 12/09/14 04:09:56 DEBUG mapreduce.ExportInputFormat:   Paths:/data/data.2012-09-08-00.csv:268435456+134217728 Locations:10.1.2.3.4:; 
> 12/09/14 04:09:56 INFO mapred.JobClient: Running job: job_201209050808_49604
> 12/09/14 04:09:57 INFO mapred.JobClient:  map 0% reduce 0%
> 12/09/14 04:10:16 INFO mapred.JobClient:  map 14% reduce 0%
> 12/09/14 04:10:19 INFO mapred.JobClient:  map 23% reduce 0%
> 12/09/14 04:10:22 INFO mapred.JobClient:  map 31% reduce 0%
> 12/09/14 04:10:23 INFO mapred.JobClient:  map 32% reduce 0%
> 12/09/14 04:10:25 INFO mapred.JobClient:  map 33% reduce 0%
> 12/09/14 04:10:26 INFO mapred.JobClient:  map 34% reduce 0%
> 12/09/14 04:10:29 INFO mapred.JobClient:  map 36% reduce 0%
> 12/09/14 04:10:31 INFO mapred.JobClient:  map 37% reduce 0%
> 12/09/14 04:10:32 INFO mapred.JobClient:  map 38% reduce 0%
> 12/09/14 04:10:34 INFO mapred.JobClient:  map 39% reduce 0%
> 12/09/14 04:10:35 INFO mapred.JobClient:  map 40% reduce 0%
> 12/09/14 04:10:37 INFO mapred.JobClient:  map 41% reduce 0%
> 12/09/14 04:10:38 INFO mapred.JobClient:  map 42% reduce 0%
> 12/09/14 04:10:41 INFO mapred.JobClient:  map 44% reduce 0%
> 12/09/14 04:10:44 INFO mapred.JobClient:  map 46% reduce 0%
> 12/09/14 04:10:47 INFO mapred.JobClient:  map 48% reduce 0%
> 12/09/14 04:10:50 INFO mapred.JobClient:  map 50% reduce 0%
> 12/09/14 04:10:53 INFO mapred.JobClient:  map 52% reduce 0%
> 12/09/14 04:10:56 INFO mapred.JobClient:  map 55% reduce 0%
> 12/09/14 04:10:59 INFO mapred.JobClient:  map 57% reduce 0%
> 12/09/14 04:11:02 INFO mapred.JobClient:  map 59% reduce 0%
> 12/09/14 04:11:05 INFO mapred.JobClient:  map 61% reduce 0%
> 12/09/14 04:11:08 INFO mapred.JobClient:  map 63% reduce 0%
> 12/09/14 04:11:11 INFO mapred.JobClient:  map 65% reduce 0%
> 12/09/14 04:11:14 INFO mapred.JobClient:  map 67% reduce 0%
> 12/09/14 04:11:17 INFO mapred.JobClient:  map 68% reduce 0%
> 12/09/14 04:11:18 INFO mapred.JobClient:  map 69% reduce 0%
> 12/09/14 04:11:20 INFO mapred.JobClient:  map 70% reduce 0%
> 12/09/14 04:11:21 INFO mapred.JobClient:  map 71% reduce 0%
> 12/09/14 04:11:23 INFO mapred.JobClient:  map 72% reduce 0%
> 12/09/14 04:11:24 INFO mapred.JobClient:  map 74% reduce 0%
> 12/09/14 04:11:27 INFO mapred.JobClient:  map 59% reduce 0%
> 12/09/14 04:11:29 INFO mapred.JobClient:  map 60% reduce 0%
> 12/09/14 04:11:30 INFO mapred.JobClient:  map 61% reduce 0%
> 12/09/14 04:11:33 INFO mapred.JobClient:  map 62% reduce 0%
> 12/09/14 04:11:35 INFO mapred.JobClient:  map 63% reduce 0%
> 12/09/14 04:11:36 INFO mapred.JobClient:  map 65% reduce 0%
> 12/09/14 04:11:39 INFO mapred.JobClient:  map 67% reduce 0%
> 12/09/14 04:11:42 INFO mapred.JobClient:  map 69% reduce 0%
> 12/09/14 04:11:44 INFO mapred.JobClient:  map 70% reduce 0%
> 12/09/14 04:11:45 INFO mapred.JobClient:  map 71% reduce 0%
> 12/09/14 04:11:48 INFO mapred.JobClient:  map 73% reduce 0%
> 12/09/14 04:11:51 INFO mapred.JobClient:  map 75% reduce 0%
> 12/09/14 04:11:54 INFO mapred.JobClient:  map 77% reduce 0%
> 12/09/14 04:11:57 INFO mapred.JobClient:  map 79% reduce 0%
> 12/09/14 04:12:00 INFO mapred.JobClient:  map 80% reduce 0%
> 12/09/14 04:12:01 INFO mapred.JobClient:  map 75% reduce 0%
> 12/09/14 04:12:12 INFO mapred.JobClient:  map 76% reduce 0%
> 12/09/14 04:12:15 INFO mapred.JobClient:  map 77% reduce 0%
> 12/09/14 04:12:21 INFO mapred.JobClient:  map 78% reduce 0%
> 12/09/14 04:12:24 INFO mapred.JobClient:  map 79% reduce 0%
> 12/09/14 04:12:30 INFO mapred.JobClient:  map 80% reduce 0%
> 12/09/14 04:12:33 INFO mapred.JobClient:  map 81% reduce 0%
> 12/09/14 04:12:36 INFO mapred.JobClient:  map 82% reduce 0%
> 12/09/14 04:12:42 INFO mapred.JobClient:  map 83% reduce 0%
> 12/09/14 04:12:45 INFO mapred.JobClient:  map 84% reduce 0%
> 12/09/14 04:12:51 INFO mapred.JobClient:  map 85% reduce 0%
> 12/09/14 04:12:54 INFO mapred.JobClient:  map 86% reduce 0%
> 12/09/14 04:13:01 INFO mapred.JobClient:  map 87% reduce 0%
> 12/09/14 04:13:04 INFO mapred.JobClient:  map 88% reduce 0%
> 12/09/14 04:13:07 INFO mapred.JobClient:  map 89% reduce 0%
> 12/09/14 04:13:13 INFO mapred.JobClient:  map 90% reduce 0%
> 12/09/14 04:13:16 INFO mapred.JobClient:  map 91% reduce 0%
> 12/09/14 04:13:22 INFO mapred.JobClient:  map 92% reduce 0%
> 12/09/14 04:13:25 INFO mapred.JobClient:  map 93% reduce 0%
> 12/09/14 04:13:28 INFO mapred.JobClient:  map 94% reduce 0%
> 12/09/14 04:13:34 INFO mapred.JobClient:  map 95% reduce 0%
> 12/09/14 04:13:37 INFO mapred.JobClient:  map 96% reduce 0%
> 12/09/14 04:13:43 INFO mapred.JobClient:  map 97% reduce 0%
> 12/09/14 04:13:46 INFO mapred.JobClient:  map 98% reduce 0%
> 12/09/14 04:13:49 INFO mapred.JobClient:  map 99% reduce 0%
> 12/09/14 04:13:52 INFO mapred.JobClient:  map 100% reduce 0%
> 12/09/14 04:13:52 INFO mapred.JobClient: Job complete: job_201209050808_49604
> 12/09/14 04:13:52 INFO mapred.JobClient: Counters: 16
> 12/09/14 04:13:52 INFO mapred.JobClient:   Job Counters 
> 12/09/14 04:13:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=346917
> 12/09/14 04:13:52 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/09/14 04:13:52 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/09/14 04:13:52 INFO mapred.JobClient:     Rack-local map tasks=6
> 12/09/14 04:13:52 INFO mapred.JobClient:     Launched map tasks=6
> 12/09/14 04:13:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1483
> 12/09/14 04:13:52 INFO mapred.JobClient:   FileSystemCounters
> 12/09/14 04:13:52 INFO mapred.JobClient:     HDFS_BYTES_READ=415351349
> 12/09/14 04:13:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=243012
> 12/09/14 04:13:52 INFO mapred.JobClient:   Map-Reduce Framework
> 12/09/14 04:13:52 INFO mapred.JobClient:     Map input records=4071315
> 12/09/14 04:13:52 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2018009088
> 12/09/14 04:13:52 INFO mapred.JobClient:     Spilled Records=0
> 12/09/14 04:13:52 INFO mapred.JobClient:     CPU time spent (ms)=240740
> 12/09/14 04:13:52 INFO mapred.JobClient:     Total committed heap usage (bytes)=1919221760
> 12/09/14 04:13:52 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5843054592
> 12/09/14 04:13:52 INFO mapred.JobClient:     Map output records=4071315
> 12/09/14 04:13:52 INFO mapred.JobClient:     SPLIT_RAW_BYTES=672
> 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Transferred 396.1099 MB in 237.2008 seconds (1.6699 MB/sec)
> 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Exported 4071315 records.
> 12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Starting to migrate data from staging table to destination.
> 12/09/14 04:14:29 INFO manager.SqlManager: Migrated 5321315 records from daily_tmp to daily
> adarsh@gs1002:~/sqoop-1.4.2.bin__hadoop-0.20$ 


Re: Sqoop Writes Inaccurate Records in DB

Posted by Adarsh Sharma <ed...@gmail.com>.
Sure, please have a look at the command below:

bin/sqoop job  -- export --connect jdbc:postgresql://localhost/dbname
--export-dir /data/data.2012-09-08-00.csv --staging-table daily_tmp
--clear-staging-table --verbose  --table daily  --username abc --password
abc --input-fields-terminated-by '^A'
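Until the root cause is fixed, a cheap guard after an export like the one above is to compare the source row count with the destination row count and treat any mismatch as a failed export. The sketch below is only an illustration, not part of Sqoop: it uses a local file and Python's sqlite3 as stand-ins for HDFS and PostgreSQL, and the `daily` table name and ^A delimiter simply mirror the job above.

```python
import os
import sqlite3
import tempfile

def check_export(csv_path, conn, table):
    """Return (csv_rows, db_rows); a mismatch suggests duplicated or lost records."""
    with open(csv_path) as f:
        csv_rows = sum(1 for _ in f)
    db_rows = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()[0]
    return csv_rows, db_rows

# Demo: a 3-line ^A-delimited "CSV" and a table that picked up one duplicate.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a\x011\nb\x012\nc\x013\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (line TEXT)")
conn.executemany("INSERT INTO daily VALUES (?)",
                 [("a",), ("b",), ("c",), ("c",)])   # "c" landed twice

csv_rows, db_rows = check_export(path, conn, "daily")
print(csv_rows, db_rows)   # 3 4 -> mismatch, so the export should be rejected
os.remove(path)
```

In a real deployment the two counts would come from `hadoop fs -cat ... | wc -l` (or a counter from the job) and a `SELECT COUNT(*)` against the target database, but the comparison logic is the same.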

I'm also attaching the full job output, but the lines below explain everything:
12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Transferred 396.1099 MB in
237.2008 seconds (1.6699 MB/sec)
12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Exported 4071315 records.
12/09/14 04:13:52 INFO mapreduce.ExportJobBase: Starting to migrate data
from staging table to destination.
12/09/14 04:14:29 INFO manager.SqlManager: Migrated 5321315 records from
daily_tmp to daily

Total records in CSV: 4071315; records inserted: 5321315 (due to failed
and re-run map tasks)
12/09/14 04:11:57 INFO mapred.JobClient:  map 79% reduce 0%
12/09/14 04:12:00 INFO mapred.JobClient:  map 80% reduce 0%  (map fails here)
12/09/14 04:12:01 INFO mapred.JobClient:  map 75% reduce 0%
12/09/14 04:12:12 INFO mapred.JobClient:  map 76% reduce 0%
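The drop from 80% back to 75% in the excerpt above is a map attempt failing and being re-run. The duplication itself can be reproduced in miniature: if a retried task re-inserts its rows without its earlier partial output being cleaned up, the staging table ends up larger than the input. This is a toy model in Python with sqlite3, not Sqoop's actual code; the names and numbers are made up.

```python
import sqlite3

# isolation_level=None -> autocommit, so every INSERT is immediately durable,
# mimicking rows a failed map attempt has already written to the staging table.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staging (v INTEGER)")

def run_map_task(rows, fail_at=None):
    """Insert rows one by one; optionally die partway, leaving partial output."""
    for i, row in enumerate(rows):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated task failure")
        conn.execute("INSERT INTO staging VALUES (?)", (row,))

rows = list(range(100))               # the "CSV": 100 input records
try:
    run_map_task(rows, fail_at=60)    # attempt 1 dies after writing 60 rows
except RuntimeError:
    pass
run_map_task(rows)                    # Hadoop retries; all 100 rows again

count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(count)   # 160, not 100: the 60 leftover rows were never cleaned up
```

An idempotent retry would first delete the failed attempt's rows (e.g. keyed by a task-attempt id column) before re-inserting, which is the kind of cleanup the linked JIRA discusses.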

Please let me know if any other info is required.

Thanks

On Thu, Sep 13, 2012 at 10:12 PM, Kathleen Ting <ka...@apache.org> wrote:

> Hi Adarsh, can you re-run with the --verbose option enabled? Also,
> please paste in the entire Sqoop command used.
>
> Thanks, Kathleen
>
> On Thu, Sep 13, 2012 at 7:53 AM, Adarsh Sharma <ed...@gmail.com>
> wrote:
> > Hi all,
> >
> > I am using sqoop-1.4.2 with Cloudera Hadoop and doing some testing. We
> > need to export some tables from CSVs in HDFS. Sqoop provides a
> > staging-table mechanism that writes data to the main table only if all
> > maps succeed.
> >
> > While executing a Sqoop job on Hadoop, suppose a map fails and Hadoop
> > re-runs it, finishing within 3 attempts; this results in duplicate
> > records in the staging table, and the job finishes with more data
> > inserted than is in the CSVs. Below is the output:
> >
> > 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Exported 4071315 records.
> > 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Starting to migrate data
> > from staging table to destination.
> > 12/09/13 14:47:29 INFO manager.SqlManager: Migrated 5391315 records from
> > table1_tmp to table
> >
> > Is this a bug in Sqoop, and is there any fix or patch for it? Please
> > let me know.
> >
> >
> > Thanks
>

Re: Sqoop Writes Inaccurate Records in DB

Posted by Kathleen Ting <ka...@apache.org>.
Hi Adarsh, can you re-run with the --verbose option enabled? Also,
please paste in the entire Sqoop command used.

Thanks, Kathleen

On Thu, Sep 13, 2012 at 7:53 AM, Adarsh Sharma <ed...@gmail.com> wrote:
> Hi all,
>
> I am using sqoop-1.4.2 with Cloudera Hadoop and doing some testing. We need
> to export some tables from CSVs in HDFS. Sqoop provides a staging-table
> mechanism that writes data to the main table only if all maps succeed.
>
> While executing a Sqoop job on Hadoop, suppose a map fails and Hadoop
> re-runs it, finishing within 3 attempts; this results in duplicate records
> in the staging table, and the job finishes with more data inserted than is
> in the CSVs. Below is the output:
>
> 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Exported 4071315 records.
> 12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Starting to migrate data
> from staging table to destination.
> 12/09/13 14:47:29 INFO manager.SqlManager: Migrated 5391315 records from
> table1_tmp to table
>
> Is this a bug in Sqoop, and is there any fix or patch for it? Please let
> me know.
>
>
> Thanks