Posted to user@sqoop.apache.org by Kate Ting <ka...@cloudera.com> on 2011/12/20 08:02:06 UTC

Re: [sqoop-user] Sqoop doesn't finish its output of the job on console.

Yeonki -

Please subscribe by sending an email to incubator-sqoop-user-subscribe@apache.org
for a faster response.

1. What Sqoop and Hadoop versions are you running?
2. Please re-run your Sqoop job with the --verbose flag and then
attach the console log.
3. Also, please provide a representative input data set that triggers
this problem.
4. Please attach the task logs from Hadoop so we can see if there are
any specific failures recorded there. It is possible that a failure
occurs during task execution but is not relayed correctly to the
console.
5. Can you try breaking the job into two separate actions to see where
the problem really occurs? First, do the import alone. Second, create
the Hive table without the import using the create-hive-table tool
(see the sketch below). This helps narrow the problem down to either
the regular import or the creation and population of the Hive table.
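
For illustration, a minimal sketch of that two-step split, reusing the
connection details from your command below (the -P password prompt and the
--verbose flag are suggested additions, not part of your original invocation):

$ sqoop import --connect jdbc:mysql://localhost/ncdc --username user01 -P \
    --table t_stations --warehouse-dir /user/hive/warehouse/ --verbose

$ sqoop create-hive-table --connect jdbc:mysql://localhost/ncdc --username user01 -P \
    --table t_stations --hive-table t_stations --verbose

If the plain import finishes cleanly but the create-hive-table step hangs,
the problem is on the Hive side of the job rather than in the import itself.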

Regards, Kate

On Mon, Dec 19, 2011 at 6:38 PM, Yeonki Choi <ye...@gmail.com> wrote:
> Hi All,
>
> I'd like to move a table named t_stations and its data from MySQL to
> Hive.
> The problem I encountered is that the import job never completes its
> output on the console.
> The console output is below:
>
> $ sqoop import --connect jdbc:mysql://localhost/ncdc --username user01
> --password user01 --table t_stations --hive-import --create-hive-table
> --warehouse-dir /user/hive/warehouse/
> 11/12/20 10:16:37 WARN tool.BaseSqoopTool: Setting your password on
> the command-line is insecure. Consider using -P instead.
> 11/12/20 10:16:37 INFO tool.BaseSqoopTool: Using Hive-specific
> delimiters for output. You can override
> 11/12/20 10:16:37 INFO tool.BaseSqoopTool: delimiters with --fields-
> terminated-by, etc.
> 11/12/20 10:16:37 INFO manager.MySQLManager: Preparing to use a MySQL
> streaming resultset.
> 11/12/20 10:16:37 INFO tool.CodeGenTool: Beginning code generation
> 11/12/20 10:16:38 INFO manager.SqlManager: Executing SQL
> statement:SELECT t.* FROM `t_stations` AS t LIMIT 1
> 11/12/20 10:16:38 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/
> hadoop-0.20
> 11/12/20 10:16:38 INFO orm.CompilationManager: Found hadoop core jar
> at: /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u2-core.jar
> 11/12/20 10:16:42 INFO orm.CompilationManager: Writing jar file: /tmp/
> sqoop-apexcns/compile/e200d46ab611e92c58b2d6f8d04abf54/t_stations.jar
> 11/12/20 10:16:42 WARN manager.MySQLManager: It looks like you are
> importing from mysql.
> 11/12/20 10:16:42 WARN manager.MySQLManager: This transfer can be
> faster! Use the --direct
> 11/12/20 10:16:42 WARN manager.MySQLManager: option to exercise a
> MySQL-specific fast path.
> 11/12/20 10:16:42 INFO manager.MySQLManager: Setting zero DATETIME
> behavior to convertToNull (mysql)
> 11/12/20 10:16:42 INFO mapreduce.ImportJobBase: Beginning import of
> t_stations
> 11/12/20 10:16:48 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
> SELECT MIN(`STN`), MAX(`STN`) FROM `t_stations`
> 11/12/20 10:16:48 WARN db.TextSplitter: Generating splits for a
> textual index column.
> 11/12/20 10:16:48 WARN db.TextSplitter: If your database sorts in a
> case-insensitive order, this may result in a partial import or
> duplicate records.
> 11/12/20 10:16:48 WARN db.TextSplitter: You are strongly encouraged to
> choose an integral split column.
> 11/12/20 10:16:49 INFO mapred.JobClient: Running job:
> job_201112151605_0051
> 11/12/20 10:16:50 INFO mapred.JobClient:  map 0% reduce 0%
> 11/12/20 10:17:09 INFO mapred.JobClient:  map 50% reduce 0%
> 11/12/20 10:17:11 INFO mapred.JobClient:  map 75% reduce 0%
> 11/12/20 10:17:16 INFO mapred.JobClient:  map 100% reduce 0%
> 11/12/20 10:17:21 INFO mapred.JobClient: Job complete:
> job_201112151605_0051
> 11/12/20 10:17:21 INFO mapred.JobClient: Counters: 12
> 11/12/20 10:17:21 INFO mapred.JobClient:   Job Counters
> 11/12/20 10:17:21 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=64298
> 11/12/20 10:17:21 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 11/12/20 10:17:21 INFO mapred.JobClient:     Total time spent by all
> maps waiting after reserving slots (ms)=0
> 11/12/20 10:17:21 INFO mapred.JobClient:     Launched map tasks=4
> 11/12/20 10:17:21 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 11/12/20 10:17:21 INFO mapred.JobClient:   FileSystemCounters
> 11/12/20 10:17:21 INFO mapred.JobClient:     HDFS_BYTES_READ=523
> 11/12/20 10:17:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=230115
> 11/12/20 10:17:21 INFO mapred.JobClient:
> HDFS_BYTES_WRITTEN=1173690
> 11/12/20 10:17:21 INFO mapred.JobClient:   Map-Reduce Framework
> 11/12/20 10:17:21 INFO mapred.JobClient:     Map input records=27945
> 11/12/20 10:17:21 INFO mapred.JobClient:     Spilled Records=0
> 11/12/20 10:17:21 INFO mapred.JobClient:     Map output records=27945
> 11/12/20 10:17:21 INFO mapred.JobClient:     SPLIT_RAW_BYTES=523
> 11/12/20 10:17:21 INFO mapreduce.ImportJobBase: Transferred 1.1193 MB
> in 38.1035 seconds (30.0808 KB/sec)
> 11/12/20 10:17:21 INFO mapreduce.ImportJobBase: Retrieved 27945
> records.
> 11/12/20 10:17:21 INFO hive.HiveImport: Removing temporary files from
> import process: /user/hive/warehouse/t_stations/_logs
> 11/12/20 10:17:22 INFO hive.HiveImport: Loading uploaded data into
> Hive
> 11/12/20 10:17:22 INFO manager.SqlManager: Executing SQL statement:
> SELECT t.* FROM `t_stations` AS t LIMIT 1
> 11/12/20 10:17:26 INFO hive.HiveImport: Hive history file=/tmp/apexcns/
> hive_job_log_apexcns_201112201017_38000787.txt
> 11/12/20 10:17:40 INFO hive.HiveImport: OK
> 11/12/20 10:17:40 INFO hive.HiveImport: Time taken: 13.257 seconds
> 11/12/20 10:17:40 INFO hive.HiveImport: Loading data to table
> default.t_stations
> => The output always stops here.
>
> However, when I checked whether the data had been imported correctly,
> the import turned out to be successful.
> The following log shows the data check:
>
> hive> select count(*) from t_stations;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>  set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>  set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>  set mapred.reduce.tasks=<number>
> Starting Job = job_201112151605_0052, Tracking URL =
> http://name01:50030/jobdetails.jsp?jobid=job_201112151605_0052
> Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -
> Dmapred.job.tracker=name01:9001 -kill job_201112151605_0052
> 2011-12-20 10:29:22,543 Stage-1 map = 0%,  reduce = 0%
> 2011-12-20 10:29:29,716 Stage-1 map = 50%,  reduce = 0%
> 2011-12-20 10:29:30,786 Stage-1 map = 100%,  reduce = 0%
> 2011-12-20 10:29:48,245 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201112151605_0052
> OK
> 27945
> Time taken: 69.289 seconds
>
> Is there any tip for getting the console output to complete?
>
> Thanks
> Yeonki.
>
> --
> NOTE: The mailing list sqoop-user@cloudera.org is deprecated in favor of Apache Sqoop mailing list sqoop-user@incubator.apache.org. Please subscribe to it by sending an email to incubator-sqoop-user-subscribe@apache.org.

Re: [sqoop-user] Sqoop doesn't finish its output of the job on console.

Posted by alo alt <wg...@googlemail.com>.
Hi Yeonki,

What would be wrong? Sqoop transferred the data and wrote it into the
table t_stations. To check that the data is correct, run a "select *
from t_stations limit 2" and compare the result with your dataset in MySQL.
If Sqoop reports "Time taken ...", everything went fine. The WARN messages
are only hints about how you can optimize your sqoop command.

- Alex


-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.