Posted to user@sqoop.apache.org by Marcin Cylke <mc...@touk.pl> on 2012/03/07 15:10:57 UTC

import to hive hangs

Hi

I'm trying to import some tables into Hive. I'm using this kind of CLI 
command:

sqoop import \
--driver oracle.jdbc.driver.OracleDriver \
--connect jdbc:oracle:thin:@localhost:1521:XE \
--username had_test --password abc1234 \
--query "select * from all_tables where \$CONDITIONS" \
--num-mappers 1 \
--hive-import \
--hive-table "sample_rm2" \
--target-dir /user/hive/warehouse/sample_rm2 \
--mysql-delimiters \
--verbose

The output table is created and everything looks good, but sqoop does not 
exit properly. It just hangs indefinitely on the last line (see the 
attached log file).

My environment is:
- Sqoop 1.4.1-incubating
   git commit id
   Compiled by jarcec on Tue Feb  7 20:54:25 CET 2012
- Hadoop 1.0.0
- hive 0.9.0-SNAPSHOT - built from git a few days ago

Is this a known issue?

Thanks for any help
Regards, Marcin

Re: import to hive hangs

Posted by Marcin Cylke <mc...@touk.pl>.
On 09/03/12 16:28, Arvind Prabhakar wrote:
> Hi Marcin,
>
> Thanks a lot for fixing the logging issue. In order for us to commit
> your patch, you would need to attach it to the Jira and grant license
> to it for inclusion. I have created the Jira for you at:
>
> https://issues.apache.org/jira/browse/SQOOP-460

Thanks, I've attached the patch. Mind that it's quick and dirty and is 
intended more as a heads-up - it might require a more general approach.

Marcin


Re: import to hive hangs

Posted by Arvind Prabhakar <ar...@apache.org>.
Hi Marcin,

Thanks a lot for fixing the logging issue. In order for us to commit
your patch, you would need to attach it to the Jira and grant license
to it for inclusion. I have created the Jira for you at:

https://issues.apache.org/jira/browse/SQOOP-460

Thanks,
Arvind Prabhakar

On Fri, Mar 9, 2012 at 5:58 AM, Marcin Cylke <mc...@touk.pl> wrote:
> On 08/03/12 08:29, Marcin Cylke wrote:
>>
>> On 07/03/12 18:01, Jarek Jarcec Cecho wrote:
>>>
>>> It's hard for me to guess what happened. Would you mind providing a log
>>> file with the --verbose flag set?
>>
>>
>> Here - I've attached sqoop verbose log as well as hive job output log
>> (the one present in the sqoop logs)
>
>
> I've realized that adding --verbose to sqoop does not change log category to
> DEBUG.
>
> I've applied this simple patch (sqoop.patch) against trunk and now the
> output of --verbose is a bit longer. Hope this helps.
>
> Marcin
>

Re: import to hive hangs

Posted by Marcin Cylke <mc...@touk.pl>.
On 08/03/12 08:29, Marcin Cylke wrote:
> On 07/03/12 18:01, Jarek Jarcec Cecho wrote:
>> It's hard for me to guess what happened. Would you mind providing a log
>> file with the --verbose flag set?
>
> Here - I've attached sqoop verbose log as well as hive job output log
> (the one present in the sqoop logs)

I've realized that adding --verbose to sqoop does not change log 
category to DEBUG.

I've applied this simple patch (sqoop.patch) against trunk and now the 
output of --verbose is a bit longer. Hope this helps.

Marcin


Re: import to hive hangs

Posted by Marcin Cylke <mc...@touk.pl>.
On 07/03/12 18:01, Jarek Jarcec Cecho wrote:
> It's hard for me to guess what happened. Would you mind providing a log file with the --verbose flag set?

Here - I've attached sqoop verbose log as well as hive job output log 
(the one present in the sqoop logs)

Hope this helps with the problem.

Marcin



Re: [ sqoopusers ] Re: import to hive hangs

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
It's hard for me to guess what happened. Would you mind providing a log file with the --verbose flag set?

Jarcec

On Mar 7, 2012, at 4:54 PM, Marcin Cylke wrote:

> On 07/03/12 15:28, Jarek Jarcec Cecho wrote:
>> I'm not sure whether it's connected to your problem, but sqoop is designed to first import data into a temporary directory on HDFS and then move it to the target location in the Hive warehouse directory using the Hive "LOAD DATA" command. You seem to be importing data directly into the default Hive warehouse directory (--target-dir /user/hive/warehouse/sample_rm2). I would first try changing the --target-dir parameter to simply "--target-dir sample_rm2" (a directory with this name will be created in your HDFS home directory) or any other HDFS path outside /user/hive/warehouse.
> 
> This helped quite a lot. I'd again forgotten about not writing into
> the Hive warehouse directly.
> 
> Now, when I execute it with --target-dir sample_rm, and not the full path,
> sqoop finishes execution. However, Hive is not aware of this table. The
> appropriate directory is in /user/hive/warehouse, but it is not visible via
> 'show tables' in the hive shell, nor can I select any data from it.
> 
> Is there an easy fix for that?
> 
> Marcin
> 


Re: [ sqoopusers ] Re: import to hive hangs

Posted by Marcin Cylke <mc...@touk.pl>.
On 07/03/12 15:28, Jarek Jarcec Cecho wrote:
> I'm not sure whether it's connected to your problem, but sqoop is designed to first import data into a temporary directory on HDFS and then move it to the target location in the Hive warehouse directory using the Hive "LOAD DATA" command. You seem to be importing data directly into the default Hive warehouse directory (--target-dir /user/hive/warehouse/sample_rm2). I would first try changing the --target-dir parameter to simply "--target-dir sample_rm2" (a directory with this name will be created in your HDFS home directory) or any other HDFS path outside /user/hive/warehouse.

This helped quite a lot. I'd again forgotten about not writing into
the Hive warehouse directly.

Now, when I execute it with --target-dir sample_rm, and not the full path,
sqoop finishes execution. However, Hive is not aware of this table. The
appropriate directory is in /user/hive/warehouse, but it is not visible via
'show tables' in the hive shell, nor can I select any data from it.

Is there an easy fix for that?

Marcin


Re: import to hive hangs

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
I'm not sure whether it's connected to your problem, but sqoop is designed to first import data into a temporary directory on HDFS and then move it to the target location in the Hive warehouse directory using the Hive "LOAD DATA" command. You seem to be importing data directly into the default Hive warehouse directory (--target-dir /user/hive/warehouse/sample_rm2). I would first try changing the --target-dir parameter to simply "--target-dir sample_rm2" (a directory with this name will be created in your HDFS home directory) or any other HDFS path outside /user/hive/warehouse.
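
For example, the same import with only the target directory changed (all other options are just copied from your original command, purely for illustration) should let sqoop stage the data outside the warehouse and then load it into Hive itself:

sqoop import \
--driver oracle.jdbc.driver.OracleDriver \
--connect jdbc:oracle:thin:@localhost:1521:XE \
--username had_test --password abc1234 \
--query "select * from all_tables where \$CONDITIONS" \
--num-mappers 1 \
--hive-import \
--hive-table "sample_rm2" \
--target-dir sample_rm2 \
--mysql-delimiters \
--verbose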

Jarcec

On Wed, Mar 07, 2012 at 03:10:57PM +0100, Marcin Cylke wrote:
> Hi
> 
> I'm trying to import some tables into Hive. I'm using this kind of
> CLI command:
> 
> sqoop import \
> --driver oracle.jdbc.driver.OracleDriver \
> --connect jdbc:oracle:thin:@localhost:1521:XE \
> --username had_test --password abc1234 \
> --query "select * from all_tables where \$CONDITIONS" \
> --num-mappers 1 \
> --hive-import \
> --hive-table "sample_rm2" \
> --target-dir /user/hive/warehouse/sample_rm2 \
> --mysql-delimiters \
> --verbose
> 
> The output table is created and everything looks good, but sqoop does
> not exit properly. It just hangs indefinitely on the last line
> (see the attached log file).
> 
> My environment is:
> - Sqoop 1.4.1-incubating
>   git commit id
>   Compiled by jarcec on Tue Feb  7 20:54:25 CET 2012
> - Hadoop 1.0.0
> - hive 0.9.0-SNAPSHOT - built from git a few days ago
> 
> Is this a known issue?
> 
> Thanks for any help
> Regards, Marcin

> 
> 12/02/28 13:54:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 12/02/28 13:54:13 WARN tool.BaseSqoopTool: Hive does not support escape characters in fields;
> 12/02/28 13:54:13 WARN tool.BaseSqoopTool: parse errors in Hive may result from using --escaped-by.
> 12/02/28 13:54:13 WARN tool.BaseSqoopTool: Hive does not support quoted strings; parse errors
> 12/02/28 13:54:13 WARN tool.BaseSqoopTool: in Hive may result from using --enclosed-by.
> 12/02/28 13:54:13 INFO manager.SqlManager: Using default fetchSize of 1000
> 12/02/28 13:54:13 INFO tool.CodeGenTool: Beginning code generation
> 12/02/28 13:54:14 INFO manager.SqlManager: Executing SQL statement: select * from all_tables where (1 = 0)
> 12/02/28 13:54:14 INFO manager.SqlManager: Executing SQL statement: select * from all_tables where (1 = 0)
> 12/02/28 13:54:14 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop-1.0.0/libexec/..
> Note: /tmp/sqoop-m/compile/e30db5359f37c6d21004e22c78085f74/QueryResult.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 12/02/28 13:54:16 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-michler/compile/e30db5359f37c6d21004e22c78085f74/QueryResult.java to /home/michler/./QueryResult.java
> java.io.IOException: Destination '/home/michler/./QueryResult.java' already exists
>         at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
>         at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
>         at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
> 12/02/28 13:54:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-m/compile/e30db5359f37c6d21004e22c78085f74/QueryResult.jar
> 12/02/28 13:54:16 INFO mapreduce.ImportJobBase: Beginning query import.
> 12/02/28 13:54:18 INFO mapred.JobClient: Running job: job_201202281147_0003
> 12/02/28 13:54:19 INFO mapred.JobClient: map 0% reduce 0%
> 12/02/28 13:54:39 INFO mapred.JobClient: map 100% reduce 0%
> 12/02/28 13:54:44 INFO mapred.JobClient: Job complete: job_201202281147_0003
> 12/02/28 13:54:44 INFO mapred.JobClient: Counters: 18
> 12/02/28 13:54:44 INFO mapred.JobClient: Job Counters
> 12/02/28 13:54:44 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18733
> 12/02/28 13:54:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/02/28 13:54:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 12/02/28 13:54:44 INFO mapred.JobClient: Launched map tasks=1
> 12/02/28 13:54:44 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
> 12/02/28 13:54:44 INFO mapred.JobClient: File Output Format Counters
> 12/02/28 13:54:44 INFO mapred.JobClient: Bytes Written=28941
> 12/02/28 13:54:44 INFO mapred.JobClient: FileSystemCounters
> 12/02/28 13:54:44 INFO mapred.JobClient: HDFS_BYTES_READ=87
> 12/02/28 13:54:44 INFO mapred.JobClient: FILE_BYTES_WRITTEN=29027
> 12/02/28 13:54:44 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28941
> 12/02/28 13:54:44 INFO mapred.JobClient: File Input Format Counters
> 12/02/28 13:54:44 INFO mapred.JobClient: Bytes Read=0
> 12/02/28 13:54:44 INFO mapred.JobClient: Map-Reduce Framework
> 12/02/28 13:54:44 INFO mapred.JobClient: Map input records=95
> 12/02/28 13:54:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=112734208
> 12/02/28 13:54:44 INFO mapred.JobClient: Spilled Records=0
> 12/02/28 13:54:44 INFO mapred.JobClient: CPU time spent (ms)=3640
> 12/02/28 13:54:44 INFO mapred.JobClient: Total committed heap usage (bytes)=99221504
> 12/02/28 13:54:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=623640576
> 12/02/28 13:54:44 INFO mapred.JobClient: Map output records=95
> 12/02/28 13:54:44 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
> 12/02/28 13:54:44 INFO mapreduce.ImportJobBase: Transferred 28.2627 KB in 28.1098 seconds (1.0054 KB/sec)
> 12/02/28 13:54:44 INFO mapreduce.ImportJobBase: Retrieved 95 records.
> 12/02/28 13:54:44 INFO hive.HiveImport: Removing temporary files from import process: /user/hive/warehouse/sample_rm2/_logs
> 12/02/28 13:54:44 INFO hive.HiveImport: Loading uploaded data into Hive
> 12/02/28 13:54:44 INFO manager.SqlManager: Executing SQL statement: select * from all_tables where (1 = 0)
> 12/02/28 13:54:44 INFO manager.SqlManager: Executing SQL statement: select * from all_tables where (1 = 0)
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column PCT_FREE had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column PCT_USED had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column INI_TRANS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column MAX_TRANS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column INITIAL_EXTENT had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column NEXT_EXTENT had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column MIN_EXTENTS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column MAX_EXTENTS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column PCT_INCREASE had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column FREELISTS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column FREELIST_GROUPS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column NUM_ROWS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column BLOCKS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column EMPTY_BLOCKS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column AVG_SPACE had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column CHAIN_CNT had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column AVG_ROW_LEN had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column AVG_SPACE_FREELIST_BLOCKS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column NUM_FREELIST_BLOCKS had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column SAMPLE_SIZE had to be cast to a less precise type in Hive
> 12/02/28 13:54:44 WARN hive.TableDefWriter: Column LAST_ANALYZED had to be cast to a less precise type in Hive
> 12/02/28 13:54:45 INFO hive.HiveImport: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> 12/02/28 13:54:46 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/m/hive/lib/hive-common-0.9.0-SNAPSHOT.jar!/hive-log4j.properties
> 12/02/28 13:54:46 INFO hive.HiveImport: Hive history file=/tmp/m/hive_job_log_michler_201202281354_287717854.txt
> 12/02/28 13:54:51 INFO hive.HiveImport: OK
> 12/02/28 13:54:51 INFO hive.HiveImport: Time taken: 4.591 seconds
> 12/02/28 13:54:51 INFO hive.HiveImport: Loading data to table default.sample_rm2