Posted to user@sqoop.apache.org by Tushar Sudake <et...@gmail.com> on 2012/05/17 13:59:30 UTC

Sqoop export using non-newline row delimiter

Hi,

I have some data in text file on HDFS and want to export this data into
MySQL database.
But I want Sqoop to use "|" as the record delimiter instead of the default
"\n".

So I am specifying the --input-lines-terminated-by "|" option in my Sqoop
export command.

The export succeeds, but the number of exported records shown is only 1.
And when I check the target table in MySQL, I see only one record.

Looks like only the record before the first "|" is getting exported.

Here's sample data on HDFS:

1,Hello|2,How|3,Are|4,You|5,I|6,am|7,fine|
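This would be consistent with the export job reading the whole file as a
single record: Hadoop's line-based record reader splits on "\n", and the
sample file contains none. A quick local check (sample.txt here is a
hypothetical local copy of the HDFS file):

```shell
# Write a local copy of the sample record data (no trailing newline).
printf '1,Hello|2,How|3,Are|4,You|5,I|6,am|7,fine|' > sample.txt

# Count newline characters in the file. The count is 0, which matches
# the "Map input records=1" counter in the job log below.
tr -cd '\n' < sample.txt | wc -c
```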

Sqoop Export command:

bin/sqoop export --connect 'jdbc:mysql://localhost/mydb' -password pwd
--username usr --table mytable --export-dir data
--input-lines-terminated-by "|"

Console logs:

12/05/17 03:32:02 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
12/05/17 03:32:02 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
12/05/17 03:32:02 INFO tool.CodeGenTool: Beginning code generation
12/05/17 03:32:02 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM `mytable` AS t LIMIT 1
12/05/17 03:32:03 INFO orm.CompilationManager: HADOOP_HOME is
/home/tushar/hadoop-0.20.2-cdh3u4
Note:
/tmp/sqoop-tushar/compile/fd6d3bfd4c2ed7f2e19a2de418993dfc/mytable.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/05/17 03:32:04 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-tushar/compile/fd6d3bfd4c2ed7f2e19a2de418993dfc/mytable.jar
12/05/17 03:32:04 INFO mapreduce.ExportJobBase: Beginning export of mytable
12/05/17 03:32:06 INFO input.FileInputFormat: Total input paths to process
: 1
12/05/17 03:32:06 INFO input.FileInputFormat: Total input paths to process
: 1
12/05/17 03:32:06 INFO mapred.JobClient: Running job: job_201205110542_0432
12/05/17 03:32:07 INFO mapred.JobClient:  map 0% reduce 0%
12/05/17 03:32:13 INFO mapred.JobClient:  map 100% reduce 0%
12/05/17 03:32:14 INFO mapred.JobClient: Job complete: job_201205110542_0432
12/05/17 03:32:14 INFO mapred.JobClient: Counters: 16
12/05/17 03:32:14 INFO mapred.JobClient:   Job Counters
12/05/17 03:32:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6685
12/05/17 03:32:14 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
12/05/17 03:32:14 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
12/05/17 03:32:14 INFO mapred.JobClient:     Launched map tasks=1
12/05/17 03:32:14 INFO mapred.JobClient:     Data-local map tasks=1
12/05/17 03:32:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/05/17 03:32:14 INFO mapred.JobClient:   FileSystemCounters
12/05/17 03:32:14 INFO mapred.JobClient:     HDFS_BYTES_READ=166
12/05/17 03:32:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=79082
12/05/17 03:32:14 INFO mapred.JobClient:   Map-Reduce Framework
12/05/17 03:32:14 INFO mapred.JobClient:     Map input records=1
12/05/17 03:32:14 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=68677632
12/05/17 03:32:14 INFO mapred.JobClient:     Spilled Records=0
12/05/17 03:32:14 INFO mapred.JobClient:     CPU time spent (ms)=1130
12/05/17 03:32:14 INFO mapred.JobClient:     Total committed heap usage
(bytes)=39911424
12/05/17 03:32:14 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=392290304
12/05/17 03:32:14 INFO mapred.JobClient:     Map output records=1
12/05/17 03:32:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=117
12/05/17 03:32:14 INFO mapreduce.ExportJobBase: Transferred 166 bytes in
9.6013 seconds (17.2893 bytes/sec)
12/05/17 03:32:14 INFO mapreduce.ExportJobBase: Exported 1 records.

On MySQL side:

mysql> select * from mytable;
+------+-------+
| i    | name  |
+------+-------+
|    1 | Hello |
+------+-------+
1 row in set (0.00 sec)

Sqoop version is: sqoop-1.4.1-incubating__hadoop-1.0.0
Hadoop Version: CDH3u4

Doesn't Sqoop support any record delimiter other than "\n", or am I missing
something?
Please suggest a solution.

Thanks,
Tushar

Re: Sqoop export using non-newline row delimiter

Posted by Tushar Sudake <et...@gmail.com>.
Hi,

Found that Cloudera Sqoop has an open issue that doesn't allow any record
delimiter other than "\n":
https://issues.cloudera.org/browse/SQOOP-136

Could anyone please confirm whether this issue has been moved to, or fixed
in, Apache Sqoop after incubation?
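In the meantime, one possible workaround (a sketch, not a confirmed fix: it
assumes plain-text data where "|" never appears inside a field value) is to
rewrite the "|" delimiter to "\n" before running the export, e.g. with
`hadoop fs -cat data/part-* | tr '|' '\n' | hadoop fs -put - data_nl/part-00000`
and then exporting data_nl with the default line delimiter. A local
demonstration of the tr step, using a hypothetical sample file:

```shell
# Hypothetical local file standing in for the HDFS input.
printf '1,Hello|2,How|3,Are|' > sample.txt

# Replace the "|" record delimiter with "\n" so a line-based record
# reader sees one record per line.
tr '|' '\n' < sample.txt > sample_nl.txt

# Show the converted records, one per line.
cat sample_nl.txt
```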

Thanks,
Tushar
