Posted to user@sqoop.apache.org by Rajkumar <ra...@yahoo.co.in> on 2016/03/17 05:25:45 UTC

Sqoop export to SQL Server fails/hangs with a larger number of columns

Hi folks,





I am trying to export data from HDFS to SQL Server. The original table has over 500 columns, and every time I execute the Sqoop export job it gets stuck showing MapReduce completed at 100%. I created the two dummy tables shown below to narrow down where the problem lies. The only difference between Table 1 and Table 2 is that the latter has one additional column [col14 varchar(5)].

First, I ran the export job for Table 1, which has 13 columns [datatype varchar(5)]. The job completed successfully and exported all 3 records to SQL Server.

Next, I executed the export job for Table 2 with 14 columns. When I ran this job I didn't see any error messages or exceptions, but it hangs forever after map completes at 100%. SQL Server Activity Monitor shows a process being created, but it never receives any data or prepared statement from Hadoop.

Does this problem exist only with SQL Server? Is there any limitation on the number of columns that can be exported to SQL Server? Please advise.

Configuration
Hadoop Version – Cloudera 2.6.0-CDH-5.5.2
Sqoop Version – 1.4.6
SQL Server Version – 2008 R2

Table 1

CREATE TABLE [dbo].[tbldummy1](
      [col1] [varchar](5) NOT NULL,
      [col2] [varchar](5) NULL,
      [col3] [varchar](5) NULL,
      [col4] [varchar](5) NULL,
      [col5] [varchar](5) NULL,
      [col6] [varchar](5) NULL,
      [col7] [varchar](5) NULL,
      [col8] [varchar](5) NULL,
      [col9] [varchar](5) NULL,
      [col10] [varchar](5) NULL,
      [col11] [varchar](5) NULL,
      [col12] [varchar](5) NULL,
      [col13] [varchar](5) NULL,
 CONSTRAINT [PK_dummy1] PRIMARY KEY ([col1] ASC))

Sqoop Command for Table 1

sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy1 \
--export-dir /user/hue/Out1 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose

Input data for Out1

aa|01|02|03|04|05|06|07|08|09|10|11|12
bb|01|02|03|04|05|06|07|08|09|10|11|12
cc|01|02|03|04|05|06|07|08|09|10|11|12

Table 2

CREATE TABLE [dbo].[tbldummy2](
      [col1] [varchar](5) NOT NULL,
      [col2] [varchar](5) NULL,
      [col3] [varchar](5) NULL,
      [col4] [varchar](5) NULL,
      [col5] [varchar](5) NULL,
      [col6] [varchar](5) NULL,
      [col7] [varchar](5) NULL,
      [col8] [varchar](5) NULL,
      [col9] [varchar](5) NULL,
      [col10] [varchar](5) NULL,
      [col11] [varchar](5) NULL,
      [col12] [varchar](5) NULL,
      [col13] [varchar](5) NULL,
      [col14] [varchar](5) NULL,
 CONSTRAINT [PK_dummy2] PRIMARY KEY ([col1] ASC))

Sqoop Command for Table 2

sqoop export \
--connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
--username xxxxxx --password yyyyyy \
--table tbldummy2 \
--export-dir /user/hue/Out2 \
--input-fields-terminated-by '|' \
-m 1 \
--verbose

Input data for Table 2

aa|01|02|03|04|05|06|07|08|09|10|11|12|13
bb|01|02|03|04|05|06|07|08|09|10|11|12|13
cc|01|02|03|04|05|06|07|08|09|10|11|12|13

Console logs for Table 2

16/03/16 23:35:01 INFO mapreduce.Job: Running job: job_1458150283440_0028
16/03/16 23:35:07 INFO mapreduce.Job: Job job_1458150283440_0028 running in uber mode : false
16/03/16 23:35:07 INFO mapreduce.Job:  map 0% reduce 0%
16/03/16 23:35:18 INFO mapreduce.Job:  map 100% reduce 0%

Thanks!
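
A diagnostic variation of the Table 2 command, sketched here as a suggestion rather than something tried in this thread: Sqoop documents the sqoop.export.records.per.statement and sqoop.export.statements.per.transaction properties for tuning how many records go into each generated INSERT statement and how many statements go into each transaction. Shrinking them is not a confirmed fix for the 14-column hang, but it makes any per-statement error on the SQL Server side easier to catch in the task attempt logs.

# Sketch only: same export as above with smaller statements/transactions.
# Whether this resolves the hang is an assumption, not a confirmed fix.
sqoop export \
  -D sqoop.export.records.per.statement=10 \
  -D sqoop.export.statements.per.transaction=1 \
  --connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
  --username xxxxxx --password yyyyyy \
  --table tbldummy2 \
  --export-dir /user/hue/Out2 \
  --input-fields-terminated-by '|' \
  -m 1 \
  --verbose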

   

  

Re: Sqoop export to SQL Server fails/hangs with a larger number of columns

Posted by Ali Thawban <or...@gmail.com>.
What do the hadoop job/attempt logs say?

Kill the job by running
mapred job -kill <job_id>

Get the job *attempt* logs by running
mapred job -logs <job_id> <attempt_id>
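
If log aggregation is enabled on the cluster (CDH 5.x, as in this thread), the same container logs can also be pulled through YARN once the job has finished or been killed. The application id below is just the job id from the console output with its prefix swapped, shown only as an example.

# Hypothetical example: fetch aggregated logs for the hung export job.
# application_1458150283440_0028 corresponds to job_1458150283440_0028
# from the console output above; substitute your own job id.
yarn logs -applicationId application_1458150283440_0028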

Whenever Sqoop fails because of SQL errors - e.g. if a decimal value being
inserted is too big for the column or a varchar field would be truncated,
it'll appear to hang like this, and you'll find the actual SQL error in the
task attempt logs. IMO this is a huge flaw in Sqoop's export feature and I
hate to use Sqoop for exports because of it. My time-sensitive Hadoop job
queues get jammed by Sqoop exports choking on one bad value in one row.
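
One quick sanity check along those lines (not from this thread, and clearly not the culprit for the two-character dummy data): scan the export files for fields wider than the target varchar(5) columns before blaming the connector, since a would-be truncation is exactly the kind of silent failure described above.

# Illustrative check only: flag any '|'-delimited field longer than 5
# characters in the Table 2 export directory. The path and width come
# from the thread; substitute your real table's limits.
hdfs dfs -cat /user/hue/Out2/* | awk -F'|' '{ for (i = 1; i <= NF; i++) if (length($i) > 5) print "row " NR ", field " i ": " $i }'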

Unless you really need a clustered export to SQL for this, i.e. you're
exporting so many rows that you're exhausting a single *client* machine's
CPU/RAM, you might want to avoid the export feature. I've found it too
immature to be useful, at least in Sqoop1.

Good luck.

On Thu, Mar 17, 2016 at 12:25 AM, Rajkumar <ra...@yahoo.co.in> wrote:

> Hi folks,
>
>
>
>
>
> I am trying to export data from HDFS to SQL Server. The original table has
> over 500 columns and every time I execute Sqoop export job it gets stuck
> showing mapreduce completed at 100%. I created two dummy tables as shown
> below to find out where the exact problem persists.  The only difference
> between table1 and table2 is that the latter has one additional column
> [col14 varchar(5)]
>
> First, I ran export job for Table1 which has 13 columns [datatype varchar
> (5)]. The job completed successfully and exported all the 3 records to the
> SQL Server.
>
> Next, I executed the export job for Table2 with 14 columns. When I ran
> this job, I didn’t see any error messages/exceptions, but it hangs forever
> after map completed at 100%. The SQL Server Activity Monitor shows a
> process is being created however it’s not receiving any data/prepared
> statement from Hadoop.
>
> Does this problem exist only with SQL Server? Is there any limitation on
> the number of columns exported to SQL Server? Please advise.
>
> *Configuration*
> Hadoop Version – Cloudera 2.6.0-CDH-5.5.2
> Sqoop Version – 1.4.6
> SQL Server Version – 2008 R2
>
> *Table 1 *
>
> CREATE TABLE [dbo].[tbldummy1](
>       [col1] [varchar] (5) NOT NULL,
>       [col2] [varchar](5) NULL,
>       [col3] [varchar](5) NULL,
>       [col4] [varchar](5) NULL,
>       [col5] [varchar](5) NULL,
>       [col6] [varchar](5) NULL,
>       [col7] [varchar](5) NULL,
>       [col8] [varchar](5) NULL,
>       [col9] [varchar](5) NULL,
>       [col10] [varchar](5) NULL,
>       [col11] [varchar](5) NULL,
>       [col12] [varchar](5) NULL,
>       [col13] [varchar](5) NULL,
>  CONSTRAINT [PK_dummy1] PRIMARY KEY ([col1] ASC))
>
> *Sqoop Command for Table 1*
>
> sqoop export \
> --connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
> --username xxxxxx --password yyyyyy \
> --table tbldummy1 \
> --export-dir /user/hue/Out1 \
> --input-fields-terminated-by '|' \
> -m 1 \
> --verbose
>
> *Input data for Out1*
>
> aa|01|02|03|04|05|06|07|08|09|10|11|12
> bb|01|02|03|04|05|06|07|08|09|10|11|12
> cc|01|02|03|04|05|06|07|08|09|10|11|12
>
>
> *Table 2 *
>
> CREATE TABLE [dbo].[tbldummy2](
>       [col1] [varchar] (5) NOT NULL,
>       [col2] [varchar](5) NULL,
>       [col3] [varchar](5) NULL,
>       [col4] [varchar](5) NULL,
>       [col5] [varchar](5) NULL,
>       [col6] [varchar](5) NULL,
>       [col7] [varchar](5) NULL,
>       [col8] [varchar](5) NULL,
>       [col9] [varchar](5) NULL,
>       [col10] [varchar](5) NULL,
>       [col11] [varchar](5) NULL,
>       [col12] [varchar](5) NULL,
>       [col13] [varchar](5) NULL,
>       [col14] [varchar](5) NULL,
>  CONSTRAINT [PK_dummy2] PRIMARY KEY ([col1] ASC))
>
> *Sqoop Command for Table 2*
>
> sqoop export \
> --connect "jdbc:sqlserver://x.x.x.x:port;database=xxxxxxx" \
> --username xxxxxx --password yyyyyy \
> --table tbldummy2 \
> --export-dir /user/hue/Out2 \
> --input-fields-terminated-by '|' \
> -m 1 \
> --verbose
>
> *Input data for Table 2*
>
> aa|01|02|03|04|05|06|07|08|09|10|11|12|13
> bb|01|02|03|04|05|06|07|08|09|10|11|12|13
> cc|01|02|03|04|05|06|07|08|09|10|11|12|13
>
>                 *Console logs for Table 2*
>
> 16/03/16 23:35:01 INFO mapreduce.Job: Running job: job_1458150283440_0028
> 16/03/16 23:35:07 INFO mapreduce.Job: Job job_1458150283440_0028 running
> in uber mode : false
> 16/03/16 23:35:07 INFO mapreduce.Job:  map 0% reduce 0%
> 16/03/16 23:35:18 INFO mapreduce.Job:  map 100% reduce 0%
>
>  Thanks!
>
>
>
>
>