You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "Vlad MARIN (JIRA)" <ji...@apache.org> on 2016/01/05 19:47:39 UTC
[jira] [Created] (SQOOP-2777) Sqoop 1.4.6 import failure on special SQL Server column names

Vlad MARIN created SQOOP-2777:
---------------------------------

             Summary: Sqoop 1.4.6 import failure on special SQL Server column names
                 Key: SQOOP-2777
                 URL: https://issues.apache.org/jira/browse/SQOOP-2777
             Project: Sqoop
          Issue Type: Bug
          Components: codegen, connectors/sqlserver
    Affects Versions: 1.4.6
         Environment: Hortonworks Data Platform v2.3.0.0 (build 2557)
RHEL v6.7
Microsoft JDBC driver v4.1
            Reporter: Vlad MARIN


I'm using Sqoop to import various tables from an MS SQL Server instance. The majority of imports are fine but a couple of tables fail to import due to what apparently is the specific naming of some table columns. Such columns are actually named "group", "merge", "order", "from", etc.

It looks like when executing the final Java statement, SQL Server thinks those are actual SQL keywords instead of column names, and hence tries to execute commands like "group by" or "order by". Or at least so it seams...

Eliminating those specific columns from the Sqoop import job ends up in a successful table import. But of course, I need those columns as well and changing their name in SQL Server is not an option (production DB with thousands of users and many other live applications connected to it constantly).

Sqoop command sample:
{noformat}
sqoop import 
--connect 'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
--table TABLE 
--columns IdGGD,IdGGM,IdImage,LinkName,Invert,Merge,Mirror,Order,GGMXTId,GGMXTName,GGMXTP,MarkerP,MarkerW,MarkerH,MarkerX1,MarkerY1,Instruction 
--direct 
--outdir /some/local/directory 
--as-textfile 
--target-dir /some/HDFS/directory 
--null-string NULL 
--null-non-string NULL 
--append
{noformat}

The obtained error looks like this:
{noformat}
2016-01-05 18:55:42,469 INFO  - [main:] ~  map 0% reduce 0% (Job:1367)
2016-01-05 18:55:47,531 INFO  - [main:] ~ Task Id : attempt_1451990134058_0040_m_000000_0, Status : FAILED (Job:1406)
Error: java.io.IOException: SQLException in nextKeyValue
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'Merge'.
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1635)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:426)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:372)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:6276)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1793)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:184)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:159)
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:284)
        at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
        ... 12 more
{noformat}

One workaround that I found is to use the Sqoop --query option instead of the --table one. But even in this case I can only import all the columns of the specific table, without being able to select specific columns. 

Sqoop comand sample:
{noformat}
sqoop import 
--connect 'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
--query 'SELECT * FROM TABLE WHERE $CONDITIONS' 
--num-mappers 1 
--direct 
--outdir /some/local/directory 
--as-textfile 
--target-dir /some/HDFS/directory 
--null-string NULL 
--null-non-string NULL 
--append
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)