Posted to dev@sqoop.apache.org by "Marcus Truscello (JIRA)" <ji...@apache.org> on 2015/12/18 16:12:46 UTC
[jira] [Created] (SQOOP-2750) Support --fields-terminated-by value greater than 127 when using --hive-import
Marcus Truscello created SQOOP-2750:
---------------------------------------
Summary: Support --fields-terminated-by value greater than 127 when using --hive-import
Key: SQOOP-2750
URL: https://issues.apache.org/jira/browse/SQOOP-2750
Project: Sqoop
Issue Type: Improvement
Components: hive-integration
Affects Versions: 1.99.6
Reporter: Marcus Truscello
Priority: Minor
Using a {{fields-terminated-by}} value greater than 127 builds a file with the correct delimiter but causes an exception when combined with {{hive-import}}. The relevant code is in {{src/java/org/apache/sqoop/hive/TableDefWriter.java}}:
https://github.com/apache/sqoop/blob/f19e2a523579db8c28a96febfd3cf35a5d58adc6/src/java/org/apache/sqoop/hive/TableDefWriter.java#L278-L300
The assumption in that code is only half true. Hive only supports delimiters up to 127 in *octal* form, but it also supports delimiters up to 255 in signed character form (two's complement).
For example, a {{fields-terminated-by}} value of {{'\0376'}} (decimal 254) is valid for Sqoop, but when used in a Hive table definition it should be converted to {{'-2'}} (with single quotes).
I suggest rejecting delimiters over 255, converting delimiters over 127 to two's complement signed characters, and leaving delimiters at or below 127 as octal.
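
A minimal sketch of the suggested conversion, not the actual Sqoop code. The class name {{HiveDelimiterSketch}} and method {{toHiveDelimiter}} are hypothetical, rejecting negative values is an added assumption, and the method returns the text without the surrounding single quotes that a Hive table definition would need:

```java
// Illustrative only: hypothetical helper, not part of Sqoop.
final class HiveDelimiterSketch {
    private HiveDelimiterSketch() { }

    /**
     * Maps a delimiter code point (0..255) to the text placed between
     * single quotes in a Hive table definition.
     */
    static String toHiveDelimiter(int charNum) {
        // Reject anything that does not fit in one byte.
        // (Rejecting negatives is an assumption; the issue only mentions > 255.)
        if (charNum < 0 || charNum > 255) {
            throw new IllegalArgumentException(
                "Character " + charNum + " is an out-of-range delimiter");
        }
        if (charNum > 127) {
            // The narrowing cast reinterprets 128..255 as the two's
            // complement signed bytes -128..-1.
            return Integer.toString((byte) charNum);
        }
        // 0..127 stay in three-digit octal escape form, e.g. \001.
        return String.format("\\%03o", charNum);
    }

    public static void main(String[] args) {
        System.out.println(toHiveDelimiter(254)); // value from the example above
        System.out.println(toHiveDelimiter(1));
    }
}
```

With this sketch, 254 maps to {{-2}} as in the example above, and values at or below 127 keep the octal form.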
(Work estimate inflated to account for the number of tests that may need to be modified.)