Posted to dev@sqoop.apache.org by "Joel Sciandra (JIRA)" <ji...@apache.org> on 2012/10/05 20:36:02 UTC
[jira] [Created] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field
Joel Sciandra created SQOOP-622:
-----------------------------------
Summary: possible import bug with embedded LF (0x0A) in VARCHAR field
Key: SQOOP-622
URL: https://issues.apache.org/jira/browse/SQOOP-622
Project: Sqoop
Issue Type: Bug
Components: connectors
Affects Versions: 1.4.1-incubating
Environment: CentOS 5.8
Reporter: Joel Sciandra
Priority: Minor
Given the command:
sqoop import --connect jdbc:oracle:thin:@//somecomputer.com:2115/bla --username USER_SELECT --password itssecret --target-dir /user/me/sqoop --table PROD2.XXX_TRANS --fields-terminated-by '\0x7C' --enclosed-by '\0x60'
I have a REMARKS field defined as VARCHAR2(4000). It backs a comments text box on a web site. Sometimes customers hit <CR>, and the line break gets embedded in the REMARKS field.
When such a row is processed, it appears that Sqoop acts on the line break inside the field instead of writing the whole value between the enclosed-by characters, so the record is cut off:
grep 53159612 part-m-00000
`53159612`|`53159611`|`anapi`|`OWENS TEGRA`|`USPS=8101 LEPRECHAUN WAY
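A minimal Python sketch of what the grep output above suggests, assuming the REMARKS value holds a line feed right after the street address. The sample row and the format_record helper are hypothetical; they only imitate the delimiter settings from the command ("|" between fields, a backtick around each field) and are not Sqoop's own code.

```python
FIELD_SEP = "|"   # --fields-terminated-by '\0x7C'
ENCLOSE = "`"     # --enclosed-by '\0x60'


def format_record(fields):
    """Enclose every field and join with the separator, one record per line."""
    return FIELD_SEP.join(f"{ENCLOSE}{value}{ENCLOSE}" for value in fields) + "\n"


# Hypothetical row: the last field ends with an embedded LF (0x0A).
row = ["53159612", "53159611", "anapi", "OWENS TEGRA", "USPS=8101 LEPRECHAUN WAY\n"]
output = format_record(row)

# Line-oriented tools such as grep see two physical lines for this one record.
physical_lines = output.splitlines()
matching = [line for line in physical_lines if "53159612" in line]
```

Here matching holds a single truncated line with no closing backtick, which is the shape of the grep result above. Enclosing a field does not remove the line feed inside it, so any reader that treats LF as the record terminator will still split the record.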
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field
Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470550#comment-13470550 ]
Joel Sciandra commented on SQOOP-622:
-------------------------------------
Actually, the LF ends up in the output file, which causes problems when processing it.
I found the actual line, and the next line starts with:
`|`null`|.
So I am not sure whether it could be fixed with --hive-drop-import-delims.
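One way to locate such split records in a part file is to count the enclosing character on each physical line: an odd count means a field was opened or closed on a different line. This is only a sketch, assuming the backtick never occurs inside the data itself; find_split_lines and the sample lines are hypothetical.

```python
ENCLOSE = "`"  # --enclosed-by '\0x60'


def find_split_lines(lines):
    """Return 1-based numbers of lines with an unbalanced enclosing character."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        if line.count(ENCLOSE) % 2 == 1:
            suspects.append(number)
    return suspects


# Hypothetical sample: line 1 is intact, lines 2 and 3 are two halves of one record.
sample = [
    "`1`|`ok`|`one line`",
    "`53159612`|`53159611`|`anapi`|`OWENS TEGRA`|`USPS=8101 LEPRECHAUN WAY",
    "`|`null`|`x`",
]
split_lines = find_split_lines(sample)
```

For the sample above, split_lines is [2, 3]: the truncated line and its continuation are both flagged, while the intact line is not.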
> possible import bug with embedded LF (0x0A) in VARCHAR field
> ------------------------------------------------------------
>
> Key: SQOOP-622
> URL: https://issues.apache.org/jira/browse/SQOOP-622
> Project: Sqoop
> Issue Type: Bug
> Components: connectors
> Affects Versions: 1.4.1-incubating
> Environment: CentOS 5.8
> Reporter: Joel Sciandra
> Priority: Minor
> Labels: import, oracle
>
> Given the command:
> sqoop import --connect jdbc:oracle:thin:@//somecomputer.com:2115/bla --username USER_SELECT --password itssecret --target-dir /user/me/sqoop --table PROD2.XXX_TRANS --fields-terminated-by '\0x7C' --enclosed-by '\0x60'
> I have a REMARKS field defined as a VARCHAR2(4000). It is for a comments text box on a web site. Sometimes customers hit <CR> and that gets embedded in the remarks field.
> When that gets processed, it appears that SQOOP is responding to the contents of the field instead of just outputting the whole thing within the enclosed-by characters.
> grep 53159612 part-m-00000
> `53159612`|`53159611`|`anapi`|`OWENS TEGRA`|`USPS=8101 LEPRECHAUN WAY
[jira] [Resolved] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field
Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Sciandra resolved SQOOP-622.
---------------------------------
Resolution: Invalid
Fix Version/s: 1.4.1-incubating
--hive-drop-import-delims solves my issue
[jira] [Commented] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field
Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472471#comment-13472471 ]
Joel Sciandra commented on SQOOP-622:
-------------------------------------
Sorry. Fixed by --fields-terminated-by '\0x7C' --enclosed-by '\0x60' --hive-drop-import-delims
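A rough Python illustration of why adding that option helps, assuming it behaves as the Sqoop user guide describes (dropping \n, \r and \01 from string fields). The drop_import_delims and format_record helpers and the sample row, including the "SUITE 2" text, are hypothetical and only mimic the effect; they are not Sqoop's implementation.

```python
import re

FIELD_SEP = "|"   # --fields-terminated-by '\0x7C'
ENCLOSE = "`"     # --enclosed-by '\0x60'

# LF, CR and Ctrl-A (0x01): the characters the option is documented to drop.
HIVE_DELIMS = re.compile("[\n\r\x01]")


def drop_import_delims(value):
    """Remove LF, CR and 0x01 from a string field."""
    return HIVE_DELIMS.sub("", value)


def format_record(fields):
    """Clean each field, enclose it, and emit exactly one line per record."""
    cleaned = (drop_import_delims(value) for value in fields)
    return FIELD_SEP.join(f"{ENCLOSE}{value}{ENCLOSE}" for value in cleaned) + "\n"


# Hypothetical row whose last field contains both an LF and a CR LF pair.
row = ["53159612", "OWENS TEGRA", "USPS=8101 LEPRECHAUN WAY\nSUITE 2\r\n"]
record = format_record(row)
```

With the line breaks dropped, record is a single physical line ending in one LF. Note that the text on either side of a removed line break is joined without a space ("WAYSUITE 2" in this sample), which may matter if the remarks are read by people later.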