Posted to dev@sqoop.apache.org by "Joel Sciandra (JIRA)" <ji...@apache.org> on 2012/10/05 20:36:02 UTC

[jira] [Created] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field

Joel Sciandra created SQOOP-622:
-----------------------------------

             Summary: possible import bug with embedded LF (0x0A) in VARCHAR field
                 Key: SQOOP-622
                 URL: https://issues.apache.org/jira/browse/SQOOP-622
             Project: Sqoop
          Issue Type: Bug
          Components: connectors
    Affects Versions: 1.4.1-incubating
         Environment: CentOS 5.8
            Reporter: Joel Sciandra
            Priority: Minor


Given the command:
sqoop import --connect jdbc:oracle:thin:@//somecomputer.com:2115/bla --username USER_SELECT --password itssecret --target-dir /user/me/sqoop --table PROD2.XXX_TRANS --fields-terminated-by '\0x7C' --enclosed-by '\0x60'

I have a REMARKS field defined as VARCHAR2(4000). It backs a comments text box on a web site. Sometimes customers hit <CR>, and the line break gets embedded in the REMARKS field.
When such a row is processed, Sqoop appears to act on the contents of the field instead of just writing the whole value within the enclosed-by characters.

grep 53159612 part-m-00000
`53159612`|`53159611`|`anapi`|`OWENS TEGRA`|`USPS=8101 LEPRECHAUN WAY
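
To illustrate the symptom, here is a minimal Python sketch of how a line-oriented reader sees an enclosed field that contains an LF, compared with a parser that honours the enclosing character. The record is made up (same '|' separator and '`' enclosure as the command above); the sample values and variable names are assumptions for illustration, not data from the import.

```python
import csv
import io

# Hypothetical record: '|' separates fields, '`' encloses them,
# and the second field ends with an embedded LF.
raw = "`1001`|`8101 EXAMPLE WAY\n`|`null`\n"

# A line-oriented reader (grep, for example) splits at every line break,
# so the single record shows up as two physical lines.
physical_lines = raw.splitlines()
print(physical_lines)

# A parser that respects the enclosing character keeps the LF inside the field
# and returns one record with three fields.
rows = list(csv.reader(io.StringIO(raw), delimiter="|", quotechar="`"))
print(rows)
```

The enclosure is only useful to readers that actually track it; anything that treats LF as the record terminator will still see a truncated line followed by a fragment.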


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field

Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470550#comment-13470550 ] 

Joel Sciandra commented on SQOOP-622:
-------------------------------------

Actually, the LF is written to the output file, which causes problems when processing it.
I found the line in question, and the next line starts with:
`|`null`|.

So I am not sure whether it could be fixed with --hive-drop-import-delims
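
One way to locate records that were split like this is to count the enclosing characters on each physical line: a complete record has an even number of them, so an odd count suggests a field that was opened or closed on another line. The sketch below assumes the enclosing character never appears inside the data itself; the function name and the sample text are hypothetical.

```python
def find_unbalanced_lines(text, enclosed_by="`"):
    """Return 1-based numbers of physical lines with an odd count of enclosing characters."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line.count(enclosed_by) % 2 == 1
    ]


# Hypothetical output: one intact record, then one record split by an embedded LF.
sample = "`1000`|`ok`|`null`\n`1001`|`8101 EXAMPLE WAY\n`|`null`\n"
print(find_unbalanced_lines(sample))
```

Both halves of the split record are reported, because each half holds an odd number of enclosing characters.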
                
> possible import bug with embedded LF (0x0A) in VARCHAR field
> ------------------------------------------------------------
>
>              Labels: import, oracle


[jira] [Resolved] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field

Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Sciandra resolved SQOOP-622.
---------------------------------

       Resolution: Invalid
    Fix Version/s: 1.4.1-incubating

Adding --hive-drop-import-delims solves my issue.
                


[jira] [Commented] (SQOOP-622) possible import bug with embedded LF (0x0A) in VARCHAR field

Posted by "Joel Sciandra (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472471#comment-13472471 ] 

Joel Sciandra commented on SQOOP-622:
-------------------------------------

Sorry. Fixed by --fields-terminated-by '\0x7C' --enclosed-by '\0x60' --hive-drop-import-delims
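
For readers unfamiliar with the option: --hive-drop-import-delims is documented as dropping \n, \r and \01 from string fields during import. The Python sketch below imitates that effect on a made-up record so the result can be seen next to the delimiter settings from the command; it is an illustration under those assumptions, not Sqoop's implementation, and the function names and sample values are hypothetical.

```python
# Characters the option is documented to drop from string fields.
HIVE_DELIMS = ("\n", "\r", "\x01")


def drop_hive_delims(value):
    """Remove LF, CR and \\x01 from a string value."""
    for ch in HIVE_DELIMS:
        value = value.replace(ch, "")
    return value


def format_record(fields, fields_terminated_by="|", enclosed_by="`"):
    """Render one record using the separator and enclosure from the thread's command."""
    cleaned = [drop_hive_delims("null" if f is None else str(f)) for f in fields]
    return fields_terminated_by.join(enclosed_by + f + enclosed_by for f in cleaned)


# Hypothetical row whose second column ends with an embedded LF.
line = format_record(["1001", "8101 EXAMPLE WAY\n", None])
print(line)
```

With the line break removed before the record is written, each record occupies exactly one physical line, at the cost of losing the original line breaks in the text.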
                
