Posted to issues@hbase.apache.org by "Harsh J Chouraria (JIRA)" <ji...@apache.org> on 2011/03/11 05:20:59 UTC

[jira] Created: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

Allow non-XML representable separator characters in the ImportTSV tool
----------------------------------------------------------------------

                 Key: HBASE-3623
                 URL: https://issues.apache.org/jira/browse/HBASE-3623
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
    Affects Versions: 0.90.1
         Environment: Cloudera Hadoop/HBase (3B4)
            Reporter: Harsh J Chouraria
             Fix For: 0.92.0
         Attachments: hbase.importtsv.xml.friendly.r1.diff

The importtsv tool currently fails if the separator passed to it is a character that cannot be represented in XML (for example, the escape character \u001b, which is fairly commonly used).

{code}
-Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
{code}

While this is a limitation of the Configuration class being serialized as XML, it can be circumvented by applying a suitable encoding that makes the string XML-compatible.
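
To illustrate why a separator such as the escape character cannot survive XML serialization, here is a minimal sketch that checks a code point against the Char production of the XML 1.0 specification. The class and method names (XmlCharCheck, isXml10Char) are hypothetical and are not part of HBase or Hadoop.

```java
// Hypothetical helper, not part of HBase or Hadoop.
final class XmlCharCheck {

    /** Returns true if the code point is permitted by the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char(0x09)); // tab, the usual TSV separator: true
        System.out.println(isXml10Char(0x1B)); // escape character: false
    }
}
```

Tab is one of the few control characters XML 1.0 allows, which is presumably why the default separator never hits this problem.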

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005588#comment-13005588 ] 

Hudson commented on HBASE-3623:
-------------------------------

Integrated in HBase-TRUNK #1781 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1781/])
    

> Allow non-XML representable separator characters in the ImportTSV tool
> ----------------------------------------------------------------------
>
>                 Key: HBASE-3623
>                 URL: https://issues.apache.org/jira/browse/HBASE-3623
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Cloudera Hadoop/HBase (3B4)
>            Reporter: Harsh J Chouraria
>              Labels: import
>             Fix For: 0.92.0
>
>         Attachments: hbase.importtsv.xml.friendly.r1.diff
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The current importtsv functionality will not work if one passes a non-XML representable character as the separator character (say, an escape character - \u001b, fairly common in use).
> {code}
> -Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
> {code}
> While this is a limitation with the Configuration class's being serialized as an XML, it can be circumvented by applying a suitable encoding that makes a string XML-compatible.

[jira] Commented: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005504#comment-13005504 ] 

Harsh J Chouraria commented on HBASE-3623:
------------------------------------------

Whoa. That was fast!

[jira] Updated: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated HBASE-3623:
-------------------------------------

    Attachment: hbase.importtsv.xml.friendly.r1.diff

I've attached a patch (against trunk/) that uses Base64 encoding to achieve this.

Perhaps this can be back-ported too. It greatly helps imports in scenarios where one would otherwise have to translate the files (with tr, etc.) before using this tool.

The existing test case for ImportTSV passes, and I have added a new one that tests the importtsv mapper (there was previously no test for it at all).
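
For readers without the attachment at hand, the idea can be sketched roughly as follows. This is not the attached patch: it assumes java.util.Base64 as the codec and uses java.util.Properties as a stand-in for the Hadoop Configuration object, and the SeparatorCodec class is hypothetical. Only the importtsv.separator key comes from the issue itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

// Hypothetical illustration; the real patch may be structured differently.
final class SeparatorCodec {

    static final String SEPARATOR_KEY = "importtsv.separator";

    /** Encodes the separator so that only XML-safe ASCII is stored in the config. */
    static String encode(String separator) {
        return Base64.getEncoder()
                     .encodeToString(separator.getBytes(StandardCharsets.UTF_8));
    }

    /** Recovers the original separator on the task side. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Properties stands in for the job configuration (an assumption).
        Properties conf = new Properties();

        // Submitter side: store the encoded form instead of the raw character.
        conf.setProperty(SEPARATOR_KEY, encode("\u001b"));

        // Mapper side: decode before splitting lines.
        String stored = conf.getProperty(SEPARATOR_KEY);
        String separator = decode(stored);

        System.out.println(stored);                    // prints Gw==
        System.out.println((int) separator.charAt(0)); // prints 27
    }
}
```

Because the Base64 alphabet consists only of letters, digits, '+', '/' and '=', the stored value serializes to XML without trouble whatever the original separator was.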

[jira] Resolved: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3623.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Thank you for the patch, Harsh. I applied it to trunk and branch.
