Posted to issues@hbase.apache.org by "Harsh J Chouraria (JIRA)" <ji...@apache.org> on 2011/03/11 05:20:59 UTC

[jira] Updated: (HBASE-3623) Allow non-XML representable separator characters in the ImportTSV tool

     [ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated HBASE-3623:
-------------------------------------

    Attachment: hbase.importtsv.xml.friendly.r1.diff

I've attached a patch (against trunk/) that uses Base64 encoding to achieve this.

Perhaps this can be back-ported too; it helps considerably in scenarios where one would otherwise have to translate the files (with tr, etc.) before using this tool.

The existing test case for ImportTSV passes, and I have added a new one for the importtsv mapper, which previously had no test at all.
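
For illustration only, the Base64 idea can be sketched roughly as below. This is not the attached patch: the class and method names (SeparatorCodec, encodeSeparator, decodeSeparator, isXmlSafe) are hypothetical, and java.util.Base64 is used as a stand-in for whichever Base64 utility the patch actually relies on. The sketch only shows that a separator such as \u001b, which is not a legal XML 1.0 character, becomes plain ASCII once encoded and can be recovered unchanged on the other side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SeparatorCodec {

    // Hypothetical helper: encode the separator before it is placed in a
    // configuration that will later be serialized as XML.
    static String encodeSeparator(String separator) {
        return Base64.getEncoder()
                .encodeToString(separator.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: recover the original separator, e.g. in the mapper.
    static String decodeSeparator(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
    }

    // Checks every code point against the XML 1.0 "Char" production.
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(c ->
                c == 0x9 || c == 0xA || c == 0xD
                        || (c >= 0x20 && c <= 0xD7FF)
                        || (c >= 0xE000 && c <= 0xFFFD)
                        || (c >= 0x10000 && c <= 0x10FFFF));
    }

    public static void main(String[] args) {
        String separator = "\u001b";              // the escape character
        String encoded = encodeSeparator(separator);

        System.out.println(isXmlSafe(separator)); // false
        System.out.println(encoded);              // Gw==
        System.out.println(isXmlSafe(encoded));   // true
        System.out.println(decodeSeparator(encoded).equals(separator)); // true
    }
}
```

Since the Base64 alphabet is plain ASCII, the encoded value should survive XML serialization regardless of which byte was chosen as the separator; the cost is that the stored value is no longer human-readable.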

> Allow non-XML representable separator characters in the ImportTSV tool
> ----------------------------------------------------------------------
>
>                 Key: HBASE-3623
>                 URL: https://issues.apache.org/jira/browse/HBASE-3623
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Cloudera Hadoop/HBase (3B4)
>            Reporter: Harsh J Chouraria
>              Labels: import
>             Fix For: 0.92.0
>
>         Attachments: hbase.importtsv.xml.friendly.r1.diff
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The current importtsv functionality will not work if one passes a non-XML representable character as the separator character (say, an escape character - \u001b, fairly common in use).
> {code}
> -Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
> {code}
> While this is a limitation of the Configuration class being serialized as XML, it can be circumvented by applying a suitable encoding that makes the separator string XML-compatible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira