Posted to issues@hbase.apache.org by "Harsh J Chouraria (JIRA)" <ji...@apache.org> on 2011/03/11 05:20:59 UTC
[jira] Created: (HBASE-3623) Allow non-XML representable separator
characters in the ImportTSV tool
Allow non-XML representable separator characters in the ImportTSV tool
----------------------------------------------------------------------
Key: HBASE-3623
URL: https://issues.apache.org/jira/browse/HBASE-3623
Project: HBase
Issue Type: Improvement
Components: mapreduce
Affects Versions: 0.90.1
Environment: Cloudera Hadoop/HBase (3B4)
Reporter: Harsh J Chouraria
Fix For: 0.92.0
Attachments: hbase.importtsv.xml.friendly.r1.diff
The current importtsv functionality does not work if the separator is a character that cannot be represented in XML (for example, the escape character \u001b, which is fairly commonly used).
{code}
-Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
{code}
While this is a limitation of the Configuration class being serialized as XML, it can be circumvented by applying a suitable encoding that makes the string XML-compatible.
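To illustrate why such a separator is a problem, here is a minimal sketch that checks a string against the XML 1.0 "Char" production. The class and method names (XmlCharCheck, isXmlChar, isXmlRepresentable) are hypothetical and not part of HBase or Hadoop; this only shows which characters XML 1.0 permits, not how Configuration actually fails.

```java
final class XmlCharCheck {
    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in the string is a legal XML 1.0 character. */
    static boolean isXmlRepresentable(String s) {
        return s.codePoints().allMatch(XmlCharCheck::isXmlChar);
    }

    public static void main(String[] args) {
        // Tab (the usual TSV separator) is allowed in XML 1.0.
        System.out.println(isXmlRepresentable("\t"));
        // ESC (0x1B) is a control character outside the allowed ranges.
        System.out.println(isXmlRepresentable("\u001b"));
    }
}
```

Under this check, a tab separator passes while the escape character does not, which matches the behaviour described above.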
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3623) Allow non-XML representable
separator characters in the ImportTSV tool
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005588#comment-13005588 ]
Hudson commented on HBASE-3623:
-------------------------------
Integrated in HBase-TRUNK #1781 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1781/])
> Allow non-XML representable separator characters in the ImportTSV tool
> ----------------------------------------------------------------------
>
> Key: HBASE-3623
> URL: https://issues.apache.org/jira/browse/HBASE-3623
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 0.90.1
> Environment: Cloudera Hadoop/HBase (3B4)
> Reporter: Harsh J Chouraria
> Labels: import
> Fix For: 0.92.0
>
> Attachments: hbase.importtsv.xml.friendly.r1.diff
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The current importtsv functionality does not work if the separator is a character that cannot be represented in XML (for example, the escape character \u001b, which is fairly commonly used).
> {code}
> -Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
> {code}
> While this is a limitation of the Configuration class being serialized as XML, it can be circumvented by applying a suitable encoding that makes the string XML-compatible.
[jira] Commented: (HBASE-3623) Allow non-XML representable
separator characters in the ImportTSV tool
Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005504#comment-13005504 ]
Harsh J Chouraria commented on HBASE-3623:
------------------------------------------
Whoa. That was fast!
[jira] Updated: (HBASE-3623) Allow non-XML representable separator
characters in the ImportTSV tool
Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J Chouraria updated HBASE-3623:
-------------------------------------
Attachment: hbase.importtsv.xml.friendly.r1.diff
I've attached a patch (against trunk/) that uses Base64 encoding to achieve this.
Perhaps this can be back-ported too; it greatly helps imports in scenarios where one would otherwise have to translate the files (with tr, etc.) before using this tool.
The existing ImportTSV test case passes, and I have added a new one for the importtsv mapper, which previously had no test at all.
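The Base64 idea can be sketched roughly as follows. This is not the attached patch: it uses java.util.Base64 from the JDK as a stand-in, and the names SeparatorCodec, encodeSeparator and decodeSeparator are hypothetical. The assumption is that the submitter encodes the separator before storing it in the job configuration and the mapper decodes it again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SeparatorCodec {
    /** Encode the raw separator so it is safe to store in an XML-serialized config. */
    static String encodeSeparator(String separator) {
        byte[] raw = separator.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Recover the raw separator on the mapper side. */
    static String decodeSeparator(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encodeSeparator("\u001b");
        // The encoded form contains only Base64 alphabet characters.
        System.out.println(encoded); // prints "Gw=="
        // Round trip gives back the original escape character.
        System.out.println(decodeSeparator(encoded).equals("\u001b")); // prints "true"
    }
}
```

Because the Base64 alphabet consists only of letters, digits, '+', '/' and '=', the encoded value survives XML serialization regardless of what the original separator was.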
[jira] Resolved: (HBASE-3623) Allow non-XML representable separator
characters in the ImportTSV tool
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-3623.
--------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Thank you for the patch, Harsh. I applied it to trunk and branch.