Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2009/02/05 21:21:59 UTC

[jira] Commented: (HIVE-136) SerDe should escape some special characters

    [ https://issues.apache.org/jira/browse/HIVE-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670887#action_12670887 ] 

Joydeep Sen Sarma commented on HIVE-136:
----------------------------------------

Sorry for dropping the ball on this.

The case that concerned me was transform. Say we have an input file, produced outside Hive, that contains the literal text "\005". As I understand it, the proposal is to convert that into the character with code point 5. I am uncomfortable with this. I would rather we pass this data through unchanged and let the user deal with it, either in the transform script or via an explicit UDF.

The same concern applies to \0. Say I dump a directory listing from a Windows machine: a "\0" fragment there comes from a filename that begins with the character '0'. If we instead unescape it to character 0 (NUL) and pass it to the script, this kind of data becomes difficult to analyze. I am also not sure how different languages handle a null character; some (like C) might drop everything after it altogether.
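
To make the concern concrete, here is a minimal Python sketch of an "aggressive" unescaper. It assumes the unescaping turns a backslash followed by one to three octal digits into the corresponding code point; the function names `aggressive_unescape` and `c_string_view` are hypothetical and not Hive APIs. It shows "\005" becoming chr(5), and a Windows path fragment turning into a NUL that a C-style reader then truncates.

```python
import re

# Hypothetical "aggressive" unescaper, for illustration only: a backslash
# followed by 1-3 octal digits becomes the character with that code point.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def aggressive_unescape(text: str) -> str:
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


def c_string_view(text: str) -> str:
    """What a C program sees if it treats the value as a NUL-terminated string."""
    return text.split("\x00", 1)[0]


print(repr(aggressive_unescape(r"a\005b")))  # 'a\x05b'

# Hypothetical line from a Windows directory listing: the file "0report.txt".
raw = r"C:\logs\0report.txt"
unescaped = aggressive_unescape(raw)
print(repr(unescaped))           # 'C:\\logs\x00report.txt'
print(c_string_view(unescaped))  # C:\logs  (the filename is gone)
```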

So my vote would be to keep unescaping to the bare minimum required and to offer separate functions for any enhanced semantics.
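
A minimal Python sketch of that "bare minimum" approach. It assumes the SerDe itself only ever writes two escape sequences, \n for a newline and \\ for a literal backslash; that set is an assumption, not something settled in this thread. Every other backslash sequence passes through untouched, and octal decoding happens only through a separate function the user calls explicitly, standing in for the kind of UDF mentioned above. Both function names are hypothetical.

```python
import re

# ASSUMPTION: the only sequences the SerDe itself writes are "\n" (newline)
# and "\\" (literal backslash). Nothing else is treated as an escape.
_MINIMAL_ESCAPES = {"n": "\n", "\\": "\\"}

# Used only by the opt-in decoder below: backslash + 1-3 octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def minimal_unescape(text: str) -> str:
    """Reverse only the assumed SerDe escapes; pass everything else through."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in _MINIMAL_ESCAPES:
            out.append(_MINIMAL_ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def unescape_octal(text: str) -> str:
    """Explicit, opt-in decoding of octal escapes such as "\\005" -> chr(5)."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


line = r"C:\logs\0report.txt"
print(minimal_unescape(line))      # C:\logs\0report.txt (unchanged)
print(repr(unescape_octal(line)))  # only on request: 'C:\\logs\x00report.txt'
```

Even this small set is ambiguous for raw data written outside Hive (a path such as C:\new contains \n), which argues for keeping the set as small as possible.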

> SerDe should escape some special characters
> -------------------------------------------
>
>                 Key: HIVE-136
>                 URL: https://issues.apache.org/jira/browse/HIVE-136
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Zheng Shao
>            Priority: Critical
>
> MetadataTypedColumnsetSerDe and DynamicSerDe should escape some special characters like '\n' or the column/item/key separator.
> Otherwise the data will look corrupted.
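
As a rough illustration of the corruption the issue describes, the Python sketch below writes two rows with Ctrl-A (\x01) as the field separator and a newline as the row terminator, once naively and once with a backslash-based escape scheme. The separators, the escape format, and all function names are assumptions for illustration; they are not taken from MetadataTypedColumnsetSerDe or DynamicSerDe.

```python
# ASSUMPTIONS: "\x01" separates fields, "\n" terminates rows, and the
# hypothetical escape scheme prefixes special characters with a backslash.
FIELD_SEP = "\x01"
ROW_SEP = "\n"


def serialize_naive(rows):
    """Join fields and rows without escaping anything."""
    return "".join(FIELD_SEP.join(r) + ROW_SEP for r in rows)


def escape_field(value: str) -> str:
    # Escape the backslash first so the escapes added below stay unambiguous.
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace(FIELD_SEP, "\\" + FIELD_SEP))


def serialize_escaped(rows):
    """Same layout, but every field is escaped before joining."""
    return "".join(FIELD_SEP.join(escape_field(f) for f in r) + ROW_SEP for r in rows)


rows = [["1", "first line\nsecond line"], ["2", "plain"]]
print(len(serialize_naive(rows).splitlines()))    # 3: the embedded newline split row 1
print(len(serialize_escaped(rows).splitlines()))  # 2: one line per row
```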
