Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/01/31 02:18:59 UTC

[jira] Created: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

TCTLSeparatedProtocol should use UTF-8 to decode the data
---------------------------------------------------------

                 Key: HIVE-263
                 URL: https://issues.apache.org/jira/browse/HIVE-263
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao


TCTLSeparatedProtocol currently decodes row data using the JVM's default character encoding. We should decode it as UTF-8 instead, using the Hadoop Text class:

Now:
{code}
          String row = new String(buf, 0, length);
{code}

We want:
{code}
          String row;
          try {
            row = Text.decode(buf, 0, length);
          } catch (CharacterCodingException e) {
            throw new RuntimeException(e);
          }
{code}
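
To illustrate why the explicit decode matters, here is a minimal sketch that uses only the JDK. StrictUtf8Decoder is a hypothetical helper, not part of Hive or Hadoop; it assumes that strict UTF-8 decoding, which reports malformed bytes instead of silently replacing them, is the behavior wanted from Text.decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Text.decode, written against the JDK only.
final class StrictUtf8Decoder {
  private StrictUtf8Decoder() {}

  /**
   * Decodes buf[offset, offset + length) as UTF-8.
   * Malformed input raises CharacterCodingException instead of being
   * replaced with U+FFFD, which is what new String(buf, 0, length) does
   * when the default charset happens to be UTF-8.
   */
  static String decode(byte[] buf, int offset, int length)
      throws CharacterCodingException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(buf, offset, length)).toString();
  }
}
```

Unlike new String(buf, 0, length), the result does not depend on the platform's file.encoding setting, so the same bytes decode the same way on every node.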


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-263:
----------------------------

    Attachment: HIVE-263.2.patch

The new patch also modifies serialization.
I also checked MetadataTypedColumnsetSerDe; that code already uses Text.
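
For the serialization side, the same idea applies in reverse. The sketch below is a JDK-only illustration, not the contents of the patch; StrictUtf8Encoder is a hypothetical name, and it assumes the goal is to always write UTF-8 bytes regardless of the default charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical counterpart to the decode path, written against the JDK only.
final class StrictUtf8Encoder {
  private StrictUtf8Encoder() {}

  /**
   * Encodes a row as UTF-8 bytes. Input that is not valid UTF-16
   * (for example an unpaired surrogate) raises CharacterCodingException
   * instead of being written out as '?'.
   */
  static byte[] encode(String row) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer bytes = encoder.encode(CharBuffer.wrap(row));
    byte[] out = new byte[bytes.remaining()];
    bytes.get(out); // copy only the encoded bytes, not the buffer's spare capacity
    return out;
  }
}
```

Encoding and decoding with the same fixed charset is what makes a written row readable on a machine with a different default encoding.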




[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-263:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)
      Component/s: Serializers/Deserializers



[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-263:
----------------------------

    Attachment: HIVE-263.1.patch



[jira] Commented: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669779#action_12669779 ] 

Joydeep Sen Sarma commented on HIVE-263:
----------------------------------------

I think we also need to encode using Text.

Also, I guess we need to fix MetadataTypedColumnsetSerDe as well?



[jira] Assigned: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-263:
-------------------------------

    Assignee: Zheng Shao



[jira] Commented: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669825#action_12669825 ] 

Joydeep Sen Sarma commented on HIVE-263:
----------------------------------------

+1

The only thing that concerns me is that if any row does not conform to UTF-8, the whole job fails. Earlier we had tried to set things up so that the SerDe throws a SerDeException and the query layer deals with it (for example, by ignoring some fixed percentage of bad rows).

But looking at the code, this might be hard to do right now, so I'm happy to wait for a rewrite for this :-)
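
A rough sketch of the bad-row tolerance idea, for illustration only. TolerantRowDecoder and its maxBadFraction parameter are hypothetical, and the code uses the JDK decoder plus IllegalStateException where a real implementation would presumably use Text.decode and SerDeException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip rows that are not valid UTF-8, up to a fixed fraction.
final class TolerantRowDecoder {
  private TolerantRowDecoder() {}

  /**
   * Decodes every row as strict UTF-8 and drops the rows that fail.
   * Throws IllegalStateException if the number of bad rows exceeds
   * maxBadFraction * rawRows.size().
   */
  static List<String> decodeAll(List<byte[]> rawRows, double maxBadFraction) {
    List<String> rows = new ArrayList<>();
    int bad = 0;
    for (byte[] raw : rawRows) {
      try {
        rows.add(StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(raw))
            .toString());
      } catch (CharacterCodingException e) {
        bad++; // count the bad row and keep going instead of failing the job
      }
    }
    if (bad > maxBadFraction * rawRows.size()) {
      throw new IllegalStateException(
          bad + " of " + rawRows.size() + " rows were not valid UTF-8");
    }
    return rows;
  }
}
```

The threshold is checked once at the end here to keep the sketch short; a streaming job would need to decide when to check it, since the total row count is not known up front.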



[jira] Resolved: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HIVE-263.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.2.0
     Release Note: HIVE-263. TCTLSeparatedProtocol should use UTF-8 to encode/decode the data. (zshao)
     Hadoop Flags: [Reviewed]

Committed revision 740187. (for trunk)
Committed revision 740188. (for branch 0.2)

