Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/01/31 02:18:59 UTC
[jira] Created: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
TCTLSeparatedProtocol should use UTF-8 to decode the data
---------------------------------------------------------
Key: HIVE-263
URL: https://issues.apache.org/jira/browse/HIVE-263
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
TCTLSeparatedProtocol currently decodes row data with the platform default character encoding. We should decode it as UTF-8 using the Hadoop Text class:
Now:
{code}
String row = new String(buf, 0, length);
{code}
We want:
{code}
String row;
try {
  row = Text.decode(buf, 0, length);
} catch (CharacterCodingException e) {
  throw new RuntimeException(e);
}
{code}
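To illustrate the difference without a Hadoop dependency, here is a minimal sketch using only java.nio. It assumes that Text.decode behaves like a strict UTF-8 decoder that reports malformed input; the class and method names (StrictUtf8Decode, decodeRow, isValidUtf8) are hypothetical and not part of Hive or Hadoop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the Text.decode call in the proposal above.
class StrictUtf8Decode {
    // Decode buf[start, start + length) as UTF-8 and fail on malformed bytes.
    static String decodeStrict(byte[] buf, int start, int length)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(buf, start, length)).toString();
    }

    // Same shape as the snippet in the issue: wrap the checked exception.
    static String decodeRow(byte[] buf, int length) {
        try {
            return decodeStrict(buf, 0, length);
        } catch (CharacterCodingException e) {
            throw new RuntimeException(e);
        }
    }

    // Convenience check: true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] buf, int start, int length) {
        try {
            decodeStrict(buf, start, length);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Decoded as UTF-8, the two-byte sequence for U+00E9 becomes one char.
        System.out.println(decodeRow(utf8, utf8.length).length());
        // Decoded with a single-byte charset, the same bytes become two chars.
        String wrong = new String(utf8, 0, utf8.length, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());
    }
}
```

The second println shows what can happen when the platform default encoding is a single-byte charset: the bytes are accepted but the resulting string is wrong, which is the failure mode this issue is about.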
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-263:
----------------------------
Attachment: HIVE-263.2.patch
Modified serialization as well.
Also checked MetadataTypedColumnsetSerDe. That code is already using Text.
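The patch itself is not reproduced in this thread, so the following is only a sketch of the idea of making both directions use an explicit charset. It uses plain JDK calls rather than Text, and the class and method names (Utf8RoundTrip, encodeRow, decodeRow) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: encode and decode with the same explicit charset.
class Utf8RoundTrip {
    // Serialize a row to bytes, independent of the platform default encoding.
    static byte[] encodeRow(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // Deserialize the first `length` bytes as UTF-8.
    // Note: unlike a strict decoder, this constructor replaces malformed
    // input with U+FFFD instead of throwing.
    static String decodeRow(byte[] buf, int length) {
        return new String(buf, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        byte[] bytes = encodeRow(original);
        String back = decodeRow(bytes, bytes.length);
        // The round trip preserves the string because both sides agree on UTF-8.
        System.out.println(original.equals(back));
    }
}
```

If only one side were changed, a writer using the default encoding and a reader using UTF-8 could disagree on any non-ASCII data.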
[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-263:
--------------------------------
Fix Version/s: 0.3.0
(was: 0.6.0)
Component/s: Serializers/Deserializers
[jira] Updated: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-263:
----------------------------
Attachment: HIVE-263.1.patch
[jira] Commented: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669779#action_12669779 ]
Joydeep Sen Sarma commented on HIVE-263:
----------------------------------------
We also need to encode using Text, I think.
Also, MetadataTypedColumnsetSerDe: I guess we need to fix that as well?
[jira] Assigned: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao reassigned HIVE-263:
-------------------------------
Assignee: Zheng Shao
[jira] Commented: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669825#action_12669825 ]
Joydeep Sen Sarma commented on HIVE-263:
----------------------------------------
+1
The only thing that concerns me is that if any row does not conform to UTF-8, the whole job fails. Earlier we had tried to set things up so that the serde throws a SerDeException and we deal with it in the query layer (for example, we could ignore some fixed percentage of bad rows).
But looking at the code, this might be hard to do right now, so I'm happy to wait for a rewrite for this :-)
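The bad-row tolerance described above could look roughly like the sketch below. It is not Hive code: the class BadRowTolerance, its methods, and the choice of IllegalStateException are all hypothetical, and a real implementation would live in the serde and query layers rather than in one helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip rows that are not valid UTF-8, and fail only
// when the fraction of bad rows exceeds a configured limit.
class BadRowTolerance {
    // True if `bad` out of `total` rows is within the allowed fraction.
    static boolean withinTolerance(int bad, int total, double maxBadFraction) {
        return total == 0 || (double) bad / total <= maxBadFraction;
    }

    static List<String> decodeRows(List<byte[]> rows, double maxBadFraction) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> good = new ArrayList<>();
        int bad = 0;
        for (byte[] row : rows) {
            try {
                // decode(ByteBuffer) resets the decoder, so it can be reused.
                good.add(decoder.decode(ByteBuffer.wrap(row)).toString());
            } catch (CharacterCodingException e) {
                bad++; // count the bad row and move on instead of failing
            }
        }
        if (!withinTolerance(bad, rows.size(), maxBadFraction)) {
            throw new IllegalStateException(
                    bad + " of " + rows.size() + " rows were not valid UTF-8");
        }
        return good;
    }

    public static void main(String[] args) {
        List<byte[]> rows = new ArrayList<>();
        rows.add("ok".getBytes(StandardCharsets.UTF_8));
        rows.add(new byte[] {(byte) 0xff}); // 0xFF never appears in valid UTF-8
        // One bad row out of two is within a 50% limit, so one row is returned.
        System.out.println(decodeRows(rows, 0.5).size());
    }
}
```

This version checks the limit only after all rows are seen; a streaming job would more likely check it periodically or against a running count.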
[jira] Resolved: (HIVE-263) TCTLSeparatedProtocol should use UTF-8
to decode the data
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao resolved HIVE-263.
-----------------------------
Resolution: Fixed
Fix Version/s: 0.2.0
Release Note: HIVE-263. TCTLSeparatedProtocol should use UTF-8 to encode/decode the data. (zshao)
Hadoop Flags: [Reviewed]
Committed revision 740187. (for trunk)
Committed revision 740188. (for branch 0.2)