Posted to common-dev@hadoop.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/09/06 09:02:46 UTC

[jira] Commented: (HADOOP-4085) internationalization support and sort order (ascedning/descending) support in create table

    [ https://issues.apache.org/jira/browse/HADOOP-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628843#action_12628843 ] 

Ashish Thusoo commented on HADOOP-4085:
---------------------------------------

Comments are below. The most significant one concerns how we treat the character set name in the grammar. Ideally this would be an identifier instead of a token (similar to table name identifiers). With that approach we could support any character set very easily.

Inline Comments:
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:85: nitpick - Can we follow the convention of having the opening brace on the same line as the code?
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:781: Instead of having fixed tokens per character set in the grammar, we should define a character-set identifier and pass that across to the Java calls. That is much more scalable and would let us seamlessly support any character set supported by the Java runtime.

 http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html 

has information on the rules that determine a legal character set name, which the grammar can follow, and on how new character sets can be added to the JVM through a CharsetProvider.
So the rule for the character set could look something like

 charSetStringLiteral : charSetIdentifier StringLiteral

where charSetIdentifier can be defined in terms of the rules mentioned in the link above.
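
To illustrate the suggestion, here is a minimal sketch of what the Java side could look like once the grammar passes the character set name through as an identifier. It is not taken from the patch: the class name CharsetLiterals and the methods resolve and decodeHex are hypothetical, and the handling of the leading underscore and the 0x prefix is assumed from the MySQL-style syntax quoted in the issue description. Name validation is left to java.nio.charset.Charset, which rejects illegal or unsupported names by throwing.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the patch under review.
final class CharsetLiterals {

    /**
     * Resolves an introducer such as "_UTF-8" to a JVM Charset.
     * Charset.forName throws IllegalCharsetNameException for a name that
     * breaks the naming rules and UnsupportedCharsetException for a legal
     * name that this JVM does not provide.
     */
    static Charset resolve(String introducer) {
        if (introducer == null || introducer.length() < 2 || introducer.charAt(0) != '_') {
            throw new IllegalArgumentException("Expected _<charset name>, got: " + introducer);
        }
        return Charset.forName(introducer.substring(1));
    }

    /** Decodes a literal of the assumed form 0x<HexValue> using the named character set. */
    static String decodeHex(String introducer, String hexLiteral) {
        Charset charset = resolve(introducer);
        if (hexLiteral == null || !hexLiteral.startsWith("0x") || hexLiteral.length() % 2 != 0) {
            throw new IllegalArgumentException("Expected 0x followed by an even number of hex digits");
        }
        byte[] bytes = new byte[(hexLiteral.length() - 2) / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Two hex digits per byte, starting after the "0x" prefix.
            int start = 2 + 2 * i;
            bytes[i] = (byte) Integer.parseInt(hexLiteral.substring(start, start + 2), 16);
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(resolve("_UTF-8").name());
        System.out.println(decodeHex("_UTF-8", "0x68656c6c6f"));
    }
}
```

With this shape, supporting another character set needs no grammar change: any name the JVM recognizes, including ones registered through a CharsetProvider, resolves through the same call.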

ql/src/test/queries/clientpositive/inputddl4.q:0: Let's put a brief comment in this file describing what it actually tests.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:157: nitpick - maybe we should call this PREFIX and not SAME
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:143: Should this not check across all sort columns instead of bucket columns? Is this a bug?
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:384: This function hardcodes the terminating character and the field delimiters, while in the current code these are parameterized. Parameterizing is better, as later we want to drive them through session-level properties.

> internationalization support and sort order (ascedning/descending) support in create table
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4085
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1
>
>
> User cannot specify utf8 strings in the query, both for selection and filtering. MySQL syntax should be followed: 
> select _utf8 'string' from <TableName>
> select <selectExpr> from <TableName> where col = _utf8 0x<HexValue>
> To start with, utf8 strings should be supported. Support for other character sets can be added in the future on demand.
> The identifiers (table name/column name etc.) cannot be utf8 strings, it is only for the data values.
> Although, in create table, the user has the option of specifying sorted columns, he does not have the option of specifying whether they are ascending or descending.
> Create Table syntax should be enhanced to support that.
