Posted to user@orc.apache.org by Owen O'Malley <om...@apache.org> on 2016/10/06 20:48:57 UTC

Re: What characters are allowed in a column name in ORC file?

ORC itself can handle any UTF-8 characters in the column names, but the
type name parser made too many assumptions about valid characters in the
field names. I've created a new jira
https://issues.apache.org/jira/browse/ORC-104 to address the problem.
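
To illustrate the kind of assumption involved, here is a minimal sketch of a
field name scanner that accepts only letters, digits, '.' and '_'. It is not
the actual ORC source: the class FieldNameSketch and its methods are
hypothetical names made up for this example, and the real parser may differ in
its details. It only shows why a hyphen in a name surfaces as a "Missing
required char ':'" error rather than as an "invalid character" error.

```java
public class FieldNameSketch {
    // Hypothetical restrictive rule: letters, digits, '.' and '_' only.
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '_';
    }

    // Scans a field name starting at pos and returns the index just past it.
    static int scanName(String s, int pos) {
        int i = pos;
        while (i < s.length() && isNameChar(s.charAt(i))) {
            i++;
        }
        if (i == pos) {
            throw new IllegalArgumentException("Missing name at position " + pos);
        }
        return i;
    }

    // Returns the name from a "name:type" field, or throws if the scanned
    // name is not immediately followed by ':'.
    static String parseFieldName(String field) {
        int end = scanName(field, 0);
        if (end >= field.length() || field.charAt(end) != ':') {
            throw new IllegalArgumentException(
                "Missing required char ':' at position " + end
                + " in '" + field + "'");
        }
        return field.substring(0, end);
    }

    public static void main(String[] args) {
        // An underscore is accepted by the rule above.
        System.out.println(parseFieldName("before_after:string"));
        try {
            // The scan stops at '-', so the parser then looks for ':' there.
            parseFieldName("before-after:string");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the scan stops at the hyphen (index 6), the parser then expects
a ':' at that position, and the error points at the hyphen, much like the
caret in the reported 'struct<before^-after:string>' message.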

.. Owen


On Fri, Sep 23, 2016 at 12:34 PM, Manoj Narayanan <manoj.narayanan@gmail.com> wrote:

> Looks like Hive has allowed any Unicode character in column names since
> 0.13, as per the Hive documentation at
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> Column names can be surrounded by backticks (`).
>
> But ORC probably allows only alphanumerics, periods and underscores.
> Could someone please confirm that? If so, is there a plan to support other
> characters in ORC?
>
> I am using TypeDescription::fromString to generate a schema and I tried
> using a field name with '-' (hyphen) in it. I got an exception with this
> trace.
> java.lang.IllegalArgumentException: Missing required char ':' at
> 'struct<before^-after:string>'
>
> at org.apache.orc.TypeDescription.requireChar(TypeDescription.java:259)
> at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:286)
> at org.apache.orc.TypeDescription.parseType(TypeDescription.java:338)
> at org.apache.orc.TypeDescription.fromString(TypeDescription.java:359)
>
> Looking at this code at
> https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/TypeDescription.java#L241
> it seems only alphanumerics, periods and underscores are supported in
> column names.
>
> In Hive, however, I could create a table with a column name containing
> '-' (hyphen) when the name is surrounded by backticks.
> hive> create table table_with_hyphen ( `hyphen-inbetween` string) stored as ORC;
> OK
> Time taken: 0.075 seconds
>
> The schema for this ORC file, obtained via a call to
> org.apache.orc.Reader::getSchema()::getJson(), came out like this:
>
> {"category": "struct", "id": 0, "max": 1, "fields": [
>   "_col0": {"category": "string", "id": 1, "max": 1}]}
>
> Thanks,
> Manoj
>