Posted to dev@parquet.apache.org by Thanh Do <do...@gmail.com> on 2015/08/06 23:12:22 UTC

Read/Write Logical Type Information

Hi all,

From the documentation, I understand that Parquet supports a small number
of primitive types, and that it is up to the reader to interpret these
primitive types as potentially broader logical types.

Indeed, ConvertedType annotations can be used to specify such an
interpretation. According to the documentation
(http://parquet.apache.org/documentation/latest/): "Annotations are stored
as a ConvertedType in the file metadata".

But looking at the FileMetaData.java code
(http://grepcode.com/file/repo1.maven.org/maven2/com.twitter/parquet-format/2.2.0/parquet/format/FileMetaData.java#FileMetaData._Fields),
I cannot find an API to get the annotation information.

Am I missing something here? How do I set/get these annotations?

Regards,
Thanh

Re: Read/Write Logical Type Information

Posted by Ryan Blue <bl...@cloudera.com>.
Logical types are more for the object models than they are for users, 
but the nice thing is that they are optional. So if an object model 
can't support a type, the user can get the underlying data and still use it.

For example, Thrift doesn't have date/time types, so the object model
can only return the underlying data for the user to convert to a date.
Avro, on the other hand, is about to release support for date/time types,
and the support in Parquet will implement the same conversions.
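A minimal sketch of the fallback described above, in plain Java with no Parquet dependency. The ConvertedType enum, the readInt32 helper, and the idea of passing in the set of annotations an object model supports are hypothetical illustrations, not parquet-mr APIs. The one format detail it relies on is that a DATE annotation marks an INT32 counting days since the Unix epoch (1970-01-01).

```java
import java.time.LocalDate;
import java.util.EnumSet;
import java.util.Set;

public class AnnotationFallback {

  // Hypothetical subset of the annotations an object model might understand.
  enum ConvertedType { UTF8, DATE }

  // Interprets a raw INT32 value. If the column is annotated as DATE and the
  // object model supports dates, the value becomes a LocalDate; otherwise the
  // caller still gets the underlying primitive value and can convert it itself.
  static Object readInt32(int raw, ConvertedType annotation, Set<ConvertedType> supported) {
    if (annotation == ConvertedType.DATE && supported.contains(ConvertedType.DATE)) {
      return LocalDate.ofEpochDay(raw); // DATE = days since 1970-01-01
    }
    return raw; // annotation absent or unsupported: fall back to the raw int
  }

  public static void main(String[] args) {
    int raw = 16653; // days since the epoch
    Set<ConvertedType> withDates = EnumSet.of(ConvertedType.DATE);
    Set<ConvertedType> withoutDates = EnumSet.noneOf(ConvertedType.class);

    System.out.println(readInt32(raw, ConvertedType.DATE, withDates));    // 2015-08-06
    System.out.println(readInt32(raw, ConvertedType.DATE, withoutDates)); // 16653
    // A user of the second model can still apply the conversion manually.
    System.out.println(LocalDate.ofEpochDay(raw));                        // 2015-08-06
  }
}
```

Because the annotation is optional metadata layered on the primitive value, the unsupported case loses nothing: the raw int is still correct data, just not yet interpreted.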

One thing we don't do very well is allow users to annotate types when 
they're writing. We should be looking into that pretty soon.

rb

On 08/10/2015 07:32 AM, Thanh Do wrote:
> Thanks Julien! Got it.
>
> A follow-up question: are logical type annotations supposed to be hints?
> I mean, if some users generate a Parquet file using Hive (via the external
> table mechanism) and then consume it using Impala (again, through an
> external table), there should be some standardized annotations between the
> two systems, right? Or are users responsible for creating the correct
> schema types that map correctly to Parquet primitive types, regardless of
> the annotations?
>
> Thanh
>
> On Thu, Aug 6, 2015 at 5:11 PM, Julien Le Dem <ju...@twitter.com.invalid>
> wrote:
>
>> FileMetaData.schema is a list of SchemaElements.
>> SchemaElement.converted_type contains the annotation.
>> If you use parquet-mr to access the schema, look at the originalType field:
>>
>> https://github.com/apache/parquet-mr/blob/2f956f46580e5b4752173e885d37a20fe31a78d8/parquet-column/src/main/java/org/apache/parquet/schema/Type.java#L113
>


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: Read/Write Logical Type Information

Posted by Thanh Do <do...@gmail.com>.
Thanks Julien! Got it.

A follow-up question: are logical type annotations supposed to be hints?
I mean, if some users generate a Parquet file using Hive (via the external
table mechanism) and then consume it using Impala (again, through an
external table), there should be some standardized annotations between the
two systems, right? Or are users responsible for creating the correct
schema types that map correctly to Parquet primitive types, regardless of
the annotations?

Thanh

Re: Read/Write Logical Type Information

Posted by Julien Le Dem <ju...@twitter.com.INVALID>.
FileMetaData.schema is a list of SchemaElements.
SchemaElement.converted_type contains the annotation.
If you use parquet-mr to access the schema, look at the originalType field:
https://github.com/apache/parquet-mr/blob/2f956f46580e5b4752173e885d37a20fe31a78d8/parquet-column/src/main/java/org/apache/parquet/schema/Type.java#L113
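To make the answer above concrete, here is a minimal, stdlib-only Java sketch of where the annotation lives. The types are hypothetical, trimmed-down stand-ins for the Thrift-generated parquet-format classes (the real SchemaElement carries more fields, such as the repetition type and num_children), and the lookup assumes a flat schema with no nested groups. It relies on FileMetaData.schema being the schema tree flattened depth-first, with the root as the first element.

```java
import java.util.List;
import java.util.Optional;

public class ConvertedTypeLookup {

  // Hypothetical subsets of the primitive Type and ConvertedType enums in parquet.thrift.
  enum PrimitiveType { INT32, INT64, BYTE_ARRAY }
  enum ConvertedType { UTF8, DATE, TIMESTAMP_MILLIS }

  // Simplified stand-in for the Thrift SchemaElement. Both type and
  // converted_type are optional in Thrift, so either may be null here
  // (the root group has no primitive type).
  record SchemaElement(String name, PrimitiveType type, ConvertedType convertedType) {}

  // Simplified stand-in for FileMetaData: schema is the flattened, depth-first tree.
  record FileMetaData(List<SchemaElement> schema) {}

  // Returns the annotation of a top-level column, or empty if the column is
  // missing or unannotated. Assumes a flat schema (no nested groups).
  static Optional<ConvertedType> annotationOf(FileMetaData meta, String column) {
    return meta.schema().stream()
        .skip(1) // element 0 is the root of the schema tree
        .filter(e -> e.name().equals(column))
        .findFirst()
        .map(SchemaElement::convertedType); // null converted_type -> empty
  }

  public static void main(String[] args) {
    FileMetaData meta = new FileMetaData(List.of(
        new SchemaElement("schema", null, null), // root group
        new SchemaElement("day", PrimitiveType.INT32, ConvertedType.DATE),
        new SchemaElement("name", PrimitiveType.BYTE_ARRAY, ConvertedType.UTF8),
        new SchemaElement("id", PrimitiveType.INT64, null))); // no annotation

    for (String column : List.of("day", "name", "id")) {
      String annotation = annotationOf(meta, column).map(ConvertedType::name).orElse("(none)");
      System.out.println(column + " -> " + annotation);
    }
  }
}
```

Running main prints `day -> DATE`, `name -> UTF8` and `id -> (none)`. With parquet-mr you would not walk these structures yourself: the converted schema exposes the same annotation through the originalType field of Type.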
