Posted to dev@parquet.apache.org by Felipe Aramburu <fe...@blazingdb.com> on 2017/06/08 14:34:18 UTC

Documentation on Relationship between Logical and Physical Types

I was playing around with some Parquet files that were generated using
Apache Drill, and as I look at the ColumnDescriptors I see that one of the
columns has a logical type of LogicalType::NONE and a physical type of
Type::INT32.

Is it normal for this to happen? When the logical type is NONE and the
ColumnDescriptor's node is_primitive() function returns true, does that
mean I can ignore the logical type and just look at the primitive type to
know how to interpret the data?

Felipe


Re: Documentation on Relationship between Logical and Physical Types

Posted by Wes McKinney <we...@gmail.com>.
I think the first option is the best, and it is basically what we did for
Apache Arrow in

https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.cc

Look at the FromPrimitive function

If the LogicalType is None, then the logical type is whatever the physical
type is. For example:

static Status FromInt32(const PrimitiveNode* node, TypePtr* out) {
  switch (node->logical_type()) {
    case LogicalType::NONE:
      *out = ::arrow::int32();
      break;
    case LogicalType::UINT_8:
      *out = ::arrow::uint8();
      break;
    case LogicalType::INT_8:
      *out = ::arrow::int8();
      break;
<SNIP>
etc.
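
To make the "first option" concrete, here is a minimal, self-contained
sketch of the lookup order: check the logical type first, and fall back to
the physical type when it is NONE. The enums and the ResolveTypeName helper
are hypothetical stand-ins written for this example; they borrow a few names
from the thread but are not the parquet-cpp definitions, and the returned
strings are just labels.

```cpp
#include <string>

// Stand-in enums for illustration only. They borrow a few names from the
// thread but are NOT the real parquet-cpp definitions.
enum class PhysicalType { BOOLEAN, INT32, INT64, DOUBLE, BYTE_ARRAY };
enum class LogicalType { NONE, UTF8, INT_8, UINT_8, INT_16, UINT_16 };

// Hypothetical helper: returns a label for the type a reader should use
// to interpret a primitive column.
std::string ResolveTypeName(PhysicalType physical, LogicalType logical) {
  // Step 1: check for a logical type that refines this physical type.
  if (physical == PhysicalType::INT32) {
    switch (logical) {
      case LogicalType::INT_8:   return "int8";
      case LogicalType::UINT_8:  return "uint8";
      case LogicalType::INT_16:  return "int16";
      case LogicalType::UINT_16: return "uint16";
      default: break;  // NONE, or an annotation this sketch does not handle
    }
  } else if (physical == PhysicalType::BYTE_ARRAY &&
             logical == LogicalType::UTF8) {
    return "string";
  }

  // Step 2: no usable annotation, so the physical type is the answer.
  // (This sketch silently ignores mismatched annotations; a stricter
  // reader might report an error instead.)
  switch (physical) {
    case PhysicalType::BOOLEAN:    return "bool";
    case PhysicalType::INT32:      return "int32";
    case PhysicalType::INT64:      return "int64";
    case PhysicalType::DOUBLE:     return "double";
    case PhysicalType::BYTE_ARRAY: return "binary";
  }
  return "unknown";
}
```

With these stand-ins, the column described in the original question
(physical INT32, logical NONE) resolves to "int32", while the same physical
type annotated with UINT_8 resolves to "uint8".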

On Thu, Jun 8, 2017 at 10:45 AM, Felipe Aramburu <fe...@blazingdb.com>
wrote:

> OK, to be clear: would you say the safest behaviour is
>
> 1. check for a logical type
>
> 2. if it is set to none, check for a physical type
>
> or is it
>
> 1. check if the node has is_primitive() set to true
>
> 2. if true, use the physical type; if false, use the logical type?
>
> Felipe

Re: Documentation on Relationship between Logical and Physical Types

Posted by Felipe Aramburu <fe...@blazingdb.com>.
OK, to be clear: would you say the safest behaviour is

1. check for a logical type

2. if it is set to none, check for a physical type

or is it

1. check if the node has is_primitive() set to true

2. if true, use the physical type; if false, use the logical type?

Felipe

On Thu, Jun 8, 2017 at 9:40 AM, Wes McKinney <we...@gmail.com> wrote:

> hi Felipe,
>
> Yes, that's right. For primitive types it is typical for the
> LogicalType not to be set in the Thrift metadata. The integer logical
> types in particular were added relatively late to the Parquet format
> and are not used in all implementations (for example, some databases
> like Hive and Impala have their own metastores, which are used together
> with Parquet files to cast to the appropriate runtime type, like
> smallint or tinyint).
>
> - Wes

Re: Documentation on Relationship between Logical and Physical Types

Posted by Wes McKinney <we...@gmail.com>.
hi Felipe,

Yes, that's right. For primitive types it is typical for the
LogicalType not to be set in the Thrift metadata. The integer logical
types in particular were added relatively late to the Parquet format
and are not used in all implementations (for example, some databases
like Hive and Impala have their own metastores, which are used together
with Parquet files to cast to the appropriate runtime type, like
smallint or tinyint).

- Wes

On Thu, Jun 8, 2017 at 10:34 AM, Felipe Aramburu <fe...@blazingdb.com> wrote:
> I was playing around with some Parquet files that were generated using
> Apache Drill, and as I look at the ColumnDescriptors I see that one of the
> columns has a logical type of LogicalType::NONE and a physical type of
> Type::INT32.
>
> Is it normal for this to happen? When the logical type is NONE and the
> ColumnDescriptor's node is_primitive() function returns true, does that
> mean I can ignore the logical type and just look at the primitive type to
> know how to interpret the data?
>
> Felipe