Posted to dev@parquet.apache.org by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/10/09 03:06:35 UTC

Hive+Parquet : auto-type promotion

Hi,
In Hive+Parquet, automatic type promotion (short->int->bigint) is not supported. A few questions:
1. Does Parquet support this kind of automatic type widening?
2. Is it supported in Pig or any similar project?


I'm considering this for Hive. Other data formats already support this in Hive, and ORC recently added support as well.
I'm not an expert in Parquet.

Can someone please share some ideas or relevant pointers?
One developer tried it (https://issues.apache.org/jira/browse/HIVE-6784), but the patch could not be pushed because of performance concerns.

Regards,
Mohammad
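
For illustration, the widening chain asked about above (short->int->bigint) can be sketched in plain Java. This is only a sketch under stated assumptions: the IntType enum and the canPromote helper are hypothetical names invented for this example and are not part of Hive or Parquet. It relies on the fact that Java's own short -> int -> long conversions never lose information.

```java
// Sketch only: IntType and canPromote are hypothetical, not Hive or Parquet APIs.
class WideningSketch {

    // Integer types ordered from narrowest to widest.
    enum IntType { SHORT, INT, BIGINT }

    // A value stored as fileType can be read as tableType only if
    // tableType is the same type or a wider one.
    static boolean canPromote(IntType fileType, IntType tableType) {
        return fileType.ordinal() <= tableType.ordinal();
    }

    public static void main(String[] args) {
        short s = 12345;
        int i = s;   // widening short -> int, lossless
        long l = i;  // widening int -> long (Hive bigint), lossless
        System.out.println(l);

        System.out.println(canPromote(IntType.SHORT, IntType.BIGINT));
        System.out.println(canPromote(IntType.BIGINT, IntType.INT));
    }
}
```

The reverse direction (bigint -> int) is rejected in this sketch because narrowing can drop high-order bits.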

Re: Hive+Parquet : auto-type promotion

Posted by Daniel Weeks <dw...@netflix.com.INVALID>.
I believe Pig has this already:
https://issues.apache.org/jira/browse/PARQUET-2

-Dan

On Fri, Oct 9, 2015 at 9:59 AM, Sergio Pena <se...@cloudera.com>
wrote:

> Hey Ryan, Mohammad
>
> I just left a comment on dev@hive.apache.org regarding this. It may be
> possible to do such promotion in Hive. I don't know if Pig does such
> promotion.
>
> Here's the comment left on dev@hive.apache.org:
> "
> I reviewed the HIVE-6784 patch and it has performance penalties because of
> the way it is implemented.
>
> I think we can do the promotion in a different way to avoid such penalties.
> If you take a look at ParquetStringInspector.java, this class gets a String
> value from different string writables (BytesWritable, Text or String). We
> may do something like that to return a Long from Integer and Short types.
> However, I am a little worried about the small overhead we will add by
> checking the writable type with 'instanceof' every time we get the value.
>
> I'll review the object inspector and ETypeConverter.java (Parquet type
> converter) to see if there's a better way to do the promotion.
> "
>
> I'll do my investigation, and see the best way to do it.
>
> On Fri, Oct 9, 2015 at 11:27 AM, Ryan Blue <bl...@cloudera.com> wrote:
>
> > Mohammad,
> >
> > This is definitely something that I think we should support. Sergio is the
> > expert on the Hive implementation and can correct me, but I think we should
> > be able to support this in the SerDe fairly easily by adding promotion
> > methods to the object inspectors.
> >
> > Sergio, what do you think needs to be done here?
> >
> > rb
> >
> >
> > On 10/08/2015 06:06 PM, Mohammad Islam wrote:
> >
> >> Hi,
> >> In Hive+Parquet, automatic type promotion (short->int->bigint) is not
> >> supported. A few questions:
> >> 1. Does Parquet support this kind of automatic type widening?
> >> 2. Is it supported in Pig or any similar project?
> >>
> >>
> >> I'm considering this for Hive. Other data formats already support this
> >> in Hive, and ORC recently added support as well.
> >> I'm not an expert in Parquet.
> >>
> >> Can someone please share some ideas or relevant pointers?
> >> One developer tried it (https://issues.apache.org/jira/browse/HIVE-6784),
> >> but the patch could not be pushed because of performance concerns.
> >>
> >> Regards,
> >> Mohammad
> >>
> >>
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Cloudera, Inc.
> >
>

Re: Hive+Parquet : auto-type promotion

Posted by Sergio Pena <se...@cloudera.com>.
Hey Ryan, Mohammad

I just left a comment on dev@hive.apache.org regarding this. It may be
possible to do such promotion in Hive. I don't know if Pig does such
promotion.

Here's the comment left on dev@hive.apache.org:
"
I reviewed the HIVE-6784 patch and it has performance penalties because of
the way it is implemented.

I think we can do the promotion in a different way to avoid such penalties.
If you take a look at ParquetStringInspector.java, this class gets a String
value from different string writables (BytesWritable, Text or String). We
may do something like that to return a Long from Integer and Short types.
However, I am a little worried about the small overhead we will add by
checking the writable type with 'instanceof' every time we get the value.

I'll review the object inspector and ETypeConverter.java (Parquet type
converter) to see if there's a better way to do the promotion.
"

I'll do my investigation, and see the best way to do it.
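
The instanceof-based idea in the comment above could look roughly like the following. This is a minimal sketch, not the Hive implementation: it assumes plain boxed Java values (Short, Integer, Long) as stand-ins for the Hadoop writables, and promoteToLong is a hypothetical helper name invented for this example. The chain of instanceof checks that runs on every call is the per-value overhead the comment is worried about.

```java
// Sketch only: promoteToLong is a hypothetical helper, and boxed Java types
// stand in for the writables that a real object inspector would receive.
class LongPromotionSketch {

    // Returns the value as a long, widening Short and Integer inputs.
    static long promoteToLong(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof Integer) {
            return ((Integer) value).longValue();
        }
        if (value instanceof Short) {
            return ((Short) value).longValue();
        }
        // Anything else (including null) is not an integer type we can widen.
        String typeName = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Cannot promote " + typeName + " to long");
    }

    public static void main(String[] args) {
        System.out.println(promoteToLong((short) 7));
        System.out.println(promoteToLong(42));
        System.out.println(promoteToLong(9_000_000_000L));
    }
}
```

Doing the widening once in the type converter, when the column types are known, would avoid repeating these checks for every value. Whether that is feasible in Hive is exactly what remains to be investigated.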

On Fri, Oct 9, 2015 at 11:27 AM, Ryan Blue <bl...@cloudera.com> wrote:

> Mohammad,
>
> This is definitely something that I think we should support. Sergio is the
> expert on the Hive implementation and can correct me, but I think we should
> be able to support this in the SerDe fairly easily by adding promotion
> methods to the object inspectors.
>
> Sergio, what do you think needs to be done here?
>
> rb
>
>
> On 10/08/2015 06:06 PM, Mohammad Islam wrote:
>
>> Hi,
>> In Hive+Parquet, automatic type promotion (short->int->bigint) is not
>> supported. A few questions:
>> 1. Does Parquet support this kind of automatic type widening?
>> 2. Is it supported in Pig or any similar project?
>>
>>
>> I'm considering this for Hive. Other data formats already support this
>> in Hive, and ORC recently added support as well.
>> I'm not an expert in Parquet.
>>
>> Can someone please share some ideas or relevant pointers?
>> One developer tried it (https://issues.apache.org/jira/browse/HIVE-6784),
>> but the patch could not be pushed because of performance concerns.
>>
>> Regards,
>> Mohammad
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>