Posted to dev@parquet.apache.org by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/10/09 03:06:35 UTC
Hive+Parquet : auto-type promotion
Hi,
In Hive+Parquet, auto type promotion (short->int->bigint) is not supported. A few questions:
1. Does Parquet support this type of auto type widening?
2. Is it supported in Pig or any similar project?
I'm considering this for Hive. Other data formats already support this in Hive. ORC recently added support as well.
I'm not an expert in Parquet.
Could someone share some ideas or relevant pointers?
One dev tried it (https://issues.apache.org/jira/browse/HIVE-6784), but could not push it on performance grounds.
Regards,
Mohammad
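To make the question concrete, the widening order short->int->bigint can be written as a small check. This is only a minimal sketch in plain Java; `IntegralType` and `canPromoteTo` are hypothetical names used for illustration and are not part of Hive or Parquet.

```java
// Hypothetical sketch, not Hive or Parquet API.
final class WideningDemo {
    public static void main(String[] args) {
        // short -> bigint is a widening read, so it is allowed.
        System.out.println(IntegralType.SHORT.canPromoteTo(IntegralType.BIGINT));
        // bigint -> int would narrow the value, so it is rejected.
        System.out.println(IntegralType.BIGINT.canPromoteTo(IntegralType.INT));
    }
}

// Integral types from the question, each tagged with its width in bits.
enum IntegralType {
    SHORT(16), INT(32), BIGINT(64);

    final int bits;

    IntegralType(int bits) {
        this.bits = bits;
    }

    // A read is safe when the requested type is at least as wide as the stored type.
    boolean canPromoteTo(IntegralType target) {
        return target.bits >= this.bits;
    }
}
```

Under this rule a column written as short can be read as int or bigint, while reading a bigint column as int is refused rather than silently truncated.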
Re: Hive+Parquet : auto-type promotion
Posted by Daniel Weeks <dw...@netflix.com.INVALID>.
I believe Pig has this already:
https://issues.apache.org/jira/browse/PARQUET-2
-Dan
On Fri, Oct 9, 2015 at 9:59 AM, Sergio Pena <se...@cloudera.com>
wrote:
> Hey Ryan, Mohammad
>
> I just left a comment on dev@hive.apache.org regarding this. It may be
> possible to do such promotion in Hive. I don't know if Pig does such
> promotion.
>
> Here's the comment left on dev@hive.apache.org:
> "
> I reviewed the HIVE-6784 patch, and it has performance penalties because of
> the way it is implemented.
>
> I think we can do the promotion in a different way to avoid such penalties.
> If you take a look at ParquetStringInspector.java, this class gets a String
> value from different string writables (BytesWritable, Text or String). We
> could do something similar to return a Long from Integer and Short types.
> However, I am a little worried about the small overhead we would add by
> checking the writable type with 'instanceof' every time we get the value.
>
> I'll review the object inspector and ETypeConverter.java (the Parquet type
> converter) to see if there's a better way to do the promotion.
> "
>
> I'll investigate and look for the best way to do it.
>
> On Fri, Oct 9, 2015 at 11:27 AM, Ryan Blue <bl...@cloudera.com> wrote:
>
> > Mohammad,
> >
> > This is definitely something that I think we should support. Sergio is the
> > expert on the Hive implementation and can correct me, but I think we should
> > be able to support this in the SerDe fairly easily by adding promotion
> > methods to the object inspectors.
> >
> > Sergio, what do you think needs to be done here?
> >
> > rb
> >
> >
> > On 10/08/2015 06:06 PM, Mohammad Islam wrote:
> >
> >> Hi,
> >> In Hive+Parquet, auto type promotion (short->int->bigint) is not
> >> supported. A few questions:
> >> 1. Does Parquet support this type of auto type widening?
> >> 2. Is it supported in Pig or any similar project?
> >>
> >>
> >> I'm considering this for Hive. Other data formats already support this
> >> in Hive. ORC recently added support as well.
> >> I'm not an expert in Parquet.
> >>
> >> Could someone share some ideas or relevant pointers?
> >> One dev tried it (https://issues.apache.org/jira/browse/HIVE-6784), but
> >> could not push it on performance grounds.
> >>
> >> Regards,
> >> Mohammad
> >>
> >>
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Cloudera, Inc.
> >
>
Re: Hive+Parquet : auto-type promotion
Posted by Sergio Pena <se...@cloudera.com>.
Hey Ryan, Mohammad
I just left a comment on dev@hive.apache.org regarding this. It may be
possible to do such promotion in Hive. I don't know if Pig does such
promotion.
Here's the comment left on dev@hive.apache.org:
"
I reviewed the HIVE-6784 patch, and it has performance penalties because of
the way it is implemented.
I think we can do the promotion in a different way to avoid such penalties.
If you take a look at ParquetStringInspector.java, this class gets a String
value from different string writables (BytesWritable, Text or String). We
could do something similar to return a Long from Integer and Short types.
However, I am a little worried about the small overhead we would add by
checking the writable type with 'instanceof' every time we get the value.
I'll review the object inspector and ETypeConverter.java (the Parquet type
converter) to see if there's a better way to do the promotion.
"
I'll investigate and look for the best way to do it.
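The 'instanceof'-based idea described in the comment above could look roughly like the following. This is a minimal sketch under stated assumptions, not the HIVE-6784 patch or any existing Hive code: it uses plain Java boxed types (Short, Integer, Long) as stand-ins for the Hadoop writables, and `LongPromotionSketch` and `promoteToLong` are hypothetical names.

```java
// Hypothetical sketch, not part of Hive: boxed Java types stand in for writables.
final class LongPromotionSketch {

    // Widen any supported integral value to a long, checking the runtime type
    // on every call (this per-value check is the overhead mentioned above).
    static long promoteToLong(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof Integer) {
            return ((Integer) value).longValue();
        }
        if (value instanceof Short) {
            return ((Short) value).longValue();
        }
        String typeName = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Cannot promote " + typeName + " to long");
    }

    public static void main(String[] args) {
        System.out.println(promoteToLong((short) 7));
        System.out.println(promoteToLong(42));
        System.out.println(promoteToLong(9_000_000_000L));
    }
}
```

The type check runs once per value read, which is the cost in question. Choosing the conversion once per column, for example in the type converter, would avoid repeating it for each value.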
On Fri, Oct 9, 2015 at 11:27 AM, Ryan Blue <bl...@cloudera.com> wrote:
> Mohammad,
>
> This is definitely something that I think we should support. Sergio is the
> expert on the Hive implementation and can correct me, but I think we should
> be able to support this in the SerDe fairly easily by adding promotion
> methods to the object inspectors.
>
> Sergio, what do you think needs to be done here?
>
> rb
>
>
> On 10/08/2015 06:06 PM, Mohammad Islam wrote:
>
>> Hi,
>> In Hive+Parquet, auto type promotion (short->int->bigint) is not
>> supported. A few questions:
>> 1. Does Parquet support this type of auto type widening?
>> 2. Is it supported in Pig or any similar project?
>>
>>
>> I'm considering this for Hive. Other data formats already support this
>> in Hive. ORC recently added support as well.
>> I'm not an expert in Parquet.
>>
>> Could someone share some ideas or relevant pointers?
>> One dev tried it (https://issues.apache.org/jira/browse/HIVE-6784), but
>> could not push it on performance grounds.
>>
>> Regards,
>> Mohammad
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>