Posted to user@hive.apache.org by Nitin Pawar <ni...@gmail.com> on 2013/03/29 06:16:04 UTC

Optimizing hive queries

Hi,

Here is a nice presentation by Owen of Hortonworks on "Optimizing
Hive queries":

http://www.slideshare.net/oom65/optimize-hivequeriespptx



Thanks,
Nitin Pawar

Re: Optimizing hive queries

Posted by Nitin Pawar <ni...@gmail.com>.

I could only find this link:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html

According to this, the metadata is handled by protobuf, which allows
adding and removing fields.


On Fri, Mar 29, 2013 at 10:55 AM, Jagat Singh <ja...@gmail.com> wrote:

> Hello Nitin,
>
> Thanks for sharing.
>
> Do we have more details on the versioned metadata feature of ORC? Is it
> like handling varying schemas in Hive?
>
> Regards,
>
> Jagat Singh


-- 
Nitin Pawar

Re: Optimizing hive queries

Posted by Owen O'Malley <om...@apache.org>.

On Thu, Mar 28, 2013 at 11:08 PM, Jagat Singh <ja...@gmail.com> wrote:

> Hello Owen,
>
> Thanks for your reply.
>
> I see that it provides the same advantage as Avro: adding and removing
> fields.
>

ORC files, like Avro files, are self-describing: they include the type
structure of the records in the file's metadata. It will take more
integration work with Hive to make the schemas very flexible with ORC.


> Could you please write some sample code for a Hive table that is
> partitioned, where each partition has a different schema?
>

As with all tables:

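-- Table starts with two data columns, partitioned by state.
create table people (first_name string, last_name string)
  partitioned by (state string);
-- The 'ca' partition is loaded under the two-column schema.
load data local inpath 'part-0'
  overwrite into table people partition (state='ca');
-- Add a third column; the existing 'ca' partition is not rewritten.
alter table people add columns (address string);
-- The 'nv' partition is loaded under the three-column schema.
load data local inpath 'part-1'
  overwrite into table people partition (state='nv');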

You'll end up with the first partition having 2 columns (so the third,
address, is implicitly null) and the second partition having 3 columns.
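
One way to check the result might look like the sketch below. It is only
an illustration, assuming the people table and the 'ca' and 'nv'
partitions were created exactly as in the statements above; the exact
output depends on the Hive version and on the contents of the loaded
files.

```sql
-- Table-level schema: expected to list first_name, last_name, address,
-- plus the partition column state.
describe people;

-- Partition-level schemas: each partition keeps the column list that was
-- in effect when it was created.
describe people partition (state='ca');
describe people partition (state='nv');

-- Reading the older partition through the current table schema:
-- address is expected to come back as NULL for these rows.
select first_name, last_name, address, state
from people
where state = 'ca';
```

The select goes through the table's current three-column schema, which is
why rows from the older partition show a null address rather than failing.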

-- Owen



>
> I tried searching but could not find any example.
>
> Thanks in advance for your help.
>
> Regards,
>
> Jagat Singh

Re: Optimizing hive queries

Posted by Jagat Singh <ja...@gmail.com>.

Hello Owen,

Thanks for your reply.

I see that it provides the same advantage as Avro: adding and removing
fields.

Could you please write some sample code for a Hive table that is
partitioned, where each partition has a different schema?

I tried searching but could not find any example.

Thanks in advance for your help.

Regards,

Jagat Singh

On Fri, Mar 29, 2013 at 4:48 PM, Owen O'Malley <om...@apache.org> wrote:

> Actually, Hive already has the ability to have different schemas for
> different partitions. (Although of course it would be nice to have the
> alter table be more flexible!)
>
> The "versioned metadata" means that the ORC file's metadata is stored in
> ProtoBufs so that we can add fields to (or remove fields from) the
> metadata. That means that for some changes to the ORC file format we can
> provide both forward and backward compatibility.
>
> -- Owen

Re: Optimizing hive queries

Posted by Owen O'Malley <om...@apache.org>.

Actually, Hive already has the ability to have different schemas for
different partitions. (Although of course it would be nice to have the
alter table be more flexible!)

The "versioned metadata" means that the ORC file's metadata is stored in
ProtoBufs so that we can add fields to (or remove fields from) the
metadata. That means that for some changes to the ORC file format we can
provide both forward and backward compatibility.

-- Owen

On Thu, Mar 28, 2013 at 10:25 PM, Jagat Singh <ja...@gmail.com> wrote:

> Hello Nitin,
>
> Thanks for sharing.
>
> Do we have more details on the versioned metadata feature of ORC? Is it
> like handling varying schemas in Hive?
>
> Regards,
>
> Jagat Singh

Re: Optimizing hive queries

Posted by Jagat Singh <ja...@gmail.com>.

Hello Nitin,

Thanks for sharing.

Do we have more details on the versioned metadata feature of ORC? Is it
like handling varying schemas in Hive?

Regards,

Jagat Singh

On Fri, Mar 29, 2013 at 4:16 PM, Nitin Pawar <ni...@gmail.com> wrote:

>
> Hi,
>
> Here is a nice presentation by Owen of Hortonworks on "Optimizing
> Hive queries":
>
> http://www.slideshare.net/oom65/optimize-hivequeriespptx
>
>
>
> Thanks,
> Nitin Pawar
>