Posted to dev@hive.apache.org by Prasanth Jayachandran <pj...@hortonworks.com> on 2014/07/22 20:28:10 UTC

Re: hive 13: dynamic partition inserts

Hi Vishnu

Yes. There is a change in the way dynamic partitions are inserted in Hive 13. The new dynamic partition insert is highly scalable and uses much less memory. Here is the related JIRA: https://issues.apache.org/jira/browse/HIVE-6455

Setting "hive.optimize.sort.dynamic.partition" to false will fall back to the old way of insertion. If your destination table uses a columnar format such as ORC or Parquet, it makes sense to leave the optimization on, because columnar formats need some buffer space for each column before flushing to disk. Buffer space (runtime memory) grows quickly when there are many partition column values and many columns. HIVE-6455 addresses this issue.
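
To make the memory argument above concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hive code. The function name, the 256 KiB per-column buffer size, and the partition and column counts are all illustrative assumptions, not values taken from Hive or from HIVE-6455. The model assumes that without sorting a task keeps one writer open for every partition value it sees, while with rows sorted by partition key it needs only one writer open at a time.

```python
# Rough model of writer buffer memory during a dynamic partition insert.
# All numbers are illustrative assumptions, not measured Hive values.

def estimate_buffer_bytes(open_writers: int, num_columns: int,
                          bytes_per_column_buffer: int) -> int:
    """Each open columnar writer buffers data per column before flushing."""
    return open_writers * num_columns * bytes_per_column_buffer

BUFFER = 256 * 1024   # assumed per-column buffer size (hypothetical)
COLUMNS = 50          # assumed number of columns in the table
PARTITIONS = 1000     # assumed distinct partition values seen by one task

# Unsorted input: one writer stays open per partition value.
unsorted_bytes = estimate_buffer_bytes(PARTITIONS, COLUMNS, BUFFER)
# Input sorted by partition key: one writer open at a time.
sorted_bytes = estimate_buffer_bytes(1, COLUMNS, BUFFER)

print(f"unsorted: {unsorted_bytes / 2**30:.1f} GiB")
print(f"sorted:   {sorted_bytes / 2**20:.1f} MiB")
```

Under these assumed numbers the two estimates differ by a factor equal to the number of open partitions, which is the effect described above: memory scales with partition values times columns unless only one writer is open at a time.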

Thanks
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu <vi...@amazon.com> wrote:

> adding user@hive.apache.org for wider audience
> From: Gajendran, Vishnu
> Sent: Tuesday, July 22, 2014 10:42 AM
> To: dev@hive.apache.org
> Subject: hive 13: dynamic partition inserts
> 
> Hello,
> 
> I am seeing a difference between Hive 11 and Hive 13 when inserting into a table with dynamic partitions.
> 
> In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic partition insert, I see a number of files (generated by each mapper) in the specified HDFS location, as expected. But in Hive 13, when I set hive.merge.mapfiles=false, I see just one file in the specified HDFS location for the same query. I think Hive is not honoring the hive.merge.mapfiles parameter and has merged all the mapper outputs into a single file.
> 
> In Hive 11, 19 mappers were executed for the dynamic partition insert task. But in Hive 13, 19 mappers and 2 reducers were executed.
> 
> When I checked the query plan for Hive 11, there is only a map operator task for the dynamic partition insert. But in Hive 13, I see both map operator and reduce operator tasks.
> 
> Are there any changes in Hive 13 regarding dynamic partition inserts? Any comments on this issue are greatly appreciated.
> 
> Thanks,
> vishnu



RE: hive 13: dynamic partition inserts

Posted by "Gajendran, Vishnu" <vi...@amazon.com>.
Hi Prasanth,

 Thanks a lot for your quick response.
________________________________
From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 11:47 AM
To: user@hive.apache.org
Cc: dev@hive.apache.org
Subject: RE: hive 13: dynamic partition inserts

