Posted to user@hive.apache.org by Tianqi Tong <tt...@brightedge.com> on 2015/04/09 19:34:04 UTC

[Hive] Slow Loading Data Process with Parquet over 30k Partitions

Hello Hive,
I'm a developer using Hive to process TB-level data, and I'm having some difficulty loading the data into a table.
I have 2 tables now:

-- table_1:
CREATE EXTERNAL TABLE `table_1`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition1 STRING)
STORED AS RCfile

-- table_2:
CREATE EXTERNAL TABLE `table_2`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition2 STRING)
STORED AS Parquet

I'm doing an INSERT OVERWRITE into table_2 from a SELECT on table_1 with dynamic partitioning, and the number of partitions grows dramatically from 1,500 to 40k (because I'm partitioning on a different column).
The MapReduce job itself completed fine.
Somehow the process got stuck at "Loading data to table default.table_2 (yearmonth=null, domain_prefix=null)", and I've been waiting for hours.
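For reference, the statement is roughly of this shape (simplified; the real query derives the new partition column from an existing one, and the substr() below is only a placeholder for that derivation):

-- Sketch only: dynamic partition columns must come last in the SELECT list.
INSERT OVERWRITE TABLE table_2 PARTITION (yearmonth, partition2)
SELECT
  keyword,
  domain,
  url,
  yearmonth,
  substr(domain, 1, 2) AS partition2
FROM table_1;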

Is this expected when we have 40k partitions?

--------------------------------------------------------------
Refs - Here are the parameters that I used:
export HADOOP_HEAPSIZE=16384
set PARQUET_FILE_SIZE=268435456;
set parquet.block.size=268435456;
set dfs.blocksize=268435456;
set parquet.compression=SNAPPY;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=500000;
SET hive.exec.max.dynamic.partitions.pernode=50000;
SET hive.exec.max.created.files=1000000;


Thank you very much!
Tianqi Tong

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by Edward Capriolo <ed...@gmail.com>.
That is too many partitions. There is way too much overhead in anything
that has that many partitions.
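To give a sense of the alternative: keeping the high-cardinality field as a plain column and partitioning only by yearmonth avoids that overhead entirely. A rough sketch (the table name here is just a placeholder, not something from this thread):

-- Partition only on the low-cardinality column; keep the rest as data.
CREATE EXTERNAL TABLE table_2_flat (
  keyword STRING,
  domain STRING,
  url STRING,
  partition2 STRING
)
PARTITIONED BY (yearmonth INT)
STORED AS PARQUET;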

On Tue, Apr 14, 2015 at 12:53 PM, Tianqi Tong <tt...@brightedge.com> wrote:

>  Hi Slava and Ferdinand,
>
> Thanks for the reply! Later when I was looking at the hive.log, I found
> Hive was indeed calculating the partition stats, and the log looks like:
>
> ….
>
> 2015-04-14 09:38:21,146 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition
> stats fast for: parquet_table
>
> 2015-04-14 09:38:21,147 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to
> 5533480
>
> 2015-04-14 09:38:44,511 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition
> stats fast for: parquet_table
>
> 2015-04-14 09:38:44,512 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to 66246
>
> 2015-04-14 09:39:07,554 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition
> stats fast for: parquet_table
>
> 2015-04-14 09:39:07,555 WARN  [main]: hive.log
> (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to 418925
>
> ….
>
>
>
> One interesting thing is, it's getting slower and slower. Right after I
> launched the job, it took less than 1s to calculate for one partition. Now
> it's taking 20+s for each one.
>
> I tried hive.stats.autogather=false, but somehow it didn't seem to work. I
> also ended up hard coding a little bit to the Hive source code.
>
>
>
> In my case, I have around 40000 partitions with one file (varies from 1M
> to 1G) in each of them. Now it's been 4 days and the first job I launched
> is still not done yet, with partition stats.
>
>
>
> Thanks
>
> Tianqi Tong
>
>
>
> *From:* Slava Markeyev [mailto:slava.markeyev@upsight.com]
> *Sent:* Monday, April 13, 2015 11:00 PM
> *To:* user@hive.apache.org
> *Cc:* Sergio Pena
> *Subject:* Re: [Hive] Slow Loading Data Process with Parquet over 30k
> Partitions
>
>
>
> This is something I've encountered when doing ETL with hive and having it
> create 10's of thousands partitions. The issue is each partition needs to
> be added to the metastore and this is an expensive operation to perform. My
> work around was adding a flag to hive that optionally disables the
> metastore partition creation step. This may not be a solution for everyone
> as that table then has no partitions and you would have to run msck repair
> but depending on your use case, you may just want the data in hdfs.
>
> If there is interest in having this be an option I'll make a ticket and
> submit the patch.
>
> -Slava
>
>
>
> On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <ch...@intel.com>
> wrote:
>
> Hi Tianqi,
>
> Can you attach hive.log as more detailed information?
>
> +Sergio
>
>
>
> Yours,
>
> Ferdinand Xu
>
>
>
> *From:* Tianqi Tong [mailto:ttong@brightedge.com]
> *Sent:* Friday, April 10, 2015 1:34 AM
> *To:* user@hive.apache.org
> *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k
> Partitions
>
>
>
> Hello Hive,
>
> I'm a developer using Hive to process TB level data, and I'm having some
> difficulty loading the data to table.
>
> I have 2 tables now:
>
>
>
> -- table_1:
>
> CREATE EXTERNAL TABLE `table_1`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition1 STRING)
>
> STORED AS RCfile
>
>
>
> -- table_2:
>
> CREATE EXTERNAL TABLE `table_2`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition2 STRING)
>
> STORED AS Parquet
>
>
>
> I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1 with
> dynamic partitioning, and the number of partitions grows dramatically from
> 1500 to 40k (because I want to use something else as partitioning).
>
> The mapreduce job was fine.
>
> Somehow the process stucked at " Loading data to table default.table_2
> (yearmonth=null, domain_prefix=null) ", and I've been waiting for hours.
>
>
>
> Is this expected when we have 40k partitions?
>
>
>
> --------------------------------------------------------------
>
> Refs - Here are the parameters that I used:
>
> export HADOOP_HEAPSIZE=16384
>
> set PARQUET_FILE_SIZE=268435456;
>
> set parquet.block.size=268435456;
>
> set dfs.blocksize=268435456;
>
> set parquet.compression=SNAPPY;
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> SET hive.exec.max.dynamic.partitions=500000;
>
> SET hive.exec.max.dynamic.partitions.pernode=50000;
>
> SET hive.exec.max.created.files=1000000;
>
>
>
>
>
> Thank you very much!
>
> Tianqi Tong
>
>
>
>
> --
>
> Slava Markeyev | Engineering | Upsight
>

RE: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by Tianqi Tong <tt...@brightedge.com>.
Hi Slava and Ferdinand,
Thanks for the reply! Later when I was looking at the hive.log, I found Hive was indeed calculating the partition stats, and the log looks like:
….
2015-04-14 09:38:21,146 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition stats fast for: parquet_table
2015-04-14 09:38:21,147 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to 5533480
2015-04-14 09:38:44,511 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition stats fast for: parquet_table
2015-04-14 09:38:44,512 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to 66246
2015-04-14 09:39:07,554 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(296)) - Updating partition stats fast for: parquet_table
2015-04-14 09:39:07,555 WARN  [main]: hive.log (MetaStoreUtils.java:updatePartitionStatsFast(299)) - Updated size to 418925
….

One interesting thing is that it's getting slower and slower. Right after I launched the job, it took less than 1 second to calculate stats for one partition; now it's taking 20+ seconds for each one.
I tried hive.stats.autogather=false, but somehow it didn't seem to work, so I also ended up hard-coding a small change into the Hive source code.
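For completeness, the setting I tried was just this, applied in the client session before re-running the insert (treat it as a sketch of the attempt rather than a fix, since it did not help in my case):

SET hive.stats.autogather=false;  -- intended to skip the per-partition stats update during the load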

In my case, I have around 40,000 partitions with one file (ranging from 1 MB to 1 GB) in each of them. At 20+ seconds per partition, that works out to roughly 9 to 10 days for the stats step alone; it's now been 4 days and the first job I launched is still not done, still working through the partition stats.

Thanks
Tianqi Tong

From: Slava Markeyev [mailto:slava.markeyev@upsight.com]
Sent: Monday, April 13, 2015 11:00 PM
To: user@hive.apache.org
Cc: Sergio Pena
Subject: Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

This is something I've encountered when doing ETL with Hive and having it create tens of thousands of partitions. The issue is that each partition needs to be added to the metastore, and that is an expensive operation to perform. My workaround was adding a flag to Hive that optionally disables the metastore partition-creation step. This may not be a solution for everyone, since the table then has no partitions and you would have to run MSCK REPAIR, but depending on your use case you may just want the data in HDFS.
If there is interest in having this be an option I'll make a ticket and submit the patch.
-Slava

On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <ch...@intel.com>> wrote:
Hi Tianqi,
Can you attach hive.log for more detailed information?
+Sergio

Yours,
Ferdinand Xu

From: Tianqi Tong [mailto:ttong@brightedge.com<ma...@brightedge.com>]
Sent: Friday, April 10, 2015 1:34 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Hello Hive,
I'm a developer using Hive to process TB-level data, and I'm having some difficulty loading the data into a table.
I have 2 tables now:

-- table_1:
CREATE EXTERNAL TABLE `table_1`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition1 STRING)
STORED AS RCfile

-- table_2:
CREATE EXTERNAL TABLE `table_2`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition2 STRING)
STORED AS Parquet

I'm doing an INSERT OVERWRITE into table_2 from a SELECT on table_1 with dynamic partitioning, and the number of partitions grows dramatically from 1,500 to 40k (because I'm partitioning on a different column).
The MapReduce job itself completed fine.
Somehow the process got stuck at "Loading data to table default.table_2 (yearmonth=null, domain_prefix=null)", and I've been waiting for hours.

Is this expected when we have 40k partitions?

--------------------------------------------------------------
Refs - Here are the parameters that I used:
export HADOOP_HEAPSIZE=16384
set PARQUET_FILE_SIZE=268435456;
set parquet.block.size=268435456;
set dfs.blocksize=268435456;
set parquet.compression=SNAPPY;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=500000;
SET hive.exec.max.dynamic.partitions.pernode=50000;
SET hive.exec.max.created.files=1000000;


Thank you very much!
Tianqi Tong



--

Slava Markeyev | Engineering | Upsight

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by Slava Markeyev <sl...@upsight.com>.
I've created HIVE-10385 and attached a patch. Unit tests to come.

-Slava

On Fri, Apr 17, 2015 at 1:34 PM, Chris Roblee <ch...@unity3d.com> wrote:

> Hi Slava,
>
> We would be interested in reviewing your patch.  Can you please provide
> more details?
>
> Is there any other way to disable the partition creation step?
>
> Thanks,
> Chris
>
> On 4/13/15 10:59 PM, Slava Markeyev wrote:
>
>> This is something I've encountered when doing ETL with hive and having it
>> create 10's of thousands partitions. The issue
>> is each partition needs to be added to the metastore and this is an
>> expensive operation to perform. My work around was
>> adding a flag to hive that optionally disables the metastore partition
>> creation step. This may not be a solution for
>> everyone as that table then has no partitions and you would have to run
>> msck repair but depending on your use case, you
>> may just want the data in hdfs.
>>
>> If there is interest in having this be an option I'll make a ticket and
>> submit the patch.
>>
>> -Slava
>>
>> On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <cheng.a.xu@intel.com
>> <ma...@intel.com>> wrote:
>>
>>     Hi Tianqi,
>>
>>     Can you attach hive.log as more detailed information?
>>
>>     +Sergio
>>
>>     Yours,
>>
>>     Ferdinand Xu
>>
>>     *From:* Tianqi Tong [mailto:ttong@brightedge.com <mailto:
>> ttong@brightedge.com>]
>>     *Sent:* Friday, April 10, 2015 1:34 AM
>>     *To:* user@hive.apache.org <ma...@hive.apache.org>
>>     *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k
>> Partitions
>>
>>     Hello Hive,
>>
>>     I'm a developer using Hive to process TB level data, and I'm having
>> some difficulty loading the data to table.
>>
>>     I have 2 tables now:
>>
>>     -- table_1:
>>     CREATE EXTERNAL TABLE `table_1`(
>>        `keyword` string,
>>        `domain` string,
>>        `url` string
>>        )
>>     PARTITIONED BY (yearmonth INT, partition1 STRING)
>>     STORED AS RCfile
>>
>>     -- table_2:
>>     CREATE EXTERNAL TABLE `table_2`(
>>        `keyword` string,
>>        `domain` string,
>>        `url` string
>>        )
>>     PARTITIONED BY (yearmonth INT, partition2 STRING)
>>     STORED AS Parquet
>>
>>     I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1
>> with dynamic partitioning, and the number of
>>     partitions grows dramatically from 1500 to 40k (because I want to use
>> something else as partitioning).
>>     The mapreduce job was fine.
>>     Somehow the process stucked at " Loading data to table
>> default.table_2 (yearmonth=null, domain_prefix=null) ", and
>>     I've been waiting for hours.
>>
>>     Is this expected when we have 40k partitions?
>>
>>     --------------------------------------------------------------
>>     Refs - Here are the parameters that I used:
>>     export HADOOP_HEAPSIZE=16384
>>     set PARQUET_FILE_SIZE=268435456;
>>     set parquet.block.size=268435456;
>>     set dfs.blocksize=268435456;
>>     set parquet.compression=SNAPPY;
>>     SET hive.exec.dynamic.partition.mode=nonstrict;
>>     SET hive.exec.max.dynamic.partitions=500000;
>>     SET hive.exec.max.dynamic.partitions.pernode=50000;
>>     SET hive.exec.max.created.files=1000000;
>>
>>     Thank you very much!
>>     Tianqi Tong
>>
>>
>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>>
>>
>


-- 

Slava Markeyev | Engineering | Upsight
<http://www.linkedin.com/in/slavamarkeyev>

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by Chris Roblee <ch...@unity3d.com>.
Hi Slava,

We would be interested in reviewing your patch.  Can you please provide more details?

Is there any other way to disable the partition creation step?

Thanks,
Chris

On 4/13/15 10:59 PM, Slava Markeyev wrote:
> This is something I've encountered when doing ETL with hive and having it create 10's of thousands partitions. The issue
> is each partition needs to be added to the metastore and this is an expensive operation to perform. My work around was
> adding a flag to hive that optionally disables the metastore partition creation step. This may not be a solution for
> everyone as that table then has no partitions and you would have to run msck repair but depending on your use case, you
> may just want the data in hdfs.
>
> If there is interest in having this be an option I'll make a ticket and submit the patch.
>
> -Slava
>
> On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <cheng.a.xu@intel.com <ma...@intel.com>> wrote:
>
>     Hi Tianqi,
>
>     Can you attach hive.log as more detailed information?
>
>     +Sergio
>
>     Yours,
>
>     Ferdinand Xu
>
>     *From:* Tianqi Tong [mailto:ttong@brightedge.com <ma...@brightedge.com>]
>     *Sent:* Friday, April 10, 2015 1:34 AM
>     *To:* user@hive.apache.org <ma...@hive.apache.org>
>     *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k Partitions
>
>     Hello Hive,
>
>     I'm a developer using Hive to process TB level data, and I'm having some difficulty loading the data to table.
>
>     I have 2 tables now:
>
>     -- table_1:
>     CREATE EXTERNAL TABLE `table_1`(
>        `keyword` string,
>        `domain` string,
>        `url` string
>        )
>     PARTITIONED BY (yearmonth INT, partition1 STRING)
>     STORED AS RCfile
>
>     -- table_2:
>     CREATE EXTERNAL TABLE `table_2`(
>        `keyword` string,
>        `domain` string,
>        `url` string
>        )
>     PARTITIONED BY (yearmonth INT, partition2 STRING)
>     STORED AS Parquet
>
>     I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1 with dynamic partitioning, and the number of
>     partitions grows dramatically from 1500 to 40k (because I want to use something else as partitioning).
>     The mapreduce job was fine.
>     Somehow the process stucked at " Loading data to table default.table_2 (yearmonth=null, domain_prefix=null) ", and
>     I've been waiting for hours.
>
>     Is this expected when we have 40k partitions?
>
>     --------------------------------------------------------------
>     Refs - Here are the parameters that I used:
>     export HADOOP_HEAPSIZE=16384
>     set PARQUET_FILE_SIZE=268435456;
>     set parquet.block.size=268435456;
>     set dfs.blocksize=268435456;
>     set parquet.compression=SNAPPY;
>     SET hive.exec.dynamic.partition.mode=nonstrict;
>     SET hive.exec.max.dynamic.partitions=500000;
>     SET hive.exec.max.dynamic.partitions.pernode=50000;
>     SET hive.exec.max.created.files=1000000;
>
>     Thank you very much!
>     Tianqi Tong
>
>
>
>
> --
>
> Slava Markeyev | Engineering | Upsight
>


Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by Slava Markeyev <sl...@upsight.com>.
This is something I've encountered when doing ETL with Hive and having it
create tens of thousands of partitions. The issue is that each partition
needs to be added to the metastore, and that is an expensive operation to
perform. My workaround was adding a flag to Hive that optionally disables
the metastore partition-creation step. This may not be a solution for
everyone, since the table then has no partitions and you would have to run
MSCK REPAIR, but depending on your use case you may just want the data in
HDFS.
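As a rough sketch of the recovery step (assuming the job has already written files under the table's normal partition directory layout; the table name and partition values below are placeholders):

-- Register every partition found on the filesystem in one pass:
MSCK REPAIR TABLE table_2;

-- Or register specific partitions explicitly:
ALTER TABLE table_2 ADD IF NOT EXISTS PARTITION (yearmonth=201504, partition2='example');

Either way the metastore still has to record each partition, so this mostly moves the cost out of the load step rather than eliminating it.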

If there is interest in having this as an option, I'll make a ticket and
submit the patch.

-Slava

On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <ch...@intel.com> wrote:

>  Hi Tianqi,
>
> Can you attach hive.log as more detailed information?
>
> +Sergio
>
>
>
> Yours,
>
> Ferdinand Xu
>
>
>
> *From:* Tianqi Tong [mailto:ttong@brightedge.com]
> *Sent:* Friday, April 10, 2015 1:34 AM
> *To:* user@hive.apache.org
> *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k
> Partitions
>
>
>
> Hello Hive,
>
> I'm a developer using Hive to process TB level data, and I'm having some
> difficulty loading the data to table.
>
> I have 2 tables now:
>
>
>
> -- table_1:
>
> CREATE EXTERNAL TABLE `table_1`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition1 STRING)
>
> STORED AS RCfile
>
>
>
> -- table_2:
>
> CREATE EXTERNAL TABLE `table_2`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition2 STRING)
>
> STORED AS Parquet
>
>
>
> I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1 with
> dynamic partitioning, and the number of partitions grows dramatically from
> 1500 to 40k (because I want to use something else as partitioning).
>
> The mapreduce job was fine.
>
> Somehow the process stucked at " Loading data to table default.table_2
> (yearmonth=null, domain_prefix=null) ", and I've been waiting for hours.
>
>
>
> Is this expected when we have 40k partitions?
>
>
>
> --------------------------------------------------------------
>
> Refs - Here are the parameters that I used:
>
> export HADOOP_HEAPSIZE=16384
>
> set PARQUET_FILE_SIZE=268435456;
>
> set parquet.block.size=268435456;
>
> set dfs.blocksize=268435456;
>
> set parquet.compression=SNAPPY;
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> SET hive.exec.max.dynamic.partitions=500000;
>
> SET hive.exec.max.dynamic.partitions.pernode=50000;
>
> SET hive.exec.max.created.files=1000000;
>
>
>
>
>
> Thank you very much!
>
> Tianqi Tong
>



-- 

Slava Markeyev | Engineering | Upsight

RE: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Posted by "Xu, Cheng A" <ch...@intel.com>.
Hi Tianqi,
Can you attach hive.log for more detailed information?
+Sergio

Yours,
Ferdinand Xu

From: Tianqi Tong [mailto:ttong@brightedge.com]
Sent: Friday, April 10, 2015 1:34 AM
To: user@hive.apache.org
Subject: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

Hello Hive,
I'm a developer using Hive to process TB-level data, and I'm having some difficulty loading the data into a table.
I have 2 tables now:

-- table_1:
CREATE EXTERNAL TABLE `table_1`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition1 STRING)
STORED AS RCfile

-- table_2:
CREATE EXTERNAL TABLE `table_2`(
  `keyword` string,
  `domain` string,
  `url` string
  )
PARTITIONED BY (yearmonth INT, partition2 STRING)
STORED AS Parquet

I'm doing an INSERT OVERWRITE into table_2 from a SELECT on table_1 with dynamic partitioning, and the number of partitions grows dramatically from 1,500 to 40k (because I'm partitioning on a different column).
The MapReduce job itself completed fine.
Somehow the process got stuck at "Loading data to table default.table_2 (yearmonth=null, domain_prefix=null)", and I've been waiting for hours.

Is this expected when we have 40k partitions?

--------------------------------------------------------------
Refs - Here are the parameters that I used:
export HADOOP_HEAPSIZE=16384
set PARQUET_FILE_SIZE=268435456;
set parquet.block.size=268435456;
set dfs.blocksize=268435456;
set parquet.compression=SNAPPY;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=500000;
SET hive.exec.max.dynamic.partitions.pernode=50000;
SET hive.exec.max.created.files=1000000;


Thank you very much!
Tianqi Tong