Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2015/08/05 17:43:29 UTC

Spark SQL Hive - merge small files

Hello,

I would love to have Hive merge the small files in my managed Hive context
after every query. Right now, I am setting the Hive configuration in my
Spark job configuration, but Hive is not managing the files. Do I need to
set the Hive properties in another place? How do you set Hive configurations in
Spark?

Here is what I'd like to set

hive.merge.mapfiles=true
hive.merge.mapredfiles=true
hive.merge.size.per.task=256000000
hive.merge.smallfiles.avgsize=16000000

Re: Spark SQL Hive - merge small files

Posted by Brandon White <bw...@gmail.com>.
So there is no good way to merge Spark files in a managed Hive table right
now?

On Wed, Aug 5, 2015 at 10:02 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> This feature isn't currently supported.

Re: Spark SQL Hive - merge small files

Posted by Michael Armbrust <mi...@databricks.com>.
This feature isn't currently supported.
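
Since Spark SQL does not act on the hive.merge.* settings here, one possible
workaround is to choose the number of output files yourself before writing.
The sketch below is only an illustration and not part of the original thread.
It assumes Hive's documented meaning of the two size settings (merge when the
average output file size falls below hive.merge.smallfiles.avgsize, aiming for
merged files of roughly hive.merge.size.per.task bytes). The function names
needs_merge and target_file_count are hypothetical.

```python
# Values taken from the original post (bytes).
MERGE_SIZE_PER_TASK = 256000000   # hive.merge.size.per.task
SMALLFILES_AVGSIZE = 16000000     # hive.merge.smallfiles.avgsize


def needs_merge(file_sizes, avg_threshold=SMALLFILES_AVGSIZE):
    """Return True when the average file size is below the threshold.

    Assumption: mirrors the documented trigger for Hive's merge step.
    An empty list of files never needs merging.
    """
    if not file_sizes:
        return False
    return sum(file_sizes) / len(file_sizes) < avg_threshold


def target_file_count(file_sizes, target_size=MERGE_SIZE_PER_TASK):
    """Return how many files of about target_size bytes hold the data.

    Uses integer ceiling division and always returns at least 1.
    """
    total = sum(file_sizes)
    return max(1, -(-total // target_size))


if __name__ == "__main__":
    # Hypothetical input: 600 output files of 1 MB each.
    sizes = [1000000] * 600
    if needs_merge(sizes):
        print("suggested number of output files:", target_file_count(sizes))
```

The resulting count could then be passed to something like
DataFrame.coalesce(n) before writing the table, assuming the output sizes are
known or can be estimated in advance.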
