Posted to user@spark.apache.org by Sadhan Sood <sa...@gmail.com> on 2014/11/20 19:33:28 UTC

Adding partitions to parquet data

We are loading Parquet data as temp tables, but we are wondering whether
there is a way to add a partition to the data without going through Hive
(we still want to use Spark's Parquet SerDe rather than Hive's). The data
is laid out like:

/date1/file1, /date1/file2 ... , /date2/file1,
/date2/file2,..../daten/filem

and we are loading it like:
val parquetFileRDD = sqlContext.parquetFile(<comma-separated parquet file names>)

but it would be nice to be able to add a partition and provide the date as
a query parameter.
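
A minimal sketch of the loading pattern described above (Spark 1.x API; the
paths, table name, and SparkContext setup are hypothetical stand-ins, not
anything from this thread):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "parquet-partitions-sketch")
val sqlContext = new SQLContext(sc)

// Load an explicit, comma-separated list of one date's files and expose
// them as a temp table (the comma-separated path string is what the post
// above relies on).
val parquetFileRDD = sqlContext.parquetFile(
  "/data/date1/file1.parquet,/data/date1/file2.parquet")
parquetFileRDD.registerTempTable("events_date1")

// Each new date means assembling another file list by hand; the date never
// appears as a queryable column.
sqlContext.sql("SELECT count(*) FROM events_date1").collect()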

Re: Adding partitions to parquet data

Posted by Sadhan Sood <sa...@gmail.com>.
Ah awesome, thanks!!

Re: Adding partitions to parquet data

Posted by Michael Armbrust <mi...@databricks.com>.
In 1.2, by default we use Spark's Parquet support instead of Hive's when the
SerDe contains the word "Parquet". This should work with Hive partitioning.
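
A rough sketch of that setup (Spark 1.2-era HiveContext; the table, columns,
dates, and paths below are illustrative assumptions, not anything from this
thread):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc: an existing SparkContext

// A partitioned Parquet table in the Hive metastore. Because its SerDe is a
// Parquet SerDe, Spark 1.2 reads it with native Parquet support by default.
hiveContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS events (id BIGINT, payload STRING)
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
  LOCATION '/data/events'""")

// Adding a day of data is just registering a new partition; the files are
// never rewritten through Hive.
hiveContext.sql(
  "ALTER TABLE events ADD PARTITION (dt='2014-11-20') " +
  "LOCATION '/data/events/dt=2014-11-20'")

// The partition column behaves like an ordinary column, so the date can be
// a query parameter, and only the matching partition's files are scanned.
hiveContext.sql("SELECT count(*) FROM events WHERE dt = '2014-11-20'").collect()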
