Posted to dev@spark.apache.org by Nitin Goyal <ni...@gmail.com> on 2016/11/25 05:06:09 UTC

Parquet-like partitioning support in Spark SQL's in-memory columnar cache

Hi,

Is there any plan to support Parquet-like partitioning in Spark SQL's
in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
partition.
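
For concreteness, here is a minimal Scala sketch of what caching does
today (the path, app name, and data are hypothetical). As of Spark 2.x,
cache() materializes the whole relation as one set of in-memory columnar
batches; there is no notion of one cached RDD per partition value:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("cache-demo").getOrCreate()

    // Hypothetical Parquet table partitioned on disk,
    // e.g. /tmp/events/id=1/..., /tmp/events/id=2/...
    val df = spark.read.parquet("/tmp/events")

    df.cache()  // builds a single in-memory columnar representation (RDD[CachedBatch])
    df.count()  // forces the cache to be materialized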


-Nitin

Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache

Posted by Nitin Goyal <ni...@gmail.com>.
+Cheng

Hi Reynold,

I think you are referring to bucketing in the in-memory columnar cache.

I am proposing that if we have a Parquet directory structure like the following:

/<parent-directory>/file1/id=1/<parquet-part-files>
/<parent-directory>/file1/id=2/<parquet-part-files>

and we read and cache it, Spark should create two RDD[CachedBatch]
instances (one per value of "id").
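
A quick sketch of how that layout is produced and read back, using the
hypothetical path /tmp/file1 as a stand-in for /<parent-directory>/file1:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-demo").getOrCreate()
    val df = spark.range(100).selectExpr("id % 2 AS id", "id AS value")

    // Writing with partitionBy creates one subdirectory per "id" value,
    // i.e. /tmp/file1/id=0/... and /tmp/file1/id=1/...
    df.write.partitionBy("id").parquet("/tmp/file1")

    // Reading recovers "id" as a partition column, but caching the result
    // currently yields a single RDD[CachedBatch] spanning all "id" values,
    // not one cached RDD per value as proposed here.
    val cached = spark.read.parquet("/tmp/file1").cache()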

Is this what you were referring to originally?

Thanks
-Nitin


On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin <rx...@databricks.com> wrote:

> It's already there, isn't it? The in-memory columnar cache format.
>
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <ni...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there any plan to support Parquet-like partitioning in Spark SQL's
>> in-memory cache? Something like one RDD[CachedBatch] per in-memory
>> cache partition.
>>
>>
>> -Nitin
>>
>
>


-- 
Regards
Nitin Goyal

Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache

Posted by Reynold Xin <rx...@databricks.com>.
It's already there, isn't it? The in-memory columnar cache format.
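
One way to see that format at work (a sketch; the path is hypothetical
and the exact plan output varies by Spark version):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.read.parquet("/tmp/events")  // hypothetical partitioned table
    df.cache()
    df.count()  // materialize the cache

    // The physical plan should now show an InMemoryTableScan over cached
    // batches; per-batch column statistics let Spark skip batches that
    // cannot match the filter (in-memory partition batch pruning).
    df.filter(df("id") === 1).explain()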


On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <ni...@gmail.com> wrote:

> Hi,
>
> Is there any plan to support Parquet-like partitioning in Spark SQL's
> in-memory cache? Something like one RDD[CachedBatch] per in-memory
> cache partition.
>
>
> -Nitin
>