Posted to dev@spark.apache.org by Nitin Goyal <ni...@gmail.com> on 2016/11/25 05:06:09 UTC
Parquet-like partitioning support in Spark SQL's in-memory columnar cache
Hi,
Do we have any plan to support Parquet-like partitioning in Spark SQL's
in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
partition.
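To make the ask concrete, here is a small, Spark-free Python illustration (the batch structure is hypothetical, not Spark's actual CachedBatch internals): when the cache is one flat sequence of batches mixing all partition values, a predicate on the partition column still has to touch every batch.

```python
# Hypothetical flat in-memory cache: each batch mixes rows from all
# partition values, so a filter on "id" must scan every batch.
flat_cache = [
    [(1, "a"), (2, "c")],   # batch 0
    [(1, "b"), (2, "d")],   # batch 1
]

def filter_flat(cache, wanted_id):
    """Scan every batch, keeping payloads whose id matches."""
    scanned = 0
    out = []
    for batch in cache:
        scanned += 1  # every batch is touched, regardless of the predicate
        out.extend(payload for i, payload in batch if i == wanted_id)
    return out, scanned

result, batches_scanned = filter_flat(flat_cache, 2)
print(result, batches_scanned)  # → ['c', 'd'] 2  (both batches scanned)
```

The point of the question is whether the cache could instead be organized so that a filter on a partition column skips whole groups of batches.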
-Nitin
Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache
Posted by Nitin Goyal <ni...@gmail.com>.
+Cheng
Hi Reynold,
I think you are referring to bucketing in the in-memory columnar cache.
I am proposing that, if we have a Parquet directory structure like the following:
/<parent-directory>/file1/id=1/<parquet-part-files>
/<parent-directory>/file1/id=2/<parquet-part-files>
then reading and caching it should create 2 RDD[CachedBatch]s (one per
value of "id").
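A minimal, Spark-free Python sketch of that idea (the names and batch layout here are illustrative, not Spark internals): caching the layout above would keep one batch list per distinct value of "id", so a filter on "id" can skip whole partitions instead of scanning every cached batch.

```python
from collections import defaultdict

# Rows as (partition_value, payload); the partition column is "id",
# mirroring a layout like /<parent-directory>/file1/id=<value>/...
rows = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (1, "e")]

BATCH_SIZE = 2  # rows per CachedBatch-like chunk (illustrative)

def cache_by_partition(rows, batch_size):
    """Group rows by partition value, then chunk each group into batches,
    mimicking 'one RDD[CachedBatch] per in-memory cache partition'."""
    groups = defaultdict(list)
    for part, payload in rows:
        groups[part].append(payload)
    return {
        part: [vals[i:i + batch_size] for i in range(0, len(vals), batch_size)]
        for part, vals in groups.items()
    }

cached = cache_by_partition(rows, BATCH_SIZE)
# A filter like "id = 2" now touches only the batches under key 2.
print(cached[2])  # → [['c', 'd']]
```

With this structure, partition pruning on the cache becomes a dictionary lookup rather than a full scan of all cached batches.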
Is this what you were referring to originally?
Thanks
-Nitin
On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin <rx...@databricks.com> wrote:
> It's already there, isn't it? The in-memory columnar cache format.
>
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <ni...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Do we have any plan to support Parquet-like partitioning in
>> Spark SQL's in-memory cache? Something like one RDD[CachedBatch] per
>> in-memory cache partition.
>>
>>
>> -Nitin
>>
>
>
--
Regards
Nitin Goyal
Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache
Posted by Reynold Xin <rx...@databricks.com>.
It's already there, isn't it? The in-memory columnar cache format.
On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <ni...@gmail.com> wrote:
> Hi,
>
> Do we have any plan to support Parquet-like partitioning in
> Spark SQL's in-memory cache? Something like one RDD[CachedBatch] per
> in-memory cache partition.
>
>
> -Nitin
>