Posted to dev@spark.apache.org by Nitin Goyal <ni...@gmail.com> on 2015/09/02 09:58:38 UTC
Re: [ compress in-memory column storage used in sparksql cache table ]
I think Spark SQL's in-memory columnar cache already does compression. Check
out the classes under the following path:
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression
The compression ratio is not as good as Parquet's, though.
Thanks
-Nitin
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/compress-in-memory-column-storage-used-in-sparksql-cache-table-tp13932p13937.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org
Re: [ compress in-memory column storage used in sparksql cache table ]
Posted by Cheng Lian <li...@gmail.com>.
Yeah, two of the reasons why the built-in in-memory columnar storage
doesn't achieve a compression ratio comparable to Parquet's are:
1. The in-memory columnar representation doesn't handle nested types, so
array/map/struct values are not compressed.
2. Parquet may use more than one compression method on a single column,
for example dictionary encoding + run-length encoding (RLE).
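To make point 2 concrete, here is a toy sketch in plain Python of how
dictionary encoding and RLE can be stacked on one column. It is only an
illustration of the idea, not Parquet's actual encoding or Spark's
implementation; the function names and the sample column are made up for
this example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer id (ids assigned in first-seen order)."""
    dictionary = {}
    ids = []
    for v in values:
        # setdefault returns the existing id, or assigns the next free one.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    # Dict keys keep insertion order, so position i holds the value for id i.
    return list(dictionary), ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


def decode(dictionary, runs):
    """Invert both steps: expand the runs, then look each id up in the dictionary."""
    return [dictionary[i] for i, count in runs for _ in range(count)]


# Hypothetical low-cardinality string column with long runs.
column = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]

dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)

print(dictionary)  # distinct values, stored once
print(runs)        # (id, run_length) pairs instead of ten strings
assert decode(dictionary, runs) == column
```

The second step works on the small integer ids produced by the first, so
each distinct string is stored once and repeated values collapse into
runs. That stacking is what a single scheme per column cannot give you.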
Cheng
On 9/2/15 3:58 PM, Nitin Goyal wrote:
> I think spark sql's in-memory columnar cache already does compression. Check
> out classes in following path :-
>
> https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression
>
> Although compression ratio is not as good as Parquet.