Posted to dev@spark.apache.org by Mick Davies <mi...@gmail.com> on 2015/02/13 10:40:24 UTC

Re: Optimize encoding/decoding strings when using Parquet

I have put in a PR on Parquet to support dictionaries when filters are pushed
down, which should reduce the binary-conversion overhead when Spark pushes down
string predicates on columns that are dictionary encoded.

https://github.com/apache/incubator-parquet-mr/pull/117
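To illustrate the idea (a minimal sketch, not the actual parquet-mr implementation: function names and data here are hypothetical), a string predicate on a dictionary-encoded column can be evaluated once per distinct dictionary entry instead of decoding every row's binary value to a string:

```python
# Hypothetical sketch: with dictionary encoding, each row stores only an
# integer index into a small dictionary of distinct values, so a predicate
# can be tested once per dictionary entry rather than once per row.

def eval_predicate_per_row(encoded_rows, dictionary, predicate):
    # Naive path: decode bytes -> str for every row, then test each row.
    return [predicate(dictionary[i].decode("utf-8")) for i in encoded_rows]

def eval_predicate_via_dictionary(encoded_rows, dictionary, predicate):
    # Optimized path: test each distinct dictionary value once, then
    # answer per row with a cheap integer lookup (no per-row decoding).
    matches = [predicate(v.decode("utf-8")) for v in dictionary]
    return [matches[i] for i in encoded_rows]

dictionary = [b"alpha", b"beta", b"gamma"]   # distinct column values
encoded_rows = [0, 2, 1, 0, 2, 2]            # per-row dictionary indices
pred = lambda s: s == "gamma"

assert (eval_predicate_per_row(encoded_rows, dictionary, pred)
        == eval_predicate_via_dictionary(encoded_rows, dictionary, pred))
```

The saving grows with the row count: the predicate and the bytes-to-string conversion run once per distinct value rather than once per row.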

It's blocked at the moment because part of my Parquet build fails on my Mac due
to an issue installing Thrift 0.7. The installation instructions available
on the Parquet site do not seem to work, I think because of this issue:
https://issues.apache.org/jira/browse/THRIFT-2229

This is not directly related to Spark, but I wondered if anyone has got
Thrift 0.7 working on Mac OS X Yosemite (10.10), or can suggest a workaround.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10617.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org