Posted to user@spark.apache.org by Shuai Zheng <sz...@gmail.com> on 2015/03/09 17:25:33 UTC

Read Parquet file from scala directly

Hi All,

I have a lot of Parquet files, and I am trying to open them directly instead of
loading them into an RDD in the driver (so I can optimize performance through
some special logic).

However, I have searched online and can't find any example of accessing
Parquet files directly from Scala. Has anyone done this before?

Regards,

Shuai


Re: Read Parquet file from scala directly

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Here's a Java version:
https://github.com/cloudera/parquet-examples/tree/master/MapReduce It
shouldn't be hard to port it to Scala.
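For reference, here is a minimal Scala sketch of the same idea without MapReduce. It opens a single file with `ParquetReader` and parquet-mr's generic example `Group` record model, with no RDD involved. It assumes `parquet-hadoop` (with the `org.apache.parquet` package names; older pre-Apache releases used a `parquet.*` prefix) and `hadoop-client` are on the classpath. The `DirectParquetRead` object and `readGroups` method are hypothetical names chosen for illustration.

```scala
// Assumed dependencies: org.apache.parquet:parquet-hadoop and org.apache.hadoop:hadoop-client.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.example.data.Group
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.hadoop.example.GroupReadSupport

// Hypothetical helper object; not part of Spark or parquet-mr.
object DirectParquetRead {

  /** Reads every record of a single Parquet file as a generic Group, outside of any RDD. */
  def readGroups(file: String, conf: Configuration = new Configuration()): Vector[Group] = {
    val reader: ParquetReader[Group] =
      ParquetReader.builder(new GroupReadSupport(), new Path(file)).withConf(conf).build()
    try {
      // read() returns null once the file is exhausted.
      Iterator.continually(reader.read()).takeWhile(_ != null).toVector
    } finally {
      reader.close()
    }
  }

  // Prints every record of every file passed on the command line.
  def main(args: Array[String]): Unit =
    for (file <- args; record <- readGroups(file)) println(record)
}
```

Collecting into a `Vector` keeps the sketch short but holds the whole file in memory. For large files, process each record as it comes out of `reader.read()` instead.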

Thanks
Best Regards


Re: Read Parquet file from scala directly

Posted by Cheng Lian <li...@gmail.com>.
The parquet-tools code should be pretty helpful (although it's in Java):

https://github.com/apache/incubator-parquet-mr/tree/master/parquet-tools/src/main/java/parquet/tools/command
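In the spirit of the parquet-tools `schema` and `meta` commands, here is a hypothetical Scala helper that reads only a file's footer metadata (schema, number of row groups, total row count) without decoding any data pages. It is not a port of the parquet-tools code. It assumes parquet-hadoop 1.10 or later, for `ParquetFileReader.open` and `HadoopInputFile`, and Scala 2.13, for `scala.jdk.CollectionConverters`. The `ParquetFooterInfo` and `Summary` names are illustrative.

```scala
// Assumed dependencies: parquet-hadoop 1.10+ and hadoop-client; Scala 2.13 for CollectionConverters.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import scala.jdk.CollectionConverters._

// Hypothetical helper, loosely modelled on what parquet-tools' schema/meta commands report.
object ParquetFooterInfo {
  final case class Summary(schema: String, rowGroups: Int, totalRows: Long)

  /** Opens the file and reads only its footer; no data pages are decoded. */
  def summarize(file: String, conf: Configuration = new Configuration()): Summary = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))
    try {
      val blocks = reader.getFooter.getBlocks.asScala // one BlockMetaData per row group
      Summary(
        schema = reader.getFileMetaData.getSchema.toString,
        rowGroups = blocks.size,
        totalRows = blocks.map(_.getRowCount).sum
      )
    } finally {
      reader.close()
    }
  }

  // Prints a one-line summary for each file passed on the command line.
  def main(args: Array[String]): Unit = args.foreach(f => println(summarize(f)))
}
```

Footers are small, so a driver could scan them for many files cheaply before deciding which files to read in full.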
