Posted to user@spark.apache.org by Shaikh Riyaz <sh...@gmail.com> on 2014/07/07 02:30:30 UTC

Data loading to Parquet using spark

Hi,

We are planning to use Spark to load data into Parquet. This data will then
be queried by Impala for visualization through Tableau.

Can we achieve this flow? How do we load data into Parquet from Spark? Will
Impala be able to access the data loaded by Spark?

I would greatly appreciate it if someone could help with an example to
achieve this goal.

Thanks in advance.

-- 
Regards,

Riyaz

Re: Data loading to Parquet using spark

Posted by Soren Macbeth <so...@yieldbot.com>.
I typed "spark parquet" into Google and the top result was this blog post
about reading and writing Parquet files from Spark:

http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/


On Mon, Jul 7, 2014 at 5:23 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> SchemaRDDs, provided by Spark SQL, have a saveAsParquetFile command.  You
> can turn a normal RDD into a SchemaRDD using the techniques described here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> This should work with Impala, but if you run into any issues please let me
> know.

Re: Data loading to Parquet using spark

Posted by Michael Armbrust <mi...@databricks.com>.
SchemaRDDs, provided by Spark SQL, have a saveAsParquetFile command.  You
can turn a normal RDD into a SchemaRDD using the techniques described here:
http://spark.apache.org/docs/latest/sql-programming-guide.html

This should work with Impala, but if you run into any issues please let me
know.
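
As a rough illustration of the saveAsParquetFile route described above, here
is a minimal sketch assuming the Spark 1.0-era Scala API (SQLContext and
SchemaRDD). The Event case class, the comma-separated input layout, and both
HDFS paths are hypothetical placeholders, not anything taken from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; the case class fields become the Parquet schema.
// It is declared at top level so Spark SQL can reflect on it.
case class Event(id: Int, name: String)

object ParquetLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParquetLoad"))
    val sqlContext = new SQLContext(sc)

    // Implicit conversion from an RDD of case classes to a SchemaRDD
    // (Spark 1.0-era API).
    import sqlContext.createSchemaRDD

    // Assumed input: one "id,name" pair per line at a placeholder path.
    val events = sc.textFile("hdfs:///tmp/events.csv")
      .map(_.split(","))
      .map(fields => Event(fields(0).trim.toInt, fields(1).trim))

    // Writes a directory of Parquet part files at a placeholder path.
    events.saveAsParquetFile("hdfs:///tmp/events.parquet")

    sc.stop()
  }
}
```

The output is a directory of Parquet files rather than a single file, so on
the Impala side a table would presumably need to be defined over that
directory before the data can be queried.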

