Posted to user@spark.apache.org by Chengi Liu <ch...@gmail.com> on 2014/02/26 18:43:38 UTC
specify output format using pyspark
Hi,
How do we save data to HDFS using PySpark in the "right" format?
I use:
counts.saveAsTextFile("hdfs://localhost:1234//foo")
But when I look at the data, it is always in tuple format:
(1245,23)
(1235,99)
How do I specify the output format in PySpark?
Thanks
Re: specify output format using pyspark
Posted by Chengi Liu <ch...@gmail.com>.
Cool. Thanks.
On Wed, Feb 26, 2014 at 9:48 AM, Ewen Cheslack-Postava <me...@ewencp.org> wrote:
> You need to convert it to the format you want yourself. The output you're
> seeing is just the automatic conversion of your data by unicode().
>
> -Ewen
Re: specify output format using pyspark
Posted by Ewen Cheslack-Postava <me...@ewencp.org>.
You need to convert it to the format you want yourself. The output
you're seeing is just the automatic conversion of your data by unicode().
-Ewen
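A minimal sketch of what "convert it yourself" could look like, assuming `counts` is an RDD of (key, value) tuples as in the question. The helper names `format_pair` and `save_pairs_as_text` and the choice of separator are hypothetical, not part of PySpark; only `map` and `saveAsTextFile` are RDD methods.

```python
def format_pair(pair, sep="\t"):
    """Turn a (key, value) tuple into one delimited line of text."""
    key, value = pair
    return "{0}{1}{2}".format(key, sep, value)


def save_pairs_as_text(rdd, path, sep="\t"):
    """Map each tuple to a string first, then write the strings out.

    saveAsTextFile returns None, so its result is not assigned to anything.
    """
    rdd.map(lambda pair: format_pair(pair, sep)).saveAsTextFile(path)


# The formatter can be checked locally without a Spark cluster:
lines = [format_pair(p, ",") for p in [(1245, 23), (1235, 99)]]
```

With a real RDD this would be called as save_pairs_as_text(counts, "hdfs://localhost:1234//foo", ","), so each output line holds the key and value joined by the separator rather than the tuple's default string form.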