Posted to user@spark.apache.org by Chengi Liu <ch...@gmail.com> on 2014/02/26 18:43:38 UTC

specify output format using pyspark

Hi,
  How do we save data to HDFS from PySpark in the "right" format?
I use:
counts = counts.saveAsTextFile("hdfs://localhost:1234//foo")
But when I look at the data, it is always in tuple format:
(1245,23)
(1235,99)

How do I specify the output format in PySpark?
Thanks

Re: specify output format using pyspark

Posted by Chengi Liu <ch...@gmail.com>.
Cool. Thanks.


On Wed, Feb 26, 2014 at 9:48 AM, Ewen Cheslack-Postava <me...@ewencp.org> wrote:

> You need to convert it to the format you want yourself. The output you're
> seeing is just the automatic conversion of your data by unicode().
>
> -Ewen

Re: specify output format using pyspark

Posted by Ewen Cheslack-Postava <me...@ewencp.org>.
You need to convert it to the format you want yourself, for example by
mapping each record to a string before calling saveAsTextFile. The output
you're seeing is just the automatic conversion of each record by unicode().

-Ewen
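
For anyone finding this thread later, here is a minimal sketch of the
"convert it yourself" advice. The helper name to_line and the tab separator
are assumptions made for illustration, not something from the thread. The
formatting is shown on a plain list so it runs without a cluster; the Spark
call is left as a comment and assumes counts is an RDD of tuples.

```python
def to_line(record, sep="\t"):
    """Join the fields of one tuple into a single delimited text line."""
    return sep.join(str(field) for field in record)


# Hypothetical sample mirroring the (key, count) pairs from the question.
sample = [(1245, 23), (1235, 99)]
lines = [to_line(record) for record in sample]

# With a live SparkContext, the same function would be applied to every
# element before saving (assumes `counts` is an RDD of tuples):
#   counts.map(to_line).saveAsTextFile("hdfs://localhost:1234//foo")
```

Note that saveAsTextFile returns None, so assigning its result back to
counts, as in the original snippet, discards the RDD reference.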