Posted to user@spark.apache.org by Stephen Fletcher <st...@gmail.com> on 2017/02/26 12:31:19 UTC

attempting to map Dataset[Row]

I'm attempting to perform a map on a Dataset[Row] but I'm getting a decode
error when attempting to pass a custom encoder.
My code looks similar to the following:


val source =
spark.read.format("parquet").load("/emrdata/sources/very_large_ds")



source.map{ row => {
      val key = row(0)

   }
}
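The decode error is consistent with Dataset.map requiring an implicit Encoder for whatever the lambda returns. A minimal sketch (Spark 2.x, assuming the same SparkSession `spark` and source path from the thread) that maps Row to Row using an encoder derived from the source's own schema:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val source = spark.read.format("parquet").load("/emrdata/sources/very_large_ds")

// Rows produced by the lambda keep the source schema, so an
// ExpressionEncoder built from that schema can decode them.
implicit val rowEnc = RowEncoder(source.schema)

// Identity map shown only as a placeholder for real per-row logic.
val mapped = source.map { row => row }
```

If the lambda returns something other than a Row, the implicit encoder must match that result type instead.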

Re: attempting to map Dataset[Row]

Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
Hi, Fletcher.
A case class can help you construct complex structures; RDD, StructType,
and StructField are also useful if you need them.

However, the code is a little confusing:
source.map{ row => {
      val key = row(0)
      val buff = new ArrayBuffer[Row]()
      buff += row
      (key,buff)
   }
}

The expected result is (row[0], row), right?
Would you like to explain its purpose?
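The case-class suggestion above can be sketched as follows; the field names are assumptions, since the real schema of very_large_ds isn't shown in the thread:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type whose first field plays the role of row(0).
case class Keyed(key: String, payload: String)

val spark = SparkSession.builder.appName("example").getOrCreate()
import spark.implicits._ // supplies Encoders for case classes and tuples

val source = spark.read.parquet("/emrdata/sources/very_large_ds")

// With a typed Dataset, map no longer needs a hand-rolled Kryo encoder:
// the (String, Keyed) tuple encoder is derived automatically.
val keyed = source.as[Keyed].map(r => (r.key, r))
```

The point of the case class is that Spark can derive a product encoder for it, avoiding the opaque Kryo blob and keeping the result queryable as columns.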

On Sun, Feb 26, 2017 at 8:36 PM, Stephen Fletcher <
stephen.fletcher@gmail.com> wrote:

> sorry here's the whole code
>
> val source = spark.read.format("parquet").load("/emrdata/sources/very_large_ds")
>
> implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[(Any,ArrayBuffer[Row])]
>
> source.map{ row => {
>       val key = row(0)
>       val buff = new ArrayBuffer[Row]()
>       buff += row
>       (key,buff)
>    }
> }
>
> ...
>
> On Sun, Feb 26, 2017 at 7:31 AM, Stephen Fletcher <
> stephen.fletcher@gmail.com> wrote:
>
>> I'm attempting to perform a map on a Dataset[Row] but getting an error on
>> decode when attempting to pass a custom encoder.
>>  My code looks similar to the following:
>>
>>
>> val source = spark.read.format("parquet").load("/emrdata/sources/very_large_ds")
>>
>>
>>
>> source.map{ row => {
>>       val key = row(0)
>>
>>    }
>> }
>>
>
>

Re: attempting to map Dataset[Row]

Posted by Stephen Fletcher <st...@gmail.com>.
sorry here's the whole code

val source =
spark.read.format("parquet").load("/emrdata/sources/very_large_ds")

implicit val mapEncoder =
org.apache.spark.sql.Encoders.kryo[(Any,ArrayBuffer[Row])]

source.map{ row => {
      val key = row(0)
      val buff = new ArrayBuffer[Row]()
      buff += row
      (key,buff)
   }
}

...
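If the goal is simply to pair each row with its first column, one way to sidestep Dataset encoders entirely (a sketch of an alternative, not necessarily what the thread settled on) is to drop to the RDD API, which serializes values directly and needs no implicit Encoder:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row

val source = spark.read.parquet("/emrdata/sources/very_large_ds")

// RDD transformations don't go through Catalyst, so the
// (Any, ArrayBuffer[Row]) tuple needs no Encoder here.
val pairs = source.rdd.map { row =>
  val key = row(0)
  (key, ArrayBuffer(row))
}
```

The trade-off is losing Catalyst optimizations and the DataFrame API on the result; converting back requires supplying a schema.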

On Sun, Feb 26, 2017 at 7:31 AM, Stephen Fletcher <
stephen.fletcher@gmail.com> wrote:

> I'm attempting to perform a map on a Dataset[Row] but getting an error on
> decode when attempting to pass a custom encoder.
>  My code looks similar to the following:
>
>
> val source = spark.read.format("parquet").load("/emrdata/sources/very_large_ds")
>
>
>
> source.map{ row => {
>       val key = row(0)
>
>    }
> }
>