Posted to user@spark.apache.org by Tim Gautier <ti...@gmail.com> on 2016/05/31 23:17:12 UTC

Map tuple to case class in Dataset

How should I go about mapping from, say, a Dataset[(Int,Int)] to a
Dataset[<case class here>]?

I tried to use a map, but it throws exceptions:

case class Test(a: Int)
Seq(1,2).toDS.map(t => Test(t)).show

Thanks,
Tim
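
For reference, the working shape of that mapping (a minimal sketch, assuming
a 1.6-era spark-shell where sqlContext.implicits._ is imported automatically;
the Pair case class is a hypothetical name used for illustration):

case class Pair(a: Int, b: Int)

val ds = Seq((1, 2), (3, 4)).toDS()               // Dataset[(Int, Int)]
val pairs = ds.map { case (x, y) => Pair(x, y) }  // Dataset[Pair]
pairs.show()                                      // columns a and b

As the replies below confirm, the one-field version Seq(1,2).toDS.map(t =>
Test(t)) works in a healthy 1.6.1 shell; the failure reported in this thread
turned out to be environmental.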

Re: Map tuple to case class in Dataset

Posted by Tim Gautier <ti...@gmail.com>.
I was getting a warning about /tmp/hive not being writable whenever I
started spark-shell, but I was ignoring it. I decided to set the
permissions to 777 and restart the shell. After doing that, I now get the
same result as Ted Yu when running Seq(1,2).toDS.map(t => Test(t)).show.
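
For anyone hitting the same warning, a sketch of that fix, assuming the
scratch directory is at the default /tmp/hive on the local filesystem:

sudo chmod -R 777 /tmp/hive    # then restart spark-shell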


Re: Map tuple to case class in Dataset

Posted by Michael Armbrust <mi...@databricks.com>.
That error looks like it's caused by the class being defined in the REPL
itself. $line29.$read$ is the name of the outer object that is used to
compile the line containing case class Test(a: Int).

Is this EMR or the Apache 1.6.1 release?
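
A general workaround when REPL-defined classes fail to initialize on
executors (an assumption about the usual remedy, not something confirmed in
this thread) is to compile the case class outside the shell and ship it to
the cluster:

// Models.scala, compiled into a jar (file and package names here are
// hypothetical, for illustration only)
package models

case class Test(a: Int)

Starting the shell with spark-shell --jars models.jar and then running
import models.Test gives every executor a real class file to load instead of
a REPL-generated wrapper class.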


Re: Map tuple to case class in Dataset

Posted by Tim Gautier <ti...@gmail.com>.
I spun up another EC2 cluster today with Spark 1.6.1 and I still get the
error.

scala> case class Test(a: Int)
defined class Test

scala> Seq(1,2).toDS.map(t => Test(t)).show
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 39.0 in stage 0.0 (TID 39, ip-10-2-2-203.us-west-2.compute.internal): java.lang.NoClassDefFoundError: Could not initialize class $line29.$read$
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

16/06/01 15:04:21 INFO scheduler.TaskSetManager: Starting task 39.1 in stage 0.0 (TID 40, ip-10-2-2-111.us-west-2.compute.internal, partition 39,PROCESS_LOCAL, 2386 bytes)
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 19.0 in stage 0.0 (TID 19, ip-10-2-2-203.us-west-2.compute.internal): java.lang.ExceptionInInitializerError
        at $line29.$read$$iwC.<init>(<console>:7)
        at $line29.$read.<init>(<console>:24)
        at $line29.$read$.<init>(<console>:28)
        at $line29.$read$.<clinit>(<console>)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at $line3.$read$$iwC$$iwC.<init>(<console>:15)
        at $line3.$read$$iwC.<init>(<console>:24)
        at $line3.$read.<init>(<console>:26)
        at $line3.$read$.<init>(<console>:30)
        at $line3.$read$.<clinit>(<console>)
        ... 18 more

Re: Map tuple to case class in Dataset

Posted by Tim Gautier <ti...@gmail.com>.
That's really odd. I copied that code directly out of the shell and it
errored out on me, several times. I wonder if something I did previously
caused some instability. I'll see if it happens again tomorrow.


Re: Map tuple to case class in Dataset

Posted by Ted Yu <yu...@gmail.com>.
Using spark-shell of 1.6.1 :

scala> case class Test(a: Int)
defined class Test

scala> Seq(1,2).toDS.map(t => Test(t)).show
+---+
|  a|
+---+
|  1|
|  2|
+---+

FYI


Re: Map tuple to case class in Dataset

Posted by Tim Gautier <ti...@gmail.com>.
1.6.1. The exception is a null pointer exception. I'll paste the whole thing
after I fire my cluster up again tomorrow.

I take it by the responses that this is supposed to work?

Anyone know when the next version is coming out? I keep running into bugs
with 1.6.1 that are hindering my progress.


Re: Map tuple to case class in Dataset

Posted by Saisai Shao <sa...@gmail.com>.
It works fine in my local test. I'm using the latest master, so maybe this
bug has already been fixed.


Re: Map tuple to case class in Dataset

Posted by Michael Armbrust <mi...@databricks.com>.
Version of Spark? What is the exception?
