Posted to user@spark.apache.org by Kefah Issa <ke...@freesoft.jo> on 2014/07/10 14:39:04 UTC

RDD registerAsTable gives error on regular scala class records

Hi,

SQL on Spark 1.0 is an interesting feature. It works fine when the "record"
is made of a case class.

The issue I have is that I have around 50 attributes per record. A Scala
case class cannot handle that (the hard-coded limit is 22 fields for some
reason), so I created a regular class and defined the attributes in there.


// When running with the case class, I remove the "new"
val rdd = sc.textFile("myrecords.csv").map(line => new Record(line.split(",")))

rdd.registerAsTable("records")


// This works
// case class Record(first:String, second:String, third:String)

// This causes the registerAsTable to fail
class Record(list: Array[String]) {
  val first = list(0)
  val second = list(1)
  val third = list(2)
}


When compiling, I get the following error:

value registerAsTable is not a member of org.apache.spark.rdd.RDD[Record]

What am I missing here? Or is Spark SQL 1.0 only capable of dealing with
data sets of 22 columns max?

Regards,
- Kefah.

Re: RDD registerAsTable gives error on regular scala class records

Posted by Thomas Robert <th...@creativedata.fr>.
Hi,

I'm quite a Spark newbie, so I might be wrong, but I think that
registerAsTable works either on case classes or on classes extending
Product.

You can find this info in an example on the Spark SQL doc page:
http://spark.apache.org/docs/latest/sql-programming-guide.html

// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields.
// To work around this limit, you can use custom classes that implement
// the Product interface.
case class Person(name: String, age: Int)


If you want an example of a class extending Product, there is one in the
code of Sparkling Water:
https://github.com/0xdata/h2o-sparkling/blob/master/src/main/scala/water/sparkling/demo/Schemas.scala

class Airlines( year          :Option[Int],    // 0
                month         :Option[Int],    // 1
                dayOfMonth    :Option[Int],    // 2
                dayOfWeek     :Option[Int],    // 3
                crsDepTime    :Option[Int],    // 5
                crsArrTime    :Option[Int],    // 7
                uniqueCarrier :Option[String], // 8
                flightNum     :Option[Int],    // 9
                tailNum       :Option[Int],    // 10
                crsElapsedTime:Option[Int],    // 12
                origin        :Option[String], // 16
                dest          :Option[String], // 17
                distance      :Option[Int],    // 18
                isArrDelayed  :Option[Boolean],// 29
                isDepDelayed  :Option[Boolean] // 30
                ) extends Product {
...
}
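
The "..." above is the part you have to fill in yourself: a class extending
Product essentially has to provide canEqual, productArity and productElement
(plus Serializable, so Spark can ship the objects around). I haven't copied
the elided code from Schemas.scala, but applied to the Record class from your
mail, a minimal untested sketch would look like this:

class Record(val first: String, val second: String, val third: String)
  extends Product with Serializable {

  // Members of the Product trait that Spark SQL uses to read the fields
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
  override def productArity: Int = 3
  override def productElement(n: Int): Any = n match {
    case 0 => first
    case 1 => second
    case 2 => third
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
}

As far as I can tell, the columns should be constructor parameters (as in the
Airlines example above), not vals computed inside the class body; the same
pattern just gets longer for 50 columns.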


I managed to register tables larger than 22 columns with this method.
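
For completeness, registering it then looks roughly like in the programming
guide (again an untested sketch, reusing the Record class sketched above and
your placeholder myrecords.csv path):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit conversion RDD[A <: Product] => SchemaRDD

val records = sc.textFile("myrecords.csv")
  .map(_.split(","))
  .map(cols => new Record(cols(0), cols(1), cols(2)))

records.registerAsTable("records")
sqlContext.sql("SELECT first, second FROM records").collect().foreach(println)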

Bye.

-- 

Thomas ROBERT
www.creativedata.fr

