Posted to user@spark.apache.org by Kefah Issa <ke...@freesoft.jo> on 2014/07/10 14:39:04 UTC
RDD registerAsTable gives error on regular scala class records
Hi,
SQL on Spark 1.0 is an interesting feature. It works fine when the "record"
is made of a case class.
The issue I have is that I have around 50 attributes per record. Scala
case classes cannot handle that (the hard-coded limit is 22 fields, for some
reason). So I created a regular class and defined the attributes in there.
// When running with the case class I remove the "new"
val rdd = sc.textFile("myrecords.csv").map(line => new Record(line.split(",")))
rdd.registerAsTable("records")

// This works
// case class Record(first: String, second: String, third: String)

// This causes the registerAsTable call to fail
class Record(list: Array[String]) {
  val first  = list(0)
  val second = list(1)
  val third  = list(2)
}
When compiling, I get the following error:
value registerAsTable is not a member of org.apache.spark.rdd.RDD[Record]
What am I missing here? Or is Spark SQL 1.0 only capable of dealing with
datasets of at most 22 columns?
Regards,
- Kefah.
Re: RDD registerAsTable gives error on regular scala class records
Posted by Thomas Robert <th...@creativedata.fr>.
Hi,
I'm quite a Spark newbie so I might be wrong, but I think that
registerAsTable works either on case classes or on classes extending
Product.
You can find this info in an example on the Spark SQL doc page:
http://spark.apache.org/docs/latest/sql-programming-guide.html
// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields.
// To work around this limit, you can use custom classes that implement
// the Product interface.
case class Person(name: String, age: Int)
If you want an example of a class extending Product, see this code from
Sparkling Water:
https://github.com/0xdata/h2o-sparkling/blob/master/src/main/scala/water/sparkling/demo/Schemas.scala
class Airlines( year          :Option[Int],    // 0
                month         :Option[Int],    // 1
                dayOfMonth    :Option[Int],    // 2
                dayOfWeek     :Option[Int],    // 3
                crsDepTime    :Option[Int],    // 5
                crsArrTime    :Option[Int],    // 7
                uniqueCarrier :Option[String], // 8
                flightNum     :Option[Int],    // 9
                tailNum       :Option[Int],    // 10
                crsElapsedTime:Option[Int],    // 12
                origin        :Option[String], // 16
                dest          :Option[String], // 17
                distance      :Option[Int],    // 18
                isArrDelayed  :Option[Boolean],// 29
                isDepDelayed  :Option[Boolean] // 30
              ) extends Product {
  ...
}
I managed to register tables larger than 22 columns with this method.
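To make this concrete, here is a minimal sketch of what a plain class
implementing Product could look like for the three-field Record from the
original question (the member names and the idea of extending Product come
from the thread; the exact implementation below is my own illustration, not
code from Spark or Sparkling Water):

```scala
// A regular class (not a case class) that implements the Product trait
// by hand, so Spark SQL's implicit conversion can treat it as a row type.
class Record(list: Array[String]) extends Product with Serializable {
  val first  = list(0)
  val second = list(1)
  val third  = list(2)

  // Product requires these three members; case classes generate them for you.
  def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
  def productArity: Int = 3
  def productElement(n: Int): Any = n match {
    case 0 => first
    case 1 => second
    case 2 => third
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
}
```

With a class like this (extended to all 50 fields), the original
map-then-registerAsTable code should compile, since the implicit conversion
to a SchemaRDD applies to Product subtypes.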
Bye.
--
*Thomas ROBERT*
www.creativedata.fr