Posted to issues@spark.apache.org by "Justin Uang (JIRA)" <ji...@apache.org> on 2015/04/20 17:09:59 UTC

[jira] [Commented] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

    [ https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502994#comment-14502994 ] 

Justin Uang commented on SPARK-6999:
------------------------------------

From the source, it looks like one way to implement this is to extract part of getSchema(), specifically the

{code}
        case c: Class[_] if c.isAnnotationPresent(classOf[SQLUserDefinedType]) =>
          (c.getAnnotation(classOf[SQLUserDefinedType]).udt().newInstance(), true)
        case c: Class[_] if c == classOf[java.lang.String] => (StringType, true)
        case c: Class[_] if c == java.lang.Short.TYPE => (ShortType, false)
        case c: Class[_] if c == java.lang.Integer.TYPE => (IntegerType, false)
        case c: Class[_] if c == java.lang.Long.TYPE => (LongType, false)
        case c: Class[_] if c == java.lang.Double.TYPE => (DoubleType, false)
        case c: Class[_] if c == java.lang.Byte.TYPE => (ByteType, false)
        case c: Class[_] if c == java.lang.Float.TYPE => (FloatType, false)
        case c: Class[_] if c == java.lang.Boolean.TYPE => (BooleanType, false)

        case c: Class[_] if c == classOf[java.lang.Short] => (ShortType, true)
        case c: Class[_] if c == classOf[java.lang.Integer] => (IntegerType, true)
        case c: Class[_] if c == classOf[java.lang.Long] => (LongType, true)
        case c: Class[_] if c == classOf[java.lang.Double] => (DoubleType, true)
        case c: Class[_] if c == classOf[java.lang.Byte] => (ByteType, true)
        case c: Class[_] if c == classOf[java.lang.Float] => (FloatType, true)
        case c: Class[_] if c == classOf[java.lang.Boolean] => (BooleanType, true)
        case c: Class[_] if c == classOf[java.math.BigDecimal] => (DecimalType(), true)
        case c: Class[_] if c == classOf[java.sql.Date] => (DateType, true)
        case c: Class[_] if c == classOf[java.sql.Timestamp] => (TimestampType, true)
{code}

section, and then have another method pull the elements out of the first {{Row}} using {{Row.get}} and run each one's class through that match expression to identify its type. Are there any gotchas that I'm missing?
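
To make the idea concrete, here is a rough, Spark-free sketch in Java of inferring column types from the values in a first row. Everything in it is a hypothetical stand-in: {{SketchType}} takes the place of Spark's DataType objects, a plain {{List<Object>}} takes the place of a {{Row}}, and {{inferType}} / {{inferSchema}} are made-up names. It also surfaces a few possible gotchas: values read back from a row are boxed objects, so only the boxed-class cases from the match above would apply; a null element carries no class to inspect; and an empty RDD (as in the reproduction) has no first row at all.

```java
import java.math.BigDecimal;
import java.sql.Date;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class RowTypeInferenceSketch {
    /** Hypothetical stand-in for Spark SQL's DataType objects. */
    public enum SketchType {
        STRING, SHORT, INTEGER, LONG, DOUBLE, BYTE, FLOAT, BOOLEAN, DECIMAL, DATE, TIMESTAMP
    }

    // Only the boxed classes are listed: a value stored as Object is never a primitive.
    private static final Map<Class<?>, SketchType> TYPES = new HashMap<>();
    static {
        TYPES.put(String.class, SketchType.STRING);
        TYPES.put(Short.class, SketchType.SHORT);
        TYPES.put(Integer.class, SketchType.INTEGER);
        TYPES.put(Long.class, SketchType.LONG);
        TYPES.put(Double.class, SketchType.DOUBLE);
        TYPES.put(Byte.class, SketchType.BYTE);
        TYPES.put(Float.class, SketchType.FLOAT);
        TYPES.put(Boolean.class, SketchType.BOOLEAN);
        TYPES.put(BigDecimal.class, SketchType.DECIMAL);
        TYPES.put(Date.class, SketchType.DATE);
        TYPES.put(Timestamp.class, SketchType.TIMESTAMP);
    }

    /** Maps one value to a type by its exact runtime class; rejects null and unknown classes. */
    public static SketchType inferType(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("cannot infer a type from a null value");
        }
        SketchType type = TYPES.get(value.getClass());
        if (type == null) {
            throw new IllegalArgumentException("unsupported class: " + value.getClass().getName());
        }
        return type;
    }

    /** Infers one type per column from the values of a single (first) row. */
    public static List<SketchType> inferSchema(List<Object> firstRow) {
        List<SketchType> schema = new ArrayList<>();
        for (Object value : firstRow) {
            schema.add(inferType(value));
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Object> firstRow = Arrays.<Object>asList("a", 1, 2L, 3.0, true);
        System.out.println(inferSchema(firstRow)); // prints [STRING, INTEGER, LONG, DOUBLE, BOOLEAN]
    }
}
```

Throwing on null and on unknown classes is just one choice for the sketch; a real implementation might instead scan further rows or fall back to a default type.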

> infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-6999
>                 URL: https://issues.apache.org/jira/browse/SPARK-6999
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Justin Uang
>            Priority: Critical
>
> It looks like 
> {code}
>   def createDataFrame(rowRDD: JavaRDD[Row], columns: java.util.List[String]): DataFrame = {
>     createDataFrame(rowRDD.rdd, columns.toSeq)
>   }
> {code}
> is in fact an infinite recursion: the call resolves back to this same method, because Scala implicit conversions convert the arguments back into a JavaRDD and a java.util.List.
> {code}
> 15/04/19 16:51:24 INFO BlockManagerMaster: Trying to register BlockManager
> 15/04/19 16:51:24 INFO BlockManagerMasterActor: Registering block manager localhost:53711 with 1966.1 MB RAM, BlockManagerId(<driver>, localhost, 53711)
> 15/04/19 16:51:24 INFO BlockManagerMaster: Registered BlockManager
> Exception in thread "main" java.lang.StackOverflowError
>     at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47)
>     at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48)
>     at scala.collection.convert.Wrappers$JListWrapper.<init>(Wrappers.scala:84)
>     at scala.collection.convert.WrapAsScala$class.asScalaBuffer(WrapAsScala.scala:127)
>     at scala.collection.JavaConversions$.asScalaBuffer(JavaConversions.scala:53)
>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
> {code}
> Here is the code sample I used to reproduce the issue:
> {code}
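> import java.util.List;
>
> import com.google.common.collect.ImmutableList;
> import com.google.common.collect.Lists;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SQLContext;
>
> /**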
>  * @author juang
>  */
> public final class InfiniteRecursionExample {
>     public static void main(String[] args) {
>         JavaSparkContext sc = new JavaSparkContext("local", "infinite_recursion_example");
>         List<Row> rows = Lists.newArrayList();
>         JavaRDD<Row> rowRDD = sc.parallelize(rows);
>         SQLContext sqlContext = new SQLContext(sc);
>         sqlContext.createDataFrame(rowRDD, ImmutableList.of("myCol"));
>     }
> }
> {code}



