Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/11/03 09:50:58 UTC

[jira] [Updated] (SPARK-18246) Throws an exception before execution for unsupported types in JSON, CSV and text functionalities

     [ https://issues.apache.org/jira/browse/SPARK-18246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-18246:
---------------------------------
    Description: 
* Case 1

{code}
import org.apache.spark.sql.types._

// The JSON source cannot produce a column of CalendarIntervalType.
val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""")
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()
{code}

should throw an exception before execution.


* Case 2

{code}
import org.apache.spark.sql.types._

val path = "/tmp/a"
// Write JSON lines to disk, then read them back with an unsupported schema type.
spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()
{code}

should throw an exception before execution.

* Case 3

{code}
import org.apache.spark.sql.types._

val path = "/tmp/b"
// Write plain numbers as text, then read them back as CSV with an unsupported type.
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()
{code}

should throw an exception before execution.

* Case 4

{code}
import org.apache.spark.sql.types._

val path = "/tmp/c"
// The text source produces a single string column; a LongType schema should be rejected.
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", LongType)
spark.read.schema(schema).text(path).show()
{code}

should throw an exception before execution rather than printing incorrect values such as the following:

{code}
+-----------+
|          a|
+-----------+
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476739|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
+-----------+
{code}
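
For what it's worth, the printed values do not look random. The following is an interpretation, not something stated in this issue: they match the fixed-width slot of an {{UnsafeRow}} string column, which packs the string's byte offset and byte length into a single long, being read back verbatim as a {{LongType}} value.

{code}
// Assumption (UnsafeRow layout, not from this issue): a string field's
// fixed-width slot holds (offset << 32) | length. With one string column the
// string bytes start at offset 16, so reading the slot as a long gives:
val offset = 16L
val twoChars = (offset << 32) | 2L    // rows such as "10".."99" (two characters)
val threeChars = (offset << 32) | 3L  // the row "100" (three characters)
assert(twoChars == 68719476738L)      // the repeated value in the output above
assert(threeChars == 68719476739L)    // the single different value above
{code}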


* Case 5

{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("""{"a": 1}""").toDS()
val schema = new StructType().add("a", CalendarIntervalType)
df.select(from_json($"value", schema)).show()
{code}

prints

{code}
+-------------------+
|jsontostruct(value)|
+-------------------+
|               null|
+-------------------+
{code}

This should throw an analysis exception, as {{CalendarIntervalType}} is not supported; a sketch of such a check follows.
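
A minimal sketch of the kind of up-front validation this issue asks for. The helper name {{verifySchema}} and the exception type are illustrative assumptions, not Spark's actual implementation:

{code}
import org.apache.spark.sql.types._

// Hypothetical helper: walk a requested schema and reject types that the
// JSON/CSV/text sources cannot produce, before any job is launched.
def verifySchema(dataType: DataType): Unit = dataType match {
  case CalendarIntervalType =>
    throw new UnsupportedOperationException(
      s"${CalendarIntervalType.simpleString} is not supported")
  case st: StructType =>
    st.fields.foreach(field => verifySchema(field.dataType))
  case ArrayType(elementType, _) =>
    verifySchema(elementType)
  case MapType(keyType, valueType, _) =>
    verifySchema(keyType)
    verifySchema(valueType)
  case _ => // atomic types such as StringType or LongType are fine
}
{code}

Calling such a check from the readers and from {{from_json}}/{{to_json}} at analysis time would make the cases above fail before execution (the text source would additionally restrict the schema to a single string column).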


Likewise, {{to_json}} throws an analysis error; for example:

{code}
val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a")
  .select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c"))
df.select(to_json($"c")).collect()
{code}

  was: (previous description; identical except that Case 5 ended with {{collect()}} instead of {{show()}})


> Throws an exception before execution for unsupported types in JSON, CSV and text functionalities
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18246
>                 URL: https://issues.apache.org/jira/browse/SPARK-18246
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Hyukjin Kwon
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org