Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/06/30 01:25:00 UTC

[jira] [Resolved] (SPARK-21230) Spark Encoder with mysql Enum and data truncated Error

     [ https://issues.apache.org/jira/browse/SPARK-21230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21230.
----------------------------------
    Resolution: Invalid

This is hard for me to read and reproduce as well. Let's close it unless you are going to fix it yourself or can provide explicit reproduction steps that anyone can follow. Also, I think the point is to narrow the problem down; as it stands, I don't think this is an actionable JIRA.


> Spark Encoder with mysql Enum and data truncated Error
> ------------------------------------------------------
>
>                 Key: SPARK-21230
>                 URL: https://issues.apache.org/jira/browse/SPARK-21230
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>         Environment: macOS
>            Reporter: Michael Kunkel
>
> I am using Spark via Java for a MySQL/ML (machine learning) project.
> In the MySQL database, I have a column "status_change_type" of type ENUM('broke', 'fixed') in a table called "status_change" in a DB called "test".
> I have an object StatusChangeDB that models the structure of the table; for "status_change_type", I declared the field as a String. I know a MySQL enum and a Java String are represented very differently, but since I am using Spark, whose encoder does not recognize Java enums properly, a String seemed the only option. A trimmed sketch of the bean is shown below.
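> Here is a minimal sketch of StatusChangeDB, reduced to the one field relevant here (accessor names are illustrative; the full class is not reproduced):
> public class StatusChangeDB implements java.io.Serializable {
>     // The ENUM('broke','fixed') column is modeled as a plain String,
>     // since Spark's bean encoder cannot infer a schema for a Java enum
>     // field (see the stack trace further down).
>     private String status_change_type;
>
>     public StatusChangeDB() {
>     }
>
>     public String getStatus_change_type() {
>         return status_change_type;
>     }
>
>     public void setStatus_change_type(String statusChangeType) {
>         this.status_change_type = statusChangeType;
>     }
> }
> However, when I try to set the enum column's value via a Java String, I receive the "data truncated" error: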
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
>     at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)
> I have also tried declaring "status_change_type" as a Java enum, but that fails with the following stack trace:
> Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
>     at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
>     at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>     ... ...
>
> I have tried the JDBC setting "jdbcCompliantTruncation=false", but it changes nothing; I get the same "data truncated" error as above. Here is my JDBC options map, in case I am using "jdbcCompliantTruncation=false" incorrectly:
> public static Map<String, String> jdbcOptions() {
>     Map<String, String> jdbcOptions = new HashMap<String, String>();
>     jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
>     jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
>     jdbcOptions.put("dbtable", "status_change");
>     jdbcOptions.put("user", "root");
>     jdbcOptions.put("password", "");
>     return jdbcOptions;
> }
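> For completeness, these options are consumed when loading the table, roughly like so (this assumes an existing SparkSession named "spark", whose construction is not shown here):
> // Read the "status_change" table using the options map above.
> Dataset<Row> changeDF = spark.read()
>         .format("jdbc")
>         .options(SparkManager.jdbcOptions())
>         .load();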
> Here is the Spark method for inserting into the MySQL DB:
> private void insertMYSQLQuery(Dataset<Row> changeDF) {
>     try {
>         changeDF.write().mode(SaveMode.Append).jdbc(SparkManager.jdbcAppendOptions(), "status_change",
>                 new java.util.Properties());
>     } catch (Exception e) {
>         System.out.println(e);
>     }
> }
> where jdbcAppendOptions builds on the jdbcOptions method as follows:
> public static String jdbcAppendOptions() {
>     return SparkManager.jdbcOptions().get("url") + "&user=" + SparkManager.jdbcOptions().get("user") + "&password="
>             + SparkManager.jdbcOptions().get("password");
> }
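> An equivalent sketch that avoids the string concatenation: the three-argument write().jdbc(url, table, properties) overload used in insertMYSQLQuery already accepts the credentials through java.util.Properties, so they do not have to ride along in the URL (variable names here are illustrative):
> // Illustrative alternative to jdbcAppendOptions(): credentials go into
> // the Properties object instead of the connection URL.
> java.util.Properties connProps = new java.util.Properties();
> connProps.put("user", SparkManager.jdbcOptions().get("user"));
> connProps.put("password", SparkManager.jdbcOptions().get("password"));
> connProps.put("driver", SparkManager.jdbcOptions().get("driver"));
> changeDF.write().mode(SaveMode.Append)
>         .jdbc(SparkManager.jdbcOptions().get("url"), "status_change", connProps);
> This also keeps the password out of the URL.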
> How do I get enum values into the MySQL DB using Spark, or otherwise avoid this "data truncated" error?
> My only other thought would be to change the column in the DB itself to VARCHAR, but the project leader is not too happy with that idea.
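> For reference, if the VARCHAR route were taken, the one-off schema change would look roughly like this (a sketch run once against the "test" DB outside of Spark; VARCHAR(16) is an assumed width):
> // One-off migration sketch: widen the ENUM column to VARCHAR.
> // VARCHAR(16) is an assumed width that comfortably holds 'broke' and 'fixed'.
> try (java.sql.Connection conn = java.sql.DriverManager.getConnection(
>         "jdbc:mysql://localhost:3306/test", "root", "");
>         java.sql.Statement st = conn.createStatement()) {
>     st.executeUpdate(
>             "ALTER TABLE status_change MODIFY status_change_type VARCHAR(16)");
> } catch (java.sql.SQLException e) {
>     System.out.println(e);
> }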



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org