Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/06/27 19:29:00 UTC

[jira] [Commented] (SPARK-21230) Spark Encoder with mysql Enum and data truncated Error

    [ https://issues.apache.org/jira/browse/SPARK-21230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065338#comment-16065338 ] 

Sean Owen commented on SPARK-21230:
-----------------------------------

This also does not look like a useful JIRA issue. It looks like a question about using MySQL and JDBC. Until it's narrowed down to a Spark issue, we'd generally close this.

> Spark Encoder with mysql Enum and data truncated Error
> ------------------------------------------------------
>
>                 Key: SPARK-21230
>                 URL: https://issues.apache.org/jira/browse/SPARK-21230
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>         Environment: macosX
>            Reporter: Michael Kunkel
>
> I am using Spark via Java for a MySQL/ML (machine learning) project.
> In the mysql database, I have a column "status_change_type" of type enum = {broke, fixed} in a table called "status_change" in a DB called "test".
> I have an object StatusChangeDB that mirrors the structure of the table; however, for "status_change_type" I declared the field as a String. I know the representation of a MySQL enum is quite different from a Java String, but I am using Spark, and its encoder does not handle Java enums properly. However, when I try to set the value of the enum column via a Java String, I receive the "data truncated" error:
>     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
>         at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)
> I have tried to use an enum for "status_change_type", however it fails with the following stack trace:
>     Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
>         at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
>         at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
>         at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>         at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
>         at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
>         at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>         ... (the same map/inferDataType frames repeat as the type inference recurses) ...
> I have tried the JDBC setting "jdbcCompliantTruncation=false", but it changes nothing: I get the same "data truncated" error as first stated. Here is my JDBC options map, in case I am using "jdbcCompliantTruncation=false" incorrectly:
> public static Map<String, String> jdbcOptions() {
>     Map<String, String> jdbcOptions = new HashMap<String, String>();
>     jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
>     jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
>     jdbcOptions.put("dbtable", "status_change");
>     jdbcOptions.put("user", "root");
>     jdbcOptions.put("password", "");
>     return jdbcOptions;
> }
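(One thing worth checking before suspecting Spark: in strict SQL mode, MySQL reports "Data truncated" for an enum column whenever the incoming string is not byte-for-byte one of the declared labels, e.g. because of stray whitespace or case differences. A minimal sketch of a normalization step for the values before the write; EnumLabels and its normalize method are hypothetical helpers, not part of the code above:)

```java
import java.util.Arrays;
import java.util.List;

public class EnumLabels {
    // The labels declared in the MySQL column: enum('broke','fixed')
    static final List<String> ALLOWED = Arrays.asList("broke", "fixed");

    /** Trim and lower-case a raw value; fail fast if it is not a declared label. */
    static String normalize(String raw) {
        String v = raw.trim().toLowerCase();
        if (!ALLOWED.contains(v)) {
            throw new IllegalArgumentException("not a declared enum label: " + raw);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" Broke "));  // prints "broke"
    }
}
```

(Failing fast in the driver turns a vague batch-update error into a precise message naming the offending value.)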
> Here is the Spark method for inserting into the mysql DB
> private void insertMYSQLQuery(Dataset<Row> changeDF) {
>     try {
>         changeDF.write().mode(SaveMode.Append).jdbc(SparkManager.jdbcAppendOptions(), "status_change",
>                 new java.util.Properties());
>     } catch (Exception e) {
>         System.out.println(e);
>     }
> }
> where jdbcAppendOptions uses the jdbcOptions methods as:
> public static String jdbcAppendOptions() {
>     return SparkManager.jdbcOptions().get("url") + "&user=" + SparkManager.jdbcOptions().get("user") + "&password="
>             + SparkManager.jdbcOptions().get("password");
> }
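(A side note on the code above: concatenating "&user=" and "&password=" onto the URL only works because the URL already contains a "?" query string, and it is easy to get wrong. DataFrameWriter.jdbc(url, table, properties) already accepts a java.util.Properties for credentials and driver settings, so a sketch of the same configuration without string concatenation, assuming the same host, database, and credentials as above, might look like:)

```java
import java.util.Properties;

public class JdbcProps {
    /** Connection settings passed to DataFrameWriter.jdbc instead of URL concatenation. */
    static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("user", "root");
        props.setProperty("password", "");
        props.setProperty("driver", "com.mysql.jdbc.Driver");
        props.setProperty("jdbcCompliantTruncation", "false");
        return props;
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/test";
        // With a live SparkSession and DataFrame, the write would be:
        // changeDF.write().mode(SaveMode.Append).jdbc(url, "status_change", connectionProperties());
        System.out.println(url + " as " + connectionProperties().getProperty("user"));
    }
}
```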
> How do I get values into a MySQL enum column using Spark, or otherwise avoid this "data truncated" error?
> My only other thought is to change the column itself to VARCHAR, but the project leader is not too happy with that idea.
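(Since the NPE above shows Spark's JavaTypeInference cannot encode a Java enum field, one hedged workaround sketch is to keep the bean field a String for Spark's benefit and confine the Java enum to the setters, so that only the exact declared labels can ever reach the column. StatusChange below is a hypothetical, stripped-down stand-in for the reporter's StatusChangeDB bean:)

```java
public class StatusChange {
    // Hypothetical mirror of the MySQL column enum('broke','fixed').
    enum StatusChangeType { broke, fixed }

    // Kept as a String so Spark's bean encoder can handle it.
    private String statusChangeType;

    public String getStatusChangeType() { return statusChangeType; }

    public void setStatusChangeType(StatusChangeType t) {
        // name() yields exactly "broke" or "fixed", matching the declared labels.
        this.statusChangeType = t.name();
    }

    public static void main(String[] args) {
        StatusChange sc = new StatusChange();
        sc.setStatusChangeType(StatusChangeType.fixed);
        System.out.println(sc.getStatusChangeType());  // prints "fixed"
    }
}
```

(The type system then guarantees the JDBC write only ever sees valid labels, without changing the column to VARCHAR.)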



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org