Posted to issues@spark.apache.org by "Anuj Gargava (Jira)" <ji...@apache.org> on 2022/08/15 00:29:00 UTC

[jira] [Created] (SPARK-40074) Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)

Anuj Gargava created SPARK-40074:
------------------------------------

             Summary: Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)
                 Key: SPARK-40074
                 URL: https://issues.apache.org/jira/browse/SPARK-40074
             Project: Spark
          Issue Type: Bug
          Components: Java API, ML, SQL
    Affects Versions: 3.1.2
         Environment: Scala 2.12, Spark 3.x
            Reporter: Anuj Gargava


Encountered a compatibility issue while upgrading Spark from 2.4 to 3.x (Scala was also upgraded from 2.11 to 2.12).

The Java code below used to work with Spark 2.4, but after migrating to 3.x it fails with the error shown further down. I have done my own research but could not find a solution or any related information.
 
{{public void test() {
    final SparkSession spark = SparkSession.builder()
            .appName("Test")
            .getOrCreate();

    DenseClass denseFactor1 = new DenseClass(new DenseVector(new double[]{0.13, 0.24}));
    DenseClass denseFactor2 = new DenseClass(new DenseVector(new double[]{0.24, 0.32}));
    final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);

    // The AnalysisException below is thrown here, when the dataset is built with the bean encoder.
    final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew, Encoders.bean(DenseClass.class));
    denseVectorDf.printSchema();
}

public static class DenseClass implements Serializable {
    private org.apache.spark.ml.linalg.DenseVector denseVector;

    // Encoders.bean needs a public no-arg constructor and a getter/setter for the field;
    // the all-args constructor matches the calls above.
    public DenseClass() {}
    public DenseClass(DenseVector denseVector) { this.denseVector = denseVector; }
    public DenseVector getDenseVector() { return denseVector; }
    public void setDenseVector(DenseVector denseVector) { this.denseVector = denseVector; }
}}}

The error occurs while creating the dataset *denseVectorDf*.

Error:
{{org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from struct<> to struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
The type path of the target object is:
- field (class: "org.apache.spark.ml.linalg.DenseVector", name: "denseVector")
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object}}

I have tried using a _double_ field instead of the dense vector and it works just fine; the failure only occurs when the dense vector field is used with Encoders.bean.
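
For comparison, here is a minimal sketch of the double-based bean that works; the class name {{DoubleClass}} and the field name {{factor}} are illustrative, not taken from my actual code:

{{public void testWithDouble() {
    final SparkSession spark = SparkSession.builder()
            .appName("Test")
            .getOrCreate();

    final List<DoubleClass> inputs = Arrays.asList(new DoubleClass(0.13), new DoubleClass(0.24));

    // With a plain double field the bean encoder builds the schema without error.
    final Dataset<DoubleClass> doubleDf = spark.createDataset(inputs, Encoders.bean(DoubleClass.class));
    doubleDf.printSchema();
}

public static class DoubleClass implements Serializable {
    private double factor;

    public DoubleClass() {}
    public DoubleClass(double factor) { this.factor = factor; }
    public double getFactor() { return factor; }
    public void setFactor(double factor) { this.factor = factor; }
}}}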

 

StackOverflow link for the issue: [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org