Posted to commits@hudi.apache.org by "TranHuyTiep (via GitHub)" <gi...@apache.org> on 2023/03/31 09:18:35 UTC

[GitHub] [hudi] TranHuyTiep opened a new issue, #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

TranHuyTiep opened a new issue, #8340:
URL: https://github.com/apache/hudi/issues/8340

   
   **Describe the problem you faced**
   - I submit a Spark job on k8s through the Spark Operator; reading a file with the Hudi library throws `cannot assign instance of java.lang.invoke.SerializedLambda`.
   - Reading the underlying parquet files directly works fine.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   * Spark version : 3.3.1, apache/spark-py:v3.3.1
   * Running on Docker / k8s Spark Operator : sparkoperator.k8s.io/v1beta2
   * openjdk version "11.0.16"
     OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
     OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)
   * SPARK_PACKAGES=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1
   
   **Additional context**
   - SparkApplication YAML:
   ```yaml
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: Demo
     namespace: default
   spec:
     type: Python
     pythonVersion: "3"
     mode: cluster
     image: apache/spark-py:v3.3.1
     imagePullPolicy: Always
     mainApplicationFile: local:///opt/spark/work-dir/demo.py
     sparkVersion: "3.3.1"
     restartPolicy:
       type: OnFailure
       onFailureRetries: 3
       onFailureRetryInterval: 10
       onSubmissionFailureRetries: 5
       onSubmissionFailureRetryInterval: 20
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "1024m"
       labels:
         version: 3.3.1
       serviceAccount: spark
       envFrom:
         - configMapRef:
             name: spark-configmap
     executor:
       cores: 1
       instances: 2
       memory: "1024m"
       labels:
         version: 3.3.1
       envFrom:
         - configMapRef:
             name: spark-configmap
     sparkConf:
       spark.jars.packages: "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1"
       spark.rdd.compress: "true"
       spark.serializer: "org.apache.spark.serializer.KryoSerializer"
        spark.sql.shuffle.partitions: "100"
   ```
   
   - The job throws the following error:
   ```
   INFO DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:137, took 0.609365 s
   Traceback (most recent call last):
     File "/opt/spark/work-dir/demo.py", line 73, in <module>
       df_load.show(10)
     File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 606, in show
     File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
     File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
     File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o70.showString.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.244.0.45 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
   ```
   
   
   




[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493224689

   > Does it work locally, i.e. outside of k8s?
   
   Yes, it works locally.
   If I set spark_conf.setMaster("local[*]") it also runs on k8s, but then no executors are created and everything runs inside the single driver.




[GitHub] [hudi] codope commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1514660990

   cc @harsh1231 




[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1538193791

   > @TranHuyTiep Were you able to resolve this issue or still facing the same?
   I solved the problem by building a new image and copying all of the packages from .ivy2/jars/* into /opt/spark/jars/.
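   For reference, a minimal sketch of what I mean (the registry name, image tag, and download URLs below are illustrative assumptions, not my exact build files):
   
   ```shell
   # Sketch: download the bundles once, bake them into the Spark image under
   # /opt/spark/jars, and point the SparkApplication at that image so executors
   # no longer depend on spark.jars.packages resolving anything at runtime.
   mkdir -p jars
   curl -fLo jars/hudi-spark3.3-bundle_2.12-0.13.0.jar \
     https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.3-bundle_2.12/0.13.0/hudi-spark3.3-bundle_2.12-0.13.0.jar
   curl -fLo jars/spark-avro_2.12-3.3.1.jar \
     https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.12/3.3.1/spark-avro_2.12-3.3.1.jar
   
   # Two-line Dockerfile: base image plus the resolved jars.
   printf 'FROM apache/spark-py:v3.3.1\nCOPY jars/*.jar /opt/spark/jars/\n' > Dockerfile
   
   docker build -t my-registry/spark-py-hudi:3.3.1 .
   docker push my-registry/spark-py-hudi:3.3.1
   ```
   
   After that, `spark.jars.packages` can be dropped from `sparkConf` and the `image` field pointed at the new tag.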




[GitHub] [hudi] ad1happy2go commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1539430091

   Thanks @TranHuyTiep. Closing the issue since you were able to fix it. Please reopen if you see the issue again.




[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1492861891

   > Can you provide a simple reproduction step, like `code` or `sql`?
   
   Here is my code:
   
   ```python
   # -*- coding: utf-8 -*-
   from pyspark.conf import SparkConf
   from pyspark.sql import SparkSession
   
   if __name__ == '__main__':
       print("_____________Run______________")
   
       spark_conf = SparkConf()
       spark_conf.set("spark.jars.packages", "org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1")
       spark_conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       spark_conf.set("spark.sql.hive.convertMetastoreParquet", "false")
       spark_conf.set("spark.rdd.compress", "true")
   
       sparkSession = (SparkSession
                       .builder
                       .appName('read_hudi')
                       .config(conf=spark_conf)
                       .getOrCreate())
   
       file_path = "hdfs://hdfs_host:9000/data/dwd/trans_event"
   
       # READ HUDI
       df_load = sparkSession.read.format("org.apache.hudi").load(
           file_path, inferSchema=True,
           header=True
       )
   
       # READ
       df_load.createOrReplaceTempView("trans_event")
       query_e34_trans_raw = """
           SELECT transaction_data.profile_id FROM trans_event limit 10
       """
       df_load_profile_trans = sparkSession.sql(sqlQuery=query_e34_trans_raw)
       df_load_profile_trans.printSchema()
       df_load_profile_trans.show(10)
       sparkSession.stop()
   
       print("_____________End______________")
   
   ```




[GitHub] [hudi] bigdata-spec commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "bigdata-spec (via GitHub)" <gi...@apache.org>.
bigdata-spec commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1538137162

   @TranHuyTiep I have the same environment on k8s. How can I connect with you?




[GitHub] [hudi] ad1happy2go commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1506785394

   @TranHuyTiep Were you able to resolve this issue or still facing the same?




[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1492815283

   Can you provide a simple reproduction step, like `code` or `sql`?




Re: [I] [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda [hudi]

Posted by "parisni (via GitHub)" <gi...@apache.org>.
parisni commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1983524308

   > I solved the problem by building a new image and copying all of the packages from .ivy2/jars/* into /opt/spark/jars/.
   
   Same here on Kubernetes. It sounds like k8s does not work well with `spark.jars.packages` and Hudi.
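   For reference, a hedged sketch of the same idea expressed as submit-time conf (the image tag, the /opt/hudi/jars path, and the master URL are assumptions; the operator's `sparkConf` section takes the same `spark.*` keys): once the bundles are baked into the image, they can be referenced with `local://` URIs instead of `spark.jars.packages`, so nothing has to be resolved from Maven inside the pods.
   
   ```shell
   # Illustrative only: local:// tells Spark-on-k8s that the jars already exist
   # inside the container image, at the given (here made-up) path.
   spark-submit \
     --master k8s://https://kubernetes.default.svc \
     --deploy-mode cluster \
     --conf spark.kubernetes.container.image=my-registry/spark-py-hudi:3.3.1 \
     --conf spark.jars=local:///opt/hudi/jars/hudi-spark3.3-bundle_2.12-0.13.0.jar,local:///opt/hudi/jars/spark-avro_2.12-3.3.1.jar \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     local:///opt/spark/work-dir/demo.py
   ```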




[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493198904

   Does it work locally, i.e. outside of k8s?




[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493256045

   Have you set the relevant config in your submit command, something like:
   ```shell
   pyspark \
   --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
   It also looks like the environment differs between submit modes. I think you should confirm the environment; for example, if you run in cluster mode with a yarn archive, make sure the Hudi jar is inside the archive, and so on.
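   For the k8s case, a quick way to confirm this (the pod name below is a placeholder) is to check that the bundle is actually present on the executor pods, not only on the driver:
   
   ```shell
   # Sanity check: list the executor pods, then look for the Hudi bundle on one
   # of them; spark-role=executor is the label Spark on k8s puts on executors.
   kubectl get pods -l spark-role=executor
   kubectl exec demo-executor-1 -- sh -c 'ls /opt/spark/jars | grep -i hudi'
   ```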




[GitHub] [hudi] codope closed issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #8340: [SUPPORT]  cannot assign instance of java.lang.invoke.SerializedLambda
URL: https://github.com/apache/hudi/issues/8340

