Posted to commits@hudi.apache.org by "TranHuyTiep (via GitHub)" <gi...@apache.org> on 2023/03/31 09:18:35 UTC
[GitHub] [hudi] TranHuyTiep opened a new issue, #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
TranHuyTiep opened a new issue, #8340:
URL: https://github.com/apache/hudi/issues/8340
**Describe the problem you faced**
- I submit a Spark job on k8s through the Spark operator. Reading a table with the Hudi library throws "cannot assign instance of java.lang.invoke.SerializedLambda".
- Reading the same files directly as Parquet works fine.
**Environment Description**
* Hudi version : 0.13.0
* Spark version : 3.3.1, apache/spark-py:v3.3.1
* Running on Docker operator : sparkoperator.k8s.io/v1beta2
* openjdk version "11.0.16"
OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)
* SPARK_PACKAGES=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1
**Additional context**
- file yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: Demo
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark-py:v3.3.1
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/work-dir/demo.py
  sparkVersion: "3.3.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "1024m"
    labels:
      version: 3.3.1
    serviceAccount: spark
    envFrom:
      - configMapRef:
          name: spark-configmap
  executor:
    cores: 1
    instances: 2
    memory: "1024m"
    labels:
      version: 3.3.1
    envFrom:
      - configMapRef:
          name: spark-configmap
  sparkConf:
    spark.jars.packages: "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1"
    spark.rdd.compress: "true"
    spark.serializer: "org.apache.spark.serializer.KryoSerializer"
    spark.sql.shuffle.partitions: "100"
- The error thrown:
`INFO DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:137, took 0.609365 s
Traceback (most recent call last):
File "/opt/spark/work-dir/demo.py", line 73, in <module>
df_load.show(10)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 606, in show
File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o70.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.244.0.45 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493224689
> Does it work locally, outside k8s?
Yes, it works locally.
With spark_conf.setMaster("local[*]") it also runs on k8s, but then no executors are created and everything runs in the single driver.
[GitHub] [hudi] codope commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1514660990
cc @harsh1231
[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1538193791
> @TranHuyTiep Were you able to resolve this issue, or are you still facing it?
I solved the problem by building a new image and copying all packages from .ivy2/jars/* to /opt/spark/jars/
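For illustration only, the copy step described above could be scripted roughly as follows. This is a minimal sketch, not the exact commands used in this thread: the helper name `copy_ivy_jars` is hypothetical, and the source and destination directories are assumed from the paths mentioned in the comment.

```python
import shutil
from pathlib import Path


def copy_ivy_jars(ivy_jars_dir, spark_jars_dir):
    """Copy every *.jar from the Ivy cache into Spark's jars directory.

    Hypothetical helper: returns the sorted names of the jars it copied.
    """
    src = Path(ivy_jars_dir)
    dest = Path(spark_jars_dir)
    # Create the destination if the image does not have it yet.
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for jar in sorted(src.glob("*.jar")):
        # copy2 keeps file metadata; an existing jar with the same name is overwritten.
        shutil.copy2(jar, dest / jar.name)
        copied.append(jar.name)
    return copied
```

During an image build it might be called as `copy_ivy_jars(Path.home() / ".ivy2" / "jars", "/opt/spark/jars")`, assuming the packages were already resolved into the Ivy cache. With the jars inside the image, the driver and the executors load the same classes from the same location.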
[GitHub] [hudi] ad1happy2go commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1539430091
Thanks @TranHuyTiep. Closing the issue since you were able to fix it. Please reopen if you see the issue again.
[GitHub] [hudi] TranHuyTiep commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "TranHuyTiep (via GitHub)" <gi...@apache.org>.
TranHuyTiep commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1492861891
> Can you provide simple steps to reproduce, such as `code` or `sql`?
Here is my code:
`# -*- coding: utf-8 -*-
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

if __name__ == '__main__':
    print("_____________Run______________")
    spark_conf = SparkConf()
    # The Hudi bundle should match the Spark minor version (3.3.x in this environment).
    spark_conf.set("spark.jars.packages", "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1")
    spark_conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    spark_conf.set("spark.sql.hive.convertMetastoreParquet", "false")
    spark_conf.set("spark.rdd.compress", "true")
    sparkSession = (SparkSession
                    .builder
                    .appName('read_hudi')
                    .config(conf=spark_conf)
                    .getOrCreate())
    file_path = "hdfs://hdfs_host:9000/data/dwd/trans_event"
    # READ HUDI
    df_load = sparkSession.read.format("org.apache.hudi").load(
        file_path, inferSchema=True,
        header=True
    )
    # QUERY THE TABLE THROUGH A TEMP VIEW
    df_load.createOrReplaceTempView("trans_event")
    query_e34_trans_raw = """
    SELECT transaction_data.profile_id FROM trans_event limit 10
    """
    df_load_profile_trans = sparkSession.sql(sqlQuery=query_e34_trans_raw)
    df_load_profile_trans.printSchema()
    df_load_profile_trans.show(10)
    sparkSession.stop()
    print("_____________End______________")
`
[GitHub] [hudi] bigdata-spec commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "bigdata-spec (via GitHub)" <gi...@apache.org>.
bigdata-spec commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1538137162
@TranHuyTiep I have the same environment on k8s. How can I get in touch with you?
[GitHub] [hudi] ad1happy2go commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1506785394
@TranHuyTiep Were you able to resolve this issue, or are you still facing it?
[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1492815283
Can you provide simple steps to reproduce, such as `code` or `sql`?
Re: [I] [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda [hudi]
Posted by "parisni (via GitHub)" <gi...@apache.org>.
parisni commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1983524308
> I solved the problem by building a new image and copying all packages from .ivy2/jars/* to /opt/spark/jars/
Same here on Kubernetes. It sounds like k8s does not work well with `spark.jars.packages` and Hudi.
[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493198904
Does it work locally, outside k8s?
[GitHub] [hudi] KnightChess commented on issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #8340:
URL: https://github.com/apache/hudi/issues/8340#issuecomment-1493256045
Have you set any config in your submit command, something like:
```shell
pyspark \
--packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
```
Also, it looks like the environment differs between submit modes. I think you should verify the environment; for example, if you run in cluster mode and use a YARN archive, make sure the Hudi jar is in the archive, and so on.
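One way to confirm the environment is to check that each package has a matching jar in the directory Spark loads from. The following is a minimal sketch under stated assumptions: the helper name `missing_packages` is hypothetical, and it assumes jar files are named either `<artifact>-<version>.jar` or with the group id prefixed to that name.

```python
from pathlib import Path


def missing_packages(packages, jars_dir):
    """Return the Maven coordinates that have no matching jar in jars_dir.

    `packages` uses the comma-separated group:artifact:version form of
    spark.jars.packages. A jar matches when its file name ends with
    "<artifact>-<version>.jar" (an assumed naming scheme).
    """
    jar_names = [p.name for p in Path(jars_dir).glob("*.jar")]
    missing = []
    for coordinate in filter(None, (c.strip() for c in packages.split(","))):
        _group, artifact, version = coordinate.split(":")
        suffix = f"{artifact}-{version}.jar"
        if not any(name.endswith(suffix) for name in jar_names):
            missing.append(coordinate)
    return missing
```

Run inside the driver and executor images against the jars directory (for example `/opt/spark/jars`), an empty result suggests both sides carry the same bundle. It only compares file names and does not inspect the jar contents.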
[GitHub] [hudi] codope closed issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #8340: [SUPPORT] cannot assign instance of java.lang.invoke.SerializedLambda
URL: https://github.com/apache/hudi/issues/8340