Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/10 15:28:58 UTC

[GitHub] [iceberg] thompson0012 opened a new issue, #6171: iceberg cant read parquet after configuration

thompson0012 opened a new issue, #6171:
URL: https://github.com/apache/iceberg/issues/6171

   ### Apache Iceberg version
   
   1.0.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   The official Docker image has several problems, so I am using the PySpark image with the following jars instead.
   
   When I read a Parquet file, the following error occurs:
   ```java
   Py4JJavaError: An error occurred while calling o43.parquet.
   : java.lang.NoClassDefFoundError: scala/$less$colon$less
   	at org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.$anonfun$apply$2(IcebergSparkSessionExtensions.scala:50)
   	at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildResolutionRules$1(SparkSessionExtensions.scala:152)
   	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
   	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
   	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
   	at org.apache.spark.sql.SparkSessionExtensions.buildResolutionRules(SparkSessionExtensions.scala:152)
   	at org.apache.spark.sql.internal.BaseSessionStateBuilder.customResolutionRules(BaseSessionStateBuilder.scala:216)
   	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:190)
   	at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:182)
   	at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:360)
   	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:87)
   	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:87)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
   	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
   	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
   	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
   	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
   	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:444)
   	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
   	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
   	at scala.Option.getOrElse(Option.scala:189)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
   	at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:562)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
   	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
   	at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
   	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
   	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
   	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
   	... 47 more
   ```
   
   My PySpark configuration is as follows:
   ```python
   from pyspark.sql import SparkSession
   builder = SparkSession.builder.appName('iceberg')\
       .config('spark.jars.packages','org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.0.0')\
       .config('spark.sql.extensions','org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')\
       .config('spark.sql.catalog.local','org.apache.iceberg.spark.SparkSessionCatalog')\
       .config('spark.sql.catalog.spark_catalog.type','hadoop')
   
   spark = builder.getOrCreate()
   
   df = spark.read.parquet('./yellow_tripdata_2022-01.parquet')
   ```
   
   There is no problem when I don't add any Iceberg configuration.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] closed issue #6171: iceberg cant read parquet after configuration

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #6171: iceberg cant read parquet after configuration
URL: https://github.com/apache/iceberg/issues/6171




[GitHub] [iceberg] thompson0012 commented on issue #6171: iceberg cant read parquet after configuration

Posted by GitBox <gi...@apache.org>.
thompson0012 commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1310487799

   I know some issues were reported in #5993, so I use this [image](https://hub.docker.com/r/jupyter/pyspark-notebook)
   with the following jars:
   1. [1.0.0 Spark 3.3_2.12 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/1.0.0/iceberg-spark-runtime-3.3_2.12-1.0.0.jar) – [3.3_2.13](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.3_2.13/1.0.0/iceberg-spark-runtime-3.3_2.13-1.0.0.jar)
   2. [iceberg-core-1.0.0.jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.0.0)
   3. [iceberg-parquet-1.0.0.jar](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-parquet/1.0.0/iceberg-parquet-1.0.0.jar)
   
   I followed the [instructions](https://iceberg.apache.org/releases/).




[GitHub] [iceberg] github-actions[bot] commented on issue #6171: iceberg cant read parquet after configuration

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1541044085

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] nastra commented on issue #6171: iceberg cant read parquet after configuration

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1310475564

   `NoClassDefFoundError` usually indicates that some classes were available during compilation but are not available anymore at runtime.
   Also, what issue does the Docker image have? It would be good to get that fixed.




[GitHub] [iceberg] RussellSpitzer commented on issue #6171: iceberg cant read parquet after configuration

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1310831081

   In your note you seem to be including both Scala 2.12 and 2.13 libraries; this is probably the issue. Usually when I see a `NoClassDefFoundError` for Scala classes like `scala/$less$colon$less`, it means there are multiple Scala versions on the classpath.
   
   In almost all cases, the only Iceberg dependency you should include with Spark is a single instance of the iceberg-spark-runtime jar for your Spark version; nothing else should be included.
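
As a rough illustration of the advice above (not part of the original comment), the sketch below builds the single runtime coordinate for a given Spark and Scala version and flags a jar list that names more than one Scala binary version. The helper names `runtime_coordinate` and `scala_versions` are hypothetical, and the version strings passed in are assumptions; in practice they would come from the Spark installation actually in use. For context, `scala.<:<` (encoded as `scala/$less$colon$less`) is a top-level class in Scala 2.13 but not in 2.12, which would fit a 2.13 build being loaded on a 2.12 runtime.

```python
import re

# Matches the "_<scala binary version>" suffix used in artifact names such as
# iceberg-spark-runtime-3.3_2.12-1.0.0.jar (naming pattern taken from the thread).
_SCALA_SUFFIX = re.compile(r"_(2\.\d+)(?=[-:.]|$)")


def runtime_coordinate(spark_version, scala_version, iceberg_version):
    """Build the Maven coordinate of the one Iceberg runtime jar to load.

    Hypothetical helper: only the major.minor parts of the Spark and Scala
    versions are used, mirroring names like iceberg-spark-runtime-3.3_2.12.
    """
    spark_minor = ".".join(spark_version.split(".")[:2])
    scala_binary = ".".join(scala_version.split(".")[:2])
    return (
        "org.apache.iceberg:"
        f"iceberg-spark-runtime-{spark_minor}_{scala_binary}:{iceberg_version}"
    )


def scala_versions(artifacts):
    """Return the set of Scala binary versions named in the given artifacts."""
    found = set()
    for name in artifacts:
        match = _SCALA_SUFFIX.search(name)
        if match:
            found.add(match.group(1))
    return found


# The jars listed earlier in this thread (file names only).
jars = [
    "iceberg-spark-runtime-3.3_2.12-1.0.0.jar",
    "iceberg-spark-runtime-3.3_2.13-1.0.0.jar",
    "iceberg-core-1.0.0.jar",
    "iceberg-parquet-1.0.0.jar",
]

mixed = scala_versions(jars)
if len(mixed) > 1:
    print("Multiple Scala versions on the classpath:", sorted(mixed))

# Assumed versions for illustration: Spark 3.3.1 built against Scala 2.12.15.
print(runtime_coordinate("3.3.1", "2.12.15", "1.0.0"))
```

The resulting coordinate would then be the only Iceberg entry passed to `spark.jars.packages`, with no separate iceberg-core or iceberg-parquet jars added.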




[GitHub] [iceberg] thompson0012 commented on issue #6171: iceberg cant read parquet after configuration

Posted by GitBox <gi...@apache.org>.
thompson0012 commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1310555881

   I just found that almost every read fails once I add the Iceberg configuration;
   I can't read my local files normally after setting up Iceberg.
   
   It works normally if I use a clean Spark configuration.
   
   Do you have any idea?
   
   Just imagine this use case:
   I make an HTTP request to fetch a file over the network and sync it to my data lake.




[GitHub] [iceberg] github-actions[bot] commented on issue #6171: iceberg cant read parquet after configuration

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6171:
URL: https://github.com/apache/iceberg/issues/6171#issuecomment-1562082778

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

