Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/30 21:28:41 UTC

[GitHub] [hudi] WaterKnight1998 opened a new issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

WaterKnight1998 opened a new issue #1776:
URL: https://github.com/apache/hudi/issues/1776


   
   
   **To Reproduce**
   
   I was trying to use PySpark with Hudi to create a table in Google Cloud Storage. I have earlier code that reads data from Google Cloud Storage, so the connector is not the problem.
   
   I can see that Hudi stores something in the bucket, but the code fails with the error in the title.
   
   The code was as follows:
   ```
   tableName = "forecasts"
   basePath = "gs://hudi-datalake/" + tableName
   
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'uuid',
     'hoodie.datasource.write.partitionpath.field': 'partitionpath',
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'ts',
     'hoodie.upsert.shuffle.parallelism': 2, 
     'hoodie.insert.shuffle.parallelism': 2
   }
   
   dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
   inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(dataGen.generateInserts(10))
   df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
   
   df.write.format("hudi"). \
     options(**hudi_options). \
     mode("overwrite"). \
     save(basePath)
   ```
   
   However, this code produces the following error:
   ```
   Py4JJavaError: An error occurred while calling o346.save.
   : java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
   	at io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
   	at io.javalin.Javalin.<init>(Javalin.java:94)
   	at io.javalin.Javalin.create(Javalin.java:107)
   	at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
   	at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
   	at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
   	at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:69)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:83)
   	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:137)
   	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:124)
   	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:120)
   	at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   
   (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o346.save.\n', JavaObject id=o347), <traceback object at 0x7f1a1ce00b48>)
   ```
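A note on reading the error above: a `NoSuchMethodError` means the JVM found the class but not a method with that exact signature, which usually points to a different library version on the classpath than the one the caller was compiled against. Here that would plausibly be a Jetty version other than the one Javalin expects, though the thread does not confirm this. The `(Z)V` suffix is a JVM method descriptor. The following is a minimal sketch of a decoder for such descriptors; the helper names `describe` and `_parse_type` are made up for illustration and are not part of Hudi, Spark, or py4j.

```python
# Minimal decoder for JVM method descriptors such as "(Z)V".
# Hypothetical helper for reading stack traces; not part of any library.
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type starting at desc[i]; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type, written as L<binary/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # single-letter primitive (or V for void)
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def describe(method, descriptor):
    """Render a method name and descriptor as a Java-style signature."""
    close = descriptor.index(")")
    params, i = [], 1  # index 0 is the opening '('
    while i < close:
        name, i = _parse_type(descriptor, i)
        params.append(name)
    ret, _ = _parse_type(descriptor, close + 1)
    return f"{ret} {method}({', '.join(params)})"


print(describe("setHttpOnly", "(Z)V"))  # void setHttpOnly(boolean)
```

So the call that failed is to a `setHttpOnly` method on `SessionHandler` that takes one `boolean` and returns `void`.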
   
   Trying to read the data as follows:
   ```
   tableName = "forecasts"
   basePath = "gs://hudi-datalake/" + tableName
   
   tripsSnapshotDF = spark. \
     read. \
     format("hudi"). \
     load(basePath)
   # load(basePath) use "/partitionKey=partitionValue" folder structure for Spark auto partition discovery
   
   tripsSnapshotDF.createOrReplaceTempView("forecasts")
   ```
   
   gives another error:
   ```
   Fail to execute line 7:   load(basePath)
   Traceback (most recent call last):
     File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
       return f(*a, **kw)
     File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o399.load.
   : org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
   	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
   	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
   	at scala.Option.getOrElse(Option.scala:121)
   	at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:78)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:47)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
   	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/tmp/1593550879377-0/zeppelin_python.py", line 153, in <module>
       exec(code, _zcUserQueryNameSpace)
     File "<stdin>", line 7, in <module>
     File "/opt/spark/python/pyspark/sql/readwriter.py", line 166, in load
       return self._df(self._jreader.load(path))
     File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/opt/spark/python/pyspark/sql/utils.py", line 69, in deco
       raise AnalysisException(s.split(': ', 1)[1], stackTrace)
   pyspark.sql.utils.AnalysisException: 'Unable to infer schema for Parquet. It must be specified manually.;'
   ```
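This read error may simply follow from the failed write: if no Parquet data files were committed, there is nothing to infer a schema from. Separately, and as an assumption not confirmed in this thread, quickstart examples for Hudi 0.5.x appear to load with a glob below the base path (one `*` per partition level plus one for the files) rather than the bare base path. A minimal sketch of building such a path follows; `hudi_glob_path` is a hypothetical helper, and the depth of 3 assumes the three-level partition paths that the quickstart data generator produces.

```python
def hudi_glob_path(base_path, partition_depth):
    """Append one '/*' per partition level, plus one more for the data files."""
    if partition_depth < 0:
        raise ValueError("partition_depth must be non-negative")
    return base_path.rstrip("/") + "/*" * (partition_depth + 1)


base_path = "gs://hudi-datalake/forecasts"
print(hudi_glob_path(base_path, 3))  # gs://hudi-datalake/forecasts/*/*/*/*

# Hypothetical usage in a live PySpark session:
# spark.read.format("hudi").load(hudi_glob_path(base_path, 3))
```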
   
   **Environment Description**
   
   * Hudi version :0.5.3
   
   * Spark version : 2.4.5
   
   * Hive version :
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : yes
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1776:
URL: https://github.com/apache/hudi/issues/1776#issuecomment-653342451


   @WaterKnight1998 hudi is not yet fully supported on Hadoop 3. Will get this filed towards that jira





[GitHub] [hudi] WaterKnight1998 commented on issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Posted by GitBox <gi...@apache.org>.
WaterKnight1998 commented on issue #1776:
URL: https://github.com/apache/hudi/issues/1776#issuecomment-653611644


   > @WaterKnight1998 hudi is not yet fully supported on Hadoop 3. Will get this filed towards that jira
   
   Yes, I solved it by using Hadoop 2.10. What do you mean by "Will get this filed towards that jira"?





[GitHub] [hudi] vinothchandar commented on issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1776:
URL: https://github.com/apache/hudi/issues/1776#issuecomment-655206781


   I meant HUDI-259, where we collect issues like this related to Hadoop 3.





[GitHub] [hudi] WaterKnight1998 commented on issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Posted by GitBox <gi...@apache.org>.
WaterKnight1998 commented on issue #1776:
URL: https://github.com/apache/hudi/issues/1776#issuecomment-652086755


   I think that the problem was Hadoop 3.2.1. With Hadoop 2.7.7 the error disappears.
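Based only on what this thread reports for Hudi 0.5.3 (failure on Hadoop 3.2.1, success on 2.7.7 and 2.10), a job could check the Hadoop version before writing. This is a minimal sketch under that assumption; `hadoop_major` and `check_hadoop_for_hudi` are hypothetical helper names, and the threshold reflects this thread rather than any official compatibility matrix.

```python
def hadoop_major(version):
    """Extract the major version number from a string such as '3.2.1'."""
    return int(version.split(".", 1)[0])


def check_hadoop_for_hudi(version):
    """Return a warning for Hadoop 3+, else None (rule taken from this thread only)."""
    if hadoop_major(version) >= 3:
        return (f"Hadoop {version}: Hudi 0.5.3 is reported as not fully "
                "supported on Hadoop 3; consider a Hadoop 2.x build")
    return None


for v in ("3.2.1", "2.7.7", "2.10"):
    print(v, "->", check_hadoop_for_hudi(v) or "ok")

# In a live PySpark session the version string could be read through py4j:
# sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion()
```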





[GitHub] [hudi] vinothchandar closed issue #1776: [SUPPORT] org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #1776:
URL: https://github.com/apache/hudi/issues/1776


   

