Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/02 22:25:04 UTC

[GitHub] [hudi] asheeshgarg opened a new issue #1787: Exception During Insert

asheeshgarg opened a new issue #1787:
URL: https://github.com/apache/hudi/issues/1787


   **Setup:** org.apache.hudi:hudi-spark-bundle_2.11:0.5.3, org.apache.spark:spark-avro_2.11:2.4.4
   **Client:** PySpark
   **Storage:** S3
   hudi_options = {
       'hoodie.table.name': self.table_name,
       'hoodie.datasource.write.recordkey.field': 'column',
       'hoodie.datasource.write.table.name': self.table_name,
       'hoodie.datasource.write.precombine.field': 'column',
       'hoodie.datasource.write.partitionpath.field': 'dl_snapshot_date',
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2
   }
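   For reference, a minimal sketch of how options like these are typically passed to the Hudi datasource from PySpark; `df` is assumed to be an existing DataFrame carrying the `column` and `dl_snapshot_date` fields, and the target path is a placeholder:
   
   # hedged sketch: write `df` using the hudi_options dict above
   df.write.format("org.apache.hudi") \
       .options(**hudi_options) \
       .mode("append") \
       .save("s3a://tempwrite/hudi/")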
   **Data gets written and can be loaded with Spark, but the write produces an exception:**
   
   20/07/02 21:53:36 ERROR PriorityBasedFileSystemView: Got error running preferred function. Trying secondary
   org.apache.hudi.exception.HoodieRemoteException: 10.34.184.84:38937 failed to respond
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:376)
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:66)
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getPendingCompactionOperations(PriorityBasedFileSystemView.java:199)
   	at org.apache.hudi.table.CleanHelper.<init>(CleanHelper.java:78)
   	at org.apache.hudi.table.HoodieCopyOnWriteTable.scheduleClean(HoodieCopyOnWriteTable.java:288)
   	at org.apache.hudi.client.HoodieCleanClient.scheduleClean(HoodieCleanClient.java:118)
   	at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:95)
   	at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:835)
   	at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:512)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
   	at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bhasudha commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-657999488


   > @bhasudha in my setup we are not running Hive; we are just using the Hive metastore. So what I did was register an external table like
   > CREATE TABLE test.hoodie_test2(
   > "_hoodie_commit_time" varchar,
   > "_hoodie_commit_seqno" varchar,
   > "_hoodie_record_key" varchar,
   > "_hoodie_partition_path" varchar,
   > "_hoodie_file_name" varchar,
   > "column" varchar,
   > "data_type" varchar,
   > "is_data_type_inferred" varchar,
   > "completeness" double,
   > "approximate_num_distinct_values" bigint,
   > "histogram" array(row(count bigint, ratio double, value varchar)),
   > "mean" double,
   > "maximum" double,
   > "minimum" double,
   > "sum" double,
   > "std_dev" double,
   > approx_percentiles ARRAY <double> )
   > WITH (
   > format='parquet',
   > external_location='s3a://tempwrite/hudi/'
   > )
   > Just wanted to know if this is the right way of doing it; is it going to lose any functionality?
   
   @asheeshgarg I understand you are not running Hive. For Presto queries, the Hudi table needs to be registered with the Hive metastore. It looks like that's what you are trying to do above, but from Presto. The above might not work because it doesn't record that this table is Hudi-formatted, and supporting Hudi writes from Presto would take work on the Presto side. Instead, you might want to do Hive sync (which registers the table schema with the Hive metastore) - https://hudi.apache.org/docs/writing_data.html#syncing-to-hive - which sets the input and output formats and also the SerDes as Hudi requires.
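   For illustration, a minimal sketch of enabling Hive sync from the PySpark writer, reusing `df` and `hudi_options` from the original report and option keys that also appear later in this thread; the database and table names are assumptions:
   
   hive_sync_options = {
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.database': 'test',        # assumed database
       'hoodie.datasource.hive_sync.table': 'hoodie_test2',   # assumed table name
       'hoodie.datasource.hive_sync.partition_fields': 'dl_snapshot_date'
   }
   # write as before, with the sync options merged in
   df.write.format("org.apache.hudi") \
       .options(**hudi_options) \
       .options(**hive_sync_options) \
       .mode("append") \
       .save("s3a://tempwrite/hudi/")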





[GitHub] [hudi] bvaradar closed issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1787:
URL: https://github.com/apache/hudi/issues/1787


   





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-662540757


   @asheeshgarg : I can see the class in the jar. Please check that you are using the right options when adding the jar to the Spark driver.
   
   MacBook-Pro:oss balaji.varadarajan$ jar tf ~/Downloads/json-1.8.jar 
   META-INF/
   META-INF/MANIFEST.MF
   org/
   org/json/
   org/json/JSON.class
   org/json/JSONArray.class
   org/json/JSONException.class
   org/json/JSONObject$1.class
   org/json/JSONObject.class
   org/json/JSONString.class
   org/json/JSONStringer$Scope.class
   org/json/JSONStringer.class
   org/json/JSONTokener.class
   META-INF/maven/
   META-INF/maven/com.tdunning/
   META-INF/maven/com.tdunning/json/
   META-INF/maven/com.tdunning/json/pom.xml
   META-INF/maven/com.tdunning/json/pom.properties





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-657806405


   @bhasudha in my setup we are not running Hive; we are just using the Hive metastore. So what I did was register an external table like
   CREATE TABLE test.hoodie_test2(
   "_hoodie_commit_time" varchar,
   "_hoodie_commit_seqno" varchar,
   "_hoodie_record_key" varchar,
   "_hoodie_partition_path" varchar,
   "_hoodie_file_name" varchar,
   "column" varchar,
   "data_type" varchar,
   "is_data_type_inferred" varchar,
   "completeness" double,
   "approximate_num_distinct_values" bigint,
   "histogram" array(row(count bigint, ratio double, value varchar)),
   "mean" double,
   "maximum" double,
   "minimum" double,
   "sum" double,
   "std_dev" double,
   approx_percentiles ARRAY <double> )
   WITH (
   format='parquet',
   external_location='s3a://tempwrite/hudi/'
   )
   Just wanted to know if this is the right way of doing it; is it going to lose any functionality?





[GitHub] [hudi] asheeshgarg removed a comment on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg removed a comment on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-662610773


   @bvaradar I have used the --jars option in spark-submit, and other jars are picked up as well. I can also see the class is there, but I am still getting the same error.





[GitHub] [hudi] leesf commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
leesf commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-653707211


   @asheeshgarg In fact, enabling the embedded timeline server reduces calls to FileSystem/S3. Before 0.5.3 it was disabled by default, so I think it is OK to disable it in 0.5.3.





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-668660351


   Closing this issue. Please reopen if needed.





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-658191364


   @bhasudha it works with Presto; I am able to query the data fine, and the data seems correct based on my queries. My only concern is whether it is missing anything that might bite us in the long run?





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-662610773


   @bvaradar I have used the --jars option in spark-submit, and other jars are picked up as well. I can also see the class is there, but I am still getting the same error.





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-663677973


   @asheeshgarg : I may have accidentally deleted a comment. Has the issue been resolved?





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-661874955


   @bvaradar any recommendation on this, please?





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-662525238


   @bvaradar I added https://mvnrepository.com/artifact/com.tdunning/json/1.8/json-1.8.jar to the Spark jars but am still facing the same issue:
    An error occurred while calling o179.save.
   : java.lang.NoClassDefFoundError: org/json/JSONException
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
   	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
   	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-657552873


   @asheeshgarg : As folks mentioned, the exception happens when an executor tries to get timeline details from the embedded timeline server running in the driver, and only during writes. It is a non-fatal error, since the client falls back to querying the timeline locally and the commit succeeds. It is not expected to happen, though. Can you look at the complete logs to get more details on the exception? Also, are you seeing the exception consistently?





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-653641776


   @leesf after adding the option it works fine. Does setting the option to false have any impact?





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-658248015


   @asheeshgarg : If the table is represented as a simple parquet table, Presto queries will start showing duplicates when multiple file versions are present, and could fail while writes are happening (there is no snapshot isolation). Creating the table via Hive sync ensures that only valid, single file versions are read.
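   To make the distinction concrete, a hedged PySpark sketch (assuming an existing `spark` session, the S3 path from this thread, and one partition level under the base path, which determines the glob depth):
   
   # the Hudi datasource resolves each file group to its latest committed
   # file slice, so a snapshot query sees one version of each record
   hudi_df = spark.read.format("org.apache.hudi").load("s3a://tempwrite/hudi/*/*")
   # plain parquet reads every data file under the path, including older
   # file versions, which is where the duplicates come from
   raw_df = spark.read.parquet("s3a://tempwrite/hudi/")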





[GitHub] [hudi] leesf commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
leesf commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-653344351


   This exception is thrown in 0.5.3, but I think it does not affect writing into hudi; you could use `option("hoodie.embed.timeline.server", false)` to give it a try @asheeshgarg. cc @bvaradar - this is because in 0.5.3 the embedded timeline server is on by default.
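   For example, a minimal sketch of what that looks like on the write path, reusing `df`, `hudi_options`, and the path from the original report:
   
   df.write.format("org.apache.hudi") \
       .options(**hudi_options) \
       .option("hoodie.embed.timeline.server", "false") \
       .mode("append") \
       .save("s3a://tempwrite/hudi/")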





[GitHub] [hudi] vinothchandar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-655191206


   It's okay to disable. But I am trying to understand why you got that error consistently. Was the port blocked on the driver? Do you have the entire executor/driver logs?





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-661196097


   @bvaradar I am running  hudi-spark-bundle





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-659690154


   @bvaradar @bhasudha I tried using the following:
   "hoodie.datasource.hive_sync.use_jdbc": False,
   "hoodie.datasource.hive_sync.enable": True,
   
   My Spark is already configured with the thrift URL of the Hive metastore. Will Hudi use that thrift instance, given that I have disabled JDBC?
   
   I get the error:
   An error occurred while calling o175.save.
   : java.lang.NoClassDefFoundError: org/json/JSONException
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
   	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
   	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
   	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
   	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
   	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
   	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
   	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
   	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:515)
   	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:498)
   	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)
   	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:273)
   	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:146)
   	at ...
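   For context, a hedged sketch of pointing a PySpark session at a Hive metastore over thrift; the host and port are placeholders. With use_jdbc disabled, the sync in the stack trace above runs its DDL through the Hive Driver (org.apache.hadoop.hive.ql.Driver), which is what ends up needing the org.json classes:
   
   from pyspark.sql import SparkSession
   
   # hedged sketch: metastore URI is a placeholder, not from this thread
   spark = SparkSession.builder \
       .appName("hudi-write") \
       .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-host:9083") \
       .enableHiveSupport() \
       .getOrCreate()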
   





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-663688727


   @bvaradar I am getting the same exception. I added the jar to the --jars option of spark-submit, so it is available to both the driver and the executors.





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-662166742


   @asheeshgarg : The JSONException class comes from https://mvnrepository.com/artifact/org.json/json . There is a licensing issue with it, and hence it is not part of the Hudi bundle packages. The underlying issue is due to Hive 1.x vs 2.x (see https://issues.apache.org/jira/browse/HUDI-150?jql=text%20~%20%22org.json%22%20and%20project%20%3D%20%22Apache%20Hudi%22%20 )
   
   Spark's Hive integration still brings in Hive 1.x jars, which depend on org.json. I believe this jar was provided in users' environments, which is why we have not seen folks complaining about this issue.
   
   Even though this is not a Hudi issue per se, let me check a jar with a compatible license (https://mvnrepository.com/artifact/com.tdunning/json/1.8), and if it works we will add it to the 0.6 bundles after discussing with the community. Meanwhile, can you add a json jar (https://mvnrepository.com/artifact/com.tdunning/json/1.8 or https://mvnrepository.com/artifact/org.json/json) to your classpath? That should resolve the issue.
   
   Tracking Jira: https://issues.apache.org/jira/browse/HUDI-1117
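   For illustration, a hedged sketch of one way to put that jar on the classpath from PySpark; the local path is a placeholder, and passing it to spark-submit via --jars (as tried elsewhere in this thread) is the equivalent command-line route:
   
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder \
       .config("spark.jars", "/path/to/json-1.8.jar") \
       .enableHiveSupport() \
       .getOrCreate()
   # if the driver still cannot see the class, spark.driver.extraClassPath
   # is another knob worth trying (it must be set before the JVM starts)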





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-656772241


   @vinothchandar sorry for the delay; I will try to pull the logs and attach them.
   
   Another quick question: I have created an external table using Presto for the data I have written to S3, using:
   CREATE TABLE test.hoodie_test2(
   "_hoodie_commit_time" varchar,
   "_hoodie_commit_seqno" varchar,
   "_hoodie_record_key" varchar,
   "_hoodie_partition_path" varchar,
   "_hoodie_file_name" varchar,
   "column" varchar,
   "data_type" varchar,
   "is_data_type_inferred" varchar,
   "completeness" double,
   "approximate_num_distinct_values" bigint,
   "histogram" array(row(count bigint, ratio double, value varchar)),
   "mean" double,
   "maximum" double,
   "minimum" double,
   "sum" double,
   "std_dev" double,
   approx_percentiles ARRAY <double> )
   WITH (
       format='parquet',
       external_location='s3a://tempwrite/hudi/'
   )
   
   It worked fine, and I am able to query it with Presto.
   I haven't added the jar <presto_install>/plugin/hive-hadoop2/hudi-presto-bundle.jar, and it still works fine; I think it is reading the parquet files directly? Is this the right way to do it, or does it need to be done differently?
   





[GitHub] [hudi] bhasudha commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-657289513


   @asheeshgarg which version of Presto are you using? Also, for querying through Presto, you don't have to write anything. As long as the table is registered in Hive, you can simply query it using Presto by adding the hudi-presto-bundle.jar to <presto_install>/plugin/hive-hadoop2.





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-668067209


   @asheeshgarg : Hope this helps : https://issues.apache.org/jira/browse/HUDI-1117?focusedCommentId=17169214&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17169214





[GitHub] [hudi] leesf edited a comment on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
leesf edited a comment on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-653707211


   @asheeshgarg In fact, enabling the embedded timeline server reduces calls to FileSystem/S3, with no other impact. Before 0.5.3 it was disabled by default, so I think it is OK to disable it in 0.5.3.





[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
asheeshgarg commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-660175460


   @bvaradar please let me know if anything else needs to be done to disable the JDBC interface.





[GitHub] [hudi] vinothchandar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-653341170


   Do you get it consistently? At first glance it looks like an issue talking to the namenode. Are you using HDFS?





[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1787:
URL: https://github.com/apache/hudi/issues/1787#issuecomment-660418451


   @asheeshgarg : Are you running Hive sync as part of Spark datasource writing (using hudi-spark-bundle), or in standalone mode using hudi-hive-sync-bundle? I am trying to see which bundle is having this issue.

