Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/19 10:16:41 UTC

[GitHub] [hive] wangyum opened a new pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

wangyum opened a new pull request #1994:
URL: https://github.com/apache/hive/pull/1994


   ### What changes were proposed in this pull request?
   
   This PR disables validation of default values when parsing Avro schemas.
   
   ### Why are the changes needed?
   
   After upgrading Avro to 1.10.1, parsing a schema like the following throws an exception:
   ```
   {
       "type": "record",
       "name": "EventData",
       "doc": "event data",
       "fields": [
           {"name": "ARRAY_WITH_DEFAULT", "type": {"type": "array", "items": "string"}, "default": null }
       ]
   }
   ```
   ```
   org.apache.avro.AvroTypeException: Invalid default for field USERACTIONS: null not a {"type":"array","items":"string"}
   	at org.apache.avro.Schema.validateDefault(Schema.java:1571)
   	at org.apache.avro.Schema.access$500(Schema.java:87)
   	at org.apache.avro.Schema$Field.<init>(Schema.java:544)
   	at org.apache.avro.Schema.parse(Schema.java:1678)
   	at org.apache.avro.Schema$Parser.parse(Schema.java:1425)
   	at org.apache.avro.Schema$Parser.parse(Schema.java:1396)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFor(AvroSerdeUtils.java:287)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:170)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:139)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:187)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:107)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
   	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:493)
   	at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:225)
   ```
   
   Related Avro ticket: https://issues.apache.org/jira/browse/AVRO-2035
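   
   To make the failure mode concrete, here is a minimal sketch in plain Java. It is a toy model, not Avro's implementation: `ToyParser`, `JsonKind` and `checkField` are hypothetical names invented for illustration. It only mimics the rule that a field default must match the declared type (an array field needs a JSON array, so `null` is rejected) and shows how a flag can skip that check. In Avro's Java API the corresponding switch is `Schema.Parser.setValidateDefaults(boolean)`, which is presumably what this patch sets to `false`.
   
   ```java
   // Toy model only: this is NOT Avro's code. It mimics the rule that a field
   // default must match the field's declared type, plus a switch to skip the check.
   final class DefaultValidationSketch {
   
       // Coarse JSON value kinds that a default can take.
       enum JsonKind { NULL, BOOLEAN, NUMBER, STRING, ARRAY, OBJECT }
   
       // Hypothetical stand-in for a schema parser with a "validate defaults" flag.
       static final class ToyParser {
           private boolean validateDefaults = true;
   
           ToyParser setValidateDefaults(boolean validate) {
               this.validateDefaults = validate;
               return this;
           }
   
           // Throws if validation is on and the default does not fit the type.
           void checkField(String fieldName, String avroType, JsonKind defaultKind) {
               if (validateDefaults && !matches(avroType, defaultKind)) {
                   throw new IllegalArgumentException(
                       "Invalid default for field " + fieldName + ": "
                           + defaultKind + " not a " + avroType);
               }
           }
   
           // Type-to-JSON mapping for defaults, following the Avro specification
           // (unions are left out of this toy model).
           private static boolean matches(String avroType, JsonKind kind) {
               switch (avroType) {
                   case "null":
                       return kind == JsonKind.NULL;
                   case "boolean":
                       return kind == JsonKind.BOOLEAN;
                   case "int": case "long": case "float": case "double":
                       return kind == JsonKind.NUMBER;
                   case "string": case "bytes": case "enum": case "fixed":
                       return kind == JsonKind.STRING;
                   case "array":
                       return kind == JsonKind.ARRAY;
                   case "map": case "record":
                       return kind == JsonKind.OBJECT;
                   default:
                       return false;
               }
           }
       }
   
       public static void main(String[] args) {
           // Mirrors the schema above: an array-typed field whose default is null.
           ToyParser strict = new ToyParser();
           try {
               strict.checkField("ARRAY_WITH_DEFAULT", "array", JsonKind.NULL);
           } catch (IllegalArgumentException e) {
               System.out.println("strict:  " + e.getMessage());
           }
   
           // With the check switched off, the same declaration is accepted.
           new ToyParser().setValidateDefaults(false)
               .checkField("ARRAY_WITH_DEFAULT", "array", JsonKind.NULL);
           System.out.println("lenient: accepted");
       }
   }
   ```
   
   The trade-off in the sketch is the same one the patch makes: with the check off, a schema whose default does not match its type is accepted at parse time instead of failing when the SerDe initializes.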
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Manual test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sunchao commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-785311457


   @wangyum gotcha. I'll keep this in mind. There are also a few other fixes we want to include in the 2.3.9 release.




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-784614813


   @sunchao We need to backport this patch to branch-2.3 and make a new release. Please see https://issues.apache.org/jira/browse/SPARK-34512 for more details.




[GitHub] [hive] sunchao commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-784710645


   @wangyum yes, will backport to branch-2.3. How severe is the issue from the Spark side? Wondering about this since 2.3.8 was released just weeks ago.




[GitHub] [hive] sunchao merged pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
sunchao merged pull request #1994:
URL: https://github.com/apache/hive/pull/1994


   




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-782608079


   Full stack trace:
   ```
   21/02/19 01:43:22,745 WARN [HiveServer2-Background-Pool: Thread-162871] avro.AvroSerDe:195 : Encountered exception determining schema. Returning signal schema to indicate problem
   org.apache.avro.AvroTypeException: Invalid default for field USERACTIONS: null not a {"type":"array","items":"string"}
   	at org.apache.avro.Schema.validateDefault(Schema.java:1571)
   	at org.apache.avro.Schema.access$500(Schema.java:87)
   	at org.apache.avro.Schema$Field.<init>(Schema.java:544)
   	at org.apache.avro.Schema.parse(Schema.java:1678)
   	at org.apache.avro.Schema$Parser.parse(Schema.java:1425)
   	at org.apache.avro.Schema$Parser.parse(Schema.java:1396)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFor(AvroSerdeUtils.java:287)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:170)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:139)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:187)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:107)
   	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
   	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:493)
   	at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:225)
   	at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$makeRDDForPartitionedTable$1(TableReader.scala:146)
   	at scala.collection.immutable.Stream.map(Stream.scala:418)
   	at org.apache.spark.sql.hive.HadoopTableReader.makeRDDForPartitionedTable(TableReader.scala:145)
   	at org.apache.spark.sql.hive.execution.HiveTableScanExec.$anonfun$doExecute$2(HiveTableScanExec.scala:211)
   	at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2469)
   	at org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:211)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:219)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:216)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:177)
   	at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:79)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:219)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:216)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:177)
   	at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:323)
   	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:465)
   	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:451)
   	at org.apache.spark.sql.execution.SparkPlan.executeTakeAsIterator(SparkPlan.scala:543)
   	at org.apache.spark.sql.execution.CollectLimitExec.executeCollectAsIterator(limit.scala:50)
   	at org.apache.spark.sql.Dataset.collectIteratorFromPlan(Dataset.scala:3659)
   	at org.apache.spark.sql.Dataset.$anonfun$collectAsIterator$1(Dataset.scala:2951)
   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3641)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:105)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3639)
   	at org.apache.spark.sql.Dataset.collectAsIterator(Dataset.scala:2951)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$executeForResults$1(SparkExecuteStatementOperation.scala:481)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.executeResultInShareGroup(SparkExecuteStatementOperation.scala:521)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.executeForResults(SparkExecuteStatementOperation.scala:481)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$3(SparkExecuteStatementOperation.scala:329)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withRetry(SparkExecuteStatementOperation.scala:419)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:322)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.$anonfun$run$1(SparkExecuteStatementOperation.scala:225)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
   	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:57)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:225)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:220)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:236)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```




[GitHub] [hive] wangyum closed pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #1994:
URL: https://github.com/apache/hive/pull/1994


   




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-782600067


   @iemejia @viirya The stack trace is different:
   HIVE-24797 | AVRO-2817
   -- | --
   at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFor(AvroSerdeUtils.java:287) | at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:130)
   
   
   My use case is reading a Hive Avro table with the current Spark master branch (built-in Hive is 2.3.8, Avro is 1.10.1).
   
   




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-781979028


   cc @sunchao 




[GitHub] [hive] viirya commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-782598055


   +1 for adding a comment.




[GitHub] [hive] iemejia commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
iemejia commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-782604213


   Can you please share the full stack trace, just out of curiosity, @wangyum?
   
   CC @RyanSkraba You might be interested in this one since you know the defaults (assuming it is also a fixable 'regression' on the Avro side).




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-784966278


   @sunchao This will cause Spark SQL to fail when reading such Avro tables. I think we can release Hive 2.3.9 before releasing Spark 3.2.0. cc @dongjoon-hyun 




[GitHub] [hive] wangyum edited a comment on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum edited a comment on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-781979028


   cc @sunchao @iemejia @viirya




[GitHub] [hive] sunchao commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-784391627


   Merged to master. Thanks @wangyum and all!




[GitHub] [hive] wangyum commented on pull request #1994: HIVE-24797: Disable validate default values when parsing Avro schemas

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #1994:
URL: https://github.com/apache/hive/pull/1994#issuecomment-782769100


   It seems there is no workaround.

