Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/26 14:29:40 UTC

[GitHub] [hudi] matthiasdg edited a comment on issue #3868: [SUPPORT] Querying hudi datasets from standalone metastore

matthiasdg edited a comment on issue #3868:
URL: https://github.com/apache/hudi/issues/3868#issuecomment-951981890


   Meanwhile I experimented with some other versions of the Hive metastore + MySQL running in Docker containers (e.g. 2.3.7, matching Spark). Same problems, such as the Hive partition columns missing in the data:
   ```
   21/10/26 16:05:26 WARN HoodieFileIndex: Cannot do the partition prune for table abfss://dev@stsdpglasshouse.dfs.core.windows.net/devs/degeyt70/partitiontests/datalakehouse/vmm.aq_msm.The partitionFragments size (10893,2021,06,30) is not equal to the partition columns size(StructField(sensorId,LongType,false),StructField(timestamp,TimestampType,true))
   21/10/26 16:05:28 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 15)
   java.io.IOException: Required column is missing in data file. Col: [hiveid]
   	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initializeInternal(VectorizedParquetRecordReader.java:314)
   	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:154)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:329)
   ```
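
   For context, the warning seems to come from a sanity check in HoodieFileIndex: the relative partition path is split into fragments and the fragment count must match the number of declared partition columns, otherwise partition pruning is skipped. The following is a rough, hypothetical Python sketch of that check (the function name and shape are illustrative, not Hudi's actual code), using the values from the log above:
   ```python
   # Illustrative sketch (NOT Hudi's actual implementation) of the check
   # that appears to trigger the "Cannot do the partition prune" warning.

   def can_prune_partitions(partition_path: str, partition_columns: list) -> bool:
       """True only when each path fragment maps to exactly one partition column."""
       fragments = [f for f in partition_path.split("/") if f]
       return len(fragments) == len(partition_columns)

   # From the log: a 4-fragment path against a 2-column partition schema
   # (sensorId, timestamp), so pruning is skipped.
   print(can_prune_partitions("10893/2021/06/30", ["sensorId", "timestamp"]))  # False
   ```
   This would explain the message: the timestamp-derived path (`2021/06/30`) expands into three fragments, so the four fragments cannot be matched one-to-one against the two partition columns.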


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org