Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/05 04:36:34 UTC

[GitHub] [hudi] KarthickAN opened a new issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

KarthickAN opened a new issue #2144:
URL: https://github.com/apache/hudi/issues/2144


   
   **Describe the problem you faced**
   
   Even though there's a timestamp field in the data, Hudi complains that it's not there. Below are the Hudi options I am using:
   
   {
     "hoodie.table.Name": "event_processed_cow_jd",
     "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
     "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
     "hoodie.datasource.write.table.Type": "COPY_ON_WRITE",
     "hoodie.datasource.write.partitionpath.field": "date,sourceid",
     "hoodie.datasource.write.hive_style_partitioning": true,
     "hoodie.datasource.write.table.Name": "event_processed_cow_jd",
     "hoodie.datasource.write.operation": "insert",
     "hoodie.parquet.compression.codec": "snappy",
     "hoodie.parquet.compression.ratio": "6",
     "hoodie.parquet.small.file.limit": "104857600",
     "hoodie.parquet.max.file.size": "134217728",
     "hoodie.parquet.block.size": "134217728",
     "hoodie.copyonwrite.insert.split.size": "4880640",
     "hoodie.copyonwrite.record.size.estimate": "165",
     "hoodie.cleaner.commits.retained": 1,
     "hoodie.combine.before.insert": true,
     "hoodie.datasource.write.precombine.field": "timestamp",
     "hoodie.insert.shuffle.parallelism": 10,
     "hoodie.datasource.write.insert.drop.duplicates": true
   }
   
   **Schema**
   root 
   	|-- sourceid: string (nullable = true) 
   	|-- sourcetypeid: integer (nullable = true) 
   	|-- sourceassetid: string (nullable = true) 
   	|-- sourceeventid: string (nullable = true) 
   	|-- mode: integer (nullable = true) 
   	|-- quality: integer (nullable = true) 
   	|-- timestamp: double (nullable = true) 
   	|-- value: integer (nullable = true) 
   	|-- categoryid: integer (nullable = true) 
   	|-- subcategoryid: string (nullable = true) 
   	|-- description: string (nullable = true) 
   	|-- signalmap: map (nullable = true) 
   		| |-- key: string 
   		| |-- value: string (valueContainsNull = true) 
   	|-- argumentmap: map (nullable = true) 
   		| |-- key: string 
   		| |-- value: string (valueContainsNull = true) 
   	|-- publishtimestamp: double (nullable = true) 
   	|-- messageindex: integer (nullable = true) 
   	|-- date: string (nullable = true) 
   	|-- inserttimestamp: double (nullable = false)
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.3
   
   * Hadoop version : 2.8.5-amzn-1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No. Running on AWS Glue
   
   
   **Stacktrace**
   
   ```Caused by: org.apache.hudi.exception.HoodieException: timestamp(Part -timestamp) field not found in record. Acceptable fields were :[sourceid, sourcetypeid, sourceassetid, sourceeventid, mode, quality, timestamp, value, categoryid, subcategoryid, description, signalmap, argumentmap, publishtimestamp, messageindex, date, inserttimestamp]
   	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:415)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
   	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
   	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
   	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
   	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
   	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
   	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
   	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
   	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
   	at org.apache.spark.scheduler.Task.run(Task.scala:121)
   	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] KarthickAN commented on issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

Posted by GitBox <gi...@apache.org>.
KarthickAN commented on issue #2144:
URL: https://github.com/apache/hudi/issues/2144#issuecomment-703473193


   Although the error message was misleading, this is not an issue with Hudi. The actual data had null values in the specified column, and that caused the error.
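
A minimal sketch of a pre-write check for the null values described above. It works on plain Python dicts rather than on Spark or Hudi APIs, so it only illustrates the idea; in a real job the same check would be applied to the DataFrame before the write. The names `find_null_fields`, `REQUIRED_FIELDS` and `sample` are hypothetical, and the field list is an assumption taken from the record key, precombine and partition path options in the report.

```python
from typing import Any, Dict, Iterable, List

# Assumption: the fields referenced by the write options in the report
# (record key, precombine and partition path fields).
REQUIRED_FIELDS = ["sourceid", "sourceassetid", "sourceeventid", "value", "timestamp", "date"]


def find_null_fields(records: Iterable[Dict[str, Any]], required: List[str]) -> Dict[str, int]:
    """Count, per required field, how many records hold None or lack the key."""
    counts = {name: 0 for name in required}
    for record in records:
        for name in required:
            if record.get(name) is None:
                counts[name] += 1
    # Report only the fields that actually have a problem.
    return {name: n for name, n in counts.items() if n > 0}


# Made-up sample rows: the second one has a null timestamp.
sample = [
    {"sourceid": "s1", "sourceassetid": "a1", "sourceeventid": "e1",
     "value": 1, "timestamp": 1601872594.0, "date": "2020-10-05"},
    {"sourceid": "s1", "sourceassetid": "a2", "sourceeventid": "e2",
     "value": 2, "timestamp": None, "date": "2020-10-05"},
]

print(find_null_fields(sample, REQUIRED_FIELDS))  # → {'timestamp': 1}
```

Running a check like this before the write shows which configured field is null in the data, which the "field not found in record" message does not make obvious.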





[GitHub] [hudi] KarthickAN closed issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

Posted by GitBox <gi...@apache.org>.
KarthickAN closed issue #2144:
URL: https://github.com/apache/hudi/issues/2144


   

