Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/30 22:38:21 UTC

[GitHub] [hudi] WaterKnight1998 opened a new issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

WaterKnight1998 opened a new issue #1777:
URL: https://github.com/apache/hudi/issues/1777


   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   I was trying to store a dataframe in Hudi in the following way:
   
   ```
   tableName = "forecasts"
   basePath = "gs://hudi-datalake/" + tableName
   
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'uuid',
     'hoodie.datasource.write.partitionpath.field': 'partitionpath',
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'ts',
     'hoodie.upsert.shuffle.parallelism': 2, 
     'hoodie.insert.shuffle.parallelism': 2
   }
   
   results = results.selectExpr(
       "ds as date",
       "store",
       "item",
       "y as sales",
       "yhat as sales_predicted",
       "yhat_upper as sales_predicted_upper",
       "yhat_lower as sales_predicted_lower",
       "training_date")
   
   
   results.write.format("hudi"). \
     options(**hudi_options). \
     mode("overwrite"). \
     save(basePath)
   ```
   
   I got this error:
   ```
   Py4JJavaError: An error occurred while calling o207.save.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 4 times, most recent failure: Lost task 0.3 in stage 24.0 (TID 1633, 10.20.0.11, executor 1): org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[date, store, item, sales, sales_predicted, sales_predicted_upper, sales_predicted_lower, training_date]
   	at org.apache.hudi.DataSourceUtils.getNestedFieldVal(DataSourceUtils.java:100)
   	at org.apache.hudi.DataSourceUtils.getNestedFieldValAsString(DataSourceUtils.java:64)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:108)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:107)
   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
   	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
   	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
   	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
   	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
   	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:123)
   	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
   	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
   	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
   	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
   	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
   	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
   	at scala.Option.foreach(Option.scala:257)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1409)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.RDD.take(RDD.scala:1382)
   	at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1517)
   	at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
   	at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
   	at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1516)
   	at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
   	at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:146)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[date, store, item, sales, sales_predicted, sales_predicted_upper, sales_predicted_lower, training_date]
   	at org.apache.hudi.DataSourceUtils.getNestedFieldVal(DataSourceUtils.java:100)
   	at org.apache.hudi.DataSourceUtils.getNestedFieldValAsString(DataSourceUtils.java:64)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:108)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:107)
   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
   	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
   	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
   	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
   	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
   	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:123)
   	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more
   ```
   
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version : 2.4.5
   
   * Hive version :
   
   * Hadoop version : 2.7.7
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : yes
   
   
   


----------------------------------------------------------------



[GitHub] [hudi] bhasudha commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-653726518


   How big is your `results` dataframe? You don't have to use CONCAT to combine multiple columns into a record key. You could simply pass them as comma-separated columns and set the config `hoodie.datasource.write.keygenerator.class` to `org.apache.hudi.keygen.ComplexKeyGenerator` (see the sketch below). Other questions:
   
   1. Are the configs above all that you have passed, using defaults for the rest?
   2. Could you also share your Spark UI to help debug further?
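   
   For illustration, a minimal sketch of that suggestion; the key columns and the precombine field below are assumptions taken from the snippets elsewhere in this thread, not a verified configuration:
   ```
   # Hypothetical sketch: composite record key via ComplexKeyGenerator instead of CONCAT
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'training_date',
     # comma-separated columns form the composite key
     'hoodie.datasource.write.recordkey.field': 'store,item',
     'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'
   }
   ```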
   


----------------------------------------------------------------



[GitHub] [hudi] bvaradar closed issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1777:
URL: https://github.com/apache/hudi/issues/1777


   


----------------------------------------------------------------



[GitHub] [hudi] bvaradar commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-658791168


   Thanks @WaterKnight1998: looks like this is resolved. Please open a new ticket if you run into any other issues.


----------------------------------------------------------------



[GitHub] [hudi] bhasudha commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-652590917


   Ah okay, I think these are the default values for the configs. You would need to configure each of them based on your table schema. Here is the configuration section that explains these configs: https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY
   https://hudi.apache.org/docs/configurations.html#RECORDKEY_FIELD_OPT_KEY
   https://hudi.apache.org/docs/configurations.html#PARTITIONPATH_FIELD_OPT_KEY
   
   I can help with these configs. You could choose a combination of `date,store,item` for the record key to ensure uniqueness.
   For the precombine key, you need to choose a field that determines which is the latest record among two records with the same record key.
   For the partition path, you need to choose how to group your data. Here it could be just date, or a combination of date and store, and so on. This determines how your table data is partitioned. If you are interested in sales on a daily basis, a date-based partition would probably be good.
   
   Please let me know if you have more questions.
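   
   As a concrete, hypothetical sketch of those three choices against the reporter's schema (column names taken from the thread; none of this was verified on the reporter's setup):
   ```
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     # record key: a column combination that is unique per row
     'hoodie.datasource.write.recordkey.field': 'date,store,item',
     # needed so a comma-separated record key is treated as a composite key
     'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
     # precombine: decides which record wins when two share the same key
     'hoodie.datasource.write.precombine.field': 'training_date',
     # partition path: how the table data is grouped on storage
     'hoodie.datasource.write.partitionpath.field': 'date'
   }
   ```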
   


----------------------------------------------------------------



[GitHub] [hudi] bhasudha commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-652189984


   @WaterKnight1998 It looks like this field `ts` is not there in the record. Could you print the table schema? Also, which version of Hudi are you using?
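   
   (For reference, printing the schema of a PySpark dataframe is a one-liner, assuming the `results` dataframe from the original snippet:)
   ```
   # prints column names and types to stdout
   results.printSchema()
   ```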


----------------------------------------------------------------



[GitHub] [hudi] WaterKnight1998 edited a comment on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
WaterKnight1998 edited a comment on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-652446917


   > @WaterKnight1998 It looks like this field `ts` is not there in the record. Could you print the table schema? Also, which version of Hudi are you using?
   
   @bhasudha here it is. I don't understand very well how to store a normal dataframe in Hudi, and I don't fully understand the Hudi configs, like the record key and precombine field.
   ```
   root
    |-- date: date (nullable = true)
    |-- store: integer (nullable = true)
    |-- item: integer (nullable = true)
    |-- sales: float (nullable = true)
    |-- sales_predicted: float (nullable = true)
    |-- sales_predicted_upper: float (nullable = true)
    |-- sales_predicted_lower: float (nullable = true)
    |-- training_date: date (nullable = false)
   ```
   
   My data does not have a primary key. However, I could combine date, store, and item to get one!


----------------------------------------------------------------



[GitHub] [hudi] WaterKnight1998 commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
WaterKnight1998 commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-652597165


   > Ah okay, I think these are the default values for the configs. You would need to configure each of them based on your table schema. Here is the configuration section that explains these configs: https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY
   > https://hudi.apache.org/docs/configurations.html#RECORDKEY_FIELD_OPT_KEY
   > https://hudi.apache.org/docs/configurations.html#PARTITIONPATH_FIELD_OPT_KEY
   > 
   > I can help with these configs. You could choose a combination of `date,store,item` for the record key to ensure uniqueness.
   > For the precombine key, you need to choose a field that determines which is the latest record among two records with the same record key.
   > For the partition path, you need to choose how to group your data. Here it could be just date, or a combination of date and store, and so on. This determines how your table data is partitioned. If you are interested in sales on a daily basis, a date-based partition would probably be good.
   > 
   > Please let me know if you have more questions.
   
   I made it work as follows:
   ```
   tableName = "forecast_evals"
   basePath = "gs://hudi-datalake/" + tableName
   
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'key',
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'training_date'
   }
   
   results = results.selectExpr(
                       "CONCAT('Store=',  store, ' Item=', item) as key",
                       "store",
                       "item",
                       "mae",
                       "mse",
                       "rmse",
                       "training_date")
   
   results.write.format("hudi"). \
     options(**hudi_options). \
     mode("overwrite"). \
     save(basePath)
   ```
   
   However, it runs very slowly!
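   
   (A hedged aside: the first snippet in this thread also set the write shuffle parallelism explicitly, which can affect write runtime; the values below are the illustrative ones from that snippet, not a tuning recommendation.)
   ```
   hudi_options['hoodie.insert.shuffle.parallelism'] = 2
   hudi_options['hoodie.upsert.shuffle.parallelism'] = 2
   ```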


----------------------------------------------------------------



[GitHub] [hudi] WaterKnight1998 commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
WaterKnight1998 commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-657467137


   > @WaterKnight1998 Were you able to resolve this?
   
   1. Yes, the configs above are all I passed.
   
   With the code snippet above it worked. I didn't try the `ComplexKeyGenerator`, since my solution worked, but it looks like a more solid approach.


----------------------------------------------------------------



[GitHub] [hudi] bhasudha commented on issue #1777: [SUPPORT] org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-657285023


   @WaterKnight1998 Were you able to resolve this?


----------------------------------------------------------------


