Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/22 08:04:11 UTC

[GitHub] [hudi] LinMingQiang opened a new issue, #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

LinMingQiang opened a new issue, #5934:
URL: https://github.com/apache/hudi/issues/5934



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] LinMingQiang commented on issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

Posted by GitBox <gi...@apache.org>.
LinMingQiang commented on issue #5934:
URL: https://github.com/apache/hudi/issues/5934#issuecomment-1181594993

   https://github.com/apache/hudi/pull/5937




[GitHub] [hudi] nsivabalan commented on issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5934:
URL: https://github.com/apache/hudi/issues/5934#issuecomment-1163895715

   By default, Hudi uses `OverwriteWithLatestAvroPayload`, which does not honor the precombine field in all code paths, specifically when records in the base file and records in the log files are merged together. You can try using `DefaultHoodieRecordPayload` to achieve this:
   https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritepayloadclass
   https://hudi.apache.org/docs/configurations/#writepayloadclass
   https://hudi.apache.org/docs/configurations/#hoodiecompactionpayloadclass
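   A minimal, self-contained Java sketch of the two merge behaviors described above, for a record whose newer version is already in the base file while an older event lands in a log file. It does not use Hudi's API: `MergeSemanticsSketch`, `Row`, `overwriteWithLatest`, and `keepLargerOrdering` are hypothetical names, and the rule "the larger ordering value wins, ties go to the log record" is an assumption made only to illustrate the idea.

   ```java
   public class MergeSemanticsSketch {

       // Hypothetical, simplified model of one version of a record: the record
       // key, the precombine (ordering) field value, and one data column.
       record Row(String key, long ts, String value) {}

       // "Overwrite with latest": the version read from the log file always
       // replaces the version stored in the base file, whatever its ts.
       static Row overwriteWithLatest(Row baseFileRow, Row logFileRow) {
           return logFileRow;
       }

       // Ordering-aware merge (assumed rule): keep the base-file version only if
       // its ts is strictly larger; otherwise take the log-file version.
       static Row keepLargerOrdering(Row baseFileRow, Row logFileRow) {
           return baseFileRow.ts() > logFileRow.ts() ? baseFileRow : logFileRow;
       }

       public static void main(String[] args) {
           // Out-of-order write: the log file holds an older event (ts=5) than
           // the version already in the base file (ts=10).
           Row base = new Row("id-1", 10L, "new");
           Row log = new Row("id-1", 5L, "stale");

           System.out.println(overwriteWithLatest(base, log).value()); // stale
           System.out.println(keepLargerOrdering(base, log).value());  // new
       }
   }
   ```

   The first strategy returns the stale row even though its precombine value is smaller, which is the symptom reported in this issue.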
   




[GitHub] [hudi] minihippo commented on issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

Posted by GitBox <gi...@apache.org>.
minihippo commented on issue #5934:
URL: https://github.com/apache/hudi/issues/5934#issuecomment-1181587252

   Changing the payload to `DefaultHoodieRecordPayload` may solve your problem. It matches the semantics you want: among the versions of a record, it keeps the one whose precombine field value is larger.
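   A snapshot read may see several versions of the same key: one from the base file, then one per log block. A hypothetical helper can fold them, in write order, down to the version with the largest precombine value. `SnapshotDedupSketch`, `Version`, and `latestByPrecombine` are illustrative names rather than Hudi classes, and letting the later-written version win a tie is an assumption of this sketch.

   ```java
   import java.util.List;

   public class SnapshotDedupSketch {

       // Hypothetical record version: key, precombine value, one data column.
       record Version(String key, long ts, String value) {}

       // Fold all versions of one key, given in write order (base file first,
       // then log blocks), down to the one with the largest precombine value.
       // A later version replaces the current winner on a tie (assumed rule).
       // Returns null for an empty list.
       static Version latestByPrecombine(List<Version> versionsInWriteOrder) {
           Version winner = null;
           for (Version v : versionsInWriteOrder) {
               if (winner == null || v.ts() >= winner.ts()) {
                   winner = v;
               }
           }
           return winner;
       }

       public static void main(String[] args) {
           List<Version> versions = List.of(
               new Version("id-1", 10L, "base"),   // base file
               new Version("id-1", 7L, "log-1"),   // older event in a log block
               new Version("id-1", 10L, "log-2")); // ties with the base file
           System.out.println(latestByPrecombine(versions).value()); // log-2
       }
   }
   ```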




[GitHub] [hudi] LinMingQiang closed issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

Posted by GitBox <gi...@apache.org>.
LinMingQiang closed issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.
URL: https://github.com/apache/hudi/issues/5934




[GitHub] [hudi] LinMingQiang commented on issue #5934: When reading the MOR table with `QUERY_TYPE_SNAPSHOT`, unable to correctly sort and deduplicate data by `PRECOMBINE_FIELD`.

Posted by GitBox <gi...@apache.org>.
LinMingQiang commented on issue #5934:
URL: https://github.com/apache/hudi/issues/5934#issuecomment-1163918227

   By default, Flink SQL uses `EventTimeAvroPayload`. However, `MergeIterator.mergeRowWithLog` calls `record.getData().combineAndGetUpdateValue(historyAvroRecord, tableSchema)` instead of `record.getData().combineAndGetUpdateValue(historyAvroRecord, tableSchema, payloadConf)`, so the method that actually runs is `OverwriteWithLatestAvroPayload.combineAndGetUpdateValue(IndexedRecord, Schema)`, which ignores the precombine field.
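   The overload problem described above can be reproduced in plain Java. In this sketch, `OverwriteLatest` and `OrderingAware` are hypothetical stand-ins for the payload hierarchy, not the real Hudi classes, and `merge` returns the ordering value of the winning record to keep things small. The subclass overrides only the three-argument `merge`, so a caller that uses the two-argument form still gets the base-class, ordering-blind behavior.

   ```java
   import java.util.Properties;

   public class OverloadDispatchSketch {

       // Hypothetical base payload: ordering-blind merge.
       static class OverwriteLatest {
           final long orderingVal;

           OverwriteLatest(long orderingVal) {
               this.orderingVal = orderingVal;
           }

           // Two-argument merge: ignores the stored ordering value and always
           // returns the incoming record's ordering value.
           long merge(long storedOrderingVal, String schema) {
               return orderingVal;
           }

           // Three-argument merge: by default just delegates to the two-argument form.
           long merge(long storedOrderingVal, String schema, Properties props) {
               return merge(storedOrderingVal, schema);
           }
       }

       // Hypothetical ordering-aware payload: overrides ONLY the three-argument merge.
       static class OrderingAware extends OverwriteLatest {
           OrderingAware(long orderingVal) {
               super(orderingVal);
           }

           @Override
           long merge(long storedOrderingVal, String schema, Properties props) {
               return Math.max(storedOrderingVal, orderingVal);
           }
       }

       public static void main(String[] args) {
           OverwriteLatest incoming = new OrderingAware(5L);
           // Two-argument call: not overridden, so the base-class logic runs.
           System.out.println(incoming.merge(10L, "schema"));                   // 5
           // Three-argument call: dispatches to the ordering-aware override.
           System.out.println(incoming.merge(10L, "schema", new Properties())); // 10
       }
   }
   ```

   Passing the config at the call site, as the comment above suggests, is what selects the ordering-aware overload.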
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org