Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/09/09 17:28:00 UTC

[jira] [Commented] (HUDI-2405) HoodieTest tables enhancement

    [ https://issues.apache.org/jira/browse/HUDI-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412730#comment-17412730 ] 

sivabalan narayanan commented on HUDI-2405:
-------------------------------------------

[~rxu]: 

I checked the design proposal. It definitely looks good and is the way we want to go.

A couple of comments:

1. I feel APIs like with10Records3PartitionsAsCommits() are a tad bit rigid. I am OK having these APIs, but we should also have APIs that let users dictate the partitions and just pass a count of files. Something like testTable.insert(commitInstant, operationType, list of new partitions to add, list of partitions to insert/update, files to be added per partition). I feel this will be very useful for writing tests around specific partition-level operations like updates, insert_overwrite, etc.
2. Not sure if it's implicit in the attached doc, but I would like to ensure we have two different sets of APIs: one set just for metadata management with empty files, and another set that operates on actual records.
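To make point 1 concrete, here is a minimal sketch of the caller-driven insert API, limited to the metadata-only API set from point 2 (empty files, no records). Everything here is hypothetical: SketchTestTable, OperationType and the file-name format are illustrative names, not existing Hudi test utilities, and empty files are tracked as names in memory rather than created on storage. Updates are omitted because they would need file-group/file-slice modeling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical metadata-only test table; none of these names are Hudi APIs. */
class SketchTestTable {

  /** Hypothetical subset of write operation types. */
  enum OperationType { INSERT, INSERT_OVERWRITE }

  private final Set<String> partitions = new LinkedHashSet<>();
  // partition -> empty-file names currently visible in that partition
  private final Map<String, List<String>> filesByPartition = new LinkedHashMap<>();
  // commit instant -> (partition -> files written by that commit)
  private final Map<String, Map<String, List<String>>> commitMetadata = new LinkedHashMap<>();

  /** Caller dictates new partitions, partitions to write, and files per written partition. */
  Map<String, List<String>> insert(String commitInstant, OperationType op,
      List<String> newPartitions, List<String> partitionsToWrite, int filesPerPartition) {
    if (commitMetadata.containsKey(commitInstant)) {
      throw new IllegalArgumentException("Duplicate commit instant: " + commitInstant);
    }
    for (String p : newPartitions) {
      partitions.add(p);
      filesByPartition.putIfAbsent(p, new ArrayList<>());
    }
    Map<String, List<String>> written = new LinkedHashMap<>();
    for (String p : partitionsToWrite) {
      if (!partitions.contains(p)) {
        throw new IllegalArgumentException("Unknown partition: " + p);
      }
      List<String> newFiles = new ArrayList<>();
      for (int i = 0; i < filesPerPartition; i++) {
        // Illustrative empty-file name; not Hudi's real file naming scheme.
        newFiles.add(UUID.randomUUID() + "_" + commitInstant + ".parquet");
      }
      if (op == OperationType.INSERT_OVERWRITE) {
        // Replace semantics: older files drop out of the latest view.
        filesByPartition.get(p).clear();
      }
      filesByPartition.get(p).addAll(newFiles);
      written.put(p, newFiles);
    }
    commitMetadata.put(commitInstant, written);
    return written;
  }

  List<String> listFiles(String partition) {
    return filesByPartition.getOrDefault(partition, List.of());
  }

  Map<String, List<String>> getCommitMetadata(String commitInstant) {
    return commitMetadata.get(commitInstant);
  }
}
```

Keeping per-commit metadata separate from the file view is what would let a test compare the table listing against commit metadata (or a metadata table synced from it). The record-level API set from point 2 would sit beside this, not inside it.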

> HoodieTest tables enhancement
> -----------------------------
>
>                 Key: HUDI-2405
>                 URL: https://issues.apache.org/jira/browse/HUDI-2405
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: Raymond Xu
>            Priority: Major
>
> [WIP design doc|https://lucid.app/publicSegments/view/563d5afe-919a-4d3b-8933-bb764a89f512/image.jpeg]
>  
>  
>  
>  * Objective: test the metadata table for file and timeline integrity.
>  ** Manipulate commits and their state transitions; empty files should suffice. Support syncing to the metadata table. Commit metadata is the crux here.
>  *** - Commit/DeltaCommit
>  *** - Compaction
>  *** - Cleaning
>  *** - ReplaceCommit/Clustering
>  *** - Savepoint/delete savepoint/restore savepoint
>  *** - Rollback
>  *** - Restore
>  ** We will list using this test table and verify data integrity. 
>  
> Also, enhance to support actual records.
> Objective: test the whole of Hoodie for data integrity. Record-to-file-location mappings are user defined or test driven.
>  * Updates and deletes: should we let callers pass in HoodieRecords with proper file locations and write them directly?
>  * Should work for inserts, upserts, deletes, compaction, clustering, and rollback.
>  * How would the cleaner plan and compaction plan pan out?
>  * Can we maintain in-memory state and simulate updates, etc.? It's not distributed anyway; we are just testing functionality.
>  
> Document what we miss testing in the actual code path if we start using these test tables for testing.
>  * e.g., index.
>  * partitioner. 
>  * write handles (create, append, merge). 
>  * ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)