Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/03/02 16:37:00 UTC
[jira] [Closed] (HUDI-3469) Refactor HoodieTestDataGenerator to enable reproducible builds
[ https://issues.apache.org/jira/browse/HUDI-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu closed HUDI-3469.
----------------------------
Resolution: Done
> Refactor HoodieTestDataGenerator to enable reproducible builds
> --------------------------------------------------------------
>
> Key: HUDI-3469
> URL: https://issues.apache.org/jira/browse/HUDI-3469
> Project: Apache Hudi
> Issue Type: Bug
> Components: tests-ci
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Currently, `HoodieTestDataGenerator` relies on static state, which is shared across all of the tests and makes data generation dependent on the order of test execution.
>
> Instead, we should properly abstract `HoodieTestDataGenerator` to hold all of its state within individual instances, so that individual tests can:
> 1. Create their own isolated instance (which won't be affected by other tests)
> 2. Pass a "seed" value to the DataGenerator to initialize its PRNG, so that it always produces the same (pseudo-)random sequence for a given seed
> 3. Be certain that all of the data produced by the DataGenerator is 100% reproducible with the same seed (meaning that all of its operations rely only on the internal PRNG and not on any external sources, such as `UUID.randomUUID()`, `System.currentTimeMillis()`, etc.)
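
The three points above could be sketched roughly as follows. This is only an illustration of the idea, not the actual `HoodieTestDataGenerator` code: the class `SeededRecordGenerator`, its methods, and the comma-separated record layout are all hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

// Hypothetical sketch; not the real HoodieTestDataGenerator API.
final class SeededRecordGenerator {
    // All state lives in the instance; nothing is static, so tests
    // cannot influence each other through this class.
    private final Random random;
    private long logicalClock = 0L;

    SeededRecordGenerator(long seed) {
        // java.util.Random yields the same sequence for the same seed.
        this.random = new Random(seed);
    }

    // Builds a UUID-shaped key from the instance PRNG instead of calling
    // UUID.randomUUID(). The version/variant bits are not set, which is
    // assumed to be acceptable for test data.
    String nextKey() {
        return new UUID(random.nextLong(), random.nextLong()).toString();
    }

    // Deterministic stand-in for System.currentTimeMillis(): a strictly
    // increasing logical clock advanced by PRNG-chosen steps.
    long nextTimestamp() {
        logicalClock += 1 + random.nextInt(1000);
        return logicalClock;
    }

    // Produces n records as "key,timestamp,value" strings.
    List<String> generateRecords(int n) {
        List<String> records = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            records.add(nextKey() + "," + nextTimestamp() + "," + random.nextInt(100));
        }
        return records;
    }
}
```

With a design along these lines, two instances created with the same seed return equal lists from `generateRecords`, regardless of which other tests ran before or what other instances were used in between.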
--
This message was sent by Atlassian Jira
(v8.20.1#820001)