Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2020/06/09 02:37:00 UTC

[jira] [Comment Edited] (HUDI-781) Re-design test utilities

    [ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128778#comment-17128778 ] 

Raymond Xu edited comment on HUDI-781 at 6/9/20, 2:36 AM:
----------------------------------------------------------

[~yanghua] [~vinoth] [~nishith29] [~garyli1019]

Here is an execution plan for the subtasks:
 * To begin with, I'm trying to finish subtask #1 as it can be a quick win. As shown in [https://github.com/apache/hudi/pull/1619#issuecomment-627610722], we can reduce CI time by 10+ min simply by splitting the test tasks
 * In parallel we can start #3. The proposed `hudi-testutils` module is meant to encompass all `testutils` from each module, which makes the test dependencies clearer. It will also clean up some misplaced tests found during the package restructuring.
 ** org.apache.hudi.execution.TestBoundedInMemoryQueue in `hudi-client` should be put in `hudi-common` (due to client test harness dependency)
 ** org.apache.hudi.utilities.inline.fs.TestParquetInLining in `hudi-utilities` should be put in `hudi-common` (due to data generator dependency)
 * Once a minimum setup of `hudi-testutils` is done, we can start #4
 ** Implement a shared spark session provider there
 ** Use the shared spark session provider for test suites, which group functional tests with similar setup/teardown logic (may need to figure out the JUnit 5 equivalent of JUnit 4 test suites with Rule / ClassRule)
 ** By switching functional tests over to the new provider class one by one, we should start observing reduced test time in the `hudi-client` module and others
 * #2 and #5 can be done in parallel
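
To illustrate the shared provider idea in #4, here is a minimal sketch. It is not Hudi or Spark code: `FakeSession` is a hypothetical stand-in for an expensive resource such as a Spark session, and `SharedSessionProvider`, `get()` and `shutdown()` are names made up for this example. It only shows the intended behavior, where the session is built once on first use and reused by every test in the group.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive resource such as a Spark session.
// This is NOT a Spark or Hudi class; it only counts how often it is built.
final class FakeSession implements AutoCloseable {
    private static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean closed = false;

    FakeSession() {
        CREATED.incrementAndGet(); // record each construction
    }

    static int createdCount() {
        return CREATED.get();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical provider: creates the session lazily and hands the same
// instance to every caller until shutdown() is invoked.
final class SharedSessionProvider {
    private static FakeSession session;

    private SharedSessionProvider() {
    }

    static synchronized FakeSession get() {
        if (session == null || session.isClosed()) {
            session = new FakeSession();
        }
        return session;
    }

    // Intended to be called once, after all tests in the group have run.
    static synchronized void shutdown() {
        if (session != null) {
            session.close();
            session = null;
        }
    }
}

class SharedSessionDemo {
    public static void main(String[] args) {
        FakeSession first = SharedSessionProvider.get();
        FakeSession second = SharedSessionProvider.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("sessions created: " + FakeSession.createdCount());
        SharedSessionProvider.shutdown();
    }
}
```

In a real test suite, the suite-level setup/teardown hooks would presumably call something like `get()` and `shutdown()`, so individual functional tests no longer pay the session startup cost each time.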

Each subtask has its own detailed points in its ticket. Please review this rough plan and give feedback accordingly. Thanks!



> Re-design test utilities
> ------------------------
>
>                 Key: HUDI-781
>                 URL: https://issues.apache.org/jira/browse/HUDI-781
>             Project: Apache Hudi
>          Issue Type: Test
>          Components: Testing
>            Reporter: Raymond Xu
>            Priority: Major
>
> Test utility classes are to be re-designed with considerations like
>  * Use more mocking
>  * Reduce spark context setup
>  * Improve/clean up data generator
> An RFC would be preferred for illustrating the design work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)