You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/18 02:19:11 UTC

[GitHub] [hudi] xushiyan opened a new pull request #1746: [HUDI-996] Add SharedResources

xushiyan opened a new pull request #1746:
URL: https://github.com/apache/hudi/pull/1746


   - Add suite for selected utilities functional tests
   - Share resources for functional tests
   - Run suite in a different maven profile
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443170684



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService();
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();
+      dfs.mkdirs(dfs.getWorkingDirectory());
+    }
+  }
+
+  @AfterAll
+  public static synchronized void cleanUpAfterAll() throws IOException {

Review comment:
       Tried to close them all in `org.apache.hudi.utilities.functional.FunctionalTestSuite#afterAll` method hook but couldn't trigger the method...not sure why. As those services are only to shut down when all suite test classes finish, i could at least close them in a jvm shutdown hook.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646396974


   > FYI: The total travis test time has increased by 5 min which seem to be the time for newly added functional test suite. I dont know enough about surefire so cannot verify but it seems the functional tests are also being executed as part of the unit tests?
   
   Thank you for checking @prashantwason ! The overall test time fluctuates with a few min difference. I would pay attention to the module test time printed at the end of the tests.
   
   The last 2 success runs ended up with ~9 min for `hudi-utilities`
   
   ```
   [INFO] hudi-utilities_2.11 ................................ SUCCESS [09:13 min]
   [INFO] hudi-utilities_2.11 ................................ SUCCESS [09:07 min]
   ```
   
    The [last master build](https://travis-ci.org/github/apache/hudi/jobs/699696974) used almost 11 min for `hudi-utilities`
   
   ```
   [INFO] hudi-utilities_2.11 ................................ SUCCESS [10:50 min]
   ```
   
   Also did a text search from the travis logs that the tests included in the functional suite are not found, so they are excluded. By separating a few tests to the functional suite, we started to save time from unit tests execution. Incrementally we'll save more. :) 
   
   cc @n3nash @vinothchandar


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r449931988



##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       Its a little confusing.. but okay to be fixed later.

##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       okay `initialized = spark != null && hdfsTestService != null;` is what makes the spark and hdfsTestService singleton across a run.. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-647138537


   > > @xushiyan codecov seems happy?
   > 
   > @vinothchandar When we split some tests from unit tests to another travis job, we lost some coverage reporting in the final report, due to overwriting report data. We were discussing this in [#1619 (comment)](https://github.com/apache/hudi/pull/1619#issuecomment-628260195)
   > 
   > I just opened #1753 to verify the solution for it. I'll address your comments soon after that.
   
   @vinothchandar The solution of report coverage by module works; the CI time is reduced by 12min thanks to the split. Could you please review #1753 and try to merge it first? I'll then rebase this one. Meanwhile, the comments were addressed in 443fd2a, which you could review as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646380011


   FYI: The total travis test time has increased by 5 min which seem to be the time for newly added functional test suite.  I dont know enough about surefire so cannot verify but it seems the functional tests are also being executed as part of the unit tests?
   
   ![before](https://user-images.githubusercontent.com/58448203/85085829-d6c59700-b18d-11ea-94d2-4c7da5605aea.png)
   ![after](https://user-images.githubusercontent.com/58448203/85085831-d927f100-b18d-11ea-91c7-ec892029979c.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r449815176



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/UtilitiesFunctionalTestSuite.java
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.utilities.functional")
+@IncludeTags("functional")
+public class UtilitiesFunctionalTestSuite {

Review comment:
       This class is to like an entry point to all functional test classes annotated with `@Tag("functional")` in this module..so I would instead optionally make it `final`. It'll be run by surefire which looks for this pattern in each module.
   https://github.com/apache/hudi/blob/574dcf920c4677513f6fdc8d441bb42827afa5a2/pom.xml#L1026-L1028
   The relevant maven profile settings were merged by the codecov improvement PR..sorry it is hard to correlate here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-commenter commented on pull request #1746: [HUDI-996] Add functional test suite

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646906318


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=h1) Report
   > Merging [#1746](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/8a9fdd603e3e532ea5252b98205acfb8aa648795&el=desc) will **decrease** coverage by `0.64%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1746/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1746      +/-   ##
   ============================================
   - Coverage     18.14%   17.49%   -0.65%     
   - Complexity      859      907      +48     
   ============================================
     Files           352      398      +46     
     Lines         15413    17004    +1591     
     Branches       1526     1691     +165     
   ============================================
   + Hits           2796     2975     +179     
   - Misses        12259    13658    +1399     
   - Partials        358      371      +13     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...g/apache/hudi/utilities/sources/AvroDFSSource.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb0RGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...apache/hudi/utilities/sources/AvroKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb0thZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [.../hudi/utilities/schema/RowBasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9Sb3dCYXNlZFNjaGVtYVByb3ZpZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...hudi/utilities/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9GaWxlYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | [...s/exception/HoodieIncrementalPullSQLException.java](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVJbmNyZW1lbnRhbFB1bGxTUUxFeGNlcHRpb24uamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | ... and [36 more](https://codecov.io/gh/apache/hudi/pull/1746/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=footer). Last update [8a9fdd6...6dd8cc7](https://codecov.io/gh/apache/hudi/pull/1746?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646399348


   Awesome. Thanks for confirming Raymond.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442576261



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService(sharedTempDir);
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();

Review comment:
       @prashantwason Please note that this `@BeforeEach` is guarded by `null` check on the static vars, and the test classes are run using `FunctionalTestSuite`. So for many test classes, regardless of using hdfsTestService or not, the shared resources are only init'ed once within the same test suite, which I'm thinking we can have 1 for each module.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-647062786


   > @xushiyan codecov seems happy?
   
   @vinothchandar When we split some tests from unit tests to another travis job, we lost some coverage reporting in the final report, due to overwriting report data. We were discussing this in https://github.com/apache/hudi/pull/1619#issuecomment-628260195
   
   I just opened https://github.com/apache/hudi/pull/1753 to verify the solution for it. I'll address your comments soon after that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add SharedResources

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r441931628



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieSnapshotCopier.java
##########
@@ -147,14 +142,4 @@ public void testSnapshotCopy() throws Exception {
 
     assertTrue(fs.exists(new Path(outputPath + "/_SUCCESS")));
   }
-
-  @AfterEach
-  public void cleanup() {
-    if (rootPath != null) {
-      new File(rootPath).delete();
-    }
-    if (jsc != null) {
-      jsc.stop();
-    }
-  }

Review comment:
       `@TempDir java.nio.file.Path tempDir` will clean up itself automatically

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestChainedTransformer.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.functional;
+
+import org.apache.hudi.utilities.testutils.SharedResources;
+import org.apache.hudi.utilities.transform.ChainedTransformer;
+import org.apache.hudi.utilities.transform.Transformer;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+import static org.apache.spark.sql.types.DataTypes.IntegerType;
+import static org.apache.spark.sql.types.DataTypes.StringType;
+import static org.apache.spark.sql.types.DataTypes.createStructField;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+@Tag("suite")
+public class TestChainedTransformer extends SharedResources {

Review comment:
       functional test separated from its unit test class

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/transform/TestChainedTransformer.java
##########
@@ -19,67 +19,15 @@
 
 package org.apache.hudi.utilities.transform;
 
-import org.apache.hudi.utilities.UtilHelpers;
-
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.types.DataTypes;
-import org.apache.spark.sql.types.StructField;
-import org.apache.spark.sql.types.StructType;
-import org.junit.jupiter.api.AfterEach;
-import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 
 import java.util.Arrays;
 import java.util.List;
 
 import static org.apache.spark.sql.types.DataTypes.IntegerType;
-import static org.apache.spark.sql.types.DataTypes.StringType;
-import static org.apache.spark.sql.types.DataTypes.createStructField;
-import static org.junit.jupiter.api.Assertions.assertArrayEquals;
 import static org.junit.jupiter.api.Assertions.assertEquals;
 
 public class TestChainedTransformer {
-
-  private JavaSparkContext jsc;
-  private SparkSession sparkSession;
-
-  @BeforeEach
-  public void setUp() {
-    jsc = UtilHelpers.buildSparkContext(this.getClass().getName() + "-hoodie", "local[2]");
-    sparkSession = SparkSession.builder().config(jsc.getConf()).getOrCreate();
-  }
-
-  @AfterEach
-  public void tearDown() {
-    jsc.stop();
-  }
-
-  @Test
-  public void testChainedTransformation() {

Review comment:
       moved to functional

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.utilities.HoodieSnapshotExporter.OutputFormatValidator;
+
+import com.beust.jcommander.ParameterException;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.NullSource;
+import org.junit.jupiter.params.provider.ValueSource;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+public class TestHoodieSnapshotExporter {

Review comment:
       unit test separated from its functional test class

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/checkpointing/TestKafkaConnectHdfsProvider.java
##########
@@ -19,37 +19,39 @@
 package org.apache.hudi.utilities.checkpointing;
 
 import org.apache.hudi.common.config.TypedProperties;
-import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
 import org.apache.hudi.common.testutils.HoodieTestUtils;
 import org.apache.hudi.exception.HoodieException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
 
 import java.io.File;
+import java.nio.file.Files;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 
-public class TestKafkaConnectHdfsProvider extends HoodieCommonTestHarness {
+public class TestKafkaConnectHdfsProvider {

Review comment:
       This can be a unit test; getting rid of CommonTestHarness

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieSnapshotExporter.java
##########
@@ -258,49 +253,21 @@ public void setUp() throws Exception {
     public void testExportWithPartitionField() throws IOException {
       // `driver` field is set in HoodieTestDataGenerator
       cfg.outputPartitionField = "driver";
-      new HoodieSnapshotExporter().export(jsc, cfg);
+      new HoodieSnapshotExporter().export(jsc(), cfg);
 
-      assertEquals(NUM_RECORDS, sqlContext.read().format("json").load(targetPath).count());
-      assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
-      assertTrue(dfs.listStatus(new Path(targetPath)).length > 1);
+      assertEquals(NUM_RECORDS, sqlContext().read().format("json").load(targetPath).count());
+      assertTrue(dfs().exists(new Path(targetPath + "/_SUCCESS")));
+      assertTrue(dfs().listStatus(new Path(targetPath)).length > 1);
     }
 
     @Test
     public void testExportForUserDefinedPartitioner() throws IOException {
       cfg.outputPartitioner = UserDefinedPartitioner.class.getName();
-      new HoodieSnapshotExporter().export(jsc, cfg);
-
-      assertEquals(NUM_RECORDS, sqlContext.read().format("json").load(targetPath).count());
-      assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
-      assertTrue(dfs.exists(new Path(String.format("%s/%s=%s", targetPath, UserDefinedPartitioner.PARTITION_NAME, PARTITION_PATH))));
-    }
-  }
-
-  @Nested
-  public class TestHoodieSnapshotExporterInputValidation {

Review comment:
       moved to unit test

##########
File path: scripts/run_travis_tests.sh
##########
@@ -20,12 +20,13 @@ mode=$1
 sparkVersion=2.4.4
 hadoopVersion=2.7
 
-if [ "$mode" = "unit" ];
-then
+if [ "$mode" = "unit" ]; then
   echo "Running Unit Tests"
   mvn test -DskipITs=true -B
-elif [ "$mode" = "integration" ];
-then
+elif [ "$mode" = "functional" ]; then
+  echo "Running Functional Test Suite"
+  mvn test -pl hudi-utilities -Pfunctional-test-suite -B

Review comment:
       to begin with utilities module; later expand to others




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443130343



##########
File path: scripts/run_travis_tests.sh
##########
@@ -20,12 +20,13 @@ mode=$1
 sparkVersion=2.4.4
 hadoopVersion=2.7
 
-if [ "$mode" = "unit" ];
-then
+if [ "$mode" = "unit" ]; then
   echo "Running Unit Tests"
   mvn test -DskipITs=true -B
-elif [ "$mode" = "integration" ];
-then
+elif [ "$mode" = "functional" ]; then
+  echo "Running Functional Test Suite"
+  mvn test -pl hudi-utilities -Pfunctional-test-suite -B

Review comment:
       let's make the PR title reflective of this scope?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-650807492


   > I will take another pass at this once you rebase.. merged #1753
   
   @vinothchandar ok update the branch


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442576528



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService(sharedTempDir);
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();

Review comment:
       Got it. Looks good. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443169701



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/checkpointing/TestKafkaConnectHdfsProvider.java
##########
@@ -19,37 +19,39 @@
 package org.apache.hudi.utilities.checkpointing;
 
 import org.apache.hudi.common.config.TypedProperties;
-import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
 import org.apache.hudi.common.testutils.HoodieTestUtils;
 import org.apache.hudi.exception.HoodieException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
 
 import java.io.File;
+import java.nio.file.Files;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 
-public class TestKafkaConnectHdfsProvider extends HoodieCommonTestHarness {

Review comment:
       make sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443131872



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/checkpointing/TestKafkaConnectHdfsProvider.java
##########
@@ -19,37 +19,39 @@
 package org.apache.hudi.utilities.checkpointing;
 
 import org.apache.hudi.common.config.TypedProperties;
-import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
 import org.apache.hudi.common.testutils.HoodieTestUtils;
 import org.apache.hudi.exception.HoodieException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
 
 import java.io.File;
+import java.nio.file.Files;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 
-public class TestKafkaConnectHdfsProvider extends HoodieCommonTestHarness {
+public class TestKafkaConnectHdfsProvider {
 
-  private String topicPath = null;
-  private Configuration hadoopConf = null;
+  @TempDir
+  public java.nio.file.Path basePath;

Review comment:
       I wish java lmports had aliases like scala does :) ...again with a common base class, this can be avoided in every class?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.utilities.HoodieSnapshotExporter.OutputFormatValidator;
+
+import com.beust.jcommander.ParameterException;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.NullSource;
+import org.junit.jupiter.params.provider.ValueSource;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+public class TestHoodieSnapshotExporter {

Review comment:
       meaning this is not bringing up any new resources to run the test... 
   I assume this the principle we will be following?
   
   unit - testing basic functionality at the class level, potentially using mocks. Expected to finish quicker
   functional - brings up the services needed and runs test without mocking
   integration - runs subset of functional tests, on a full fledged enviroment with dockerized services 
   
   Might be good to add such a doc somewhere.. may be in travis.yml or in README even.. so developers understand what test is what.

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {

Review comment:
       rename to `SharedTestResources`?
   
   Also when I read `Test extends SharedTestResources`, the class relationship does not feel very natural.. Something to think about around naming as well and generally check ourselves against the more common pattern for this kind of stuff

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/DFSProvider.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+
+public interface DFSProvider {

Review comment:
       guess eventually, we will reuse this across hudi-client and hudi-common? (with necessary relocation?)

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService();
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();
+      dfs.mkdirs(dfs.getWorkingDirectory());
+    }
+  }
+
+  @AfterAll
+  public static synchronized void cleanUpAfterAll() throws IOException {

Review comment:
       when are the miniCluster and spark really shutdown?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/checkpointing/TestKafkaConnectHdfsProvider.java
##########
@@ -19,37 +19,39 @@
 package org.apache.hudi.utilities.checkpointing;
 
 import org.apache.hudi.common.config.TypedProperties;
-import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
 import org.apache.hudi.common.testutils.HoodieTestUtils;
 import org.apache.hudi.exception.HoodieException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
 
 import java.io.File;
+import java.nio.file.Files;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 
-public class TestKafkaConnectHdfsProvider extends HoodieCommonTestHarness {

Review comment:
       I wonder if we still need a `HoodieTestHarness` lite?  for things like basePath, conf etc.. it's probably easier than having every unit test do this?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();

Review comment:
       No harm in doing this before spark is inited?  Spark threads may be creating some FileSystem objects, which may be killed here? with enough bad luck, we could see flakiness? (thinking out loud)

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/FunctionalTestSuite.java
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.utilities.functional")
+@IncludeTags("suite")
+public class FunctionalTestSuite {

Review comment:
       rename to `UtilitiesFunctionalTestSuite` ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442598560



##########
File path: pom.xml
##########
@@ -253,10 +255,6 @@
               ${surefire-log4j.file}
             </log4j.configuration>
           </systemPropertyVariables>
-          <!-- Excludes integration tests when unit tests are run. -->
-          <excludes>
-            <exclude>**/IT*.java</exclude>
-          </excludes>

Review comment:
       The exclusion is moved to `unit-tests` profile.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-650703651


   I will take another pass at this once you rebase.. merged #1753 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442577755



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService(sharedTempDir);
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();

Review comment:
       @prashantwason actually this makes me realized i made a mistake of cleaning up resources in `runAfterAll()` which runs for each test class. Let me fix that...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443169927



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();

Review comment:
       Not quite sure about the harm.. but def. looks safer to move it to the beginning of the method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r449815885



##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       To run functional tests for multiple modules, probably will need to ensure different process for each module and the sequential execution within a module...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1746:
URL: https://github.com/apache/hudi/pull/1746


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1746: [HUDI-996] Add functional test suite

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646993804


   @xushiyan codecov seems happy?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443222584



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {

Review comment:
       yes.. that sounds great.. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442573109



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());

Review comment:
       @prashantwason Thanks for checking. The instantiation is guarded by checking `spark != null && hdfsTestService != null;` which are static, so only the first subclass (a test class) will initialize `spark`, `hdfsTestService`, `jsc`, etc for 1 run of the same test suite, which encompass multiple test classes. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r449814824



##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       > Food for thought: if tests are run parallely, in the same jvm. (parallelism option in surefire)... this boolean may not be sufficient for synchronization.. i.e two tests can attempt to create these test resources in parallel.
   
   Yes, and this is taken care of in the `functional-tests` profile
   https://github.com/apache/hudi/blob/574dcf920c4677513f6fdc8d441bb42827afa5a2/pom.xml#L1024-L1025
   
   These 2 settings make sure creating only one new JVM process to execute all tests in one Maven module. So here we make sure all functional tests in utilities are run sequentially.
   
   > also how come this boolean is not `static`? otherwise in every before each it will try to create resourceS?
   
   The `initialized` is not used as lock, rather only an indicator. It is reset by checking `spark` and `hdfsTestService`
   https://github.com/apache/hudi/pull/1746/files/87eb111b4f40dc0129790c3043683e2e1b593ca0#diff-543bacddbb566cb15499906d64c783a5R87-R89
   This can be used for debugging purpose where subclass of this class prints this info...I can see that this may be confusing, maybe removing it later if not commonly used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#issuecomment-646877512


   waiting for codecov results, which are expected to be affected. Will investigate solution after that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r449719578



##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       Food for thought: if tests are run parallely, in the same jvm. (parallelism option in surefire)... this boolean may not be sufficient for synchronization.. i.e two tests can attempt to create these test resources in parallel.

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/UtilitiesFunctionalTestSuite.java
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.utilities.functional")
+@IncludeTags("functional")
+public class UtilitiesFunctionalTestSuite {

Review comment:
       should this be abstract? 

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/UtilitiesFunctionalTestSuite.java
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.utilities.functional")
+@IncludeTags("functional")
+public class UtilitiesFunctionalTestSuite {

Review comment:
       btw where is this used actually.. may be missing something.. 

##########
File path: hudi-client/src/test/java/org/apache/hudi/testutils/FunctionalTestHarness.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+
+public class FunctionalTestHarness implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;

Review comment:
       also how come this boolean is not `static`? otherwise in every before each it will try to create resourceS?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443169804



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {

Review comment:
       How about `FunctionalTestHarness`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443169647



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.utilities.HoodieSnapshotExporter.OutputFormatValidator;
+
+import com.beust.jcommander.ParameterException;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.NullSource;
+import org.junit.jupiter.params.provider.ValueSource;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+public class TestHoodieSnapshotExporter {

Review comment:
       @vinothchandar yes exactly. created https://issues.apache.org/jira/browse/HUDI-1034 for this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443169749



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/DFSProvider.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+
+public interface DFSProvider {

Review comment:
       yup. it should be ok to move this in future PR where extend to more modules' functional tests




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on a change in pull request #1746: [HUDI-996] Add SharedResources for functional tests

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r442568533



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());

Review comment:
       Can we create the JavaSparkContext only once per JVM? This may have the benefit of making tests faster as there is some delay (on my laptop 1second+) for creating jsc each time. 
   
   public JavaSparkContext jsc() {
      if (jsc == null) {
          // create spark context
      }
      return jsc;
   }
   

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  @TempDir
+  protected static java.nio.file.Path sharedTempDir;
+
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService(sharedTempDir);
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();

Review comment:
       Same here. Can we create the test service on-demand and only once? Does not need to be done in @BeforeEach as many tests may not even end up using the hdfsTestService.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1746:
URL: https://github.com/apache/hudi/pull/1746#discussion_r443222650



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/SharedResources.java
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.testutils;
+
+import org.apache.hudi.client.HoodieWriteClient;
+import org.apache.hudi.common.testutils.minicluster.HdfsTestService;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.BeforeEach;
+
+import java.io.IOException;
+
+public class SharedResources implements SparkProvider, DFSProvider {
+
+  private static transient SparkSession spark;
+  private static transient SQLContext sqlContext;
+  private static transient JavaSparkContext jsc;
+
+  private static transient HdfsTestService hdfsTestService;
+  private static transient MiniDFSCluster dfsCluster;
+  private static transient DistributedFileSystem dfs;
+
+  /**
+   * An indicator of the initialization status.
+   */
+  protected boolean initialized = false;
+
+  @Override
+  public SparkSession spark() {
+    return spark;
+  }
+
+  @Override
+  public SQLContext sqlContext() {
+    return sqlContext;
+  }
+
+  @Override
+  public JavaSparkContext jsc() {
+    return jsc;
+  }
+
+  @Override
+  public MiniDFSCluster dfsCluster() {
+    return dfsCluster;
+  }
+
+  @Override
+  public DistributedFileSystem dfs() {
+    return dfs;
+  }
+
+  @Override
+  public Path dfsBasePath() {
+    return dfs.getWorkingDirectory();
+  }
+
+  @BeforeEach
+  public synchronized void runBeforeEach() throws Exception {
+    initialized = spark != null && hdfsTestService != null;
+    if (!initialized) {
+      spark = SparkSession.builder()
+          .config(HoodieWriteClient.registerClasses(conf()))
+          .getOrCreate();
+      sqlContext = spark.sqlContext();
+      jsc = new JavaSparkContext(spark.sparkContext());
+
+      FileSystem.closeAll();
+      hdfsTestService = new HdfsTestService();
+      dfsCluster = hdfsTestService.start(true);
+      dfs = dfsCluster.getFileSystem();
+      dfs.mkdirs(dfs.getWorkingDirectory());
+    }
+  }
+
+  @AfterAll
+  public static synchronized void cleanUpAfterAll() throws IOException {

Review comment:
       yes good idea to register a shutdown hook, incase some thread pools not shutting down holds up the jvm exit.?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org