Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2020/03/09 06:26:53 UTC

[GitHub] [incubator-gobblin] amarnathkarthik commented on a change in pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were …

amarnathkarthik commented on a change in pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389480643
 
 

 ##########
 File path: gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java
 ##########
 @@ -91,12 +110,82 @@ public void testConfigOptions() {
         TimeBasedCopyPolicyForTest.class.getName());
   }
 
+  @Test
+  public void testCopyWithFilter() throws IOException {
+
+    /** source setup **/
+    Path srcRoot = new Path(this.testTempPath, "src/data/dataset1/daily");
+
+    if (this.localFs.exists(srcRoot)) {
+      this.localFs.delete(srcRoot, true);
+    }
+
+    List<DateTime> dateTimeList = Lists.newArrayList();
+    IntStream.range(0, 4)
+        .forEach(
+            i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i)));
+
+    String datePattern = "yyyy/MM/dd";
+    DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern);
+
+    for (DateTime dt : dateTimeList) {
+      String srcVersionPathStr = formatter.print(dt);
+      Path srcVersionPath = new Path(srcRoot, srcVersionPathStr);
+      this.localFs.mkdirs(srcVersionPath);
+
+      Path srcfile = new Path(srcVersionPath, "file1.avro");
+      this.localFs.create(srcfile);
+    }
+
+    /** destination setup **/
+    Path destRoot = new Path(this.testTempPath, "dest/data/dataset1");
+    if (this.localFs.exists(destRoot)) {
+      this.localFs.delete(destRoot, true);
+    }
+    this.localFs.mkdirs(destRoot);
+
+    Properties props = new Properties();
+    props.setProperty(TimestampBasedCopyableDataset.COPY_POLICY, SelectBetweenTimeBasedPolicy.class.getName());
+    props.setProperty(TimestampBasedCopyableDataset.DATASET_VERSION_FINDER,
+        DateTimeDatasetVersionFinder.class.getName());
+    props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MIN_LOOK_BACK_TIME_KEY, "1d");
+    props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MAX_LOOK_BACK_TIME_KEY, "6d");
+    props.setProperty(DateTimeDatasetVersionFinder.DATE_TIME_PATTERN_KEY, "yyyy/MM/dd");
+    props.setProperty("gobblin.dataset.copyable.file.filter.class",
 
 Review comment:
   org.apache.gobblin.data.management.dataset.DatasetUtils and org.apache.gobblin.data.management.copy.TimestampBasedCopyableDatasetTest are in different packages, so I will change the access modifier to public.
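
   For context, a minimal sketch of why the modifier matters. The comment above does not name the member being changed, so every name below (AccessModifierSketch, UtilsSketch, both constants, and the key string) is hypothetical and not taken from Gobblin. A member with no modifier is package-private and can only be referenced from the same package; a public member can be referenced from a test in any package. The sketch uses reflection to report the difference within a single file:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class AccessModifierSketch {

  // Hypothetical stand-in for a utility class; not the real DatasetUtils.
  public static final class UtilsSketch {
    // No modifier: package-private, visible only inside the declaring package.
    static final String PACKAGE_PRIVATE_KEY = "example.file.filter.class";
    // public: visible from any package, e.g. a test class elsewhere.
    public static final String PUBLIC_KEY = "example.file.filter.class";
  }

  // Returns true when the named field of UtilsSketch is declared public.
  public static boolean isPublicField(String fieldName) {
    try {
      Field field = UtilsSketch.class.getDeclaredField(fieldName);
      return Modifier.isPublic(field.getModifiers());
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("Unknown field: " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println("PACKAGE_PRIVATE_KEY public? " + isPublicField("PACKAGE_PRIVATE_KEY"));
    System.out.println("PUBLIC_KEY public? " + isPublicField("PUBLIC_KEY"));
  }
}
```

   With a public constant, a test in another package could reference the key by name instead of repeating the string literal.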
