Posted to commits@gobblin.apache.org by ab...@apache.org on 2017/07/29 08:01:49 UTC

[1/2] incubator-gobblin git commit: gobblin-data-management cli + example configuration

Repository: incubator-gobblin
Updated Branches:
  refs/heads/master 15ac4679b -> 725a0829d


gobblin-data-management cli + example configuration


Project: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/commit/5a896d23
Tree: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/tree/5a896d23
Diff: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/diff/5a896d23

Branch: refs/heads/master
Commit: 5a896d23aab7f037fd206b984caab7d39085f68b
Parents: 0751146
Author: Michal Ferlinski <mi...@allegrogroup.com>
Authored: Wed May 17 10:28:45 2017 +0200
Committer: Michal Ferlinski <mi...@allegrogroup.com>
Committed: Mon May 22 08:19:08 2017 +0200

----------------------------------------------------------------------
 .../config-example/gobblin-retention-run.sh     | 20 +++++
 .../config-example/gobblin-retention.properties | 25 ++++++
 .../_CONFIG_STORE/1.0/hive/db1/main.conf        | 18 ++++
 .../_CONFIG_STORE/1.0/hive/db2/main.conf        | 18 ++++
 .../_CONFIG_STORE/1.0/hive/db2/table1/main.conf | 18 ++++
 .../_CONFIG_STORE/1.0/hive/includes.conf        | 18 ++++
 .../1.0/tags/retention/hive/main.conf           | 38 +++++++++
 .../1.0/tags/retention/timebased/main.conf      | 31 +++++++
 .../_CONFIG_STORE/store-metadata.conf           | 18 ++++
 .../runtime/retention/DatasetCleanerCli.java    | 89 ++++++++++++++++++++
 .../data-management/Gobblin-Retention.md        |  3 +
 11 files changed, 296 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/gobblin-retention-run.sh
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/gobblin-retention-run.sh b/gobblin-data-management/config-example/gobblin-retention-run.sh
new file mode 100755
index 0000000..72ce340
--- /dev/null
+++ b/gobblin-data-management/config-example/gobblin-retention-run.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"$GOBBLIN_HOME/bin/gobblin" cleaner -c gobblin-retention.properties

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/gobblin-retention.properties
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/gobblin-retention.properties b/gobblin-data-management/config-example/gobblin-retention.properties
new file mode 100644
index 0000000..0e7c805
--- /dev/null
+++ b/gobblin-data-management/config-example/gobblin-retention.properties
@@ -0,0 +1,25 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.config.management.store.uri=simple-hdfs://cluster1/user/root/gobblin-config-management
+
+hive.dataset.database=db1,db2
+gobblin.retention.hive.shouldDeleteData=true
+
+gobblin.retention.tag=simple-hdfs://cluster1/user/root/gobblin-config-management/tags/retention/timebased,simple-hdfs://cluster1/user/root/gobblin-config-management/tags/retention/hive
+
+gobblin.retention.skip.trash=true
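
The properties file above packs several values into comma-separated entries (`hive.dataset.database`, `gobblin.retention.tag`). As a minimal sketch of how such entries can be read with the JDK alone: the class and method names below are hypothetical, and Gobblin's own parsing is not shown in this commit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

/** Hypothetical helper; not part of Gobblin. */
public class RetentionPropsSketch {

  /** Parses properties-file text into a Properties object. */
  public static Properties parse(String text) {
    Properties props = new Properties();
    try (StringReader reader = new StringReader(text)) {
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Splits a comma-separated value into trimmed, non-empty items. */
  public static List<String> listProperty(Properties props, String key) {
    String raw = props.getProperty(key, "");
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(item -> !item.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Keys copied from gobblin-retention.properties in this commit.
    Properties props = parse("hive.dataset.database=db1,db2\n"
        + "gobblin.retention.skip.trash=true\n");
    System.out.println(listProperty(props, "hive.dataset.database"));
    System.out.println(Boolean.parseBoolean(props.getProperty("gobblin.retention.skip.trash")));
  }
}
```

A key that is absent yields an empty list rather than a list holding one empty string, which keeps callers from iterating over a bogus database name.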

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db1/main.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db1/main.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db1/main.conf
new file mode 100644
index 0000000..7fb8d7f
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db1/main.conf
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.retention.selection.timeBased.lookbackTime=30d

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/main.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/main.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/main.conf
new file mode 100644
index 0000000..35fa65d
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/main.conf
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.retention.selection.timeBased.lookbackTime=32d

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/table1/main.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/table1/main.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/table1/main.conf
new file mode 100644
index 0000000..595ab06
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/db2/table1/main.conf
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.retention.selection.timeBased.lookbackTime=15d
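
The three `main.conf` files above set different lookback values for `db1`, `db2` and `db2/table1`. Assuming the config store resolves a dataset's value from its most specific node and otherwise falls back to its ancestors (an assumption about the store's semantics, not something this commit demonstrates), the effective values could be modelled roughly as follows. The class, the map and the path strings are hypothetical and are not Gobblin's config store API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of hierarchical lookup; not Gobblin's config store. */
public class LookbackResolver {
  // Values copied from the example main.conf files in this commit.
  private static final Map<String, String> STORE = new HashMap<>();
  static {
    STORE.put("/hive/db1", "30d");
    STORE.put("/hive/db2", "32d");
    STORE.put("/hive/db2/table1", "15d");
  }

  /** Walks from the given path up towards the root and returns the first value found. */
  public static Optional<String> resolve(String path) {
    String current = path;
    while (!current.isEmpty()) {
      String value = STORE.get(current);
      if (value != null) {
        return Optional.of(value);
      }
      int slash = current.lastIndexOf('/');
      // Stop once there is no parent segment left to try.
      current = slash <= 0 ? "" : current.substring(0, slash);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(resolve("/hive/db2/table1").orElse("unset"));
    System.out.println(resolve("/hive/db2/table2").orElse("unset"));
    System.out.println(resolve("/hive/db1/any_table").orElse("unset"));
  }
}
```

Under that assumption, `table1` in `db2` would use 15d while every other table in `db2` would inherit 32d.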

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/includes.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/includes.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/includes.conf
new file mode 100644
index 0000000..1763dcf
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/hive/includes.conf
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+/tags/retention/hive

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/hive/main.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/hive/main.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/hive/main.conf
new file mode 100644
index 0000000..14fd2d5
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/hive/main.conf
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.retention : {
+
+    is.blacklisted=false
+
+    dataset : {
+      finder.class=gobblin.data.management.retention.dataset.finder.CleanableHiveDatasetFinder
+    }
+
+    selection : {
+      policy.class=gobblin.data.management.policy.SelectBeforeTimeBasedPolicy
+    }
+
+    version.finder.class=gobblin.data.management.version.finder.DatePartitionHiveVersionFinder
+
+    hive {
+      partition {
+        key.name=partition_column_name
+        value.datetime.pattern=yyyy-MM-dd
+      }
+    }
+}
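
The Hive tag above combines a `yyyy-MM-dd` partition pattern with `SelectBeforeTimeBasedPolicy`, and the per-database files supply a lookback such as `30d`. As a rough illustration of what a "select before" decision could look like for one partition value: this sketch uses `java.time` rather than whatever date library Gobblin uses internally, the class and method names are hypothetical, and the strict "before the cutoff" comparison is an assumption.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical illustration; not the SelectBeforeTimeBasedPolicy implementation. */
public class PartitionAgeCheck {
  // Same pattern string as value.datetime.pattern in the example config.
  private static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("yyyy-MM-dd");

  /** Returns true when the partition date is strictly before (today - lookbackDays). */
  public static boolean isOlderThanLookback(String partitionValue, int lookbackDays, LocalDate today) {
    LocalDate partitionDate = LocalDate.parse(partitionValue, PATTERN);
    LocalDate cutoff = today.minusDays(lookbackDays);
    return partitionDate.isBefore(cutoff);
  }

  public static void main(String[] args) {
    LocalDate today = LocalDate.of(2017, 7, 29);
    // With a 30 day lookback the cutoff is 2017-06-29.
    System.out.println(isOlderThanLookback("2017-05-22", 30, today));
    System.out.println(isOlderThanLookback("2017-07-01", 30, today));
  }
}
```

Passing `today` in explicitly keeps the check deterministic, which makes it easy to try different lookback values before enabling `gobblin.retention.hive.shouldDeleteData`.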

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/timebased/main.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/timebased/main.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/timebased/main.conf
new file mode 100644
index 0000000..ad20ad2
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/1.0/tags/retention/timebased/main.conf
@@ -0,0 +1,31 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+gobblin.retention : {
+
+    dataset : {
+      finder.class=gobblin.data.management.retention.profile.ManagedCleanableDatasetFinder
+    }
+
+    selection : {
+      policy.class=gobblin.data.management.policy.SelectBeforeTimeBasedPolicy
+    }
+
+    version : {
+      finder.class=gobblin.data.management.version.finder.GlobModTimeDatasetVersionFinder
+    }
+}

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/store-metadata.conf
----------------------------------------------------------------------
diff --git a/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/store-metadata.conf b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/store-metadata.conf
new file mode 100644
index 0000000..d4e32ac
--- /dev/null
+++ b/gobblin-data-management/config-example/hdfs-gobblin-config-store/user/root/gobblin-config-management/_CONFIG_STORE/store-metadata.conf
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+config.hdfs.store.version.current=1.0

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-data-management/src/main/java/gobblin/runtime/retention/DatasetCleanerCli.java
----------------------------------------------------------------------
diff --git a/gobblin-data-management/src/main/java/gobblin/runtime/retention/DatasetCleanerCli.java b/gobblin-data-management/src/main/java/gobblin/runtime/retention/DatasetCleanerCli.java
new file mode 100644
index 0000000..f1ca40c
--- /dev/null
+++ b/gobblin-data-management/src/main/java/gobblin/runtime/retention/DatasetCleanerCli.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package gobblin.runtime.retention;
+
+import gobblin.annotation.Alias;
+import gobblin.data.management.retention.DatasetCleaner;
+import gobblin.runtime.cli.CliApplication;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.DefaultParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Properties;
+
+
+@Alias(value = "cleaner", description = "Data retention utility")
+public class DatasetCleanerCli implements CliApplication {
+  private static final Option CLEANER_CONFIG =
+          Option.builder("c").longOpt("config").hasArg().required().desc("DatasetCleaner configuration").build();
+
+  @Override
+  public void run(String[] args) {
+    try {
+      Properties properties = readProperties(parseConfigLocation(args));
+      DatasetCleaner datasetCleaner = new DatasetCleaner(FileSystem.get(new Configuration()), properties);
+      datasetCleaner.clean();
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  // Loads the cleaner configuration from a local properties file.
+  private Properties readProperties(String fileLocation) {
+    // try-with-resources closes the stream even if load() throws
+    try (FileInputStream input = new FileInputStream(fileLocation)) {
+      Properties prop = new Properties();
+      prop.load(input);
+      return prop;
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  private String parseConfigLocation(String[] args) {
+    Options options = new Options();
+    options.addOption(CLEANER_CONFIG);
+
+    CommandLine cli;
+    try {
+      CommandLineParser parser = new DefaultParser();
+      cli = parser.parse(options, Arrays.copyOfRange(args, 1, args.length));
+    } catch (ParseException pe) {
+      System.out.println("Command line parse exception: " + pe.getMessage());
+      printUsage(options);
+      throw new RuntimeException(pe);
+    }
+    return cli.getOptionValue(CLEANER_CONFIG.getOpt());
+  }
+
+  private void printUsage(Options options) {
+    HelpFormatter formatter = new HelpFormatter();
+
+    String usage = "DatasetCleaner configuration ";
+    formatter.printHelp(usage, options);
+  }
+}
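
`parseConfigLocation` drops the first argument with `Arrays.copyOfRange(args, 1, args.length)` before handing the rest to the option parser. Judging from the run script (`gobblin cleaner -c ...`), that first element is presumably the `cleaner` alias itself, although the launcher is not part of this commit. A small JDK-only sketch of the same slicing, with a guard for an empty argument array, might look like this; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the argument slicing in DatasetCleanerCli. */
public class AliasArgs {

  /** Returns the arguments after the leading alias, or an empty array if there are none. */
  public static String[] stripAlias(String[] args) {
    if (args.length <= 1) {
      // copyOfRange(args, 1, 0) would throw on an empty array, so short-circuit here.
      return new String[0];
    }
    return Arrays.copyOfRange(args, 1, args.length);
  }

  public static void main(String[] args) {
    String[] full = {"cleaner", "-c", "gobblin-retention.properties"};
    System.out.println(Arrays.toString(stripAlias(full)));
  }
}
```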

http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/5a896d23/gobblin-docs/data-management/Gobblin-Retention.md
----------------------------------------------------------------------
diff --git a/gobblin-docs/data-management/Gobblin-Retention.md b/gobblin-docs/data-management/Gobblin-Retention.md
index e29b81f..7aa51c5 100644
--- a/gobblin-docs/data-management/Gobblin-Retention.md
+++ b/gobblin-docs/data-management/Gobblin-Retention.md
@@ -145,6 +145,9 @@ gobblin.retention : {
 }
 </pre>
 
+### Examples
+Browse the [gobblin-data-management/config-example](/gobblin-data-management/config-example) directory to see an example configuration.
+
 ## Supported Retention Configurations
 Below is a list of ready to use supported retention configurations. But users can always implement their own ```DatasetFinder```,```VersionFinder``` and ```VersionSelectionPolicy``` and plug it in.
 


[2/2] incubator-gobblin git commit: Merge pull request #1873 from fermich/gobblin-data-retention-run

Posted by ab...@apache.org.
Merge pull request #1873 from fermich/gobblin-data-retention-run


Project: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/commit/725a0829
Tree: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/tree/725a0829
Diff: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/diff/725a0829

Branch: refs/heads/master
Commit: 725a0829d99c7d2d0fa8f63fc3b36b0c77334089
Parents: 15ac467 5a896d2
Author: Abhishek Tiwari <ab...@gmail.com>
Authored: Sat Jul 29 01:01:45 2017 -0700
Committer: Abhishek Tiwari <ab...@gmail.com>
Committed: Sat Jul 29 01:01:45 2017 -0700

----------------------------------------------------------------------
 .../config-example/gobblin-retention-run.sh     | 20 +++++
 .../config-example/gobblin-retention.properties | 25 ++++++
 .../_CONFIG_STORE/1.0/hive/db1/main.conf        | 18 ++++
 .../_CONFIG_STORE/1.0/hive/db2/main.conf        | 18 ++++
 .../_CONFIG_STORE/1.0/hive/db2/table1/main.conf | 18 ++++
 .../_CONFIG_STORE/1.0/hive/includes.conf        | 18 ++++
 .../1.0/tags/retention/hive/main.conf           | 38 +++++++++
 .../1.0/tags/retention/timebased/main.conf      | 31 +++++++
 .../_CONFIG_STORE/store-metadata.conf           | 18 ++++
 .../runtime/retention/DatasetCleanerCli.java    | 89 ++++++++++++++++++++
 .../data-management/Gobblin-Retention.md        |  3 +
 11 files changed, 296 insertions(+)
----------------------------------------------------------------------