Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/08/09 13:59:41 UTC

[GitHub] [incubator-seatunnel] CalvinKirs opened a new pull request, #2391: [New-Engine]Checkpoint storage

CalvinKirs opened a new pull request, #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391

   Support local file storage plugin
   Support proto-stuff serializer
   Support loading checkpoint storage plugins via SPI
   
   Checkpoint storage is an important part of the SeaTunnel engine's checkpoint mechanism. This PR mainly discusses its design and implementation.
   
   The `check-point-storage-api` module defines the API for storage plugins and some common classes.
   
   We use SPI to load storage plugins. Every plugin must implement `CheckPointStorageFactory`; in practice, this is the only class a plugin author needs to focus on.
   
   ````
   public interface CheckPointStorageFactory {
   
       /**
        * Returns the name of the storage plugin
        */
       String name();
   
       /**
        * create storage plugin instance
        *
        * @param configuration storage system config params
        * key: storage system config key
        * value: storage system config value
        * e.g.
        * key: "FS_DEFAULT_NAME_KEY"
        * value: "fs.defaultFS"
        * @return storage plugin instance
        */
       CheckPointStorage create(Map<String, String> configuration);
   }
   
   
   ````
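
The SPI lookup described above could look roughly like the following. This is a minimal sketch, not the code in this PR: it assumes the standard `java.util.ServiceLoader` mechanism, the `CheckPointStorageLoader` class and the plugin name `localfile` are hypothetical, and the two interfaces are trimmed stand-ins so the example compiles on its own.

```java
import java.util.Map;
import java.util.ServiceLoader;

// Trimmed stand-ins for the interfaces quoted in this description (assumption: real ones have more methods).
interface CheckPointStorage { }

interface CheckPointStorageFactory {
    String name();
    CheckPointStorage create(Map<String, String> configuration);
}

// Hypothetical helper; the PR may organize plugin discovery differently.
final class CheckPointStorageLoader {

    /** Returns the first candidate whose name() matches pluginName, ignoring case. */
    static CheckPointStorageFactory find(Iterable<CheckPointStorageFactory> candidates, String pluginName) {
        for (CheckPointStorageFactory factory : candidates) {
            if (factory.name().equalsIgnoreCase(pluginName)) {
                return factory;
            }
        }
        throw new IllegalArgumentException("No checkpoint storage plugin named: " + pluginName);
    }

    /** Discovers factories registered under META-INF/services and creates the requested storage. */
    static CheckPointStorage load(String pluginName, Map<String, String> configuration) {
        ServiceLoader<CheckPointStorageFactory> discovered = ServiceLoader.load(CheckPointStorageFactory.class);
        return find(discovered, pluginName).create(configuration);
    }
}
```

With this shape, a caller would only need the plugin name and its configuration map, for example `CheckPointStorageLoader.load("localfile", config)`.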
   We implement local file storage by default; see the `check-point-storage-local-file` module for details.
   
   `CheckPointStorage` is the plugin instance itself; it must implement the following methods.
   
   ````
   public interface CheckPointStorage {
   
       /**
        * init storage and create parent directory if not exists
        *
        * @param configuration configuration storage system config params
        * @throws CheckPointStorageException if init failed
        */
       void initStorage(Map<String, String> configuration) throws CheckPointStorageException;
   
       /**
        * save checkpoint to storage
        *
        * @param state PipelineState
        * @throws CheckPointStorageException if save checkpoint failed
        */
       String storeCheckPoint(PipelineState state) throws CheckPointStorageException;
   
       /**
        * get all checkpoints from storage
        *
        * @param jobId job id
        * @return All job's checkpoint data from storage
        * @throws CheckPointStorageException if get checkpoint failed
        */
       List<PipelineState> getAllCheckpoints(String jobId);
   
       /**
        * get latest checkpoint from storage
        *
        * @param jobId job id
        * @return latest checkpoint data from storage
        * @throws CheckPointStorageException if get checkpoint failed
        */
       PipelineState getLatestCheckpoint(String jobId) throws CheckPointStorageException;
   
       /**
        * get checkpoint from storage, If there are multiple records, one will be returned randomly
        *
        * @param jobId job id
        * @param pipelineId pipeline id
        * @return checkpoint data from storage
        * @throws CheckPointStorageException if get checkpoint failed or no checkpoint found
        */
       PipelineState getCheckpointByJobIdAndPipelineId(String jobId, String pipelineId) throws CheckPointStorageException;
   
       /**
        * Delete all checkpoint data under the job
        *
        * @param jobId job id
        * @throws CheckPointStorageException if delete checkpoint failed
        */
       void deleteCheckpoint(String jobId);
   }
   ````
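
To make the contract above more concrete, here is a rough sketch of how a local-file backend might cover the init, store, list, and delete operations. It is not the `check-point-storage-local-file` implementation: the class name, the `<root>/<jobId>/<pipelineId>-<checkpointId>.ser` layout, and the use of raw byte arrays instead of `PipelineState` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; file naming and method names are assumptions, not the PR's code.
final class LocalFileCheckpointSketch {
    private final Path root;

    LocalFileCheckpointSketch(Path root) throws IOException {
        // Mirrors initStorage: create the parent directory if it does not exist.
        this.root = Files.createDirectories(root);
    }

    /** Writes one already-serialized checkpoint and returns the path it was stored at. */
    String store(String jobId, String pipelineId, long checkpointId, byte[] data) throws IOException {
        Path jobDir = Files.createDirectories(root.resolve(jobId));
        Path file = jobDir.resolve(pipelineId + "-" + checkpointId + ".ser");
        Files.write(file, data);
        return file.toString();
    }

    /** Returns the sorted file names of all checkpoints stored for a job. */
    List<String> list(String jobId) throws IOException {
        Path jobDir = root.resolve(jobId);
        if (!Files.isDirectory(jobDir)) {
            return new ArrayList<>();
        }
        try (Stream<Path> files = Files.list(jobDir)) {
            return files.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    /** Deletes all checkpoint data under the job, children before the directory itself. */
    void delete(String jobId) throws IOException {
        Path jobDir = root.resolve(jobId);
        if (!Files.exists(jobDir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(jobDir)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }
}
```

Keeping one directory per job makes `deleteCheckpoint(jobId)` a plain recursive delete, which is one reason a layout like this is convenient for a file-based backend.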
   
   `org.apache.seatunnel.engine.checkpoint.storage.api.AbstractCheckPointStorage`
   contains some abstract methods and common methods, such as serialization and deserialization, file naming, etc.
   
   We use `proto-stuff` for serialization and deserialization by default. Everything written to storage is serialized data, which is deserialized when it is read back. If you need a different serialization scheme, it can be swapped in quickly.
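
As an illustration of how the serialization scheme could be made swappable, the sketch below puts serialization behind a small interface. The `Serializer` interface and `JdkSerializer` class are hypothetical names, and the implementation uses JDK built-in object serialization as a stand-in so the example runs without dependencies; it is not the proto-stuff serializer from this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical abstraction; the PR's actual serializer API may differ.
interface Serializer {
    <T> byte[] serialize(T obj) throws IOException;

    <T> T deserialize(byte[] data, Class<T> clazz) throws IOException;
}

// Stand-in implementation using java.io serialization instead of proto-stuff.
final class JdkSerializer implements Serializer {

    @Override
    public <T> byte[] serialize(T obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    @Override
    public <T> T deserialize(byte[] data, Class<T> clazz) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            // cast() fails fast if the stored bytes hold a different type than requested.
            return clazz.cast(in.readObject());
        } catch (ClassNotFoundException e) {
            throw new IOException("Unknown class in serialized data", e);
        }
    }
}
```

A storage implementation that only depends on the interface can switch serializers without touching its read and write paths.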
   
   Remaining TODO:
   
   Support asynchronous operations
   Support HDFS plugin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] ruanwenjun commented on a diff in pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on code in PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391#discussion_r946314435


##########
seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/pom.xml:
##########
@@ -0,0 +1,44 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  ~
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>seatunnel-engine-storage</artifactId>
+        <groupId>org.apache.seatunnel</groupId>
+        <version>${revision}</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>checkpoint-storage-plugins</artifactId>
+    <packaging>pom</packaging>
+    <dependencies>
+        <dependency>
+            <groupId>com.google.auto.service</groupId>
+            <artifactId>auto-service</artifactId>
+        </dependency>
+    </dependencies>

Review Comment:
   It seems we already have this dependency in root/pom.xml's dependencies.



##########
seatunnel-engine/seatunnel-engine-storage/pom.xml:
##########
@@ -0,0 +1,46 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  ~
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>seatunnel-engine</artifactId>
+        <groupId>org.apache.seatunnel</groupId>
+        <version>${revision}</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>seatunnel-engine-storage</artifactId>
+    <packaging>pom</packaging>
+    <modules>
+        <module>checkpoint-storage-api</module>
+        <module>checkpoint-storage-plugins</module>
+    </modules>
+    
+    <dependencies>
+        <dependency>
+            <groupId>org.scala-lang</groupId>
+            <artifactId>scala-library</artifactId>
+            <scope>provided</scope>
+        </dependency>
+    </dependencies>

Review Comment:
   Do we need to add this in root/pom's dependencies?





[GitHub] [incubator-seatunnel] CalvinKirs commented on a diff in pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on code in PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391#discussion_r943286326


##########
seatunnel-engine/seatunnel-check-point/check-point-storage-api/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/api/CheckPointStorage.java:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+package org.apache.seatunnel.engine.checkpoint.storage.api;
+
+import org.apache.seatunnel.engine.checkpoint.storage.PipelineState;
+import org.apache.seatunnel.engine.checkpoint.storage.exception.CheckPointStorageException;
+
+import java.util.List;
+import java.util.Map;
+
+public interface CheckPointStorage {

Review Comment:
   thanks, done





[GitHub] [incubator-seatunnel] ashulin commented on a diff in pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
ashulin commented on code in PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391#discussion_r943070205


##########
seatunnel-engine/seatunnel-check-point/check-point-storage-api/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/api/CheckPointStorage.java:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+package org.apache.seatunnel.engine.checkpoint.storage.api;
+
+import org.apache.seatunnel.engine.checkpoint.storage.PipelineState;
+import org.apache.seatunnel.engine.checkpoint.storage.exception.CheckPointStorageException;
+
+import java.util.List;
+import java.util.Map;
+
+public interface CheckPointStorage {
+
+    /**
+     * init storage and create parent directory if not exists
+     *
+     * @param configuration configuration storage system config params
+     * @throws CheckPointStorageException if init failed
+     */
+    void initStorage(Map<String, String> configuration) throws CheckPointStorageException;
+
+    /**
+     * save checkpoint to storage
+     *
+     * @param state PipelineState
+     * @throws CheckPointStorageException if save checkpoint failed
+     */
+    String storeCheckPoint(PipelineState state) throws CheckPointStorageException;
+
+    /**
+     * get all checkpoint from storage
+     *
+     * @param jobId job id
+     * @return All job's checkpoint data from storage
+     * @throws CheckPointStorageException if get checkpoint failed
+     */
+    List<PipelineState> getAllCheckpoints(String jobId);
+
+    /**
+     * get latest checkpoint from storage
+     *
+     * @param jobId job id
+     * @return latest checkpoint data from storage
+     * @throws CheckPointStorageException if get checkpoint failed
+     */
+    PipelineState getLatestCheckpoint(String jobId) throws CheckPointStorageException;

Review Comment:
   ```suggestion
       List<PipelineState> getLatestCheckpoint(String jobId) throws CheckPointStorageException;
   ```
   A job has multiple pipelines. When the entire job restarts, it needs to obtain the latest checkpoint of all pipelines.
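
One way to read this suggestion: given every stored checkpoint of a job, keep only the newest one per pipeline. The sketch below assumes that a larger checkpoint id means a newer checkpoint; the `State` class is a hypothetical, trimmed stand-in for `PipelineState`, and its field names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LatestCheckpoints {

    /** Hypothetical, trimmed stand-in for PipelineState. */
    static final class State {
        final String jobId;
        final String pipelineId;
        final long checkpointId;

        State(String jobId, String pipelineId, long checkpointId) {
            this.jobId = jobId;
            this.pipelineId = pipelineId;
            this.checkpointId = checkpointId;
        }
    }

    /** Keeps, for each pipeline, the state with the highest checkpoint id (assumed to be the newest). */
    static List<State> latestPerPipeline(List<State> all) {
        // LinkedHashMap keeps pipelines in the order they were first seen.
        Map<String, State> latest = new LinkedHashMap<>();
        for (State state : all) {
            latest.merge(state.pipelineId, state,
                    (current, candidate) -> current.checkpointId >= candidate.checkpointId ? current : candidate);
        }
        return new ArrayList<>(latest.values());
    }
}
```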



##########
seatunnel-engine/seatunnel-check-point/check-point-storage-api/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/api/CheckPointStorage.java:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+package org.apache.seatunnel.engine.checkpoint.storage.api;
+
+import org.apache.seatunnel.engine.checkpoint.storage.PipelineState;
+import org.apache.seatunnel.engine.checkpoint.storage.exception.CheckPointStorageException;
+
+import java.util.List;
+import java.util.Map;
+
+public interface CheckPointStorage {

Review Comment:
   We need uniform naming; `checkpoint` is a single word.



##########
seatunnel-engine/pom.xml:
##########
@@ -30,5 +30,6 @@
         <module>seatunnel-engine-common</module>
         <module>seatunnel-engine-server</module>
         <module>seatunnel-engine-core</module>
+        <module>seatunnel-check-point</module>

Review Comment:
   I don't think a separate module is needed for checkpoint, because checkpoint is part of the server.
   I think this module could be changed to a storage module that supports multiple storage backends and stores data such as checkpoints, conf files, etc.
   ```suggestion
           <module>seatunnel-storage</module>
   ```
   The submodules would then not be prefixed with `checkpoint`.





[GitHub] [incubator-seatunnel] CalvinKirs commented on a diff in pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on code in PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391#discussion_r943287738


##########
seatunnel-engine/pom.xml:
##########
@@ -30,5 +30,6 @@
         <module>seatunnel-engine-common</module>
         <module>seatunnel-engine-server</module>
         <module>seatunnel-engine-core</module>
+        <module>seatunnel-check-point</module>

Review Comment:
   I renamed the module to `seatunnel-engine-storage`, but kept the `checkpoint` prefix on the submodules. The current interface is designed essentially for checkpoints; if other kinds of storage need support in the future, we will modify it.





[GitHub] [incubator-seatunnel] CalvinKirs commented on a diff in pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on code in PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391#discussion_r943286133


##########
seatunnel-engine/seatunnel-check-point/check-point-storage-api/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/api/CheckPointStorage.java:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+package org.apache.seatunnel.engine.checkpoint.storage.api;
+
+import org.apache.seatunnel.engine.checkpoint.storage.PipelineState;
+import org.apache.seatunnel.engine.checkpoint.storage.exception.CheckPointStorageException;
+
+import java.util.List;
+import java.util.Map;
+
+public interface CheckPointStorage {
+
+    /**
+     * init storage and create parent directory if not exists
+     *
+     * @param configuration configuration storage system config params
+     * @throws CheckPointStorageException if init failed
+     */
+    void initStorage(Map<String, String> configuration) throws CheckPointStorageException;
+
+    /**
+     * save checkpoint to storage
+     *
+     * @param state PipelineState
+     * @throws CheckPointStorageException if save checkpoint failed
+     */
+    String storeCheckPoint(PipelineState state) throws CheckPointStorageException;
+
+    /**
+     * get all checkpoint from storage
+     *
+     * @param jobId job id
+     * @return All job's checkpoint data from storage
+     * @throws CheckPointStorageException if get checkpoint failed
+     */
+    List<PipelineState> getAllCheckpoints(String jobId);
+
+    /**
+     * get latest checkpoint from storage
+     *
+     * @param jobId job id
+     * @return latest checkpoint data from storage
+     * @throws CheckPointStorageException if get checkpoint failed
+     */
+    PipelineState getLatestCheckpoint(String jobId) throws CheckPointStorageException;

Review Comment:
   Thanks, updated.





[GitHub] [incubator-seatunnel] ruanwenjun merged pull request #2391: [New-Engine]Checkpoint storage

Posted by GitBox <gi...@apache.org>.
ruanwenjun merged PR #2391:
URL: https://github.com/apache/incubator-seatunnel/pull/2391

