You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/17 00:29:56 UTC

[GitHub] [flink-ml] lindong28 commented on a change in pull request #10: [FLINK-24354][FLIP-174] Improve the WithParams interface

lindong28 commented on a change in pull request #10:
URL: https://github.com/apache/flink-ml/pull/10#discussion_r730194806



##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation didn't assume the client can access the stored model data. It just assumes the client can access the stored model metadata such as class name and parameters, which, unlike model data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model data will be left to the implementation of save/load of each `Stage` and is supposed to happen during the Job Graph execution.
   
   My understanding is that  this concern is unrelated to this PR. This is the metadata of the store has to be read from some external storage. If the client can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client can access the stored model data. It just assumes the client can access the stored model metadata such as class name and parameters, which, unlike model data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model data will be left to the implementation of save/load of each `Stage` and is supposed to happen during the Job Graph execution.
   
   My understanding is that  this concern is unrelated to this PR. This is the metadata of the store has to be read from some external storage. If the client can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client can access the stored model data. It just assumes the client can access the stored model metadata such as class name and parameters, which, unlike model data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model data will be left to the implementation of save/load of each `Stage` and is supposed to happen during the Job Graph execution.
   
   Note that the path to the metadata file can be a on a remote storage (e.g. HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to fetch file from those remote storage, as long as the client can access those remote storage
   
   My understanding is that  this concern is unrelated to this PR. This is the metadata of the store has to be read from some external storage. If the client can not access this storage, there is nothing this PR can do, right?
   
   

##########
File path: flink-ml-api/src/main/java/org/apache/flink/ml/api/core/Pipeline.java
##########
@@ -97,17 +100,17 @@ public PipelineModel fit(Table... inputs) {
     }
 
     @Override
-    public void save(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    public Map<Param<?>, Object> getUserDefinedParamMap() {
+        return paramMap;
     }
 
-    public static Pipeline load(String path) throws IOException {
-        throw new UnsupportedOperationException();
+    @Override
+    public void save(String path) throws IOException {
+        ReadWriteUtils.savePipeline(this, stages, path);

Review comment:
       Just to clarify, the current implementation does not assume the client can access the stored model data. It just assumes the client can access the stored model metadata such as class name and parameters, which, unlike model data, is usually small in size and can be stored in a single file.
   
   The PR does not try to save/load any model data. The work to save/load model data will be left to the implementation of save/load of each `Stage` and is supposed to happen during the Job Graph execution.
   
   Note that the path to the metadata file can be a on a remote storage (e.g. HDFS, Amazon S3), and in the future we can extend the loadMetadata(...) to fetch file from those remote storage, as long as the client can access those remote storage.
   
   My understanding is that  this concern is unrelated to this PR. This is the metadata of the store has to be read from some external storage. If the client can not access this storage, there is nothing this PR can do, right?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org