You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "chamikaramj (via GitHub)" <gi...@apache.org> on 2023/01/24 18:22:13 UTC

[GitHub] [beam] chamikaramj commented on a diff in pull request #25083: Multi language/runinference example

chamikaramj commented on code in PR #25083:
URL: https://github.com/apache/beam/pull/25083#discussion_r1085658344


##########
sdks/python/apache_beam/examples/inference/multi_language_inference/README.md:
##########
@@ -0,0 +1,60 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+## Setting up the Expansion service
+*Note: skip this step for Beam 2.44 and later.*
+
+Because we can not add local packages in Beam 2.43 we must create our own expansion service.
+Start up the expansion service with this command:
+
+```bash
+export PORT = <port to host expansion service>
+export IMAGE = <custom docker image>
+
+python -m multi_language_custom_transform.start_expansion_service  \
+    --port=$PORT \
+    --fully_qualified_name_glob="*" \
+    --environment_config=$IMAGE \
+    --environment_type=DOCKER
+```
+## Running the Java pipeline
+In another terminal, run the following command to start the Java pipeline:
+
+```bash
+export JOB_SERVER_PORT= <port to host expansion service>
+export JAVA_HOME= <path to java HOME>
+export GCP_PROJECT= <your gcp project>
+export GCP_BUCKET= <your gcp bucker>
+export GCP_REGION= <region of bucket>
+
+cd multi-language-beam

Review Comment:
   There should be instructions to download the Maven arche-type (or setup the Maven project) before this. For example,
   
   https://github.com/apache/beam/tree/master/examples/multi-language#instructions-for-running-the-java-pipeline-on-released-beam-beam-2430-and-later



##########
sdks/python/apache_beam/examples/inference/multi_language_inference/last_word_prediction/src/main/java/org/MultiLangRunInference.java:
##########
@@ -0,0 +1,89 @@
+package org;
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.python.PythonExternalTransform;
+import org.apache.beam.sdk.io.TextIO;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.options.Validation.Required;
+import org.apache.beam.sdk.values.PCollection;
+
+public class MultiLangRunInference {
+    public interface MultiLanguageOptions extends PipelineOptions {
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getInputFile();
+
+        void setInputFile(String value);
+
+        @Description("Path to a stored model.")
+        @Required
+        String getModelPath();
+
+        void setModelPath(String value);
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getOutputFile();
+
+        void setOutputFile(String value);
+
+        @Description("Name of the model on HuggingFace.")
+        @Required
+        String getModelName();
+
+        void setModelName(String value);
+
+        @Description("Port number of the expansion service.")
+        @Required
+        String getPort();
+
+        void setPort(String value);
+    }
+
+    public static void main(String[] args) {
+
+        MultiLanguageOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
+                .as(MultiLanguageOptions.class);
+
+        Pipeline p = Pipeline.create(options);
+        PCollection<String> input = p.apply("Read Input", TextIO.read().from(options.getInputFile()));
+        
+        /* For 2.44.0 and on
+        List<String> local_packages=new ArrayList<String>(); 

Review Comment:
   2.44.0 is out now. But I'm not sure if you even need this since Python modules mentioned here are released with Beam. `withExtraPackages` is only needed when installing custom packages that are not available in Beam **and** not using a custom expansion service.



##########
sdks/python/apache_beam/examples/inference/multi_language_inference/last_word_prediction/src/main/java/org/MultiLangRunInference.java:
##########
@@ -0,0 +1,89 @@
+package org;
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.python.PythonExternalTransform;
+import org.apache.beam.sdk.io.TextIO;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.options.Validation.Required;
+import org.apache.beam.sdk.values.PCollection;
+
+public class MultiLangRunInference {
+    public interface MultiLanguageOptions extends PipelineOptions {
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getInputFile();
+
+        void setInputFile(String value);
+
+        @Description("Path to a stored model.")
+        @Required
+        String getModelPath();
+
+        void setModelPath(String value);
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getOutputFile();
+
+        void setOutputFile(String value);
+
+        @Description("Name of the model on HuggingFace.")
+        @Required
+        String getModelName();
+
+        void setModelName(String value);
+
+        @Description("Port number of the expansion service.")
+        @Required
+        String getPort();
+
+        void setPort(String value);
+    }
+
+    public static void main(String[] args) {
+
+        MultiLanguageOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
+                .as(MultiLanguageOptions.class);
+
+        Pipeline p = Pipeline.create(options);
+        PCollection<String> input = p.apply("Read Input", TextIO.read().from(options.getInputFile()));
+        
+        /* For 2.44.0 and on
+        List<String> local_packages=new ArrayList<String>(); 
+        local_packages.add("multi_language_custom_transform"); 

Review Comment:
   FYI, `withExtraPackages` takes Python packages (PyPi or local tarball) that can be staged by Beam: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#local-or-nonpypi
   
    



##########
sdks/python/apache_beam/examples/inference/multi_language_inference/last_word_prediction/src/main/java/org/MultiLangRunInference.java:
##########
@@ -0,0 +1,89 @@
+package org;
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.python.PythonExternalTransform;
+import org.apache.beam.sdk.io.TextIO;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.options.Validation.Required;
+import org.apache.beam.sdk.values.PCollection;
+
+public class MultiLangRunInference {
+    public interface MultiLanguageOptions extends PipelineOptions {
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getInputFile();
+
+        void setInputFile(String value);
+
+        @Description("Path to a stored model.")
+        @Required
+        String getModelPath();
+
+        void setModelPath(String value);
+
+        @Description("Path to an input file that contains labels and pixels to feed into the model")
+        @Required
+        String getOutputFile();
+
+        void setOutputFile(String value);
+
+        @Description("Name of the model on HuggingFace.")
+        @Required
+        String getModelName();
+
+        void setModelName(String value);
+
+        @Description("Port number of the expansion service.")
+        @Required
+        String getPort();
+
+        void setPort(String value);
+    }
+
+    public static void main(String[] args) {
+
+        MultiLanguageOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
+                .as(MultiLanguageOptions.class);
+
+        Pipeline p = Pipeline.create(options);
+        PCollection<String> input = p.apply("Read Input", TextIO.read().from(options.getInputFile()));
+        
+        /* For 2.44.0 and on
+        List<String> local_packages=new ArrayList<String>(); 
+        local_packages.add("multi_language_custom_transform"); 
+        */
+        List<String> packages=new ArrayList<String>();  
+        input.apply("Predict", PythonExternalTransform.<PCollection<String>, PCollection<String>>from(
+                "multi_language_custom_transform.composite_transform.InferenceTransform", "localhost:" + options.getPort())
+                .withKwarg("model", options.getModelName())
+                .withKwarg("model_path", options.getModelPath())
+                // .withExtraPackages(multi_language_custom_transform)
+                )
+                .apply("Write Output", TextIO.write().to(options.getOutputFile()));
+
+        p.run().waitUntilFinish();

Review Comment:
   +1 but this might not even be needed in the future if Python modules here are released with Beam.



##########
sdks/python/apache_beam/examples/inference/multi_language_inference/multi_language_custom_transform/start_expansion_service.py:
##########
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+import sys
+
+from apache_beam.runners.portability import expansion_service_main
+
+if __name__ == '__main__':
+  # Start the expansion service.
+  logging.getLogger().setLevel(logging.INFO)
+  expansion_service_main.main(sys.argv)

Review Comment:
   Can we just run "expansion_service_main" directly and remove this ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org