Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/08 17:34:47 UTC

[GitHub] [beam] yeandy commented on a diff in pull request #23087: Support models returning a dictionary of outputs

yeandy commented on code in PR #23087:
URL: https://github.com/apache/beam/pull/23087#discussion_r966233619


##########
sdks/python/apache_beam/ml/inference/pytorch_inference.py:
##########
@@ -281,6 +293,18 @@ def run_inference(
         batched_tensors = _convert_to_device(batched_tensors, self._device)
         key_to_batched_tensors[key] = batched_tensors
       predictions = model(**key_to_batched_tensors, **inference_args)
+      if isinstance(predictions, dict):
+        # Go from one dictionary of type: {key_type1: Iterable<val_type1>,
+        # key_type2: Iterable<val_type2>, ...} where each Iterable is of
+        # length batch_size, to a list of dictionaries:
+        # [{key_type1: value_type1, key_type2: value_type2}]
+        predictions_per_tensor = [
+            dict(zip(predictions, v)) for v in zip(*predictions.values())

Review Comment:
   nit: To be more explicit that we want the keys here
   ```suggestion
               dict(zip(predictions.keys(), v)) for v in zip(*predictions.values())
   ```
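   A quick illustration (not from the PR) of why the two forms are equivalent: iterating a dict yields its keys, so `.keys()` only makes the intent explicit without changing behavior.
   ```python
   # Toy dict-of-iterables model output with batch_size == 2 (made-up values).
   predictions = {"output1": [1.0, 2.0], "output2": [3.0, 4.0]}

   implicit = [dict(zip(predictions, v)) for v in zip(*predictions.values())]
   explicit = [
       dict(zip(predictions.keys(), v)) for v in zip(*predictions.values())
   ]
   assert implicit == explicit == [
       {"output1": 1.0, "output2": 3.0},
       {"output1": 2.0, "output2": 4.0},
   ]
   ```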



##########
sdks/python/apache_beam/ml/inference/pytorch_inference.py:
##########
@@ -164,6 +164,18 @@ def run_inference(
       batched_tensors = torch.stack(batch)
       batched_tensors = _convert_to_device(batched_tensors, self._device)
       predictions = model(batched_tensors, **inference_args)
+      if isinstance(predictions, dict):
+        # Go from one dictionary of type: {key_type1: Iterable<val_type1>,
+        # key_type2: Iterable<val_type2>, ...} where each Iterable is of
+        # length batch_size, to a list of dictionaries:
+        # [{key_type1: value_type1, key_type2: value_type2}]
+        predictions_per_tensor = [
+            dict(zip(predictions, v)) for v in zip(*predictions.values())
+        ]
+        return [
+            PredictionResult(x, y) for x,
+            y in zip(batch, predictions_per_tensor)
+        ]

Review Comment:
   Would it make sense to refactor the two occurrences of this into a helper?
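   A rough sketch of the kind of helper I mean (the name and exact signature are only illustrative, not something already in the PR). Since it only zips examples with their per-example predictions, both run_inference implementations in pytorch_inference.py could call it after invoking the model:
   ```python
   from typing import Any, Dict, Iterable, Union

   from apache_beam.ml.inference.base import PredictionResult


   def _convert_to_result(
       batch: Iterable,
       predictions: Union[Iterable, Dict[Any, Iterable]]
   ) -> Iterable[PredictionResult]:
     if isinstance(predictions, dict):
       # {key: Iterable of length batch_size, ...} -> one dict per example,
       # i.e. [{key: value, ...}, ...] of length batch_size.
       predictions = [
           dict(zip(predictions.keys(), v)) for v in zip(*predictions.values())
       ]
     return [PredictionResult(x, y) for x, y in zip(batch, predictions)]
   ```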



##########
sdks/python/apache_beam/ml/inference/sklearn_inference.py:
##########
@@ -183,6 +195,18 @@ def run_inference(
     splits = [
         vectorized_batch.iloc[[i]] for i in range(vectorized_batch.shape[0])
     ]
+    if isinstance(predictions, dict):
+      # Go from one dictionary of type: {key_type1: Iterable<value_type1>,
+      # key_type2: Iterable<value_type2>, ...} where each Iterable is of
+      # length batch_size, to a list of dictionaries:
+      # [{key_type1: value_type1, key_type2: value_type2}]
+      predictions_per_split = [
+          dict(zip(predictions, v)) for v in zip(*predictions.values())
+      ]
+      return [
+          PredictionResult(example, inference) for example,
+          inference in zip(splits, predictions_per_split)
+      ]

Review Comment:
   As in the PyTorch file, could we refactor this into a helper? It looks a bit trickier here since we're working with different types (numpy vs. pandas), so I don't think we have to. It may also make sense to keep the logic inline here for clarity. I'll leave this up to you.



##########
sdks/python/apache_beam/ml/inference/pytorch_inference_test.py:
##########
@@ -63,6 +63,17 @@
              for f1, f2 in TWO_FEATURES_EXAMPLES]).reshape(-1, 1))
 ]
 
+TWO_FEATURES_DICT_OUT_PREDICTIONS = [
+    PredictionResult(ex, pred) for ex,
+    pred in zip(
+        TWO_FEATURES_EXAMPLES,
+        [{
+            "output1": torch.Tensor([f1 * 2.0 + f2 * 3 + 0.5]),
+            "output2": torch.Tensor([f1 * 2.0 + f2 * 3 + 0.5])
+        } for f1,
+         f2 in TWO_FEATURES_EXAMPLES])
+]

Review Comment:
   Could we somehow use the existing `TWO_FEATURES_PREDICTIONS` to construct `TWO_FEATURES_DICT_OUT_PREDICTIONS`, so that if the computation in the former changes, the latter follows automatically?
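   For example, something along these lines (just a sketch; it assumes `PredictionResult` exposes `example` and `inference` fields, and the exact tensor shapes may need adjusting):
   ```python
   # Derive the dict-output expectations from the existing single-output
   # fixture so the expected computation lives in one place.
   TWO_FEATURES_DICT_OUT_PREDICTIONS = [
       PredictionResult(p.example, {
           "output1": p.inference,
           "output2": p.inference,
       }) for p in TWO_FEATURES_PREDICTIONS
   ]
   ```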


