Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/19 15:01:45 UTC

[GitHub] [beam] ryanthompson591 commented on a diff in pull request #22795: Fix gpu to cpu conversion with warning logs

ryanthompson591 commented on code in PR #22795:
URL: https://github.com/apache/beam/pull/22795#discussion_r950283081


##########
sdks/python/apache_beam/ml/inference/pytorch_inference.py:
##########
@@ -40,11 +41,30 @@
 def _load_model(
     model_class: torch.nn.Module, state_dict_path, device, **model_params):
   model = model_class(**model_params)
+
+  if device == torch.device('cuda') and not torch.cuda.is_available():
+    logging.warning(
+        "Specified 'GPU', but could not find device. Switching to CPU.")
+    device = torch.device('cpu')
+
+  try:
+    logging.info("Reading state_dict_path %s onto %s", state_dict_path, device)
+    file = FileSystems.open(state_dict_path, 'rb')
+    state_dict = torch.load(file, map_location=device)
+  except RuntimeError as e:

Review Comment:
   Can this be more specific than a RuntimeError?
   
   Also, can we narrow down the scope of the try/except block?
   
   For example, are we only expecting torch.load to fail?
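   
   For illustration, a sketch of what narrower scoping might look like (a hypothetical restructuring, not the PR's code; it assumes the torch.load call is the only step expected to raise, and it keeps RuntimeError because, as far as I know, torch.load signals an unavailable map_location with a plain RuntimeError, so narrowing the scope may be the main win):
   
       import logging
       
       import torch
       from apache_beam.io.filesystems import FileSystems
       
       def _load_state_dict(state_dict_path, device):
           # Hypothetical helper, not the PR's code: guard only the
           # deserialization itself, and attach the path so failures are
           # easier to diagnose.
           logging.info(
               "Reading state_dict_path %s onto %s", state_dict_path, device)
           file = FileSystems.open(state_dict_path, 'rb')
           try:
               return torch.load(file, map_location=device)
           except RuntimeError as e:
               raise RuntimeError(
                   'Failed to load state dict from %s' % state_dict_path) from e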
   



##########
sdks/python/apache_beam/ml/inference/pytorch_inference_test.py:
##########
@@ -373,6 +373,40 @@ def test_invalid_input_type(self):
         # pylint: disable=expression-not-assigned
         pcoll | RunInference(model_handler)
 
+  def test_gpu_convert_to_cpu(self):
+    with self.assertLogs() as log:
+      with TestPipeline() as pipeline:
+        examples = torch.from_numpy(
+            np.array([1, 5, 3, 10], dtype="float32").reshape(-1, 1))
+
+        state_dict = OrderedDict([('linear.weight', torch.Tensor([[2.0]])),
+                                  ('linear.bias', torch.Tensor([0.5]))])
+        path = os.path.join(self.tmpdir, 'my_state_dict_path')
+        torch.save(state_dict, path)
+
+        model_handler = PytorchModelHandlerTensor(
+            state_dict_path=path,
+            model_class=PytorchLinearRegression,
+            model_params={
+                'input_dim': 1, 'output_dim': 1
+            },
+            device='GPU')
+        # Upon initialization, device is cuda
+        self.assertEqual(model_handler._device, torch.device('cuda'))
+
+        pcoll = pipeline | 'start' >> beam.Create(examples)
+        # pylint: disable=expression-not-assigned
+        pcoll | RunInference(model_handler)
+
+        # The handler's _device attribute is still cuda here; the fallback
+        # to CPU happens locally inside _load_model.
+        self.assertEqual(model_handler._device, torch.device('cuda'))
+
+      self.assertIn("INFO:root:Device is set to CUDA", log.output)
+      self.assertIn(

Review Comment:
   Can you add something to the test beyond just checking the logs? Maybe verify that the device actually changes, or that inference runs successfully.
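   
   For example (a hypothetical sketch reusing the names from the test above; assert_that/equal_to come from apache_beam.testing.util, the 2.0/0.5 weights come from the state dict the test saves, and _compare_prediction_result is assumed to be this test module's existing element-wise comparison helper):
   
       from apache_beam.ml.inference.base import PredictionResult
       from apache_beam.testing.util import assert_that, equal_to
       
       # Assert on the actual predictions instead of only the logs; with
       # linear.weight=2.0 and linear.bias=0.5 each prediction is 2x + 0.5.
       expected = [
           PredictionResult(example, prediction) for example, prediction in zip(
               examples,
               torch.Tensor([2.0 * x + 0.5
                             for x in [1, 5, 3, 10]]).reshape(-1, 1))
       ]
       predictions = pcoll | RunInference(model_handler)
       assert_that(
           predictions,
           # _compare_prediction_result: assumed helper from this test module.
           equal_to(expected, equals_fn=_compare_prediction_result))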



##########
sdks/python/apache_beam/ml/inference/pytorch_inference_test.py:
##########
@@ -373,6 +373,40 @@ def test_invalid_input_type(self):
         # pylint: disable=expression-not-assigned
         pcoll | RunInference(model_handler)
 
+  def test_gpu_convert_to_cpu(self):
+    with self.assertLogs() as log:
+      with TestPipeline() as pipeline:
+        examples = torch.from_numpy(
+            np.array([1, 5, 3, 10], dtype="float32").reshape(-1, 1))
+
+        state_dict = OrderedDict([('linear.weight', torch.Tensor([[2.0]])),
+                                  ('linear.bias', torch.Tensor([0.5]))])
+        path = os.path.join(self.tmpdir, 'my_state_dict_path')
+        torch.save(state_dict, path)
+
+        model_handler = PytorchModelHandlerTensor(
+            state_dict_path=path,
+            model_class=PytorchLinearRegression,
+            model_params={
+                'input_dim': 1, 'output_dim': 1
+            },
+            device='GPU')
+        # Upon initialization, device is cuda
+        self.assertEqual(model_handler._device, torch.device('cuda'))
+
+        pcoll = pipeline | 'start' >> beam.Create(examples)
+        # pylint: disable=expression-not-assigned
+        pcoll | RunInference(model_handler)
+
+        # The handler's _device attribute is still cuda here; the fallback
+        # to CPU happens locally inside _load_model.
+        self.assertEqual(model_handler._device, torch.device('cuda'))
+
+      self.assertIn("INFO:root:Device is set to CUDA", log.output)
+      self.assertIn(
+          "WARNING:root:Specified 'GPU', but could not find device. " \
+          "Switching to CPU.",
+          log.output)
+

Review Comment:
   Does it make sense to add a unit test where the model fails even after falling back to the CPU?
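   
   Something like this, perhaps (a hypothetical sketch; the mismatched state dict and the expected RuntimeError are assumptions, based on load_state_dict rejecting unknown keys):
   
       def test_load_fails_even_after_cpu_fallback(self):
           # Hypothetical test: a state dict whose keys don't match the model
           # should fail in load_state_dict no matter which device we fall
           # back to.
           bad_state_dict = OrderedDict([('not_a_real_layer.weight',
                                          torch.Tensor([[1.0]]))])
           path = os.path.join(self.tmpdir, 'bad_state_dict_path')
           torch.save(bad_state_dict, path)
       
           model_handler = PytorchModelHandlerTensor(
               state_dict_path=path,
               model_class=PytorchLinearRegression,
               model_params={'input_dim': 1, 'output_dim': 1},
               device='GPU')
       
           with self.assertRaises(RuntimeError):
               # Should raise even after the GPU -> CPU fallback.
               model_handler.load_model()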



##########
sdks/python/apache_beam/ml/inference/pytorch_inference_test.py:
##########
@@ -373,6 +373,40 @@ def test_invalid_input_type(self):
         # pylint: disable=expression-not-assigned
         pcoll | RunInference(model_handler)
 
+  def test_gpu_convert_to_cpu(self):

Review Comment:
   Why does this test fail over and work on the CPU? Is it because the model is set up to be a CPU model, or because we can be sure the unit test runs on a machine where no GPU is available?
   
   Can you add some comments explaining that?
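   
   Something along these lines in the test body would help (hypothetical wording; it assumes the fallback triggers because the machines running this suite have no CUDA device):
   
       # This test relies on running on a machine without a CUDA device:
       # torch.cuda.is_available() returns False, so _load_model logs the
       # warning and falls back to CPU. The model itself is an ordinary
       # CPU-loadable linear regression, so inference still succeeds.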



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org