You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/19 20:59:39 UTC

[GitHub] [beam] yeandy opened a new pull request, #23296: Add PytorchBatchConverter

yeandy opened a new pull request, #23296:
URL: https://github.com/apache/beam/pull/23296

   Adding `PytorchBatchConverter` as [discussed](https://github.com/apache/beam/issues/21440#issuecomment-1239777760) in https://github.com/apache/beam/issues/21440 as an initial step to integrate Batched DoFns into RunInference.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #23296:
URL: https://github.com/apache/beam/pull/23296#issuecomment-1281240059

   Thanks for the small fix and confirmation!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #23296:
URL: https://github.com/apache/beam/pull/23296#issuecomment-1251544546

   R: @TheNeuralBit 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #23296:
URL: https://github.com/apache/beam/pull/23296#discussion_r995937420


##########
sdks/python/apache_beam/typehints/batch.py:
##########
@@ -35,6 +35,7 @@
 from typing import TypeVar
 
 import numpy as np
+import torch

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #23296:
URL: https://github.com/apache/beam/pull/23296#discussion_r974687712


##########
sdks/python/apache_beam/typehints/batch_test.py:
##########
@@ -54,6 +64,17 @@
         'element_typehint': str,
         'batch': ["foo" * (i % 5) + str(i) for i in range(1000)],
     },
+    {
+        'batch_typehint': torch.Tensor,
+        'element_typehint': torch.Tensor,

Review Comment:
   A single element of a `Tensor` array is also a `Tensor`. e.g.
   ```
   >>> a = torch.Tensor([1,2,3])
   >>> a
   tensor([1., 2., 3.])
   >>> a[0]
   tensor(1.)
   >>> type(a)
   <class 'torch.Tensor'>
   >>> type(a[0])
   <class 'torch.Tensor'>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #23296:
URL: https://github.com/apache/beam/pull/23296#issuecomment-1279487100

   It looks like codecov is unhappy since this file looks untested. Presumably because we don't have pytorch installed in the run generating coverage data. I confirmed this test is exercised on Jenkins: 
   ![image](https://user-images.githubusercontent.com/675055/195946894-ff5160bf-6b75-43b6-b2d0-b47e049d69a2.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit merged pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
TheNeuralBit merged PR #23296:
URL: https://github.com/apache/beam/pull/23296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on a diff in pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #23296:
URL: https://github.com/apache/beam/pull/23296#discussion_r996170680


##########
sdks/python/apache_beam/typehints/pytorch_type_compatibility_test.py:
##########
@@ -0,0 +1,138 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for pytorch_type_compabitility."""
+
+import unittest
+
+import pytest
+from parameterized import parameterized
+from parameterized import parameterized_class
+
+from apache_beam.typehints import typehints
+from apache_beam.typehints.batch import BatchConverter
+from apache_beam.typehints.batch import N
+
+# Protect against environments where pytorch library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  import torch
+  from apache_beam.typehints.pytorch_type_compatibility import PytorchTensor
+except ImportError:
+  raise unittest.SkipTest('PyTorch dependencies are not installed')
+
+
+@parameterized_class([
+    {
+        'batch_typehint': torch.Tensor,
+        'element_typehint': PytorchTensor[torch.int32, ()],
+        'batch': torch.tensor(range(100), dtype=torch.int32)
+    },
+    {
+        'batch_typehint': PytorchTensor[torch.int64, (N, 10)],
+        'element_typehint': PytorchTensor[torch.int64, (10, )],
+        'batch': torch.tensor([list(range(i, i + 10)) for i in range(100)],
+                              dtype=torch.int64),
+    },
+])
+@pytest.mark.uses_pytorch
+class DataFrameBatchConverterTest(unittest.TestCase):

Review Comment:
   nit: copy-paste error
   ```suggestion
   class PytorchBatchConverterTest(unittest.TestCase):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on a diff in pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #23296:
URL: https://github.com/apache/beam/pull/23296#discussion_r975656793


##########
sdks/python/apache_beam/typehints/batch.py:
##########
@@ -35,6 +35,7 @@
 from typing import TypeVar
 
 import numpy as np
+import torch

Review Comment:
   Could you make this a separate module, following the pattern I just used for pandas: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/pandas_type_compatibility.py
   
   That way it can be separable from everything else here if `torch` is available.



##########
sdks/python/apache_beam/typehints/batch.py:
##########
@@ -35,6 +35,7 @@
 from typing import TypeVar
 
 import numpy as np
+import torch

Review Comment:
   One issue with that is I haven't found a good way to re-use the BatchConverterTest logic, for now it's just duplicated in `pandas_type_compatibility_test`.



##########
sdks/python/apache_beam/typehints/batch_test.py:
##########
@@ -54,6 +64,17 @@
         'element_typehint': str,
         'batch': ["foo" * (i % 5) + str(i) for i in range(1000)],
     },
+    {
+        'batch_typehint': torch.Tensor,
+        'element_typehint': torch.Tensor,

Review Comment:
   Interesting. It could be problematic to allow this as an element type though, since it's unclear what data type is.
   
   For now, could we always represent scalars as a 0-dim PytorchTensor? i.e. `PytorchTensor[torch.int32, (,)]`. There could also be a wrapper for this, like `PytorchScalar[torch.int32]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #23296:
URL: https://github.com/apache/beam/pull/23296#issuecomment-1279220064

   PTAL @TheNeuralBit 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #23296:
URL: https://github.com/apache/beam/pull/23296#issuecomment-1251545630

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #23296: Add PytorchBatchConverter

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #23296:
URL: https://github.com/apache/beam/pull/23296#discussion_r976789614


##########
sdks/python/apache_beam/typehints/batch_test.py:
##########
@@ -54,6 +64,17 @@
         'element_typehint': str,
         'batch': ["foo" * (i % 5) + str(i) for i in range(1000)],
     },
+    {
+        'batch_typehint': torch.Tensor,
+        'element_typehint': torch.Tensor,

Review Comment:
   Yeah, let me try that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org