You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "smeet07 (via GitHub)" <gi...@apache.org> on 2023/04/11 19:54:48 UTC

[GitHub] [beam] smeet07 opened a new pull request, #26225: create helper function

smeet07 opened a new pull request, #26225:
URL: https://github.com/apache/beam/pull/26225

   Create helper function to convert rank 2 tensor to rank 1 tensor
   
   addresses #24902
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1520528333

   Stop reviewer notifications


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1266984948


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import numpy as np
+import pytest
+
+from typing import Any, Callable, Optional

Review Comment:
   Remove Any, Callable from imports since they are not being used anywhere.
   
   Lint errors would get solved after this.



##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import numpy as np
+import pytest
+
+from typing import Any, Callable, Optional
+
+try:
+  import tensorflow_transform as tft
+  import tensorflow as tf
+  from apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo import fill_in_missing
+except ImportError:
+  tft = None  # type: ignore[assignment]

Review Comment:
   My bad. Remove the `# type: ignore[assignment]`.
   
   Mypy check should pass after that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505834589

   > Run TFT Criteo Benchmarks
   
   could you guide me how to do it ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1509122320

   > Run TFT Criteo Benchmarks
   
   this benchmark is failing currently since we updated protobuf to 4.x.x and TFT still relies on protobuf < 4. I am watching TFT pypi for the next release which will make the test suite green


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #26225: create helper function

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1504033117

   ## [Codecov](https://codecov.io/gh/apache/beam/pull/26225?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#26225](https://codecov.io/gh/apache/beam/pull/26225?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c1eb39d) into [master](https://codecov.io/gh/apache/beam/commit/d3f1acbf159cbd04dcec81c26368956b533d479c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d3f1acb) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #26225   +/-   ##
   =======================================
     Coverage   71.20%   71.21%           
   =======================================
     Files         787      787           
     Lines      103330   103328    -2     
   =======================================
     Hits        73581    73581           
   + Misses      28252    28250    -2     
     Partials     1497     1497           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/26225?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...am/testing/benchmarks/cloudml/criteo\_tft/criteo.py](https://codecov.io/gh/apache/beam/pull/26225?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL2Nsb3VkbWwvY3JpdGVvX3RmdC9jcml0ZW8ucHk=) | `0.00% <ø> (ø)` | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1196620138


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,36 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+import pytest
+
+@pytest.mark.uses_tft

Review Comment:
   Add a skipIf condition when `tft` package is not installed similar to https://github.com/apache/beam/blob/cf8003e97da9029695f5f872f1f4bfde14e9943b/sdks/python/apache_beam/ml/gcp/cloud_dlp_test.py#L46



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1210431176


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,45 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+import pytest
+import numpy as np
+from .criteo import fill_in_missing
+try:
+  import tensorflow_transform as tft
+  import tensorflow as tf

Review Comment:
   You need to move `from .criteo import fill_in_missing` into the try except. Also, please use absolute imports `apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo` instead of `.criteo`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1649124582

   @AnandInguva I'm having trouble finding out the exact cause of github  workflow checks failing  since the console output is too large, what do you look for while browsing through the files?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1621914449

   > Hey @smeet07 typically an author should be responsible for fixing issues with lint/formatting and other CI breakages. The logs will usually have the information needed to find the problem and fix it.
   > 
   > In this case, looking at the logs I see the following issue in the PythonLint check:
   > 
   > `apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:24: error: Module 'apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo' has no attribute 'fill_in_missing' [attr-defined]`
   > 
   > and the following issue in the `PythonFormatter` check:
   > 
   > ```
   > 03:32:11 --- apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py	(original)
   > 03:32:11 +++ apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py	(reformatted)
   > 03:32:11 @@ -135,7 +135,7 @@
   > 03:32:11      result = {'clicked': inputs['clicked']}
   > 03:32:11      for name in _INTEGER_COLUMN_NAMES:
   > 03:32:11        feature = inputs[name]
   > 03:32:11 -      
   > 03:32:11 +
   > 03:32:11        def fill_in_missing(feature, default_value=-1):
   > 03:32:11          if tf is not None:
   > 03:32:11            feature = tf.sparse.SparseTensor(
   > 03:32:11 Command exited with non-zero status 1
   > ```
   > 
   > This one looks like there is some useless white space in between 2 lines, the trailing whitespace should be removed (or you can just run `tox -e py3-yapf` to autoformat).
   > 
   > The first issue represents more of a real issue. You're trying to import `apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo.fill_in_missing` but that function doesn't exist as a top level exported function (it only exists as a function defined inside of the `preprocessing_fn` function and should be moved out if you want to reference it from other files)
   
   Thanks @damccorm , I did look at the console output but wasn’t able to get rid of the useless space between the 2 lines . I did not know about the import issue , will push a commit with appropriate changes 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1509936462

   I tried testing the helper function only with some values and it worked fine
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1552693796

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1508982988

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1627271039

   Run PythonLint PreCommit
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1739731692

   > able
   
   Sorry. I didn't get what you are saying. Can you provide which commands you are using?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] create helper function [beam]

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1784078363

   forgot about this issue tbh, will get this fixed up in 1-2 days


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1649146195

   The Jenkins tests are passing but GHA tests are not. Can you merge master branch on to yours?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1627311821

   Run PythonLint PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1632974903

   @AnandInguva 
   I was getting apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:30: error: Incompatible types in assignment (expression has type "None", variable has type "Callable[[Any, Any], Any]")  [assignment] error so I changed the import statement of fill in missing to account for None as 
   fill_in_missing: Optional[Callable[[tf.sparse.SparseTensor, int], tf.Tensor]] = None .
    But now I am getting AttributeError: 'NoneType' object has no attribute 'sparse' because tf is not getting imported correctly and is present in the same try block. can you suggest some workaround for this or the original error?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1635928436

   > @AnandInguva I was getting apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:30: error: Incompatible types in assignment (expression has type "None", variable has type "Callable[[Any, Any], Any]") [assignment] error so I changed the import statement of fill in missing to account for None as fill_in_missing: Optional[Callable[[tf.sparse.SparseTensor, int], tf.Tensor]] = None . But now I am getting AttributeError: 'NoneType' object has no attribute 'sparse' because tf is not getting imported correctly and is present in the same try block. can you suggest some workaround for this or the original error?
   
   Thanks. I will take a look in some time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505870810

   Contributors can as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1164342485


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -132,15 +132,19 @@ def preprocessing_fn(inputs):
     result = {'clicked': inputs['clicked']}
     for name in _INTEGER_COLUMN_NAMES:
       feature = inputs[name]
-      # TODO(https://github.com/apache/beam/issues/24902):
-      #  Replace this boilerplate with a helper function.
-      # This is a SparseTensor because it is optional. Here we fill in a
-      # default value when it is missing.
-      feature = tft.sparse_tensor_to_dense_with_shape(
-          feature, [None, 1], default_value=-1)
-      # Reshaping from a batch of vectors of size 1 to a batch of scalars and
-      # adding a bucketized version.
-      feature = tf.squeeze(feature, axis=1)
+      
+      def fill_in_missing(feature, default_value=-1):
+        feature = tf.sparse.SparseTensor(
+            indices=feature.indices,
+            values=feature.values,
+            dense_shape=[feature.dense_shape[0], 1])
+        feature = tf.sparse_to_dense(feature, default_value=default_value)
+        # Reshaping from a batch of vectors of size 1 to a batch of scalars and
+        # adding a bucketized version.
+        feature = tf.squeeze(feature, axis=1)
+        return feature
+      
+      fill_in_missing(feature)

Review Comment:
   We probably need to reassign these back to feature, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505851743

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1544147222

   R: @AnandInguva 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1507153836

   @damccorm the TFT criteo benchmark test is giving the following error
   ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/home/jenkins/jenkins-slave/workspace/beam_CloudML_Benchmarks_Dataflow_PR/src/build/gradleenv/-1734967050/lib/python3.9/site-packages/google/protobuf/internal/__init__.py)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1553402349

   @AnandInguva is tensorflow not included in python containers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1544216787

   Please create a new test file 
   `sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py` and place the test in this file. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1266713454


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,57 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import numpy as np
+import pytest
+
+from typing import Any, Callable, Optional
+
+try:
+  import tensorflow_transform as tft
+  import tensorflow as tf
+  from apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo import fill_in_missing
+except ImportError:
+  tft = None
+  tf = None
+  fill_in_missing : Optional[Callable[[tf.sparse.SparseTensor, int], tf.Tensor]] = None

Review Comment:
   ```suggestion
     except ImportError:
     tft = None  # type: ignore[assignment]
   
   if not tft:
     raise unittest.SkipTest('tensorflow_transform is not installed.')
   ```



##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -110,6 +113,20 @@ def make_input_feature_spec(include_label=True):
   return result
 
 
+def fill_in_missing(feature, default_value=-1):
+  if tf is not None:

Review Comment:
   remove this if condition as well.



##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,57 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import numpy as np
+import pytest
+
+from typing import Any, Callable, Optional
+
+try:
+  import tensorflow_transform as tft
+  import tensorflow as tf
+  from apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo import fill_in_missing
+except ImportError:
+  tft = None
+  tf = None
+  fill_in_missing : Optional[Callable[[tf.sparse.SparseTensor, int], tf.Tensor]] = None

Review Comment:
   Try with this change. It should pass.



##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -20,8 +20,11 @@
 from __future__ import division
 from __future__ import print_function
 
-import tensorflow as tf
-import tensorflow_transform as tft
+try:
+  import tensorflow as tf
+  import tensorflow_transform as tft
+except ImportError as e:

Review Comment:
   you don't need this try except here. It is needed only in the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1621792014

   @AnandInguva 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1528956159

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on a diff in pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1266781469


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -20,8 +20,11 @@
 from __future__ import division
 from __future__ import print_function
 
-import tensorflow as tf
-import tensorflow_transform as tft
+try:
+  import tensorflow as tf
+  import tensorflow_transform as tft
+except ImportError as e:

Review Comment:
   Yeah I've imported criteo.py in a try block



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1553407783

   > > @AnandInguva is tensorflow not included in python containers?
   > 
   > Tensorflow is included but not tensorflow_transform
   
   I am getting ModuleNotFoundError: No module named 'tensorflow' in Python unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1621903255

   Hey @smeet07 typically an author should be responsible for fixing issues with lint/formatting and other CI breakages. The logs will usually have the information needed to find the problem and fix it.
   
   In this case, looking at the logs I see the following issue in the PythonLint check:
   
   `apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:24: error: Module 'apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo' has no attribute 'fill_in_missing'  [attr-defined]`
   
   and the following issue in the `PythonFormatter` check:
   
   ```
   03:32:11 --- apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py	(original)
   03:32:11 +++ apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py	(reformatted)
   03:32:11 @@ -135,7 +135,7 @@
   03:32:11      result = {'clicked': inputs['clicked']}
   03:32:11      for name in _INTEGER_COLUMN_NAMES:
   03:32:11        feature = inputs[name]
   03:32:11 -      
   03:32:11 +
   03:32:11        def fill_in_missing(feature, default_value=-1):
   03:32:11          if tf is not None:
   03:32:11            feature = tf.sparse.SparseTensor(
   03:32:11 Command exited with non-zero status 1
   ```
   
   This one looks like there is some useless white space in between  2 lines, the trailing whitespace should be removed (or you can just run `tox -e py3-yapf` to autoformat).
   
   The first issue represents more of a real issue. You're trying to import `apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo.fill_in_missing` but that function doesn't exist as a top level exported function (it only exists as a function defined inside of the `preprocessing_fn` function and should be moved out if you want to reference it from other files)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505811865

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505851630

   > could you guide me how to do it ?
   
   Sorry for the confusion, that was a command for our automation (our Jenkins setup will trigger builds based on comments in the PR). So just commenting `Run TFT Criteo Benchmarks` should eventually cause it to run. I'll comment again so it triggers against the most recent commit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505868467

   Oh, can only reviewers run the tests or the contributors as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1509096097

   Run TFT Criteo Benchmarks
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] create helper function [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-2049696711

   @smeet07 are you still working on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26225: create helper function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1519053144

   Reminder, please take a look at this pr: @damccorm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1530071809

   > Also is there a naming convention for the unit test I need follow?
   
   You can see some examples. For example, https://github.com/apache/beam/blob/35406e08678c08e87e4d0ef87c4f29b5bf56872a/sdks/python/apache_beam/ml/inference/pytorch_inference_test.py#LL255. There is no naming convention except for the test name should start with `test_` and the test name should clearly define what it is doing(would be easier for others to read)
   
   Also, since this is a tft test, use the pytest marker `uses_tft` so that relevant dependencies are installed to run this test. Look here https://github.com/apache/beam/blob/35406e08678c08e87e4d0ef87c4f29b5bf56872a/sdks/python/apache_beam/testing/benchmarks/cloudml/cloudml_benchmark_test.py#L59
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1530064423

   > Where should I put this unit test?
   
   sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1530063226

   Where should I put this unit test?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1529073902

   I have one request. Can you add a unit test to the feature that you added?  That would ensure everything works as expected. There are no unit tests for this file yet. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1621826487

   Hey seems like there are lint and formatting errors.  
   ![image](https://github.com/apache/beam/assets/34158215/83cdadbe-02e8-4331-8cda-4504eca41d54)
   
   
   Can you fix them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1509122717

   On that note, I would suggest to do some testing locally for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1508982822

   Lets run it again, I think it may be fixed by https://github.com/apache/beam/commit/0dcb26dc4efde36e14e381e9fc0bfcd35ee96c26


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] create helper function [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1783066733

   @smeet07 any luck getting this fixed up? Sounds like your local git config is currently broken, my recommendation would probably be to try uninstalling/reinstalling.
   
   You could also try opening this in a GitHub codespace - https://github.com/features/codespaces


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26225: create helper function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1504214491

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @damccorm for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on a diff in pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1164352659


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -132,15 +132,19 @@ def preprocessing_fn(inputs):
     result = {'clicked': inputs['clicked']}
     for name in _INTEGER_COLUMN_NAMES:
       feature = inputs[name]
-      # TODO(https://github.com/apache/beam/issues/24902):
-      #  Replace this boilerplate with a helper function.
-      # This is a SparseTensor because it is optional. Here we fill in a
-      # default value when it is missing.
-      feature = tft.sparse_tensor_to_dense_with_shape(
-          feature, [None, 1], default_value=-1)
-      # Reshaping from a batch of vectors of size 1 to a batch of scalars and
-      # adding a bucketized version.
-      feature = tf.squeeze(feature, axis=1)
+      
+      def fill_in_missing(feature, default_value=-1):
+        feature = tf.sparse.SparseTensor(
+            indices=feature.indices,
+            values=feature.values,
+            dense_shape=[feature.dense_shape[0], 1])
+        feature = tf.sparse_to_dense(feature, default_value=default_value)
+        # Reshaping from a batch of vectors of size 1 to a batch of scalars and
+        # adding a bucketized version.
+        feature = tf.squeeze(feature, axis=1)
+        return feature
+      
+      fill_in_missing(feature)

Review Comment:
   right 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1520527989

   Unfortunately, we probably should wait until the full workflow can be tested before merging changes. This change seems like its doing roughly the right thing, once TFT is released we should definitely revisit this. Sorry @smeet07, we can leave the PR open until then though


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1521580344

   @damccorm @AnandInguva The TFT Criteo benchmark test is now passing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1553289206

   Run TFT Criteo Benchmarks
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1553403877

   > @AnandInguva is tensorflow not included in python containers?
   
   Tensorflow is included but not tensorflow_transform


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1621885837

    
   
   > Hey seems like there are lint and formatting errors. ![image](https://user-images.githubusercontent.com/34158215/251174708-83cdadbe-02e8-4331-8cda-4504eca41d54.png)
   > 
   > Can you fix them?
   
   I ran the auto formatter command but still doesn’t pass the check, could you fix them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] create helper function [beam]

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1742521123

   > > able
   > 
   > Sorry. I didn't get what you are saying. Can you provide which commands you are using?
   
   git clone <url> is not copying the whole code for some reason in my new laptop


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1649253103

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1530068925

   Also is there a naming convention for the unit test I need follow?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505547704

   > Thanks for doing this! Looks like there are some formatting/linting issues - you can get more detail on them by looking in the logs (linked in the checks section) or by running them locally with the commands described here - https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks
   
   I tried solving them by referring to the console output but seems like there is still something I'm missing out , I'll look into it again


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1505532177

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #26225: create helper function

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1506923558

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26225: create helper function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1520530074

   Stopping reviewer notifications for this pull request: requested by reviewer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1520537217

   Run TFT Criteo Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1210431176


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo_test.py:
##########
@@ -0,0 +1,45 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+import pytest
+import numpy as np
+from .criteo import fill_in_missing
+try:
+  import tensorflow_transform as tft
+  import tensorflow as tf

Review Comment:
   You need to move `from .criteo import fill_in_missing` into the try except. Also, please use absolute imports `apache_beam.testing.benchmarks.cloudml.criteo_tft.criteo`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on PR #26225:
URL: https://github.com/apache/beam/pull/26225#issuecomment-1739706655

   @AnandInguva I'm not able to merge it on my this computer as git clone is not cloning the whole repo


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #26225: create helper function

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1266774333


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -20,8 +20,11 @@
 from __future__ import division
 from __future__ import print_function
 
-import tensorflow as tf
-import tensorflow_transform as tft
+try:
+  import tensorflow as tf
+  import tensorflow_transform as tft
+except ImportError as e:

Review Comment:
   You should import `criteo.py` in tests under a `try` `except` block since `criteo.py` has a dependency on `tft` which might not be installed on all test environments. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] smeet07 commented on a diff in pull request #26225: create helper function

Posted by "smeet07 (via GitHub)" <gi...@apache.org>.
smeet07 commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1266772331


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -20,8 +20,11 @@
 from __future__ import division
 from __future__ import print_function
 
-import tensorflow as tf
-import tensorflow_transform as tft
+try:
+  import tensorflow as tf
+  import tensorflow_transform as tft
+except ImportError as e:

Review Comment:
   Not adding it was causing import error that is why added it , I'll remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org