Posted to github@beam.apache.org by "damccorm (via GitHub)" <gi...@apache.org> on 2023/04/12 16:00:55 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #26225: create helper function

damccorm commented on code in PR #26225:
URL: https://github.com/apache/beam/pull/26225#discussion_r1164342485


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -132,15 +132,19 @@ def preprocessing_fn(inputs):
     result = {'clicked': inputs['clicked']}
     for name in _INTEGER_COLUMN_NAMES:
       feature = inputs[name]
-      # TODO(https://github.com/apache/beam/issues/24902):
-      #  Replace this boilerplate with a helper function.
-      # This is a SparseTensor because it is optional. Here we fill in a
-      # default value when it is missing.
-      feature = tft.sparse_tensor_to_dense_with_shape(
-          feature, [None, 1], default_value=-1)
-      # Reshaping from a batch of vectors of size 1 to a batch of scalars and
-      # adding a bucketized version.
-      feature = tf.squeeze(feature, axis=1)
+      
+      def fill_in_missing(feature, default_value=-1):
+        feature = tf.sparse.SparseTensor(
+            indices=feature.indices,
+            values=feature.values,
+            dense_shape=[feature.dense_shape[0], 1])
+        feature = tf.sparse_to_dense(feature, default_value=default_value)
+        # Reshaping from a batch of vectors of size 1 to a batch of scalars and
+        # adding a bucketized version.
+        feature = tf.squeeze(feature, axis=1)
+        return feature
+      
+      fill_in_missing(feature)

Review Comment:
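   We probably need to assign the result back to `feature`, right? As written, the return value of `fill_in_missing(feature)` is discarded, so the rest of the loop still sees the original sparse `feature`.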
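
To illustrate the concern, here is a minimal sketch in plain Python rather than TensorFlow. The dict layout (`indices`, `values`, `batch_size`) and the body of this `fill_in_missing` are hypothetical stand-ins chosen for illustration; they are not the PR's code and not a TensorFlow type. The only point is that the helper returns a new value, so a bare call leaves `feature` unchanged.

```python
def fill_in_missing(sparse_column, default_value=-1):
    """Hypothetical pure-Python stand-in for the helper in the diff.

    sparse_column is a dict with 'indices' (list of (row, col) pairs),
    'values' and 'batch_size'. This layout is an assumption made for
    illustration only.
    """
    dense = [default_value] * sparse_column["batch_size"]
    for (row, _col), value in zip(sparse_column["indices"],
                                  sparse_column["values"]):
        dense[row] = value
    return dense  # a new object; the argument is left untouched


# One optional column for a batch of three rows; row 1 has no value.
feature = {"indices": [(0, 0), (2, 0)], "values": [5, 7], "batch_size": 3}

# As in the diff: the return value is dropped, so `feature` is still sparse.
fill_in_missing(feature)
still_sparse = isinstance(feature, dict)

# With the reassignment: `feature` now holds the dense batch of scalars.
feature = fill_in_missing(feature)
```

If the same holds for the real helper, the call in the diff would presumably become `feature = fill_in_missing(feature)`.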


