Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/09/11 16:34:14 UTC

[GitHub] [beam] Edusanc95 commented on a change in pull request #15450: [BEAM-12701] Added extra parameter in to_csv for DeferredFrame to name the PTransform label

Edusanc95 commented on a change in pull request #15450:
URL: https://github.com/apache/beam/pull/15450#discussion_r706633103



##########
File path: sdks/python/apache_beam/dataframe/io.py
##########
@@ -74,16 +74,17 @@ def read_csv(path, *args, splittable=False, **kwargs):
       splitter=_CsvSplitter(args, kwargs) if splittable else None)
 
 
-def _as_pc(df):
+def _as_pc(df, label=None):
   from apache_beam.dataframe import convert  # avoid circular import
   # TODO(roberwb): Amortize the computation for multiple writes?
-  return convert.to_pcollection(df, yield_elements='pandas')
+  return convert.to_pcollection(df, yield_elements='pandas', label=label)
 
 
 @frame_base.with_docs_from(pd.DataFrame)
-def to_csv(df, path, *args, **kwargs):
-
-  return _as_pc(df) | _WriteToPandas(
+def to_csv(df, path, transform_label=None, *args, **kwargs):
+  label_pc = f"{transform_label} - ToPCollection" if transform_label else "ToPCollection(df)"
+  label_pd = f"{transform_label} - ToPandasDataFrame" if transform_label else "ToPandasDataFrame(df)"

Review comment:
       Hello! I agree with your remarks. I just pushed a commit that includes this change as well as the linter fix.
   
   The label is slightly different: `WriteToPandas(df) - {path}` instead of `{path} - WriteToPandas(df)`. I think that, at a glance, it makes more sense to see the operation being performed first and the specific file it applies to afterwards.
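
   As a rough sketch of the ordering described above (the helper name `default_write_label` and its `operation` parameter are hypothetical, used only for illustration; they are not part of this PR or of the Beam API), the default label could be built like this:

```python
def default_write_label(path, operation="WriteToPandas(df)"):
    """Build a label that names the operation first, then the target path.

    Hypothetical helper for illustration only; not part of apache_beam.
    """
    return f"{operation} - {path}"


# The operation leads, so labels for different files share a common prefix.
print(default_write_label("/tmp/out.csv"))
```

   With the operation first, labels for several writes in the same pipeline start with the same prefix and differ only in the path, which is what makes them easier to scan.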



