You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/09/20 00:12:18 UTC

[GitHub] [beam] TheNeuralBit opened a new pull request #12882: [BEAM-10814] DataframeTransform outputs elements

TheNeuralBit opened a new pull request #12882:
URL: https://github.com/apache/beam/pull/12882


   This PR adds the ability for `DataframeTransform` to produce element-wise PCollections, via a new `UnbatchPandas` transform. The original behavior is still possible, with a `yield_dataframes=True` option.
   
   In order to support this, the PR also adds more robust support for converting between native Python types and pandas/numpy dtypes. Note that pandas' catch-all `np.object` type is mapped to `typing.Any`. This PR also closes BEAM-10570 - it adds support for RowCoder to encode fields with arbitrary Python types using FastPrimitivesCoder.
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r493712516



##########
File path: build.gradle
##########
@@ -199,7 +199,6 @@ task goIntegrationTests() {
 
 task pythonPreCommit() {
   dependsOn ":sdks:python:test-suites:tox:pycommon:preCommitPyCommon"
-  dependsOn ":sdks:python:test-suites:tox:py35:preCommitPy35"

Review comment:
       FYI @tvalentyn I just merged this PR including a commit to drop python 3.5 in the precommit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r491664979



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+

Review comment:
       extra whitespace?

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types

Review comment:
       We should consider whether we want flattening here (e.g. with dotted attributes). Let's at least mark this paragraph as subject to change.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types
+(Sequence, Mapping, nested NamedTuple types), will be shunted lossily to
+np.object/Any.
+
+TODO: Mapping for date/time types
+https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
+
+timestamps and timedeltas in pandas always use nanosecond precision
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
-import typing
+from typing import Any
+from typing import NamedTuple
+from typing import Optional
+from typing import TypeVar
+from typing import Union
 
+import numpy as np
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import typehints
+from apache_beam.portability.api import schema_pb2
 from apache_beam.transforms.util import BatchElements
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
 from apache_beam.typehints.schemas import named_fields_from_element_type
+from apache_beam.typehints.schemas import named_fields_to_schema
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+from apache_beam.utils import proto_utils
+
+__all__ = (
+    'BatchRowsAsDataFrame',
+    'generate_proxy',
+    'UnbatchPandas',
+    'element_type_from_proxy')
+
+T = TypeVar('T', bound=NamedTuple)
+
+PD_MAJOR, _, _ = map(int, pd.__version__.split('.'))

Review comment:
       This will break for betas, rcs, etc. Maybe just do `int(pd.__version__.split('.')[0])`

##########
File path: sdks/python/apache_beam/dataframe/convert.py
##########
@@ -118,6 +119,15 @@ def extract_input(placeholder):
              } | label >> transforms._DataframeExpressionsTransform(
                  dict((ix, df._expr) for ix, df in enumerate(
                      dataframes)))  # type: Dict[Any, pvalue.PCollection]
+
+  if not yield_dataframes:
+    results = {
+        key: pc | "Unbatch '%s'" % dataframes[key]._expr._id >>
+        schemas.UnbatchPandas(dataframes[key]._expr.proxy())
+        for key,

Review comment:
       Put ()'s around (key, pc) for better formatting.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):

Review comment:
       Maybe `..._from_dataframe`? (Proxy may not be a dataframe.)

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.

Review comment:
       Is whitespace stripped at the beginning of a docstring? (Similarly below.)

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I was wondering about this as well--do we want to return the index iff it's a multi-index or it's named? Should we make whether to return the index another option? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492219317



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I thought the MultiIndex or named case was important since otherwise we'll drop the grouped column(s) when unbatching the result of a grouped aggregation.
   
   It raise some tricky issues though:
   - Index names are not required to be unique.
   - It looks like my assumption that all MultiIndexes are named is wrong. It's possible to create a `MultiIndex` with `names=[None, None, 'foo']`, which would break this badly.
   - Type information is not necessarily preserved in indexes. e.g. Int64Index doesn't support nulls like Series with Int64Dtype does. if one is added it's converted to a Float64Index with nans.
   
   Maybe including the index shouldn't be the default until we have a better handle on these edge cases.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       We could log a warning if there's a named index in the result and `include_indexes` is `False`

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):

Review comment:
       Done

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.

Review comment:
       Done

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+

Review comment:
       Removed

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types
+(Sequence, Mapping, nested NamedTuple types), will be shunted lossily to
+np.object/Any.
+
+TODO: Mapping for date/time types
+https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
+
+timestamps and timedeltas in pandas always use nanosecond precision
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
-import typing
+from typing import Any
+from typing import NamedTuple
+from typing import Optional
+from typing import TypeVar
+from typing import Union
 
+import numpy as np
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import typehints
+from apache_beam.portability.api import schema_pb2
 from apache_beam.transforms.util import BatchElements
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
 from apache_beam.typehints.schemas import named_fields_from_element_type
+from apache_beam.typehints.schemas import named_fields_to_schema
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+from apache_beam.utils import proto_utils
+
+__all__ = (
+    'BatchRowsAsDataFrame',
+    'generate_proxy',
+    'UnbatchPandas',
+    'element_type_from_proxy')
+
+T = TypeVar('T', bound=NamedTuple)
+
+PD_MAJOR, _, _ = map(int, pd.__version__.split('.'))

Review comment:
       Fixed, thanks

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types

Review comment:
       SG, I added a sentence indicating we might add better support for these types in the future.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I added an `included_indexes` option on `DataframeTransform`, `to_pcollection`, and `UnbatchPandas`. It raises an exception if used when index names are not unique or unnamed. PTAL

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I added an `include_indexes` option on `DataframeTransform`, `to_pcollection`, and `UnbatchPandas`. It raises an exception if used when index names are not unique or unnamed. PTAL




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-695369909


   R: @robertwb
   
   Sorry I didn't get this out sooner, will you be able to review before the release cut?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit merged pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit merged pull request #12882:
URL: https://github.com/apache/beam/pull/12882


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492330184



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I added an `included_indexes` option on `DataframeTransform`, `to_pcollection`, and `UnbatchPandas`. It raises an exception if used when index names are not unique or unnamed. PTAL




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-697053216


   Run Portable_Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] commented on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `96.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         452      454       +2     
     Lines       54016    54200     +184     
   ==========================================
   + Hits        44481    44644     +163     
   - Misses       9535     9556      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <0.00%> (-0.63%)` | :arrow_down: |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [067cba8...fb60b69](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696915376






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `96.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         452      454       +2     
     Lines       54016    54200     +184     
   ==========================================
   + Hits        44481    44644     +163     
   - Misses       9535     9556      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <0.00%> (-0.63%)` | :arrow_down: |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [067cba8...fb60b69](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit merged pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit merged pull request #12882:
URL: https://github.com/apache/beam/pull/12882


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492219936



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       We could log a warning if there's a named index in the result and `include_indexes` is `False`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r491664979



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+

Review comment:
       extra whitespace?

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types

Review comment:
       We should consider whether we want flattening here (e.g. with dotted attributes). Let's at least mark this paragraph as subject to change.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types
+(Sequence, Mapping, nested NamedTuple types), will be shunted lossily to
+np.object/Any.
+
+TODO: Mapping for date/time types
+https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
+
+timestamps and timedeltas in pandas always use nanosecond precision
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
-import typing
+from typing import Any
+from typing import NamedTuple
+from typing import Optional
+from typing import TypeVar
+from typing import Union
 
+import numpy as np
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import typehints
+from apache_beam.portability.api import schema_pb2
 from apache_beam.transforms.util import BatchElements
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
 from apache_beam.typehints.schemas import named_fields_from_element_type
+from apache_beam.typehints.schemas import named_fields_to_schema
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+from apache_beam.utils import proto_utils
+
+__all__ = (
+    'BatchRowsAsDataFrame',
+    'generate_proxy',
+    'UnbatchPandas',
+    'element_type_from_proxy')
+
+T = TypeVar('T', bound=NamedTuple)
+
+PD_MAJOR, _, _ = map(int, pd.__version__.split('.'))

Review comment:
       This will break for betas, rcs, etc. Maybe just do `int(pd.__version__.split('.')[0])`

##########
File path: sdks/python/apache_beam/dataframe/convert.py
##########
@@ -118,6 +119,15 @@ def extract_input(placeholder):
              } | label >> transforms._DataframeExpressionsTransform(
                  dict((ix, df._expr) for ix, df in enumerate(
                      dataframes)))  # type: Dict[Any, pvalue.PCollection]
+
+  if not yield_dataframes:
+    results = {
+        key: pc | "Unbatch '%s'" % dataframes[key]._expr._id >>
+        schemas.UnbatchPandas(dataframes[key]._expr.proxy())
+        for key,

Review comment:
       Put ()'s around (key, pc) for better formatting.

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):

Review comment:
       Maybe `..._from_dataframe`? (Proxy may not be a dataframe.)

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.

Review comment:
       Is whitespace stripped at the beginning of a docstring? (Similarly below.)

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I was wondering about this as well--do we want to return the index iff it's a multi-index or it's named? Should we make whether to return the index another option? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `96.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         452      454       +2     
     Lines       54016    54200     +184     
   ==========================================
   + Hits        44481    44644     +163     
   - Misses       9535     9556      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <0.00%> (-0.63%)` | :arrow_down: |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [067cba8...fb60b69](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696915376


   Some tests are only failing on Jenkins with Python 3.5, and I can't reproduce locally. I suspect it's because I have 3.5.4 locally and Jenkins is using 3.5.2. I'm not sure it's worth debugging the issue in 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492328919



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):

Review comment:
       Done

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.

Review comment:
       Done

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+

Review comment:
       Removed

##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types
+(Sequence, Mapping, nested NamedTuple types), will be shunted lossily to
+np.object/Any.
+
+TODO: Mapping for date/time types
+https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
+
+timestamps and timedeltas in pandas always use nanosecond precision
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
-import typing
+from typing import Any
+from typing import NamedTuple
+from typing import Optional
+from typing import TypeVar
+from typing import Union
 
+import numpy as np
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import typehints
+from apache_beam.portability.api import schema_pb2
 from apache_beam.transforms.util import BatchElements
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
 from apache_beam.typehints.schemas import named_fields_from_element_type
+from apache_beam.typehints.schemas import named_fields_to_schema
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+from apache_beam.utils import proto_utils
+
+__all__ = (
+    'BatchRowsAsDataFrame',
+    'generate_proxy',
+    'UnbatchPandas',
+    'element_type_from_proxy')
+
+T = TypeVar('T', bound=NamedTuple)
+
+PD_MAJOR, _, _ = map(int, pd.__version__.split('.'))

Review comment:
       Fixed, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492329417



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
 # limitations under the License.
 #
 
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype               Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are trested as Any and shunted to
+np.object:
+
+np.object               <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+  np.object.
+
+pandas 0.x:
+np.object         <------ Optional[bool]
+                     \--- Optional[str]
+                      \-- str
+
+pandas 1.x:
+pd.BooleanDType() <-----> Optional[bool]
+pd.StringDType()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types

Review comment:
       SG, I added a sentence indicating we might add better support for these types in the future.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.18%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.40%   -0.19%     
   ==========================================
     Files         454      454              
     Lines       55403    54355    -1048     
   ==========================================
   - Hits        45757    44791     -966     
   + Misses       9646     9564      -82     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [35 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...369bba0](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] commented on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `96.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         452      454       +2     
     Lines       54016    54200     +184     
   ==========================================
   + Hits        44481    44644     +163     
   - Misses       9535     9556      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <0.00%> (-0.63%)` | :arrow_down: |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [067cba8...fb60b69](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.17%`.
   > The diff coverage is `91.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.41%   -0.18%     
   ==========================================
     Files         454      454              
     Lines       55403    54352    -1051     
   ==========================================
   - Hits        45757    44793     -964     
   + Misses       9646     9559      -87     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `94.75% <90.90%> (-0.24%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | ... and [38 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [65047c6...c86da95](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.17%`.
   > The diff coverage is `91.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.41%   -0.18%     
   ==========================================
     Files         454      454              
     Lines       55403    54352    -1051     
   ==========================================
   - Hits        45757    44793     -964     
   + Misses       9646     9559      -87     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `94.75% <90.90%> (-0.24%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | ... and [38 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [65047c6...c86da95](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.17%`.
   > The diff coverage is `91.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.41%   -0.18%     
   ==========================================
     Files         454      454              
     Lines       55403    54352    -1051     
   ==========================================
   - Hits        45757    44793     -964     
   + Misses       9646     9559      -87     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `94.75% <90.90%> (-0.24%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | ... and [38 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [65047c6...c86da95](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.19%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.39%   -0.20%     
   ==========================================
     Files         454      454              
     Lines       55403    54299    -1104     
   ==========================================
   - Hits        45757    44742    -1015     
   + Misses       9646     9557      -89     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [26 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...c80993d](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492330184



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I added an `include_indexes` option on `DataframeTransform`, `to_pcollection`, and `UnbatchPandas`. It raises an exception if used when index names are not unique or unnamed. PTAL




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.17%`.
   > The diff coverage is `91.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.41%   -0.18%     
   ==========================================
     Files         454      454              
     Lines       55403    54352    -1051     
   ==========================================
   - Hits        45757    44793     -964     
   + Misses       9646     9559      -87     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `94.75% <90.90%> (-0.24%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | ... and [38 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [65047c6...c86da95](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492465294



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       +1 We should probably allow an explicit `include_indexes=False` to not raise an exception. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.18%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.40%   -0.19%     
   ==========================================
     Files         454      454              
     Lines       55403    54355    -1048     
   ==========================================
   - Hits        45757    44791     -966     
   + Misses       9646     9564      -82     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [35 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [65047c6...c86da95](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `96.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         452      454       +2     
     Lines       54016    54200     +184     
   ==========================================
   + Hits        44481    44644     +163     
   - Misses       9535     9556      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <0.00%> (-0.63%)` | :arrow_down: |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | ... and [9 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [067cba8...fb60b69](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492219317



##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -55,17 +159,149 @@ def expand(self, pcoll):
         lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-    return pd.Series(name=name, dtype=typ)
-  except TypeError:
-    raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-      name: _make_empty_series(name, typ)
-      for name,
-      typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+      {name: _make_proxy_series(name, typehint)
+       for name, typehint in fields},
+      columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
       I thought the MultiIndex or named case was important since otherwise we'll drop the grouped column(s) when unbatching the result of a grouped aggregation.
   
   It raise some tricky issues though:
   - Index names are not required to be unique.
   - It looks like my assumption that all MultiIndexes are named is wrong. It's possible to create a `MultiIndex` with `names=[None, None, 'foo']`, which would break this badly.
   - Type information is not necessarily preserved in indexes. e.g. Int64Index doesn't support nulls like Series with Int64Dtype does. if one is added it's converted to a Float64Index with nans.
   
   Maybe including the index shouldn't be the default until we have a better handle on these edge cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.19%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.39%   -0.20%     
   ==========================================
     Files         454      454              
     Lines       55403    54299    -1104     
   ==========================================
   - Hits        45757    44742    -1015     
   + Misses       9646     9557      -89     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [26 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...369bba0](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.19%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.39%   -0.20%     
   ==========================================
     Files         454      454              
     Lines       55403    54299    -1104     
   ==========================================
   - Hits        45757    44742    -1015     
   + Misses       9646     9557      -89     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [26 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...c80993d](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **increase** coverage by `0.02%`.
   > The diff coverage is `94.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   + Coverage   82.34%   82.36%   +0.02%     
   ==========================================
     Files         454      454              
     Lines       54119    54200      +81     
   ==========================================
   + Hits        44566    44644      +78     
   - Misses       9553     9556       +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `89.62% <89.62%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | [sdks/python/apache\_beam/coders/row\_coder.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL3Jvd19jb2Rlci5weQ==) | `93.66% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/doctests.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2RvY3Rlc3RzLnB5) | `97.20% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `89.83% <100.00%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/transforms.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3RyYW5zZm9ybXMucHk=) | `95.54% <100.00%> (+0.05%)` | :arrow_up: |
   | [sdks/python/apache\_beam/typehints/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYXMucHk=) | `93.33% <100.00%> (+0.12%)` | :arrow_up: |
   | [...eam/runners/interactive/options/capture\_control.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9vcHRpb25zL2NhcHR1cmVfY29udHJvbC5weQ==) | `92.00% <0.00%> (-8.00%)` | :arrow_down: |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `94.78% <0.00%> (-1.74%)` | :arrow_down: |
   | ... and [12 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...c80993d](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-697073685






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] codecov[bot] edited a comment on pull request #12882: [BEAM-10814][BEAM-10570] DataframeTransform outputs elements

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #12882:
URL: https://github.com/apache/beam/pull/12882#issuecomment-696913901


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=h1) Report
   > Merging [#12882](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=desc) into [master](https://codecov.io/gh/apache/beam/commit/a4ea68ced3deb4a6e5ae745dedf8ac65740f79ab?el=desc) will **decrease** coverage by `0.19%`.
   > The diff coverage is `91.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/12882/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #12882      +/-   ##
   ==========================================
   - Coverage   82.58%   82.39%   -0.20%     
   ==========================================
     Files         454      454              
     Lines       55403    54299    -1104     
   ==========================================
   - Hits        45757    44742    -1015     
   + Misses       9646     9557      -89     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frame\_base.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lX2Jhc2UucHk=) | `84.97% <ø> (ø)` | |
   | [sdks/python/apache\_beam/io/external/snowflake.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwvc25vd2ZsYWtlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...beam/runners/interactive/background\_caching\_job.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9iYWNrZ3JvdW5kX2NhY2hpbmdfam9iLnB5) | `96.52% <ø> (-1.01%)` | :arrow_down: |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `79.78% <44.44%> (-0.28%)` | :arrow_down: |
   | [...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=) | `89.97% <58.33%> (-2.18%)` | :arrow_down: |
   | [...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5) | `92.80% <84.61%> (-3.85%)` | :arrow_down: |
   | [sdks/python/apache\_beam/dataframe/convert.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2NvbnZlcnQucHk=) | `88.88% <85.71%> (-0.70%)` | :arrow_down: |
   | [...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==) | `91.66% <91.66%> (ø)` | |
   | [sdks/python/apache\_beam/dataframe/io.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2lvLnB5) | `92.09% <92.09%> (+2.46%)` | :arrow_up: |
   | [sdks/python/apache\_beam/dataframe/schemas.py](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL3NjaGVtYXMucHk=) | `97.56% <97.22%> (-2.44%)` | :arrow_down: |
   | ... and [26 more](https://codecov.io/gh/apache/beam/pull/12882/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=footer). Last update [e3cdcc3...c80993d](https://codecov.io/gh/apache/beam/pull/12882?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org