You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "ahmedabu98 (via GitHub)" <gi...@apache.org> on 2023/05/08 22:13:51 UTC

[GitHub] [beam] ahmedabu98 opened a new pull request, #26593: Bigtable Read Xlang Wrapper for Python SDK

ahmedabu98 opened a new pull request, #26593:
URL: https://github.com/apache/beam/pull/26593

   Adding a Bigtable Read connector to Python, using the recently add [BigtableReadSchemaTransform](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigTableReadSchemaTransformProvider.java).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #26593:
URL: https://github.com/apache/beam/pull/26593#discussion_r1235680762


##########
sdks/python/apache_beam/io/gcp/bigtableio.py:
##########
@@ -227,3 +231,75 @@ def expand(self, pvalue):
                 beam_options['project_id'],
                 beam_options['instance_id'],
                 beam_options['table_id'])))
+
+
+class ReadFromBigtable(PTransform):
+  """Reads rows from Bigtable.
+
+  Returns a PCollection of PartialRowData objects, each representing a
+  Bigtable row. For more information about this row object, visit
+  https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey
+  """
+  URN = "beam:schematransform:org.apache.beam:bigtable_read:v1"
+
+  def __init__(self, table_id, instance_id, project_id, expansion_service=None):
+    """Initialize a ReadFromBigtable transform.
+
+    :param table_id:
+      The ID of the table to read from.
+    :param instance_id:
+      The ID of the instance where the table resides.
+    :param project_id:
+      The GCP project ID.
+    :param expansion_service:
+      The address of the expansion service. If no expansion service is
+      provided, will attempt to run the default GCP expansion service.
+    """
+    super().__init__()
+    self._table_id = table_id
+    self._instance_id = instance_id
+    self._project_id = project_id
+    self._expansion_service = (
+        expansion_service or BeamJarExpansionService(
+            'sdks:java:io:google-cloud-platform:expansion-service:build'))
+    self.schematransform_config = SchemaAwareExternalTransform.discover_config(
+        self._expansion_service, self.URN)
+
+  def expand(self, input):
+    external_read = SchemaAwareExternalTransform(
+        identifier=self.schematransform_config.identifier,
+        expansion_service=self._expansion_service,
+        rearrange_based_on_discovery=True,
+        tableId=self._table_id,
+        instanceId=self._instance_id,
+        projectId=self._project_id)
+
+    return (
+        input.pipeline
+        | external_read
+        | beam.ParDo(self._BeamRowToPartialRowData()))
+
+  # PartialRowData has some useful methods for querying data within a row.
+  # To make use of those methods and to give Python users a more familiar
+  # object, we process each Beam Row and return a PartialRowData equivalent.
+  class _BeamRowToPartialRowData(beam.DoFn):
+    def process(self, row):
+      key = row.key
+      families = row.column_families
+
+      # initialize PartialRowData object
+      partial_row: PartialRowData = PartialRowData(key)
+      for fam_name, col_fam in families.items():
+        if fam_name not in partial_row.cells:
+          partial_row.cells[fam_name] = {}
+        for col_qualifier, cells in col_fam.items():
+          # store column qualifier as bytes to follow PartialRowData behavior
+          col_qualifier_bytes = col_qualifier.encode()
+          if col_qualifier not in partial_row.cells[fam_name]:
+            partial_row.cells[fam_name][col_qualifier_bytes] = []

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1560026157

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564603809

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1551496848

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1566209363

   R: @chamikaramj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1539147398

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/26593?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#26593](https://app.codecov.io/gh/apache/beam/pull/26593?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cc2eae8) into [master](https://app.codecov.io/gh/apache/beam/commit/7bd1b331e4aff22e87e77e07e8bdda8717192a73?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7bd1b33) will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #26593      +/-   ##
   ==========================================
   - Coverage   72.06%   72.06%   -0.01%     
   ==========================================
     Files         745      745              
     Lines      101223   101222       -1     
   ==========================================
   - Hits        72950    72949       -1     
     Misses      26813    26813              
     Partials     1460     1460              
   ```
   
   
   | [Impacted Files](https://app.codecov.io/gh/apache/beam/pull/26593?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/io/gcp/bigtableio.py](https://app.codecov.io/gh/apache/beam/pull/26593?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3RhYmxlaW8ucHk=) | `74.39% <ø> (-0.31%)` | :arrow_down: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1551593619

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1555499300

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1562891872

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1552035829

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1542486918

   Fixes #26643


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1601379845

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1562899540

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1565071825

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1565071771

   Run Python_Xlang_Gcp_Direct PostCommit
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1565071718

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1563509960

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1600287140

   LGTM. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj merged pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj merged PR #26593:
URL: https://github.com/apache/beam/pull/26593


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1566209552

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1563291235

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1551443685

   Run  Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on code in PR #26593:
URL: https://github.com/apache/beam/pull/26593#discussion_r1233149294


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadSchemaTransformProvider.java:
##########
@@ -100,60 +100,55 @@ public List<String> outputCollectionNames() {
     return Collections.singletonList(OUTPUT_TAG);
   }
 
-  /** Configuration for reading from BigTable. */
+  /** Configuration for reading from Bigtable. */
   @DefaultSchema(AutoValueSchema.class)
   @AutoValue
-  public abstract static class BigTableReadSchemaTransformConfiguration {
-    /** Instantiates a {@link BigTableReadSchemaTransformConfiguration.Builder} instance. */
+  public abstract static class BigtableReadSchemaTransformConfiguration {
+    /** Instantiates a {@link BigtableReadSchemaTransformConfiguration.Builder} instance. */
+    public void validate() {
+      String emptyStringMessage =
+          "Invalid Bigtable Read configuration: %s should not be a non-empty String";
+      checkArgument(!this.getTableId().isEmpty(), String.format(emptyStringMessage, "table"));
+      checkArgument(!this.getInstanceId().isEmpty(), String.format(emptyStringMessage, "instance"));
+      checkArgument(!this.getProjectId().isEmpty(), String.format(emptyStringMessage, "project"));
+    }
+
     public static Builder builder() {
-      return new AutoValue_BigTableReadSchemaTransformProvider_BigTableReadSchemaTransformConfiguration
+      return new AutoValue_BigtableReadSchemaTransformProvider_BigtableReadSchemaTransformConfiguration
           .Builder();
     }
 
-    public abstract String getTable();
+    public abstract String getTableId();
 
-    public abstract String getInstance();
+    public abstract String getInstanceId();
 
-    public abstract String getProject();
+    public abstract String getProjectId();
 
-    /** Builder for the {@link BigTableReadSchemaTransformConfiguration}. */
+    /** Builder for the {@link BigtableReadSchemaTransformConfiguration}. */
     @AutoValue.Builder
     public abstract static class Builder {
-      public abstract Builder setTable(String table);
-
-      public abstract Builder setInstance(String instance);
-
-      public abstract Builder setProject(String project);
-
-      abstract BigTableReadSchemaTransformConfiguration autoBuild();
+      public abstract Builder setTableId(String table);

Review Comment:
   Nit: change variable names to match the setters (tableId, instanceId, projectId).



##########
sdks/python/apache_beam/io/gcp/bigtableio.py:
##########
@@ -227,3 +231,75 @@ def expand(self, pvalue):
                 beam_options['project_id'],
                 beam_options['instance_id'],
                 beam_options['table_id'])))
+
+
+class ReadFromBigtable(PTransform):
+  """Reads rows from Bigtable.
+
+  Returns a PCollection of PartialRowData objects, each representing a
+  Bigtable row. For more information about this row object, visit
+  https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey
+  """
+  URN = "beam:schematransform:org.apache.beam:bigtable_read:v1"
+
+  def __init__(self, table_id, instance_id, project_id, expansion_service=None):
+    """Initialize a ReadFromBigtable transform.
+
+    :param table_id:
+      The ID of the table to read from.
+    :param instance_id:
+      The ID of the instance where the table resides.
+    :param project_id:
+      The GCP project ID.
+    :param expansion_service:
+      The address of the expansion service. If no expansion service is
+      provided, will attempt to run the default GCP expansion service.
+    """
+    super().__init__()
+    self._table_id = table_id
+    self._instance_id = instance_id
+    self._project_id = project_id
+    self._expansion_service = (
+        expansion_service or BeamJarExpansionService(
+            'sdks:java:io:google-cloud-platform:expansion-service:build'))
+    self.schematransform_config = SchemaAwareExternalTransform.discover_config(
+        self._expansion_service, self.URN)
+
+  def expand(self, input):
+    external_read = SchemaAwareExternalTransform(
+        identifier=self.schematransform_config.identifier,
+        expansion_service=self._expansion_service,
+        rearrange_based_on_discovery=True,
+        tableId=self._table_id,
+        instanceId=self._instance_id,
+        projectId=self._project_id)
+
+    return (
+        input.pipeline
+        | external_read
+        | beam.ParDo(self._BeamRowToPartialRowData()))
+
+  # PartialRowData has some useful methods for querying data within a row.
+  # To make use of those methods and to give Python users a more familiar
+  # object, we process each Beam Row and return a PartialRowData equivalent.
+  class _BeamRowToPartialRowData(beam.DoFn):

Review Comment:
   Let's add unit tests to make sure that this conversion is correct for various data types etc.



##########
sdks/python/apache_beam/io/gcp/bigtableio.py:
##########
@@ -227,3 +231,75 @@ def expand(self, pvalue):
                 beam_options['project_id'],
                 beam_options['instance_id'],
                 beam_options['table_id'])))
+
+
+class ReadFromBigtable(PTransform):
+  """Reads rows from Bigtable.
+
+  Returns a PCollection of PartialRowData objects, each representing a
+  Bigtable row. For more information about this row object, visit
+  https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey
+  """
+  URN = "beam:schematransform:org.apache.beam:bigtable_read:v1"
+
+  def __init__(self, table_id, instance_id, project_id, expansion_service=None):
+    """Initialize a ReadFromBigtable transform.
+
+    :param table_id:
+      The ID of the table to read from.
+    :param instance_id:
+      The ID of the instance where the table resides.
+    :param project_id:
+      The GCP project ID.
+    :param expansion_service:
+      The address of the expansion service. If no expansion service is
+      provided, will attempt to run the default GCP expansion service.
+    """
+    super().__init__()
+    self._table_id = table_id
+    self._instance_id = instance_id
+    self._project_id = project_id
+    self._expansion_service = (
+        expansion_service or BeamJarExpansionService(
+            'sdks:java:io:google-cloud-platform:expansion-service:build'))
+    self.schematransform_config = SchemaAwareExternalTransform.discover_config(
+        self._expansion_service, self.URN)
+
+  def expand(self, input):
+    external_read = SchemaAwareExternalTransform(
+        identifier=self.schematransform_config.identifier,
+        expansion_service=self._expansion_service,
+        rearrange_based_on_discovery=True,
+        tableId=self._table_id,
+        instanceId=self._instance_id,
+        projectId=self._project_id)
+
+    return (
+        input.pipeline
+        | external_read
+        | beam.ParDo(self._BeamRowToPartialRowData()))
+
+  # PartialRowData has some useful methods for querying data within a row.
+  # To make use of those methods and to give Python users a more familiar
+  # object, we process each Beam Row and return a PartialRowData equivalent.
+  class _BeamRowToPartialRowData(beam.DoFn):
+    def process(self, row):
+      key = row.key
+      families = row.column_families
+
+      # initialize PartialRowData object
+      partial_row: PartialRowData = PartialRowData(key)
+      for fam_name, col_fam in families.items():
+        if fam_name not in partial_row.cells:
+          partial_row.cells[fam_name] = {}
+        for col_qualifier, cells in col_fam.items():
+          # store column qualifier as bytes to follow PartialRowData behavior
+          col_qualifier_bytes = col_qualifier.encode()
+          if col_qualifier not in partial_row.cells[fam_name]:
+            partial_row.cells[fam_name][col_qualifier_bytes] = []

Review Comment:
   Also, let's add unit tests to cover the inner loop here. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1562899674

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1563463294

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564653016

   Run Java_Spark3_Versions PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564652866

   Run Portable_Python PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1581261778

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #26593:
URL: https://github.com/apache/beam/pull/26593#discussion_r1235618745


##########
sdks/python/apache_beam/io/gcp/bigtableio.py:
##########
@@ -227,3 +231,75 @@ def expand(self, pvalue):
                 beam_options['project_id'],
                 beam_options['instance_id'],
                 beam_options['table_id'])))
+
+
+class ReadFromBigtable(PTransform):
+  """Reads rows from Bigtable.
+
+  Returns a PCollection of PartialRowData objects, each representing a
+  Bigtable row. For more information about this row object, visit
+  https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey
+  """
+  URN = "beam:schematransform:org.apache.beam:bigtable_read:v1"
+
+  def __init__(self, table_id, instance_id, project_id, expansion_service=None):
+    """Initialize a ReadFromBigtable transform.
+
+    :param table_id:
+      The ID of the table to read from.
+    :param instance_id:
+      The ID of the instance where the table resides.
+    :param project_id:
+      The GCP project ID.
+    :param expansion_service:
+      The address of the expansion service. If no expansion service is
+      provided, will attempt to run the default GCP expansion service.
+    """
+    super().__init__()
+    self._table_id = table_id
+    self._instance_id = instance_id
+    self._project_id = project_id
+    self._expansion_service = (
+        expansion_service or BeamJarExpansionService(
+            'sdks:java:io:google-cloud-platform:expansion-service:build'))
+    self.schematransform_config = SchemaAwareExternalTransform.discover_config(
+        self._expansion_service, self.URN)
+
+  def expand(self, input):
+    external_read = SchemaAwareExternalTransform(
+        identifier=self.schematransform_config.identifier,
+        expansion_service=self._expansion_service,
+        rearrange_based_on_discovery=True,
+        tableId=self._table_id,
+        instanceId=self._instance_id,
+        projectId=self._project_id)
+
+    return (
+        input.pipeline
+        | external_read
+        | beam.ParDo(self._BeamRowToPartialRowData()))
+
+  # PartialRowData has some useful methods for querying data within a row.
+  # To make use of those methods and to give Python users a more familiar
+  # object, we process each Beam Row and return a PartialRowData equivalent.
+  class _BeamRowToPartialRowData(beam.DoFn):

Review Comment:
   the data type returned from reading Bigtable cells is always bytes ([code documentation](https://cloud.google.com/java/docs/reference/google-cloud-bigtable/latest/com.google.bigtable.v2.Cell#com_google_bigtable_v2_Cell_getValue__))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564603605

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564653493

   Run Python_Xlang_Gcp_Direct PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1564653609

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1552036027

   Run Python_Xlang_Gcp_Dataflow PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #26593: Bigtable Read Xlang Wrapper for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on PR #26593:
URL: https://github.com/apache/beam/pull/26593#issuecomment-1600288936

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org