You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/01 05:52:27 UTC

[GitHub] [beam] piotr-szuberski opened a new pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

piotr-szuberski opened a new pull request #12145:
URL: https://github.com/apache/beam/pull/12145


   Encountered problems:
   1. For now the python transforms are very limited as RowCoder is yet to be upgraded:
   - Write can accept only primitive types without boolean, bytes
   - Read can accept even more limited types because LogicalTypes are not implemented in python
   2. Write transform uses JdbcIO.write() - it would be better to use writeRows() with automated Row coder when it's implemented.
   In general coders need to be upgraded before these transforms are usable .
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | --- | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456453517



##########
File path: sdks/python/apache_beam/io/jdbc.py
##########
@@ -0,0 +1,272 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable
+  Flink and Spark runners.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.24.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.

Review comment:
       I added `Experimental; no backwards compatibility guarantees.` at the end of the docstrings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656790677


   Why are they taking ~15min? When I run them locally they're quite fast


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456207160



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       I understand that we may need it for  integration tests (most of those should  run in postcommits), but why precommits/unit-tests-on-tox need it? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655317727


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654229168


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660254599






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660121423






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility and it would be a copy-paste). I had to make the JdbcUtil class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll publish the external builder test after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to run the tests with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` locally as well. It's strange that on your setup you encountered a timeout.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660442659


   Run Python 3.6 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657505727


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456598535



##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       Some tests will run only in high-priority python versions and that's ok. @lazylynx  is working on making the workflow easy to configure in https://github.com/apache/beam/pull/12239.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660341731


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they failed on ValidateCrossLanguageFlink. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656870717


   > 10 mins for pre-commit is a lot. Probably we should limit this to some of test combinations or Python versions. @tvalentyn for recommendations on this.
   
   I'll limit the execution of these tests to python 3.8 with
   ```
   @unittest.skipIf(
       sys.version_info <= (3, 8), 'Run the test on py38 or higher only')
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654539018


   From jenkins it looks like the test is timing out:
   ```
   04:38:09 INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
   05:41:59 Build timed out (after 100 minutes). Marking the build as aborted.
   ```
   
   I noticed a similar behavior when I try to run the test locally with
   ```
   python setup.py nosetests \
           --tests apache_beam.io.external.xlang_jdbcio_it_test \
           --test-pipeline-options="--runner=FlinkRunner"
   ```
   Eventually the job server logs that the pipeline is finished but then the test runner just hangs. I'm not sure where it's getting stuck


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660255791


   Hmm, any idea why it's not listed here ? https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/206/testReport/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660280966


   > Sorry about not being clear. When I said regular PostCommit I did not mean Python 2. We should definitely run this in Python 3 post-commits. Let's add it back to Python 3 and check console log to make sure that the tasks run as Valentyn mentioned. A separate task will be to export logs to xml test reports for easy visibility.
   
   They won't execute on python 2 because there are skips when there are no packages installed. I was able to remove the gradle task and run the test in `:sdks:python:test-suites:portable:py38:postCommitPy38`. It should publish the test results by default, so maybe it's worth to run it there? The only disadvantage is that `:sdks:java:container:docker` will run on all postcommit python jobs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656791328


   Each of them (read, write) takes ~40-70s and they run on cython, python-cloud on python3.5, 3.6, 3.7, 3.8, so it's 16 * 40seconds


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656471698


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453161777



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Sorry, I didn't read carefully. Tox suites use separate environment and BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar') cannot find the jar if it is not copied into the tox suite's root path




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453063006



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:write:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalWrite.Builder.class);
+  }
+
+  /** Parameters class to expose the Write transform to an external SDK. */
+  public static class WriteConfiguration extends Configuration {
+    private String statement;
+
+    public void setStatement(String statement) {
+      this.statement = statement;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<WriteConfiguration, PCollection<Row>, PDone> {
+    @Override
+    public PTransform<PCollection<Row>, PDone> buildExternal(WriteConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      // TODO: BEAM-10396 use writeRows() when it's available
+      return JdbcIO.<Row>write()
+          .withDataSourceConfiguration(dataSourceConfiguration)
+          .withStatement(configuration.statement)
+          .withPreparedStatementSetter(new XlangPreparedStatementSetter());
+    }
+
+    private static class XlangPreparedStatementSetter

Review comment:
       Good idea. Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654901752


   Yeah, I noticed that it times out.
   
   Thanks that you tried to run the tests on your setup. Locally on my computer `:runners:flink:1.10:job-server:validatesCrossLanguageRunner` passes and stops without a hassle, I don't get what is the reason the `XVR_Flink PostCommit` fails.
   
   Especially that `Python PreCommit` also runs those tests and they pass.
   
   Tomorrow I'll try to run them on Ubuntu, maybe OSX is the problem.. but I doubt it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453161777



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Sorry, I didn't read carefully. Tox suites use separate environment and BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar') cannot find the jar if it is not copied into the tox's root path




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453059995



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Python tox suites tlike pythonLint, pythonFormatter, pythonDocs do not require expansion service to be built for their executions. Only tox test suites require them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654901927


   Run XVR_Spark PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655368641






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660271768


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654647958


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654164768


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660327975


   1. leave the separate task and try to publish the test results
   2. leave it in the postcommitIT
   
   Either is fine by me. We just have to make sure that the IT you add here is successfully running in at least one active post-commit test suite and we can detect/track failures. Based on what you said, seems like it'll be running in "Python 3.8 PostCommit".


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660264396


   > Hmm, any idea why it's not listed here ? https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/206/testReport/
   
   It's for the same reason why Kafka's cross-language tests are not listed in python 3.8 postcommit: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/21/testReport/
   
   They run from a separate task, not from the postCommitIT which runs the tests via run_integration_test.sh script. It could probably be run via that task after some modifications, could I add it as a task in Jira and look after it in a different PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656515114


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656479859


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660336934


   >     1. leave the separate task and try to publish the test results
   > 
   >     2. leave it in the postcommitIT
   > 
   > 
   > Either is fine by me. We just have to make sure that the IT you add here is successfully running in at least one active post-commit test suite and we can detect/track failures. Based on what you said, seems like it'll be running in "Python 3.8 PostCommit".
   
   Ok, it's already visible in the postcommit tests results


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653932181


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they were timing out on ValidateCrossLanguageFlink task.
   
   If you have further suggestions for code improvement then go ahead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655317727


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453064048



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.

Review comment:
       Sorry, no. I didn't manage to run in on Dataflow. I'll remove this information.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660307058


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655457457


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655481902






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658734369


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456600783



##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       @tvalentyn should I remove this from other python versions then?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656664035


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453061196



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {

Review comment:
       Good point, I'll change it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660442777


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660253642


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655588336


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655382593


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-652985646


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654901927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660341563


   > Nice! Thanks.
   
   Ok, I'll run other python postcommits just to be on the safe side


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658687357


   Run PythonDocker PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453161122



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       I changed it to compile dependencies just to be on the safe side. Strange thing is that the size of the io-expansion-service has the same growth in size as with the runtime dependencies - from 51,4Mb to 52,7Mb - maybe it was why it worked.
   
   I managed to start flink job server on docker and the io-expansion-server on non-default port and run the tests using the io-expansion-server. I'm not sure though whether it confirms it works or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660285741


   Sounds good. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass.
   
   The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup so I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658807601


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453061920



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660485108


   @chamikaramj Ok, the tests are green :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660369041


   Run Python 3.6 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456328082



##########
File path: sdks/python/apache_beam/io/jdbc.py
##########
@@ -0,0 +1,272 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable
+  Flink and Spark runners.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.24.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.

Review comment:
       Let's mark new transforms here as experimental.

##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       Probably add to other post-commits as well ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-659936518


   > versions
   
   Ok, then I'll leave execution of these tests in postcommit only :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660367490


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654260584


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453062888



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       Let's test/confirm before adding it :)
   For example this should work for a distributed Flink deployment as well not just a local instance running in a single JVM.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452611151



##########
File path: sdks/java/container/boot.go
##########
@@ -122,6 +122,7 @@ func main() {
 		filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
 		filepath.Join(jarsDir, "beam-sdks-java-io-kafka.jar"),
 		filepath.Join(jarsDir, "kafka-clients.jar"),
+		filepath.Join(jarsDir, "beam-sdks-java-io-jdbc.jar"),

Review comment:
       You're right, that's a good news! :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they were timing out on ValidateCrossLanguageFlink task. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657548355


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452971734



##########
File path: CHANGES.md
##########
@@ -55,6 +55,8 @@
 
 * New overloads for BigtableIO.Read.withKeyRange() and BigtableIO.Read.withRowFilter()
   methods that take ValueProvider as a parameter (Java) ([BEAM-10283](https://issues.apache.org/jira/browse/BEAM-10283)).
+* Add cross-language support to JdbcIO.ReadRows([BEAM-10135](https://issues.apache.org/jira/browse/BEAM-10135)).
+* Add cross-language support to JdbcIO.Write([BEAM-10136](https://issues.apache.org/jira/browse/BEAM-10136)).

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and expects the tests to use  it.
   
   Do you have any idea what could be the reason or how to debug it? 
   I don't seem to be able to run :runners:flink:1.10:job-server:validatesCrossLanguageRunnerPythonUsingPython locally


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456453517



##########
File path: sdks/python/apache_beam/io/jdbc.py
##########
@@ -0,0 +1,272 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable
+  Flink and Spark runners.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.24.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.

Review comment:
       I added `Experimental; no backwards compatibility guarantees.` at the end of the docstring.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660336201


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656515318


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653899979


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654828281


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657587958


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656870717


   > 10 mins for pre-commit is a lot. Probably we should limit this to some of test combinations or Python versions. @tvalentyn for recommendations on this.
   
   I've limited execution of the tests by reducing installing sqlalchemy, psycopg2 and testcontainers packages to python 3.8.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-659936518


   > All precommit unit test suites (3.5, 3.6, 3.7, 3.8) run in parallel on the same machine, so if we add the tests to 3.8 only, total precommit time will increase to a similar extent as if we ran the test in all precommits.
   
   @tvalentyn Ok, then I'm leaving the execution of these tests in postcommit only :)
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660270608


   Oh sorry, I pushed before I could see your response. But maybe that's ok if the job passes - the test will be run in postcommit and there will be a little less code :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656289151


   > Those tests run on Flink and Spark in Python Postcommit suite.
   
   Oh really? I didn't realize that. How do you know? I'm having a hard time looking through the logs to verify since they're so large.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660270608


   Oh sorry, I pushed before I could see your response. But maybe that's ok if the job passes :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658687120






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and run tests with it.
   
   Do you have any idea what could be the reason for the test to hang in this way? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657639087


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656665253


   @TheNeuralBit I've done the requested changes :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660267196


   Ah "python 3.8 postcommit" is good. And based on Valentyn's comment seems like this is by choice. So let's just wait for that to finish :))


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660253424


   > Thanks. LGTM other than a couple of comments.
   > 
   > Did the post-commit 3.8 tests trigger with new tests ? Could not find it in the list.
   
   Yes, I could find the crossLanguagePythonJavaJdbcIO tasks running both for Flink and Spark. It runs in a separate task, so it's easy to find them in jenkins logs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660343286


   > Nice! Thanks.
   
   Should I rerun everything until everything is green or can I omit obvious flakes like the one in precommit?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660264396


   > Hmm, any idea why it's not listed here ? https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/206/testReport/
   
   It's for the same reason why Kafka's cross-language tests are not listed in python 3.8 postcommit: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/21/testReport/
   
   They run from a separate task, not from the postCommitIT which runs the tests via run_integration_test.sh script. It could probably be run via that task, could I add it as a task in Jira and look after it in a different PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453157075



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:write:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalWrite.Builder.class);
+  }
+
+  /** Parameters class to expose the Write transform to an external SDK. */
+  public static class WriteConfiguration extends Configuration {
+    private String statement;
+
+    public void setStatement(String statement) {
+      this.statement = statement;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<WriteConfiguration, PCollection<Row>, PDone> {
+    @Override
+    public PTransform<PCollection<Row>, PDone> buildExternal(WriteConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      // TODO: BEAM-10396 use writeRows() when it's available
+      return JdbcIO.<Row>write()
+          .withDataSourceConfiguration(dataSourceConfiguration)
+          .withStatement(configuration.statement)
+          .withPreparedStatementSetter(new XlangPreparedStatementSetter());
+    }
+
+    private static class XlangPreparedStatementSetter
+        implements JdbcIO.PreparedStatementSetter<Row> {
+      @Override
+      public void setParameters(Row row, PreparedStatement statement) throws SQLException {
+        List<Schema.Field> fieldTypes = row.getSchema().getFields();
+        for (int i = 0; i < fieldTypes.size(); ++i) {
+          Schema.TypeName typeName = fieldTypes.get(i).getType().getTypeName();
+          switch (typeName) {
+            case DATETIME:

Review comment:
       @chamikaramj  I'll do a separate PR for this on Monday. I don't get how should I deal with such PRs that depend on each other? I'm used to Gerrit where every commit is a separate PR rebased on the previous commit. But here it would be a duplicate PR with one more commit. Is that ok or there is a smarter way to publish a PR based on the changes that are not merged yet and reviewed in another PR?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660322823


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660346187


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452611306



##########
File path: sdks/java/io/jdbc/build.gradle
##########
@@ -25,6 +25,7 @@ description = "Apache Beam :: SDKs :: Java :: IO :: JDBC"
 ext.summary = "IO to read and write on JDBC datasource."
 
 dependencies {
+  compile library.java.postgres

Review comment:
       Good idea!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656757637


   Thanks. Taking a look as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658687120


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and run tests with it.
   
   Do you have any idea what could be the reason? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654116484


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-659936518


   > All precommit unit test suites (3.5, 3.6, 3.7, 3.8) run in parallel on the same machine, so if we add the tests to 3.8 only, total precommit time will increase to a similar extent as if we ran the test in all precommits.
   
   Ok, then I'll leave execution of these tests in postcommit only :)
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass.
   
   The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660448975


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass, so I think there is no need for the ticket.
   
   The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner). But even though it passed locally on my setup so I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656026101


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656791328


   Each of them (read, write) takes ~40-70s and they run on cython, python-cloud on python3.5, 3.6, 3.7, 3.8, so it's 16 * 40 ~= 10min


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657548608






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660273491


   Sorry about not being clear. When I said regular PostCommit I did not mean Python 2. We should definitely run this in Python 3 post-commits. Let's add it back to Python 3 and check console log to make sure that the tasks run as Valentyn mentioned. A separate task will be to export logs to xml test reports for easy visibility.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657471270


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456277969



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Because I understood that we want these tests to run in precommit with fn_api runner. If we don't then I'll leave them running in postcommit suites (which is good enough as I assume) and disable them in precommit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656791328


   Each of them takes ~40-70s and they are run on cython, python-clouds on python3.5, 3.6, 3.7, 3.8


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660342190


   Run Python 3.6 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil (earlier for some reason I looked only in SchemaUtil) that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility and it would be a copy-paste). I had to make the JdbcUtil class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll publish the external builder test after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452611468



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = true ->

Review comment:
       Yeah, definitely. I made it true as default without thinking at the moment of writing it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452610919



##########
File path: CHANGES.md
##########
@@ -78,6 +78,8 @@
   is experimental. It reads data from BigQuery by exporting data to Avro files, and reading those files. It also supports
   reading data by exporting to JSON files. This has small differences in behavior for Time and Date-related fields. See
   Pydoc for more information.
+* Add cross-language support to JdbcIO.ReadRows([BEAM-10135](https://issues.apache.org/jira/browse/BEAM-10135)).
+* Add cross-language support to JdbcIO.Write([BEAM-10136](https://issues.apache.org/jira/browse/BEAM-10136)).

Review comment:
       Yeah, the time has passed :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654992334


   Run XVR_Spark PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453060645



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       To be honest I don't really know these dependencies work under the hood. It just happened to work this way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658687216


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656791328


   Each of them takes ~40-70s and they run on cython, python-cloud on python3.5, 3.6, 3.7, 3.8


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660252801


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452971372



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,8 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  runtime library.java.postgres

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656514840


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656851781


   10 mins for pre-commit is a lot. Probably we should limit this to some of test combinations or Python versions. @tvalentyn for recommendations on this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655395224


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660485108


   @chamikaramj Ok, tests are green :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil (earlier for some reason I looked only in SchemaUtil) that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility and it would be a copy-paste). I had to make the JdbcUtil class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll add the external builder test PR after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658282575






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and expects the tests to use  it.
   
   Do you have any idea what could be the reason or how to debug it? 
   
   I managed to run the scripts
   ```
   run_job_server.sh ...
   run_expansion_services.sh  ...
   
   ./scripts/run_integration_test.sh --pipeline_opts "--runner=PortableRunner --environment_cache_millis=10000 --job_endpoint=localhost:18091" --test_opts "--attr=UsesCrossLanguageTransforms" --suite xlangValidateRunner
   ```
   Tests fail with output:
   ```
   
   -------------------- >> begin captured stdout << ---------------------
   
   Pulling image postgres:latest
   
   Container started:  91387bffec
   Waiting to be ready...
   
   --------------------- >> end captured stdout << ----------------------
   
   <_InactiveRpcError of RPC that terminated with:
   	status = StatusCode.UNAVAILABLE
   	details = "DNS resolution failed"
          debug_error_string = ...
   ```
   
   Could it be that Postgres container is running in another container?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453064853



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc

Review comment:
       Right, done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653924699


   Run SQL_Java11 PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453063816



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0

Review comment:
       Ok, done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660265684


   Can you please add it to regular post-commit please ? There's no point in adding an IT that we cannot easily track. I don't think following the path of Kafka IT is good since it's not run anywhere and not tracked.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453059995



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Yeah, these are python tox suites that include pythonLint, pythonFormatter, pythonDocs that do not require expansion service to be built for their executions. Only test suites require them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655429792






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655957540


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660455599


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility). I had to make the JdbcUtil class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll publish the external builder test after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658270493


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] egeucak commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
egeucak commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-889928732


   hey, sorry for resurrecting the PR, but I don't really feel like Jira is being monitored that closely. I couldn't get "statement", and "query" arguments for python API to work. Any idea why that could be happening? @piotr-szuberski  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and expects the tests to use  it.
   
   Do you have any idea what could be the reason? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655325323


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660112864


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657471270


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456207160



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       I understand that we may need it for postcommit (integration tests), but why precommits-on-tox? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453061806



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:read:v1";

Review comment:
       Sure! Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655751166


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654647958






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658765830


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453071802



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:read:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalRead.Builder.class);
+  }
+
+  /** Parameters class to expose the Read transform to an external SDK. */
+  public static class ReadConfiguration extends Configuration {
+    private String query;
+    private Integer fetchSize;
+    private Boolean outputParallelization;
+
+    public void setOutputParallelization(Boolean outputParallelization) {
+      this.outputParallelization = outputParallelization;
+    }
+
+    public void setFetchSize(Integer fetchSize) {
+      this.fetchSize = fetchSize;
+    }
+
+    public void setQuery(String query) {
+      this.query = query;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<ReadConfiguration, PBegin, PCollection<Row>> {
+    @Override
+    public PTransform<PBegin, PCollection<Row>> buildExternal(ReadConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      JdbcIO.ReadRows readRows =

Review comment:
       A separate PR for this is fine. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653926713






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654823349


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-659936518


   > All precommit unit test suites (3.5, 3.6, 3.7, 3.8) run in parallel on the same machine, so if we add the tests to 3.8 only, total precommit time will increase to a similar extent as if we ran the test in all precommits.
   
   Ok, then I'm leaving the execution of these tests in postcommit only :)
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655841690


   Interesting. Looks like it is running fine on Python PostCommit, and I'm able to run it locally with the fn api runner as well:
   ```
   python setup.py nosetests --tests apache_beam.io.external.xlang_jdbcio_it_test
   ```
   (cc: @sclukas77 - it looks like this command should work for you now if you have all the dependencies built)
   
   Running on Python PostCommit is fine with me, I just wanted to make sure we have it running continuously _somewhere_.
   
   Can you file a jira so we don't forget to investigate the issue on Flink/Spark?
   
   I'd like to take another look through the code when I'm fresh tomorrow, sorry this is taking so long :grimacing: 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453067817



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)

Review comment:
       Ok, done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil (earlier for some reason I looked only in SchemaUtil) that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility and it would be a copy-paste).
   
   I'll add the external builder test PR after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-652560391


   Unfortunately the python test is taken by tox precommit suites - I suppose it shouldn't?
   Btw in which suites should this test be running? postCommitPortable?
   
   Other xlang tests have some skipIfs, e.g.:
   ```
   @unittest.skipUnless(
       os.environ.get('EXPANSION_PORT'))
   ```
   But they are really using those env variables and my test doesn't need any setup. How would you suggest to solve it?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453068182



##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:read:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalRead.Builder.class);
+  }
+
+  /** Parameters class to expose the Read transform to an external SDK. */
+  public static class ReadConfiguration extends Configuration {
+    private String query;
+    private Integer fetchSize;
+    private Boolean outputParallelization;
+
+    public void setOutputParallelization(Boolean outputParallelization) {
+      this.outputParallelization = outputParallelization;
+    }
+
+    public void setFetchSize(Integer fetchSize) {
+      this.fetchSize = fetchSize;
+    }
+
+    public void setQuery(String query) {
+      this.query = query;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<ReadConfiguration, PBegin, PCollection<Row>> {
+    @Override
+    public PTransform<PBegin, PCollection<Row>> buildExternal(ReadConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      JdbcIO.ReadRows readRows =

Review comment:
       Ok, I'll do that on Monday. Could it be on separate PR? This one is already a big one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660254878


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660346140


   We need tests to be green before merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654901752


   Yeah, I noticed that it times out.
   
   Thanks that you tried to run the tests on your setup. Locally on my computer `:runners:flink:1.10:job-server:validatesCrossLanguageRunner` passes and stops without a hassle, I don't get what is the reason the `XVR_Flink PostCommit` fails.
   
   Especially that `Python PreCommit` also runs those tests and they pass.
   
   Tomorrow I'll try to run them on Ubuntu, maybe OSX is the problem. But I doubt it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660367490


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-652719638


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-652560391


   Unfortunately the python test is taken by tox precommit suites - I suppose it shouldn't?
   Btw in which suites should this test be running? postCommitPortable?
   
   Other xlang tests have some skipIfs, e.g.:
   ```
   @unittest.skipUnless(
       os.environ.get('EXPANSION_PORT'))
   ```
   But they are really using those env variables and my test doesn't need any setup. How would you suggest to solve it?
   For now I added, like in sql_test
   ```
   @unittest.skipIf(
       TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is
       None,
       "Must be run with a runner that supports staging java artifacts.")
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite.
   
   The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654260584






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452611306



##########
File path: sdks/java/io/jdbc/build.gradle
##########
@@ -25,6 +25,7 @@ description = "Apache Beam :: SDKs :: Java :: IO :: JDBC"
 ext.summary = "IO to read and write on JDBC datasource."
 
 dependencies {
+  compile library.java.postgres

Review comment:
       Good idea! I could even add it as runtime dependency




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655429792


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658765677


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456452350



##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       I'm ok with that. The reason why I chose py38 only was that KafkaIO cross-language tests are run on py38 only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660314220


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660366910


   Run Python 2 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452918966



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,8 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  runtime library.java.postgres

Review comment:
       nit: It might not be clear why this is here
   ```suggestion
     // Include postgres so it can be used with external JDBC
     runtime library.java.postgres
   ```

##########
File path: CHANGES.md
##########
@@ -55,6 +55,8 @@
 
 * New overloads for BigtableIO.Read.withKeyRange() and BigtableIO.Read.withRowFilter()
   methods that take ValueProvider as a parameter (Java) ([BEAM-10283](https://issues.apache.org/jira/browse/BEAM-10283)).
+* Add cross-language support to JdbcIO.ReadRows([BEAM-10135](https://issues.apache.org/jira/browse/BEAM-10135)).
+* Add cross-language support to JdbcIO.Write([BEAM-10136](https://issues.apache.org/jira/browse/BEAM-10136)).

Review comment:
       ```suggestion
   * Add cross-language support to Java's JdbcIO, now available in the Python module `apache_beam.io.external.jdbc` ([BEAM-10135](https://issues.apache.org/jira/browse/BEAM-10135), [BEAM-10136](https://issues.apache.org/jira/browse/BEAM-10136)).
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658267248






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil (earlier for some reason I looked only in SchemaUtil) that is able to call proper set method on PreparedStatement, so I think there is no need to test BeamRowPreparedStatementSetter separately anymore (there already is a test of this utility and it would be a copy-paste). I had to make the JdbcUtil class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll publish the external builder test PR after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654134208


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660342249


   Run Python 3.5 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453020473



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       These runtime dependencies are not included in the shadow jar ? How do they get staged for the runner ?

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {

Review comment:
       How about renaming this to "JdbcReadRowsRegistrar". Current name make this sounds like a transform while in reality this is just a utility to register the existing transform with the expansion service.

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.

Review comment:
       Is support for Dataflow runner confirmed and tested ?

##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Could you clarify why this copy is needed ? Is building ":sdks:java:io:expansion-service" inadequate for some reason ?

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:read:v1";

Review comment:
       Probably "beam:external:java:jdbc:read_rows:v1" is more unique to this transform

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {

Review comment:
       JdbcWriteRegistrar

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0

Review comment:
       Please move this to io module directly. "apache_beam/io/jdbc.py"

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)
+
+  An example can be found in

Review comment:
       Is this example being added in a separate PR ?

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc

Review comment:
       This should be Beam 2.24.0 for Jdbc.

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)
+
+  An example can be found in
+  `apache_beam.examples.xlang_jdbcio_it_test`
+  """
+
+  URN = 'beam:external:java:jdbc:write:v1'
+
+  def __init__(
+      self,
+      driver_class_name,
+      jdbc_url,
+      username,
+      password,
+      statement,
+      connection_properties=None,
+      connection_init_sqls=None,
+      expansion_service=None,
+  ):
+    """
+    Initializes a write operation to Jdbc.
+
+    :param driver_class_name: name of the jdbc driver class
+    :param jdbc_url: full jdbc url to the database.
+    :param username: database username
+    :param password: database password
+    :param statement: sql statement to be executed
+    :param connection_properties: properties of the jdbc connection
+                                  passed as string with format
+                                  [propertyName=property;]*
+    :param connection_init_sqls: required only for MySql and MariaDB.
+                                 passed as list of strings
+    :param expansion_service: The address (host:port) of the ExpansionService.
+    """
+
+    super(WriteToJdbc, self).__init__(
+        self.URN,
+        NamedTupleBasedPayloadBuilder(
+            WriteToJdbcSchema(
+                driver_class_name=driver_class_name,
+                jdbc_url=jdbc_url,
+                username=username,
+                password=password,
+                statement=statement,
+                connection_properties=connection_properties,
+                connection_init_sqls=connection_init_sqls,
+            ),
+        ),
+        expansion_service or default_io_expansion_service(),
+    )
+
+
+ReadFromJdbcSchema = typing.NamedTuple(
+    'ReadFromJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('query', unicode),
+        ('fetch_size', typing.Optional[int]),
+        ('output_parallelization', typing.Optional[bool]),
+    ],
+)
+
+
+class ReadFromJdbc(ExternalTransform):
+  """A PTransform which reads Rows from the specified database via JDBC.
+
+  This transform delivers Rows defined as NamedTuple registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)

Review comment:
       Ditto regarding adding a simple code example here.

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:write:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalWrite.Builder.class);
+  }
+
+  /** Parameters class to expose the Write transform to an external SDK. */
+  public static class WriteConfiguration extends Configuration {
+    private String statement;
+
+    public void setStatement(String statement) {
+      this.statement = statement;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<WriteConfiguration, PCollection<Row>, PDone> {
+    @Override
+    public PTransform<PCollection<Row>, PDone> buildExternal(WriteConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      // TODO: BEAM-10396 use writeRows() when it's available
+      return JdbcIO.<Row>write()
+          .withDataSourceConfiguration(dataSourceConfiguration)
+          .withStatement(configuration.statement)
+          .withPreparedStatementSetter(new XlangPreparedStatementSetter());
+    }
+
+    private static class XlangPreparedStatementSetter

Review comment:
       How about calling this "BeamRowPreparedStatementSetter" and moving this out of external directory ? This sounds like a more generic utility.

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)

Review comment:
       Let's add a simple code example here to showcase the usage of this transform.

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalRead.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.ReadRows} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalRead implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:read:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalRead.Builder.class);
+  }
+
+  /** Parameters class to expose the Read transform to an external SDK. */
+  public static class ReadConfiguration extends Configuration {
+    private String query;
+    private Integer fetchSize;
+    private Boolean outputParallelization;
+
+    public void setOutputParallelization(Boolean outputParallelization) {
+      this.outputParallelization = outputParallelization;
+    }
+
+    public void setFetchSize(Integer fetchSize) {
+      this.fetchSize = fetchSize;
+    }
+
+    public void setQuery(String query) {
+      this.query = query;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<ReadConfiguration, PBegin, PCollection<Row>> {
+    @Override
+    public PTransform<PBegin, PCollection<Row>> buildExternal(ReadConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      JdbcIO.ReadRows readRows =

Review comment:
       How about adding unit tests for Builder classes for read and write ?

##########
File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/external/JdbcExternalWrite.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc.external;
+
+import com.google.auto.service.AutoService;
+import java.sql.Date;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.io.jdbc.JdbcIO;
+import org.apache.beam.sdk.io.jdbc.JdbcIO.DataSourceConfiguration;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/** Exposes {@link JdbcIO.Write} as an external transform for cross-language usage. */
+@Experimental(Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class JdbcExternalWrite implements ExternalTransformRegistrar {
+
+  public static final String URN = "beam:external:java:jdbc:write:v1";
+
+  @Override
+  public Map<String, Class<? extends ExternalTransformBuilder>> knownBuilders() {
+    return ImmutableMap.of(URN, JdbcExternalWrite.Builder.class);
+  }
+
+  /** Parameters class to expose the Write transform to an external SDK. */
+  public static class WriteConfiguration extends Configuration {
+    private String statement;
+
+    public void setStatement(String statement) {
+      this.statement = statement;
+    }
+  }
+
+  public static class Builder
+      implements ExternalTransformBuilder<WriteConfiguration, PCollection<Row>, PDone> {
+    @Override
+    public PTransform<PCollection<Row>, PDone> buildExternal(WriteConfiguration configuration) {
+      DataSourceConfiguration dataSourceConfiguration = configuration.getDataSourceConfiguration();
+
+      // TODO: BEAM-10396 use writeRows() when it's available
+      return JdbcIO.<Row>write()
+          .withDataSourceConfiguration(dataSourceConfiguration)
+          .withStatement(configuration.statement)
+          .withPreparedStatementSetter(new XlangPreparedStatementSetter());
+    }
+
+    private static class XlangPreparedStatementSetter
+        implements JdbcIO.PreparedStatementSetter<Row> {
+      @Override
+      public void setParameters(Row row, PreparedStatement statement) throws SQLException {
+        List<Schema.Field> fieldTypes = row.getSchema().getFields();
+        for (int i = 0; i < fieldTypes.size(); ++i) {
+          Schema.TypeName typeName = fieldTypes.get(i).getType().getTypeName();
+          switch (typeName) {
+            case DATETIME:

Review comment:
       Please add unit tests for these conversions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656784166


   @TheNeuralBit @chamikaramj BTW the BigQuery tests (BigQueryWriteIntegrationTests.test_big_query_write_new_types, BigQueryQueryToTableIT.test_big_query_new_types_native and few others) in Python Postcommit are flaky enough to cause this job fail more often than pass. Is there a jira ticket for it?
   
   Other thing that disturbs me is that those JdbcIO python tests run ~15min on Python Precommit. I think this is ok now but when other xlang transforms appear then it will be a serious problem - they'll probably need a separate precommit or something.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654901752


   Yeah, I noticed that it times out.
   
   Thanks that you tried to run the tests on your setup. Locally on my computer `:runners:flink:1.10:job-server:validatesCrossLanguageRunner` passes without a hassle, I don't get what is the reason the `XVR_Flink PostCommit` fails.
   
   Especially that `Python PreCommit` also runs those tests and they pass.
   
   Tomorrow I'll try to run them on Ubuntu, maybe OSX is the problem.. but I doubt it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655539531


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656784166


   @TheNeuralBit @chamikaramj BTW the BigQuery tests (BigQueryWriteIntegrationTests.test_big_query_write_new_types, BigQueryQueryToTableIT.test_big_query_new_types_native and few others) in Python Postcommit are flaky enough to cause this job fail more often than pass. Is there a jira ticket for it?
   
   Other thing that disturbs me is that those JdbcIO python tests run ~15min on Python Precommit. I think this is ok now but when other xlang transforms appear running in precommit then it will be a serious problem - they'll probably need a separate precommit or something.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655547727


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453085135



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       They must be in the shadowJar though right? Otherwise the request to expand jdbc:read would fail immediately




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658734369






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453060645



##########
File path: sdks/java/io/expansion-service/build.gradle
##########
@@ -33,6 +33,9 @@ ext.summary = "Expansion service serving several Java IOs"
 dependencies {
   compile project(":sdks:java:expansion-service")
   compile project(":sdks:java:io:kafka")
+  runtime project(":sdks:java:io:jdbc")
   runtime library.java.kafka_clients
+  // Include postgres so it can be used with external JDBC
+  runtime library.java.postgres

Review comment:
       To be honest I don't really know these dependencies work under the hood.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654860333


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657548608


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655122687






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456452350



##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       I'm ok with that. The reason why I chose py38 only was that KafkaIO cross-language tests are run on py38 only.
   
   But the tests can't be run on py2 due to testcontainers availability for python3 only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660442800


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-659831659


   > 10 mins for pre-commit is a lot. Probably we should limit this to some of test combinations or Python versions. @tvalentyn for recommendations on this.
   
   All precommit unit test suites (3.5, 3.6, 3.7, 3.8) run in parallel on the same machine, so if we add the tests to 3.8 only, total precommit time will increase to a similar extent as if we ran the test in all precommits.
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj merged pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
chamikaramj merged pull request #12145:
URL: https://github.com/apache/beam/pull/12145


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658270493


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660347968


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655525439


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655751328


   Run XVR_Flink PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655539531


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and expects the tests to use  it.
   
   Do you have any idea what could be the reason or how to debug it? 
   I don't seem to be able to replicate :runners:flink:1.10:job-server:validatesCrossLanguageRunnerPythonUsingJava locally, every test crashes when I run
   ```
   ./gradlew :runners:flink:1.10:job-server:validatesCrossLanguageRunner -PpythonVersion=3.7 -Pdocker-pull-licenses -Dorg.gradle.jvmargs=-Xmx4g -Dorg.gradle.jvmargs=-Xms2g --max-workers=12 --continue
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657505727






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected.
   
   The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660463631


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660280966


   > Sorry about not being clear. When I said regular PostCommit I did not mean Python 2. We should definitely run this in Python 3 post-commits. Let's add it back to Python 3 and check console log to make sure that the tasks run as Valentyn mentioned. A separate task will be to export logs to xml test reports for easy visibility.
   
   They won't execute on python 2 because there are skips when there are no packages installed. I was able to remove the gradle task and run the test in `:sdks:python:test-suites:portable:py38:postCommitPy38`. It should publish the test results by default, so maybe it's worth to run it there? The only disadvantage is that `:sdks:java:container:docker` will run on all postcommit python jobs.
   
   So please decide:
   1. leave the separate task and try to publish the test results
   2. leave it in the postcommitIT
   
   A concrete decision will make it much easier for me :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656790677


   Why are they taking ~10min? When I run them locally they're quite fast


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660264396


   > Hmm, any idea why it's not listed here ? https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/206/testReport/
   
   It's for the same reason why Kafka's cross-language tests are not listed in python 3.8 postcommit: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/21/testReport/
   
   They run from a separate task, not from the postCommitIT which runs. It could probably be run via that task, could I add it as a task in Jira and look after it in a different PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655506061


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660269701


   You should be able to see task execution in console logs. To see meaningful signals in Jenkins test report,  logs need to be exported into xml test reports, like: https://github.com/apache/beam/blob/c8d21625235bd3f4d685383ebc4a04f6f5ace787/sdks/python/scripts/run_integration_test.sh#L276


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657410328


   @chamikaramj @TheNeuralBit  I've made the requested changes. Also, I found an utility in JdbcUtil that is able to call proper set method on PreparedStatement, so I think there is no need to test it separately anymore (there already is a test of this utility). I put BeamRowPreparedStatementSetter to JdbcUtil. I had to make the util class public instead of package-private because I wanted to have external classes in separate package.
   
   I'll publish the external builder test after this PR is merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660367420


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653907264


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655122687


   Run XVR_Spark PostCommitRun XVR_Spark PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r452418029



##########
File path: CHANGES.md
##########
@@ -78,6 +78,8 @@
   is experimental. It reads data from BigQuery by exporting data to Avro files, and reading those files. It also supports
   reading data by exporting to JSON files. This has small differences in behavior for Time and Date-related fields. See
   Pydoc for more information.
+* Add cross-language support to JdbcIO.ReadRows([BEAM-10135](https://issues.apache.org/jira/browse/BEAM-10135)).
+* Add cross-language support to JdbcIO.Write([BEAM-10136](https://issues.apache.org/jira/browse/BEAM-10136)).

Review comment:
       Looks like this will be in 2.24.0

##########
File path: sdks/java/container/boot.go
##########
@@ -122,6 +122,7 @@ func main() {
 		filepath.Join(jarsDir, "beam-sdks-java-harness.jar"),
 		filepath.Join(jarsDir, "beam-sdks-java-io-kafka.jar"),
 		filepath.Join(jarsDir, "kafka-clients.jar"),
+		filepath.Join(jarsDir, "beam-sdks-java-io-jdbc.jar"),

Review comment:
       I think this change and the changes in sdks/java/container/build.gradle are no longer necessary (like the Dockerfile one from here: https://github.com/apache/beam/pull/12022#discussion_r447186177). I think they were required before we had artifact staging.
   
   

##########
File path: build.gradle
##########
@@ -273,6 +273,7 @@ task python37PostCommit() {
 
 task python38PostCommit() {
   dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaKafkaIOFlink"
+  dependsOn ":sdks:python:test-suites:portable:py38:crossLanguagePythonJavaJdbcIO"

Review comment:
       Ah sorry, I see now that this is how you know it's running on Flink and Spark :)

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """An external PTransform which writes Rows to the specified database.

Review comment:
       ```suggestion
     """A PTransform which writes Rows to the specified database via JDBC.
   ```

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/

Review comment:
       This looks great, thanks!

##########
File path: sdks/java/io/jdbc/build.gradle
##########
@@ -25,6 +25,7 @@ description = "Apache Beam :: SDKs :: Java :: IO :: JDBC"
 ext.summary = "IO to read and write on JDBC datasource."
 
 dependencies {
+  compile library.java.postgres

Review comment:
       Could you add this dependency in :sdks:java:io:expansion-service instead? I think that would have the same effect, but saves users who don't need it from pulling it in.

##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = true ->

Review comment:
       wouldn't it be preferable to make false the default? I'd think most tox tasks do not need this.

##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """An external PTransform which writes Rows to the specified database.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)
+
+  An example can be found in
+  `apache_beam.examples.xlang_jdbcio_it_test`
+  """
+
+  URN = 'beam:external:java:jdbc:write:v1'
+
+  def __init__(
+      self,
+      driver_class_name,
+      jdbc_url,
+      username,
+      password,
+      statement,
+      connection_properties=None,
+      connection_init_sqls=None,
+      expansion_service=None,
+  ):
+    """
+    Initializes a write operation to Jdbc.
+
+    :param driver_class_name: name of the jdbc driver class
+    :param jdbc_url: full jdbc url to the database.
+    :param username: database username
+    :param password: database password
+    :param statement: sql statement to be executed
+    :param connection_properties: properties of the jdbc connection
+                                  passed as string with format
+                                  [propertyName=property;]*
+    :param connection_init_sqls: required only for MySql and MariaDB.
+                                 passed as list of strings
+    :param expansion_service: The address (host:port) of the ExpansionService.
+    """
+
+    super(WriteToJdbc, self).__init__(
+        self.URN,
+        NamedTupleBasedPayloadBuilder(
+            WriteToJdbcSchema(
+                driver_class_name=driver_class_name,
+                jdbc_url=jdbc_url,
+                username=username,
+                password=password,
+                statement=statement,
+                connection_properties=connection_properties,
+                connection_init_sqls=connection_init_sqls,
+            ),
+        ),
+        expansion_service or default_io_expansion_service(),
+    )
+
+
+ReadFromJdbcSchema = typing.NamedTuple(
+    'ReadFromJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('query', unicode),
+        ('fetch_size', typing.Optional[int]),
+        ('output_parallelization', typing.Optional[bool]),
+    ],
+)
+
+
+class ReadFromJdbc(ExternalTransform):
+  """An external PTransform which reads Rows from the specified database.

Review comment:
       ```suggestion
     """A PTransform which reads Rows from the specified database via JDBC.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654333554


   @TheNeuralBit For some reason "Flink CrossLanguageValidatesRunner Tests" timeouts on xlang_jdbc test and on the other hand the same test passes on Python precommit suite.
   
   At first it didn't see the URN as it wasn't added to the test expansion service. But now I have no idea what's going on, everything seems to be like in generate_sequence but for some reason my test gets hanged forever.
   
   I changed the test to use EXPANSION_PORT when available because Postcommit suites run test expansion service and expects the tests to use  it.
   
   Do you have any idea what could be the reason or how to debug it? 
   I don't seem to be able to replicate :runners:flink:1.10:job-server:validatesCrossLanguageRunnerPythonUsingJava locally, every test crashes when I run
   ```
   ./gradlew :runners:flink:1.10:job-server:validatesCrossLanguageRunner -PpythonVersion=3.7 -Dorg.gradle.jvmargs=-Xmx4g -Dorg.gradle.jvmargs=-Xms2g --max-workers=12 --continue
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453058828



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)
+
+  An example can be found in

Review comment:
       No, the example leads to the integration test which is a quite straightforward example of using this transform.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660367362


   Run Python 3.5 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655622733


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so (with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453161777



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Sorry, I didn't read carefully. Tox suites use separate environment and BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar') cannot find the jar unless it is not copied into the tox suite's root path




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660342030


   Run Python 2 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660121300


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658881720


   @chamikaramj ping


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r453067876



##########
File path: sdks/python/apache_beam/io/external/jdbc.py
##########
@@ -0,0 +1,254 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Jdbc in Python pipelines.
+
+  These transforms are currently supported by Beam portable runners (for
+  example, portable Flink and Spark) as well as Dataflow runner.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Jdbc transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Jdbc
+  transforms. This option is only available for Beam 2.22.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+    and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Jdbc transforms use the
+  'beam-sdks-java-io-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+    initiating Jdbc transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+
+from past.builtins import unicode
+
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+    'WriteToJdbc',
+    'ReadFromJdbc',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService('sdks:java:io:expansion-service:shadowJar')
+
+
+WriteToJdbcSchema = typing.NamedTuple(
+    'WriteToJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('statement', unicode),
+    ],
+)
+
+
+class WriteToJdbc(ExternalTransform):
+  """A PTransform which writes Rows to the specified database via JDBC.
+
+  This transform receives Rows defined as NamedTuple type and registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)
+
+  An example can be found in
+  `apache_beam.examples.xlang_jdbcio_it_test`
+  """
+
+  URN = 'beam:external:java:jdbc:write:v1'
+
+  def __init__(
+      self,
+      driver_class_name,
+      jdbc_url,
+      username,
+      password,
+      statement,
+      connection_properties=None,
+      connection_init_sqls=None,
+      expansion_service=None,
+  ):
+    """
+    Initializes a write operation to Jdbc.
+
+    :param driver_class_name: name of the jdbc driver class
+    :param jdbc_url: full jdbc url to the database.
+    :param username: database username
+    :param password: database password
+    :param statement: sql statement to be executed
+    :param connection_properties: properties of the jdbc connection
+                                  passed as string with format
+                                  [propertyName=property;]*
+    :param connection_init_sqls: required only for MySql and MariaDB.
+                                 passed as list of strings
+    :param expansion_service: The address (host:port) of the ExpansionService.
+    """
+
+    super(WriteToJdbc, self).__init__(
+        self.URN,
+        NamedTupleBasedPayloadBuilder(
+            WriteToJdbcSchema(
+                driver_class_name=driver_class_name,
+                jdbc_url=jdbc_url,
+                username=username,
+                password=password,
+                statement=statement,
+                connection_properties=connection_properties,
+                connection_init_sqls=connection_init_sqls,
+            ),
+        ),
+        expansion_service or default_io_expansion_service(),
+    )
+
+
+ReadFromJdbcSchema = typing.NamedTuple(
+    'ReadFromJdbcSchema',
+    [
+        ('driver_class_name', unicode),
+        ('jdbc_url', unicode),
+        ('username', unicode),
+        ('password', unicode),
+        ('connection_properties', typing.Optional[unicode]),
+        ('connection_init_sqls', typing.Optional[typing.List[unicode]]),
+        ('query', unicode),
+        ('fetch_size', typing.Optional[int]),
+        ('output_parallelization', typing.Optional[bool]),
+    ],
+)
+
+
+class ReadFromJdbc(ExternalTransform):
+  """A PTransform which reads Rows from the specified database via JDBC.
+
+  This transform delivers Rows defined as NamedTuple registered in
+  the coders registry, e.g.::
+
+    ExampleRow = typing.NamedTuple('ExampleRow',
+                                   [('id', int), ('name', unicode)])
+    coders.registry.register_coder(ExampleRow, coders.RowCoder)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660266528


   > Can you please add it to regular post-commit please ? There's no point in adding an IT that we cannot easily track. I don't think following the path of Kafka IT is good since it's not run anywhere and not tracked.
   
   It is run in python 3.8 postcommit. But I see the point, I'll try to run it in regular post-commit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658286950


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-656784166


   @TheNeuralBit @chamikaramj BTW the BigQuery tests (BigQueryWriteIntegrationTests.test_big_query_write_new_types, BigQueryQueryToTableIT.test_big_query_new_types_native and few others) in Python Postcommit are flaky enough to cause this job fail more often than pass. Is there a jira ticket for it?
   
   Other thing that disturbs me is that those JdbcIO python tests run ~10min on Python Precommit. I think this is ok now but when other xlang transforms appear running in precommit then it will be a serious problem - they'll probably need a separate precommit or something.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on a change in pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12145:
URL: https://github.com/apache/beam/pull/12145#discussion_r456206985



##########
File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##########
@@ -1982,17 +1982,25 @@ class BeamModulePlugin implements Plugin<Project> {
         return argList.join(' ')
       }
 
-      project.ext.toxTask = { name, tox_env ->
+      project.ext.toxTask = { name, tox_env, needsExpansionServiceJar = false ->
+        project.evaluationDependsOn(":sdks:java:io:expansion-service")
         project.tasks.create(name) {
           dependsOn 'setupVirtualenv'
           dependsOn ':sdks:python:sdist'
+          if (needsExpansionServiceJar) {
+            dependsOn ':sdks:java:io:expansion-service:shadowJar'
+          }
 
           doLast {
             // Python source directory is also tox execution workspace, We want
             // to isolate them per tox suite to avoid conflict when running
             // multiple tox suites in parallel.
             project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
-
+            if (needsExpansionServiceJar) {
+              def expansionServiceJar =  project.project(':sdks:java:io:expansion-service').shadowJar.archivePath
+              def expansionServiceDestinationDir = "${copiedSrcRoot}/sdks/java/io/expansion-service/build/libs"

Review comment:
       Why to we need a jar in tox (unit test) suite?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655321777


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-657471610


   Run Python 3.8 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-658282661


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-652720997


   > Unfortunately the python test is taken by tox precommit suites - I suppose it shouldn't?
   
   I think that's actually ok. The precommit will uses the fn api runner, which supports cross-language transforms. I haven't been able to run SqlTransform with the fn api runner yet though (BEAM-10010), which is why I still have it disabled in sql_test. 
   
   If JdbcIO is also having trouble running on the fn api runner you could disable it like I did in sql_test, as long as we're still testing continuously with XVR_Flink and XVR_Spark, but we should file a jira for JdbcIO on fn api runner and track down the cause.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-654860333






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655481902


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-653926713


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-660336934


   >     1. leave the separate task and try to publish the test results
   > 
   >     2. leave it in the postcommitIT
   > 
   > 
   > Either is fine by me. We just have to make sure that the IT you add here is successfully running in at least one active post-commit test suite and we can detect/track failures. Based on what you said, seems like it'll be running in "Python 3.8 PostCommit".
   
   Ok, it's already visible in postcommit tests results


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

Posted by GitBox <gi...@apache.org>.
piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822


   Run Python 3.7 PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org