Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2020/09/17 22:18:00 UTC

[jira] [Reopened] (BEAM-7463) BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: incorrect checksum

     [ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev reopened BEAM-7463:
---------------------------------------

This is still happening and is one of the top sources of flakiness in postcommits.
Reopening in case any of the prior work provides useful context.

Sample: https://ci-beam.apache.org/job/beam_PostCommit_Python37/2805/

{noformat}
Error Message
Expected: (Test pipeline expected terminated in state: DONE and Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
     but: Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709

-------------------- >> begin captured logging << --------------------
apache_beam.runners.direct.direct_runner: INFO: Running pipeline with DirectRunner.
apache_beam.io.gcp.bigquery_tools: INFO: Using location 'US' from table <TableReference
 datasetId: 'python_query_to_table_15998907212925'
 projectId: 'apache-beam-testing'
 tableId: 'python_new_types_table'> referenced by query SELECT bytes, date, time FROM [python_query_to_table_15998907212925.python_new_types_table]
apache_beam.io.gcp.bigquery_tools: WARNING: Dataset apache-beam-testing:temp_dataset_17c640c2cdb346ea8955d86ad3e60786 does not exist so we will create it as temporary with location=US
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): metadata:80
urllib3.connectionpool: DEBUG: http://metadata:80 "GET /computeMetadata/v1/instance/attributes/job_id HTTP/1.1" 404 1606
apache_beam.io.gcp.bigquery_tools: DEBUG: Created the table with id output_table
apache_beam.io.gcp.bigquery_tools: INFO: Created table apache-beam-testing.python_query_to_table_15998907212925.output_table with schema <TableSchema
 fields: [<TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'bytes'
 type: 'BYTES'>, <TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'date'
 type: 'DATE'>, <TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'time'
 type: 'TIME'>]>. Result: <Table
 creationTime: 1599890727293
 etag: 'tQbsuXVPNUn0ygeb3OnTug=='
 id: 'apache-beam-testing:python_query_to_table_15998907212925.output_table'
 kind: 'bigquery#table'
 lastModifiedTime: 1599890727421
 location: 'US'
 numBytes: 0
 numLongTermBytes: 0
 numRows: 0
 schema: <TableSchema
 fields: [<TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'bytes'
 type: 'BYTES'>, <TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'date'
 type: 'DATE'>, <TableFieldSchema
 fields: []
 mode: 'NULLABLE'
 name: 'time'
 type: 'TIME'>]>
 selfLink: 'https://bigquery.googleapis.com/bigquery/v2/projects/apache-beam-testing/datasets/python_query_to_table_15998907212925/tables/output_table'
 tableReference: <TableReference
 datasetId: 'python_query_to_table_15998907212925'
 projectId: 'apache-beam-testing'
 tableId: 'output_table'>
 type: 'TABLE'>.
apache_beam.io.gcp.bigquery_tools: INFO: Writing 4 rows to apache-beam-testing:python_query_to_table_15998907212925.output_table table.
apache_beam.io.gcp.tests.bigquery_matcher: INFO: Attempting to perform query SELECT bytes, date, time FROM `python_query_to_table_15998907212925.output_table`; to BQ
google.auth._default: DEBUG: Checking None for explicit credentials as part of auth process...
google.auth._default: DEBUG: Checking Cloud SDK credentials as part of auth process...
google.auth._default: DEBUG: Cloud SDK credentials not found on disk; not using them
google.auth._default: DEBUG: Checking for App Engine runtime as part of auth process...
google.auth._default: DEBUG: No App Engine library was found so cannot authentication via App Engine Identity Credentials.
google.auth.transport._http_client: DEBUG: Making request: GET http://169.254.169.254
google.auth.transport._http_client: DEBUG: Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
urllib3.util.retry: DEBUG: Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
google.auth.transport.requests: DEBUG: Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): metadata.google.internal:80
urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 144
google.auth.transport.requests: DEBUG: Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-compute@developer.gserviceaccount.com/token
urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/844138762903-compute@developer.gserviceaccount.com/token HTTP/1.1" 200 221
urllib3.connectionpool: DEBUG: Starting new HTTPS connection (1): bigquery.googleapis.com:443
urllib3.connectionpool: DEBUG: https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/apache-beam-testing/jobs HTTP/1.1" 200 None
urllib3.connectionpool: DEBUG: https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/apache-beam-testing/queries/ccb82856-c480-43ea-a4f9-1120813ebdaf?maxResults=0&location=US HTTP/1.1" 200 None
urllib3.connectionpool: DEBUG: https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/apache-beam-testing/datasets/_7357fab0f784d2a7327ddbe81cdd1f4ca7e429cd/tables/anona6eb3fbe74d882a4011ceed2377d0a2ca1258820/data HTTP/1.1" 200 None
apache_beam.io.gcp.tests.bigquery_matcher: INFO: Read from given query (SELECT bytes, date, time FROM `python_query_to_table_15998907212925.output_table`;), total rows 0
apache_beam.io.gcp.tests.bigquery_matcher: INFO: Generate checksum: da39a3ee5e6b4b0d3255bfef95601890afd80709
--------------------- >> end captured logging << ---------------------
Stacktrace
  File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.7/unittest/case.py", line 615, in run
    testMethod()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", line 310, in test_big_query_new_types_native
    big_query_query_to_table_pipeline.run_bq_pipeline(options)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", line 113, in run_bq_pipeline
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/testing/test_pipeline.py", line 112, in run
    False if self.not_use_test_runner_api else test_runner_api))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/pipeline.py", line 546, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", line 53, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", line 44, in assert_that
    _assert_match(actual=arg1, matcher=arg2, reason=arg3)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", line 60, in _assert_match
    raise AssertionError(description)

Expected: (Test pipeline expected terminated in state: DONE and Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
     but: Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709

{noformat}
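
For context, the "Actual checksum" in the failure above is the SHA-1 digest of empty input, which is consistent with the "total rows 0" line in the captured log: the verification query saw an empty output table even though the pipeline logged writing 4 rows. A minimal sketch of that check follows. It assumes the matcher hashes the sorted, stringified rows with SHA-1; checksum_rows is a hypothetical helper for illustration, not the Beam API.

```python
import hashlib


def checksum_rows(rows):
    """Hypothetical helper: SHA-1 over sorted, stringified rows (assumed scheme)."""
    encoded = sorted(str(row).encode("utf-8") for row in rows)
    digest = hashlib.sha1()
    for item in encoded:
        digest.update(item)
    return digest.hexdigest()


# Checksum reported as "Actual" in the failure above.
ACTUAL = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

# An empty result set hashes to the SHA-1 of zero bytes.
assert checksum_rows([]) == ACTUAL
assert hashlib.sha1(b"").hexdigest() == ACTUAL
```

If the assumption holds, any run that reports da39a3ee... as the actual checksum read zero rows from the output table, so the failure points at the rows not being visible at query time rather than at wrong row contents.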

> BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: incorrect checksum 
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7463
>                 URL: https://issues.apache.org/jira/browse/BEAM-7463
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Udi Meiri
>            Priority: P2
>              Labels: currently-failing
>             Fix For: Not applicable
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38 ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", line 211, in test_big_query_new_types
> 15:03:38     big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", line 82, in run_bq_pipeline
> 15:03:38     result = p.run()
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", line 107, in run
> 15:03:38     else test_runner_api))
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", line 406, in run
> 15:03:38     self._options).run(False)
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", line 419, in run
> 15:03:38     return self.runner.run_pipeline(self, self._options)
> 15:03:38   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", line 51, in run_pipeline
> 15:03:38     hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError: 
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38      but: Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] Could this be caused by changes to the BigQuery matcher? https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well: https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull


