Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/09 12:43:43 UTC

[GitHub] [airflow] gemelen opened a new issue, #26952: BigQuery job fails due on missing dataset while it is present

gemelen opened a new issue, #26952:
URL: https://github.com/apache/airflow/issues/26952

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==2022.9.6+composer
   apache-airflow-providers-google==8.1.0
   
   ### Apache Airflow version
   
   2.2.5 | 2.3.3
   
   ### Operating System
   
   Managed
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   apache-airflow-providers-google==2022.9.6+composer (provided by composer-2.0.28-airflow-2.2.5)
   apache-airflow-providers-google==8.1.0 (provided by composer-2.0.28-airflow-2.3.3)
   
   Cluster details are identical between versions.
   
   The data that the DAG/jobs process is identical between versions.
   
   ### What happened
   
   My original DAG consists of a list of `BigQueryInsertJobOperator` tasks. Each one essentially runs `select * from <source dataset>.<source table> where <field> = <value from list>` and inserts the resulting records into `<destination dataset>.<destination table>`, where `<destination dataset>` is the source table's name and `<destination table> = <source table>_<field value>`.
   The operator is configured as follows:
   
   ```
   bigquery.BigQueryInsertJobOperator(
       ...
       configuration={
           'query': {
               'useLegacySql': False,
               'priority': 'BATCH',
               'createDisposition': 'CREATE_IF_NEEDED',
               'writeDisposition': 'WRITE_TRUNCATE',
               'destinationTable': {
                   'projectId': GCP_PROJECT,
                   'datasetId': table,
                   'tableId': f'{table}_{business_id}',
               },
               'defaultDataset': {
                   'projectId': GCP_PROJECT,
                   'datasetId': DATASET,
               },
               'query': f'select * from {table} where business_id = {business_id};',
           },
       },
   )
   ```
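
The per-value configuration described above can be generated with a small helper. This is a minimal sketch, not the reporter's actual DAG code: the function names `build_query_config` and `build_all_configs` are hypothetical, and the values of `GCP_PROJECT` and `DATASET` are placeholders.

```python
from typing import Any, Dict, List

# Assumption: placeholder values; the report does not give the real ones.
GCP_PROJECT = "example-project"
DATASET = "source_dataset"


def build_query_config(table: str, business_id: int) -> Dict[str, Any]:
    """Build the `configuration` dict for one (table, business_id) pair."""
    return {
        "query": {
            "useLegacySql": False,
            "priority": "BATCH",
            "createDisposition": "CREATE_IF_NEEDED",
            "writeDisposition": "WRITE_TRUNCATE",
            # The destination dataset is named after the source table.
            "destinationTable": {
                "projectId": GCP_PROJECT,
                "datasetId": table,
                "tableId": f"{table}_{business_id}",
            },
            # Unqualified table names in the query resolve against this dataset.
            "defaultDataset": {
                "projectId": GCP_PROJECT,
                "datasetId": DATASET,
            },
            "query": f"select * from {table} where business_id = {business_id};",
        }
    }


def build_all_configs(table: str, business_ids: List[int]) -> List[Dict[str, Any]]:
    """One configuration per business_id, in input order."""
    return [build_query_config(table, b) for b in business_ids]
```

Each returned dict has the shape passed as `configuration=` in the snippet above, so one operator can be created per list entry.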
   
   In the Composer environment `composer-2.0.28-airflow-2.2.5` (i.e. with the provided library version `2022.9.6+composer`), it works as intended: the job is inserted and executed.
   In the Composer environment `composer-2.0.28-airflow-2.3.3` (i.e. with the provided library version `8.1.0`), each job is inserted but fails with the error `"Dataset was not found in location"` (for the same set of datasets and tables).
   
   The same job, inserted via the BigQuery REST API, also executed successfully.
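
For comparison, a job submitted through the REST API's `jobs.insert` method is a JSON body whose `configuration` field has the same shape as the operator's `configuration` argument. The sketch below builds such a body. The helper name `build_job_body` and the example values are hypothetical. The optional `jobReference.location` field is included only because the reported error mentions a location; this report does not establish that location handling is the cause.

```python
import json
from typing import Any, Dict, Optional


def build_job_body(
    project_id: str,
    configuration: Dict[str, Any],
    location: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body for BigQuery's jobs.insert (hypothetical helper)."""
    job_reference: Dict[str, Any] = {"projectId": project_id}
    if location is not None:
        # Only set the location when the caller supplies one.
        job_reference["location"] = location
    return {"jobReference": job_reference, "configuration": configuration}


if __name__ == "__main__":
    body = build_job_body(
        "example-project",  # assumption: placeholder project id
        {"query": {"query": "select 1", "useLegacySql": False}},
        location="EU",  # assumption: placeholder location
    )
    print(json.dumps(body, indent=2))
```

Comparing a body like this with what the operator actually sends is one way to narrow down where the two code paths differ.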
   
   ### What you think should happen instead
   
   As described above, the job should succeed.
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   The error occurs on every run and is most likely due to changes in how the dataset/table reference is built, so that the later of the two versions above references the table incorrectly (for example, by using the wrong separator).
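
To illustrate the kind of separator mismatch suspected here: BigQuery standard SQL writes a fully qualified table as `project.dataset.table`, while legacy SQL writes it as `project:dataset.table`. The sketch below shows both forms. The helper `format_table_ref` is hypothetical and is not taken from the provider's code.

```python
def format_table_ref(
    project: str, dataset: str, table: str, legacy_sql: bool = False
) -> str:
    """Format a fully qualified table reference for a BigQuery query string."""
    if legacy_sql:
        # Legacy SQL: square brackets, colon between project and dataset.
        return f"[{project}:{dataset}.{table}]"
    # Standard SQL: backticks, dots throughout.
    return f"`{project}.{dataset}.{table}`"


if __name__ == "__main__":
    print(format_table_ref("example-project", "orders", "orders_42"))
    print(format_table_ref("example-project", "orders", "orders_42", legacy_sql=True))
```

A reference built with one convention and sent with the other `useLegacySql` setting would not resolve, which is the failure mode the paragraph above speculates about.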
   
   As a result, this issue prevents upgrading to version 8.1.0 (bundled by default with the new version of Composer).
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] gemelen closed issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
gemelen closed issue #26952: BigQuery job fails due on missing dataset while it is present
URL: https://github.com/apache/airflow/issues/26952




[GitHub] [airflow] boring-cyborg[bot] commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1272534649

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] eladkal commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1275842890

   Hi @gemelen,
   Welcome to the Apache Airflow repo.
   This report contains many details related to Composer and very few details related to Airflow.
   We are not Cloud Composer. We know nothing about their versions and deployment.
   
   If you believe you found a bug in the Airflow source code, please be specific about what the bug is. Which version broke it, or which PR introduced a breaking change?




[GitHub] [airflow] eladkal commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1277243648

   I did not mean to offend you.
   
   The bug report asks for Airflow provider version. You wrote:
   ```
   Versions of Apache Airflow Providers
   apache-airflow-providers-google==2022.9.6+composer
   apache-airflow-providers-google==8.1.0
   ```
   
   I for one have no idea what `apache-airflow-providers-google==2022.9.6+composer` is. This is probably some backport of the open-source code to the Composer system, and I have no idea which Airflow version it corresponds to or what changes Composer made there.




[GitHub] [airflow] potiuk commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1290753627

   Yep. I did not find anything aggressive or offensive in @eladkal's comment. @gemelen, you are using Composer and you pay for support there. Airflow is free software delivered on an as-is basis without any promises, and people here provide help in their free time. I think you should be respectful of that when they politely redirect you back to the Composer team for something that is clearly Composer-related (and they were even nice enough to diagnose it and ask for extra information).
   
   I think exercising empathy is important in open source. You can watch my talk about it from Airflow Summit; it can help you understand how things work in open source: https://www.youtube.com/watch?v=G6VjYvKr2wQ&list=PLGudixcDaxY2LxjeHpZRtzq7miykjjFOn&index=55
   
   And please don't take this personally as an attack or anything. It is merely meant to help people like you, who are used to paid support, some kind of SLAs, and a helpdesk kind of experience, understand that things are quite a bit different in open source. This is not clear the first time you get involved in the open-source community.
   
   I hope this will help to set the expectations right.




[GitHub] [airflow] uranusjr commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1277274344

   Since we don’t have access to the Google forks and don’t know how they differ from the upstream provider implementation, there’s not much we can do to resolve the issue. You’ll need to either find a way to reproduce this with the upstream provider (not Google’s fork), or ask Google for support instead.




[GitHub] [airflow] gemelen commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
gemelen commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1275875889

   @eladkal I found your comment inappropriately aggressive.
   
   This repository holds the module `apache-airflow-providers-google`, which is the area of the problem I ran into. Moreover, the problem occurred for me during an upgrade to a publicly available version of this library.
   
   If such a case must not be tracked (even to be proved wrong) in this repository, then the bug-report workflow _must_ not encourage mentioning anything cloud-specific at all.




[GitHub] [airflow] gemelen commented on issue #26952: BigQuery job fails due on missing dataset while it is present

Posted by GitBox <gi...@apache.org>.
gemelen commented on issue #26952:
URL: https://github.com/apache/airflow/issues/26952#issuecomment-1277270850

   `2022.9.6+composer` is based on `6.8.0`, which is stated only in Composer's release notes.
   It's true that we don't know what's really in there, since it's a private fork by Google.

