Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/05 12:10:47 UTC

[GitHub] [airflow] charan-doxel opened a new issue, #22748: Pyspark Job Operator is failing from airflow

charan-doxel opened a new issue, #22748:
URL: https://github.com/apache/airflow/issues/22748

   ### Apache Airflow version
   
   2.2.4
   
   ### What happened
   
   A DAG that uses DataprocSubmitPySparkJobOperator fails to import in Airflow with the error below:
   
   Broken DAG: [/usr/local/airflow/dags/prod/dag-factory-test.py] Traceback (most recent call last):
     File "/usr/local/airflow/dags/dag_constructor/target_test_dag_constructor.py", line 486, in build
       run_cipo_pipeline = RunCIPOPipeline(
     File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 188, in apply_defaults
       result = func(self, *args, **kwargs)
   TypeError: __init__() got an unexpected keyword argument 'default_args'
   
   ### What you think should happen instead
   
   Initial debugging suggests that the PySpark operator is sending unintended data (the `default_args` keyword argument shown in the traceback) to the base operator.
   
   ### How to reproduce
   
   The code below fails when used as a task in a DAG:
   
   
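   from airflow.providers.google.cloud.operators.dataproc import (
       DataprocSubmitPySparkJobOperator,
   )


   class RunPipeline(DataprocSubmitPySparkJobOperator):
       # __init__ takes a fixed set of arguments and no **kwargs.
       def __init__(self, owner, dag, cluster_name):
           super().__init__(
               main="gs://ml-models/datasets/__main__.py",
               files=["gs://ml-models/datasets/gs-service-creds.json"],
               pyfiles=[
                   "gs://ml-models/datasets/annotation-ml.whl",
               ]
           )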
   
   
   ### Operating System
   
   PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
   NAME="Debian GNU/Linux"
   VERSION_ID="11"
   VERSION="11 (bullseye)"
   VERSION_CODENAME=bullseye
   ID=debian
   HOME_URL="https://www.debian.org/"
   SUPPORT_URL="https://www.debian.org/support"
   BUG_REPORT_URL="https://bugs.debian.org/"
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow==1!2.2.4+astro.4
   apache-airflow-providers-amazon==3.0.0
   apache-airflow-providers-cncf-kubernetes==1!3.0.2
   apache-airflow-providers-elasticsearch==1!2.2.0
   apache-airflow-providers-ftp==1!2.0.1
   apache-airflow-providers-google==1!6.4.0
   apache-airflow-providers-http==1!2.0.3
   apache-airflow-providers-imap==1!2.2.0
   apache-airflow-providers-microsoft-azure==1!3.6.0
   apache-airflow-providers-mysql==1!2.2.0
   apache-airflow-providers-postgres==1!3.0.0
   apache-airflow-providers-redis==1!2.0.1
   apache-airflow-providers-slack==4.2.0
   apache-airflow-providers-sqlite==1!2.1.0
   apache-airflow-providers-ssh==1!2.4.0
   
   google-ads==14.0.0
   google-api-core==1.31.5
   google-api-python-client==1.12.10
   google-auth==1.35.0
   google-auth-httplib2==0.1.0
   google-auth-oauthlib==0.4.6
   google-cloud-aiplatform==1.10.0
   google-cloud-appengine-logging==1.1.0
   
   
   google-cloud-audit-log==0.2.0
   google-cloud-automl==2.6.0
   google-cloud-bigquery==2.33.0
   google-cloud-bigquery-datatransfer==3.6.0
   google-cloud-bigquery-storage==2.11.0
   google-cloud-bigtable==1.7.0
   google-cloud-build==3.8.0
   google-cloud-container==1.0.1
   google-cloud-core==1.7.2
   google-cloud-datacatalog==3.6.2
   google-cloud-dataproc==3.2.0
   google-cloud-dataproc-metastore==1.3.1
   google-cloud-dlp==1.0.0
   google-cloud-kms==2.11.0
   google-cloud-language==1.3.0
   google-cloud-logging==2.7.0
   google-cloud-memcache==1.0.0
   google-cloud-monitoring==2.8.0
   google-cloud-orchestration-airflow==1.2.1
   google-cloud-os-login==2.5.1
   google-cloud-pubsub==2.9.0
   google-cloud-redis==2.5.1
   google-cloud-secret-manager==1.0.0
   google-cloud-spanner==1.19.1
   google-cloud-speech==1.3.2
   google-cloud-storage==1.44.0
   google-cloud-tasks==2.7.2
   google-cloud-texttospeech==1.0.1
   google-cloud-translate==1.7.0
   google-cloud-videointelligence==1.16.1
   google-cloud-vision==1.0.0
   google-cloud-workflows==1.5.0
   google-crc32c==1.3.0
   google-resumable-media==2.2.1
   googleapis-common-protos==1.54.0
   graphqlclient==0.2.4
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   Scaled-out Airflow setup with 2 schedulers and 3 workers on GKE.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] diman82 commented on issue #22748: Pyspark Job Operator is failing from airflow

Posted by GitBox <gi...@apache.org>.
diman82 commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1094269403

   I get the very same error.




[GitHub] [airflow] eladkal commented on issue #22748: Pyspark Job Operator is failing from airflow

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1094281928

   This error originates in your custom code.
   So far there is no indication of a bug in Airflow.
   If you have found a bug that is reproducible on the latest main and the latest Google provider, please add a complete reproducible example that we can run. What you shared is a code fragment that we can't run, and the error seems to originate from your own custom code.
   
   If you need support rather than wanting to report a bug, please use Stack Overflow or [GitHub discussions](https://github.com/apache/airflow/discussions).
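
For readers who hit the same traceback, here is a minimal sketch of how custom code can produce it. It does not import Airflow: `apply_defaults_stand_in`, `BaseOperatorStandIn`, `BrokenPipeline`, `FixedPipeline`, and `try_build` are hypothetical names invented for illustration. The one assumption, taken from the traceback, is that the wrapper around `__init__` calls it with an extra `default_args` keyword argument. Under that assumption, a subclass with a fixed signature and no `**kwargs` raises the reported `TypeError`, while a subclass that accepts and forwards `**kwargs` does not.

```python
import functools


def apply_defaults_stand_in(func):
    """Hypothetical stand-in for the wrapper seen in the traceback.

    It mimics only the behavior relevant here: calling the wrapped
    __init__ with an extra `default_args` keyword argument.
    """

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        kwargs.setdefault("default_args", {"owner": "airflow"})
        return func(self, *args, **kwargs)

    return wrapper


class BaseOperatorStandIn:
    """Hypothetical base class that tolerates extra keyword arguments."""

    def __init__(self, task_id=None, default_args=None, **kwargs):
        self.task_id = task_id
        self.default_args = default_args or {}


class BrokenPipeline(BaseOperatorStandIn):
    # Mirrors the reported subclass: fixed signature, no **kwargs.
    @apply_defaults_stand_in
    def __init__(self, owner, dag, cluster_name):
        super().__init__(task_id="run_pipeline")


class FixedPipeline(BaseOperatorStandIn):
    # Accepts unknown keyword arguments and forwards them to the parent.
    @apply_defaults_stand_in
    def __init__(self, cluster_name, **kwargs):
        super().__init__(**kwargs)
        self.cluster_name = cluster_name


def try_build(cls, **kwargs):
    """Return the instance, or the TypeError message if construction fails."""
    try:
        return cls(**kwargs)
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_build(BrokenPipeline, owner="o", dag=None, cluster_name="c"))
    print(try_build(FixedPipeline, cluster_name="c", task_id="t").task_id)
```

If the same reasoning applies to the real operator, the likely fix in the custom class is to add `**kwargs` to `__init__` and pass it through to `super().__init__()`, along with required arguments such as `task_id`. This has not been verified against the reporter's full DAG code, which was not shared.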




[GitHub] [airflow] eladkal closed issue #22748: Pyspark Job Operator is failing from airflow

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #22748: Pyspark Job Operator is failing from airflow
URL: https://github.com/apache/airflow/issues/22748




[GitHub] [airflow] boring-cyborg[bot] commented on issue #22748: Pyspark Job Operator is failing from airflow

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1088626733

   Thanks for opening your first issue here! Be sure to follow the issue template!
   

