Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/05 12:10:47 UTC
[GitHub] [airflow] charan-doxel opened a new issue, #22748: Pyspark Job Operator is failing from airflow
charan-doxel opened a new issue, #22748:
URL: https://github.com/apache/airflow/issues/22748
### Apache Airflow version
2.2.4
### What happened
Using DataprocSubmitPySparkJobOperator from Airflow fails with the error below:
Broken DAG: [/usr/local/airflow/dags/prod/dag-factory-test.py] Traceback (most recent call last):
  File "/usr/local/airflow/dags/dag_constructor/target_test_dag_constructor.py", line 486, in build
    run_cipo_pipeline = RunCIPOPipeline(
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 188, in apply_defaults
    result = func(self, *args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'default_args'
### What you think should happen instead
From initial debugging, we found that the PySpark operator is sending unintended data to the base operator.
### How to reproduce
The code below fails when used as a task in a DAG:
class RunPipeline(DataprocSubmitPySparkJobOperator):
    def __init__(self, owner, dag, cluster_name):
        super().__init__(
            main="gs://ml-models/datasets/__main__.py",
            files=["gs://ml-models/datasets/gs-service-creds.json"],
            pyfiles=[
                "gs://ml-models/datasets/annotation-ml.whl",
            ]
        )
### Operating System
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
### Versions of Apache Airflow Providers
apache-airflow==1!2.2.4+astro.4
apache-airflow-providers-amazon==3.0.0
apache-airflow-providers-cncf-kubernetes==1!3.0.2
apache-airflow-providers-elasticsearch==1!2.2.0
apache-airflow-providers-ftp==1!2.0.1
apache-airflow-providers-google==1!6.4.0
apache-airflow-providers-http==1!2.0.3
apache-airflow-providers-imap==1!2.2.0
apache-airflow-providers-microsoft-azure==1!3.6.0
apache-airflow-providers-mysql==1!2.2.0
apache-airflow-providers-postgres==1!3.0.0
apache-airflow-providers-redis==1!2.0.1
apache-airflow-providers-slack==4.2.0
apache-airflow-providers-sqlite==1!2.1.0
apache-airflow-providers-ssh==1!2.4.0
google-ads==14.0.0
google-api-core==1.31.5
google-api-python-client==1.12.10
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-aiplatform==1.10.0
google-cloud-appengine-logging==1.1.0
google-cloud-audit-log==0.2.0
google-cloud-automl==2.6.0
google-cloud-bigquery==2.33.0
google-cloud-bigquery-datatransfer==3.6.0
google-cloud-bigquery-storage==2.11.0
google-cloud-bigtable==1.7.0
google-cloud-build==3.8.0
google-cloud-container==1.0.1
google-cloud-core==1.7.2
google-cloud-datacatalog==3.6.2
google-cloud-dataproc==3.2.0
google-cloud-dataproc-metastore==1.3.1
google-cloud-dlp==1.0.0
google-cloud-kms==2.11.0
google-cloud-language==1.3.0
google-cloud-logging==2.7.0
google-cloud-memcache==1.0.0
google-cloud-monitoring==2.8.0
google-cloud-orchestration-airflow==1.2.1
google-cloud-os-login==2.5.1
google-cloud-pubsub==2.9.0
google-cloud-redis==2.5.1
google-cloud-secret-manager==1.0.0
google-cloud-spanner==1.19.1
google-cloud-speech==1.3.2
google-cloud-storage==1.44.0
google-cloud-tasks==2.7.2
google-cloud-texttospeech==1.0.1
google-cloud-translate==1.7.0
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-cloud-workflows==1.5.0
google-crc32c==1.3.0
google-resumable-media==2.2.1
googleapis-common-protos==1.54.0
graphqlclient==0.2.4
### Deployment
Other 3rd-party Helm chart
### Deployment details
Scaled-out Airflow setup with 2 schedulers and 3 workers in GKE
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] diman82 commented on issue #22748: Pyspark Job Operator is failing from airflow
Posted by GitBox <gi...@apache.org>.
diman82 commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1094269403
I get the very same error.
[GitHub] [airflow] eladkal commented on issue #22748: Pyspark Job Operator is failing from airflow
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1094281928
This error comes from your custom code.
For the moment there is no indication of a bug.
If you have found a bug that is reproducible on the latest main and Google provider, please add a full reproducible example that we can run. What you shared is a fragment of code that we can't really run, and it seems to originate from your own custom code.
If you need support rather than to report a bug, please use Stack Overflow or [GitHub discussions](https://github.com/apache/airflow/discussions)
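For readers trying to see why a subclass like the one in the report could raise this TypeError, here is a minimal sketch that does not import Airflow at all. `StandInOperator`, `inject_default_args`, `BrokenPipeline` and `FixedPipeline` are hypothetical names invented for illustration; the decorator only imitates the general idea of a wrapper adding a `default_args` keyword before calling `__init__`, and is not Airflow's actual implementation. Under that assumption, an `__init__` with a fixed parameter list rejects the extra keyword, while one that accepts `**kwargs` and forwards them to `super().__init__()` does not.

```python
import functools


def inject_default_args(init):
    """Hypothetical stand-in: adds a 'default_args' keyword before calling __init__."""
    @functools.wraps(init)
    def wrapper(self, **kwargs):
        kwargs.setdefault("default_args", {"owner": "airflow"})
        return init(self, **kwargs)
    return wrapper


class StandInOperator:
    """Stand-in for a provider operator; not Airflow's real class."""

    def __init__(self, *, task_id, main, default_args=None, **kwargs):
        self.task_id = task_id
        self.main = main
        self.default_args = default_args or {}


class BrokenPipeline(StandInOperator):
    # Fixed parameter list: any keyword added by the wrapper is rejected.
    @inject_default_args
    def __init__(self, owner, cluster_name):
        super().__init__(
            task_id="run_pipeline",
            main="gs://ml-models/datasets/__main__.py",
        )


class FixedPipeline(StandInOperator):
    # Accepts extra keywords and forwards them to the parent class.
    @inject_default_args
    def __init__(self, *, owner, cluster_name, **kwargs):
        super().__init__(main="gs://ml-models/datasets/__main__.py", **kwargs)
        self.owner = owner
        self.cluster_name = cluster_name


if __name__ == "__main__":
    try:
        BrokenPipeline(owner="data-eng", cluster_name="cluster-a")
    except TypeError as exc:
        print(f"BrokenPipeline failed: {exc}")

    op = FixedPipeline(
        task_id="run_pipeline", owner="data-eng", cluster_name="cluster-a"
    )
    print(f"FixedPipeline created task {op.task_id!r}")
```

Whether this is exactly what happened in the reporter's DAG cannot be confirmed from the fragment shared; the sketch only shows the general pattern of forwarding unknown keyword arguments to the parent constructor.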
[GitHub] [airflow] eladkal closed issue #22748: Pyspark Job Operator is failing from airflow
Posted by GitBox <gi...@apache.org>.
eladkal closed issue #22748: Pyspark Job Operator is failing from airflow
URL: https://github.com/apache/airflow/issues/22748
[GitHub] [airflow] boring-cyborg[bot] commented on issue #22748: Pyspark Job Operator is failing from airflow
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22748:
URL: https://github.com/apache/airflow/issues/22748#issuecomment-1088626733
Thanks for opening your first issue here! Be sure to follow the issue template!