Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 08:11:27 UTC
[GitHub] [airflow] PatrykKlimowicz opened a new issue, #25177: Airflow ElasticSearch provider issue
PatrykKlimowicz opened a new issue, #25177:
URL: https://github.com/apache/airflow/issues/25177
### Apache Airflow version
2.3.3 (latest released)
### What happened
While using Airflow v2.1.3 in my project, we ran into [this](https://github.com/apache/airflow/issues/17512) issue, which was solved by adding `Offset_Key` to the [Fluent Bit](https://github.com/fluent/fluent-bit) configuration. `Offset_Key` appends the offset field to the logs, so we can retrieve the logs in the correct order. We set `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"`, and the logs were retrieved based on `custom_offset` and displayed correctly in the Airflow UI.
I have now upgraded to v2.3.3, and this no longer works. I tested these combinations:
- `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` and `Offset_Key` have the same value: no offset key is created in the logs, and the logs cannot be retrieved from ElasticSearch.
- `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` and `Offset_Key` have different values: both offset keys are added to the logs and I can see the logs in the UI (the logs are retrieved based on `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD`, not the custom one).
For backward compatibility, I need a configuration in which `custom_offset` takes precedence over the offset that Airflow inserts.
As suggested [here](https://github.com/apache/airflow/discussions/25154), I downgraded the elasticsearch provider to find which version works for this scenario.
It turned out that the version we used with Airflow v2.1.3, `apache-airflow-providers-elasticsearch==2.0.2`, works.
I think [this](https://github.com/apache/airflow/pull/17551) change broke our use case, as `2.0.3` is the first version that does not work for us ([changelog](https://pypi.org/project/apache-airflow-providers-elasticsearch/2.0.3/)). With version 2.0.2, both `custom_offset` and Airflow's `offset` are added to the logs, but thanks to `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"` the logs are displayed in the correct order.
### What you think should happen instead
The offset added by Airflow should not conflict with an offset added by a third-party tool, since Airflow does not ship logs to ElasticSearch; it only reads them from there.
The problem most likely lies in how the logs flow. Right now the flow is:
Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow
so Airflow knows nothing about the Fluent Bit configuration (in this specific case) or the name of its offset key.
It would be nice to make the 2.0.3 change linked above optional, so we could tell Airflow whether to create an offset field named by `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` or only use that name to retrieve the logs (I do not know the whole logic behind Airflow's log retrieval, so I am not sure this is a good idea). A boolean flag such as `AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD` could control whether Airflow creates its own offset field, while `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` would set the name used either to create the field and retrieve the logs, or only to retrieve them.
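In environment-variable form, the proposal might look like the sketch below. `AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD` is only the hypothetical flag proposed here; it does not exist in any provider release, so setting it today has no effect.

```bash
#!/usr/bin/env bash
set -euo pipefail

# HYPOTHETICAL flag from this proposal (not implemented in the provider):
# "False" would stop Airflow from adding its own offset field to log records.
export AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD="False"

# Existing option: the field Airflow uses to order log lines when reading them
# back from ElasticSearch; here it names the key added by Fluent Bit's Offset_Key.
export AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"
```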
### How to reproduce
Use Airflow v2.3.3.
Use [Fluent Bit](https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit) v1.9.6 and add `Offset_Key` to its [INPUT](https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/values.yaml#L292) configuration.
Use ElasticSearch to store the logs, and read them from ElasticSearch in the Airflow UI.
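A minimal sketch of the reproduction setup, assuming the `fluent/fluent-bit` Helm chart, whose `values.yaml` exposes the input section as the `config.inputs` string. The log `Path`, `Tag`, and file name are placeholders; the only settings taken from this issue are `Offset_Key` and `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD`, which share the name `custom_offset` (the "same value" case above).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Fluent Bit chart values: a tail input that appends each line's file offset
# to the record under the key "custom_offset". Path and Tag are placeholders.
cat > fluent-bit-values.yaml <<'EOF'
config:
  inputs: |
    [INPUT]
        Name        tail
        Path        /path/to/airflow/logs/*.log
        Tag         airflow.*
        Offset_Key  custom_offset
EOF

# Airflow side: read logs back from ElasticSearch ordered by the same field.
export AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"

# Show the line that carries the offset key.
grep -n "Offset_Key" fluent-bit-values.yaml
```

The values file could then be applied with `helm repo add fluent https://fluent.github.io/helm-charts` followed by `helm upgrade --install fluent-bit fluent/fluent-bit -f fluent-bit-values.yaml`. Overriding `config.inputs` replaces the chart's default inputs, so in a real cluster the existing input sections would need to be kept alongside this one.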
### Operating System
AKS
### Versions of Apache Airflow Providers
Working case (Airflow 2.1.3):
- apache-airflow-providers-amazon==2.1.0
- apache-airflow-providers-celery==2.0.0
- apache-airflow-providers-cncf-kubernetes==2.0.2
- apache-airflow-providers-docker==2.1.0
- apache-airflow-providers-elasticsearch==2.0.2
- apache-airflow-providers-ftp==2.0.0
- apache-airflow-providers-google==5.0.0
- apache-airflow-providers-grpc==2.0.0
- apache-airflow-providers-hashicorp==2.0.0
- apache-airflow-providers-http==2.0.0
- apache-airflow-providers-imap==2.0.0
- apache-airflow-providers-microsoft-azure==3.1.0
- apache-airflow-providers-mysql==2.1.0
- apache-airflow-providers-odbc==2.0.0
- apache-airflow-providers-postgres==2.0.0
- apache-airflow-providers-redis==2.0.0
- apache-airflow-providers-sendgrid==2.0.0
- apache-airflow-providers-sftp==2.1.0
- apache-airflow-providers-slack==4.0.0
- apache-airflow-providers-sqlite==2.0.0
- apache-airflow-providers-ssh==2.1.0
Not working case (Airflow v2.3.3):
- apache-airflow-providers-amazon==4.0.0
- apache-airflow-providers-celery==3.0.0
- apache-airflow-providers-cncf-kubernetes==4.1.0
- apache-airflow-providers-docker==3.0.0
- apache-airflow-providers-elasticsearch==4.0.0
- apache-airflow-providers-ftp==3.0.0
- apache-airflow-providers-google==8.1.0
- apache-airflow-providers-grpc==3.0.0
- apache-airflow-providers-hashicorp==3.0.0
- apache-airflow-providers-http==3.0.0
- apache-airflow-providers-imap==3.0.0
- apache-airflow-providers-microsoft-azure==4.0.0
- apache-airflow-providers-mysql==3.0.0
- apache-airflow-providers-odbc==3.0.0
- apache-airflow-providers-postgres==5.0.0
- apache-airflow-providers-redis==3.0.0
- apache-airflow-providers-sendgrid==3.0.0
- apache-airflow-providers-sftp==3.0.0
- apache-airflow-providers-slack==5.0.0
- apache-airflow-providers-sqlite==3.0.0
- apache-airflow-providers-ssh==3.0.0

Airflow v2.3.3 works with apache-airflow-providers-elasticsearch==2.0.2.
### Deployment
Other 3rd-party Helm chart
### Deployment details
We use the Airflow Community Helm chart on Azure Kubernetes Service (AKS).
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] millin commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
millin commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191645743
I think this mistake has already been [fixed here](https://github.com/apache/airflow/pull/21942/files#diff-dd898ab2ed4bca853f1ce5cf52b6fbb37d5fc3545f28967c03dc499fabd3a746R311).
[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192407955
I disabled SSL and it's "OK" now.
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191596114
Hmm, there is an ongoing change, #21942. @millin, maybe you could take a look at the issue here and implement it as part of the improvements in #21942? And then @PatrykKlimowicz, you could test whether the change works for you. It might be a good cooperation: I have little to no experience with Elasticsearch, but maybe you could test each other's changes?
It's actually very easy to prepare a new provider package. This:
```bash
breeze prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
```
should build the `dist/apache_airflow_providers_elasticsearch-4.1.0.post1-py3-none-any.whl` provider package, which you should be able to install and test easily.
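Once the wheel is built, it can be installed into the Airflow environment under test. A minimal sketch, assuming the file name quoted above and that the `pip` on the `PATH` belongs to that environment:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Wheel name as quoted above; adjust it if the version or suffix differs.
WHEEL="dist/apache_airflow_providers_elasticsearch-4.1.0.post1-py3-none-any.whl"

if [[ -f "${WHEEL}" ]]; then
  # 4.1.0.post1 sorts above an installed 4.0.0, so pip upgrades the provider.
  pip install "${WHEEL}"
else
  echo "Wheel not found: ${WHEEL} (build it with the breeze command first)" >&2
fi
```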
[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192248250
@potiuk I'll try to test 😄 Will be back with some feedback soon
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192339565
Because that would happen if you ran this command in a worktree.
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192339301
Interesting. Do you happen to work in a worktree maybe ?
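A quick way to check whether a checkout is a linked worktree is to compare Git's per-checkout directory with its shared one; the two differ only inside a linked worktree. A minimal sketch using `git rev-parse`; the function name `checkout_kind` is illustrative:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints "worktree" inside a linked worktree (created with `git worktree add`)
# and "main" in the main checkout. In a linked worktree, --git-dir points at
# .git/worktrees/<name> while --git-common-dir points at the shared .git.
checkout_kind() {
  local git_dir common_dir
  # Resolve both to absolute, symlink-free paths before comparing.
  git_dir="$(cd "$(git rev-parse --git-dir)" && pwd -P)"
  common_dir="$(cd "$(git rev-parse --git-common-dir)" && pwd -P)"
  if [[ "${git_dir}" == "${common_dir}" ]]; then
    echo "main"
  else
    echo "worktree"
  fi
}
```

Running `checkout_kind` from the Airflow source directory answers the question before invoking Breeze again.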
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192551345
COOOL. I am merging it now then :).
We release providers roughly monthly and the last release was last week, so expect this one in ~3 weeks or so.
[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192328622
@potiuk I followed [this](https://github.com/apache/airflow/blob/main/BREEZE.rst#installation) to set up the environment with Breeze, but I got stuck on this error:
```bash
(myvenv) ➜ ~/dev/airflow git:(main) ✗ breeze --force-build prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
Good version of Docker: 20.10.12.
Good version of docker-compose: 2.2.3
Good Docker context used: default.
Docker image build is not needed for CI build as no important files are changed! You can add --force-build to force it
Requirement already satisfied: pip==22.2 in /usr/local/lib/python3.7/site-packages (22.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Get all providers
Copy sources
===================================================================================
Copying sources for provider packages
===================================================================================
/opt/airflow /opt/airflow/dev/provider_packages
/opt/airflow/dev/provider_packages
-----------------------------------------------------------------------------------
Package Version of providers suffix set for PyPI version: post1
-----------------------------------------------------------------------------------
########## Generate setup files for 'elasticsearch' ##########
Traceback (most recent call last):
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 2001, in <module>
cli()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1541, in generate_setup_files
current_tag = get_current_tag(provider_package_id, version_suffix, git_update, verbose)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1553, in get_current_tag
make_sure_remote_apache_exists_and_fetch(git_update, verbose)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 715, in make_sure_remote_apache_exists_and_fetch
stderr=subprocess.DEVNULL,
File "/usr/local/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'fetch', '--tags', '--force', 'apache-https-for-providers']' returned non-zero exit status 128.
===================================================================================
Summary of prepared packages:
Errors:
elasticsearch
==================================================================================
==================================================================================
There were errors when preparing packages. Exiting!
```
Any ideas?
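The traceback shows that `prepare_provider_packages.py` runs `git fetch --tags --force apache-https-for-providers` with `stderr=subprocess.DEVNULL`, so Git's own error message is discarded and only exit status 128 (Git's generic fatal-error status) surfaces. Re-running the same fetch by hand shows the hidden message. The `/opt/airflow/...` paths suggest the script runs inside the Breeze container, so it is worth trying both there and on the host. A minimal sketch; the function name `diagnose_provider_fetch` is illustrative:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Repeats the fetch from the traceback, but leaves stderr visible so the real
# Git error (missing remote, network, permissions, ...) is printed.
diagnose_provider_fetch() {
  local remote="${1:-apache-https-for-providers}"
  local status=0
  echo "Configured remotes:"
  git remote -v
  echo "Running: git fetch --tags --force ${remote}"
  git fetch --tags --force "${remote}" || status=$?
  echo "git fetch exit status: ${status}"
  return "${status}"
}
```

Calling `diagnose_provider_fetch` from the root of the Airflow checkout prints the configured remotes and Git's actual error instead of only the exit status.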
[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192546160
@potiuk I deployed Airflow in my environment with the new elasticsearch provider package, and I have some good news: #21942 fixed the issue I described!
Is there an ETA for when this code will be released?
[GitHub] [airflow] potiuk closed issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25177: Airflow ElasticSearch provider issue
URL: https://github.com/apache/airflow/issues/25177
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192648007
> Cool. I will add the flag (it used to be there in old Breeze) and will just turn this error into a warning. It's not necessary to run it; it's more to make sure we have the latest version of tags :) .
https://github.com/apache/airflow/pull/25236 skips the fetch error and turns it into a warning, @PatrykKlimowicz.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1189965614
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191667608
HA! There you go!
@PatrykKlimowicz, how about checking out the code from #21942 and testing whether it works for you :) ?
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192423208
Cool. I will add the flag (it used to be there in old Breeze) and will just turn this error into a warning. It's not necessary to run it; it's more to make sure we have the latest version of tags :) .
Re: using it in K8S: you really need to update your image. There is potentially a way to install the package dynamically, but it might be more complex than rebuilding the image.
See https://airflow.apache.org/docs/docker-stack/entrypoint.html#installing-additional-requirements
* Make the package available to your image (for example, place it in the DAGs folder or the plugins folder).
* Set an environment variable for your deployment: `_PIP_ADDITIONAL_REQUIREMENTS="<full_path_to_the_package>"`.
* Then, whenever any of the components starts, it will install the package before running anything else.
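Before touching the Kubernetes deployment, the same mechanism can be tried locally with the official image. A minimal sketch, assuming the wheel built earlier sits in `./dist`, Docker is available, and the image tag matches the issue (`apache/airflow:2.3.3`); the mount point and the function name `smoke_test_provider` are illustrative. The entrypoint installs whatever `_PIP_ADDITIONAL_REQUIREMENTS` lists before running the given command, so `airflow providers list` should report the locally built provider version.

```bash
#!/usr/bin/env bash
set -euo pipefail

WHEEL_NAME="apache_airflow_providers_elasticsearch-4.1.0.post1-py3-none-any.whl"
# Path of the wheel inside the container; /opt/airflow/dags is the image's
# default DAGs folder, used here only as a convenient mount point.
PIP_REQ="/opt/airflow/dags/${WHEEL_NAME}"

# Starts a throwaway container that installs the wheel at startup and then
# lists the installed providers.
smoke_test_provider() {
  docker run --rm \
    -v "${PWD}/dist:/opt/airflow/dags:ro" \
    -e _PIP_ADDITIONAL_REQUIREMENTS="${PIP_REQ}" \
    apache/airflow:2.3.3 \
    airflow providers list
}
```

Calling `smoke_test_provider` from the repository root runs the check; on Kubernetes, the same `_PIP_ADDITIONAL_REQUIREMENTS` value would go into the environment of every Airflow component.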
[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192340304
And there is a flag to disable this command, I think; just run it with `--help`.
[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue
Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192385246
> Interesting. Do you happen to work in a worktree maybe ?
Nope
> And there is a flag to disable this command i think - just run it with --help
I do not see any special flag. I tried fixing the ownership, but I still get the error.