You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 08:11:27 UTC

[GitHub] [airflow] PatrykKlimowicz opened a new issue, #25177: Airflow ElasticSearch provider issue

PatrykKlimowicz opened a new issue, #25177:
URL: https://github.com/apache/airflow/issues/25177

   ### Apache Airflow version
   
   2.3.3 (latest released)
   
   ### What happened
   
   Durign usage of Airflow v2.1.3 in my project [this](https://github.com/apache/airflow/issues/17512) issue appeared, and was solved by adding the `Offset_Key` to the [Fluent Bit](https://github.com/fluent/fluent-bit) configuration. This Offset_Key appends the offset field to the logs, so we can retrieve the logs in correct order. We specified the `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"` and logs were retrieved correctly based on the `custom_offset` and then displayed in Airflow UI.
   
   Now, I updated the version to the v2.3.3 and this behavior is no longer valid. I tested some combinations:
   
   - AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key has the same value - no offset key is created in the logs and logs cannot be obtained from ElasticSearch
   - AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key  has different values - both offset keys are added to the logs and I can see the logs on UI (logs are obtained based on AIRFLOW__ELASTICSEARCH__OFFSET_FIELD  and not custom one).
   Due to backward compatibility I need to achieve config in which `custom_offset` has higher precedence than the one Airflow inserts.
   
   As suggested [here](https://github.com/apache/airflow/discussions/25154) I tried to lower the elasticsearch provider version and see which one will work for this scenario.
   
   It turned out that the version which we used with Airflow v2.1.3 was OK, so the `apache-airflow-providers-elasticsearch==2.0.2`.
   I think that [this](https://github.com/apache/airflow/pull/17551) change break our use case, as the version `2.0.3` is first that does not work for us - [changelog](https://pypi.org/project/apache-airflow-providers-elasticsearch/2.0.3/). With the version 2.0.2 I can see that `custom_offset` and the Airflow's `offset` are added to the logs, but thanks to `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"` logs are displayed in correct order.
   
   
   ### What you think should happen instead
   
   Offset from Airflow should not conflict with the offset added by third party tool since Airflow does not support sending logs to the ElasticSearch, but supports reading from it. 
   
   Most probably, there will be an issue with flow of the logs. Right now it is like:
   
   Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow 
   
   so Airflow does not know about the (in that specific case) Fluent Bit config and it's offset name. 
   
   It would be nice to make the change in version 2.0.3 I linked above optional, so we can instruct Airflow if it should create a offset with given `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` name or just use that name to obtain logs (I do not know the whole logic behind the Airflow logs retrieval, so not sure if this is a good idea). I think that the bool flag like `AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD` could determine the creation of Airflow's offset field and the `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` could determine what name to use to either create and retrieve logs OR just retrieve the logs. 
   
   ### How to reproduce
   
   Use Airflow in v2.3.3.
   Use [Fluent Bit](https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit) in v1.9.6 and add the Offset_Key to it's [INPUT](https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/values.yaml#L292) config
   Use ElasticSearch to store logs and read logs from ElasticSearch in Airflow UI.
   
   ### Operating System
   
   AKS
   
   ### Versions of Apache Airflow Providers
   
   Working case (Airflow 2.1.3):
   
   - apache-airflow-providers-amazon==2.1.0
   - apache-airflow-providers-celery==2.0.0
   - apache-airflow-providers-cncf-kubernetes==2.0.2
   - apache-airflow-providers-docker==2.1.0
   - apache-airflow-providers-elasticsearch==2.0.2
   - apache-airflow-providers-ftp==2.0.0
   - apache-airflow-providers-google==5.0.0
   - apache-airflow-providers-grpc==2.0.0
   - apache-airflow-providers-hashicorp==2.0.0
   - apache-airflow-providers-http==2.0.0
   - apache-airflow-providers-imap==2.0.0
   - apache-airflow-providers-microsoft-azure==3.1.0
   - apache-airflow-providers-mysql==2.1.0
   - apache-airflow-providers-odbc==2.0.0
   - apache-airflow-providers-postgres==2.0.0
   - apache-airflow-providers-redis==2.0.0
   - apache-airflow-providers-sendgrid==2.0.0
   - apache-airflow-providers-sftp==2.1.0
   - apache-airflow-providers-slack==4.0.0
   - apache-airflow-providers-sqlite==2.0.0
   - apache-airflow-providers-ssh==2.1.0
   
   Not working case (Airflow v2.3.3):
   apache-airflow-providers-amazon==4.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.1.0
   apache-airflow-providers-docker==3.0.0
   apache-airflow-providers-elasticsearch==4.0.0
   apache-airflow-providers-ftp==3.0.0
   apache-airflow-providers-google==8.1.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.0.0
   apache-airflow-providers-http==3.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.0.0
   apache-airflow-providers-mysql==3.0.0
   apache-airflow-providers-odbc==3.0.0
   apache-airflow-providers-postgres==5.0.0
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==3.0.0
   apache-airflow-providers-slack==5.0.0
   apache-airflow-providers-sqlite==3.0.0
   apache-airflow-providers-ssh==3.0.0
   
   Airflow v2.3.3 is working with apache-airflow-providers-elasticsearch==2.0.2
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   We are using Airflow Community Helm chart + Azure Kubernetes Service
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] millin commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
millin commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191645743

   I think this mistake already [fixed here](https://github.com/apache/airflow/pull/21942/files#diff-dd898ab2ed4bca853f1ce5cf52b6fbb37d5fc3545f28967c03dc499fabd3a746R311).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192407955

   I disabled the SSL and it's "OK" now 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191596114

   Hmm. there is an on-going change #21942  - @millin maybe you could take a look at the issue here and implement as part of the improvements in #21942 ? And then @PatrykKlimowicz you could test if the change will work ? Might be a good cooperation and I have a little to no experience with Elasticsearch - but maybe you should test each-other's changes? 
   
   IT's actually very easy to prepare a new provider. This:
   
   ```bash
   breeze prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
   ```
    
   Should build `dist/apache_airflow_providers_elasticsearch-4.1.0.post1-py3-none-any.whl` provider that you should be install and test easily.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192248250

   @potiuk I'll try to test 😄 Will be back with some feedback soon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192339565

   Because that would happen if you run this command in the worktree 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192339301

   Interesting. Do you happen to work in a worktree maybe ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192551345

   COOOL. I am merging it now then :).
   
   We release providers ~ monthly last release was last week, so expect this one in ~3 weeks or so
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192328622

   @potiuk I followed [this](https://github.com/apache/airflow/blob/main/BREEZE.rst#installation) to setup env with breeze, but I stuck on this error:
   
   ```bash
   (myvenv) ➜  ~/dev/airflow  git:(main) ✗ breeze --force-build prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
   Good version of Docker: 20.10.12.
   Good version of docker-compose: 2.2.3
   Good Docker context used: default.
   Docker image build is not needed for CI build as no important files are changed! You can add --force-build to force it
   Requirement already satisfied: pip==22.2 in /usr/local/lib/python3.7/site-packages (22.2)
   WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
   
   Get all providers
   
   
   Copy sources
   
   ===================================================================================
    Copying sources for provider packages
   ===================================================================================
   /opt/airflow /opt/airflow/dev/provider_packages
   /opt/airflow/dev/provider_packages
   -----------------------------------------------------------------------------------
   
    Package Version of providers suffix set for PyPI version: post1
   
   -----------------------------------------------------------------------------------
   ########## Generate setup files for 'elasticsearch' ##########
   Traceback (most recent call last):
     File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 2001, in <module>
       cli()
     File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
       return self.main(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/rich_click/rich_group.py", line 21, in main
       rv = super().main(*args, standalone_mode=False, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
       rv = self.invoke(ctx)
     File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
       return _process_result(sub_ctx.command.invoke(sub_ctx))
     File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
       return ctx.invoke(self.callback, **ctx.params)
     File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
       return __callback(*args, **kwargs)
     File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1541, in generate_setup_files
       current_tag = get_current_tag(provider_package_id, version_suffix, git_update, verbose)
     File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1553, in get_current_tag
       make_sure_remote_apache_exists_and_fetch(git_update, verbose)
     File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 715, in make_sure_remote_apache_exists_and_fetch
       stderr=subprocess.DEVNULL,
     File "/usr/local/lib/python3.7/subprocess.py", line 363, in check_call
       raise CalledProcessError(retcode, cmd)
   subprocess.CalledProcessError: Command '['git', 'fetch', '--tags', '--force', 'apache-https-for-providers']' returned non-zero exit status 128.
   ===================================================================================
   
   Summary of prepared packages:
   
       Errors:
   elasticsearch
   
   ==================================================================================
   ==================================================================================
   
   There were errors when preparing packages. Exiting! 
   ```
   
   Any ideas? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192546160

   @potiuk I deployed the Airflow in my env with new elasticsearch provider package and I have some good news. The #21942 fixed the issue I described! 
   
   Is there any ETA maybe for this code to be released? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25177: Airflow ElasticSearch provider issue
URL: https://github.com/apache/airflow/issues/25177


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192648007

   > Cool. I will add the flag - it used to be there in old breeze (and will just turn this error into warning - it's not nessary to be run, it's more to make sure we have latest version of tags :) .
   
   https://github.com/apache/airflow/pull/25236 to skip the fetch error and turn it into warning @PatrykKlimowicz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1189965614

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1191667608

   HA!. There you go! 
   
   @PatrykKlimowicz - how about checking ot the code from #21942 and testing it it works for you :) ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192423208

   Cool. I will add the flag - it used to be there in old breeze (and will just turn this error into warning - it's not nessary to be run, it's more to make sure we have latest version of tags :) .
   
   Re: using in K8S - you really need to update your image. There is potentially a way to install it dynamically in your image but it might be more complex than rebuilding the image:
   
   See https://airflow.apache.org/docs/docker-stack/entrypoint.html#installing-additional-requirements
   
   * you have to make the package available to your image (for example you can place it in DAGs folder or plugins folder)
   Set env variable for your deployment:  `_PIP_ADDITIONAL_REQUIREMENTS="<fulll_path_to_the_package>"`
   
   Then whenever any of the components start it will install the package before running anyhing
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192340304

   And there is a flag to disable this command i think - just run it with --help 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] PatrykKlimowicz commented on issue #25177: Airflow ElasticSearch provider issue

Posted by GitBox <gi...@apache.org>.
PatrykKlimowicz commented on issue #25177:
URL: https://github.com/apache/airflow/issues/25177#issuecomment-1192385246

   >  Interesting. Do you happen to work in a worktree maybe ?
   
   Nope
   
   >  And there is a flag to disable this command i think - just run it with --help
   
   I do not see any special flag. I tried to fix ownership, but still got the error 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org