Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:11:41 UTC
[GitHub] [airflow] thejens opened a new issue #17962: Warn if robots.txt is accessed
thejens opened a new issue #17962:
URL: https://github.com/apache/airflow/issues/17962
### Description
https://github.com/apache/airflow/pull/17946 implements a `/robots.txt` endpoint to block search engines from crawling Airflow in cases where it is (accidentally) exposed to the public Internet.
If we recorded GET requests to that endpoint, we would have a strong warning sign that the deployment is exposed, and could show a warning in the UI or even trigger a kill-switch on the deployment.
Some deployments are likely public on purpose and rely on authentication at the `login` endpoint, so there should be a config option to suppress the warnings.
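A minimal sketch of the proposal, assuming a plain WSGI middleware rather than whatever the eventual PR does: it logs a warning on each `GET /robots.txt`, remembers when that last happened, and takes a `suppress_warnings` flag in place of the config option. `RobotsTxtAccessMonitor`, `suppress_warnings`, `demo_app` and the logger name are all hypothetical, not Airflow APIs.

```python
import logging
import threading
import time
from typing import Callable, Iterable, Optional

# Hypothetical logger name; not an existing Airflow logger.
log = logging.getLogger("robots_txt_monitor")


class RobotsTxtAccessMonitor:
    """Hypothetical WSGI middleware that notices GET requests to /robots.txt.

    It records the time of the most recent such request and logs a warning
    unless warnings are suppressed. Every request is passed through unchanged.
    """

    def __init__(self, app: Callable, suppress_warnings: bool = False) -> None:
        self.app = app
        # Stand-in for the proposed config option; the name is illustrative.
        self.suppress_warnings = suppress_warnings
        self.last_access: Optional[float] = None
        self._lock = threading.Lock()

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if (environ.get("REQUEST_METHOD") == "GET"
                and environ.get("PATH_INFO") == "/robots.txt"):
            # Record the hit even when warnings are suppressed.
            with self._lock:
                self.last_access = time.time()
            if not self.suppress_warnings:
                log.warning(
                    "robots.txt requested from %s (User-Agent: %r); this "
                    "deployment may be reachable from the public Internet",
                    environ.get("REMOTE_ADDR", "unknown"),
                    environ.get("HTTP_USER_AGENT", ""),
                )
        return self.app(environ, start_response)


# Minimal stand-in for the real webserver app, for demonstration only.
def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"User-agent: *\nDisallow: /\n"]


app = RobotsTxtAccessMonitor(demo_app)
```

With Flask, WSGI middleware is usually attached as `flask_app.wsgi_app = RobotsTxtAccessMonitor(flask_app.wsgi_app)`. Note that the state lives in one process, so with several webserver workers each keeps its own `last_access`; driving a UI warning would need shared storage such as the metadata database.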
An alternative approach, for the same reasons, would be to monitor for requests from the specific user-agents used by crawlers.
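The user-agent idea could look roughly like this: a case-insensitive substring match of the `User-Agent` header against known crawler tokens. The token list and the `is_known_crawler` name are illustrative assumptions, not an exhaustive or maintained list.

```python
from typing import Iterable

# Illustrative, non-exhaustive crawler tokens. Google documents its crawlers'
# user-agent tokens; other search engines publish their own.
KNOWN_CRAWLER_TOKENS = ("Googlebot", "AdsBot-Google", "Mediapartners-Google", "bingbot")


def is_known_crawler(user_agent: str, tokens: Iterable[str] = KNOWN_CRAWLER_TOKENS) -> bool:
    """Return True if the User-Agent header contains a known crawler token.

    Crawlers embed their token inside a longer string, so this does a
    case-insensitive substring match rather than an exact comparison.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


print(is_known_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
```

The `User-Agent` header is chosen by the client and easily spoofed, and a non-match proves nothing, so this is best treated as an extra hint alongside the robots.txt signal rather than a replacement for it.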
### Use case/motivation
People who accidentally expose Airflow would have a slightly higher chance of realising they've done so and tightening their security.
### Related issues
_No response_
### Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910330620
BTW, I think writing it to the logs should possibly be enough?
[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910430796
> However, I think the kind of user who accidentally exposes their Airflow deployment is quite likely not the kind of user who monitors logs. I know quite a lot of data scientists who set up tooling like Airflow without really understanding what they're doing, simply following tutorials.
True.
[GitHub] [airflow] ShakaibKhan commented on issue #17962: Warn if robots.txt is accessed
ShakaibKhan commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-928292345
Started a PR to address this: https://github.com/apache/airflow/pull/18557
[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910329521
Good idea.
[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910226115
(Now that I think of it, there may be, or perhaps should be, a Flask plugin that does this.)
[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-914580057
Feel free!
[GitHub] [airflow] uranusjr closed issue #17962: Warn if robots.txt is accessed
uranusjr closed issue #17962:
URL: https://github.com/apache/airflow/issues/17962
[GitHub] [airflow] ShakaibKhan commented on issue #17962: Warn if robots.txt is accessed
ShakaibKhan commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-913937348
This feature sounds interesting, and I would like to try implementing it if no one minds :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910442159
Here's a list of Google's crawler user-agents, BTW: **https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910414795
Logging would be a start.
However, I think the kind of user who accidentally exposes their Airflow deployment is quite likely not the kind of user who monitors logs. I know quite a lot of data scientists who set up tooling like Airflow without really understanding what they're doing, simply following tutorials.
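One way to reach users who never read logs is to surface the same signal in the UI. This sketch assumes the webserver already records the Unix time of the last `/robots.txt` request somewhere it can read; `exposure_banner`, the seven-day window and the message wording are hypothetical, not part of Airflow.

```python
import time
from typing import Optional

# Hypothetical threshold: only warn about robots.txt hits from the last 7 days,
# so a single old request does not keep the banner up forever.
WARNING_WINDOW_SECONDS = 7 * 24 * 3600


def exposure_banner(last_access: Optional[float], suppress: bool = False,
                    now: Optional[float] = None) -> Optional[str]:
    """Return warning text for a UI banner, or None when nothing should be shown.

    last_access is the Unix timestamp of the most recent robots.txt request,
    or None if none was recorded; suppress mirrors the proposed opt-out for
    deployments that are public on purpose.
    """
    if suppress or last_access is None:
        return None
    if now is None:
        now = time.time()
    # Clamp at zero in case of small clock differences between processes.
    age = max(0.0, now - last_access)
    if age > WARNING_WINDOW_SECONDS:
        return None
    hours = int(age // 3600)
    return (f"robots.txt was requested {hours} hour(s) ago. This Airflow "
            "deployment may be reachable from the public Internet; "
            "check your network and authentication settings.")
```

Keeping the function free of any web framework makes the suppression and time-window logic easy to test in isolation, whatever template ends up rendering the banner.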