Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:11:41 UTC

[GitHub] [airflow] thejens opened a new issue #17962: Warn if robots.txt is accessed

thejens opened a new issue #17962:
URL: https://github.com/apache/airflow/issues/17962


   ### Description
   
   https://github.com/apache/airflow/pull/17946 implements a `/robots.txt` endpoint to stop search engines from crawling Airflow in cases where it is (accidentally) exposed to the public Internet.
   
   If we record GET requests to that endpoint, we'd have a strong signal that the deployment is exposed, and could issue a warning in the UI or even trigger a kill switch on the deployment.
   
   Some deployments are likely intentionally available and rely on auth mechanisms on the `login` endpoint, so there should be a config option to suppress the warnings.
   
   An alternative approach would be to monitor for requests from specific user agents used by crawlers, for the same purpose.
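
A minimal sketch of the recording idea, written as a generic WSGI middleware so it runs without Airflow or Flask installed. The class name `RobotsAccessWarning`, the `warn` switch (standing in for the proposed config option), the `hits` counter and the logger name are all hypothetical and are not taken from the Airflow code base or from the linked PR.

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("exposure_check")  # hypothetical logger name


class RobotsAccessWarning:
    """WSGI middleware that records GET requests to /robots.txt."""

    def __init__(self, app: Callable, warn: bool = True) -> None:
        self.app = app
        # Hypothetical switch: set to False for intentionally public deployments.
        self.warn = warn
        # Number of robots.txt requests seen so far; a UI could read this.
        self.hits = 0

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        is_robots = (
            environ.get("REQUEST_METHOD") == "GET"
            and environ.get("PATH_INFO") == "/robots.txt"
        )
        if is_robots:
            self.hits += 1
            if self.warn:
                log.warning(
                    "robots.txt requested by %r; this deployment may be publicly reachable",
                    environ.get("HTTP_USER_AGENT", "unknown"),
                )
        # Always hand the request on to the wrapped application.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny stand-in application so the sketch runs without a web framework."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"User-agent: *\nDisallow: /\n"]


wrapped = RobotsAccessWarning(demo_app)
```

With a Flask application the same wrapper could be attached as `app.wsgi_app = RobotsAccessWarning(app.wsgi_app)`. Whether the warning should only be logged or also surfaced in the UI is left open here, as it is in the issue.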
   
   ### Use case/motivation
   
   People who accidentally expose Airflow have a slightly higher chance of realising they've done so and tightening their security.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910330620


   BTW, I think writing it to the logs should possibly be enough?





[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910430796


   > However, I think the kind of user who accidentally exposes their Airflow deployment is quite likely not the kind of user who monitors logs. I know quite a lot of data scientists who set up tooling like Airflow without really understanding what they're doing, simply by following tutorials.
   
   True. 





[GitHub] [airflow] ShakaibKhan commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
ShakaibKhan commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-928292345


   Started a PR to address this: https://github.com/apache/airflow/pull/18557





[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910329521


   Good idea.





[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910226115


   (Now that I think of it, there may be, or perhaps should be, a Flask plugin that does this.)





[GitHub] [airflow] potiuk commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-914580057


   Feel free!





[GitHub] [airflow] uranusjr closed issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
uranusjr closed issue #17962:
URL: https://github.com/apache/airflow/issues/17962


   





[GitHub] [airflow] ShakaibKhan commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
ShakaibKhan commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-913937348


   This feature sounds interesting and I would like to try implementing it if no one minds :) 





[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910442159


   Here's a list of Google's crawler user agents, btw: **https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers**
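
A rough sketch of how such a list might be used to flag crawler traffic. The helper `looks_like_crawler` and the `CRAWLER_TOKENS` tuple are hypothetical names; the three tokens are a small sample of Google's published product tokens, not a complete list. User-agent strings are easy to spoof, so a match is only a hint that the deployment is reachable from outside.

```python
from typing import Optional

# A few product tokens from Google's public crawler documentation.
# Illustrative only; a real list would be longer and cover other engines.
CRAWLER_TOKENS = ("Googlebot", "AdsBot-Google", "Mediapartners-Google")


def looks_like_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    # Case-insensitive substring match against each known token.
    return any(token.lower() in ua for token in CRAWLER_TOKENS)
```

A request handler could call this with the incoming `User-Agent` header and log a warning on a match, in the same way as for a `/robots.txt` request.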





[GitHub] [airflow] thejens commented on issue #17962: Warn if robots.txt is accessed

Posted by GitBox <gi...@apache.org>.
thejens commented on issue #17962:
URL: https://github.com/apache/airflow/issues/17962#issuecomment-910414795


   Logging would be a start. 
   
   However, I think the kind of user who accidentally exposes their Airflow deployment is quite likely not the kind of user who monitors logs. I know quite a lot of data scientists who set up tooling like Airflow without really understanding what they're doing, simply by following tutorials.
   
   

