You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/10/27 12:11:11 UTC

[GitHub] [druid] nishantmonu51 opened a new pull request #10535: Add support for blacklisting domains for HTTPInputSource

nishantmonu51 opened a new pull request #10535:
URL: https://github.com/apache/druid/pull/10535


   ### Description
   Right now there is no support to restrict druid HTTP based ingestion to deny ingestion from any given hosts. 
   Cloud providers have metadata service that can be accessible via HTTP endpoints which can leak sensitive information about the instance on which druid is running. 
   * https://docs.microsoft.com/en-us/azure/virtual-machines/linux/instance-metadata-service
   * https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
   
   this PR adds a new config `druid.ingestion.http.blackListDomains` which allows restricting specific IP/domains from being accessed through HTTP based ingestion.
   
   This PR has:
   - [*] been self-reviewed.
   - [*] added documentation for new or modified features or behaviors.
   - [*] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   
   
   <hr>
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] nishantmonu51 merged pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
nishantmonu51 merged pull request #10535:
URL: https://github.com/apache/druid/pull/10535


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-775666255


   I'm not @nishantmonu51 🙂 but IMO, it would make sense to revert this out of 0.21.0 in favor of a more comprehensive fix with a better API, like we're talking about in https://github.com/apache/druid/pull/10677.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-773690442


   Hi @nishantmonu51, I have been thinking about this PR whether it should be a part of the 0.21.0 release. My major concern is that the new security config only works for the HTTP inputSource, which means users can still use the HTTP firehose to bypass the security config. This can make a wrong impression for system administrators that they can secure their Druid cluster with the new config, which is not true. 
   
   To avoid this problem, I think we have two options, either making both the HTTP inputSource and firehose to support the security config in the 0.21.0 release or not including them at all in the release. I would say the first option seems not eligible as it requires to add a new feature after code freeze which can potentially delay the release. So, I'd like to suggest reverting this PR. Since the problem described above will be fixed in https://github.com/apache/druid/pull/10677, we won't even have to deprecate the configs added in this PR if we go down this way. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-775602414


   @nishantmonu51 sorry for pinging you again, but do you have any opinion on this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug edited a comment on pull request #10535: Add support for blacklisting domains for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
himanshug edited a comment on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-717442644


   User generally is not aware of all the different domains that could leak sensitive information, so an explicit `allowList`  instead-of/in-addition-to `denyList` would make the choice more secure in this context.
   what do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on pull request #10535: Add support for blacklisting domains for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
himanshug commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-717442644


   User generally is not aware of all the different domains that could leak sensitive information, so an explicit `allowList` (trying to incorporate https://searchcio.techtarget.com/feature/Rooting-out-racism-in-AI-systems-theres-no-time-to-lose ) instead-of/in-addition-to `denyList` would make the choice more secure in this context.
   what do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on a change in pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #10535:
URL: https://github.com/apache/druid/pull/10535#discussion_r514609436



##########
File path: core/src/main/java/org/apache/druid/data/input/impl/HttpInputSourceConfig.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+
+import java.net.URI;
+import java.util.Collections;
+import java.util.List;
+
+public class HttpInputSourceConfig
+{
+  private final List<String> allowListDomains;
+  private final List<String> denyListDomains;
+
+  @JsonCreator
+  public HttpInputSourceConfig(
+      @JsonProperty("allowListDomains") List<String> allowListDomains,
+      @JsonProperty("denyListDomains") List<String> denyListDomains
+  )
+  {
+    this.allowListDomains = allowListDomains == null ? Collections.emptyList() : allowListDomains;

Review comment:
       it would be nice to differentiate between NULL vs EMPTY allowList ... null meaning everything is allowed and empty meaning nothing is allowed , so that a user can just set the allowList to empty to fully disable reading from anywhere.. they can add things to it as they see fit later on.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] nishantmonu51 commented on pull request #10535: Add support for blacklisting domains for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
nishantmonu51 commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-717854106


   @himanshug: This config is mostly for the administrators of the druid service to secure the environment and block any internal hosts. 
   The admin of the service will not have the complete allow list for the sources the service users can ingest the data from.
   
   Although, an option for an allowList along with blacklist where the user can choose to either use allowList or denyList makes sense. I updated the PR and added both the options.     
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
himanshug commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-719066186


   +1 after https://github.com/apache/druid/pull/10535#discussion_r514609436


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-776337957


   Thanks @gianm for chiming in. I'm going to revert this PR in both master and 0.21.0 branches.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-773690442


   Hi @nishantmonu51, I have been thinking about this PR whether it should be a part of the 0.21.0 release. My major concern is that the new security config only works for the HTTP inputSource, which means users can still use the HTTP firehose to bypass the security config. This can make a wrong impression for system administrators that they can secure their Druid cluster with the new config, which is not true. 
   
   To avoid this problem, I think we have two options, either making both the HTTP inputSource and firehose to support the security config in the 0.21.0 release or not including them at all in the release. I would say the first option seems not eligible as it requires to add a new feature after code freeze which can potentially delay the release. So, I'd like to suggest reverting this PR. Since the problem described above will be fixed in https://github.com/apache/druid/pull/10677, we won't even have to deprecate the configs added in this PR if we go down this way. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson edited a comment on pull request #10535: Add support for domain allow/deny list for HTTPInputSource

Posted by GitBox <gi...@apache.org>.
jihoonson edited a comment on pull request #10535:
URL: https://github.com/apache/druid/pull/10535#issuecomment-773690442


   Hi @nishantmonu51, I have been thinking about this PR whether it should be a part of the 0.21.0 release. My major concern is that the new security config only works for the HTTP inputSource, which means users can still use the HTTP firehose to bypass the security config. This can give a wrong impression to system administrators that they can secure their Druid cluster with the new config, which is not true. 
   
   To avoid this problem, I think we have two options, either making both the HTTP inputSource and firehose to support the security config in the 0.21.0 release or not including them at all in the release. I would say the first option seems not eligible as it requires to add a new feature after code freeze which can potentially delay the release. So, I'd like to suggest reverting this PR. Since the problem described above will be fixed in https://github.com/apache/druid/pull/10677, we won't even have to deprecate the configs added in this PR if we go down this way. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org