Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/02/27 22:04:41 UTC

[GitHub] [druid] gianm commented on issue #13837: Input source security model for MSQ table functions and more

gianm commented on issue #13837:
URL: https://github.com/apache/druid/issues/13837#issuecomment-1447182330

   With regard to backwards-compatibility with `(EXTERNAL, EXTERNAL, READ)` stuff, the feature flag approach sounds good to me. The documentation on this page would need to be updated as well: https://druid.apache.org/docs/latest/multi-stage-query/security.html.
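
To make the backwards-compatibility idea concrete, here is a minimal sketch of how a feature flag could choose which permission to check. It assumes a hypothetical flag (called `enableInputSourceSecurity` here) and a simplified `ResourceAction` class; neither is meant to match Druid's actual classes or config names. With the flag off, every external read keeps requiring `(EXTERNAL, EXTERNAL, READ)`. With the flag on, the resource name becomes the input source type.

```java
import java.util.Objects;

/** Simplified (type, name, action) triple; illustrative only, not Druid's own class. */
final class ResourceAction {
  final String type;
  final String name;
  final String action;

  ResourceAction(String type, String name, String action) {
    this.type = type;
    this.name = name;
    this.action = action;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ResourceAction)) {
      return false;
    }
    ResourceAction other = (ResourceAction) o;
    return type.equals(other.type) && name.equals(other.name) && action.equals(other.action);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, name, action);
  }

  @Override
  public String toString() {
    return "(" + type + ", " + name + ", " + action + ")";
  }
}

final class InputSourcePermissions {
  /** Picks the permission to check; the flag name is hypothetical. */
  static ResourceAction requiredFor(String inputSourceType, boolean enableInputSourceSecurity) {
    if (!enableInputSourceSecurity) {
      // Flag off: keep today's coarse-grained permission for all external reads.
      return new ResourceAction("EXTERNAL", "EXTERNAL", "READ");
    }
    // Flag on: the resource name is the input source type, e.g. "http" or "hdfs".
    return new ResourceAction("EXTERNAL", inputSourceType, "READ");
  }
}
```

Keeping the legacy permission as the flag-off default means existing grants keep working until an operator opts in.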
   
   Some notes on the other stuff we'll need to sort out as part of this:
   
   ### non-http protocols via `http` input source
   
   The `http` input source is implemented using [java.net.URLConnection](https://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html), which can handle protocols other than http (including local `file://` URLs). Currently the config `druid.ingestion.http.allowedProtocols` (default: `http, https`) controls which protocols are permitted via this input source.
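
Because `URLConnection` will open whatever scheme it is handed, the protocol restriction amounts to checking the URI scheme against an allow-list before any connection is opened. A minimal sketch, assuming the config value has already been parsed into a set of lowercase scheme names; the class and method names are hypothetical, not Druid's:

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class AllowedProtocols {
  // Mirrors the documented default of druid.ingestion.http.allowedProtocols.
  static final Set<String> HTTP_DEFAULT = Set.of("http", "https");

  /** True if the URI has a scheme and it is in the allow-list (compared case-insensitively). */
  static boolean isAllowed(URI uri, Set<String> allowedProtocols) {
    String scheme = uri.getScheme();
    return scheme != null && allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT));
  }
}
```

With the default set, `file:///etc/passwd` is rejected, `HTTPS://example.com/data.json` is accepted, and a URI with no scheme at all is rejected.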
   
   We should consider how this all fits together. Perhaps something like this:
   
   - `(EXTERNAL, http, READ)` refers to the `http` input source.
   - The `http` input source may handle non-http protocols if `druid.ingestion.http.allowedProtocols` is configured to allow them. This isn't the concern of the authorization layer.
   - To ensure that people who use either of the above features (`EXTERNAL` authorization, or `druid.ingestion.http.allowedProtocols`) understand their interaction, we should include notes about this in the docs for both features (with examples).
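
A sketch of the proposed layering, assuming a simplified model in which a user's grants are represented as the set of input source types they hold `(EXTERNAL, <type>, READ)` for (a hypothetical simplification, not Druid's authorizer API). The authorizer only ever sees the input source type; the URI scheme is checked separately by the input source:

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class HttpInputSourceGate {
  enum Outcome { ALLOWED, DENIED_BY_AUTHORIZATION, DENIED_BY_PROTOCOL }

  /**
   * Layer 1: authorization asks only whether the user may READ the "http" input source.
   * Layer 2: the input source enforces its own protocol allow-list.
   */
  static Outcome check(Set<String> grantedInputSourceTypes, URI uri, Set<String> allowedProtocols) {
    if (!grantedInputSourceTypes.contains("http")) {
      return Outcome.DENIED_BY_AUTHORIZATION;
    }
    String scheme = uri.getScheme();
    if (scheme == null || !allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT))) {
      return Outcome.DENIED_BY_PROTOCOL;
    }
    return Outcome.ALLOWED;
  }
}
```

Under this model, an operator who adds `file` to `druid.ingestion.http.allowedProtocols` lets anyone holding `(EXTERNAL, http, READ)` read local files, which is exactly the interaction the docs for both features would need to spell out.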
   
   ### non-hdfs protocols via `hdfs` input source
   
   The `hdfs` input source behaves similarly to the `http` input source. Like `http`, it supports various non-hdfs protocols. Like `http`, there is a `druid.ingestion.hdfs.allowedProtocols` config that controls which protocols are allowed. Like `http`, the default set is limited to only the obvious one (`hdfs`).
   
   So, we should be able to take the same approach here that we take with `http`.
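
Applying the same approach to `hdfs` means the authorization layer stays keyed on the input source type, while each input source enforces its own scheme allow-list. A sketch that hard-codes the defaults described above instead of reading runtime properties (the class name is hypothetical):

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ProtocolPolicy {
  // Defaults as described for druid.ingestion.http.allowedProtocols and
  // druid.ingestion.hdfs.allowedProtocols, keyed by input source type.
  static final Map<String, Set<String>> DEFAULTS = Map.of(
      "http", Set.of("http", "https"),
      "hdfs", Set.of("hdfs"));

  /** Same scheme check for every input source type; unknown types allow nothing. */
  static boolean isAllowed(String inputSourceType, URI uri) {
    Set<String> allowed = DEFAULTS.getOrDefault(inputSourceType, Set.of());
    String scheme = uri.getScheme();
    return scheme != null && allowed.contains(scheme.toLowerCase(Locale.ROOT));
  }
}
```

Here `s3a://bucket/data` is rejected for `hdfs` even though a Hadoop filesystem could read it, unless an operator widens the allow-list.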
   
   ### firehose factories
   
   [Firehoses](https://druid.apache.org/docs/latest/ingestion/native-batch-firehose.html) are a deprecated predecessor to the current "input source" concept. They have been deprecated since 0.17 (late 2019). If we're going with a feature flag for the overall input-source-security feature, IMO it makes sense for that feature flag to also disable firehose factories completely. This absolves us of the responsibility to figure out how to fit them into the new security framework.
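
One way the flag could also disable firehoses is a validation step at task submission. A minimal sketch, assuming the spec's `ioConfig` is available as a plain map and reusing a hypothetical `enableInputSourceSecurity` flag; this is not how Druid's spec classes are actually structured:

```java
import java.util.Map;

final class IngestSpecValidator {
  /**
   * Rejects firehose-based specs when the (hypothetical) input source security flag is on,
   * so firehoses never need to be mapped into the new permission model.
   */
  static void validate(Map<String, ?> ioConfig, boolean enableInputSourceSecurity) {
    if (enableInputSourceSecurity && ioConfig.containsKey("firehose")) {
      throw new IllegalArgumentException(
          "firehose is not supported when input source security is enabled; use inputSource instead");
    }
  }
}
```

With the flag off, firehose specs pass through unchanged, so nothing breaks for clusters that have not opted in.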
   
   ### Hadoop ingest
   
   Hadoop ingest doesn't use our input source concept: instead, it uses Hadoop filesystems and path globs. One approach that comes to mind here is to special-case it to piggyback on the native `hdfs` input source. The idea being:
   
   - If a user has `(EXTERNAL, hdfs, READ)` permissions then they can submit Hadoop ingest jobs.
   - If a user does _not_ have that permission, then they _cannot_ submit Hadoop ingest jobs.
   
   It would be excellent to also introduce a permission (or cluster-wide setting) specifically controlling whether Hadoop jobs can be submitted at all. People who do not use the Hadoop integration would appreciate being able to switch it off completely, minimizing their potential attack surface.
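
The Hadoop proposal combines two checks: the piggybacked `(EXTERNAL, hdfs, READ)` permission and, optionally, a cluster-wide switch. A sketch of the cluster-wide-setting variant, with grants simplified to a set of input source types and a hypothetical `hadoopIngestEnabled` setting:

```java
import java.util.Set;

final class HadoopIngestGate {
  /**
   * Hadoop jobs piggyback on the hdfs input source permission; a hypothetical
   * cluster-wide switch can disable Hadoop ingestion regardless of grants.
   */
  static boolean maySubmitHadoopJob(Set<String> grantedInputSourceTypes, boolean hadoopIngestEnabled) {
    if (!hadoopIngestEnabled) {
      return false; // switched off cluster-wide: no grant can re-enable it
    }
    return grantedInputSourceTypes.contains("hdfs"); // i.e. holds (EXTERNAL, hdfs, READ)
  }
}
```

Checking the switch first keeps the "off" state absolute, which is what shrinks the attack surface for clusters that never use Hadoop.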
   
   ### Realtime ingest
   
   Realtime ingest doesn't use our input source concept: instead, it uses Kafka and Kinesis supervisors with system-specific `ioConfig` APIs.
   
   One approach that comes to mind is something similar to the proposal for Hadoop above: special-case these to use `(EXTERNAL, kafka, READ)` and `(EXTERNAL, kinesis, READ)` respectively. This doesn't make quite as much sense as the Hadoop case, since while there _is_ an `hdfs` input source, there are no `kafka` and `kinesis` input sources.
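
A sketch of the special-casing for streaming supervisors, assuming the supervisor spec's `type` (`kafka` or `kinesis`) is used as the resource name; the class is hypothetical. Unknown supervisor types get no mapping, so a caller can fail closed rather than silently skipping authorization:

```java
import java.util.Optional;
import java.util.Set;

final class SupervisorPermissions {
  // Supervisor types special-cased into the EXTERNAL resource type.
  private static final Set<String> STREAMING_TYPES = Set.of("kafka", "kinesis");

  /** Returns the permission a supervisor of this type would require, or empty if unmapped. */
  static Optional<String> requiredPermission(String supervisorType) {
    if (!STREAMING_TYPES.contains(supervisorType)) {
      return Optional.empty(); // no mapping: caller should deny
    }
    return Optional.of("(EXTERNAL, " + supervisorType + ", READ)");
  }
}
```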
   
   I'm open to other ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

