Posted to hdfs-dev@hadoop.apache.org by larry mccay <lm...@apache.org> on 2017/10/20 18:29:32 UTC

[DISCUSS] Feature Branch Merge and Security Audits

All -

Given the maturity of Hadoop at this point, I would like to propose that we
start doing explicit security audits of features at merge time.

There are a few reasons that I think this is a good place/time to do the
review:

1. It represents a specific snapshot of where the feature stands as a
whole. This means that we can more easily identify the attack surface of a
given feature.
2. We can identify any security gaps that need to be fixed before a release
that carries the feature can be considered ready.
3. We - in extreme cases - can block a feature from merging until some
baseline of security coverage is achieved.
4. The folks that are interested and able to review security aspects can't
scale to review every iteration of every JIRA, but they can review the
checklist and follow pointers to specific areas of interest.

I have provided an impromptu security audit checklist on the DISCUSS thread
for merging Ozone - HDFS-7240 into trunk.

I don't want to pick on it particularly but I think it is a good way to
bootstrap this audit process and figure out how to incorporate it without
being too intrusive.

The questions that I provided below are a mix of general questions that
could be on a standard checklist provided along with the merge thread and
some that are specific to what I read about Ozone in the excellent docs
provided. So, we should consider some subset of the following as a proposal
for a general checklist.

Perhaps a shared document can be created to iterate on the list and
fine-tune it?

Any thoughts on this, any additional datapoints to collect, etc?

thanks!

--larry

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space
Manager. There are a number of typical vulnerabilities that we find in UIs.

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data or affecting
object stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
headers?
1.6. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?
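The protections asked about in 1.2.1-1.2.3 are usually implemented as response headers set by a servlet filter. As a point of reference, here is a minimal stand-alone sketch of the header set involved - this is illustrative only, not Hadoop's actual filter code, and the specific policy values (e.g. the CSP string) are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SecurityHeaders {
    // Returns the standard hardening headers a UI filter would typically
    // add to every response: clickjacking, XSS, and MIME-sniffing defenses.
    public static Map<String, String> hardeningHeaders() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("X-Frame-Options", "SAMEORIGIN");                  // click jacking (1.2.3)
        h.put("X-XSS-Protection", "1; mode=block");              // legacy browser XSS filter (1.2.1)
        h.put("X-Content-Type-Options", "nosniff");              // MIME-type sniffing
        h.put("Content-Security-Policy", "default-src 'self'");  // modern XSS mitigation; policy is an example
        return h;
    }

    public static void main(String[] args) {
        hardeningHeaders().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

CSRF protection (1.2.2) is not a header a server sets but a check on incoming requests, so it is not shown here.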

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is
dependent on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
paging available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what
does this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only
bucket owners can read and write - but there are ACL APIs at the Bucket
Level. Are they meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I
missing some well known NN type endpoint in the architecture doc somewhere?
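On 2.2.3: the standard JAXP hardening against XXE is to reject DOCTYPE declarations outright, so external entities can never be declared or resolved. A self-contained sketch of that setup - generic JAXP usage, not a claim about how Ozone's handlers actually parse XML:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class SafeXml {
    // Returns a DOM parser hardened against XXE: DOCTYPEs (and therefore
    // external entity declarations) are rejected outright, with entity
    // expansion and XInclude disabled as additional layers.
    public static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setExpandEntityReferences(false);
        dbf.setXIncludeAware(false);
        return dbf.newDocumentBuilder();
    }
}
```

With this configuration, any document containing a DOCTYPE fails to parse, which is the behavior a reviewer would want to see verified.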

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
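For context on 3.2: when KMS-backed key management is used, clients discover the KMS through the key provider path in core-site.xml. An illustrative fragment - the host and port here are placeholders, not a value from this thread:

```xml
<!-- core-site.xml: direct key operations at a KMS instance.
     kms.example.com:9600 is a placeholder for illustration. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms.example.com:9600/kms</value>
</property>
```

With that in place, keys are managed via the hadoop key command, e.g. "hadoop key create mykey" and "hadoop key list".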

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out any commands, etc?
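The getPassword() pattern referenced in 4.2 consults any configured credential providers first and falls back to the clear-text configuration value only if none supplies the secret. A simplified stand-in showing that lookup order - this is a sketch of the pattern, not Hadoop's actual Configuration class, and the map-based "provider" is an assumption standing in for a jceks keystore:

```java
import java.util.Map;

public class ConfSketch {
    private final Map<String, char[]> credentialProvider; // stands in for a jceks keystore
    private final Map<String, String> props;              // clear-text configuration

    public ConfSketch(Map<String, char[]> credentialProvider, Map<String, String> props) {
        this.credentialProvider = credentialProvider;
        this.props = props;
    }

    // Mirrors the Configuration.getPassword() contract: prefer the
    // credential provider, fall back to the config value, and return
    // null when neither holds the named secret.
    public char[] getPassword(String name) {
        char[] fromProvider = credentialProvider.get(name);
        if (fromProvider != null) {
            return fromProvider;
        }
        String fromConf = props.get(name);
        return fromConf == null ? null : fromConf.toCharArray();
    }
}
```

The point of the fallback order is that a deployment can move secrets out of clear-text config without any code change in the feature itself.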

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still
an open issue?

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by Eric Yang <ey...@apache.org>.
Looks good and +1 for markdown documentations to provide per release
specific information.

On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:

> New Revision...
>
> This revision acknowledges the reality that we often have multiple phases
> of feature lifecycle and that we need to account for each phase.
> It has also been made more generic.
> I have created a Tech Preview Security Audit list and a GA Readiness
> Security Audit list.
> I've also included suggested items into the GA Readiness list.
>
> It has also been suggested that we publish the information as part of docs
> so that the state of such features can be easily determined from these
> pages. We can discuss this aspect as well.
>
> Thoughts?
>
> *Tech Preview Security Audit*
> For features that are being merged without full security model coverage,
> there needs to be a baseline of assurances that they do not introduce new
> attack vectors in deployments that are from actual releases or even just
> built from trunk.
>
> *1. UIs*
>
> 1.1. Are there new UIs added with this merge?
> 1.2. Are they enabled/accessible by default?
> 1.3. Are they hosted in existing processes or as part of a new
> process/server?
> 1.4. If new process/server, is it launched by default?
>
> *2. APIs*
>
> 2.1. Are there new REST APIs added with this merge?
> 2.2. Are they enabled by default?
> 2.3. Are there RPC based APIs added with this merge?
> 2.4. Are they enabled by default?
>
> *3. Secure Clusters*
>
> 3.1. Is this feature disabled completely in secure deployments?
> 3.2. If not, is there some justification as to why it should be available?
>
> *4. CVEs*
>
> 4.1. Have all dependencies introduced by this merge been checked for known
> issues?
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------
>
>
> *GA Readiness Security Audit*
> At this point, we are merging full or partial security model
> implementations.
> Let's inventory what is covered by the model at this point and whether
> future merges are required to complete it.
>
> *1. UIs*
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
>   1.3.1. Kerberos
>     1.3.1.1. Has TGT renewal been accounted for?
>     1.3.1.2. SPNEGO support?
>     1.3.1.3. Delegation token?
>   1.3.2. Proxy User ACL?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data and/or related
> processes?
> 1.5. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.7. Is there TLS/SSL support?
>
> *2. REST APIs*
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. What authorization enforcement points are there within the REST APIs?
>
> *3. Encryption*
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
> 3.3. KMS interaction with Proxy Users?
>
> *4. Configuration*
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning to credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out command execution, etc?
>
> *5. HA*
>
> 5.1. Are there provisions for HA?
> 5.2. Are there any single points of failure?
>
> *6. CVEs*
>
> Dependencies need to have been checked for known issues before we merge.
> We don't however want to list any CVEs that have been fixed but not
> released yet.
>
> 6.1. All dependencies checked for CVEs?
>
>
>
>
> On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lm...@apache.org> wrote:
>
>> Hi Marton -
>>
>> I don't think there is any denying that it would be great to have such
>> documentation for all of those reasons.
>> If it is a natural extension of getting the checklist information as an
>> assertion of security state when merging then we can certainly include it.
>>
>> I think that backfilling all such information across the project is a
>> different topic altogether and wouldn't want to expand the scope of this
>> discussion in that direction.
>>
>> Thanks for the great thoughts on this!
>>
>> thanks,
>>
>> --larry
>>
>>
>>
>>
>>
>> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:
>>
>>>
>>>
>>> On 10/21/2017 02:41 AM, larry mccay wrote:
>>>
>>>>
>>>> "We might want to start a security section for Hadoop wiki for each of
>>>>> the
>>>>> services and components.
>>>>> This helps to track what has been completed."
>>>>>
>>>>
>>>> Do you mean to keep the audit checklist for each service and component
>>>> there?
>>>> Interesting idea, I wonder what sort of maintenance that implies and
>>>> whether we want to take on that burden even though it would be great
>>>> information to have for future reviewers.
>>>>
>>>
>>> I think we should care about the maintenance of the documentation
>>> anyway. We also need to maintain all the other documentation. I think it
>>> could even be part of the generated docs rather than the wiki.
>>>
>>> I also suggest filling in this list for the current trunk/3.0 as a first
>>> step.
>>>
>>> 1. It would be very useful documentation for the end users (some
>>> answers could link to existing documentation where it exists, but I am not
>>> sure all the answers are in the current documentation).
>>>
>>> 2. It would be a good example of how the questions could be answered.
>>>
>>> 3. It would help to check whether something is missing from the list.
>>>
>>> 4. There are feature branches where some of the components are not
>>> touched - for example, no web UI or no REST service. A prefilled list could
>>> help to check that the branch doesn't break any existing security
>>> functionality on trunk.
>>>
>>> 5. It helps to document the security features in one place. If we have a
>>> list for the existing functionality in the same format, it would be easy to
>>> merge the new documentation of the new features as they will be reported in
>>> the same form. (So it won't be so hard to maintain the list...).
>>>
>>> Marton
>>>
>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by Eric Yang <ey...@apache.org>.
Looks good and +1 for markdown documentations to provide per release
specific information.

On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:

> New Revision...
>
> This revision acknowledges the reality that we often have multiple phases
> of feature lifecycle and that we need to account for each phase.
> It has also been made more generic.
> I have created a Tech Preview Security Audit list and a GA Readiness
> Security Audit list.
> I've also included suggested items into the GA Readiness list.
>
> It has also been suggested that we publish the information as part of docs
> so that the state of such features can be easily determined from these
> pages. We can discuss this aspect as well.
>
> Thoughts?
>
> *Tech Preview Security Audit*
> For features that are being merged without full security model coverage,
> there need to be a base line of assurances that they do not introduce new
> attack vectors in deployments that are from actual releases or even just
> built from trunk.
>
> *1. UIs*
>
> 1.1. Are there new UIs added with this merge?
> 1.2. Are they enabled/accessible by default?
> 1.3. Are they hosted in existing processes or as part of a new
> process/server?
> 1.4. If new process/server, is it launched by default?
>
> *2. APIs*
>
> 2.1. Are there new REST APIs added with this merge?
> 2.2. Are they enabled by default?
> 2.3. Are there RPC based APIs added with this merge?
> 2.4. Are they enabled by default?
>
> *3. Secure Clusters*
>
> 3.1. Is this feature disabled completely in secure deployments?
> 3.2. If not, is there some justification as to why it should be available?
>
> *4. CVEs*
>
> 4.1. Have all dependencies introduced by this merge been checked for known
> issues?
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------
>
>
> *GA Readiness Security Audit*
> At this point, we are merging full or partial security model
> implementations.
> Let's inventory what is covered by the model at this point and whether
> there are future merges required to be full.
>
> *1. UIs*
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
>   1.3.1. Kerberos
>     1.3.1.1. has TGT renewal been accounted for
>     1.3.1.2. SPNEGO support?
>     1.3.1.3. Delegation token?
>   1.3.2. Proxy User ACL?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data and/or related
> processes?
> 1.5. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.7. Is there TLS/SSL support?
>
> *2. REST APIs*
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. What authorization enforcement points are there within the REST APIs?
>
> *3. Encryption*
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
> 3.3. KMS interaction with Proxy Users?
>
> *4. Configuration*
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning to credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out command execution, etc?
>
> *5. HA*
>
> 5.1. Are there provisions for HA?
> 5.2. Are there any single point of failures?
>
> *6. CVEs*
>
> Dependencies need to have been checked for known issues before we merge.
> We don't however want to list any CVEs that have been fixed but not
> released yet.
>
> 6.1. All dependencies checked for CVEs?
>
>
>
>
> On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lm...@apache.org> wrote:
>
>> Hi Marton -
>>
>> I don't think there is any denying that it would be great to have such
>> documentation for all of those reasons.
>> If it is a natural extension of getting the checklist information as an
>> assertion of security state when merging then we can certainly include it.
>>
>> I think that backfilling all such information across the project is a
>> different topic altogether and wouldn't want to expand the scope of this
>> discussion in that direction.
>>
>> Thanks for the great thoughts on this!
>>
>> thanks,
>>
>> --larry
>>
>>
>>
>>
>>
>> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:
>>
>>>
>>>
>>> On 10/21/2017 02:41 AM, larry mccay wrote:
>>>
>>>>
>>>> "We might want to start a security section for Hadoop wiki for each of
>>>>> the
>>>>> services and components.
>>>>> This helps to track what has been completed."
>>>>>
>>>>
>>>> Do you mean to keep the audit checklist for each service and component
>>>> there?
>>>> Interesting idea, I wonder what sort of maintenance that implies and
>>>> whether we want to take on that burden even though it would be great
>>>> information to have for future reviewers.
>>>>
>>>
>>> I think we should care about the maintenance of the documentation
>>> anyway. We also need to maintain all the other documentations. I think it
>>> could be even part of the generated docs and not the wiki.
>>>
>>> I also suggest to fill this list about the current trunk/3.0 as a first
>>> step.
>>>
>>> 1. It would be a very usefull documentation for the end-users (some
>>> answers could link the existing documentation, it exists, but I am not sure
>>> if all the answers are in the current documentation.)
>>>
>>> 2. It would be a good example who the questions could be answered.
>>>
>>> 3. It would help to check, if something is missing from the list.
>>>
>>> 4. There are future branches where some of the components are not
>>> touched. For example, no web ui or no REST service. A prefilled list could
>>> help to check if the branch doesn't break any old security functionality on
>>> trunk.
>>>
>>> 5. It helps to document the security features in one place. If we have a
>>> list for the existing functionality in the same format, it would be easy to
>>> merge the new documentation of the new features as they will be reported in
>>> the same form. (So it won't be so hard to maintain the list...).
>>>
>>> Marton
>>>
>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by Eric Yang <ey...@apache.org>.
Looks good and +1 for markdown documentations to provide per release
specific information.

On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:

> New Revision...
>
> This revision acknowledges the reality that we often have multiple phases
> of feature lifecycle and that we need to account for each phase.
> It has also been made more generic.
> I have created a Tech Preview Security Audit list and a GA Readiness
> Security Audit list.
> I've also included suggested items into the GA Readiness list.
>
> It has also been suggested that we publish the information as part of docs
> so that the state of such features can be easily determined from these
> pages. We can discuss this aspect as well.
>
> Thoughts?
>
> *Tech Preview Security Audit*
> For features that are being merged without full security model coverage,
> there need to be a base line of assurances that they do not introduce new
> attack vectors in deployments that are from actual releases or even just
> built from trunk.
>
> *1. UIs*
>
> 1.1. Are there new UIs added with this merge?
> 1.2. Are they enabled/accessible by default?
> 1.3. Are they hosted in existing processes or as part of a new
> process/server?
> 1.4. If new process/server, is it launched by default?
>
> *2. APIs*
>
> 2.1. Are there new REST APIs added with this merge?
> 2.2. Are they enabled by default?
> 2.3. Are there RPC based APIs added with this merge?
> 2.4. Are they enabled by default?
>
> *3. Secure Clusters*
>
> 3.1. Is this feature disabled completely in secure deployments?
> 3.2. If not, is there some justification as to why it should be available?
>
> *4. CVEs*
>
> 4.1. Have all dependencies introduced by this merge been checked for known
> issues?
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------
>
>
> *GA Readiness Security Audit*
> At this point, we are merging full or partial security model
> implementations.
> Let's inventory what is covered by the model at this point and whether
> there are future merges required to be full.
>
> *1. UIs*
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
>   1.3.1. Kerberos
>     1.3.1.1. has TGT renewal been accounted for
>     1.3.1.2. SPNEGO support?
>     1.3.1.3. Delegation token?
>   1.3.2. Proxy User ACL?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data and/or related
> processes?
> 1.5. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.7. Is there TLS/SSL support?
>
> *2. REST APIs*
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. What authorization enforcement points are there within the REST APIs?
>
> *3. Encryption*
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
> 3.3. KMS interaction with Proxy Users?
>
> *4. Configuration*
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning to credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out command execution, etc?
>
> *5. HA*
>
> 5.1. Are there provisions for HA?
> 5.2. Are there any single point of failures?
>
> *6. CVEs*
>
> Dependencies need to have been checked for known issues before we merge.
> We don't however want to list any CVEs that have been fixed but not
> released yet.
>
> 6.1. All dependencies checked for CVEs?
>
>
>
>
> On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lm...@apache.org> wrote:
>
>> Hi Marton -
>>
>> I don't think there is any denying that it would be great to have such
>> documentation for all of those reasons.
>> If it is a natural extension of getting the checklist information as an
>> assertion of security state when merging then we can certainly include it.
>>
>> I think that backfilling all such information across the project is a
>> different topic altogether and wouldn't want to expand the scope of this
>> discussion in that direction.
>>
>> Thanks for the great thoughts on this!
>>
>> thanks,
>>
>> --larry
>>
>>
>>
>>
>>
>> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:
>>
>>>
>>>
>>> On 10/21/2017 02:41 AM, larry mccay wrote:
>>>
>>>>
>>>> "We might want to start a security section for Hadoop wiki for each of
>>>>> the
>>>>> services and components.
>>>>> This helps to track what has been completed."
>>>>>
>>>>
>>>> Do you mean to keep the audit checklist for each service and component
>>>> there?
>>>> Interesting idea, I wonder what sort of maintenance that implies and
>>>> whether we want to take on that burden even though it would be great
>>>> information to have for future reviewers.
>>>>
>>>
>>> I think we should care about the maintenance of the documentation
>>> anyway. We also need to maintain all the other documentations. I think it
>>> could be even part of the generated docs and not the wiki.
>>>
>>> I also suggest to fill this list about the current trunk/3.0 as a first
>>> step.
>>>
>>> 1. It would be a very usefull documentation for the end-users (some
>>> answers could link the existing documentation, it exists, but I am not sure
>>> if all the answers are in the current documentation.)
>>>
>>> 2. It would be a good example who the questions could be answered.
>>>
>>> 3. It would help to check, if something is missing from the list.
>>>
>>> 4. There are future branches where some of the components are not
>>> touched. For example, no web ui or no REST service. A prefilled list could
>>> help to check if the branch doesn't break any old security functionality on
>>> trunk.
>>>
>>> 5. It helps to document the security features in one place. If we have a
>>> list for the existing functionality in the same format, it would be easy to
>>> merge the new documentation of the new features as they will be reported in
>>> the same form. (So it won't be so hard to maintain the list...).
>>>
>>> Marton
>>>
>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by Eric Yang <ey...@apache.org>.
Looks good and +1 for markdown documentations to provide per release
specific information.

On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:

> New Revision...
>
> This revision acknowledges the reality that we often have multiple phases
> of feature lifecycle and that we need to account for each phase.
> It has also been made more generic.
> I have created a Tech Preview Security Audit list and a GA Readiness
> Security Audit list.
> I've also included suggested items into the GA Readiness list.
>
> It has also been suggested that we publish the information as part of docs
> so that the state of such features can be easily determined from these
> pages. We can discuss this aspect as well.
>
> Thoughts?
>
> *Tech Preview Security Audit*
> For features that are being merged without full security model coverage,
> there need to be a base line of assurances that they do not introduce new
> attack vectors in deployments that are from actual releases or even just
> built from trunk.
>
> *1. UIs*
>
> 1.1. Are there new UIs added with this merge?
> 1.2. Are they enabled/accessible by default?
> 1.3. Are they hosted in existing processes or as part of a new
> process/server?
> 1.4. If new process/server, is it launched by default?
>
> *2. APIs*
>
> 2.1. Are there new REST APIs added with this merge?
> 2.2. Are they enabled by default?
> 2.3. Are there RPC based APIs added with this merge?
> 2.4. Are they enabled by default?
>
> *3. Secure Clusters*
>
> 3.1. Is this feature disabled completely in secure deployments?
> 3.2. If not, is there some justification as to why it should be available?
>
> *4. CVEs*
>
> 4.1. Have all dependencies introduced by this merge been checked for known
> issues?
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------
>
>
> *GA Readiness Security Audit*
> At this point, we are merging full or partial security model
> implementations.
> Let's inventory what is covered by the model at this point and whether
> there are future merges required to be full.
>
> *1. UIs*
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
>   1.3.1. Kerberos
>     1.3.1.1. has TGT renewal been accounted for
>     1.3.1.2. SPNEGO support?
>     1.3.1.3. Delegation token?
>   1.3.2. Proxy User ACL?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data and/or related
> processes?
> 1.5. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.7. Is there TLS/SSL support?
>
> *2. REST APIs*
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. What authorization enforcement points are there within the REST APIs?
>
> *3. Encryption*
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
> 3.3. KMS interaction with Proxy Users?
>
> *4. Configuration*
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning to credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out command execution, etc?
>
> *5. HA*
>
> 5.1. Are there provisions for HA?
> 5.2. Are there any single point of failures?
>
> *6. CVEs*
>
> Dependencies need to have been checked for known issues before we merge.
> We don't however want to list any CVEs that have been fixed but not
> released yet.
>
> 6.1. All dependencies checked for CVEs?
>
>
>
>
> On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lm...@apache.org> wrote:
>
>> Hi Marton -
>>
>> I don't think there is any denying that it would be great to have such
>> documentation for all of those reasons.
>> If it is a natural extension of getting the checklist information as an
>> assertion of security state when merging then we can certainly include it.
>>
>> I think that backfilling all such information across the project is a
>> different topic altogether and wouldn't want to expand the scope of this
>> discussion in that direction.
>>
>> Thanks for the great thoughts on this!
>>
>> thanks,
>>
>> --larry
>>
>>
>>
>>
>>
>> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:
>>
>>>
>>>
>>> On 10/21/2017 02:41 AM, larry mccay wrote:
>>>
>>>>
>>>> "We might want to start a security section for Hadoop wiki for each of
>>>>> the
>>>>> services and components.
>>>>> This helps to track what has been completed."
>>>>>
>>>>
>>>> Do you mean to keep the audit checklist for each service and component
>>>> there?
>>>> Interesting idea, I wonder what sort of maintenance that implies and
>>>> whether we want to take on that burden even though it would be great
>>>> information to have for future reviewers.
>>>>
>>>
>>> I think we should care about the maintenance of the documentation
>>> anyway; we also need to maintain all the other documentation. It could
>>> even be part of the generated docs rather than the wiki.
>>>
>>> I also suggest filling in this list for the current trunk/3.0 as a
>>> first step.
>>>
>>> 1. It would be very useful documentation for the end users (some
>>> answers could link to existing documentation where it exists, but I am
>>> not sure all the answers are in the current documentation).
>>>
>>> 2. It would be a good example of how the questions could be answered.
>>>
>>> 3. It would help to check whether something is missing from the list.
>>>
>>> 4. There are feature branches where some of the components are not
>>> touched: for example, no web UI or no REST service. A prefilled list
>>> could help to check that the branch doesn't break any existing
>>> security functionality on trunk.
>>>
>>> 5. It helps to document the security features in one place. If we have
>>> a list for the existing functionality in the same format, it will be
>>> easy to merge in the documentation of new features, since they will be
>>> reported in the same form. (So it won't be so hard to maintain the
>>> list...)
>>>
>>> Marton
>>>
>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
New revision...

I have incorporated additions from Mike and added a [DEFAULT] tag to those
items that should be considered for Secure by Default settings.
I am hoping that we can close down on the actual lists shortly and move to
discussing the meta points on how/when to require the completion of the
checklists and whether and how they should be included as docs for the
feature moving forward.

Some comments that I have gotten offline raised the concern that
targeting merge requests would only capture a subset of new features and
might even influence whether contributors choose to use feature branches
at all. That is certainly an effect we would want to avoid. At the same
time, we don't want to be so intrusive in the development cycle that we
bog down patches that just fix bugs.

At any rate, let's close down on the checklists here first.

Thanks!

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline of assurances that they do not introduce new
attack vectors into deployments, whether built from actual releases or
just from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


----------------------------------------------------------------------------


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what the model covers at this point and whether future
merges are required for full coverage.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
[DEFAULT] (pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting [DEFAULT]
  1.2.2. cross site request forgery [DEFAULT]
  1.2.3. click jacking (X-Frame-Options) [DEFAULT]
  1.2.4 If using cookies, is the secure flag for cookies turned on?
[DEFAULT]
  1.2.5 If using cookies, is the HTTPOnly flag turned on? [DEFAULT]
1.3. What sort of authentication is required for access to the UIs?
[DEFAULT]
  1.3.1. Kerberos
    1.3.1.1. Has TGT renewal been accounted for?
    1.3.1.2. SPNEGO support?
    1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data and/or related
processes? [DEFAULT]
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
  1.5.1 If so, how is it validated before persistence? [DEFAULT]
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support? [DEFAULT]
  1.7.1 Is it possible to configure TLS protocols and cipher suites?
  1.7.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)?
1.8 Are accesses to the UIs audited? ("User X logged into Y from IP address
Z", etc) [DEFAULT]
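
To make the header-related [DEFAULT] items above concrete (1.2.3 click
jacking, 1.2.4/1.2.5 cookie flags, 1.7.2 HSTS), here is a minimal Java
sketch of what a UI filter would emit. The class and method names are
invented for illustration, not an existing Hadoop API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the hardened response headers and cookie flags a UI
// servlet filter might add. All names here are invented for the sketch.
public class UiSecurityHeaders {

    public static Map<String, String> defaultSecureHeaders() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("X-Frame-Options", "SAMEORIGIN");          // 1.2.3 click jacking
        h.put("X-Content-Type-Options", "nosniff");      // MIME sniffing
        h.put("Strict-Transport-Security",
              "max-age=31536000; includeSubDomains");    // 1.7.2 HSTS, TLS only
        return h;
    }

    // 1.2.4 / 1.2.5: a Set-Cookie value carrying both Secure and HttpOnly.
    public static String secureCookie(String name, String value) {
        return name + "=" + value + "; Secure; HttpOnly";
    }
}
```

In a real filter these would be set on the HttpServletResponse; the map
form just keeps the sketch self-contained.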

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. Cross-site scripting (XSS) [DEFAULT]
  2.2.2. Cross-site request forgery (CSRF) [DEFAULT]
  2.2.3. XML External Entity (XXE) [DEFAULT]
2.3. What is being used for authentication - Hadoop Auth Module? [DEFAULT]
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support? [DEFAULT]
  2.5.1 Is it possible to configure TLS protocols and cipher suites?
  2.5.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)? [DEFAULT]
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
2.8 Are accesses to the REST APIs audited? ("User X accessed resource Y
from IP address Z", etc) [DEFAULT]
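
For 2.2.2, one common enforcement shape is the custom-header check:
reject state-changing requests that lack a header a cross-site browser
form cannot set. The sketch below is loosely modeled on that pattern; the
header name and the set of safe methods are assumptions for illustration,
not a statement of what any particular Hadoop filter does:

```java
import java.util.Map;
import java.util.Set;

// Illustrative custom-header CSRF defense (2.2.2). Reads pass through;
// mutations must carry a header that a cross-site form cannot attach.
public class CsrfCheck {

    static final String CSRF_HEADER = "X-XSRF-HEADER";           // assumed name
    static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS");

    public static boolean isAllowed(String method, Map<String, String> headers) {
        if (SAFE_METHODS.contains(method.toUpperCase())) {
            return true;                       // reads are not CSRF targets here
        }
        return headers.containsKey(CSRF_HEADER);   // mutation needs the header
    }
}
```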

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
any other in computer science. Standard cryptographic libraries should
always be used. Does this work attempt to create an encryption scheme or
protocol? Does it have a "novel" or "unique" use of normal crypto?  There
be dragons. Even normal-looking use of cryptography must be carefully
reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]
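
Item 3.5 in code form, as a minimal sketch: token bytes come from
SecureRandom (a CSPRNG), never java.util.Random or timestamps. The helper
name is invented:

```java
import java.security.SecureRandom;

// Sketch of item 3.5: session tokens and key material must come from a
// cryptographically approved source. SecureRandom, not java.util.Random.
public class TokenSource {

    private static final SecureRandom RNG = new SecureRandom();
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    public static String newToken(int numBytes) {
        byte[] bits = new byte[numBytes];
        RNG.nextBytes(bits);                          // CSPRNG-backed bytes
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : bits) {
            sb.append(HEX[(b >> 4) & 0xf]).append(HEX[b & 0xf]);
        }
        return sb.toString();
    }
}
```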

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?
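
A hypothetical sketch of the contract behind 4.1/4.2: consult a
credential provider first, then fall back to plaintext configuration, so
secrets can be moved out of config files without code changes. Plain Maps
stand in for the real CredentialProvider and Configuration objects here,
so this shows the shape only, not the actual Hadoop API:

```java
import java.util.Map;

// Hypothetical stand-in for the Configuration.getPassword() contract:
// provider first, plaintext config as fallback. Maps replace the real
// CredentialProvider and Configuration objects for the sketch.
public class PasswordLookup {

    public static char[] getPassword(Map<String, char[]> credentialStore,
                                     Map<String, String> config,
                                     String key) {
        char[] fromStore = credentialStore.get(key);   // jceks-style provider
        if (fromStore != null) {
            return fromStore;
        }
        String clear = config.get(key);                // plaintext fallback
        return clear == null ? null : clear.toCharArray();
    }
}
```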

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: xxxx” where the
xxxx’s are user data. You never know, it might contain secrets like credit
card numbers.
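
One way to mechanize 7.1 is to redact by key pattern before a value ever
reaches a log statement. A sketch; the pattern below is illustrative, not
an exhaustive list of sensitive key names:

```java
import java.util.regex.Pattern;

// Sketch of item 7.1: never let a sensitive config value reach a log line.
// The key pattern is illustrative only.
public class LogRedactor {

    private static final Pattern SENSITIVE =
        Pattern.compile("(?i).*(password|secret|token|key).*");

    public static String describe(String configKey, String configValue) {
        if (SENSITIVE.matcher(configKey).matches()) {
            return configKey + "=<redacted>";       // value never logged
        }
        return configKey + "=" + configValue;
    }
}
```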

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only be put into an insecure state by deliberate human
tuning. Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
MongoDB instances on the open internet. Attackers removed or encrypted the
data and left ransom notes behind. We don't want that sort of notoriety
for Hadoop. Granted, it's not always possible to turn on all security
features: for example, you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?
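
As one concrete answer to 8.1, a service could refuse to bind to an
internet-routable address unless the admin explicitly overrides. A
sketch, with the override flag and method name invented for illustration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of one default-secure posture for 8.1: loopback and private
// (RFC 1918) addresses bind freely; anything routable, or the wildcard
// 0.0.0.0, requires an explicit admin override.
public class BindPolicy {

    public static boolean mayBind(String host, boolean adminOverride) {
        final InetAddress addr;
        try {
            addr = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return false;                        // unresolvable: fail closed
        }
        if (addr.isAnyLocalAddress()) {
            return adminOverride;                // 0.0.0.0 exposes every interface
        }
        return addr.isLoopbackAddress()
                || addr.isSiteLocalAddress()     // 10/8, 172.16/12, 192.168/16
                || addr.isLinkLocalAddress()
                || adminOverride;
    }
}
```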


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lm...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point that some of those items can be enabled
> by default and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings or configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable ip address. (That is, not in the
>> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
>> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>>
>>>>> New Revision...
>>>>>
>>>>
>>>> These lists are wonderful. I appreciate the split between the Tech
>>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>>> "don't enable by default" or at least "don't enable if security is on".  I
>>>> don't have any comments on that part.
>>>>
>>>> Additions inline below. If some of the additions are items covered by
>>>> existing frameworks that any code would use, please forgive my ignorance.
>>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>>
>>>> *GA Readiness Security Audit*
>>>>> At this point, we are merging full or partial security model
>>>>> implementations.
>>>>> Let's inventory what is covered by the model at this point and whether
>>>>> there are future merges required to be full.
>>>>>
>>>>> *1. UIs*
>>>>>
>>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>>> (pointers to code would be appreciated)
>>>>> 1.2. What explicit protections have been built in for (pointers to
>>>>> code would be appreciated):
>>>>>   1.2.1. cross site scripting
>>>>>   1.2.2. cross site request forgery
>>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>>
>>>>
>>>> 1.2.4 If using cookies, is the secure flag for cookies
>>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>>
>>>>
>>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>>   1.3.1. Kerberos
>>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>>     1.3.1.2. SPNEGO support?
>>>>>     1.3.1.3. Delegation token?
>>>>>   1.3.2. Proxy User ACL?
>>>>> 1.4. What authorization is available for determining who can access
>>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>>> related processes?
>>>>> 1.5. Is there any input that will ultimately be persisted in
>>>>> configuration for executing shell commands or processes?
>>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>>> impersonation?
>>>>> 1.7. Is there TLS/SSL support?
>>>>>
>>>>
>>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>>> Security
>>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>>> (HSTS)?
>>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>>> address Z", etc)
>>>>
>>>>
>>>>> *2. REST APIs*
>>>>>
>>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>>> impersonation capabilities?
>>>>> 2.2. What explicit protections have been built in for:
>>>>>   2.2.1. cross site scripting (XSS)
>>>>>   2.2.2. cross site request forgery (CSRF)
>>>>>   2.2.3. XML External Entity (XXE)
>>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>>> endpoints) or are they part of existing processes?
>>>>> 2.5. Is there TLS/SSL support?
>>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>>> APIs?
>>>>> 2.7. What authorization enforcement points are there within the REST
>>>>> APIs?
>>>>>
>>>>
>>>> The TLS and audit comments above apply here, too.
>>>>
>>>>
>>>>> *3. Encryption*
>>>>>
>>>>> 3.1. Is there any support for encryption of persisted data?
>>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>>> 3.3. KMS interaction with Proxy Users?
>>>>>
>>>>
>>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto
>>>> than any other in computer science. Standard cryptographic libraries should
>>>> always be used. Does this work attempt to create an encryption scheme or
>>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>>> be dragons. Even normal-looking use of cryptography must be carefully
>>>> reviewed.
>>>> 3.5 If you need random bits for a security purpose, such as for a
>>>> session token or a cryptographic key, you need a cryptographically approved
>>>> place to acquire said bits. Use the SecureRandom class.
>>>>
>>>> *4. Configuration*
>>>>>
>>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>>> for provisioning to credential providers?
>>>>> 4.3. Are there any settings that are used to launch docker containers
>>>>> or shell out command execution, etc?
>>>>>
>>>>
>>>> +1. So good.
>>>>
>>>>
>>>>> *5. HA*
>>>>>
>>>>> 5.1. Are there provisions for HA?
>>>>> 5.2. Are there any single point of failures?
>>>>>
>>>>> *6. CVEs*
>>>>>
>>>>> Dependencies need to have been checked for known issues before we
>>>>> merge.
>>>>> We don't however want to list any CVEs that have been fixed but not
>>>>> released yet.
>>>>>
>>>>> 6.1. All dependencies checked for CVEs?
>>>>>
>>>>
>>>> Big +1 for this, too.
>>>>
>>>> 7. Log Messages
>>>>
>>>> Do not write secrets or data into log files. This sounds obvious, but
>>>> mistakes happen.
>>>>
>>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>>> sensitive configuration item.
>>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>>> card numbers.
>>>>
>>>> 8. Secure By Default
>>>>
>>>> Strive to be *secure by default*. This means that products should ship
>>>> in a secure state, and only by human tuning be put into an insecure state.
>>>> Exhibit A here is the MongoDB ransomware fiasco
>>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>>> insecure-by-default MongoDB installation resulted in completely open
>>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>>> for hadoop. Granted, it's not always possible to turn on all security
>>>> features: for example you have to have a KDC set up in order to enable
>>>> Kerberos.
>>>>
>>>> 8.1 Are there settings or configurations that can be shipped in a
>>>> default-secure state?
>>>>
>>>>
>>>> Thanks again for putting this list together!
>>>> -Mike
>>>>
>>>>
>>>>
>>>
>>
>

checklists and whether and how they should be included as docs for the
feature moving forward.

Some comments that I have gotten offline have included concern that
targeting merge requests would only capture a subset of new features and
may actually discourage the use of feature branches. That is certainly
something that we wouldn't want. At the same time, we don't want to be so
intrusive in the development cycle that we bog down those patches that
just fix bugs.

At any rate, let's close down on the checklists here first.

Thanks!

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline assurance that they do not introduce new
attack vectors into deployments, whether those deployments are built from
actual releases or just from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


--------------------------------------------------------------------------


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
there are future merges required to be full.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
[DEFAULT] (pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting [DEFAULT]
  1.2.2. cross site request forgery [DEFAULT]
  1.2.3. click jacking (X-Frame-Options) [DEFAULT]
  1.2.4 If using cookies, is the secure flag for cookies turned on?
[DEFAULT]
  1.2.5 If using cookies, is the HTTPOnly flag turned on? [DEFAULT]
1.3. What sort of authentication is required for access to the UIs?
[DEFAULT]
  1.3.1. Kerberos
    1.3.1.1. has TGT renewal been accounted for
    1.3.1.2. SPNEGO support?
    1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access which
capabilities of the UIs, whether for viewing or for modifying data and/or
related processes? [DEFAULT]
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
  1.5.1 If so, how is it validated before persistence? [DEFAULT]
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support? [DEFAULT]
  1.7.1 Is it possible to configure TLS protocols and cipher suites?
  1.7.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)?
1.8 Are accesses to the UIs audited? ("User X logged into Y from IP address
Z", etc) [DEFAULT]
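Items 1.2.4 and 1.2.5 above can be sketched in a few lines. In a real servlet deployment the same flags would be set on a javax.servlet.http.Cookie or via the container's session cookie config; java.net.HttpCookie is used here only so the sketch is self-contained, and the "hadoop.auth" cookie name is shown for illustration.

```java
import java.net.HttpCookie;

public class SecureCookieExample {
    // Build a session cookie with the Secure and HttpOnly flags set, so
    // it is only sent over TLS (1.2.4) and is not readable from
    // JavaScript via document.cookie (1.2.5).
    static HttpCookie sessionCookie(String value) {
        HttpCookie cookie = new HttpCookie("hadoop.auth", value);
        cookie.setSecure(true);    // only transmit over HTTPS
        cookie.setHttpOnly(true);  // hide from client-side script
        cookie.setPath("/");
        return cookie;
    }
}
```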

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS) [DEFAULT]
  2.2.2. cross site request forgery (CSRF) [DEFAULT]
  2.2.3. XML External Entity (XXE) [DEFAULT]
2.3. What is being used for authentication - Hadoop Auth Module? [DEFAULT]
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support? [DEFAULT]
  2.5.1 Is it possible to configure TLS protocols and cipher suites?
  2.5.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)? [DEFAULT]
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
2.8 Are accesses to the REST APIs audited? ("User X accessed resource Y
from IP address Z", etc) [DEFAULT]
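For item 2.2.3, the standard XXE defense for any endpoint that parses XML is to disable DOCTYPE declarations and external entities on the parser factory. A minimal sketch against the JDK's built-in (Xerces-based) parser:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class HardenedXmlParser {
    // Return a DocumentBuilder with external entity resolution disabled,
    // closing off the classic XXE vectors (checklist item 2.2.3).
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Disallow DOCTYPE declarations entirely -- the strongest defense.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and suspenders: also disable external entity resolution.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }
}
```

Ordinary documents still parse normally; any input carrying a DOCTYPE is rejected with a parse error instead of being resolved.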

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
in any other area of computer science. Standard cryptographic libraries
should always be used. Does this work attempt to create an encryption
scheme or protocol? Does it have a "novel" or "unique" use of normal
crypto? There be dragons. Even normal-looking use of cryptography must be
carefully reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]
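Item 3.5 in practice: java.util.Random and time-based seeds are predictable, so anything security-sensitive should draw from SecureRandom, which is backed by the platform's CSPRNG. A minimal token-generation sketch:

```java
import java.security.SecureRandom;
import java.util.Base64;

public class TokenGenerator {
    // SecureRandom is thread-safe and expensive to seed; reuse one instance.
    private static final SecureRandom RNG = new SecureRandom();

    // Generate an unguessable session token (checklist item 3.5):
    // 256 bits of CSPRNG output, URL-safe base64 encoded.
    static String newSessionToken() {
        byte[] bits = new byte[32];
        RNG.nextBytes(bits);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bits);
    }
}
```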

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't, however, want to list any CVEs that have been fixed but not yet
released.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: xxxx” where the
xxxx’s are user data. You never know, it might contain secrets like credit
card numbers.
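One way to enforce 7.1 mechanically is to funnel config values through a redaction helper before they reach a log statement. The class and key list below are purely illustrative, not an actual Hadoop API:

```java
import java.util.Set;

public class LogRedactor {
    // Hypothetical helper: mask values whose keys look sensitive before
    // logging (checklist item 7.1). The marker list is illustrative; a
    // real deployment would make it configurable. Note that substring
    // matching is deliberately aggressive -- a false redaction is far
    // cheaper than a leaked secret.
    private static final Set<String> SENSITIVE_MARKERS =
            Set.of("password", "secret", "token", "key");

    static String redact(String key, String value) {
        String lower = key.toLowerCase();
        for (String marker : SENSITIVE_MARKERS) {
            if (lower.contains(marker)) {
                return "***REDACTED***";
            }
        }
        return value;
    }
}
```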

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only by human tuning be put into an insecure state.
Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
instances of MongoDB on the open internet. Attackers removed or encrypted
the data and left ransom notes behind. We don't want that sort of notoriety
for Hadoop. Granted, it's not always possible to turn on all security
features: for example, you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?
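One concrete default-secure setting suggested in this thread is shipping with TLSv1/TLSv1.1 disabled. A sketch of that in plain JSSE (assuming a JDK recent enough to support TLSv1.3; an operator would have to deliberately widen the list):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;

public class TlsDefaults {
    // Default-secure TLS setup (checklist item 8.1): enable only modern
    // protocol versions, so SSLv3, TLSv1, and TLSv1.1 stay off unless a
    // human explicitly re-enables them.
    static SSLEngine newServerEngine() throws NoSuchAlgorithmException {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);
        engine.setEnabledProtocols(new String[] {"TLSv1.2", "TLSv1.3"});
        return engine;
    }
}
```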


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lm...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point that some of those items can be enabled
> by default and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings of configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable ip address. (That is, not in the
>> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
>> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>>
>>>>> New Revision...
>>>>>
>>>>
>>>> These lists are wonderful. I appreciate the split between the Tech
>>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>>> "don't enable by default" or at least "don't enable if security is on".  I
>>>> don't have any comments on that part.
>>>>
>>>> Additions inline below. If some of the additions are items covered by
>>>> existing frameworks that any code would use, please forgive my ignorance.
>>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>>
>>>> *GA Readiness Security Audit*
>>>>> At this point, we are merging full or partial security model
>>>>> implementations.
>>>>> Let's inventory what is covered by the model at this point and whether
>>>>> there are future merges required to be full.
>>>>>
>>>>> *1. UIs*
>>>>>
>>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>>> (pointers to code would be appreciated)
>>>>> 1.2. What explicit protections have been built in for (pointers to
>>>>> code would be appreciated):
>>>>>   1.2.1. cross site scripting
>>>>>   1.2.2. cross site request forgery
>>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>>
>>>>
>>>> 1.2.4 If using cookies, is the secure flag for cookies
>>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>>
>>>>
>>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>>   1.3.1. Kerberos
>>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>>     1.3.1.2. SPNEGO support?
>>>>>     1.3.1.3. Delegation token?
>>>>>   1.3.2. Proxy User ACL?
>>>>> 1.4. What authorization is available for determining who can access
>>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>>> related processes?
>>>>> 1.5. Is there any input that will ultimately be persisted in
>>>>> configuration for executing shell commands or processes?
>>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>>> impersonation?
>>>>> 1.7. Is there TLS/SSL support?
>>>>>
>>>>
>>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>>> Security
>>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>>> (HSTS)?
>>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>>> address Z", etc)
>>>>
>>>>
>>>>> *2. REST APIs*
>>>>>
>>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>>> impersonation capabilities?
>>>>> 2.2. What explicit protections have been built in for:
>>>>>   2.2.1. cross site scripting (XSS)
>>>>>   2.2.2. cross site request forgery (CSRF)
>>>>>   2.2.3. XML External Entity (XXE)
>>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>>> endpoints) or are they part of existing processes?
>>>>> 2.5. Is there TLS/SSL support?
>>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>>> APIs?
>>>>> 2.7. What authorization enforcement points are there within the REST
>>>>> APIs?
>>>>>
>>>>
>>>> The TLS and audit comments above apply here, too.
>>>>
>>>>
>>>>> *3. Encryption*
>>>>>
>>>>> 3.1. Is there any support for encryption of persisted data?
>>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>>> 3.3. KMS interaction with Proxy Users?
>>>>>
>>>>
>>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto
>>>> than any other in computer science. Standard cryptographic libraries should
>>>> always be used. Does this work attempt to create an encryption scheme or
>>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>>> be dragons. Even normal-looking use of cryptography must be carefully
>>>> reviewed.
>>>> 3.5 If you need random bits for a security purpose, such as for a
>>>> session token or a cryptographic key, you need a cryptographically approved
>>>> place to acquire said bits. Use the SecureRandom class.
>>>>
>>>> *4. Configuration*
>>>>>
>>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>>> for provisioning to credential providers?
>>>>> 4.3. Are there any settings that are used to launch docker containers
>>>>> or shell out command execution, etc?
>>>>>
>>>>
>>>> +1. So good.
>>>>
>>>>
>>>>> *5. HA*
>>>>>
>>>>> 5.1. Are there provisions for HA?
>>>>> 5.2. Are there any single point of failures?
>>>>>
>>>>> *6. CVEs*
>>>>>
>>>>> Dependencies need to have been checked for known issues before we
>>>>> merge.
>>>>> We don't however want to list any CVEs that have been fixed but not
>>>>> released yet.
>>>>>
>>>>> 6.1. All dependencies checked for CVEs?
>>>>>
>>>>
>>>> Big +1 for this, too.
>>>>
>>>> 7. Log Messages
>>>>
>>>> Do not write secrets or data into log files. This sounds obvious, but
>>>> mistakes happen.
>>>>
>>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>>> sensitive configuration item.
>>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>>> card numbers.
>>>>
>>>> 8. Secure By Default
>>>>
>>>> Strive to be *secure by default*. This means that products should ship
>>>> in a secure state, and only by human tuning be put into an insecure state.
>>>> Exhibit A here is the MongoDB ransomware fiasco
>>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>>> insecure-by-default MongoDB installation resulted in completely open
>>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>>> for hadoop. Granted, it's not always possible to turn on all security
>>>> features: for example you have to have a KDC set up in order to enable
>>>> Kerberos.
>>>>
>>>> 8.1 Are there settings or configurations that can be shipped in a
>>>> default-secure state?
>>>>
>>>>
>>>> Thanks again for putting this list together!
>>>> -Mike
>>>>
>>>>
>>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
New revision...

I have incorporated additions from Mike and added a [DEFAULT] tag to those
items that should be considered for Secure by Default settings.
I am hoping that we can close down on the actual lists shortly and move to
discussing the meta points on how/when to require the completion of the
checklists and whether and how they should be included as docs for the
feature moving forward.

Some comments that I have gotten offline have included concern that
targeting merge requests would only capture a subset of new features and
may actually affect the decision to use branches or not. This is certainly
something that we wouldn't want to do. At the same time, we don't want to
be so intrusive in the development cycles to bog down those patches that
just fix bugs.

At any rate, let's close down on the checklists here first.

Thanks!

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there need to be a base line of assurances that they do not introduce new
attack vectors in deployments that are from actual releases or even just
built from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


------------------------------------------------------------
------------------------------------------------------------
--------------------------


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
there are future merges required to be full.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
[DEFAULT] (pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting [DEFAULT]
  1.2.2. cross site request forgery [DEFAULT]
  1.2.3. click jacking (X-Frame-Options) [DEFAULT]
  1.2.4 If using cookies, is the secure flag for cookies turned on?
[DEFAULT]
  1.2.5 If using cookies, is the HTTPOnly flag turned on? [DEFAULT]
1.3. What sort of authentication is required for access to the UIs?
[DEFAULT]
  1.3.1. Kerberos
    1.3.1.1. has TGT renewal been accounted for
    1.3.1.2. SPNEGO support?
    1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data and/or related
processes? [DEFAULT]
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
  1.5.1 If so, how is it validated before persistence? [DEFAULT]
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support? [DEFAULT]
  1.7.1 Is it possible to configure TLS protocols and cipher suites?
  1.7.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)?
1.8 Are accesses to the UIs audited? ("User X logged into Y from IP address
Z", etc) [DEFAULT]

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS) [DEFAULT]
  2.2.2. cross site request forgery (CSRF) [DEFAULT]
  2.2.3. XML External Entity (XXE) [DEFAULT]
2.3. What is being used for authentication - Hadoop Auth Module? [DEFAULT]
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support? [DEFAULT]
  2.5.1 Is it possible to configure TLS protocols and cipher suites?
  2.5.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)? [DEFAULT]
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
2.8 Are accesses to the REST APIs audited? ("User X accessed resource Y
from IP address Z", etc) [DEFAULT]

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
any other in computer science. Standard cryptographic libraries should
always be used. Does this work attempt to create an encryption scheme or
protocol? Does it have a "novel" or "unique" use of normal crypto?  There
be dragons. Even normal-looking use of cryptography must be carefully
reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single point of failures?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: xxxx” where the
xxxx’s are user data. You never know, it might contain secrets like credit
card numbers.

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only by human tuning be put into an insecure state.
Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
instances of mongodb on the open internet.  Attackers removed or encrypted
the data and left ransom notes behind. We don't want that sort of notoriety
for hadoop. Granted, it's not always possible to turn on all security
features: for example you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lm...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point that some of those items can be enabled
> by default and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings of configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable ip address. (That is, not in the
>> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
>> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>>
>>>>> New Revision...
>>>>>
>>>>
>>>> These lists are wonderful. I appreciate the split between the Tech
>>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>>> "don't enable by default" or at least "don't enable if security is on".  I
>>>> don't have any comments on that part.
>>>>
>>>> Additions inline below. If some of the additions are items covered by
>>>> existing frameworks that any code would use, please forgive my ignorance.
>>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>>
>>>> *GA Readiness Security Audit*
>>>>> At this point, we are merging full or partial security model
>>>>> implementations.
>>>>> Let's inventory what is covered by the model at this point and whether
>>>>> there are future merges required to be full.
>>>>>
>>>>> *1. UIs*
>>>>>
>>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>>> (pointers to code would be appreciated)
>>>>> 1.2. What explicit protections have been built in for (pointers to
>>>>> code would be appreciated):
>>>>>   1.2.1. cross site scripting
>>>>>   1.2.2. cross site request forgery
>>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>>
>>>>
>>>> 1.2.4 If using cookies, is the secure flag for cookies
>>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>>
>>>>
>>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>>   1.3.1. Kerberos
>>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>>     1.3.1.2. SPNEGO support?
>>>>>     1.3.1.3. Delegation token?
>>>>>   1.3.2. Proxy User ACL?
>>>>> 1.4. What authorization is available for determining who can access
>>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>>> related processes?
>>>>> 1.5. Is there any input that will ultimately be persisted in
>>>>> configuration for executing shell commands or processes?
>>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>>> impersonation?
>>>>> 1.7. Is there TLS/SSL support?
>>>>>
>>>>
>>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>>> Security
>>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>>> (HSTS)?
>>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>>> address Z", etc)
>>>>
>>>>
>>>>> *2. REST APIs*
>>>>>
>>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>>> impersonation capabilities?
>>>>> 2.2. What explicit protections have been built in for:
>>>>>   2.2.1. cross site scripting (XSS)
>>>>>   2.2.2. cross site request forgery (CSRF)
>>>>>   2.2.3. XML External Entity (XXE)
>>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>>> endpoints) or are they part of existing processes?
>>>>> 2.5. Is there TLS/SSL support?
>>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>>> APIs?
>>>>> 2.7. What authorization enforcement points are there within the REST
>>>>> APIs?
>>>>>
>>>>
>>>> The TLS and audit comments above apply here, too.
>>>>
>>>>
>>>>> *3. Encryption*
>>>>>
>>>>> 3.1. Is there any support for encryption of persisted data?
>>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>>> 3.3. KMS interaction with Proxy Users?
>>>>>
>>>>
>>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto
>>>> than any other in computer science. Standard cryptographic libraries should
>>>> always be used. Does this work attempt to create an encryption scheme or
>>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>>> be dragons. Even normal-looking use of cryptography must be carefully
>>>> reviewed.
>>>> 3.5 If you need random bits for a security purpose, such as for a
>>>> session token or a cryptographic key, you need a cryptographically approved
>>>> place to acquire said bits. Use the SecureRandom class.
>>>>
>>>> *4. Configuration*
>>>>>
>>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>>> for provisioning to credential providers?
>>>>> 4.3. Are there any settings that are used to launch docker containers
>>>>> or shell out command execution, etc?
>>>>>
>>>>
>>>> +1. So good.
>>>>
>>>>
>>>>> *5. HA*
>>>>>
>>>>> 5.1. Are there provisions for HA?
>>>>> 5.2. Are there any single point of failures?
>>>>>
>>>>> *6. CVEs*
>>>>>
>>>>> Dependencies need to have been checked for known issues before we
>>>>> merge.
>>>>> We don't however want to list any CVEs that have been fixed but not
>>>>> released yet.
>>>>>
>>>>> 6.1. All dependencies checked for CVEs?
>>>>>
>>>>
>>>> Big +1 for this, too.
>>>>
>>>> 7. Log Messages
>>>>
>>>> Do not write secrets or data into log files. This sounds obvious, but
>>>> mistakes happen.
>>>>
>>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>>> sensitive configuration item.
>>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>>> card numbers.
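One way to make 7.1 and 7.2 the path of least resistance is to log positions and key names rather than values. A hypothetical sketch (class and message formats are assumptions, not a real Hadoop API):

```java
// Sketch: keep user data and secrets out of log lines by construction --
// report where a problem happened and which key was involved, never the value.
public class SafeLog {
    // Report where parsing failed without echoing the offending input.
    public static String parseError(String source, int lineNo) {
        return "Failed to parse " + source + " at line " + lineNo
            + " (input redacted)";
    }

    // Record that a sensitive property changed without printing its value.
    public static String configSet(String key) {
        return "Configuration property '" + key + "' was set (value withheld)";
    }
}
```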
>>>>
>>>> 8. Secure By Default
>>>>
>>>> Strive to be *secure by default*. This means that products should ship
>>>> in a secure state, and only by human tuning be put into an insecure state.
>>>> Exhibit A here is the MongoDB ransomware fiasco
>>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>>> insecure-by-default MongoDB installation resulted in completely open
>>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>>> for hadoop. Granted, it's not always possible to turn on all security
>>>> features: for example you have to have a KDC set up in order to enable
>>>> Kerberos.
>>>>
>>>> 8.1 Are there settings or configurations that can be shipped in a
>>>> default-secure state?
>>>>
>>>>
>>>> Thanks again for putting this list together!
>>>> -Mike
>>>>
>>>>
>>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Thanks for the examples, Mike.

I think some of those should actually just be added to the checklist in
other places as they are best practices.
Which raises an interesting point that some of those items can be enabled
by default and maybe indicating so throughout the list makes sense.

Then we can ask for a description of any other Secure by Default
considerations at the end.

I will work on a new revision this morning.


On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com> wrote:

> #8 is a great topic - given that Hadoop is insecure by default.
>> Actual movement to Secure by Default would be a challenge both
>> technically (given the need for kerberos) and discussion-wise.
>> Asking whether you have considered any settings or configurations that
>> can be secure by default is an interesting idea.
>>
>> Can you provide an example though?
>>
>
> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
> etc.  But here are some ideas:
>
> - Default to only listen for network traffic on the loopback interface.
> The admin would have to take specific action to listen on a non-loopback
> address. Hence secure by default. I've known web servers that ship like
> this. The counter argument to this is that this is a "useless by default"
> setting for a distributed system... which does have some validity.
> - A more constrained version of the above is to not bind to any network
> interface that has an internet-routable ip address. (That is, not in the
> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
> that's obviously headed towards the open internet.  Sure this isn't
> perfect, but it would catch some cases. The admin could provide a specific
> flag to override.  (I got this one from discussion with the Kudu folks.)
> - The examples don't have to be big. Another example would be... if using
> TLS, and if the certificate authority used to sign the certificate is in
> the default certificate store, turn on HSTS automatically.
> - Always turn off TLSv1 and TLSv1.1
> - Forbid single-DES and RC4 encryption algorithms
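The last two bullets can be enforced in standard JSSE without any new protocol code. A sketch, assuming a JVM whose defaults include TLSv1.2 (the class name and filtering policy are illustrative):

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Sketch: restrict a JSSE endpoint to modern protocols and drop RC4 and
// single-DES suites from the JVM's default cipher list.
public class TlsPolicy {
    public static SSLParameters modernParameters() {
        try {
            SSLContext ctx = SSLContext.getDefault();
            SSLParameters params = ctx.getDefaultSSLParameters();
            // Exclude SSLv3, TLSv1, and TLSv1.1 by allowing only TLSv1.2.
            params.setProtocols(new String[] {"TLSv1.2"});
            // Filter weak suites out of the defaults rather than hand-listing
            // strong ones, so JVM updates still take effect.
            String[] filtered = Arrays.stream(params.getCipherSuites())
                .filter(s -> !s.contains("RC4") && !s.contains("_DES_"))
                .toArray(String[]::new);
            params.setCipherSuites(filtered);
            return params;
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext", e);
        }
    }
}
```

The resulting SSLParameters can be applied to an SSLServerSocket or SSLEngine before accepting connections.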
>
> You get the idea.
> -Mike
>
>
>
>>
>>
>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>> wrote:
>>
>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>
>>>> New Revision...
>>>>
>>>
>>> These lists are wonderful. I appreciate the split between the Tech
>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>> "don't enable by default" or at least "don't enable if security is on".  I
>>> don't have any comments on that part.
>>>
>>> Additions inline below. If some of the additions are items covered by
>>> existing frameworks that any code would use, please forgive my ignorance.
>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>
>>> *GA Readiness Security Audit*
>>>> At this point, we are merging full or partial security model
>>>> implementations.
>>>> Let's inventory what is covered by the model at this point and whether
>>>> there are future merges required to be full.
>>>>
>>>> *1. UIs*
>>>>
>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>> (pointers to code would be appreciated)
>>>> 1.2. What explicit protections have been built in for (pointers to code
>>>> would be appreciated):
>>>>   1.2.1. cross site scripting
>>>>   1.2.2. cross site request forgery
>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>
>>>
>>> 1.2.4 If using cookies, is the secure flag for cookies
>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
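For 1.2.4, the Servlet API form is cookie.setSecure(true) together with setHttpOnly(true). To keep the sketch free of a servlet-container dependency, the same flags are shown below as the raw Set-Cookie header value; the class name and SameSite choice are assumptions:

```java
// Illustration: the attributes a session cookie should carry on a TLS-only UI.
public class SessionCookie {
    public static String headerValue(String name, String value) {
        // Secure: never sent over plain HTTP. HttpOnly: invisible to page
        // scripts (limits XSS impact). SameSite=Strict: not attached to
        // cross-site requests (limits CSRF).
        return name + "=" + value + "; Path=/; Secure; HttpOnly; SameSite=Strict";
    }
}
```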
>>>
>>>
>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>   1.3.1. Kerberos
>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>     1.3.1.2. SPNEGO support?
>>>>     1.3.1.3. Delegation token?
>>>>   1.3.2. Proxy User ACL?
>>>> 1.4. What authorization is available for determining who can access
>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>> related processes?
>>>> 1.5. Is there any input that will ultimately be persisted in
>>>> configuration for executing shell commands or processes?
>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>> impersonation?
>>>> 1.7. Is there TLS/SSL support?
>>>>
>>>
>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>> Security
>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>> (HSTS)?
>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>> address Z", etc)
>>>
>>>
>>>> *2. REST APIs*
>>>>
>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>> impersonation capabilities?
>>>> 2.2. What explicit protections have been built in for:
>>>>   2.2.1. cross site scripting (XSS)
>>>>   2.2.2. cross site request forgery (CSRF)
>>>>   2.2.3. XML External Entity (XXE)
>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>> endpoints) or are they part of existing processes?
>>>> 2.5. Is there TLS/SSL support?
>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>> APIs?
>>>> 2.7. What authorization enforcement points are there within the REST
>>>> APIs?
>>>>
>>>
>>> The TLS and audit comments above apply here, too.
>>>
>>>
>>>> *3. Encryption*
>>>>
>>>> 3.1. Is there any support for encryption of persisted data?
>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>> 3.3. KMS interaction with Proxy Users?
>>>>
>>>
>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
>>> in any other area of computer science. Standard cryptographic libraries should
>>> always be used. Does this work attempt to create an encryption scheme or
>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>> be dragons. Even normal-looking use of cryptography must be carefully
>>> reviewed.
>>> 3.5 If you need random bits for a security purpose, such as for a
>>> session token or a cryptographic key, you need a cryptographically approved
>>> place to acquire said bits. Use the SecureRandom class.
>>>
>>> *4. Configuration*
>>>>
>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>> for provisioning to credential providers?
>>>> 4.3. Are there any settings that are used to launch docker containers
>>>> or shell out command execution, etc?
>>>>
>>>
>>> +1. So good.
>>>
>>>
>>>> *5. HA*
>>>>
>>>> 5.1. Are there provisions for HA?
>>>> 5.2. Are there any single points of failure?
>>>>
>>>> *6. CVEs*
>>>>
>>>> Dependencies need to have been checked for known issues before we merge.
>>>> We don't, however, want to list any CVEs that have been fixed but not
>>>> released yet.
>>>>
>>>> 6.1. All dependencies checked for CVEs?
>>>>
>>>
>>> Big +1 for this, too.
>>>
>>> 7. Log Messages
>>>
>>> Do not write secrets or data into log files. This sounds obvious, but
>>> mistakes happen.
>>>
>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>> sensitive configuration item.
>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>> card numbers.
>>>
>>> 8. Secure By Default
>>>
>>> Strive to be *secure by default*. This means that products should ship
>>> in a secure state, and only by human tuning be put into an insecure state.
>>> Exhibit A here is the MongoDB ransomware fiasco
>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>> insecure-by-default MongoDB installation resulted in completely open
>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>> for hadoop. Granted, it's not always possible to turn on all security
>>> features: for example you have to have a KDC set up in order to enable
>>> Kerberos.
>>>
>>> 8.1 Are there settings or configurations that can be shipped in a
>>> default-secure state?
>>>
>>>
>>> Thanks again for putting this list together!
>>> -Mike
>>>
>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Thanks for the examples, Mike.

I think some of those should actually just be added to the checklist in
other places as they are best practices.
Which raises an interesting point that some of those items can be enabled
by default and maybe indicating so throughout the list makes sense.

Then we can ask for a description of any other Secure by Default
considerations at the end.

I will work on a new revision this morning.


On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com> wrote:

> #8 is a great topic - given that Hadoop is insecure by default.
>> Actual movement to Secure by Default would be a challenge both
>> technically (given the need for kerberos) and discussion-wise.
>> Asking whether you have considered any settings of configurations that
>> can be secure by default is an interesting idea.
>>
>> Can you provide an example though?
>>
>
> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
> etc.  But here are some ideas:
>
> - Default to only listen for network traffic on the loopback interface.
> The admin would have to take specific action to listen on a non-loopback
> address. Hence secure by default. I've known web servers that ship like
> this. The counter argument to this is that this is a "useless by default"
> setting for a distributed system... which does have some validity.
> - A more constrained version of the above is to not bind to any network
> interface that has an internet-routable ip address. (That is, not in the
> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
> that's obviously headed towards the open internet.  Sure this isn't
> perfect, but it would catch some cases. The admin could provide a specific
> flag to override.  (I got this one from discussion with the Kudu folks.)
> - The examples don't have to be big. Another example would be... if using
> TLS, and if the certificate authority used to sign the certificate is in
> the default certificate store, turn on HSTS automatically.
> - Always turn off TLSv1 and TLSv1.1
> - Forbid single-DES and RC4 encryption algorithms
>
> You get the idea.
> -Mike
>
>
>
>>
>>
>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>> wrote:
>>
>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>
>>>> New Revision...
>>>>
>>>
>>> These lists are wonderful. I appreciate the split between the Tech
>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>> "don't enable by default" or at least "don't enable if security is on".  I
>>> don't have any comments on that part.
>>>
>>> Additions inline below. If some of the additions are items covered by
>>> existing frameworks that any code would use, please forgive my ignorance.
>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>
>>> *GA Readiness Security Audit*
>>>> At this point, we are merging full or partial security model
>>>> implementations.
>>>> Let's inventory what is covered by the model at this point and whether
>>>> there are future merges required to be full.
>>>>
>>>> *1. UIs*
>>>>
>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>> (pointers to code would be appreciated)
>>>> 1.2. What explicit protections have been built in for (pointers to code
>>>> would be appreciated):
>>>>   1.2.1. cross site scripting
>>>>   1.2.2. cross site request forgery
>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>
>>>
>>> 1.2.4 If using cookies, is the secure flag for cookies
>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>
>>>
>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>   1.3.1. Kerberos
>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>     1.3.1.2. SPNEGO support?
>>>>     1.3.1.3. Delegation token?
>>>>   1.3.2. Proxy User ACL?
>>>> 1.4. What authorization is available for determining who can access
>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>> related processes?
>>>> 1.5. Is there any input that will ultimately be persisted in
>>>> configuration for executing shell commands or processes?
>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>> impersonation?
>>>> 1.7. Is there TLS/SSL support?
>>>>
>>>
>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>> Security
>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>> (HSTS)?
>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>> address Z", etc)
>>>
>>>
>>>> *2. REST APIs*
>>>>
>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>> impersonation capabilities?
>>>> 2.2. What explicit protections have been built in for:
>>>>   2.2.1. cross site scripting (XSS)
>>>>   2.2.2. cross site request forgery (CSRF)
>>>>   2.2.3. XML External Entity (XXE)
>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>> endpoints) or are they part of existing processes?
>>>> 2.5. Is there TLS/SSL support?
>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>> APIs?
>>>> 2.7. What authorization enforcement points are there within the REST
>>>> APIs?
>>>>
>>>
>>> The TLS and audit comments above apply here, too.
>>>
>>>
>>>> *3. Encryption*
>>>>
>>>> 3.1. Is there any support for encryption of persisted data?
>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>> 3.3. KMS interaction with Proxy Users?
>>>>
>>>
>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
>>> any other in computer science. Standard cryptographic libraries should
>>> always be used. Does this work attempt to create an encryption scheme or
>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>> be dragons. Even normal-looking use of cryptography must be carefully
>>> reviewed.
>>> 3.5 If you need random bits for a security purpose, such as for a
>>> session token or a cryptographic key, you need a cryptographically approved
>>> place to acquire said bits. Use the SecureRandom class.
>>>
>>> *4. Configuration*
>>>>
>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>> for provisioning to credential providers?
>>>> 4.3. Are there any settings that are used to launch docker containers
>>>> or shell out command execution, etc?
>>>>
>>>
>>> +1. So good.
>>>
>>>
>>>> *5. HA*
>>>>
>>>> 5.1. Are there provisions for HA?
>>>> 5.2. Are there any single point of failures?
>>>>
>>>> *6. CVEs*
>>>>
>>>> Dependencies need to have been checked for known issues before we merge.
>>>> We don't however want to list any CVEs that have been fixed but not
>>>> released yet.
>>>>
>>>> 6.1. All dependencies checked for CVEs?
>>>>
>>>
>>> Big +1 for this, too.
>>>
>>> 7. Log Messages
>>>
>>> Do not write secrets or data into log files. This sounds obvious, but
>>> mistakes happen.
>>>
>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>> sensitive configuration item.
>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>> card numbers.
>>>
>>> 8. Secure By Default
>>>
>>> Strive to be *secure by default*. This means that products should ship
>>> in a secure state, and only by human tuning be put into an insecure state.
>>> Exhibit A here is the MongoDB ransomware fiasco
>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>> insecure-by-default MongoDB installation resulted in completely open
>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>> for hadoop. Granted, it's not always possible to turn on all security
>>> features: for example you have to have a KDC set up in order to enable
>>> Kerberos.
>>>
>>> 8.1 Are there settings or configurations that can be shipped in a
>>> default-secure state?
>>>
>>>
>>> Thanks again for putting this list together!
>>> -Mike
>>>
>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Thanks for the examples, Mike.

I think some of those should actually just be added to the checklist in
other places as they are best practices.
Which raises an interesting point that some of those items can be enabled
by default and maybe indicating so throughout the list makes sense.

Then we can ask for a description of any other Secure by Default
considerations at the end.

I will work on a new revision this morning.


On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com> wrote:

> #8 is a great topic - given that Hadoop is insecure by default.
>> Actual movement to Secure by Default would be a challenge both
>> technically (given the need for kerberos) and discussion-wise.
>> Asking whether you have considered any settings of configurations that
>> can be secure by default is an interesting idea.
>>
>> Can you provide an example though?
>>
>
> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
> etc.  But here are some ideas:
>
> - Default to only listen for network traffic on the loopback interface.
> The admin would have to take specific action to listen on a non-loopback
> address. Hence secure by default. I've known web servers that ship like
> this. The counter argument to this is that this is a "useless by default"
> setting for a distributed system... which does have some validity.
> - A more constrained version of the above is to not bind to any network
> interface that has an internet-routable ip address. (That is, not in the
> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
> that's obviously headed towards the open internet.  Sure this isn't
> perfect, but it would catch some cases. The admin could provide a specific
> flag to override.  (I got this one from discussion with the Kudu folks.)
> - The examples don't have to be big. Another example would be... if using
> TLS, and if the certificate authority used to sign the certificate is in
> the default certificate store, turn on HSTS automatically.
> - Always turn off TLSv1 and TLSv1.1
> - Forbid single-DES and RC4 encryption algorithms
>
> You get the idea.
> -Mike
>
>
>
>>
>>
>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>> wrote:
>>
>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>
>>>> New Revision...
>>>>
>>>
>>> These lists are wonderful. I appreciate the split between the Tech
>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>> "don't enable by default" or at least "don't enable if security is on".  I
>>> don't have any comments on that part.
>>>
>>> Additions inline below. If some of the additions are items covered by
>>> existing frameworks that any code would use, please forgive my ignorance.
>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>
>>> *GA Readiness Security Audit*
>>>> At this point, we are merging full or partial security model
>>>> implementations.
>>>> Let's inventory what is covered by the model at this point and whether
>>>> there are future merges required to be full.
>>>>
>>>> *1. UIs*
>>>>
>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>> (pointers to code would be appreciated)
>>>> 1.2. What explicit protections have been built in for (pointers to code
>>>> would be appreciated):
>>>>   1.2.1. cross site scripting
>>>>   1.2.2. cross site request forgery
>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>
>>>
>>> 1.2.4 If using cookies, is the secure flag for cookies
>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>
>>>
>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>   1.3.1. Kerberos
>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>     1.3.1.2. SPNEGO support?
>>>>     1.3.1.3. Delegation token?
>>>>   1.3.2. Proxy User ACL?
>>>> 1.4. What authorization is available for determining who can access
>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>> related processes?
>>>> 1.5. Is there any input that will ultimately be persisted in
>>>> configuration for executing shell commands or processes?
>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>> impersonation?
>>>> 1.7. Is there TLS/SSL support?
>>>>
>>>
>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>> Security
>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>> (HSTS)?
>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>> address Z", etc)
>>>
>>>
>>>> *2. REST APIs*
>>>>
>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>> impersonation capabilities?
>>>> 2.2. What explicit protections have been built in for:
>>>>   2.2.1. cross site scripting (XSS)
>>>>   2.2.2. cross site request forgery (CSRF)
>>>>   2.2.3. XML External Entity (XXE)
>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>> endpoints) or are they part of existing processes?
>>>> 2.5. Is there TLS/SSL support?
>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>> APIs?
>>>> 2.7. What authorization enforcement points are there within the REST
>>>> APIs?
>>>>
>>>
>>> The TLS and audit comments above apply here, too.
>>>
>>>
>>>> *3. Encryption*
>>>>
>>>> 3.1. Is there any support for encryption of persisted data?
>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>> 3.3. KMS interaction with Proxy Users?
>>>>
>>>
>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
>>> any other in computer science. Standard cryptographic libraries should
>>> always be used. Does this work attempt to create an encryption scheme or
>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>> be dragons. Even normal-looking use of cryptography must be carefully
>>> reviewed.
>>> 3.5 If you need random bits for a security purpose, such as for a
>>> session token or a cryptographic key, you need a cryptographically approved
>>> place to acquire said bits. Use the SecureRandom class.
>>>
>>> *4. Configuration*
>>>>
>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>> for provisioning to credential providers?
>>>> 4.3. Are there any settings that are used to launch docker containers
>>>> or shell out command execution, etc?
>>>>
>>>
>>> +1. So good.
>>>
>>>
>>>> *5. HA*
>>>>
>>>> 5.1. Are there provisions for HA?
>>>> 5.2. Are there any single point of failures?
>>>>
>>>> *6. CVEs*
>>>>
>>>> Dependencies need to have been checked for known issues before we merge.
>>>> We don't however want to list any CVEs that have been fixed but not
>>>> released yet.
>>>>
>>>> 6.1. All dependencies checked for CVEs?
>>>>
>>>
>>> Big +1 for this, too.
>>>
>>> 7. Log Messages
>>>
>>> Do not write secrets or data into log files. This sounds obvious, but
>>> mistakes happen.
>>>
>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>> sensitive configuration item.
>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>> card numbers.
>>>
>>> 8. Secure By Default
>>>
>>> Strive to be *secure by default*. This means that products should ship
>>> in a secure state, and only by human tuning be put into an insecure state.
>>> Exhibit A here is the MongoDB ransomware fiasco
>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>> insecure-by-default MongoDB installation resulted in completely open
>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>> for hadoop. Granted, it's not always possible to turn on all security
>>> features: for example you have to have a KDC set up in order to enable
>>> Kerberos.
>>>
>>> 8.1 Are there settings or configurations that can be shipped in a
>>> default-secure state?
>>>
>>>
>>> Thanks again for putting this list together!
>>> -Mike
>>>
>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Thanks for the examples, Mike.

I think some of those should actually just be added to the checklist in
other places as they are best practices.
Which raises an interesting point that some of those items can be enabled
by default and maybe indicating so throughout the list makes sense.

Then we can ask for a description of any other Secure by Default
considerations at the end.

I will work on a new revision this morning.


On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <my...@cloudera.com> wrote:

> #8 is a great topic - given that Hadoop is insecure by default.
>> Actual movement to Secure by Default would be a challenge both
>> technically (given the need for kerberos) and discussion-wise.
>> Asking whether you have considered any settings of configurations that
>> can be secure by default is an interesting idea.
>>
>> Can you provide an example though?
>>
>
> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
> etc.  But here are some ideas:
>
> - Default to only listen for network traffic on the loopback interface.
> The admin would have to take specific action to listen on a non-loopback
> address. Hence secure by default. I've known web servers that ship like
> this. The counter argument to this is that this is a "useless by default"
> setting for a distributed system... which does have some validity.
> - A more constrained version of the above is to not bind to any network
> interface that has an internet-routable ip address. (That is, not in the
> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
> that's obviously headed towards the open internet.  Sure this isn't
> perfect, but it would catch some cases. The admin could provide a specific
> flag to override.  (I got this one from discussion with the Kudu folks.)
> - The examples don't have to be big. Another example would be... if using
> TLS, and if the certificate authority used to sign the certificate is in
> the default certificate store, turn on HSTS automatically.
> - Always turn off TLSv1 and TLSv1.1
> - Forbid single-DES and RC4 encryption algorithms
>
> You get the idea.
> -Mike
>
>
>
>>
>>
>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com>
>> wrote:
>>
>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>>>
>>>> New Revision...
>>>>
>>>
>>> These lists are wonderful. I appreciate the split between the Tech
>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>> "don't enable by default" or at least "don't enable if security is on".  I
>>> don't have any comments on that part.
>>>
>>> Additions inline below. If some of the additions are items covered by
>>> existing frameworks that any code would use, please forgive my ignorance.
>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>
>>> *GA Readiness Security Audit*
>>>> At this point, we are merging full or partial security model
>>>> implementations.
>>>> Let's inventory what is covered by the model at this point and whether
>>>> there are future merges required to be full.
>>>>
>>>> *1. UIs*
>>>>
>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>> (pointers to code would be appreciated)
>>>> 1.2. What explicit protections have been built in for (pointers to code
>>>> would be appreciated):
>>>>   1.2.1. cross site scripting
>>>>   1.2.2. cross site request forgery
>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>
>>>
>>> 1.2.4 If using cookies, is the secure flag for cookies
>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>
>>>
>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>   1.3.1. Kerberos
>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>     1.3.1.2. SPNEGO support?
>>>>     1.3.1.3. Delegation token?
>>>>   1.3.2. Proxy User ACL?
>>>> 1.4. What authorization is available for determining who can access
>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>> related processes?
>>>> 1.5. Is there any input that will ultimately be persisted in
>>>> configuration for executing shell commands or processes?
>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>> impersonation?
>>>> 1.7. Is there TLS/SSL support?
>>>>
>>>
>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>> Security
>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>> (HSTS)?
>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>> address Z", etc)
>>>
>>>
>>>> *2. REST APIs*
>>>>
>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>> impersonation capabilities?
>>>> 2.2. What explicit protections have been built in for:
>>>>   2.2.1. cross site scripting (XSS)
>>>>   2.2.2. cross site request forgery (CSRF)
>>>>   2.2.3. XML External Entity (XXE)
>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>> endpoints) or are they part of existing processes?
>>>> 2.5. Is there TLS/SSL support?
>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>> APIs?
>>>> 2.7. What authorization enforcement points are there within the REST
>>>> APIs?
>>>>
>>>
>>> The TLS and audit comments above apply here, too.
>>>
>>>
>>>> *3. Encryption*
>>>>
>>>> 3.1. Is there any support for encryption of persisted data?
>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>> 3.3. KMS interaction with Proxy Users?
>>>>
>>>
>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
>>> in any other area of computer science. Standard cryptographic libraries
>>> should always be used. Does this work attempt to create an encryption
>>> scheme or protocol? Does it have a "novel" or "unique" use of normal
>>> crypto? There be dragons. Even normal-looking use of cryptography must
>>> be carefully reviewed.
>>> 3.5 If you need random bits for a security purpose, such as for a
>>> session token or a cryptographic key, you need a cryptographically approved
>>> place to acquire said bits. Use the SecureRandom class.
>>>
>>> *4. Configuration*
>>>>
>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>> for provisioning to credential providers?
>>>> 4.3. Are there any settings that are used to launch docker containers
>>>> or shell out command execution, etc?
>>>>
>>>
>>> +1. So good.
>>>
>>>
>>>> *5. HA*
>>>>
>>>> 5.1. Are there provisions for HA?
>>>> 5.2. Are there any single point of failures?
>>>>
>>>> *6. CVEs*
>>>>
>>>> Dependencies need to have been checked for known issues before we merge.
>>>> We don't however want to list any CVEs that have been fixed but not
>>>> released yet.
>>>>
>>>> 6.1. All dependencies checked for CVEs?
>>>>
>>>
>>> Big +1 for this, too.
>>>
>>> 7. Log Messages
>>>
>>> Do not write secrets or data into log files. This sounds obvious, but
>>> mistakes happen.
>>>
>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>> sensitive configuration item.
>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>> card numbers.
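One defensive pattern for 7.1 is to scrub likely secrets before a line ever reaches the logger. A hedged sketch; the key-name regex below is an illustrative guess, not an exhaustive or standard list:

```java
import java.util.regex.Pattern;

public class LogRedactor {
    // Mask the value of any "key = value" pair whose key name suggests a
    // secret, so the value never lands in a log file.
    private static final Pattern SENSITIVE =
            Pattern.compile("(?i)((?:password|secret|token|key)\\s*=\\s*)\\S+");

    public static String redact(String logLine) {
        return SENSITIVE.matcher(logLine).replaceAll("$1***");
    }

    public static void main(String[] args) {
        System.out.println(redact("ssl.server.keystore.password=hunter2"));
        // ssl.server.keystore.password=***
    }
}
```

Redaction like this is a safety net, not a substitute for 7.2: the only reliable way to keep user data out of logs is to never pass it to the logger in the first place.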
>>>
>>> 8. Secure By Default
>>>
>>> Strive to be *secure by default*. This means that products should ship
>>> in a secure state, and only by human tuning be put into an insecure state.
>>> Exhibit A here is the MongoDB ransomware fiasco
>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>> insecure-by-default MongoDB installation resulted in completely open
>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>> for hadoop. Granted, it's not always possible to turn on all security
>>> features: for example you have to have a KDC set up in order to enable
>>> Kerberos.
>>>
>>> 8.1 Are there settings or configurations that can be shipped in a
>>> default-secure state?
>>>
>>>
>>> Thanks again for putting this list together!
>>> -Mike
>>>
>>>
>>>
>>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Terrific additions, Mike!
I will spin a new revision and incorporate your additions.

#8 is a great topic - given that Hadoop is insecure by default.
Actual movement to Secure by Default would be a challenge both technically
(given the need for kerberos) and discussion-wise.
Asking whether you have considered any settings or configurations that can
be secure by default is an interesting idea.

Can you provide an example though?


On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <my...@cloudera.com> wrote:

> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lm...@apache.org> wrote:
>
>> New Revision...
>>
>
> These lists are wonderful. I appreciate the split between the Tech Preview
> and the GA Readiness lists, with the emphasis on the former being "don't
> enable by default" or at least "don't enable if security is on".  I don't
> have any comments on that part.
>
> Additions inline below. If some of the additions are items covered by
> existing frameworks that any code would use, please forgive my ignorance.
> Also, my points aren't as succinct as yours. Feel free to reword.
>
> *GA Readiness Security Audit*
>> At this point, we are merging full or partial security model
>> implementations.
>> Let's inventory what is covered by the model at this point and whether
>> there are future merges required to be full.
>>
>> *1. UIs*
>>
>> 1.1. What sort of validation is being done on any accepted user input?
>> (pointers to code would be appreciated)
>> 1.2. What explicit protections have been built in for (pointers to code
>> would be appreciated):
>>   1.2.1. cross site scripting
>>   1.2.2. cross site request forgery
>>   1.2.3. click jacking (X-Frame-Options)
>>
>
> 1.2.4 If using cookies, is the secure flag for cookies
> <https://www.owasp.org/index.php/SecureFlag> turned on?
>
>
>> 1.3. What sort of authentication is required for access to the UIs?
>>   1.3.1. Kerberos
>>     1.3.1.1. has TGT renewal been accounted for
>>     1.3.1.2. SPNEGO support?
>>     1.3.1.3. Delegation token?
>>   1.3.2. Proxy User ACL?
>> 1.4. What authorization is available for determining who can access what
>> capabilities of the UIs for either viewing, modifying data and/or related
>> processes?
>> 1.5. Is there any input that will ultimately be persisted in
>> configuration for executing shell commands or processes?
>> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
>> 1.7. Is there TLS/SSL support?
>>
>
> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
> 1.7.2 Is it possible to configure support for HTTP Strict Transport
> Security
> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
> (HSTS)?
> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP address
> Z", etc)
>
>
>> *2. REST APIs*
>>
>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>> impersonation capabilities?
>> 2.2. What explicit protections have been built in for:
>>   2.2.1. cross site scripting (XSS)
>>   2.2.2. cross site request forgery (CSRF)
>>   2.2.3. XML External Entity (XXE)
>> 2.3. What is being used for authentication - Hadoop Auth Module?
>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>> endpoints) or are they part of existing processes?
>> 2.5. Is there TLS/SSL support?
>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>> APIs?
>> 2.7. What authorization enforcement points are there within the REST APIs?
>>
>
> The TLS and audit comments above apply here, too.
>
>
>> *3. Encryption*
>>
>> 3.1. Is there any support for encryption of persisted data?
>> 3.2. If so, is KMS and the hadoop key command used for key management?
>> 3.3. KMS interaction with Proxy Users?
>>
>
> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
> in any other area of computer science. Standard cryptographic libraries
> should always be used. Does this work attempt to create an encryption
> scheme or protocol? Does it have a "novel" or "unique" use of normal
> crypto?  There be dragons. Even normal-looking use of cryptography must be
> carefully reviewed.
> 3.5 If you need random bits for a security purpose, such as for a session
> token or a cryptographic key, you need a cryptographically approved place
> to acquire said bits. Use the SecureRandom class.
>
> *4. Configuration*
>>
>> 4.1. Are there any passwords or secrets being added to configuration?
>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>> for provisioning to credential providers?
>> 4.3. Are there any settings that are used to launch docker containers or
>> shell out command execution, etc?
>>
>
> +1. So good.
>
>
>> *5. HA*
>>
>> 5.1. Are there provisions for HA?
>> 5.2. Are there any single point of failures?
>>
>> *6. CVEs*
>>
>> Dependencies need to have been checked for known issues before we merge.
>> We don't however want to list any CVEs that have been fixed but not
>> released yet.
>>
>> 6.1. All dependencies checked for CVEs?
>>
>
> Big +1 for this, too.
>
> 7. Log Messages
>
> Do not write secrets or data into log files. This sounds obvious, but
> mistakes happen.
>
> 7.1 Do not log passwords, keys, security-related tokens, or any sensitive
> configuration item.
> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
> data, such as “I had an error parsing this line of text: xxxx” where the
> xxxx’s are user data. You never know, it might contain secrets like credit
> card numbers.
>
> 8. Secure By Default
>
> Strive to be *secure by default*. This means that products should ship in
> a secure state, and only by human tuning be put into an insecure state.
> Exhibit A here is the MongoDB ransomware fiasco
> <https://krebsonsecurity.com/tag/mongodb/>, where the insecure-by-default
> MongoDB installation resulted in completely open instances of mongodb on
> the open internet.  Attackers removed or encrypted the data and left ransom
> notes behind. We don't want that sort of notoriety for hadoop. Granted,
> it's not always possible to turn on all security features: for example you
> have to have a KDC set up in order to enable Kerberos.
>
> 8.1 Are there settings or configurations that can be shipped in a
> default-secure state?
>
>
> Thanks again for putting this list together!
> -Mike
>
>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
New Revision...

This revision acknowledges the reality that we often have multiple phases
in a feature's lifecycle and that we need to account for each phase.
It has also been made more generic.
I have created a Tech Preview Security Audit list and a GA Readiness
Security Audit list.
I've also included suggested items into the GA Readiness list.

It has also been suggested that we publish the information as part of docs
so that the state of such features can be easily determined from these
pages. We can discuss this aspect as well.

Thoughts?

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline of assurances that they do not introduce new
attack vectors, whether the deployment is from an actual release or just
built from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


--------------------------------------------------------------------------------------------------------------------------------------------------


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
there are future merges required to be full.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
  1.3.1. Kerberos
    1.3.1.1. Has TGT renewal been accounted for?
    1.3.1.2. SPNEGO support?
    1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data and/or related
processes?
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support?
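
For items 1.2.1-1.2.3, here is a minimal sketch of the kind of response
headers a UI endpoint could set; the class name and header choices are
illustrative assumptions, not existing Hadoop code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: security-related response headers a web UI could
// set to mitigate clickjacking and some legacy XSS vectors.
public class UiSecurityHeaders {
    public static Map<String, String> defaults() {
        Map<String, String> h = new LinkedHashMap<>();
        // 1.2.3: clickjacking -- forbid framing by other origins
        h.put("X-Frame-Options", "SAMEORIGIN");
        // 1.2.1: ask older browsers to enable their built-in XSS filter
        h.put("X-XSS-Protection", "1; mode=block");
        // prevent MIME sniffing of responses
        h.put("X-Content-Type-Options", "nosniff");
        return h;
    }

    public static void main(String[] args) {
        defaults().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Headers alone only address 1.2.3; XSS (1.2.1) still requires output
encoding of user input, and CSRF (1.2.2) needs a token or custom-header
check on state-changing requests.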

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
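
To illustrate 2.1, a sketch of the trusted proxy pattern: a gateway
authenticates as itself and asserts the end user via a doAs parameter,
and the server consults a proxy-user ACL before honoring the
impersonation. The names here are illustrative, not Hadoop's actual
ProxyUsers implementation:

```java
import java.util.Map;
import java.util.Set;

// Sketch of server-side doAs enforcement for a trusted proxy (e.g. a
// gateway). The ACL maps a proxy principal to the users it may
// impersonate ("*" meaning any user).
public class TrustedProxyCheck {
    private final Map<String, Set<String>> proxyAcl;

    public TrustedProxyCheck(Map<String, Set<String>> proxyAcl) {
        this.proxyAcl = proxyAcl;
    }

    /** Returns the effective user for the request, or throws if impersonation is not allowed. */
    public String effectiveUser(String authenticatedUser, String doAsUser) {
        if (doAsUser == null || doAsUser.equals(authenticatedUser)) {
            return authenticatedUser; // no impersonation requested
        }
        Set<String> allowed = proxyAcl.get(authenticatedUser);
        if (allowed != null && (allowed.contains("*") || allowed.contains(doAsUser))) {
            return doAsUser;
        }
        throw new SecurityException(authenticatedUser + " may not impersonate " + doAsUser);
    }
}
```

Hadoop's real enforcement additionally restricts which hosts a proxy may
connect from (the hadoop.proxyuser.*.hosts settings), which this sketch
omits.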

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
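
For context on 3.1/3.2, a sketch of the envelope-encryption idea behind
KMS-managed keys: a per-file data encryption key (DEK) is itself
encrypted with a master key that would live in the KMS, and only the
encrypted DEK is persisted. This illustrates the pattern only; it is not
Hadoop's actual KeyProvider API:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.Arrays;

// Envelope-encryption sketch: wrap/unwrap a data key with a master key
// using AES-GCM. The master key stands in for one held by the KMS.
public class EnvelopeSketch {
    public static byte[] wrap(SecretKey masterKey, byte[] dek, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, masterKey, new GCMParameterSpec(128, iv));
        return c.doFinal(dek); // encrypted DEK, safe to store with file metadata
    }

    public static byte[] unwrap(SecretKey masterKey, byte[] edek, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, masterKey, new GCMParameterSpec(128, iv));
        return c.doFinal(edek);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey master = kg.generateKey();        // held by the KMS
        byte[] dek = kg.generateKey().getEncoded(); // per-file key
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] edek = wrap(master, dek, iv);
        System.out.println(Arrays.equals(dek, unwrap(master, edek, iv)));
    }
}
```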

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?
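
To illustrate the pattern behind 4.2, a sketch of the fallback logic that
Configuration.getPassword() enables: resolve a secret from a credential
provider first and only fall back to clear-text configuration. The Map
here is a stand-in for Hadoop's CredentialProvider API, not the real
implementation:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the getPassword() resolution order: credential provider
// (e.g. a jceks keystore) first, clear-text config value as a fallback.
public class PasswordLookup {
    private final Map<String, char[]> credentialProvider;
    private final Map<String, String> config; // clear-text *-site.xml values

    public PasswordLookup(Map<String, char[]> provider, Map<String, String> config) {
        this.credentialProvider = provider;
        this.config = config;
    }

    public Optional<char[]> getPassword(String name) {
        char[] fromProvider = credentialProvider.get(name);
        if (fromProvider != null) {
            return Optional.of(fromProvider);
        }
        // Fallback keeps old deployments working but keeps the secret on disk
        // in the clear, so it should be discouraged.
        String fromConfig = config.get(name);
        return Optional.ofNullable(fromConfig == null ? null : fromConfig.toCharArray());
    }
}
```

Code that reads secrets this way can be pointed at a credential store
later without any change to the feature itself.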

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't, however, want to publicly list any CVEs that have been fixed but
not yet released.

6.1. All dependencies checked for CVEs?




On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lm...@apache.org> wrote:

> Hi Marton -
>
> I don't think there is any denying that it would be great to have such
> documentation for all of those reasons.
> If it is a natural extension of getting the checklist information as an
> assertion of security state when merging then we can certainly include it.
>
> I think that backfilling all such information across the project is a
> different topic altogether and wouldn't want to expand the scope of this
> discussion in that direction.
>
> Thanks for the great thoughts on this!
>
> thanks,
>
> --larry
>
>
>
>
>
> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:
>
>>
>>
>> On 10/21/2017 02:41 AM, larry mccay wrote:
>>
>>>
>>> "We might want to start a security section for Hadoop wiki for each of
>>>> the
>>>> services and components.
>>>> This helps to track what has been completed."
>>>>
>>>
>>> Do you mean to keep the audit checklist for each service and component
>>> there?
>>> Interesting idea, I wonder what sort of maintenance that implies and
>>> whether we want to take on that burden even though it would be great
>>> information to have for future reviewers.
>>>
>>
>> I think we should care about the maintenance of the documentation anyway.
>> We also need to maintain all the other documentation. I think it could even
>> be part of the generated docs rather than the wiki.
>>
>> I also suggest to fill this list about the current trunk/3.0 as a first
>> step.
>>
>> 1. It would be very useful documentation for the end-users (some
>> answers could link to the existing documentation; it exists, but I am not
>> sure if all the answers are in the current documentation.)
>>
>> 2. It would be a good example of how the questions could be answered.
>>
>> 3. It would help to check if something is missing from the list.
>>
>> 4. There are future branches where some of the components are not
>> touched. For example, no web ui or no REST service. A prefilled list could
>> help to check if the branch doesn't break any old security functionality on
>> trunk.
>>
>> 5. It helps to document the security features in one place. If we have a
>> list for the existing functionality in the same format, it would be easy to
>> merge the new documentation of the new features as they will be reported in
>> the same form. (So it won't be so hard to maintain the list...).
>>
>> Marton
>>
>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Hi Marton -

I don't think there is any denying that it would be great to have such
documentation for all of those reasons.
If it is a natural extension of getting the checklist information as an
assertion of security state when merging then we can certainly include it.

I think that backfilling all such information across the project is a
different topic altogether and wouldn't want to expand the scope of this
discussion in that direction.

Thanks for the great thoughts on this!

thanks,

--larry





On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <hd...@anzix.net> wrote:

>
>
> On 10/21/2017 02:41 AM, larry mccay wrote:
>
>>
>> "We might want to start a security section for Hadoop wiki for each of the
>>> services and components.
>>> This helps to track what has been completed."
>>>
>>
>> Do you mean to keep the audit checklist for each service and component
>> there?
>> Interesting idea, I wonder what sort of maintenance that implies and
>> whether we want to take on that burden even though it would be great
>> information to have for future reviewers.
>>
>
> I think we should care about the maintenance of the documentation anyway.
> We also need to maintain all the other documentation. I think it could even
> be part of the generated docs rather than the wiki.
>
> I also suggest to fill this list about the current trunk/3.0 as a first
> step.
>
> 1. It would be very useful documentation for the end-users (some
> answers could link to the existing documentation; it exists, but I am not
> sure if all the answers are in the current documentation.)
>
> 2. It would be a good example of how the questions could be answered.
>
> 3. It would help to check if something is missing from the list.
>
> 4. There are future branches where some of the components are not touched.
> For example, no web ui or no REST service. A prefilled list could help to
> check if the branch doesn't break any old security functionality on trunk.
>
> 5. It helps to document the security features in one place. If we have a
> list for the existing functionality in the same format, it would be easy to
> merge the new documentation of the new features as they will be reported in
> the same form. (So it won't be so hard to maintain the list...).
>
> Marton
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by "Elek, Marton" <hd...@anzix.net>.

On 10/21/2017 02:41 AM, larry mccay wrote:
> 
>> "We might want to start a security section for Hadoop wiki for each of the
>> services and components.
>> This helps to track what has been completed."
> 
> Do you mean to keep the audit checklist for each service and component
> there?
> Interesting idea, I wonder what sort of maintenance that implies and
> whether we want to take on that burden even though it would be great
> information to have for future reviewers.

I think we should care about the maintenance of the documentation 
anyway. We also need to maintain all the other documentation. I think 
it could even be part of the generated docs and not the wiki.

I also suggest filling in this list for the current trunk/3.0 as a 
first step.

1. It would be very useful documentation for the end-users (some 
answers could link to the existing documentation; it exists, but I am 
not sure all the answers are covered there.)

2. It would be a good example of how the questions could be answered.

3. It would help to check if something is missing from the list.

4. There are feature branches where some of the components are not 
touched. For example, no web UI or no REST service. A prefilled list 
could help to check that the branch doesn't break any existing 
security functionality on trunk.

5. It helps to document the security features in one place. If we have a 
list for the existing functionality in the same format, it would be easy 
to merge the new documentation of the new features as they will be 
reported in the same form. (So it won't be so hard to maintain the list...).

Marton

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea, I wonder what sort of maintenance that implies and
whether we want to take on that burden even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that is intended for the feature in
its end state (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk -
not least whether it is enabled by default and what it means to disable
it.
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review.
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should be at least thought about in the final
security model design for the particular feature.

I think that feature merges often span multiple branch merges, with
security, or other aspects of the feature, coming in phases.
This intent should maybe be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The check list looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric
>
> On 10/20/17, 12:41 PM, "larry mccay" <lm...@apache.org> wrote:
>
>     Adding security@hadoop list as well...
>
>     On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lm...@apache.org>
> wrote:
>
>     > All -
>     >
>     > Given the maturity of Hadoop at this point, I would like to propose
> that
>     > we start doing explicit security audits of features at merge time.
>     >
>     > There are a few reasons that I think this is a good place/time to do
> the
>     > review:
>     >
>     > 1. It represents a specific snapshot of where the feature stands as a
>     > whole. This means that we can more easily identify the attack
> surface of a
>     > given feature.
>     > 2. We can identify any security gaps that need to be fixed before a
>     > release that carries the feature can be considered ready.
>     > 3. We - in extreme cases - can block a feature from merging until
> some
>     > baseline of security coverage is achieved.
>     > 4. The folks that are interested and able to review security aspects
> can't
>     > scale for every iteration over every JIRA but can review the
> checklist and
>     > follow pointers for specific areas of interest.
>     >
>     > I have provided an impromptu security audit checklist on the DISCUSS
>     > thread for merging Ozone - HDFS-7240 into trunk.
>     >
>     > I don't want to pick on it particularly but I think it is a good way
> to
>     > bootstrap this audit process and figure out how to incorporate it
> without
>     > being too intrusive.
>     >
>     > The questions that I provided below are a mix of general questions
> that
>     > could be on a standard checklist that you provide along with the
> merge
>     > thread and some that are specific to what I read about ozone in the
>     > excellent docs provided. So, we should consider some subset of the
>     > following as a proposal for a general checklist.
>     >
>     > Perhaps, a shared document can be created to iterate over the list
> to fine
>     > tune it?
>     >
>     > Any thoughts on this, any additional datapoints to collect, etc?
>     >
>     > thanks!
>     >
>     > --larry
>     >
>     > 1. UIs
>     > I see there are at least two UIs - Storage Container Manager and Key
> Space
>     > Manager. There are a number of typical vulnerabilities that we find
> in UIs
>     >
>     > 1.1. What sort of validation is being done on any accepted user
> input?
>     > (pointers to code would be appreciated)
>     > 1.2. What explicit protections have been built in for (pointers to
> code
>     > would be appreciated):
>     >   1.2.1. cross site scripting
>     >   1.2.2. cross site request forgery
>     >   1.2.3. click jacking (X-Frame-Options)
>     > 1.3. What sort of authentication is required for access to the UIs?
>     > 1.4. What authorization is available for determining who can access
> what
>     > capabilities of the UIs for either viewing, modifying data or
> affecting
>     > object stores and related processes?
>     > 1.5. Are the UIs built with proxying in mind by leveraging
> X-Forwarded
>     > headers?
>     > 1.6. Is there any input that will ultimately be persisted in
> configuration
>     > for executing shell commands or processes?
>     > 1.7. Do the UIs support the trusted proxy pattern with doas
> impersonation?
>     > 1.8. Is there TLS/SSL support?
>     >
>     > 2. REST APIs
>     >
>     > 2.1. Do the REST APIs support the trusted proxy pattern with doas
>     > impersonation capabilities?
>     > 2.2. What explicit protections have been built in for:
>     >   2.2.1. cross site scripting (XSS)
>     >   2.2.2. cross site request forgery (CSRF)
>     >   2.2.3. XML External Entity (XXE)
>     > 2.3. What is being used for authentication - Hadoop Auth Module?
>     > 2.4. Are there separate processes for the HTTP resources (UIs and
> REST
>     > endpoints) or are they part of existing HDFS processes?
>     > 2.5. Is there TLS/SSL support?
>     > 2.6. Are there new CLI commands and/or clients for accessing the REST
> APIs?
>     > 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
>     > authorization is required here - is there a restrictive ACL set on
> creation?
>     > 2.8. Bucket Level API allows for deleting a bucket - I assume this is
>     > dependent on ACL-based access control?
>     > 2.9. Bucket Level API to list bucket returns up to 1000 keys - is
> there
>     > paging available?
>     > 2.10. Storage Level APIs indicate “Signed with User Authorization”
> what
>     > does this refer to exactly?
>     > 2.11. Object Level APIs indicate that there is no ACL support and
> only
>     > bucket owners can read and write - but there are ACL APIs on the
> Bucket
>     > Level - are they meaningless for now?
>     > 2.12. How does a REST client know which Ozone Handler to connect to
> or am
>     > I missing some well known NN type endpoint in the architecture doc
>     > somewhere?
>     >
>     > 3. Encryption
>     >
>     > 3.1. Is there any support for encryption of persisted data?
>     > 3.2. If so, is KMS and the hadoop key command used for key
> management?
>     >
>     > 4. Configuration
>     >
>     > 4.1. Are there any passwords or secrets being added to configuration?
>     > 4.2. If so, are they accessed via Configuration.getPassword() to
> allow for
>     > provisioning in credential providers?
>     > 4.3. Are there any settings that are used to launch docker
> containers or
>     > shell out any commands, etc?
>     >
>     > 5. HA
>     >
>     > 5.1. Are there provisions for HA?
>     > 5.2. Are we leveraging the existing HA capabilities in HDFS?
>     > 5.3. Is Storage Container Manager a SPOF?
>     > 5.4. I see HA listed in future work in the architecture doc - is this
>     > still an open issue?
>     >
>
>
>
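
As a side note on item 4.2 above: the point of Configuration.getPassword()
is that a secret provisioned in a credential provider takes precedence over
any clear-text value left in the config. A stand-alone sketch of that lookup
order (plain maps stand in for the provider and the config, and the property
names are made up; this is the pattern, not Hadoop's implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lookup order behind Configuration.getPassword(): consult a
// provisioned credential provider first, and only fall back to the
// clear-text value in the config. The maps stand in for the provider and
// the Configuration object; property names below are illustrative.
public class PasswordLookup {

    public static char[] getPassword(Map<String, char[]> credStore,
                                     Map<String, String> conf, String name) {
        char[] fromProvider = credStore.get(name);  // provisioned secret wins
        if (fromProvider != null) {
            return fromProvider;
        }
        String clearText = conf.get(name);          // legacy clear-text fallback
        return clearText == null ? null : clearText.toCharArray();
    }

    public static void main(String[] args) {
        Map<String, char[]> store = new HashMap<>();
        store.put("ssl.keystore.password", "s3cret".toCharArray());
        Map<String, String> conf = new HashMap<>();
        conf.put("other.password", "fallback");
        // provider entry is preferred over anything in config
        System.out.println(new String(
            getPassword(store, conf, "ssl.keystore.password")));
        // no provider entry, so the config value is used
        System.out.println(new String(
            getPassword(store, conf, "other.password")));
    }
}
```

This is why 4.1/4.2 matter as a pair: secrets that are only readable via
direct config lookups can never be moved out of clear-text files.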

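For the CSRF items (1.2.2 / 2.2.2), one common defense - and, as I
understand it, the pattern behind Hadoop's REST CSRF prevention filter - is
to require a custom header on state-changing requests. A rough sketch of
the check; the header name and the "safe" method set here are illustrative
assumptions, not Hadoop's actual defaults:

```java
import java.util.Map;
import java.util.Set;

// Sketch of the custom-header CSRF defense: a browser will not attach a
// custom header to a cross-site form post or image request, so requiring
// one on state-changing methods blocks forged requests while leaving
// reads alone. Header name and "safe" method set are illustrative.
public class CsrfCheck {

    static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS");
    static final String CUSTOM_HEADER = "X-XSRF-HEADER";

    public static boolean allow(String method, Map<String, String> headers) {
        if (SAFE_METHODS.contains(method)) {
            return true;                    // reads are not state-changing
        }
        // cross-site attackers cannot set custom headers from a browser
        return headers.containsKey(CUSTOM_HEADER);
    }

    public static void main(String[] args) {
        System.out.println(allow("GET", Map.of()));                   // true
        System.out.println(allow("PUT", Map.of()));                   // false
        System.out.println(allow("PUT", Map.of(CUSTOM_HEADER, "1"))); // true
    }
}
```

Answering 1.2.2/2.2.2 could then be as simple as pointing at where such a
filter is wired into the UI and REST endpoints.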
Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea, I wonder what sort of maintenance that implies and
whether we want to take on that burden even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that intended for the feature in its
endstate (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk. Not
the least of which is whether it is enabled by default and what it means to
disabled.
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review.
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should be at least thought about in the final
security model design for the particular feature.

I think that feature merges often span multiple branch merges with security
coming in phases or other aspects of the feature.
This intent should maybe be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The check list looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric
>
> On 10/20/17, 12:41 PM, "larry mccay" <lm...@apache.org> wrote:
>
>     Adding security@hadoop list as well...
>
>     On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lm...@apache.org>
> wrote:
>
>     > All -
>     >
>     > Given the maturity of Hadoop at this point, I would like to propose
> that
>     > we start doing explicit security audits of features at merge time.
>     >
>     > There are a few reasons that I think this is a good place/time to do
> the
>     > review:
>     >
>     > 1. It represents a specific snapshot of where the feature stands as a
>     > whole. This means that we can more easily identity the attack
> surface of a
>     > given feature.
>     > 2. We can identify any security gaps that need to be fixed before a
>     > release that carries the feature can be considered ready.
>     > 3. We - in extreme cases - can block a feature from merging until
> some
>     > baseline of security coverage is achieved.
>     > 4. The folks that are interested and able to review security aspects
> can't
>     > scale for every iteration over every JIRA but can review the
> checklist and
>     > follow pointers for specific areas of interest.
>     >
>     > I have provided an impromptu security audit checklist on the DISCUSS
>     > thread for merging Ozone - HDFS-7240 into trunk.
>     >
>     > I don't want to pick on it particularly but I think it is a good way
> to
>     > bootstrap this audit process and figure out how to incorporate it
> without
>     > being too intrusive.
>     >
>     > The questions that I provided below are a mix of general questions
> that
>     > could be on a standard checklist that you provide along with the
> merge
>     > thread and some that are specific to what I read about ozone in the
>     > excellent docs provided. So, we should consider some subset of the
>     > following as a proposal for a general checklist.
>     >
>     > Perhaps, a shared document can be created to iterate over the list
> to fine
>     > tune it?
>     >
>     > Any thoughts on this, any additional datapoints to collect, etc?
>     >
>     > thanks!
>     >
>     > --larry
>     >
>     > 1. UIs
>     > I see there are at least two UIs - Storage Container Manager and Key
> Space
>     > Manager. There are a number of typical vulnerabilities that we find
> in UIs
>     >
>     > 1.1. What sort of validation is being done on any accepted user
> input?
>     > (pointers to code would be appreciated)
>     > 1.2. What explicit protections have been built in for (pointers to
> code
>     > would be appreciated):
>     >   1.2.1. cross site scripting
>     >   1.2.2. cross site request forgery
>     >   1.2.3. click jacking (X-Frame-Options)
>     > 1.3. What sort of authentication is required for access to the UIs?
>     > 1.4. What authorization is available for determining who can access
> what
>     > capabilities of the UIs for either viewing, modifying data or
> affecting
>     > object stores and related processes?
>     > 1.5. Are the UIs built with proxying in mind by leveraging
> X-Forwarded
>     > headers?
>     > 1.6. Is there any input that will ultimately be persisted in
> configuration
>     > for executing shell commands or processes?
>     > 1.7. Do the UIs support the trusted proxy pattern with doas
> impersonation?
>     > 1.8. Is there TLS/SSL support?
>     >
>     > 2. REST APIs
>     >
>     > 2.1. Do the REST APIs support the trusted proxy pattern with doas
>     > impersonation capabilities?
>     > 2.2. What explicit protections have been built in for:
>     >   2.2.1. cross site scripting (XSS)
>     >   2.2.2. cross site request forgery (CSRF)
>     >   2.2.3. XML External Entity (XXE)
>     > 2.3. What is being used for authentication - Hadoop Auth Module?
>     > 2.4. Are there separate processes for the HTTP resources (UIs and
> REST
>     > endpoints) or are the part of existing HDFS processes?
>     > 2.5. Is there TLS/SSL support?
>     > 2.6. Are there new CLI commands and/or clients for access the REST
> APIs?
>     > 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
>     > authorization is required here - is there a restrictive ACL set on
> creation?
>     > 2.8. Bucket Level API allows for deleting a bucket - I assume this is
>     > dependent on ACLs based access control?
>     > 2.9. Bucket Level API to list bucket returns up to 1000 keys - is
> there
>     > paging available?
>     > 2.10. Storage Level APIs indicate “Signed with User Authorization”
> what
>     > does this refer to exactly?
>     > 2.11. Object Level APIs indicate that there is no ACL support and
> only
>     > bucket owners can read and write - but there are ACL APIs on the
> Bucket
>     > Level are they meaningless for now?
>     > 2.12. How does a REST client know which Ozone Handler to connect to
> or am
>     > I missing some well known NN type endpoint in the architecture doc
>     > somewhere?
>     >
>     > 3. Encryption
>     >
>     > 3.1. Is there any support for encryption of persisted data?
>     > 3.2. If so, is KMS and the hadoop key command used for key
> management?
>     >
>     > 4. Configuration
>     >
>     > 4.1. Are there any passwords or secrets being added to configuration?
>     > 4.2. If so, are they accessed via Configuration.getPassword() to
> allow for
>     > provisioning in credential providers?
>     > 4.3. Are there any settings that are used to launch docker
> containers or
>     > shell out any commands, etc?
>     >
>     > 5. HA
>     >
>     > 5.1. Are there provisions for HA?
>     > 5.2. Are we leveraging the existing HA capabilities in HDFS?
>     > 5.3. Is Storage Container Manager a SPOF?
>     > 5.4. I see HA listed in future work in the architecture doc - is this
>     > still an open issue?
>     >
>
>
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea, I wonder what sort of maintenance that implies and
whether we want to take on that burden even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that intended for the feature in its
endstate (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk. Not
the least of which is whether it is enabled by default and what it means to
disabled.
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review.
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should be at least thought about in the final
security model design for the particular feature.

I think that feature merges often span multiple branch merges with security
coming in phases or other aspects of the feature.
This intent should maybe be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The check list looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric
>
> On 10/20/17, 12:41 PM, "larry mccay" <lm...@apache.org> wrote:
>
>     Adding security@hadoop list as well...
>
>     On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lm...@apache.org>
> wrote:
>
>     > All -
>     >
>     > Given the maturity of Hadoop at this point, I would like to propose
> that
>     > we start doing explicit security audits of features at merge time.
>     >
>     > There are a few reasons that I think this is a good place/time to do
> the
>     > review:
>     >
>     > 1. It represents a specific snapshot of where the feature stands as a
>     > whole. This means that we can more easily identity the attack
> surface of a
>     > given feature.
>     > 2. We can identify any security gaps that need to be fixed before a
>     > release that carries the feature can be considered ready.
>     > 3. We - in extreme cases - can block a feature from merging until
> some
>     > baseline of security coverage is achieved.
>     > 4. The folks that are interested and able to review security aspects
> can't
>     > scale for every iteration over every JIRA but can review the
> checklist and
>     > follow pointers for specific areas of interest.
>     >
>     > I have provided an impromptu security audit checklist on the DISCUSS
>     > thread for merging Ozone - HDFS-7240 into trunk.
>     >
>     > I don't want to pick on it particularly but I think it is a good way
> to
>     > bootstrap this audit process and figure out how to incorporate it
> without
>     > being too intrusive.
>     >
>     > The questions that I provided below are a mix of general questions
> that
>     > could be on a standard checklist that you provide along with the
> merge
>     > thread and some that are specific to what I read about ozone in the
>     > excellent docs provided. So, we should consider some subset of the
>     > following as a proposal for a general checklist.
>     >
>     > Perhaps, a shared document can be created to iterate over the list
> to fine
>     > tune it?
>     >
>     > Any thoughts on this, any additional datapoints to collect, etc?
>     >
>     > thanks!
>     >
>     > --larry
>     >
>     > 1. UIs
>     > I see there are at least two UIs - Storage Container Manager and Key
>     > Space Manager. There are a number of typical vulnerabilities that we
>     > find in UIs.
>     >
>     > 1.1. What sort of validation is being done on any accepted user input?
>     > (pointers to code would be appreciated)
>     > 1.2. What explicit protections have been built in for (pointers to code
>     > would be appreciated):
>     >   1.2.1. cross-site scripting
>     >   1.2.2. cross-site request forgery
>     >   1.2.3. clickjacking (X-Frame-Options)
>     > 1.3. What sort of authentication is required for access to the UIs?
>     > 1.4. What authorization is available for determining who can access
>     > which capabilities of the UIs for viewing or modifying data, or for
>     > affecting object stores and related processes?
>     > 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
>     > headers?
>     > 1.6. Is there any input that will ultimately be persisted in
>     > configuration for executing shell commands or processes?
>     > 1.7. Do the UIs support the trusted proxy pattern with doas
>     > impersonation?
>     > 1.8. Is there TLS/SSL support?
>     >
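As a concrete illustration of the kind of hardening items 1.2 and 1.3 are probing for: Hadoop web UIs served through HttpServer2 are conventionally locked down via configuration along these lines. The property names are the standard Hadoop ones; whether the new Ozone UIs honor them (or need service-specific equivalents, as WebHDFS grew for X-Frame-Options and CSRF) is exactly what the audit should establish. The principal and keytab values below are placeholders.

```xml
<!-- core-site.xml (sketch): SPNEGO authentication for the web UIs (item 1.3) -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
```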
>     > 2. REST APIs
>     >
>     > 2.1. Do the REST APIs support the trusted proxy pattern with doas
>     > impersonation capabilities?
>     > 2.2. What explicit protections have been built in for:
>     >   2.2.1. cross-site scripting (XSS)
>     >   2.2.2. cross-site request forgery (CSRF)
>     >   2.2.3. XML External Entity (XXE)
>     > 2.3. What is being used for authentication - the Hadoop Auth Module?
>     > 2.4. Are there separate processes for the HTTP resources (UIs and REST
>     > endpoints) or are they part of existing HDFS processes?
>     > 2.5. Is there TLS/SSL support?
>     > 2.6. Are there new CLI commands and/or clients for accessing the REST
>     > APIs?
>     > 2.7. The Bucket Level API allows for setting ACLs on a bucket - what
>     > authorization is required here, and is there a restrictive ACL set on
>     > creation?
>     > 2.8. The Bucket Level API allows for deleting a bucket - I assume this
>     > is dependent on ACL-based access control?
>     > 2.9. The Bucket Level API to list a bucket returns up to 1000 keys - is
>     > there paging available?
>     > 2.10. The Storage Level APIs indicate “Signed with User Authorization” -
>     > what does this refer to exactly?
>     > 2.11. The Object Level APIs indicate that there is no ACL support and
>     > only bucket owners can read and write - but there are ACL APIs at the
>     > Bucket Level; are they meaningless for now?
>     > 2.12. How does a REST client know which Ozone Handler to connect to, or
>     > am I missing some well-known NN-type endpoint in the architecture doc
>     > somewhere?
>     >
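For reference on the trusted proxy pattern raised in 1.7 and 2.1: in existing Hadoop services the proxy authenticates as itself (e.g. via Kerberos) and asserts the end user with a doas parameter, gated by the hadoop.proxyuser.* ACLs. A sketch, where the "knox" identity and hostname are hypothetical:

```xml
<!-- core-site.xml (sketch): allow a gateway identity to impersonate end users -->
<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>gateway.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>users</value>
</property>
```

A proxied request then carries something like `?doas=alice`, and the service must verify that the authenticated caller is permitted to impersonate alice. The audit question is whether the Ozone REST handlers enforce these same checks.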
>     > 3. Encryption
>     >
>     > 3.1. Is there any support for encryption of persisted data?
>     > 3.2. If so, are KMS and the hadoop key command used for key
>     > management?
>     >
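For item 3.2, at-rest encryption in Hadoop is normally wired to the KMS through a key provider URI; the host and port below are placeholders, and whether Ozone reuses this path rather than something bespoke is precisely the open question:

```xml
<!-- core-site.xml (sketch): point encryption clients at the Hadoop KMS -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://https@kms.example.com:9600/kms</value>
</property>
```

Keys would then be managed with the usual hadoop key create/roll/list commands rather than a new, Ozone-specific mechanism.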
>     > 4. Configuration
>     >
>     > 4.1. Are there any passwords or secrets being added to configuration?
>     > 4.2. If so, are they accessed via Configuration.getPassword() to allow
>     > for provisioning in credential providers?
>     > 4.3. Are there any settings that are used to launch docker containers
>     > or shell out any commands, etc.?
>     >
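Item 4.2 refers to the Hadoop credential provider API, which lets secrets be resolved from a keystore instead of clear text in *-site.xml. A sketch - the alias and jceks path here are placeholders:

```xml
<!-- core-site.xml (sketch): resolve secrets from a credential store -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://file/etc/hadoop/conf/ozone.jceks</value>
</property>
```

The store is populated with `hadoop credential create my.secret.alias -provider jceks://file/etc/hadoop/conf/ozone.jceks`, and code retrieves the secret via `conf.getPassword("my.secret.alias")`, which falls back to the in-config value when no provider holds the alias.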
>     > 5. HA
>     >
>     > 5.1. Are there provisions for HA?
>     > 5.2. Are we leveraging the existing HA capabilities in HDFS?
>     > 5.3. Is Storage Container Manager a SPOF?
>     > 5.4. I see HA listed as future work in the architecture doc - is this
>     > still an open issue?

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea - I wonder what sort of maintenance that implies, and
whether we want to take on that burden, even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high-level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that is intended for the feature in
its end state (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk - not
the least of which is whether the feature is enabled by default and what it
means to disable it
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should at least be thought about in the final
security model design for the particular feature

I think that feature merges often span multiple branch merges, with
security - like other aspects of the feature - coming in phases.
This intent should perhaps be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The checklist looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by Eric Yang <ey...@hortonworks.com>.
The checklist looks good.  Some more items to add:

Kerberos
  TGT renewal
  SPNEGO support
  Delegation token
Proxy User ACL

CVE tracking list

We might want to start a security section for Hadoop wiki for each of the services and components.
This helps to track what has been completed.

How do we want to enforce security completeness?  Most features will not meet all security requirements on merge day.

Regards,
Eric


Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Adding security@hadoop list as well...

On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lm...@apache.org> wrote:

> All -
>
> Given the maturity of Hadoop at this point, I would like to propose that
> we start doing explicit security audits of features at merge time.
>
> There are a few reasons that I think this is a good place/time to do the
> review:
>
> 1. It represents a specific snapshot of where the feature stands as a
> whole. This means that we can more easily identity the attack surface of a
> given feature.
> 2. We can identify any security gaps that need to be fixed before a
> release that carries the feature can be considered ready.
> 3. We - in extreme cases - can block a feature from merging until some
> baseline of security coverage is achieved.
> 4. The folks that are interested and able to review security aspects can't
> scale for every iteration over every JIRA but can review the checklist and
> follow pointers for specific areas of interest.
>
> I have provided an impromptu security audit checklist on the DISCUSS
> thread for merging Ozone - HDFS-7240 into trunk.
>
> I don't want to pick on it particularly but I think it is a good way to
> bootstrap this audit process and figure out how to incorporate it without
> being too intrusive.
>
> The questions that I provided below are a mix of general questions that
> could be on a standard checklist that you provide along with the merge
> thread and some that are specific to what I read about ozone in the
> excellent docs provided. So, we should consider some subset of the
> following as a proposal for a general checklist.
>
> Perhaps, a shared document can be created to iterate over the list to fine
> tune it?
>
> Any thoughts on this, any additional datapoints to collect, etc?
>
> thanks!
>
> --larry
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data or affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACL-based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level - are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>

Re: [DISCUSS] Feature Branch Merge and Security Audits

Posted by larry mccay <lm...@apache.org>.
Adding security@hadoop list as well...

On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lm...@apache.org> wrote:

> All -
>
> Given the maturity of Hadoop at this point, I would like to propose that
> we start doing explicit security audits of features at merge time.
>
> There are a few reasons that I think this is a good place/time to do the
> review:
>
> 1. It represents a specific snapshot of where the feature stands as a
> whole. This means that we can more easily identify the attack surface of a
> given feature.
> 2. We can identify any security gaps that need to be fixed before a
> release that carries the feature can be considered ready.
> 3. We - in extreme cases - can block a feature from merging until some
> baseline of security coverage is achieved.
> 4. The folks that are interested and able to review security aspects can't
> scale for every iteration over every JIRA but can review the checklist and
> follow pointers for specific areas of interest.
>
> I have provided an impromptu security audit checklist on the DISCUSS
> thread for merging Ozone - HDFS-7240 into trunk.
>
> I don't want to pick on it particularly but I think it is a good way to
> bootstrap this audit process and figure out how to incorporate it without
> being too intrusive.
>
> The questions that I provided below are a mix of general questions that
> could be on a standard checklist that you provide along with the merge
> thread and some that are specific to what I read about ozone in the
> excellent docs provided. So, we should consider some subset of the
> following as a proposal for a general checklist.
>
> Perhaps, a shared document can be created to iterate over the list to fine
> tune it?
>
> Any thoughts on this, any additional datapoints to collect, etc?
>
> thanks!
>
> --larry
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for viewing or modifying data, or for affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
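
To make 1.2 concrete: the kind of answer I am after might look like the
sketch below. This is purely illustrative Java - none of the names come
from Ozone's code base - showing a custom-request-header CSRF check and
the usual anti-click-jacking headers that questions 1.2.1-1.2.3 probe for.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch only - not Ozone code. Shows the kind of response
 * headers and CSRF custom-header check that checklist items 1.2.1-1.2.3
 * are asking about.
 */
public class UiHardeningSketch {

    /** Headers a hardened UI would typically emit on every response. */
    static Map<String, String> securityHeaders() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("X-Frame-Options", "SAMEORIGIN");       // click-jacking defense
        h.put("X-XSS-Protection", "1; mode=block");   // legacy browser XSS filter
        h.put("X-Content-Type-Options", "nosniff");   // block MIME sniffing
        return h;
    }

    /**
     * Simple CSRF defense: reject state-changing requests that lack a
     * custom header, since a cross-site form cannot set custom headers.
     * The header name here is made up for the example.
     */
    static boolean allowRequest(String method, Map<String, String> requestHeaders) {
        if ("GET".equals(method) || "HEAD".equals(method)) {
            return true;  // safe methods pass through
        }
        return requestHeaders.containsKey("X-Requested-By");
    }

    public static void main(String[] args) {
        if (!"SAMEORIGIN".equals(securityHeaders().get("X-Frame-Options"))) {
            throw new AssertionError("missing click-jacking header");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        if (!allowRequest("GET", headers) || allowRequest("POST", headers)) {
            throw new AssertionError("CSRF check misbehaved");
        }
        headers.put("X-Requested-By", "ozone-ui");
        if (!allowRequest("POST", headers)) {
            throw new AssertionError("custom header should allow POST");
        }
        System.out.println("ok");
    }
}
```

Hadoop already ships a reusable filter along these lines
(RestCsrfPreventionFilter), so a pointer to where such a filter is wired
up for these UIs would be a sufficient answer.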
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACL-based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level - are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
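
For 2.1, by "trusted proxy pattern" I mean roughly the behavior sketched
below - illustrative Java only, not Hadoop's actual ProxyUsers
implementation. A gateway authenticates as itself, asserts the end user
via a doas parameter, and the service honors that only for identities
configured as proxies.

```java
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch of the trusted proxy / doas pattern behind
 * checklist item 2.1. All names and the ACL shape are invented for
 * the example.
 */
public class TrustedProxySketch {

    /** Authenticated caller -> set of users it may impersonate. */
    private final Map<String, Set<String>> proxyAcl;

    TrustedProxySketch(Map<String, Set<String>> proxyAcl) {
        this.proxyAcl = proxyAcl;
    }

    /** Resolve the effective user for a request, enforcing the proxy ACL. */
    String effectiveUser(String authenticatedUser, String doas) {
        if (doas == null || doas.equals(authenticatedUser)) {
            return authenticatedUser;  // no impersonation requested
        }
        Set<String> allowed = proxyAcl.get(authenticatedUser);
        if (allowed != null && allowed.contains(doas)) {
            return doas;               // trusted proxy: honor the doas assertion
        }
        throw new SecurityException(
            authenticatedUser + " may not impersonate " + doas);
    }

    public static void main(String[] args) {
        TrustedProxySketch svc = new TrustedProxySketch(
            Map.of("knox", Set.of("alice", "bob")));
        if (!"alice".equals(svc.effectiveUser("knox", "alice"))) {
            throw new AssertionError("proxy should be honored");
        }
        if (!"alice".equals(svc.effectiveUser("alice", null))) {
            throw new AssertionError("direct access should pass through");
        }
        boolean rejected = false;
        try {
            svc.effectiveUser("mallory", "alice");
        } catch (SecurityException e) {
            rejected = true;           // untrusted caller must be refused
        }
        if (!rejected) {
            throw new AssertionError("impersonation by non-proxy must fail");
        }
        System.out.println("ok");
    }
}
```

In Hadoop proper this is driven by the hadoop.proxyuser.* configuration;
the question is whether these new endpoints participate in that mechanism.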
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
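
For 4.2, the behavior Configuration.getPassword() gives us is roughly the
following - an illustrative sketch, not the real
org.apache.hadoop.conf.Configuration: credential providers are consulted
first, and the clear-text config value is only a fallback, so secrets can
be moved out of *-site.xml files.

```java
import java.util.Map;

/**
 * Illustrative sketch of the lookup order behind checklist item 4.2.
 * The two maps stand in for a credential provider (e.g. a jceks
 * keystore) and the site configuration; both are invented here.
 */
public class GetPasswordSketch {

    private final Map<String, char[]> credentialProvider;
    private final Map<String, String> config;

    GetPasswordSketch(Map<String, char[]> provider, Map<String, String> config) {
        this.credentialProvider = provider;
        this.config = config;
    }

    /** Provider first; clear-text config only as a legacy fallback. */
    char[] getPassword(String name) {
        char[] fromProvider = credentialProvider.get(name);
        if (fromProvider != null) {
            return fromProvider;   // preferred: secret never lives in config
        }
        String fallback = config.get(name);
        return fallback == null ? null : fallback.toCharArray();
    }

    public static void main(String[] args) {
        GetPasswordSketch conf = new GetPasswordSketch(
            Map.of("ssl.server.keystore.password", "fromjceks".toCharArray()),
            Map.of("other.secret", "cleartext"));
        if (!"fromjceks".equals(new String(conf.getPassword("ssl.server.keystore.password")))) {
            throw new AssertionError("provider value should win");
        }
        if (!"cleartext".equals(new String(conf.getPassword("other.secret")))) {
            throw new AssertionError("config fallback should apply");
        }
        if (conf.getPassword("missing") != null) {
            throw new AssertionError("unknown key should be null");
        }
        System.out.println("ok");
    }
}
```

Any new secret-bearing property should go through this path rather than a
plain get(), so it can be provisioned with the hadoop credential command.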
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>
