Posted to dev@ranger.apache.org by Ramesh Mani <rm...@hortonworks.com> on 2021/03/31 18:01:51 UTC

Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/
-----------------------------------------------------------

(Updated March 31, 2021, 6:01 p.m.)


Review request for ranger, Don Bosco Durai, Abhay Kulkarni, Madhan Neethiraj, Mehul Parikh, Selvamohan Neethiraj, Sailaja Polavarapu, and Velmurugan Periasamy.


Changes
-------

Rebased to include HFlushCapableStream check


Bugs: RANGER-1837
    https://issues.apache.org/jira/browse/RANGER-1837


Repository: ranger


Description
-------

RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format


Diffs (updated)
-----

  agents-audit/pom.xml b9f6af27c 
  agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java 5e6f40226 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java 4ce31dd09 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java 6b7f4b00b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java 54f37644b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java 05f882ff3 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java e2b74489b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MultiDestAuditProvider.java 282f5abfa 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java 41513ba40 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractRangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerJSONAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerORCAuditWriter.java PRE-CREATION 


Diff: https://reviews.apache.org/r/63552/diff/7/

Changes: https://reviews.apache.org/r/63552/diff/6-7/


Testing
-------

Testing was done locally.

ORC FILE FORMAT in HDFS Ranger Audit log with local audit file store as source for HDFS audit:
	NOTE: In this mode, each record in the local file is read to create the ORC file.

    1. Enable Ranger Audit to HDFS in ORC file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in ORC format, first enable AuditFileQueue so that audits are spooled to a local file.
            * On the NameNode host, create the spool directory and make sure the owner of the service for which the Ranger plugin is enabled has read/write/execute access to it (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: default = memqueue, where a memory queue / buffer is used instead of a local file buffer )
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the batch size of the ORC file that is created )
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit )
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000  ( This determines the batch size for ORC file creation, along with the rollover.sec parameter )

    2. Enable ORC file format for Ranger HDFS Audit.
          - This is done by setting the following param in ranger-<component>-audit.xml. The default value is "json".

            xasecure.audit.destination.hdfs.filetype=orc ( default = json )

    3. Control the compression codec used for the ORC format. Default is 'snappy'.
            xasecure.audit.destination.hdfs.orc.compression=snappy|lzo|zlib|none

    4. Buffer size and stripe size of the ORC file batch. Defaults are '10000' bytes and '100000' bytes respectively. These determine the batch size of the ORC file in HDFS.
            xasecure.audit.destination.hdfs.orc.buffersize= (value in bytes)
            xasecure.audit.destination.hdfs.orc.stripesize= (value in bytes)

    5. Hive query to create an ORC table with the default 'snappy' compression.

        CREATE EXTERNAL TABLE ranger_audit_event (
        repositoryType int,
        repositoryName string,
        reqUser string,
        evtTime string,
        accessType string,
        resourcePath string,
        resourceType string,
        action  string,
        accessResult string,
        agentId string,
        policyId  bigint,
        resultReason string,
        aclEnforcer string,
        sessionId string,
        clientType string,
        clientIP string,
        requestData string,
        clusterName string
        )
        STORED AS ORC
        LOCATION '/ranger/audit/hdfs'
        TBLPROPERTIES  ("orc.compress"="SNAPPY");
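
The ORC settings from steps 1 to 4 above can be sketched as a ranger-<component>-audit.xml fragment. This assumes the file uses the usual Hadoop-style <configuration>/<property> layout. The values are the ones quoted in this mail, and the two ORC sizes are simply the stated defaults written out explicitly.

```xml
<configuration>
  <!-- Step 1: spool audits to a local file queue instead of the default memqueue -->
  <property>
    <name>xasecure.audit.destination.hdfs.batch.queuetype</name>
    <value>filequeue</value>
  </property>
  <property>
    <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec</name>
    <value>300</value>
  </property>
  <property>
    <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir</name>
    <value>/var/log/hadoop/audit/staging/spool</value>
  </property>
  <property>
    <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size</name>
    <value>10000</value>
  </property>

  <!-- Step 2: write ORC instead of the default json -->
  <property>
    <name>xasecure.audit.destination.hdfs.filetype</name>
    <value>orc</value>
  </property>

  <!-- Step 3: compression codec (snappy is the stated default) -->
  <property>
    <name>xasecure.audit.destination.hdfs.orc.compression</name>
    <value>snappy</value>
  </property>

  <!-- Step 4: buffer and stripe size in bytes (stated defaults shown) -->
  <property>
    <name>xasecure.audit.destination.hdfs.orc.buffersize</name>
    <value>10000</value>
  </property>
  <property>
    <name>xasecure.audit.destination.hdfs.orc.stripesize</name>
    <value>100000</value>
  </property>
</configuration>
```

If a different codec is chosen in step 3, the "orc.compress" table property in the Hive DDL would presumably need to match it.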


-------------------------

JSON FILE FORMAT in HDFS Ranger Audit log with local audit file store as source for HDFS audit:
	NOTE: In this mode, each local file is copied in its entirety to the HDFS destination. This lets us generate larger Ranger audit files in HDFS, which is preferred.
	
    1. Enable Ranger Audit to HDFS in JSON file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in JSON format with a local file cache, first enable AuditFileQueue so that audits are spooled locally.

            * On the NameNode host, create the spool directory and make sure the owner of the service for which the Ranger plugin is enabled has read/write/execute access to it (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: default = memqueue, where a memory queue / buffer is used instead of a local file buffer )
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the size of the JSON file that is copied to HDFS )
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit )
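
For the JSON case, a minimal ranger-<component>-audit.xml fragment might look like the following. It assumes the Hadoop-style <configuration>/<property> layout and uses only the values quoted above. The filetype property is left unset because json is the stated default.

```xml
<configuration>
  <!-- Spool audits to a local file; each rolled-over file is then copied whole to HDFS -->
  <property>
    <name>xasecure.audit.destination.hdfs.batch.queuetype</name>
    <value>filequeue</value>
  </property>
  <!-- Roll the local file every 300 seconds; this bounds the size of each JSON file in HDFS -->
  <property>
    <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec</name>
    <value>300</value>
  </property>
  <!-- Local staging directory, owned by the service user -->
  <property>
    <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir</name>
    <value>/var/log/hadoop/audit/staging/spool</value>
  </property>
</configuration>
```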


Thanks,

Ramesh Mani


Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Velmurugan Periasamy <vp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/#review222758
-----------------------------------------------------------


Ship it!

- Velmurugan Periasamy

