Posted to dev@ranger.apache.org by Ramesh Mani <rm...@hortonworks.com> on 2018/01/27 02:05:19 UTC

Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/
-----------------------------------------------------------

(Updated Jan. 27, 2018, 2:05 a.m.)


Review request for ranger, Don Bosco Durai and Madhan Neethiraj.


Changes
-------

Address review comments.


Repository: ranger


Description
-------

RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format


Diffs (updated)
-----

  agents-audit/README.txt PRE-CREATION 
  agents-audit/pom.xml c8bd1d8 
  agents-audit/src/main/java/org/apache/ranger/audit/destination/AuditDestination.java 41d0e82 
  agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java 66d8504 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditFileCacheProvider.java 314b130 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java 4ce31dd 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java 43107ba 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java b095000 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java 05f882f 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java eff3824 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java 41513ba 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/JSONWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/Writer.java PRE-CREATION 
  pom.xml 589cd6a 
  src/main/assembly/hbase-agent.xml 3ebc334 
  src/main/assembly/hdfs-agent.xml 5279a9a 
  src/main/assembly/hive-agent.xml ca65c80 
  src/main/assembly/knox-agent.xml 8357d49 
  src/main/assembly/plugin-atlas.xml fd98811 
  src/main/assembly/plugin-kafka.xml 95855d9 
  src/main/assembly/plugin-kms.xml 6d15f2a 
  src/main/assembly/plugin-solr.xml de30bfb 
  src/main/assembly/plugin-sqoop.xml d2bd69a 
  src/main/assembly/plugin-yarn.xml c6a48e8 
  src/main/assembly/storm-agent.xml 64224ec 


Diff: https://reviews.apache.org/r/63552/diff/5/

Changes: https://reviews.apache.org/r/63552/diff/4-5/


Testing
-------

Tested locally.


Thanks,

Ramesh Mani


Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Kevin Risden <kr...@apache.org>.

> On Nov. 6, 2018, 9:31 a.m., Kevin Risden wrote:
> >

The changes look reasonable. I have only two comments, and they are more about optimization than anything else.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/#review210350
-----------------------------------------------------------


On Jan. 26, 2018, 8:05 p.m., Ramesh Mani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63552/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2018, 8:05 p.m.)
> 
> 
> Review request for ranger, Don Bosco Durai and Madhan Neethiraj.
> 
> 
> Repository: ranger
> 
> 
> Description
> -------
> 
> RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format
> 
> 
> Diffs
> -----
> 
>   agents-audit/README.txt PRE-CREATION 
>   agents-audit/pom.xml c8bd1d8 
>   agents-audit/src/main/java/org/apache/ranger/audit/destination/AuditDestination.java 41d0e82 
>   agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java 66d8504 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditFileCacheProvider.java 314b130 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java 4ce31dd 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java 43107ba 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java b095000 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java 05f882f 
>   agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java eff3824 
>   agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java 41513ba 
>   agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractAuditWriter.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/utils/JSONWriter.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCWriter.java PRE-CREATION 
>   agents-audit/src/main/java/org/apache/ranger/audit/utils/Writer.java PRE-CREATION 
>   pom.xml 589cd6a 
>   src/main/assembly/hbase-agent.xml 3ebc334 
>   src/main/assembly/hdfs-agent.xml 5279a9a 
>   src/main/assembly/hive-agent.xml ca65c80 
>   src/main/assembly/knox-agent.xml 8357d49 
>   src/main/assembly/plugin-atlas.xml fd98811 
>   src/main/assembly/plugin-kafka.xml 95855d9 
>   src/main/assembly/plugin-kms.xml 6d15f2a 
>   src/main/assembly/plugin-solr.xml de30bfb 
>   src/main/assembly/plugin-sqoop.xml d2bd69a 
>   src/main/assembly/plugin-yarn.xml c6a48e8 
>   src/main/assembly/storm-agent.xml 64224ec 
> 
> 
> Diff: https://reviews.apache.org/r/63552/diff/5/
> 
> 
> Testing
> -------
> 
> Tested locally.
> 
> 
> Thanks,
> 
> Ramesh Mani
> 
>


Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Kevin Risden <kr...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/#review210350
-----------------------------------------------------------




agents-audit/README.txt
Lines 80 (patched)
<https://reviews.apache.org/r/63552/#comment295017>

    It might be worth specifying PARTITIONED BY with year/month/day. Otherwise, each query will need to scan ALL ORC files. I assume nothing changes here about Ranger audits being written to HDFS in per-day folders, right?



agents-audit/pom.xml
Lines 60 (patched)
<https://reviews.apache.org/r/63552/#comment295018>

    Is there a chance this causes dependency conflicts when put on the classpath of other projects? I thought hive-exec pulled in a bunch of extra dependencies.


- Kevin Risden




Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Velmurugan Periasamy <vp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/#review222758
-----------------------------------------------------------


Ship it!




Ship It!

- Velmurugan Periasamy




Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Ramesh Mani <rm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/
-----------------------------------------------------------

(Updated March 31, 2021, 6:01 p.m.)


Review request for ranger, Don Bosco Durai, Abhay Kulkarni, Madhan Neethiraj, Mehul Parikh, Selvamohan Neethiraj, Sailaja Polavarapu, and Velmurugan Periasamy.


Changes
-------

Rebased to include HFlushCapableStream check


Bugs: RANGER-1837
    https://issues.apache.org/jira/browse/RANGER-1837


Repository: ranger


Description
-------

RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format


Diffs (updated)
-----

  agents-audit/pom.xml b9f6af27c 
  agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java 5e6f40226 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java 4ce31dd09 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java 6b7f4b00b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java 54f37644b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java 05f882ff3 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java e2b74489b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MultiDestAuditProvider.java 282f5abfa 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java 41513ba40 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractRangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerJSONAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerORCAuditWriter.java PRE-CREATION 


Diff: https://reviews.apache.org/r/63552/diff/7/

Changes: https://reviews.apache.org/r/63552/diff/6-7/


Testing
-------

Tested locally.

ORC FILE FORMAT in HDFS Ranger Audit log, with the local audit file store as the source for HDFS audit:
	NOTE: In this mode, each record in the local file is read to create the ORC file.

    1. Enable Ranger Audit to HDFS in ORC file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in ORC format, first enable AuditFileQueue so that audits are spooled to the local file system.
            * On the NameNode host, create the spool directory and make sure the path is readable, writable and executable by the owner of the service for which the Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: the default is memqueue, where an in-memory queue / buffer is used instead of a local file buffer)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the batch size of the ORC file that is created)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000  ( This determines the batch size for ORC file creation, along with the rollover.sec parameter)

    2. Enable the ORC file format for Ranger HDFS Audit.
          - This is done by setting the following param in ranger-<component>-audit.xml. By default the value is "json"

            xasecure.audit.destination.hdfs.filetype=orc ( default = json )

    3. Control the compression technique used for the ORC format. Default is 'snappy'
            xasecure.audit.destination.hdfs.orc.compression=snappy|lzo|zlip|none

    4. Buffer size and stripe size of the ORC file batch. Defaults are '10000' bytes and '100000' bytes respectively. These decide the batch size of the ORC file in HDFS.
            xasecure.audit.destination.hdfs.orc.buffersize= (value in bytes)
            xasecure.audit.destination.hdfs.orc.stripesize= (value in bytes)

    5. Hive query to create an ORC table with the default 'snappy' compression.

        CREATE EXTERNAL TABLE ranger_audit_event (
        repositoryType int,
        repositoryName string,
        reqUser string,
        evtTime string,
        accessType string,
        resourcePath string,
        resourceType string,
        action  string,
        accessResult string,
        agentId string,
        policyId  bigint,
        resultReason string,
        aclEnforcer string,
        sessionId string,
        clientType string,
        clientIP string,
        requestData string,
        clusterName string
        )
        STORED AS ORC
        LOCATION '/ranger/audit/hdfs'
        TBLPROPERTIES  ("orc.compress"="SNAPPY");
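
The settings in steps 1 through 4 are written as name=value pairs, whereas ranger-<component>-audit.xml is, as far as I know, a Hadoop-style configuration file built from <property> elements. A minimal sketch of rendering them in that form follows. The helper names emit_property and emit_audit_properties are hypothetical, and the values are simply the ones quoted above, not recommendations.

```shell
#!/bin/sh
# emit_property NAME VALUE
# Hypothetical helper: prints one Hadoop-style <property> element.
# It does not XML-escape its arguments, so it only suits simple values.
emit_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Prints the file-queue and ORC settings quoted in the steps above.
emit_audit_properties() {
    p=xasecure.audit.destination.hdfs
    emit_property "$p.batch.queuetype" filequeue
    emit_property "$p.batch.filequeue.filespool.file.rollover.sec" 300
    emit_property "$p.batch.filequeue.filespool.dir" /var/log/hadoop/audit/staging/spool
    emit_property "$p.batch.filequeue.filespool.buffer.size" 10000
    emit_property "$p.filetype" orc
    emit_property "$p.orc.compression" snappy
}
```

Calling emit_audit_properties prints six <property> elements that could be pasted inside the <configuration> element of the audit file for the component in question.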


-------------------------

JSON FILE FORMAT in HDFS Ranger Audit log, with the local audit file store as the source for HDFS audit:
	NOTE: In this mode, each local file is copied in its entirety to the HDFS destination. This produces larger Ranger audit files in HDFS, which is preferred.

    1. Enable Ranger Audit to HDFS in JSON file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in JSON format with local file caching, first enable AuditFileQueue so that audits are spooled to the local file system.

            * On the NameNode host, create the spool directory and make sure the path is readable, writable and executable by the owner of the service for which the Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: the default is memqueue, where an in-memory queue / buffer is used instead of a local file buffer)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the size of the JSON file that is copied to HDFS)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit)
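
The spool-directory step could be wrapped in a small function that creates the directory, optionally assigns the owner, and shows the result. This is only a sketch: the function name prepare_spool_dir is hypothetical, changing ownership to another account normally requires root, and the u+rwx mode is an assumption drawn from the read/write/execute wording above.

```shell
#!/bin/sh
# prepare_spool_dir DIR [OWNER]
# Hypothetical helper: creates DIR (and any missing parents), optionally
# hands it to OWNER (for example hdfs:hadoop), and gives the owner rwx.
prepare_spool_dir() {
    dir=$1
    owner=${2:-}
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        # Changing ownership to another account usually needs root.
        chown "$owner" "$dir" || return 1
    fi
    chmod u+rwx "$dir" || return 1
    # Show the resulting owner and mode so the operator can verify them.
    ls -ld "$dir"
}
```

For the HDFS plugin this might be invoked as `prepare_spool_dir /var/log/hadoop/audit/staging/spool hdfs:hadoop`. Passing the full path to chown avoids depending on the current working directory.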


Thanks,

Ramesh Mani


Re: Review Request 63552: RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format

Posted by Ramesh Mani <rm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/
-----------------------------------------------------------

(Updated Feb. 4, 2021, 12:49 a.m.)


Review request for ranger, Don Bosco Durai, Abhay Kulkarni, Madhan Neethiraj, Mehul Parikh, Selvamohan Neethiraj, Sailaja Polavarapu, and Velmurugan Periasamy.


Changes
-------

Revised patch based on comments and testing done.


Bugs: RANGER-1837
    https://issues.apache.org/jira/browse/RANGER-1837


Repository: ranger


Description
-------

RANGER-1837:Enhance Ranger Audit to HDFS to support ORC file format


Diffs (updated)
-----

  agents-audit/pom.xml 85dd550ad 
  agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java 906ff341f 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java 4ce31dd09 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java f971a76f0 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java 6138ca0eb 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java 05f882ff3 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java e2b74489b 
  agents-audit/src/main/java/org/apache/ranger/audit/provider/MultiDestAuditProvider.java 282f5abfa 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java 41513ba40 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractRangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerJSONAuditWriter.java PRE-CREATION 
  agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerORCAuditWriter.java PRE-CREATION 


Diff: https://reviews.apache.org/r/63552/diff/6/

Changes: https://reviews.apache.org/r/63552/diff/5-6/


Testing (updated)
-------

Tested locally.

ORC FILE FORMAT in HDFS Ranger Audit log, with the local audit file store as the source for HDFS audit:
	NOTE: In this mode, each record in the local file is read to create the ORC file.

    1. Enable Ranger Audit to HDFS in ORC file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in ORC format, first enable AuditFileQueue so that audits are spooled to the local file system.
            * On the NameNode host, create the spool directory and make sure the path is readable, writable and executable by the owner of the service for which the Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: the default is memqueue, where an in-memory queue / buffer is used instead of a local file buffer)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the batch size of the ORC file that is created)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000  ( This determines the batch size for ORC file creation, along with the rollover.sec parameter)

    2. Enable the ORC file format for Ranger HDFS Audit.
          - This is done by setting the following param in ranger-<component>-audit.xml. By default the value is "json"

            xasecure.audit.destination.hdfs.filetype=orc ( default = json )

    3. Control the compression technique used for the ORC format. Default is 'snappy'
            xasecure.audit.destination.hdfs.orc.compression=snappy|lzo|zlip|none

    4. Buffer size and stripe size of the ORC file batch. Defaults are '10000' bytes and '100000' bytes respectively. These decide the batch size of the ORC file in HDFS.
            xasecure.audit.destination.hdfs.orc.buffersize= (value in bytes)
            xasecure.audit.destination.hdfs.orc.stripesize= (value in bytes)

    5. Hive query to create an ORC table with the default 'snappy' compression.

        CREATE EXTERNAL TABLE ranger_audit_event (
        repositoryType int,
        repositoryName string,
        reqUser string,
        evtTime string,
        accessType string,
        resourcePath string,
        resourceType string,
        action  string,
        accessResult string,
        agentId string,
        policyId  bigint,
        resultReason string,
        aclEnforcer string,
        sessionId string,
        clientType string,
        clientIP string,
        requestData string,
        clusterName string
        )
        STORED AS ORC
        LOCATION '/ranger/audit/hdfs'
        TBLPROPERTIES  ("orc.compress"="SNAPPY");
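
Earlier in this thread a reviewer suggested partitioning the table by date so that queries need not scan every ORC file. One way that might look is to generate one ADD PARTITION statement per day. The sketch assumes the table above is instead declared with PARTITIONED BY (dt string), and that each day's audits sit in a folder named after the date directly under the table location. Neither assumption is confirmed by the patch, and the function name and the dt column are hypothetical.

```shell
#!/bin/sh
# add_partition_sql TABLE BASE_DIR DAY
# Hypothetical helper: prints a Hive statement that registers one per-day
# folder (BASE_DIR/DAY) as partition dt=DAY of TABLE.
add_partition_sql() {
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '%s/%s';\n" \
        "$1" "$3" "$2" "$3"
}
```

For example, `add_partition_sql ranger_audit_event /ranger/audit/hdfs 20210331` prints `ALTER TABLE ranger_audit_event ADD IF NOT EXISTS PARTITION (dt='20210331') LOCATION '/ranger/audit/hdfs/20210331';`, which could then be run through a Hive client. The real folder layout should be checked on the cluster before relying on this.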


-------------------------

JSON FILE FORMAT in HDFS Ranger Audit log, with the local audit file store as the source for HDFS audit:
	NOTE: In this mode, each local file is copied in its entirety to the HDFS destination. This produces larger Ranger audit files in HDFS, which is preferred.

    1. Enable Ranger Audit to HDFS in JSON file format using AuditFileQueue
        - To enable Ranger Audit to HDFS in JSON format with local file caching, first enable AuditFileQueue so that audits are spooled to the local file system.

            * On the NameNode host, create the spool directory and make sure the path is readable, writable and executable by the owner of the service for which the Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop for the Hive service, etc.)

                $ mkdir -p /var/log/hadoop/audit/staging/spool
                $ cd /var/log/hadoop/audit/staging
                $ chown hdfs:hadoop spool

            * Enable AuditFileQueue via the following params in ranger-<component>-audit.xml
               xasecure.audit.destination.hdfs.batch.queuetype=filequeue  ( NOTE: the default is memqueue, where an in-memory queue / buffer is used instead of a local file buffer)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300  ( This determines the size of the JSON file that is copied to HDFS)
               xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool  ( This is the local staging directory for audit)
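
Because a spool file rolls over on a timer, its size depends on how many audit events arrive during the rollover window. A back-of-the-envelope estimate is events per second, times average event size, times the rollover interval. The function name below is hypothetical, and the input figures in the usage note are made up purely for illustration, not measured.

```shell
#!/bin/sh
# estimate_spool_bytes EVENTS_PER_SEC AVG_EVENT_BYTES ROLLOVER_SEC
# Hypothetical helper: rough size in bytes of one spool file at rollover.
estimate_spool_bytes() {
    echo $(( $1 * $2 * $3 ))
}
```

With an assumed 200 events per second at 500 bytes each and the 300-second rollover shown above, `estimate_spool_bytes 200 500 300` prints 30000000, i.e. roughly 30 MB per file copied to HDFS.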


Thanks,

Ramesh Mani