You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by kalyan kumar kalvagadda via Review Board <no...@reviews.apache.org> on 2018/09/17 15:07:30 UTC

Review Request 68727: SENTRY-2306: Reduce the HMS snapshot size by ignoring partitions at default locations

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68727/
-----------------------------------------------------------

Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, and Sergio Pena.


Bugs: SENTRY-2306
    https://issues.apache.org/jira/browse/SENTRY-2306


Repository: sentry


Description
-------

Ignore the partitions entries which are located in default location as they share the same permissions as the table. This will reduce the snapshot size decreasing the time to persist the snapshot and also time to send the full update to Namenode. This is important as the partition information has lion’s share in a HMS Full snapshot.


Diffs
-----

  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 1eda41b2b6bd940a404cc1ba09a861fe783ead04 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java 0b4f4aa24bd3002c50bf4d80a6fa361f66052973 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java 3e27d1bbee9bea9a60e7f7e012a367957610826c 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryService.java b3a4934dfc523d14eec216df00a6b7597c66c166 
  sentry-service/sentry-service-server/src/test/java/org/apache/sentry/service/thrift/TestFullUpdateInitializer.java 589acbed12855ff09309a04c9214f8daf87ea1de 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationEnd2End.java 574bc4b0432824abc81fe3c0dd30d1fe01f012e5 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentryService.java PRE-CREATION 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentrySrv.java 9fa42f2de41b6700a18a358f8d5e442104dd0585 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/SentrySrv.java 0e82b86a727f578013a63c005aa05279790344f1 


Diff: https://reviews.apache.org/r/68727/diff/1/


Testing
-------

Made sure all the existing tests passed and also added unit and e2e tests to verify the new behavior.


Thanks,

kalyan kumar kalvagadda


Re: Review Request 68727: SENTRY-2306: Reduce the HMS snapshot size by ignoring partitions at default locations

Posted by Sergio Pena via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68727/#review209067
-----------------------------------------------------------



Changes look good. But I'd like to know two things before approving it:

1. Could you verify Lina's concerns (see JIRA) about what would happen if a partition or table location changes in the NotificationProcessor? If a partition location that has the same prefix as the table changes to be out of the table location, would this affect the HDFS ACLs?
2. Could you run a benchmark with and without this patch to see the improvement in numbers?

- Sergio Pena


On Sept. 24, 2018, 10:28 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68727/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2018, 10:28 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2306
>     https://issues.apache.org/jira/browse/SENTRY-2306
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Ignore the partitions entries which are located in default location as they share the same permissions as the table. This will reduce the snapshot size decreasing the time to persist the snapshot and also time to send the full update to Namenode. This is important as the partition information has lion’s share in a HMS Full snapshot.
> 
> 
> Diffs
> -----
> 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 1eda41b2b6bd940a404cc1ba09a861fe783ead04 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java 0b4f4aa24bd3002c50bf4d80a6fa361f66052973 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java 3e27d1bbee9bea9a60e7f7e012a367957610826c 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryService.java b3a4934dfc523d14eec216df00a6b7597c66c166 
>   sentry-service/sentry-service-server/src/test/java/org/apache/sentry/service/thrift/TestFullUpdateInitializer.java 589acbed12855ff09309a04c9214f8daf87ea1de 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java 3d7fbe3fea333d58e46cd721d610c7d8050c9de4 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationEnd2End.java 574bc4b0432824abc81fe3c0dd30d1fe01f012e5 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentryService.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentrySrv.java 9fa42f2de41b6700a18a358f8d5e442104dd0585 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/SentrySrv.java 0e82b86a727f578013a63c005aa05279790344f1 
> 
> 
> Diff: https://reviews.apache.org/r/68727/diff/4/
> 
> 
> Testing
> -------
> 
> Made sure all the existing tests passed and also added unit and e2e tests to verify the new behavior.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>


Re: Review Request 68727: SENTRY-2306: Reduce the HMS snapshot size by ignoring partitions at default locations

Posted by kalyan kumar kalvagadda via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68727/
-----------------------------------------------------------

(Updated Sept. 24, 2018, 10:28 p.m.)


Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, and Sergio Pena.


Changes
-------

Addressed reivew comments.


Bugs: SENTRY-2306
    https://issues.apache.org/jira/browse/SENTRY-2306


Repository: sentry


Description
-------

Ignore the partitions entries which are located in default location as they share the same permissions as the table. This will reduce the snapshot size decreasing the time to persist the snapshot and also time to send the full update to Namenode. This is important as the partition information has lion’s share in a HMS Full snapshot.


Diffs (updated)
-----

  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 1eda41b2b6bd940a404cc1ba09a861fe783ead04 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java 0b4f4aa24bd3002c50bf4d80a6fa361f66052973 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java 3e27d1bbee9bea9a60e7f7e012a367957610826c 
  sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryService.java b3a4934dfc523d14eec216df00a6b7597c66c166 
  sentry-service/sentry-service-server/src/test/java/org/apache/sentry/service/thrift/TestFullUpdateInitializer.java 589acbed12855ff09309a04c9214f8daf87ea1de 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationEnd2End.java 574bc4b0432824abc81fe3c0dd30d1fe01f012e5 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentryService.java PRE-CREATION 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentrySrv.java 9fa42f2de41b6700a18a358f8d5e442104dd0585 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/SentrySrv.java 0e82b86a727f578013a63c005aa05279790344f1 


Diff: https://reviews.apache.org/r/68727/diff/2/

Changes: https://reviews.apache.org/r/68727/diff/1-2/


Testing
-------

Made sure all the existing tests passed and also added unit and e2e tests to verify the new behavior.


Thanks,

kalyan kumar kalvagadda


Re: Review Request 68727: SENTRY-2306: Reduce the HMS snapshot size by ignoring partitions at default locations

Posted by kalyan kumar kalvagadda via Review Board <no...@reviews.apache.org>.

> On Sept. 20, 2018, 3 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 4911 (patched)
> > <https://reviews.apache.org/r/68727/diff/1/?file=2089936#file2089936line4911>
> >
> >     This method is already public, so you don't need the @VisibleForTesting annotation.

You are right. i made some refactoring beore publishing the final patch. I over looked it.


> On Sept. 20, 2018, 3 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java
> > Lines 804-806 (patched)
> > <https://reviews.apache.org/r/68727/diff/1/?file=2089937#file2089937line804>
> >
> >     Would be a good idea to specify which tables contain HMS Metadata as this interface is shared by other implementations and would need help to understand what to delete.

Will update the comment


> On Sept. 20, 2018, 3 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
> > Lines 549-555 (patched)
> > <https://reviews.apache.org/r/68727/diff/1/?file=2089938#file2089938line550>
> >
> >     You could save time and code by checking if the partition location is part of the table location inside the PartitionTask.doTask() method instead. 
> >     
> >     There is one part of the code there:
> >     
> >         String partPath = pathFromURI(part.getSd().getLocation());
> >         If (partPath != null) {
> >           partitionNames.add(partPath.intern());
> >         }
> >         
> >     If as part of the if() condition you also call !isPartitionLocatedWithinTableLocation(), then that should make the rest of the code work without too many changes.

Agree, This approach makes the change simpler. Let me look closer and make changes accordingly.


- kalyan kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68727/#review208797
-----------------------------------------------------------


On Sept. 17, 2018, 3:07 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68727/
> -----------------------------------------------------------
> 
> (Updated Sept. 17, 2018, 3:07 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2306
>     https://issues.apache.org/jira/browse/SENTRY-2306
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Ignore the partitions entries which are located in default location as they share the same permissions as the table. This will reduce the snapshot size decreasing the time to persist the snapshot and also time to send the full update to Namenode. This is important as the partition information has lion’s share in a HMS Full snapshot.
> 
> 
> Diffs
> -----
> 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 1eda41b2b6bd940a404cc1ba09a861fe783ead04 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java 0b4f4aa24bd3002c50bf4d80a6fa361f66052973 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java 3e27d1bbee9bea9a60e7f7e012a367957610826c 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryService.java b3a4934dfc523d14eec216df00a6b7597c66c166 
>   sentry-service/sentry-service-server/src/test/java/org/apache/sentry/service/thrift/TestFullUpdateInitializer.java 589acbed12855ff09309a04c9214f8daf87ea1de 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationEnd2End.java 574bc4b0432824abc81fe3c0dd30d1fe01f012e5 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentryService.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentrySrv.java 9fa42f2de41b6700a18a358f8d5e442104dd0585 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/SentrySrv.java 0e82b86a727f578013a63c005aa05279790344f1 
> 
> 
> Diff: https://reviews.apache.org/r/68727/diff/1/
> 
> 
> Testing
> -------
> 
> Made sure all the existing tests passed and also added unit and e2e tests to verify the new behavior.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>


Re: Review Request 68727: SENTRY-2306: Reduce the HMS snapshot size by ignoring partitions at default locations

Posted by Sergio Pena via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68727/#review208797
-----------------------------------------------------------




sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 4911 (patched)
<https://reviews.apache.org/r/68727/#comment292999>

    This method is already public, so you don't need the @VisibleForTesting annotation.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java
Lines 804-806 (patched)
<https://reviews.apache.org/r/68727/#comment293000>

    Would be a good idea to specify which tables contain HMS Metadata as this interface is shared by other implementations and would need help to understand what to delete.



sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java
Lines 549-555 (patched)
<https://reviews.apache.org/r/68727/#comment293001>

    You could save time and code by checking if the partition location is part of the table location inside the PartitionTask.doTask() method instead. 
    
    There is one part of the code there:
    
        String partPath = pathFromURI(part.getSd().getLocation());
        If (partPath != null) {
          partitionNames.add(partPath.intern());
        }
        
    If as part of the if() condition you also call !isPartitionLocatedWithinTableLocation(), then that should make the rest of the code work without too many changes.


- Sergio Pena


On Sept. 17, 2018, 3:07 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68727/
> -----------------------------------------------------------
> 
> (Updated Sept. 17, 2018, 3:07 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2306
>     https://issues.apache.org/jira/browse/SENTRY-2306
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Ignore the partitions entries which are located in default location as they share the same permissions as the table. This will reduce the snapshot size decreasing the time to persist the snapshot and also time to send the full update to Namenode. This is important as the partition information has lion’s share in a HMS Full snapshot.
> 
> 
> Diffs
> -----
> 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 1eda41b2b6bd940a404cc1ba09a861fe783ead04 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStoreInterface.java 0b4f4aa24bd3002c50bf4d80a6fa361f66052973 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/FullUpdateInitializer.java 3e27d1bbee9bea9a60e7f7e012a367957610826c 
>   sentry-service/sentry-service-server/src/main/java/org/apache/sentry/service/thrift/SentryService.java b3a4934dfc523d14eec216df00a6b7597c66c166 
>   sentry-service/sentry-service-server/src/test/java/org/apache/sentry/service/thrift/TestFullUpdateInitializer.java 589acbed12855ff09309a04c9214f8daf87ea1de 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationEnd2End.java 574bc4b0432824abc81fe3c0dd30d1fe01f012e5 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentryService.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/InternalSentrySrv.java 9fa42f2de41b6700a18a358f8d5e442104dd0585 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/minisentry/SentrySrv.java 0e82b86a727f578013a63c005aa05279790344f1 
> 
> 
> Diff: https://reviews.apache.org/r/68727/diff/1/
> 
> 
> Testing
> -------
> 
> Made sure all the existing tests passed and also added unit and e2e tests to verify the new behavior.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>