Posted to dev@atlas.apache.org by Jeff Hagelberg <jn...@us.ibm.com> on 2017/01/23 20:32:43 UTC

Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 23, 2017, 8:32 p.m.)


Review request for atlas and David Kantor.


Repository: atlas


Description
-------

Apply performance fixes for create/update entity from the IBM fork to Atlas. While profiling with JProfiler, we found a number of performance hotspots. Our main findings were:

    - Multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - One query was being executed for each instance being created/updated to find the corresponding vertex, if there is one.
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID.

The changes we put in do the following:

    - Batch lookups by GUID when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - Batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - Find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - Reuse the reference vertices from the instance-to-graph mapping when computing the full text property.
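
As a rough illustration of the batching idea only (this is not the code in the patch), the sketch below contrasts one query per GUID with a single batched query whose results are then served from memory. ToyGraph, ToyLookupContext and BatchLookupSketch are hypothetical names made up for this example; they are not Atlas classes, and the "graph" is just a map that counts how many queries it receives.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a graph store; it only counts how many queries are issued. */
class ToyGraph {
    private final Map<String, String> vertexIdByGuid = new HashMap<>();
    int queryCount = 0;

    void addVertex(String guid, String vertexId) {
        vertexIdByGuid.put(guid, vertexId);
    }

    /** One query per GUID: the per-instance pattern described in the findings. */
    String findByGuid(String guid) {
        queryCount++;
        return vertexIdByGuid.get(guid);
    }

    /** One query for a whole batch of GUIDs; unknown GUIDs are absent from the result. */
    Map<String, String> findByGuids(Collection<String> guids) {
        queryCount++;
        Map<String, String> found = new HashMap<>();
        for (String guid : guids) {
            String vertexId = vertexIdByGuid.get(guid);
            if (vertexId != null) {
                found.put(guid, vertexId);
            }
        }
        return found;
    }
}

/** Hypothetical lookup context: resolves all vertices up front, then answers from memory. */
class ToyLookupContext {
    private final Map<String, String> resolved;

    ToyLookupContext(ToyGraph graph, Collection<String> guids) {
        this.resolved = graph.findByGuids(guids);
    }

    /** Returns the preloaded vertex id, or null for a new instance. No graph query is run. */
    String vertexFor(String guid) {
        return resolved.get(guid);
    }
}

class BatchLookupSketch {
    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        graph.addVertex("guid-1", "v1");
        graph.addVertex("guid-2", "v2");
        List<String> incoming = Arrays.asList("guid-1", "guid-2", "guid-3");

        // Per-instance lookups: one query for every incoming instance.
        for (String guid : incoming) {
            graph.findByGuid(guid);
        }
        System.out.println("per-instance queries: " + graph.queryCount); // prints 3

        // Batched lookup: one query up front, later lookups hit the context only.
        graph.queryCount = 0;
        ToyLookupContext context = new ToyLookupContext(graph, incoming);
        for (String guid : incoming) {
            context.vertexFor(guid);
        }
        System.out.println("batched queries: " + graph.queryCount); // prints 1
    }
}
```

The same context object could, under these assumptions, be handed to later steps (such as computing the full text property) so they reuse the already-resolved vertices instead of querying again.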


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java b9671b27530b6abc0c648df34e950fccb9ef61a0 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran a complete build on Linux; all tests passed.


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
Jeff,

I get the following test failures while running “mvn clean package” with this patch. Could you please look into these?

I am reviewing the patch. Optimizations look good; I will post my comments shortly.

Thanks,
Madhan

Results :

Failed tests:
org.apache.atlas.discovery.DataSetLineageServiceTest.testGetSchema(org.apache.atlas.discovery.DataSetLineageServiceTest)
  Run 1: DataSetLineageServiceTest.testGetSchema:288->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 2: DataSetLineageServiceTest.testGetSchema:288->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 3: DataSetLineageServiceTest.testGetSchema:288->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 4: DataSetLineageServiceTest.testGetSchema:288->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>

org.apache.atlas.discovery.DataSetLineageServiceTest.testGetSchemaForEntity(org.apache.atlas.discovery.DataSetLineageServiceTest)
  Run 1: DataSetLineageServiceTest.testGetSchemaForEntity:305->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 2: DataSetLineageServiceTest.testGetSchemaForEntity:305->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 3: DataSetLineageServiceTest.testGetSchemaForEntity:305->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>
  Run 4: DataSetLineageServiceTest.testGetSchemaForEntity:305->assertColumn:313 expected:<hive_column_v1> but was:<hive_column>

org.apache.atlas.discovery.DataSetLineageServiceTest.testSearchByDSLQueries(org.apache.atlas.discovery.DataSetLineageServiceTest)
  Run 1: PASS
  Run 2: PASS
  Run 3: PASS
  Run 4: PASS
  Run 5: PASS
  Run 6: PASS
  Run 7: PASS
  Run 8: PASS
  Run 9: PASS
  Run 10: PASS
  Run 11: PASS
  Run 12: PASS
  Run 13: DataSetLineageServiceTest.testSearchByDSLQueries:123 » Discovery Invalid expre...
  Run 14: DataSetLineageServiceTest.testSearchByDSLQueries:123 » Discovery Invalid expre...
  Run 15: DataSetLineageServiceTest.testSearchByDSLQueries:123 » Discovery Invalid expre...
  Run 16: DataSetLineageServiceTest.testSearchByDSLQueries:123 » Discovery Invalid expre...
  Run 17: PASS
  Run 18: PASS
  Run 19: PASS
  Run 20: PASS
  Run 21: PASS
  Run 22: PASS
  Run 23: PASS
  Run 24: PASS
  Run 25: PASS
  Run 26: PASS
  Run 27: PASS


Tests run: 584, Failures: 3, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Atlas Server Build Tools .................... SUCCESS [  0.683 s]
[INFO] apache-atlas ....................................... SUCCESS [  2.473 s]
[INFO] Apache Atlas Integration ........................... SUCCESS [ 17.953 s]
[INFO] Apache Atlas Common ................................ SUCCESS [  3.162 s]
[INFO] Apache Atlas Typesystem ............................ SUCCESS [ 45.467 s]
[INFO] Apache Atlas Client ................................ SUCCESS [  4.616 s]
[INFO] Apache Atlas Server API ............................ SUCCESS [  1.181 s]
[INFO] Apache Atlas Notification .......................... SUCCESS [ 20.127 s]
[INFO] Apache Atlas Graph Database Projects ............... SUCCESS [  0.097 s]
[INFO] Apache Atlas Graph Database API .................... SUCCESS [  0.535 s]
[INFO] Graph Database Common Code ......................... SUCCESS [  0.472 s]
[INFO] Shaded version of Apache hbase client .............. SUCCESS [  6.432 s]
[INFO] Apache Atlas Titan 0.5.4 Graph DB Impl ............. SUCCESS [01:16 min]
[INFO] Apache Atlas Graph Database Implementation Dependencies SUCCESS [  0.178 s]
[INFO] Shaded version of Apache hbase server .............. SUCCESS [ 14.259 s]
[INFO] Apache Atlas Repository ............................ FAILURE [08:51 min]
[INFO] Apache Atlas Authorization ......................... SKIPPED



On 1/24/17, 2:08 PM, "Jeff Hagelberg" <noreply@reviews.apache.org on behalf of jnhagelberg@us.ibm.com> wrote:

    
    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/51092/#review162859
    -----------------------------------------------------------
    
    
    
    Specifically, the changes in webapp (and the other changes added in diff #5) fix the following test failures that I have been seeing consistently with my changes.  I ended up applying them to the latest Atlas first, to ensure they passed there, and then merging them in with these changes.
    
    EntityJerseyResourceIT.testCompleteUpdate:739 » AtlasService Metadata service ...
    EntityJerseyResourceIT.testEntityDeduping:197 expected:<1> but was:<0>
    org.apache.atlas.web.resources.EntityJerseyResourceIT.testEntityInvalidValue(org.apache.atlas.web.resources.EntityJerseyResourceIT)
      Run 1: EntityJerseyResourceIT.testEntityInvalidValue:280 expected:<Bad Request> but was:<Internal Server Error>
      Run 2: EntityJerseyResourceIT.testEntityInvalidValue:280 expected:<Bad Request> but was:<Internal Server Error>
    
    EntityJerseyResourceIT.testPartialUpdate:673 » AtlasService Metadata service A...
    EntityJerseyResourceIT.testUTF8:640->BaseResourceIT.createType:177->BaseResourceIT.createType:182 » AtlasService
      
    EntityV2JerseyResourceIT.testAddNullPropertyValue:274 NullPointer
    EntityV2JerseyResourceIT.testDeleteEntities:643->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
    EntityV2JerseyResourceIT.testDeleteEntityByUniqAttribute:666->createHiveDB:368->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
    EntityV2JerseyResourceIT.testGetEntityByAttribute:222 » AtlasService Metadata ...
    EntityV2JerseyResourceIT.testSubmitEntity:102->createDBAndTable:378->createHiveDB:368->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
    EntityV2JerseyResourceIT.testSubmitEntityWithBadDateFormat:230->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
    EntityV2JerseyResourceIT.testUTF8:552->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
    
    NotificationHookConsumerIT.testDeleteByQualifiedName:187->BaseResourceIT.waitFor:578 » 
    NotificationHookConsumerIT.testUpdateEntityFullUpdate:216->BaseResourceIT.waitFor:578 » 
    NotificationHookConsumerIT.testUpdateEntityPartial:130->BaseResourceIT.waitFor:573 » AtlasService
    NotificationHookConsumerIT.testUpdatePartialUpdatingQualifiedName:160->BaseResourceIT.waitFor:578 »
    
    DataSetLineageJerseyResourceIT.testSchema:152 » JSON JSONObject["type"] not fo...
    DataSetLineageJerseyResourceIT.testSchemaForEntity:171 » JSON JSONObject["type...
      
      
    EntityLineageJerseyResourceIT>DataSetLineageJerseyResourceIT.testSchema:152 » JSON
    EntityLineageJerseyResourceIT>DataSetLineageJerseyResourceIT.testSchemaForEntity:171 » JSON
      
    
    EntityDiscoveryJerseyResourceIT.testSearchByDSL:66 » AtlasService Metadata ser...
    EntityDiscoveryJerseyResourceIT.testSearchDSLLimits:87 » AtlasService Metadata...
    EntityDiscoveryJerseyResourceIT.testSearchUsingDSL:128 » AtlasService Metadata..
    
    EntityV2JerseyResourceIT.testEntityDeduping:119 expected:<1> but was:<0>
    
    - Jeff Hagelberg
    
    
    On Jan. 24, 2017, 9:58 p.m., Jeff Hagelberg wrote:
    > 
    > -----------------------------------------------------------
    > This is an automatically generated e-mail. To reply, visit:
    > https://reviews.apache.org/r/51092/
    > -----------------------------------------------------------
    > 
    > (Updated Jan. 24, 2017, 9:58 p.m.)
    > 
    > 
    > Review request for atlas and David Kantor.
    > 
    > 
    > Repository: atlas
    > 
    > 
    > Description
    > -------
    > 
    > Apply performance fixes for create/update entity from the IBM fork to Atlas. While profiling with JProfiler, we found a number of performance hotspots. Our main findings were:
    >
    >     - Multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    >     - One query was being executed for each instance being created/updated to find the corresponding vertex, if there is one.
    >     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID.
    >
    > The changes we put in do the following:
    >
    >     - Batch lookups by GUID when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    >     - Batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    >     - Find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    >     - Reuse the reference vertices from the instance-to-graph mapping when computing the full text property.
    >
    > Also, resolved all test failures in webapp. I disentangled the three competing versions of the hive model that the various tests were trying to use. Now they all pass. I tried to follow the path of least resistance. We really should clean this up more; there is no need for three different versions of hive_table and its related classes.
    > 
    > 
    > Diffs
    > -----
    > 
    >   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
    >   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
    >   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
    >   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
    >   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
    >   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
    >   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
    >   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
    >   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
    >   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c34517a441cfb7513d98852bd8d13ecca9 
    >   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
    >   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
    >   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
    >   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
    >   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
    >   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
    >   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
    >   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
    >   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
    >   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
    >   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
    > 
    > Diff: https://reviews.apache.org/r/51092/diff/
    > 
    > 
    > Testing
    > -------
    > 
    > Ran a complete build on Linux; all tests passed.
    > 
    > 
    > Thanks,
    > 
    > Jeff Hagelberg
    > 
    >
    
    




Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review162859
-----------------------------------------------------------



Specifically, the changes in webapp (and the other changes added in diff #5) fix the following test failures that I have been seeing consistently with my changes.  I ended up applying them to the latest Atlas first, to ensure they passed there, and then merging them in with these changes.

EntityJerseyResourceIT.testCompleteUpdate:739 » AtlasService Metadata service ...
EntityJerseyResourceIT.testEntityDeduping:197 expected:<1> but was:<0>
org.apache.atlas.web.resources.EntityJerseyResourceIT.testEntityInvalidValue(org.apache.atlas.web.resources.EntityJerseyResourceIT)
  Run 1: EntityJerseyResourceIT.testEntityInvalidValue:280 expected:<Bad Request> but was:<Internal Server Error>
  Run 2: EntityJerseyResourceIT.testEntityInvalidValue:280 expected:<Bad Request> but was:<Internal Server Error>

EntityJerseyResourceIT.testPartialUpdate:673 » AtlasService Metadata service A...
EntityJerseyResourceIT.testUTF8:640->BaseResourceIT.createType:177->BaseResourceIT.createType:182 » AtlasService

EntityV2JerseyResourceIT.testAddNullPropertyValue:274 NullPointer
EntityV2JerseyResourceIT.testDeleteEntities:643->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
EntityV2JerseyResourceIT.testDeleteEntityByUniqAttribute:666->createHiveDB:368->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
EntityV2JerseyResourceIT.testGetEntityByAttribute:222 » AtlasService Metadata ...
EntityV2JerseyResourceIT.testSubmitEntity:102->createDBAndTable:378->createHiveDB:368->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
EntityV2JerseyResourceIT.testSubmitEntityWithBadDateFormat:230->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null
EntityV2JerseyResourceIT.testUTF8:552->BaseResourceIT.createEntity:238->BaseResourceIT.modifyEntity:228 expected object to not be null

NotificationHookConsumerIT.testDeleteByQualifiedName:187->BaseResourceIT.waitFor:578 »
NotificationHookConsumerIT.testUpdateEntityFullUpdate:216->BaseResourceIT.waitFor:578 »
NotificationHookConsumerIT.testUpdateEntityPartial:130->BaseResourceIT.waitFor:573 » AtlasService
NotificationHookConsumerIT.testUpdatePartialUpdatingQualifiedName:160->BaseResourceIT.waitFor:578 »

DataSetLineageJerseyResourceIT.testSchema:152 » JSON JSONObject["type"] not fo...
DataSetLineageJerseyResourceIT.testSchemaForEntity:171 » JSON JSONObject["type...


EntityLineageJerseyResourceIT>DataSetLineageJerseyResourceIT.testSchema:152 » JSON
EntityLineageJerseyResourceIT>DataSetLineageJerseyResourceIT.testSchemaForEntity:171 » JSON


EntityDiscoveryJerseyResourceIT.testSearchByDSL:66 » AtlasService Metadata ser...
EntityDiscoveryJerseyResourceIT.testSearchDSLLimits:87 » AtlasService Metadata...
EntityDiscoveryJerseyResourceIT.testSearchUsingDSL:128 » AtlasService Metadata..

EntityV2JerseyResourceIT.testEntityDeduping:119 expected:<1> but was:<0>

- Jeff Hagelberg




Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review163014
-----------------------------------------------------------



I'm rerunning the tests.  I'll upload a patch with the updated files soon.

- Jeff Hagelberg


On Jan. 24, 2017, 11:22 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 11:22 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entity from the IBM fork to Atlas. While profiling with JProfiler, we found a number of performance hotspots. Our main findings were:
>
>     - Multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - One query was being executed for each instance being created/updated to find the corresponding vertex, if there is one.
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID.
>
> The changes we put in do the following:
>
>     - Batch lookups by GUID when creating/updating entities. Execute one AtlasGraphQuery to find them all.
>     - Batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - Find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
>     - Reuse the reference vertices from the instance-to-graph mapping when computing the full text property.
>
> Also, resolved all test failures in webapp. I disentangled the three competing versions of the hive model that the various tests were trying to use. Now they all pass. I tried to follow the path of least resistance. We really should clean this up more; there is no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran a complete build on Linux; all tests passed.
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review163012
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 795)
<https://reviews.apache.org/r/51092/#comment234386>

    You are right. I misread the "for" loop above. The code is good as it is.



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 803)
<https://reviews.apache.org/r/51092/#comment234387>

    Ok.


- Madhan Neethiraj




Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.

> On Jan. 24, 2017, 11:38 p.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java, line 795
> > <https://reviews.apache.org/r/51092/diff/5/?file=1613767#file1613767line795>
> >
> >     Instead of the following:
> >       AtlasVertex[] result = new AtlasVertex[instancesForClass.size()];
> >       // ...
> >       return Arrays.asList(result);
> >       
> >     It might be efficient to use Vector, like:
> >       Vector<AtlasVertex> result = new Vector<>();
> >       result.setSize(instancesForClass.size());
> >       // ...
> >       return result;
> >     
> >     Please review.

That is definitely an interesting idea.  I'm not sure it would be more efficient, though.  result.setSize() has to resize the array underlying the Vector (allocate a new array and copy the old one over), and every Vector method is synchronized.  We avoid that overhead by allocating an array of the right size up front.  In addition, Arrays.asList() is very efficient.  It creates a List that directly uses the array provided, without doing any copying.  I actually doubt there would be much difference either way.  I can change it if you feel strongly about this.
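
As a rough illustration of the Arrays.asList() behavior described above, here is a minimal, self-contained sketch.  It uses String in place of AtlasVertex, and the names AsListSketch and lookup are made up for the example; it is not the GraphHelper code.

```java
import java.util.Arrays;
import java.util.List;

public class AsListSketch {
    // Fills a pre-sized array by index and wraps it without copying.
    public static List<String> lookup(int size) {
        String[] result = new String[size];   // all slots start out null
        result[size - 1] = "v-last";          // a result can be stored at any index
        return Arrays.asList(result);         // fixed-size List view over the array
    }

    public static void main(String[] args) {
        String[] backing = new String[3];
        List<String> view = Arrays.asList(backing);
        backing[1] = "v1";                    // a write to the array...
        System.out.println(view.get(1));      // ...is visible through the List
        System.out.println(lookup(3));
    }
}
```

The returned List is fixed-size, so callers can read and set elements but cannot add or remove them.  That seems acceptable here, since the result is sized to match instancesForClass.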


> On Jan. 24, 2017, 11:38 p.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java, line 803
> > <https://reviews.apache.org/r/51092/diff/5/?file=1613767#file1613767line803>
> >
> >     This would work only if all entries in 'instancesForClass' list have typeName == classType.getName(). If the list can have subType of classType.getName() - this query will skip these instances. Perhaps this is intentional - if yes, please ignore.

I made the check this way because that is what the original getVertexForInstanceByUniqueAttribute() did.  I did not want to change the behavior.

result = findVertex(propertyKey, instance.get(attributeInfo.name),
                            Constants.ENTITY_TYPE_PROPERTY_KEY, classType.getName(),
                            Constants.STATE_PROPERTY_KEY, Id.EntityState.ACTIVE.name());


> On Jan. 24, 2017, 11:38 p.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java, line 794
> > <https://reviews.apache.org/r/51092/diff/5/?file=1613767#file1613767line794>
> >
> >     Consider moving this block to start of the method, to avoid unused "map" that gets populated above.

I'm not sure I understand your comment.  Are you suggesting that we should loop through the attributes twice in order to avoid creating the map if there are no unique attributes (once to determine if there are any unique attributes, and a second time to populate "map")?  The way it is now, if there are no unique attributes, nothing will be added to "map".


> On Jan. 24, 2017, 11:38 p.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java, line 566
> > <https://reviews.apache.org/r/51092/diff/5/?file=1613767#file1613767line566>
> >
> >     Collection<Integer> as the map value is needed only to handle duplicate values in 'values'. If most of the queries are expected to have non-duplicate values, avoiding this overhead (in creating a number of list objects, ..), especially at this lower level code, could help improve performance.
> >     
> >     One approach could be to use Map<String, AtlasVertex> as the return type from this method. This approach could eliminate the need for NonExistentVertexHandling flag. Please review.

Great suggestion.
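
For reference, a minimal sketch of how I read the Map-based suggestion.  Everything here is hypothetical: String stands in for AtlasVertex, the in-memory GRAPH map stands in for the single batched graph query, and LookupByValueSketch / findVertices are illustrative names, not the actual GraphHelper method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupByValueSketch {
    // Stand-in for the graph: property value -> vertex.  Hypothetical data.
    static final Map<String, String> GRAPH = new HashMap<>();
    static {
        GRAPH.put("guid-1", "vertex-1");
        GRAPH.put("guid-3", "vertex-3");
    }

    // Returns only the values that were found.  Duplicate inputs collapse onto
    // one entry, and a value with no vertex is simply absent from the result.
    public static Map<String, String> findVertices(List<String> values) {
        Map<String, String> found = new HashMap<>();
        for (String value : values) {
            String vertex = GRAPH.get(value);
            if (vertex != null) {
                found.put(value, vertex);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findVertices(Arrays.asList("guid-1", "guid-2", "guid-1")));
    }
}
```

With this shape the caller decides what a missing key means, which is why a flag like NonExistentVertexHandling would no longer be needed, and no per-value index collections are created.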


- Jeff


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review162863
-----------------------------------------------------------


On Jan. 24, 2017, 11:22 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 11:22 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when create/update entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when create/update entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries
>     - reuse reference vertices from instance to graph mapping when computing full text property
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review162863
-----------------------------------------------------------



Jeff - here are the comments from my partial review. I will continue the review with your updated patch.


repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 566)
<https://reviews.apache.org/r/51092/#comment234225>

    Collection<Integer> as the map value is needed only to handle duplicate values in 'values'. If most of the queries are expected to have non-duplicate values, avoiding this overhead (in creating a number of list objects, ..), especially at this lower level code, could help improve performance.
    
    One approach could be to use Map<String, AtlasVertex> as the return type from this method. This approach could eliminate the need for NonExistentVertexHandling flag. Please review.



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 794)
<https://reviews.apache.org/r/51092/#comment234226>

    Consider moving this block to start of the method, to avoid unused "map" that gets populated above.



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 795)
<https://reviews.apache.org/r/51092/#comment234243>

    Instead of the following:
      AtlasVertex[] result = new AtlasVertex[instancesForClass.size()];
      // ...
      return Arrays.asList(result);
      
    It might be efficient to use Vector, like:
      Vector<AtlasVertex> result = new Vector<>();
      result.setSize(instancesForClass.size());
      // ...
      return result;
    
    Please review.



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 803)
<https://reviews.apache.org/r/51092/#comment234227>

    This would work only if all entries in 'instancesForClass' list have typeName == classType.getName(). If the list can have subType of classType.getName() - this query will skip these instances. Perhaps this is intentional - if yes, please ignore.



repository/src/main/java/org/apache/atlas/util/IndexedInstance.java (line 31)
<https://reviews.apache.org/r/51092/#comment234246>

    Since there are no setters, consider marking these as final.


- Madhan Neethiraj


On Jan. 24, 2017, 11:22 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 11:22 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when create/update entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when create/update entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries
>     - reuse reference vertices from instance to graph mapping when computing full text property
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.

> On Jan. 25, 2017, 1:30 a.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java, line 289
> > <https://reviews.apache.org/r/51092/diff/6/?file=1613904#file1613904line289>
> >
> >     Given there is no 'else' for this large 'if' block, please consider using continue, as shown below. I think it will help readability of the code.
> >     
> >     if (processedIds.contains(id)) {
> >       continue;
> >     }

Good idea.


> On Jan. 25, 2017, 1:30 a.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java, line 331
> > <https://reviews.apache.org/r/51092/diff/6/?file=1613904#file1613904line331>
> >
> >     Should instance be a "ReferenceableInstance"? Or can it be any implementation of "ITypedReferenceableInstance" - like "Id", "ReferenceableInstance"?
> >     
> >     This check is inconsistent with the exception message below. One of these needs to be corrected.

Good point.  The original code was like that as well.  I'll change the message.


- Jeff


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review162884
-----------------------------------------------------------


On Jan. 24, 2017, 11:22 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 11:22 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when create/update entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when create/update entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries
>     - reuse reference vertices from instance to graph mapping when computing full text property
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review162884
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 268)
<https://reviews.apache.org/r/51092/#comment234267>

    Looks like this "for" loop can be replaced with a call to: idToVertexMap.putAll(foundVertices)
    
    Please review.
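
    A minimal sketch of the equivalence, assuming the loop only copies each entry unchanged and that both maps have the same key and value types. String stands in for the real key and vertex types, and PutAllSketch and its method names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class PutAllSketch {
    // Entry-by-entry copy, as an explicit "for" loop would do it.
    public static Map<String, String> copyWithLoop(Map<String, String> foundVertices) {
        Map<String, String> idToVertexMap = new HashMap<>();
        for (Map.Entry<String, String> entry : foundVertices.entrySet()) {
            idToVertexMap.put(entry.getKey(), entry.getValue());
        }
        return idToVertexMap;
    }

    // Same result with a single putAll() call.
    public static Map<String, String> copyWithPutAll(Map<String, String> foundVertices) {
        Map<String, String> idToVertexMap = new HashMap<>();
        idToVertexMap.putAll(foundVertices);
        return idToVertexMap;
    }

    public static void main(String[] args) {
        Map<String, String> found = new HashMap<>();
        found.put("id-1", "vertex-1");
        found.put("id-2", "vertex-2");
        System.out.println(copyWithLoop(found).equals(copyWithPutAll(found)));
    }
}
```

    If the loop converts keys or filters entries on the way, putAll() is not a drop-in replacement.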



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 277)
<https://reviews.apache.org/r/51092/#comment234260>

    Given there is no 'else' for this large 'if' block, please consider using continue, as shown below. I think it will help readability of the code.
    
    if (processedIds.contains(id)) {
      continue;
    }



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 283)
<https://reviews.apache.org/r/51092/#comment234262>

    Consider surrounding "LOG.debug()" with "if (LOG.isDebugEnabled())".
    
    toShortString() implementations are not as simple/efficient as the name suggests: they call String.format(..).
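
    A minimal sketch of why the guard matters. To keep it dependency-free it uses java.util.logging, with isLoggable(Level.FINE) standing in for LOG.isDebugEnabled(); the toShortString() here is a made-up stand-in that counts its own calls, not the Atlas implementation:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugGuardSketch {
    private static final Logger LOG = Logger.getLogger(DebugGuardSketch.class.getName());
    static {
        LOG.setLevel(Level.INFO);   // debug-level (FINE) output is switched off
    }

    public static int formatCalls = 0;

    // Stand-in for toShortString(): counts how often the formatting runs.
    static String toShortString(String id) {
        formatCalls++;
        return String.format("id[%s]", id);
    }

    // The message is built before the logger can discard it.
    public static void logUnguarded(String id) {
        LOG.fine("Processing " + toShortString(id));
    }

    // The message is built only when it will actually be logged.
    public static void logGuarded(String id) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing " + toShortString(id));
        }
    }

    public static void main(String[] args) {
        logUnguarded("a");
        logGuarded("a");
        System.out.println("formatCalls = " + formatCalls);
    }
}
```

    With debug logging off, only the unguarded call pays for the formatting.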



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 294)
<https://reviews.apache.org/r/51092/#comment234263>

    Consider surrounding "LOG.debug()" with "if (LOG.isDebugEnabled())".



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 295)
<https://reviews.apache.org/r/51092/#comment234265>

    Should instance be a "ReferenceableInstance"? Or can it be any implementation of "ITypedReferenceableInstance" - like "Id", "ReferenceableInstance"?
    
    This check is inconsistent with the exception message below. One of these needs to be corrected.



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 299)
<https://reviews.apache.org/r/51092/#comment234266>

    Since this is an existing instance, it might be good to replace variable name 'newInstance' with a local variable like 'existingInstance' here.
    
    While at this, consider moving line 281 inside "if (instanceVertex == null)" block.



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 334)
<https://reviews.apache.org/r/51092/#comment234268>

    Please use "if (LOG.isDebugEnabled())" here, to avoid overhead of string(instanceVertex).


- Madhan Neethiraj


On Jan. 24, 2017, 11:22 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 11:22 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when create/update entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when create/update entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during create/update entity. Use those vertices during the graph mapping process to avoid running unnecessary graph queries
>     - reuse reference vertices from instance to graph mapping when computing full text property
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review163145
-----------------------------------------------------------


Ship it!

- Madhan Neethiraj


On Jan. 26, 2017, 4:53 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 4:53 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Bugs: ATLAS-1114
>     https://issues.apache.org/jira/browse/ATLAS-1114
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
>     - reuse reference vertices from the instance-to-graph mapping when computing the full text property.
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/DeleteHandler.java 9eb086f79c556ed01ed24628f298f7c665e4a985 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 26, 2017, 4:53 p.m.)


Review request for atlas and David Kantor.


Bugs: ATLAS-1114
    https://issues.apache.org/jira/browse/ATLAS-1114


Repository: atlas


Description
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - reuse reference vertices from the instance-to-graph mapping when computing the full text property.

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
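
For readers following the thread, the batching idea in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: FakeGraph, LookupContext, findOne and findAll are hypothetical names invented here, and the "graph" is an in-memory map that counts how many queries are issued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchLookupSketch {

    /** Hypothetical stand-in for a graph store; it only counts the queries issued. */
    static class FakeGraph {
        private final Map<String, String> vertexByGuid = new HashMap<>();
        int queryCount = 0;

        void addVertex(String guid, String vertexId) {
            vertexByGuid.put(guid, vertexId);
        }

        /** One query per GUID: the per-instance pattern described as a hotspot. */
        String findOne(String guid) {
            queryCount++;
            return vertexByGuid.get(guid);
        }

        /** One query for a whole set of GUIDs: the batched pattern. */
        Map<String, String> findAll(Collection<String> guids) {
            queryCount++;
            Map<String, String> found = new HashMap<>();
            for (String guid : guids) {
                String vertexId = vertexByGuid.get(guid);
                if (vertexId != null) {
                    found.put(guid, vertexId);
                }
            }
            return found;
        }
    }

    /** Hypothetical context that resolves all vertices up front and then serves them from memory. */
    static class LookupContext {
        private final Map<String, String> resolved;

        LookupContext(FakeGraph graph, Collection<String> guids) {
            this.resolved = graph.findAll(guids); // single batched query
        }

        /** Returns the pre-resolved vertex id, or null if the GUID has no vertex yet. */
        String vertexFor(String guid) {
            return resolved.get(guid);
        }
    }

    public static void main(String[] args) {
        List<String> guids = new ArrayList<>(Arrays.asList("guid-1", "guid-2", "guid-3"));

        FakeGraph perInstance = new FakeGraph();
        perInstance.addVertex("guid-1", "v1");
        for (String guid : guids) {
            perInstance.findOne(guid);
        }
        System.out.println("per-instance queries: " + perInstance.queryCount);

        FakeGraph batched = new FakeGraph();
        batched.addVertex("guid-1", "v1");
        LookupContext context = new LookupContext(batched, guids);
        for (String guid : guids) {
            context.vertexFor(guid); // no further graph queries
        }
        System.out.println("batched queries: " + batched.queryCount);
    }
}
```

The point of the sketch is only the shape of the change: the number of queries stops growing with the number of instances, and later steps (such as computing the full text property) can reuse the already-resolved vertices instead of looking them up again.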


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/DeleteHandler.java 9eb086f79c556ed01ed24628f298f7c665e4a985 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran complete build on linux, all tests passed


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 26, 2017, 2:58 p.m.)


Review request for atlas and David Kantor.


Changes
-------

Addressed review comments.


Repository: atlas


Description
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - reuse reference vertices from the instance-to-graph mapping when computing the full text property.

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/DeleteHandler.java 9eb086f79c556ed01ed24628f298f7c665e4a985 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran complete build on linux, all tests passed


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/#review163093
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 541)
<https://reviews.apache.org/r/51092/#comment234537>

    Consider marking this method as 'private' to avoid possible misuse, as this method will return at most one vertex for a given value. For example, if this method is called with:
    
     getVerticesForPropertyValues("Referenceable.qualifiedName", [ "testname" ])
    
    even if multiple entities exist with qualifiedName=testname, the method will return only one of these entities. This is not obvious from the method name/arguments.
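
The concern in this comment can be illustrated with a small sketch. It makes no claim about how GraphHelper is actually implemented; Vertex, onePerValue and allPerValue are hypothetical names used only to show why a result keyed by property value silently drops duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateValueSketch {

    /** Hypothetical vertex: an id plus a single property value. */
    static final class Vertex {
        final String id;
        final String qualifiedName;

        Vertex(String id, String qualifiedName) {
            this.id = id;
            this.qualifiedName = qualifiedName;
        }
    }

    /** Keyed by value: at most one vertex survives per value (a later match overwrites an earlier one). */
    static Map<String, Vertex> onePerValue(List<Vertex> vertices, Collection<String> values) {
        Set<String> wanted = new HashSet<>(values);
        Map<String, Vertex> result = new HashMap<>();
        for (Vertex vertex : vertices) {
            if (wanted.contains(vertex.qualifiedName)) {
                result.put(vertex.qualifiedName, vertex);
            }
        }
        return result;
    }

    /** Keyed by value with a list per key: every matching vertex is kept. */
    static Map<String, List<Vertex>> allPerValue(List<Vertex> vertices, Collection<String> values) {
        Set<String> wanted = new HashSet<>(values);
        Map<String, List<Vertex>> result = new HashMap<>();
        for (Vertex vertex : vertices) {
            if (wanted.contains(vertex.qualifiedName)) {
                result.computeIfAbsent(vertex.qualifiedName, key -> new ArrayList<>()).add(vertex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = Arrays.asList(
                new Vertex("v1", "testname"),
                new Vertex("v2", "testname"),
                new Vertex("v3", "other"));
        List<String> values = Arrays.asList("testname");

        System.out.println("one per value: " + onePerValue(vertices, values).size());
        System.out.println("all per value: " + allPerValue(vertices, values).get("testname").size());
    }
}
```

A caller that expects every match for a non-unique property would be misled by the first shape, which is why restricting visibility (or renaming the method) is suggested.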



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 563)
<https://reviews.apache.org/r/51092/#comment234535>

    "vertexCount" is not used here; can be removed.


- Madhan Neethiraj


On Jan. 25, 2017, 8:53 p.m., Jeff Hagelberg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51092/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 8:53 p.m.)
> 
> 
> Review request for atlas and David Kantor.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were
> 
>     - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
>     - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
>     - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID
> 
> The changes we put in do the following:
> 
>     - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
>     - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
>     - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
>     - reuse reference vertices from the instance-to-graph mapping when computing the full text property.
> 
> Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
>   repository/src/main/java/org/apache/atlas/repository/graph/DeleteHandler.java 9eb086f79c556ed01ed24628f298f7c665e4a985 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
>   repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
>   repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
>   typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
>   webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
>   webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
>   webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
>   webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
>   webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
>   webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 
> 
> Diff: https://reviews.apache.org/r/51092/diff/
> 
> 
> Testing
> -------
> 
> Ran complete build on linux, all tests passed
> 
> 
> Thanks,
> 
> Jeff Hagelberg
> 
>


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 25, 2017, 8:53 p.m.)


Review request for atlas and David Kantor.


Changes
-------

Addressed code review comments.


Repository: atlas


Description
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - reuse reference vertices from the instance-to-graph mapping when computing the full text property.

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/DeleteHandler.java 9eb086f79c556ed01ed24628f298f7c665e4a985 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran complete build on linux, all tests passed


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 11:22 p.m.)


Review request for atlas and David Kantor.


Changes
-------

Rolling back accidental changes to DataSetLineageServiceTest in repository.


Repository: atlas


Description
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - reuse reference vertices from the instance-to-graph mapping when computing the full text property.

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran complete build on linux, all tests passed


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 9:58 p.m.)


Review request for atlas and David Kantor.


Repository: atlas


Description (updated)
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front during entity create/update. Use those vertices during the graph mapping process to avoid running unnecessary graph queries.
    - reuse reference vertices from the instance-to-graph mapping when computing the full text property.

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.  I tried to follow the path of least resistance.  We really should clean this up more; there is really no need for three different versions of hive_table and its related classes.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c34517a441cfb7513d98852bd8d13ecca9 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran the complete build on Linux; all tests passed.


Thanks,

Jeff Hagelberg


Re: Review Request 51092: ATLAS-1114: Performance improvements for create/update entity

Posted by Jeff Hagelberg <jn...@us.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51092/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 9:51 p.m.)


Review request for atlas and David Kantor.


Changes
-------

rebased, resolved all test failures in webapp.


Repository: atlas


Description (updated)
-------

Apply performance fixes for create/update entities from IBM fork to Atlas. During our performance profiling, we found a number of performance hotspots in JProfiler. Our main findings were

    - multiple queries were being executed for each instance being created/updated to find matches by unique attribute.
    - one query was being executed for each instance being created/updated to find the corresponding vertex if there is one
    - Calculating the value of the full text property was taking a significant portion of the time to create/update entities, mainly due to its calls to getVertexForGUID

The changes we put in do the following:

    - batch lookups by guid when creating/updating entities. Execute one AtlasGraphQuery to find them all.
    - batch lookups by unique attribute when creating/updating entities. Execute one AtlasGraphQuery per class to find unique attribute matches.
    - find all existing vertices up front when creating/updating entities. Use those vertices during the graph mapping process to avoid running unnecessary graph queries
    - reuse reference vertices from instance to graph mapping when computing full text property
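
To illustrate the first change (batching lookups by guid), here is a minimal sketch of the general idea. It does not use the Atlas API: FakeGraph, findByGuid, findByGuids, lookupOneByOne and lookupBatched are hypothetical names invented for this example, and an in-memory map stands in for the graph store. It only shows how collecting the guids first turns N queries into one.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Atlas code base.
class BatchVertexLookupSketch {

    /** Stand-in for a graph store that counts how many queries it serves. */
    static class FakeGraph {
        private final Map<String, String> vertexByGuid = new HashMap<>();
        int queryCount = 0;

        void addVertex(String guid, String vertexId) {
            vertexByGuid.put(guid, vertexId);
        }

        /** One query per guid: the pattern that shows up as a hotspot. */
        String findByGuid(String guid) {
            queryCount++;
            return vertexByGuid.get(guid);
        }

        /** One query for a whole set of guids. */
        Map<String, String> findByGuids(Collection<String> guids) {
            queryCount++;
            Map<String, String> found = new HashMap<>();
            for (String guid : guids) {
                String vertexId = vertexByGuid.get(guid);
                if (vertexId != null) {
                    found.put(guid, vertexId);
                }
            }
            return found;
        }
    }

    /** Looks up each instance separately, issuing one query per list entry. */
    static Map<String, String> lookupOneByOne(FakeGraph graph, List<String> guids) {
        Map<String, String> found = new HashMap<>();
        for (String guid : guids) {
            String vertexId = graph.findByGuid(guid);
            if (vertexId != null) {
                found.put(guid, vertexId);
            }
        }
        return found;
    }

    /** Collects the distinct guids up front and issues a single query. */
    static Map<String, String> lookupBatched(FakeGraph graph, List<String> guids) {
        Set<String> distinct = new LinkedHashSet<>(guids);
        return graph.findByGuids(distinct);
    }

    public static void main(String[] args) {
        // "guid-3" has no vertex yet; "guid-1" is referenced twice.
        List<String> guids = Arrays.asList("guid-1", "guid-2", "guid-3", "guid-1");

        FakeGraph first = new FakeGraph();
        first.addVertex("guid-1", "v1");
        first.addVertex("guid-2", "v2");
        lookupOneByOne(first, guids);
        System.out.println("one-by-one queries: " + first.queryCount);

        FakeGraph second = new FakeGraph();
        second.addVertex("guid-1", "v1");
        second.addVertex("guid-2", "v2");
        lookupBatched(second, guids);
        System.out.println("batched queries: " + second.queryCount);
    }
}
```

Both methods return the same guid-to-vertex map; only the number of queries differs. The map returned by the batched lookup can then be kept for the rest of the create/update call, which is the idea behind the third and fourth changes listed above.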

Also, resolved all test failures in webapp.  I disentangled the three competing versions of the hive model that the various tests were trying to use.  Now they all pass.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 50a007bf1bf7e8c07a57b327b3aa3b907dd5f660 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 911b1adbad92a76ce15f49ce023b56aeca8b4f94 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedMetadataRepository.java 0c80aeddbae35c80cdf8b83cea6aefadf6454a20 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236ca805142a93d9d4e63789fb0cc9aea05aa 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java 4e55bbcab91b8572d13345cec61e8df2f195ee4f 
  repository/src/main/java/org/apache/atlas/repository/graph/VertexLookupContext.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 35a489f2d77578a72a2e73a37bdf094af25a166e 
  repository/src/main/java/org/apache/atlas/util/AttributeValueMap.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/util/IndexedInstance.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c34517a441cfb7513d98852bd8d13ecca9 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphHelperTest.java a7dc13db72fb4ff268312c106df1b6c41f46962f 
  typesystem/src/test/resources/atlas-application.properties 108630b485e712179dc80d001dbce97551b37516 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 17c8237569746bb77f75692d50ce115c21b80c7c 
  webapp/src/test/java/org/apache/atlas/notification/EntityNotificationIT.java 1774611285956a7f187bd4353778fe34a6893c69 
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerIT.java 4a3db8874468bc8373625afe93df0a0938d00e39 
  webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c0dccecd56e720f3633ba46bbaf5c37f5d 
  webapp/src/test/java/org/apache/atlas/web/resources/DataSetLineageJerseyResourceIT.java 8334e4f9dd2eedf48e516ee34a5b2981489ce0da 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityDiscoveryJerseyResourceIT.java 2bbe10a0827b2035a6c74e6b3aa3520b7e12d571 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityJerseyResourceIT.java f084053a03f9940d2f9015d4a15ef9c085553ae5 
  webapp/src/test/java/org/apache/atlas/web/resources/EntityV2JerseyResourceIT.java 74338fd7aee2d81f54f59b0be15bd249852fbd0b 
  webapp/src/test/java/org/apache/atlas/web/resources/MetadataDiscoveryJerseyResourceIT.java b004cb52cc996763dbc4c24cd80ab545c5749358 

Diff: https://reviews.apache.org/r/51092/diff/


Testing
-------

Ran the complete build on Linux; all tests passed.


Thanks,

Jeff Hagelberg