You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@atlas.apache.org by Sarath Subramanian <sa...@gmail.com> on 2017/01/21 02:19:12 UTC

Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.


Bugs: ATLAS-1403
    https://issues.apache.org/jira/browse/ATLAS-1403


Repository: atlas


Description
-------

Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}


Diffs
-----

  repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
  repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
  repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
  repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
  repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
  repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
  repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 

Diff: https://reviews.apache.org/r/55813/diff/


Testing
-------

Ran Unit Tests and was successful.


Thanks,

Sarath Subramanian


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162725
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java (line 127)
<https://reviews.apache.org/r/55813/#comment234051>

    Please change log level to 'debug'.



repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java (line 153)
<https://reviews.apache.org/r/55813/#comment234052>

    Please change log level to 'debug'.


- Madhan Neethiraj


On Jan. 23, 2017, 10:09 p.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2017, 10:09 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran Unit Tests and was successful.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------


Ship it!




The fix looks good!

With this patch, following DSL query returns in about 300ms, compared to about 50 seconds earlier! On a store having ~70,000 hive_columns

  hive_column where qualifiedName='default.testtable_772.col507@cl1'

- Madhan Neethiraj


On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 9:32 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98 
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran all Unit Tests and was successful.
> Ran search query on hive_column with 100,000 entities, performance improved from 45sec to 0.5sec
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------

(Updated Jan. 25, 2017, 1:32 a.m.)


Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.


Changes
-------

1. Improved GremlinTranslation to apply 'AND transformation rule' only for trait, alias and non-logical expressions.


Bugs: ATLAS-1403
    https://issues.apache.org/jira/browse/ATLAS-1403


Repository: atlas


Description
-------

Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
  repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
  repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
  repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
  repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
  repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
  repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
  repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
  repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98 
  repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
  repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 

Diff: https://reviews.apache.org/r/55813/diff/


Testing (updated)
-------

Ran all Unit Tests and was successful.
Ran search query on hive_column with 100,000 entities, performance improved from 45sec to 0.5sec


Thanks,

Sarath Subramanian


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 12:08 a.m.)


Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.


Changes
-------

reverted changes to DataSetLineageService
disabled tests in DataSetLineageServiceTest


Bugs: ATLAS-1403
    https://issues.apache.org/jira/browse/ATLAS-1403


Repository: atlas


Description
-------

Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
  repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
  repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
  repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
  repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
  repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
  repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
  repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
  repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
  repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 

Diff: https://reviews.apache.org/r/55813/diff/


Testing
-------

Ran Unit Tests and was successful.


Thanks,

Sarath Subramanian


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.

> On Jan. 23, 2017, 8:36 p.m., Suma Shivaprasad wrote:
> > repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java, line 99
> > <https://reviews.apache.org/r/55813/diff/3/?file=1613196#file1613196line99>
> >
> >     This shoudnt have superTypeNames check directly in the query since there is no index on guid + typeName. I had removed it in ATLAS-1404

reverted changes in DataSetLineageService class since its used only in unit tests.


- Sarath


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162765
-----------------------------------------------------------


On Jan. 24, 2017, 12:08 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 12:08 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran Unit Tests and was successful.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162765
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java (line 97)
<https://reviews.apache.org/r/55813/#comment234109>

    This shoudnt have superTypeNames check directly in the query since there is no index on guid + typeName. I had removed it in ATLAS-1404


- Suma Shivaprasad


On Jan. 23, 2017, 11:17 p.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2017, 11:17 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran Unit Tests and was successful.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------

(Updated Jan. 23, 2017, 3:17 p.m.)


Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.


Changes
-------

* reverted collectTypeInstancesIntoVar back to 'true'
* added debug statements


Bugs: ATLAS-1403
    https://issues.apache.org/jira/browse/ATLAS-1403


Repository: atlas


Description
-------

Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
  repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
  repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
  repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
  repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
  repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
  repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
  repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
  repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 

Diff: https://reviews.apache.org/r/55813/diff/


Testing
-------

Ran Unit Tests and was successful.


Thanks,

Sarath Subramanian


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------

(Updated Jan. 23, 2017, 2:09 p.m.)


Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.


Changes
-------

Implemented typeTestExpressionUsingInFilter() and made changes to use Gremlin2ExpressionFactory class


Bugs: ATLAS-1403
    https://issues.apache.org/jira/browse/ATLAS-1403


Repository: atlas


Description
-------

Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
  repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
  repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
  repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
  repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
  repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
  repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
  repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
  repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 

Diff: https://reviews.apache.org/r/55813/diff/


Testing
-------

Ran Unit Tests and was successful.


Thanks,

Sarath Subramanian


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Sarath Subramanian <sa...@gmail.com>.

> On Jan. 20, 2017, 7:11 p.m., Suma Shivaprasad wrote:
> > repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala, line 25
> > <https://reviews.apache.org/r/55813/diff/1/?file=1611399#file1611399line25>
> >
> >     will also need to change code in Gremlin2ExpressionFactory and Gremlin3ExpressionFactory.typeTestExpression
> >     
> >     for in clause changes
> >     
> >     Please check GremlinExpressionFactory.generateTypeTestExpression. This gets called GremlinQuery

Implemented In-filter changes for Gremlin2, Gremlin3 changes will be addressed in separate jira.


- Sarath


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162536
-----------------------------------------------------------


On Jan. 23, 2017, 2:09 p.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2017, 2:09 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran Unit Tests and was successful.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>


Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162536
-----------------------------------------------------------




repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala (line 24)
<https://reviews.apache.org/r/55813/#comment233842>

    will also need to change code in Gremlin2ExpressionFactory and Gremlin3ExpressionFactory.typeTestExpression
    
    for in clause changes
    
    Please check GremlinExpressionFactory.generateTypeTestExpression. This gets called GremlinQuery


- Suma Shivaprasad


On Jan. 21, 2017, 2:19 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2017, 2:19 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran Unit Tests and was successful.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>