You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@atlas.apache.org by Sarath Subramanian <sa...@gmail.com> on 2017/01/21 02:19:12 UTC
Review Request 55813: Porting performance and stability changes made
in 0.7 branch into master
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------
Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
Bugs: ATLAS-1403
https://issues.apache.org/jira/browse/ATLAS-1403
Repository: atlas
Description
-------
Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
Diffs
-----
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
Diff: https://reviews.apache.org/r/55813/diff/
Testing
-------
Ran Unit Tests and was successful.
Thanks,
Sarath Subramanian
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162725
-----------------------------------------------------------
Fix it, then Ship it!
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java (line 127)
<https://reviews.apache.org/r/55813/#comment234051>
Please change log level to 'debug'.
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java (line 153)
<https://reviews.apache.org/r/55813/#comment234052>
Please change log level to 'debug'.
- Madhan Neethiraj
On Jan. 23, 2017, 10:09 p.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 23, 2017, 10:09 p.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
> repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran Unit Tests and was successful.
>
>
> Thanks,
>
> Sarath Subramanian
>
>
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------
Ship it!
The fix looks good!
With this patch, following DSL query returns in about 300ms, compared to about 50 seconds earlier! On a store having ~70,000 hive_columns
hive_column where qualifiedName='default.testtable_772.col507@cl1'
- Madhan Neethiraj
On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 25, 2017, 9:32 a.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
> repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98
> repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran all Unit Tests and was successful.
> Ran search query on hive_column with 100,000 entities, performance improved from 45sec to 0.5sec
>
>
> Thanks,
>
> Sarath Subramanian
>
>
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------
(Updated Jan. 25, 2017, 1:32 a.m.)
Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
Changes
-------
1. Improved GremlinTranslation to apply 'AND transformation rule' only for trait, alias and non-logical expressions.
Bugs: ATLAS-1403
https://issues.apache.org/jira/browse/ATLAS-1403
Repository: atlas
Description
-------
Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
Diffs (updated)
-----
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98
repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c
repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
Diff: https://reviews.apache.org/r/55813/diff/
Testing (updated)
-------
Ran all Unit Tests and was successful.
Ran search query on hive_column with 100,000 entities, performance improved from 45sec to 0.5sec
Thanks,
Sarath Subramanian
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------
(Updated Jan. 24, 2017, 12:08 a.m.)
Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
Changes
-------
reverted changes to DataSetLineageService
disabled tests in DataSetLineageServiceTest
Bugs: ATLAS-1403
https://issues.apache.org/jira/browse/ATLAS-1403
Repository: atlas
Description
-------
Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
Diffs (updated)
-----
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c
repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
Diff: https://reviews.apache.org/r/55813/diff/
Testing
-------
Ran Unit Tests and was successful.
Thanks,
Sarath Subramanian
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
> On Jan. 23, 2017, 8:36 p.m., Suma Shivaprasad wrote:
> > repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java, line 99
> > <https://reviews.apache.org/r/55813/diff/3/?file=1613196#file1613196line99>
> >
> > This shoudnt have superTypeNames check directly in the query since there is no index on guid + typeName. I had removed it in ATLAS-1404
reverted changes in DataSetLineageService class since its used only in unit tests.
- Sarath
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162765
-----------------------------------------------------------
On Jan. 24, 2017, 12:08 a.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 24, 2017, 12:08 a.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
> repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran Unit Tests and was successful.
>
>
> Thanks,
>
> Sarath Subramanian
>
>
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162765
-----------------------------------------------------------
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java (line 97)
<https://reviews.apache.org/r/55813/#comment234109>
This shoudnt have superTypeNames check directly in the query since there is no index on guid + typeName. I had removed it in ATLAS-1404
- Suma Shivaprasad
On Jan. 23, 2017, 11:17 p.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 23, 2017, 11:17 p.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
> repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran Unit Tests and was successful.
>
>
> Thanks,
>
> Sarath Subramanian
>
>
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------
(Updated Jan. 23, 2017, 3:17 p.m.)
Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
Changes
-------
* reverted collectTypeInstancesIntoVar back to 'true'
* added debug statements
Bugs: ATLAS-1403
https://issues.apache.org/jira/browse/ATLAS-1403
Repository: atlas
Description
-------
Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
Diffs (updated)
-----
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
Diff: https://reviews.apache.org/r/55813/diff/
Testing
-------
Ran Unit Tests and was successful.
Thanks,
Sarath Subramanian
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/
-----------------------------------------------------------
(Updated Jan. 23, 2017, 2:09 p.m.)
Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
Changes
-------
Implemented typeTestExpressionUsingInFilter() and made changes to use Gremlin2ExpressionFactory class
Bugs: ATLAS-1403
https://issues.apache.org/jira/browse/ATLAS-1403
Repository: atlas
Description
-------
Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
For eg:
Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
Optimized query
Gremlin Query = L:
{g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
Diffs (updated)
-----
repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
Diff: https://reviews.apache.org/r/55813/diff/
Testing
-------
Ran Unit Tests and was successful.
Thanks,
Sarath Subramanian
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Sarath Subramanian <sa...@gmail.com>.
> On Jan. 20, 2017, 7:11 p.m., Suma Shivaprasad wrote:
> > repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala, line 25
> > <https://reviews.apache.org/r/55813/diff/1/?file=1611399#file1611399line25>
> >
> > will also need to change code in Gremlin2ExpressionFactory and Gremlin3ExpressionFactory.typeTestExpression
> >
> > for in clause changes
> >
> > Please check GremlinExpressionFactory.generateTypeTestExpression. This gets called GremlinQuery
Implemented In-filter changes for Gremlin2, Gremlin3 changes will be addressed in separate jira.
- Sarath
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162536
-----------------------------------------------------------
On Jan. 23, 2017, 2:09 p.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 23, 2017, 2:09 p.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f
> repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran Unit Tests and was successful.
>
>
> Thanks,
>
> Sarath Subramanian
>
>
Re: Review Request 55813: Porting performance and stability changes
made in 0.7 branch into master
Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162536
-----------------------------------------------------------
repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala (line 24)
<https://reviews.apache.org/r/55813/#comment233842>
will also need to change code in Gremlin2ExpressionFactory and Gremlin3ExpressionFactory.typeTestExpression
for in clause changes
Please check GremlinExpressionFactory.generateTypeTestExpression. This gets called GremlinQuery
- Suma Shivaprasad
On Jan. 21, 2017, 2:19 a.m., Sarath Subramanian wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
>
> (Updated Jan. 21, 2017, 2:19 a.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
>
>
> Bugs: ATLAS-1403
> https://issues.apache.org/jira/browse/ATLAS-1403
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Currently DSL uses a fill function during Gremlin Translation to merge results by typeName and superTypeName and fill function loads the resulting vertices in memory. This causes significant memory usage and ATLAS server spends lot of time doing GC instead of useful work resulting in OOO sometimes ( when GC is not able to recover and search queries are run in parallel)
> The proposal is to replace this with typeName checks along by finding all the subtypes for a given type and using an IN clause in the filter.
> For eg:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
>
>
> Diffs
> -----
>
> repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7
> repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c
> repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90
> repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c
> repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582
> repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff
> repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176
> repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5
>
> Diff: https://reviews.apache.org/r/55813/diff/
>
>
> Testing
> -------
>
> Ran Unit Tests and was successful.
>
>
> Thanks,
>
> Sarath Subramanian
>
>