You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@atlas.apache.org by Hemanth Yamijala <yh...@gmail.com> on 2016/05/03 20:28:05 UTC

Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/
-----------------------------------------------------------

Review request for atlas.


Bugs: ATLAS-690
    https://issues.apache.org/jira/browse/ATLAS-690


Repository: atlas


Description
-------

The patch fixes two regressions caused in Atlas code base recently.

* A regression involving missing out adding the right indexes
* A regression that caused a quadratic load of vertices for hive objects.

The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 

Diff: https://reviews.apache.org/r/46943/diff/


Testing
-------

* Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
* All existing UTs / ITs pass. New UT for testing index addition added.


Thanks,

Hemanth Yamijala


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/#review131632
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 390)
<https://reviews.apache.org/r/46943/#comment195647>

    this shouldn't be expensive - its read from reference. Can we return always?



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 401)
<https://reviews.apache.org/r/46943/#comment195649>

    can it use just vertex id in the else case, shouldn't be expensive?



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 410)
<https://reviews.apache.org/r/46943/#comment195650>

    use edge id in else case?



repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java (line 299)
<https://reviews.apache.org/r/46943/#comment195651>

    You can skip clear() as new instance is created everytime. Also, move new FullTextMapper() to mapTypedInstanceToGraph() so that the cache is shared across create and update entities in the same request


- Shwetha GS


On May 3, 2016, 6:28 p.m., Hemanth Yamijala wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46943/
> -----------------------------------------------------------
> 
> (Updated May 3, 2016, 6:28 p.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-690
>     https://issues.apache.org/jira/browse/ATLAS-690
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> The patch fixes two regressions caused in Atlas code base recently.
> 
> * A regression involving missing out adding the right indexes
> * A regression that caused a quadratic load of vertices for hive objects.
> 
> The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 
> 
> Diff: https://reviews.apache.org/r/46943/diff/
> 
> 
> Testing
> -------
> 
> * Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
> * All existing UTs / ITs pass. New UT for testing index addition added.
> 
> 
> Thanks,
> 
> Hemanth Yamijala
> 
>


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Hemanth Yamijala <yh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/
-----------------------------------------------------------

(Updated May 5, 2016, 4:35 a.m.)


Review request for atlas.


Changes
-------

Addressed latest review comments from Shwetha.


Bugs: ATLAS-690
    https://issues.apache.org/jira/browse/ATLAS-690


Repository: atlas


Description
-------

The patch fixes two regressions caused in Atlas code base recently.

* A regression involving missing out adding the right indexes
* A regression that caused a quadratic load of vertices for hive objects.

The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 

Diff: https://reviews.apache.org/r/46943/diff/


Testing
-------

* Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
* All existing UTs / ITs pass. New UT for testing index addition added.


Thanks,

Hemanth Yamijala


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Suma Shivaprasad <su...@gmail.com>.

> On May 4, 2016, 3:33 p.m., Suma Shivaprasad wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java, line 56
> > <https://reviews.apache.org/r/46943/diff/2/?file=1370654#file1370654line56>
> >
> >     Does it make sense to cache the entities against the GUIDs for longer time? Since the flow involves persist to graph, generate fulltext and then usually a getEntityDefinition is called on same entity resulting in maping the graph to type twice. We could try Titan caching as well?

I understand that this change needs to be extensively tested due to potential memory issues. We could take this up in another jira as well for consideration.


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/#review131690
-----------------------------------------------------------


On May 4, 2016, 10:58 a.m., Hemanth Yamijala wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46943/
> -----------------------------------------------------------
> 
> (Updated May 4, 2016, 10:58 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-690
>     https://issues.apache.org/jira/browse/ATLAS-690
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> The patch fixes two regressions caused in Atlas code base recently.
> 
> * A regression involving missing out adding the right indexes
> * A regression that caused a quadratic load of vertices for hive objects.
> 
> The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 
> 
> Diff: https://reviews.apache.org/r/46943/diff/
> 
> 
> Testing
> -------
> 
> * Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
> * All existing UTs / ITs pass. New UT for testing index addition added.
> 
> 
> Thanks,
> 
> Hemanth Yamijala
> 
>


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Hemanth Yamijala <yh...@gmail.com>.

> On May 4, 2016, 3:33 p.m., Suma Shivaprasad wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java, line 56
> > <https://reviews.apache.org/r/46943/diff/2/?file=1370654#file1370654line56>
> >
> >     Does it make sense to cache the entities against the GUIDs for longer time? Since the flow involves persist to graph, generate fulltext and then usually a getEntityDefinition is called on same entity resulting in maping the graph to type twice. We could try Titan caching as well?
> 
> Suma Shivaprasad wrote:
>     I understand that this change needs to be extensively tested due to potential memory issues. We could take this up in another jira as well for consideration.

Suma, thanks for looking at this. I have not investigated the effects of caching these longer. Like you pointed out, any benefits there could also have trade-offs on  memory etc. Right now, my thinking was to get the least intrusive change that fixes the regressions and provably improves performance (as shown by our performance tests). So, I would like to pick this up as an improvement on a need basis. Let me know if this is fine and whether we can commit this patch in as-is state for now. I will leave the issue open for your response.


- Hemanth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/#review131690
-----------------------------------------------------------


On May 5, 2016, 4:35 a.m., Hemanth Yamijala wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46943/
> -----------------------------------------------------------
> 
> (Updated May 5, 2016, 4:35 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-690
>     https://issues.apache.org/jira/browse/ATLAS-690
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> The patch fixes two regressions caused in Atlas code base recently.
> 
> * A regression involving missing out adding the right indexes
> * A regression that caused a quadratic load of vertices for hive objects.
> 
> The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 
> 
> Diff: https://reviews.apache.org/r/46943/diff/
> 
> 
> Testing
> -------
> 
> * Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
> * All existing UTs / ITs pass. New UT for testing index addition added.
> 
> 
> Thanks,
> 
> Hemanth Yamijala
> 
>


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/#review131690
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java (line 56)
<https://reviews.apache.org/r/46943/#comment195728>

    Does it make sense to cache the entities against the GUIDs for longer time? Since the flow involves persist to graph, generate fulltext and then usually a getEntityDefinition is called on same entity resulting in maping the graph to type twice. We could try Titan caching as well?


- Suma Shivaprasad


On May 4, 2016, 10:58 a.m., Hemanth Yamijala wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46943/
> -----------------------------------------------------------
> 
> (Updated May 4, 2016, 10:58 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-690
>     https://issues.apache.org/jira/browse/ATLAS-690
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> The patch fixes two regressions caused in Atlas code base recently.
> 
> * A regression involving missing out adding the right indexes
> * A regression that caused a quadratic load of vertices for hive objects.
> 
> The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 
> 
> Diff: https://reviews.apache.org/r/46943/diff/
> 
> 
> Testing
> -------
> 
> * Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
> * All existing UTs / ITs pass. New UT for testing index addition added.
> 
> 
> Thanks,
> 
> Hemanth Yamijala
> 
>


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/#review131652
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (line 397)
<https://reviews.apache.org/r/46943/#comment195683>

    Use - vertex[id=%s]



repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java (lines 405 - 406)
<https://reviews.apache.org/r/46943/#comment195684>

    Use 'edge[id=%s]'


- Shwetha GS


On May 4, 2016, 10:58 a.m., Hemanth Yamijala wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46943/
> -----------------------------------------------------------
> 
> (Updated May 4, 2016, 10:58 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-690
>     https://issues.apache.org/jira/browse/ATLAS-690
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> The patch fixes two regressions caused in Atlas code base recently.
> 
> * A regression involving missing out adding the right indexes
> * A regression that caused a quadratic load of vertices for hive objects.
> 
> The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
>   repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 
> 
> Diff: https://reviews.apache.org/r/46943/diff/
> 
> 
> Testing
> -------
> 
> * Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
> * All existing UTs / ITs pass. New UT for testing index addition added.
> 
> 
> Thanks,
> 
> Hemanth Yamijala
> 
>


Re: Review Request 46943: ATLAS-690: Read timed out exceptions when tables are imported into Atlas.

Posted by Hemanth Yamijala <yh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46943/
-----------------------------------------------------------

(Updated May 4, 2016, 10:58 a.m.)


Review request for atlas.


Changes
-------

Updated patch incorporating review comments.


Bugs: ATLAS-690
    https://issues.apache.org/jira/browse/ATLAS-690


Repository: atlas


Description
-------

The patch fixes two regressions caused in Atlas code base recently.

* A regression involving missing out adding the right indexes
* A regression that caused a quadratic load of vertices for hive objects.

The first one was fixed by correcting the indexes back. The second one was fixed by having a per request cache. Some other indexing changes are due to what looks like a Titan issue.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapper.java 36d8034 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java d83c08c 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java c542ec7 
  repository/src/main/java/org/apache/atlas/repository/graph/TypedInstanceToGraphMapper.java a3dc7e5 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerMockTest.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexerTest.java 87fdf87 

Diff: https://reviews.apache.org/r/46943/diff/


Testing
-------

* Ran a local run of 1000 hive tables and verified the time taken is as before the regressions were introduced.
* All existing UTs / ITs pass. New UT for testing index addition added.


Thanks,

Hemanth Yamijala