Posted to dev@atlas.apache.org by "Apoorv Naik (JIRA)" <ji...@apache.org> on 2018/08/10 04:13:00 UTC

[jira] [Comment Edited] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

    [ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575740#comment-16575740 ] 

Apoorv Naik edited comment on ATLAS-2816 at 8/10/18 4:12 AM:
-------------------------------------------------------------

One suggestion: use the followReferences flag instead of hardcoding the ignoreRelationship param. This would make it easier to toggle if a certain deployment scenario wants the relationship details to be captured in the entityText. Also, follow this guideline for patch creation:

 
 # Work on a local branch
 # Commit the patch on local branch
 # Generate patch using "git format-patch origin/master" (this way you get credit by including author info in the patch)
 # Attach the patch to JIRA

 

HTH


was (Author: apoorvnaik):
One suggestion: use the followReferences flag instead of hardcoding the ignoreRelationship param. This would make it easier to toggle if a certain deployment scenario wants the relationship details to be captured in the entityText.

 

HTH

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> ------------------------------------------------------------------------
>
>                 Key: ATLAS-2816
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2816
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Chengbing Liu
>            Assignee: Apoorv Naik
>            Priority: Major
>         Attachments: ATLAS-2816.01.patch
>
>
> We encountered a problem when using the Hive bridge in production. One database has 5000+ tables. Importing the first table takes only tens of milliseconds, but each import gets slower as more tables are added. In the end, it takes 1~2 seconds to import one table.
> After investigation, we realized that it is not necessary for {{FullTextMapperV2}} to retrieve all the relationships of the database each time a table is imported. Because every import re-reads the relationships to all previously imported tables, the time complexity of importing a whole database becomes O(n^2) (where n is the number of tables).
> We propose adding a parameter to the constructor of {{EntityGraphRetriever}}: {{ignoreRelationship}}. When set to true, {{mapVertexToAtlasEntity}} will skip the {{mapRelationshipAttributes}} call. Since {{FullTextMapperV2}} does not use the relationship attributes of the entity, this can save plenty of time when importing entities with a large number of relations.
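
The proposal above can be illustrated with a minimal, self-contained sketch. It does not use the real Atlas classes: RetrieverSketch, Vertex, Entity, Retriever, forFullText, and simulateImport are hypothetical stand-ins, and only the names ignoreRelationship, followReferences, and mapRelationshipAttributes come from the discussion. The sketch assumes the flag is derived from followReferences, as suggested in the comment, rather than hardcoded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-ins for illustration only; NOT the real Atlas classes. */
final class RetrieverSketch {

    /** Toy graph vertex: its own attributes plus the names of related entities. */
    static final class Vertex {
        final Map<String, Object> attributes = new LinkedHashMap<>();
        final List<String> relatedNames = new ArrayList<>();
    }

    /** Toy entity produced by the retriever. */
    static final class Entity {
        final Map<String, Object> attributes = new LinkedHashMap<>();
        final Map<String, Object> relationshipAttributes = new LinkedHashMap<>();
    }

    /** Hypothetical retriever with the proposed constructor parameter. */
    static final class Retriever {
        private final boolean ignoreRelationship;
        long relationshipReads = 0; // number of related entries visited so far

        Retriever(boolean ignoreRelationship) {
            this.ignoreRelationship = ignoreRelationship;
        }

        /** Assumption: derive the flag from followReferences instead of hardcoding it. */
        static Retriever forFullText(boolean followReferences) {
            return new Retriever(!followReferences);
        }

        Entity mapVertexToEntity(Vertex vertex) {
            Entity entity = new Entity();
            entity.attributes.putAll(vertex.attributes);
            if (!ignoreRelationship) {
                mapRelationshipAttributes(vertex, entity); // the expensive step
            }
            return entity;
        }

        private void mapRelationshipAttributes(Vertex vertex, Entity entity) {
            List<String> related = new ArrayList<>();
            for (String name : vertex.relatedNames) {
                relationshipReads++; // one read per related entity
                related.add(name);
            }
            entity.relationshipAttributes.put("tables", related);
        }
    }

    /**
     * Simulates importing `tables` tables into one database, re-mapping the
     * database vertex after each import. Returns the total relationship reads.
     */
    static long simulateImport(int tables, boolean followReferences) {
        Vertex db = new Vertex();
        db.attributes.put("name", "db");
        Retriever retriever = Retriever.forFullText(followReferences);
        for (int i = 0; i < tables; i++) {
            db.relatedNames.add("table_" + i);
            retriever.mapVertexToEntity(db);
        }
        return retriever.relationshipReads;
    }
}
```

In this toy model, simulateImport(5000, true) performs 1 + 2 + ... + 5000 = 12,502,500 relationship reads (the O(n^2) behaviour described above), while simulateImport(5000, false) performs none.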



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)