Posted to notifications@rya.apache.org by "Caleb Meier (JIRA)" <ji...@apache.org> on 2017/11/22 15:55:00 UTC
[jira] [Updated] (RYA-408) PCJ Updater Does Not Support Queries with Direct Products

     [ https://issues.apache.org/jira/browse/RYA-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Meier updated RYA-408:
----------------------------
    Description: 
A number of optimizations were made to the Rya PCJ Updater to support sharding.  Among these optimizations was sharding the binding set results to distribute the load among the tablet servers.  These changes prevent the JoinResultUpdater from creating joins that are direct products (joins between nodes that share no variables).  This is a result of how new rows are written in the Fluo table.  For example, statement patterns used to be written in the form 
"SP_123/BS_Val1:BS_Val2", but are now written as "SP:HASH(BS_Val1):123/BS_Val1:BS_Val2",
where HASH(BS_Val1) is the hash of the first binding set value.  Before sharding, a targeted range scan for all the results associated with SP_123 could be done, because all of the entries associated with that node had that prefix.  After sharding, it is impossible to do a targeted range lookup on values corresponding to SP_123 without the first binding set value (because the hash precedes the id, 123).  So if the JoinResultUpdater attempts to join a new StatementPattern result with the results of another StatementPattern and there are no common variables (and therefore no first binding value to hash), then the updater will not locate any results.  
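
The effect of the two row layouts on range scans can be illustrated with a small sketch.  This is only a model under stated assumptions, not the Rya or Fluo code: the table is a sorted Python list, shard_hash is a hypothetical stand-in for HASH (the actual hash function is not given here), and the helper names are made up for illustration.

```python
import bisect
import hashlib


def shard_hash(value):
    # Hypothetical stand-in for HASH(...); the real hash function is not shown here.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:4]


def old_row(node_id, values):
    # Pre-sharding layout: "SP_<nodeId>/<val1>:<val2>"
    return "SP_" + node_id + "/" + ":".join(values)


def new_row(node_id, values):
    # Sharded layout: "SP:<hash of first value>:<nodeId>/<val1>:<val2>"
    return "SP:" + shard_hash(values[0]) + ":" + node_id + "/" + ":".join(values)


def prefix_scan(sorted_rows, prefix):
    # Range scan over a lexicographically sorted table: rows sharing a
    # prefix are contiguous, so seek to the prefix and read until it ends.
    start = bisect.bisect_left(sorted_rows, prefix)
    matches = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break
        matches.append(row)
    return matches


results = [("123", ["a", "b"]), ("123", ["c", "d"]), ("456", ["a", "e"])]
old_table = sorted(old_row(n, v) for n, v in results)
new_table = sorted(new_row(n, v) for n, v in results)

# Before sharding: one targeted scan finds every result for node 123.
before = prefix_scan(old_table, "SP_123/")

# After sharding, with a shared variable bound to "a": the hash can be
# computed, so a targeted scan still works.
with_value = prefix_scan(new_table, "SP:" + shard_hash("a") + ":123/")

# After sharding, with no shared variable: there is no hash to put in the
# prefix, and a prefix built from the nodeId alone matches nothing.
without_value = prefix_scan(new_table, "SP:123/")
```

In this model, the only prefix that still reaches every result for node 123 after sharding is "SP:" itself, which also returns the results of every other statement pattern.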

This issue can be resolved by issuing a more general scan on the "SP" prefix and then filtering the results on the StatementPattern nodeId (123 in the above example).  This approach does not perform well, because it reads every statement pattern result instead of a targeted range, but it may be the only way to resolve the issue.  Given the large amount of data already stored in the Fluo table, there is some question about whether we should support direct products in Fluo queries at all.  Another approach is to attempt to optimize queries to avoid direct products when they are registered (this should be done anyway), and to throw an exception if no arrangement avoids direct products.  Under this approach, queries that have unavoidable direct products would not be allowed to be registered in Fluo.   
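
Both options can be sketched roughly as follows.  This is a simplified illustration under assumptions, not the actual updater or registration code: rows are plain strings in the sharded layout described above, a query is reduced to a mapping from statement pattern name to its set of variables, joins are treated as a single left-to-right order, and all function and class names are hypothetical.

```python
class UnavoidableDirectProductError(ValueError):
    """Hypothetical error raised when a query cannot avoid a direct product."""


def scan_node_by_filter(rows, node_id):
    # Option 1: broad scan over every "SP:" row, keeping only the rows
    # whose nodeId matches. Assumes keys look like "SP:<hash>:<nodeId>/...".
    matches = []
    for row in rows:
        if not row.startswith("SP:"):
            continue
        key = row.partition("/")[0]
        parts = key.split(":")  # expected: ["SP", hash, nodeId]
        if len(parts) == 3 and parts[2] == node_id:
            matches.append(row)
    return matches


def order_without_direct_products(patterns):
    # Option 2: at registration time, look for a join order in which every
    # pattern shares at least one variable with the patterns joined before
    # it. If no such order exists, reject the query.
    remaining = dict(patterns)
    if not remaining:
        return []
    first = next(iter(remaining))
    order = [first]
    bound = set(remaining.pop(first))
    while remaining:
        # Pick any remaining pattern that shares a variable with what is bound.
        nxt = next((n for n, v in remaining.items() if bound & v), None)
        if nxt is None:
            raise UnavoidableDirectProductError(
                "no remaining pattern shares a variable with " + ", ".join(order)
            )
        order.append(nxt)
        bound |= remaining.pop(nxt)
    return order


rows = ["SP:0a1b:123/a:b", "SP:9f3c:123/c:d", "SP:0a1b:456/a:e"]
node_123 = scan_node_by_filter(rows, "123")

# sp1 and sp2 share no variable, but joining sp3 in between avoids the product.
join_order = order_without_direct_products(
    {"sp1": {"x", "y"}, "sp2": {"z", "w"}, "sp3": {"y", "z"}}
)
```

The filtering scan touches every statement pattern row regardless of node, which is why it scales poorly.  The registration check only needs the variables of each pattern, so it can run once per query before any data is read.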

  was:
A number of optimizations were made to the Rya PCJ Updater to support sharding.  Among these optimizations was sharding the binding set results to distribute the load among the tablet servers.  The changes that were made to shard the rows prevents the JoinResultUpdater from creating joins that are the result of direct products.  This is a direct result of how new rows are written in the Fluo table.  For example, statement patterns used to be written in the form 
"SP_123/BS_Val1:BS_Val2", but are now written as SP:HASH(BS_Val1):123/BS_Val1:BS_Val2",
where HASH(BS_Val1) is the hash of the first binding set value.  After sharding, it is impossible to do a targeted range lookup on values corresponding to SP_123 without the first binding set value (because the hash precedes the id).  So if the JoinResultUpdater attempts to join a new StatementPattern result with the results of another StatementPattern and there are no common variables (and therefore no first binding value to hash), then the updater will not locate any results.  

This issue can be resolved by issuing a more general scan on the "SP" prefix and then filtering the results on the StatementPattern nodeId.  This is not a very performant approach, but may be the only way to resolve the issue.  Given the large amount of data that is currently stored in the Fluo table already, there is some question about whether we should support direct products in Fluo queries anyway.  Another approach is to simply attempt to optimize queries to avoid direct queries when they are register (this should be done anyway), and if there is no arrangement that avoid direct products, then throw an exception.  Queries that have unavoidable direct products should not be allowed to be registered in Fluo.   


> PCJ Updater Does Not Support Queries with Direct Products
> ---------------------------------------------------------
>
>                 Key: RYA-408
>                 URL: https://issues.apache.org/jira/browse/RYA-408
>             Project: Rya
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.2.12
>            Reporter: Caleb Meier


