Posted to dev@usergrid.apache.org by "Todd Nine (JIRA)" <ji...@apache.org> on 2014/04/15 17:10:14 UTC

[jira] [Updated] (USERGRID-107) Implement commit logging and sharding on graph edges

     [ https://issues.apache.org/jira/browse/USERGRID-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated USERGRID-107:
-------------------------------

    Description: 
Currently, we're limited to roughly 2 billion graph edges per type from a single source or target node.  In highly connected graphs, this makes it impossible to construct the entire graph.  To alleviate this, we should use a commit log + time series post processing.  I envision this working in the following way.

# Maintain a set of commit log CFs (column families) for all edges.  The gc_grace period should be set very low, around 1 minute.  Re-writing existing edges (due to phantom deletes) will not be an issue, since all algorithms should be idempotent.
# Always write to the commit log first, fire the async processing as usual, and immediately return.
# In post processing, use variable-size time series shards (algorithm TBD) to copy each edge from the commit log CF into the correct new CF, then remove the entry from the commit log.
# When seeking values, read from the correct shards (selected by time) on disk as well as from the commit log.  These values are already time ordered, so an in-memory merge can easily produce the final result set.
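As an illustration of step 4, the in-memory merge can be sketched as a k-way merge over already time-ordered streams (one per shard, plus the commit log).  This is a minimal, hypothetical Java sketch; the Edge class and method names are illustrative stand-ins, not Usergrid's actual model.  Note that it drops duplicate (name, timestamp) pairs, since an edge may legitimately appear in both a shard and the commit log while post processing is in flight.

```java
import java.util.*;

public class EdgeMerge {

    // Minimal stand-in for a graph edge: a name plus a write timestamp.
    static final class Edge {
        final String name;
        final long timestamp;
        Edge(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // K-way merge of already time-ordered sources (shard CFs + commit log).
    // Duplicates are skipped so the merge stays idempotent even when an
    // edge has been copied to a shard but not yet removed from the log.
    static List<Edge> merge(List<List<Edge>> sources) {
        // Heap entries are {sourceIndex, positionInSource}, ordered by the
        // timestamp of the edge they currently point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparingLong(e -> sources.get(e[0]).get(e[1]).timestamp));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }

        List<Edge> result = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            Edge e = sources.get(top[0]).get(top[1]);
            String key = e.name + "@" + e.timestamp;
            if (!key.equals(lastKey)) {
                result.add(e);  // skip the duplicate copy of an edge
            }
            lastKey = key;
            // Advance the cursor within the source we just consumed from.
            if (top[1] + 1 < sources.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return result;
    }
}
```

Because every input stream is already sorted by time, the merge is O(n log k) for n total edges across k sources, and it never needs to materialize more than one cursor per source.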




> Implement commit logging and sharding on graph edges
> ----------------------------------------------------
>
>                 Key: USERGRID-107
>                 URL: https://issues.apache.org/jira/browse/USERGRID-107
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>            Priority: Blocker
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)