Posted to dev@manifoldcf.apache.org by "Karl Wright (Created) (JIRA)" <ji...@apache.org> on 2011/11/09 15:29:51 UTC

[jira] [Created] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains
--------------------------------------------------------------------------------------------------------------------------------

                 Key: CONNECTORS-286
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-286
             Project: ManifoldCF
          Issue Type: New Feature
          Components: Framework core
            Reporter: Karl Wright
            Assignee: Karl Wright
             Fix For: ManifoldCF next


ManifoldCF's reliance on a relational database limits its throughput and scalability.  I am now convinced it is possible to build all the structures we need within a distributed key-value store like Voldemort, which has the nice side effect of permitting massive scaling.  I envision there will be several layers to this project, some of which may have broader utility in the open-source community at large:

(1) An atomic serialization layer, which adds serialization capabilities to a non-transactional substrate;
(2) A transaction layer, which uses atomic serialization to build a notion of light transactions;
(3) A table and index layer, which defines SQL-like concepts of tables and btree indexes on top of the transaction layer, via a Java API;
(4) A generic "database abstraction" layer, which is capable of representing both standard SQL databases as well as this NoSQL variant, so that ManifoldCF can support both models.
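
As a rough illustration of how layers (1) and (2) might fit together, here is a minimal Java sketch.  Everything in it is an assumption made for the example: the KeyValueStore interface, the InMemoryStore stand-in, the LightTransaction class and the "__commit_lock__" key are hypothetical names, not Warthog or Voldemort code.  Commits are serialized through a single lock key acquired with an atomic put-if-absent, and each transaction validates the values it read before applying its buffered writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical substrate: no transactions, only per-key atomic operations. */
interface KeyValueStore {
    String get(String key);
    void put(String key, String value);
    /** Atomically stores the value only if the key is absent; true on success. */
    boolean putIfAbsent(String key, String value);
    void remove(String key);
}

/** In-memory stand-in for a distributed store, for illustration only. */
final class InMemoryStore implements KeyValueStore {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    public String get(String key) { return map.get(key); }
    public void put(String key, String value) { map.put(key, value); }
    public boolean putIfAbsent(String key, String value) {
        return map.putIfAbsent(key, value) == null;
    }
    public void remove(String key) { map.remove(key); }
}

/** Layer (2) sketch: buffer writes, remember reads, validate at commit time. */
final class LightTransaction {
    private static final String LOCK_KEY = "__commit_lock__"; // hypothetical name
    private final KeyValueStore store;
    private final Map<String, String> readSet = new HashMap<>();
    private final Map<String, String> writeSet = new HashMap<>();

    LightTransaction(KeyValueStore store) { this.store = store; }

    String read(String key) {
        if (writeSet.containsKey(key)) return writeSet.get(key); // read your own writes
        if (readSet.containsKey(key)) return readSet.get(key);   // repeatable read
        String value = store.get(key);
        readSet.put(key, value);
        return value;
    }

    void write(String key, String value) { writeSet.put(key, value); }

    /** Returns true if committed, false if a value read earlier has changed. */
    boolean commit() {
        // Layer (1): serialize all commits through one atomically acquired key.
        while (!store.putIfAbsent(LOCK_KEY, "held")) {
            Thread.onSpinWait();
        }
        try {
            for (Map.Entry<String, String> read : readSet.entrySet()) {
                if (!Objects.equals(store.get(read.getKey()), read.getValue())) {
                    return false; // conflict: somebody committed in between
                }
            }
            for (Map.Entry<String, String> write : writeSet.entrySet()) {
                store.put(write.getKey(), write.getValue());
            }
            return true;
        } finally {
            store.remove(LOCK_KEY);
        }
    }
}
```

The sketch glosses over a lot: it compares values rather than versions, a single global lock would not scale, and a client that dies while holding the lock blocks everyone else.  It is only meant to show that serialization plus validation is enough to get a notion of light transactions.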

This is obviously a major development task, and as such is not envisioned to be completed by the next standard release.  Work will indeed need to be done in a branch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171178#comment-13171178 ] 

Karl Wright commented on CONNECTORS-286:
----------------------------------------

Work on a database Java API, with a key/value back-end implementation, has been under way for some six weeks now and is nearly complete.  A valuable first step may be to convert ManifoldCF to this layer, and to create standard SQL database implementations of it as a replacement for the current ManifoldCF implementation layer, to see how that works.  To participate, a database needs to provide only the following:

- transaction support
- ability to read tables sequentially
- ability to read through a table index in order, with specific conditions on that index read

Everything else is synthesized by the Warthog API layer.  This should make the job of porting to different databases much easier - if it works.
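
To make the three requirements listed above concrete, here is a small Java sketch of what such a back-end contract could look like.  The BackendTable interface and the InMemoryTable class are hypothetical names invented for this example and are not taken from the Warthog source.  The in-memory version takes shortcuts a real back end would not: it copies the table to support rollback, and it rebuilds the index on every scan.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical back-end contract covering the three required capabilities. */
interface BackendTable {
    // Capability 1: transaction support.
    void beginTransaction();
    void commit();
    void rollback();

    void insert(long rowId, String indexedValue);

    /** Capability 2: read the table sequentially (row-id order here). */
    Iterator<Long> scanRows();

    /** Capability 3: walk the index in key order, keeping entries that meet the condition. */
    Iterator<Long> scanIndex(Predicate<String> condition);
}

/** Toy implementation, for illustration only. */
final class InMemoryTable implements BackendTable {
    private TreeMap<Long, String> rows = new TreeMap<>();
    private TreeMap<Long, String> snapshot; // non-null only while a transaction is open

    public void beginTransaction() { snapshot = new TreeMap<>(rows); }

    public void commit() { snapshot = null; }

    public void rollback() {
        if (snapshot == null) throw new IllegalStateException("no open transaction");
        rows = snapshot;
        snapshot = null;
    }

    public void insert(long rowId, String indexedValue) { rows.put(rowId, indexedValue); }

    public Iterator<Long> scanRows() {
        return new ArrayList<>(rows.keySet()).iterator();
    }

    public Iterator<Long> scanIndex(Predicate<String> condition) {
        // Build an ordered index from value to row ids; a real back end keeps this up to date.
        TreeMap<String, List<Long>> index = new TreeMap<>();
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            index.computeIfAbsent(row.getValue(), k -> new ArrayList<>()).add(row.getKey());
        }
        List<Long> matches = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : index.entrySet()) {
            if (condition.test(entry.getKey())) matches.addAll(entry.getValue());
        }
        return matches.iterator();
    }
}
```

With rows (1, "pear"), (2, "apple") and (3, "fig"), an index scan with the condition "value sorts after b" visits fig and then pear, so it yields row ids 3 and 1 in that order.  Joins, aggregation and the rest would then be built above this contract rather than inside the database.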

Code as it currently stands is in branches/CONNECTORS-286/warthog.

                

        

[jira] [Issue Comment Edited] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181155#comment-13181155 ] 

Karl Wright edited comment on CONNECTORS-286 at 1/6/12 6:23 AM:
----------------------------------------------------------------

bq. Not sure what Warthog is...

See https://svn.apache.org/repos/asf/incubator/lcf/branches/CONNECTORS-286/warthog.  "Warthog" is a potential future spinoff technology of ManifoldCF - if all this stuff actually works. ;-)

bq. But I wanted to suggest you consider using HBase for underlying storage.

Maybe.  One of the apparent requirements of ManifoldCF is the ability to run on top of practically anything.  I don't think we'd be able to throw away PostgreSQL, MySQL, HSQLDB, and Derby support, for instance.  So if Warthog is the actual API layer ManifoldCF uses, then we'd need implementations of Warthog for as many backends as possible.  Right now I've only got one going, for a testing key-value store, but almost certainly the next step would be a SQL database.  Moving on from there, HBase (or Hive/Pig) may also be possibilities.  Still proving the concept, however...

                
                  

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178695#comment-13178695 ] 

Karl Wright commented on CONNECTORS-286:
----------------------------------------

Using the Warthog API as the standard ManifoldCF way of dealing with databases may not be practical, for the following reasons.
- A significant amount of Warthog's actual functionality comes from Java methods you supply to it.  This is fundamentally incompatible with using a standard database to do the same thing, because there are bound to be situations where the two implementations disagree.
- A full database implementation under Warthog entails using the database for table storage and for ordered index access, with conditions applied to the index.  Warthog would do the rest.  But it is conceivable that this would not perform as well as native database queries.
- It is not clear how to construct a cache key in Warthog, so caching database results will require some thought.  Caching at the interface to the underlying database is not practical at all, because only partial resultsets will be read from many of the queries.
- It's not yet clear whether Warthog is missing critical functionality that will be needed to implement ManifoldCF.

Nevertheless, the next step is to try to create an implementation of Warthog where WHTableStore, WHTable, and WHIndex are implemented by an underlying relational database.  The difficulty, as stated above, is that the index (for example) is defined in terms of a WHComparator for each column being indexed, and a WHComparator is opaque Java code.  Besides performing the comparison, that code must agree with what the database is doing, AND also be capable of assisting in the generation of SQL code.  Special SQL-consistent WHComparator implementations are therefore going to be necessary, which also implement another interface (SQLInspectable?).  The WHIndex implementation can then use them to do what it needs, and complain if somebody tries to use incompatible comparator implementations.

Thus, each implementation of the Warthog API consists of:
- Implementations of WHTableStore and WHTable and WHIndex
- A body of comparators, filters, etc. that implement data types consistent with the SQL database
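
A minimal Java sketch of the comparator idea follows.  The types here are stand-ins invented for the example (SketchComparator, SqlInspectable, AscendingLongComparator, SqlBackedIndex); they model the roles described above and should not be read as the actual WHComparator or WHIndex signatures.  The assumption is that a comparator usable with a SQL back end can also describe its own ordering as a SQL fragment, and that the index refuses any comparator that cannot.

```java
import java.util.Comparator;

/** Stand-in for the role WHComparator plays; the real interface may differ. */
interface SketchComparator extends Comparator<Long> { }

/** Hypothetical companion interface, along the lines of "SQLInspectable". */
interface SqlInspectable {
    /** ORDER BY fragment that sorts the column the same way compare() does. */
    String orderByFragment(String column);
}

/** Ascending numeric order; assumed to match ORDER BY ... ASC on a non-null BIGINT column. */
final class AscendingLongComparator implements SketchComparator, SqlInspectable {
    public int compare(Long a, Long b) { return Long.compare(a, b); }
    public String orderByFragment(String column) { return column + " ASC"; }
}

/** Index over a relational table that accepts only SQL-consistent comparators. */
final class SqlBackedIndex {
    private final String table;
    private final String column;
    private final SqlInspectable ordering;

    SqlBackedIndex(String table, String column, SketchComparator comparator) {
        if (!(comparator instanceof SqlInspectable)) {
            // Opaque Java code cannot be translated to SQL, so complain early.
            throw new IllegalArgumentException(
                "comparator is not SQL-consistent: " + comparator.getClass().getName());
        }
        this.table = table;
        this.column = column;
        this.ordering = (SqlInspectable) comparator;
    }

    /** SQL the database would run to read the index in comparator order. */
    String selectInOrder() {
        return "SELECT " + column + " FROM " + table
            + " ORDER BY " + ordering.orderByFragment(column);
    }
}
```

Passing an AscendingLongComparator for column id of a table jobs gives "SELECT id FROM jobs ORDER BY id ASC", while an arbitrary lambda comparator is rejected in the constructor.  The hard part, which the sketch does not address, is guaranteeing that the Java comparison and the database ordering really do agree for every value.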
 

                

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396425#comment-13396425 ] 

Karl Wright commented on CONNECTORS-286:
----------------------------------------

I separated Warthog out into its own tree under the root https://svn.apache.org/repos/asf/manifoldcf/warthog.
                

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Otis Gospodnetic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181466#comment-13181466 ] 

Otis Gospodnetic commented on CONNECTORS-286:
---------------------------------------------

Maybe I'm not understanding what Warthog is, but doesn't that sound like a common API for different data storage backends, and isn't that what GORA (just graduated) aims to do?  I know it specifically mentions column stores on http://incubator.apache.org/gora/ , but I seem to remember MySQL being mentioned in the GORA context, too.  Anyhow, it may be worth checking with GORA what their plans and roadmap are, to see whether you can use (any of) it.
                

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181501#comment-13181501 ] 

Karl Wright commented on CONNECTORS-286:
----------------------------------------

bq. Maybe I'm not understanding what Warthog is, but doesn't that sound like a common API for different data storage backends and isn't that what GORA (just graduated) aims to do?

It's really pretty hard to see what GORA actually does, since it has almost no documentation that I can find.  It does mention SQL, but uses the modifier "primitive" in conjunction with it.  And that, you see, is what Warthog fixes - it turns tables and ordered btree indexes (with index filtering criteria) into the full set of SQL constructs, e.g. joins, subqueries, aggregation, etc., but without actual SQL.  Warthog also basically requires you to specify the plan that will be used to run the query, and does not leave this up to the database.  Implementing Warthog on top of Gora might be the appropriate path, if the documentation hints can be taken at face value.
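
To give a feel for "SQL constructs without actual SQL", here is a hedged Java sketch of a join written as an explicit plan.  It is not Warthog code: the ExplicitPlanJoin class, the jobs/docs example data and the use of NavigableMap as a stand-in for an ordered btree index are all assumptions made for illustration.  The caller decides the plan, here an ordered scan of one index with a probe into a second index per row, instead of handing a query to a planner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hand-written index nested-loop join over two ordered indexes. */
final class ExplicitPlanJoin {
    /**
     * Roughly what a database might do for:
     *   SELECT j.name, d.url FROM jobs j JOIN docs d ON d.job_id = j.id ORDER BY j.id
     * except that the plan is spelled out by the caller.
     */
    static List<String> jobsWithDocs(NavigableMap<Long, String> jobsById,
                                     NavigableMap<Long, List<String>> docsByJobId) {
        List<String> out = new ArrayList<>();
        // Outer side: ordered scan of the jobs index.
        for (Map.Entry<Long, String> job : jobsById.entrySet()) {
            // Inner side: probe the docs index with the join key.
            List<String> urls = docsByJobId.get(job.getKey());
            if (urls == null) continue; // inner join: drop jobs without documents
            for (String url : urls) {
                out.add(job.getValue() + " -> " + url);
            }
        }
        return out;
    }
}
```

Because the outer scan runs in index order, the output comes back ordered by job id without a separate sort step, which is the kind of decision a SQL planner would otherwise make on the caller's behalf.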

                

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181155#comment-13181155 ] 

Karl Wright commented on CONNECTORS-286:
----------------------------------------

bq. Not sure what Warthog is...

See https://svn.apache.org/repos/asf/incubator/lcf/branches/CONNECTORS-286/warthog.  "Warthog" is a potential future spinoff technology of ManifoldCF - if all this stuff actually works. ;-)

bq. But I wanted to suggest you consider using HBase for underlying storage.

Maybe.  One of the apparent requirements of ManifoldCF is the ability to run on top of practically anything.  I don't think we'd be able to throw away PostgreSQL, MySQL, HSQLDB, and Derby support, for instance.  So if Warthog is the actual API layer ManifoldCF uses, then we'd need implementations of Warthog for as many backends as possible.  Right now I've only got one going, for a testing key-value store, but almost certainly the next step would be a SQL database.  Moving on from there, HBase (or Hive/Pig) may also be possibilities.  Still proving the concept, however...

                

        

[jira] [Issue Comment Edited] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Otis Gospodnetic (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181067#comment-13181067 ] 

Otis Gospodnetic edited comment on CONNECTORS-286 at 1/6/12 3:15 AM:
---------------------------------------------------------------------

Not sure what Warthog is...

But I wanted to suggest you consider using HBase for underlying storage.  It's more complex to operate: it runs on top of HDFS, which means a few different processes, and it typically likes to run on more than one server for replication and scalability purposes, in addition to a couple of its own processes, ZooKeeper and such.  But it may be a better choice than Voldemort: integration with MapReduce, ability to do a fast scan with filtering, a more open and active community, etc.

See also: http://incubator.apache.org/gora/

                
                  

        

[jira] [Commented] (CONNECTORS-286) Get ManifoldCF to run on top of a key/value store like Voldemort, for potential massive scalability improvements and speed gains

Posted by "Otis Gospodnetic (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181067#comment-13181067 ] 

Otis Gospodnetic commented on CONNECTORS-286:
---------------------------------------------

Not sure what Warthog is...

But I wanted to suggest you consider using HBase for underlying storage.  It's more complex to operate: it runs on top of HDFS, which means a few different processes, and it typically likes to run on more than one server for replication and scalability purposes, in addition to a couple of its own processes, ZooKeeper and such.  But it may be a better choice than Voldemort: integration with MapReduce, ability to do a fast scan with filtering, a more open and active community, etc.

See also: http://incubator.apache.org/projects/gora.html

                