Posted to general@incubator.apache.org by Daniel Dai <da...@gmail.com> on 2016/03/17 21:17:28 UTC

[DISCUSS] [PROPOSAL] Omid for Apache Incubator

Hi,

I would like to propose Omid as an Apache Incubator project:

https://wiki.apache.org/incubator/OmidProposal

I've posted the text of the proposal below:

Thanks,
Daniel

= Omid Proposal =

=== Abstract ===

Omid is a flexible, reliable, high-performance and scalable ACID
transactional framework that allows client applications to execute
transactions on top of MVCC key/value-based NoSQL datastores
(currently Apache HBase) providing Snapshot Isolation guarantees on
the accessed data.


=== Proposal ===

Omid is a flexible open-source transactional framework that provides
ACID transactions with Snapshot Isolation guarantees on top of NoSQL
datastores. In particular, the current codebase brings the concept of
transactions to the popular Apache HBase datastore. Omid offers high
performance, high availability and scalability. Omid's current
version is able to scale to thousands of clients triggering concurrent
transactions on application data stored in HBase. Omid can scale
beyond 100K transactions per second on mid-range hardware while
incurring only a minimal impact on the speed of data access in the
datastore. We’re currently experimenting with a prototype version that
can improve the performance up to ~380K TPS.


Omid has been publicly available as an open-source project on GitHub
under the Apache License Version 2.0 since 2011 [1]. Over these years,
it has generated some interest in the open-source community,
especially since the first version was publicly presented at
Hadoop Summit 2013 [2]. Currently the GitHub project has 241 stars and
93 forks. Yahoo Inc. submits this proposal to the Apache Software
Foundation with the aim of transferring the Omid project, including its
source code and documentation, to Apache in order to build a stable
open-source community around it.


[1] https://github.com/yahoo/omid

[2] Omid presentation at Hadoop Summit 2013:
https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus


=== Background ===

An Omid prototype was first released as an open-source project back in
2011. Inspired by Google Percolator [1], it offered a lock-free
approach to transactions in NoSQL datastores (see [2]). However,
over the years, the design of Omid has evolved significantly.
Whilst the current open-sourced version maintains many aspects of the
original implementation, it is the result of a major redesign of the
first prototype released in 2011.


Omid now has a more decentralized design that does not sacrifice the
consistency and performance of the original version. The current
design also enables Omid to scale to thousands of clients executing
transactions concurrently on application data stored in HBase.
Internally, Omid still uses a lock-free approach to support
multiple concurrent clients. Its design also relies on a centralized
conflict detection component, the TSO, which now efficiently resolves
writeset collisions among concurrent transactions without having to
piggyback commit information to the clients. Another important
benefit of Omid is that it doesn't require any modification of the
underlying key-value datastore, HBase in this case. Moreover, the
recently added high-availability algorithm eliminates the single
point of failure represented by the TSO in deployments that require
a higher degree of dependability. Finally, the user API is very
simple, mimicking transaction managers in the relational world:
begin, commit, rollback.
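The begin/commit flow and the lock-free, centralized conflict detection described above can be illustrated with a toy model. What follows is a minimal single-process sketch in Python under stated assumptions, not Omid's code or API: the names `TimestampOracle`, `begin` and `commit` are hypothetical. It assumes the central component hands out increasing timestamps and aborts a transaction whose writeset touches a row that another transaction committed after this one started, which is the write-write rule of Snapshot Isolation. Persistence, networking and high availability, which a real TSO must handle, are left out.

```python
import itertools
from typing import Dict, Iterable, Optional


class TimestampOracle:
    """Hypothetical toy stand-in for a centralized conflict detector.

    Hands out monotonically increasing timestamps and aborts a committing
    transaction if another transaction committed a write to any of the same
    rows after this one started. Data rows themselves are never locked.
    """

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        # row key -> commit timestamp of the last transaction that wrote it
        self._last_commit: Dict[str, int] = {}

    def begin(self) -> int:
        """Start a transaction: return its start timestamp (its snapshot)."""
        return next(self._clock)

    def commit(self, start_ts: int, write_set: Iterable[str]) -> Optional[int]:
        """Return a commit timestamp, or None if the transaction must abort."""
        rows = set(write_set)
        if any(self._last_commit.get(row, 0) > start_ts for row in rows):
            return None  # a concurrent transaction already committed one of these rows
        commit_ts = next(self._clock)
        for row in rows:
            self._last_commit[row] = commit_ts
        return commit_ts


# Rollback needs no call to the oracle in this sketch: the client simply
# discards its buffered writes, so nothing is ever recorded for it.
tso = TimestampOracle()
t1, t2 = tso.begin(), tso.begin()   # two overlapping transactions
print(tso.commit(t1, ["row-a"]))    # 3: first committer wins
print(tso.commit(t2, ["row-a"]))    # None: write-write conflict, abort
```

Because the oracle only compares timestamps against a per-row map, no data rows are locked; conflicts are detected at commit time rather than prevented up front.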


Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
management platform that powers some of its next-generation search and
personalization products, uses Omid as the transaction manager in its
processing pipeline. Sieve essentially acts as a huge processing hub
between content feeds and serving systems. It provides an environment
for highly customizable, real-time, streamed information processing,
with typical discovery-to-service latencies of just a few seconds. In
terms of scale and availability, Omid’s new design was largely driven
by Sieve’s requirements.


At Yahoo, we are also making an effort to publicize the current
status of the project through blog entries (see [3], [4] and [5]) and
submissions to technical and academic conferences such as ATC 2016,
Hadoop Summit 2016 and HBaseCon 2016. Omid also appeared in a
TechCrunch article in the last quarter of 2015 (see [6]).


[1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
Distributed Transactions and Notifications. USENIX Symposium on
Operating Systems Design and Implementation, 2010.

[2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
Omid: Lock-free transactional support for distributed data stores. In
Proc. of ICDE, 2013.

[3] http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for

[4] http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol

[5] http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid

[6] http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/


=== Rationale ===

Programming with ACID (Atomicity, Consistency, Isolation, Durability)
transactions is very popular and is a standard feature of relational
databases. However, in the Big Data ecosystem, applications typically
use NoSQL datastores, which do not provide ACID transactions. Early
NoSQL datastores gave up transactional support in exchange for greater
agility and scalability. However, the need for transactions soon
emerged in Big Data applications that access shared data; for
example, transactions are very important for modern, scalable systems
that process content incrementally.


NoSQL datastores, including HBase, don’t provide transactional
frameworks to coordinate access to the underlying data in order to
preserve consistency. By using Omid, Big Data applications that need
to bundle multiple read and write operations on HBase into logically
indivisible units of work can execute transactions with ACID
properties, just as they would use transactions in the relational
database world. Omid extends the HBase key-value access API with
transaction semantics. It can be used either directly or via
higher-level data management APIs. For example, Apache Phoenix
(SQL on top of HBase) might use Omid as its transaction management
component.
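To make "transaction semantics on top of a key-value API" concrete, here is a hypothetical Python sketch. It is not Omid's or HBase's API: `TxTable`, `Transaction`, `Oracle` and the "commit table" are invented names for illustration. It assumes that each write is stored as a new version tagged with the writer's start timestamp (the MVCC requirement mentioned in this proposal), that reads take a transaction argument and see only versions committed before that transaction began, and that commit decisions are kept in a plain key-value map rather than in a separate service.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Oracle:
    """Hypothetical timestamp source with a write-write conflict check."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        self._last_commit: Dict[str, int] = {}  # row -> last commit timestamp

    def next_ts(self) -> int:
        return next(self._clock)

    def try_commit(self, start_ts: int, rows: List[str]) -> Optional[int]:
        if any(self._last_commit.get(r, 0) > start_ts for r in rows):
            return None
        commit_ts = self.next_ts()
        for r in rows:
            self._last_commit[r] = commit_ts
        return commit_ts


class Transaction:
    def __init__(self, start_ts: int) -> None:
        self.start_ts = start_ts
        self.write_set: List[str] = []


class TxTable:
    """Hypothetical transactional wrapper over a multi-versioned key-value map."""

    def __init__(self, oracle: Oracle) -> None:
        self.oracle = oracle
        self.cells: Dict[str, List[Tuple[int, str]]] = {}  # row -> [(writer start_ts, value)]
        self.commit_table: Dict[int, int] = {}  # writer start_ts -> commit_ts

    def begin(self) -> Transaction:
        return Transaction(self.oracle.next_ts())

    def put(self, tx: Transaction, row: str, value: str) -> None:
        # Tentative version: invisible to others until tx's commit is recorded.
        self.cells.setdefault(row, []).append((tx.start_ts, value))
        tx.write_set.append(row)

    def get(self, tx: Transaction, row: str) -> Optional[str]:
        # Snapshot read: newest version committed before tx started, or tx's own write.
        best_ts, best_val = -1, None
        for writer_ts, value in self.cells.get(row, []):
            if writer_ts == tx.start_ts:
                ts = tx.start_ts
            else:
                ts = self.commit_table.get(writer_ts)
                if ts is None or ts > tx.start_ts:
                    continue  # uncommitted, or committed after our snapshot
            if ts >= best_ts:
                best_ts, best_val = ts, value
        return best_val

    def commit(self, tx: Transaction) -> bool:
        commit_ts = self.oracle.try_commit(tx.start_ts, tx.write_set)
        if commit_ts is None:
            self.rollback(tx)
            return False
        self.commit_table[tx.start_ts] = commit_ts
        return True

    def rollback(self, tx: Transaction) -> None:
        # Remove tx's tentative versions so no snapshot can ever see them.
        for row in set(tx.write_set):
            self.cells[row] = [(ts, v) for ts, v in self.cells[row] if ts != tx.start_ts]
        tx.write_set.clear()


table = TxTable(Oracle())
t1 = table.begin()
table.put(t1, "user:42", "alice")
t2 = table.begin()                  # snapshot taken before t1 commits
table.commit(t1)
print(table.get(t2, "user:42"))     # None: t1 committed after t2's snapshot
t3 = table.begin()
print(table.get(t3, "user:42"))     # alice
```

The design choice illustrated here is that the underlying store needs no new features: versions and commit records are ordinary key-value entries, and all transactional logic lives in the wrapper and the oracle.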


The following features make Omid an attractive choice for system
designers and other projects in the Apache community:


* Semantics. Omid implements Snapshot Isolation (SI), supported by
major SQL and NoSQL technologies (e.g. Google Percolator).


* Performance and Scalability. Omid provides a highly scalable,
lock-free implementation of SI. To the best of our knowledge, it is
also one of the few open source NoSQL transactional platforms that can
execute more than 100K transactions per second [1]. A new prototype
still in development can go even further, up to ~380K TPS.


* Reliability. Omid has a high-availability (HA) mode, in which the
core service performing writeset conflict resolution operates as a
primary-backup process pair with automatic failover. The HA support
adds zero overhead to normal operation.


* Adaptability. Omid's current version provides transactions on data
stored in Apache HBase. However, Omid’s components are generic enough
to be adapted to any other key-value NoSQL datastore that supports
MVCC.


* Development. Omid provides a very simple interface that mimics
standard HBase APIs, making it developer friendly. Only minimal
extensions to the standard interfaces have been introduced to enable
transactions.


* Simplicity. Omid leverages the HBase infrastructure for managing its
own metadata. It entails no additional services apart from those
provided and used by HBase.


* Track Record. As mentioned above, Omid is already in use by
very-large-scale production systems at Yahoo. Also, Hortonworks is
integrating Omid in a metastore implementation for Hive based on
HBase.

[1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance


=== Current Status ===
The current Omid implementation is available both in Yahoo’s internal
GitHub repository, for use within Yahoo, and in Yahoo’s public
GitHub repository (https://github.com/yahoo/omid.git). Both
repositories are managed by Omid’s current developers at Yahoo.

As mentioned above, Yahoo is currently using Omid to provide
transactions in Sieve, a web-scale content management platform that
powers Yahoo’s next-generation search and personalization products.


==== Meritocracy ====
The first version of Omid was originally created in 2011 by Maysam
Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.


During the years after its inception, Omid has matured to operate at
Web scale and has been used internally by strategic projects at Yahoo
such as Sieve. The current base of committers belongs to the Yahoo team
that took over the initial Omid prototype and rewrote it to meet the
high availability and scalability requirements of the Sieve project.
This base of committers has recently incorporated Hortonworks members
who helped adapt Omid to HBase 1.x versions.


With this initial committer base, we aim to form a larger community
that can collaborate with new ideas over the current code base. This
new community will run the project following the "Apache Way"
(http://apache.org/foundation/governance/). Users and new contributors
will be treated with respect and welcomed. To grow the community, we
will encourage contributors to provide patches, review code, propose
new features and improvements, and give talks at conferences such as Hadoop Summit,
HBaseCon, ApacheCon, etc. Committership and PMC membership will be
offered according to meritocracy.

==== Community ====

The public Yahoo Omid repository on GitHub currently has 241 stars and
93 forks, which suggests significant interest in the project in the
open-source community, at least compared with other similar projects
(see https://github.com/yahoo/omid.git).


Recently, Hortonworks contributors to the Apache Hive project who
are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
expressed interest in using Omid. We started a fruitful collaboration
with them that resulted in Omid supporting HBase 1.x versions.


Salesforce is also interested in collaborating on a proof of
concept for integrating Omid as a pluggable transaction manager in
Apache Phoenix.


Yahoo, Hortonworks and Salesforce participants will constitute the
initial set of committers and mentors for the proposal.

==== Core Developers ====
The core developers of Omid are all skilled software developers and
research engineers at Yahoo Inc. and Hortonworks with years of
experience in their fields. At this moment, developers are
distributed across the U.S. and Israel. The aim is to incorporate more
committers from different organizations and locations over time.


The current set of developers includes experienced committers from
Apache HBase, Hive and Hadoop projects that have been working with us
in the current codebase found in Github.

Finally, some of the core developers are currently NOT affiliated with
the ASF and would require new ICLAs to be filed.


=== Alignment ===
Omid enhances the already successful Apache HBase datastore project
with transactions. We have collaborated with other developers inside
and outside Yahoo who are involved in the Apache HBase community, and
we have received reliable feedback from them.

Although Omid brings value to HBase, the design of the current
version provides a general transaction scheme that can potentially be
adapted to other MVCC key-value datastores such as Apache Cassandra.


Apache Phoenix is also a potential target. Phoenix is a SQL layer on
top of HBase that can potentially integrate Omid in order to provide
the well-known concept of transactions to Phoenix-based applications.


=== Known Risks ===
==== Orphaned products ====
Yahoo’s Research and Search organizations have been taking care of
Omid development since the first prototype was created in 2011. Yahoo
has a long history of participating in open-source projects, and has
also been a long-time contributor to the Apache community. For example,
in Apache, Yahoo is an important contributor to many projects in the
Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
open-sourced other well-known projects outside Hadoop, such as
ZooKeeper and BookKeeper. So it is in Yahoo’s best interest to make
Omid a successful open-source Apache product as well. If this happens,
we are confident that a larger community will form around the project
in a relatively short period of time, contributing to the
diversification and stabilization of the base of committers.


==== Inexperience with Open Source ====
This project has experienced, long-standing mentors and interested
contributors from Apache HBase, Hive and Phoenix to help us move
through the open source process. We are actively working with
experienced Apache community members to improve our project and
its testing.

==== Homogeneous Developers ====
Omid has been supported by Yahoo since its inception in 2011. However,
the current committers are employed by several different companies,
as shown in the Affiliations section.


==== Reliance on Salaried Developers ====

All the current developers are paid by their employers to contribute
to this project. Yahoo developers will also continue maintaining the
internal Omid repository at their company.

Of course, other developers are welcome to contribute to this project
after it is open sourced in Apache.

==== Relationships with Other Apache Products ====

The current incarnation of Omid provides transactional contexts to
applications storing their data in HBase. However, Omid’s design could
potentially be adapted to serve transactions on top of other
MVCC-based key-value datastores in the Apache community, such as
Cassandra.


Many other Apache projects, such as Apache Spark, Apache Phoenix,
Apache Storm and Apache Flink, could potentially benefit from Omid as
a transactional framework. In particular, Apache Phoenix, a SQL layer
on top of HBase, might use Omid as its transaction management
component. Once we open source Omid as an Apache project, we expect to
generate more interest in the surrounding communities.


Very recently, a new incubator proposal for a similar project called
Tephra has been submitted to the ASF. We think this is good for the
Apache community, and we believe that there’s room for both proposals,
as each design is based on different principles (e.g. Omid does not
need to maintain the state of ongoing transactions in the server-side
component) and because both Tephra and Omid have gained some traction
in the open-source community.


With regard to the Apache projects that Omid uses, apart from HBase,
Omid relies on Apache ZooKeeper and Curator in order to coordinate
the (re)connection of transaction managers (acting as clients) to the
conflict resolution component for transactions (server side). They’re
also used to coordinate the master and backup replicas in high
availability scenarios.
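One common way to coordinate a primary and a backup through a service like ZooKeeper is a time-limited lease, and the sketch below illustrates only that general idea. It is an assumption, not a description of Omid's actual high-availability algorithm (the high-availability blog post cited in the Background section covers that). The in-memory `Registry` class is a hypothetical stand-in for ZooKeeper/Curator and does not use their APIs; the replica names and the explicit `now` argument are also hypothetical, chosen so the example runs deterministically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # replica currently acting as primary
    epoch: int         # grows each time the lease is granted afresh
    expires_at: float


class Registry:
    """In-memory stand-in for a coordination service (hypothetical, not ZooKeeper's API)."""

    def __init__(self, lease_duration: float) -> None:
        self.lease_duration = lease_duration
        self.lease: Optional[Lease] = None

    def try_acquire(self, replica: str, now: float) -> bool:
        """Become primary if the lease is free or expired; renew it if already held."""
        current = self.lease
        if current is None or current.expires_at <= now:
            epoch = 1 if current is None else current.epoch + 1
            self.lease = Lease(replica, epoch, now + self.lease_duration)
            return True
        if current.holder == replica:
            current.expires_at = now + self.lease_duration  # renewal
            return True
        return False  # another replica holds a live lease: stay backup

    def primary(self, now: float) -> Optional[str]:
        """Replica that (re)connecting clients should talk to, if any."""
        if self.lease is not None and self.lease.expires_at > now:
            return self.lease.holder
        return None


reg = Registry(lease_duration=10.0)
reg.try_acquire("tso-a", now=0.0)   # tso-a becomes primary
reg.try_acquire("tso-b", now=1.0)   # tso-b stays backup
# tso-a crashes and stops renewing; after expiry the backup takes over.
reg.try_acquire("tso-b", now=11.0)
print(reg.primary(now=11.0))        # tso-b
```

Clients that lose their connection would ask the registry for the current primary and reconnect there; the epoch is one way a real system could recognize and reject requests from a stale primary.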


==== An Excessive Fascination with the Apache Brand ====

We are applying to the Incubator process because we think that it is
the logical next step for the Omid project after we open-sourced the
code in Github some years ago. Yahoo has a long-standing history of
contributing to Apache projects. The developers and contributors
understand the implications of making it an Apache project, and
strongly believe that the growing community can benefit from the
Apache environment, ecosystem, and infrastructure.


=== Documentation ===
Current documentation about the project is available in the wiki of
Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
be moved under https://omid.incubator.apache.org/docs if the project
is accepted into the Apache Incubator.

=== Initial Source ===
Initial source code is currently hosted in Github for general viewing
and contribution:

https://github.com/yahoo/omid.git


Omid source code is written in Java (99%), with some shell
scripts (1%) used to configure and launch the main
components.


The code will be moved to Apache http://git.apache.org/ if accepted as
an Incubator project.

=== Source and Intellectual Property Submission Plan ===

The Omid code currently published on GitHub is licensed under Apache
2.0. If Omid is accepted as an Incubator project in the ASF, the
source code will be transitioned via the Software Grant Agreement
onto the ASF infrastructure and in turn made available under the
Apache License, version 2.0.

=== External Dependencies ===


The required external dependencies that are not Apache projects are
all under the Apache License or other compatible licenses:

Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]

JDK7 or OpenJDK 7 (http://java.com/) [Oracle or OpenJDK JDK License]

Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]

Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]

TestNG v6.8.8 (http://testng.org) [Apache 2.0]

SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]

Netty (http://netty.io) v3.2.6.Final [Apache 2.0]

Google Protocol Buffers v2.5.0
(https://developers.google.com/protocol-buffers/) [BSD License]

Mockito (http://mockito.org/) v1.9.5 [MIT License]

LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/) [Apache 2.0]

Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
(http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]

C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]

Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]


=== Cryptography ===
The Omid project does not use cryptography itself. However, Apache HBase
(the datastore on top of which Omid works in its current version) uses
standard APIs and tools for SSH and SSL communication where necessary.

=== Required Resources ===
We request that the following resources be created for the project to use:

==== Mailing lists ====

omid-private (moderated subscriptions)

omid-commits (commit notification)
omid-dev (technical discussions)

==== Git repository ====
https://github.com/apache/incubator-omid

==== Documentation ====
https://omid.incubator.apache.org/docs/

==== JIRA instance ====
https://issues.apache.org/jira/browse/omid

=== Initial Committers ===

* Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)


* Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)


* Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)


* Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)


* Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)


* Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)

* Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)


* Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)


* Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)


* Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)

* James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)


=== Additional Interested Contributors ===
* Ivan Kelly (ivank<AT>apache<DOT>org)

* Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)


=== Affiliations ===

* Edward Bortnikov, Yahoo Inc.


* Daniel Dai, Hortonworks


* Flavio P. Junqueira, Confluent


* Igor Katkov, Yahoo Inc.


* Ivan Kelly, Midokura


* Francis C. Liu, Yahoo Inc.


* Sameer Paranjpye, Arimo

* Francisco Perez-Sorrosal, Yahoo Inc.


* Ohad Shacham, Yahoo Inc.


* Maysam Yabandeh, Dropbox Inc.


=== Sponsors ===

==== Champion ====

Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)

==== Nominated Mentors ====

Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)

Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)

Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)

Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)

James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)


==== Sponsoring Entity ====
Apache Incubator PMC

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Chris Nauroth <cn...@hortonworks.com>.
+1 (binding)

--Chris Nauroth




On 3/17/16, 1:17 PM, "Daniel Dai" <da...@gmail.com> wrote:

>Hi,
>
>I would like to propose Omid as an Apache Incubator project:
>
>https://wiki.apache.org/incubator/OmidProposal
>
>I've posted posted the text of the proposal below:
>
>Thanks,
>Daniel
>
>= Omid Proposal =
>
>=== Abstract ===
>
>Omid is a flexible, reliable, high performant and scalable ACID
>transactional framework that allows client applications to execute
>transactions on top of MVCC key/value-based NoSQL datastores
>(currently Apache HBase) providing Snapshot Isolation guarantees on
>the accessed data.
>
>
>=== Proposal ===
>
>Omid is a flexible open-source transactional framework that provides
>ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>datastores. In particular, the current codebase brings the concept of
>transactions to the popular Apache HBase datastore. Omid offers great
>performance, it is highly available, and scalable. Omid's current
>version is able to scale to thousands of clients triggering concurrent
>transactions on application data stored in HBase. Omid can scale
>beyond 100K transactions per second on mid-range hardware while
>incurring in a minimal impact on the speed of data access in the
>datastore. We¹re currently experimenting with a prototype version that
>can improve the performance up to ~380K TPS.
>
>
>Omid has been publicly available as an open-source project in Github
>under Apache License Version 2.0 since 2011 [1]. During these years,
>it has generated certain interest in the open source community,
>especially since the public presentation of the first version in
>Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and
>93 forks. Yahoo Inc. submits this proposal to the Apache Software
>Foundation with the aim to transfer the Omid project -including its
>source code and documentation- to Apache in order to start the build
>of a stable open source community around it.
>
>
>[1] https://github.com/yahoo/omid
>
>[2] Omid presentation at Hadoop Summit 2013:
>https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyq
>LU464Nxz4aQe7EPBus
>
>
>=== Background ===
>
>An Omid prototype was first released as an open-source project back in
>2011. Inspired by Google Percolator [1], it offered a lock-free
>approach to transactions in NoSQL datastores (See [2]). However,
>during these years, the design of Omid has evolved significantly.
>Whilst the current open-sourced version maintains many aspects of the
>original implementation, it is the result of a major redesign of the
>first prototype released in 2011.
>
>
>Omid has now a more decentralized design that does not sacrifice the
>consistency and performance of the original version. The current
>design also enables Omid to scale to thousands of clients executing
>transactions concurrently on application data stored in HBase.
>Internally, Omid still utilizes a lock-free approach to support
>multiple concurrent clients. Its design also relies on a centralized
>conflict detection component, the TSO, which now resolves in an
>efficient manner writeset collisions among concurrent transactions
>without having to piggyback commit information to the clients. Another
>important benefit of Omid is that it doesn't require any modification
>of the underlying key-value datastore, HBase in this case. Moreover,
>the recently added high availability algorithm allows to eliminate the
>single point of failure represented by the TSO in those system
>deployments requiring a higher degree of dependability. Last but not
>least, the provided user API is very simple, mimicking transaction
>managers in the relational world: begin, commit, rollback.
>
>
>Omid is used internally at Yahoo. Sieve, Yahoo¹s web-scale content
>management platform powering some of next-generation search and
>personalization products is using Omid as a transaction manager in its
>processing pipeline. Sieve essentially acts as a huge processing hub
>between content feeds and serving systems. It provides an environment
>for highly customizable, real-time, streamed information processing,
>with typical discovery-to-service latencies of just a few seconds. In
>terms of scale and availability, Omid¹s new design was largely driven
>by Sieve¹s requirements.
>
>
>At Yahoo, we are also making an effort to disseminate the current
>status of the project through blog entries (See [3], [4] and [5]) and
>submissions to technical and academic conferences such as ATC 2016,
>Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also
>appeared in a TechCrunch article in the last quarter of 2015 (See [6])
>
>
>[1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>Distributed Transactions and Notifications. USENIX Symposium on
>Operating Systems Design and Implementation, 2010
>
>[2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>Omid: Lock-free transactional support for distributed data stores. In
>Proc. of ICDE, 2013.
>
>[3] 
>http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transacti
>on-processing-for
>
>[4] 
>http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-prot
>ocol
>
>[5] 
>http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>
>[6] 
>http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-sc
>alable-transaction-processing-to-hbase/
>
>
>=== Rationale ===
>
>Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>transactions is very popular and it is featured in relational
>databases. However, in the Big Data ecosystem, applications typically
>use NoSQL datastores, which do not provide ACID transactions. Such
>NoSQL datastores used to give up transactional support for greater
>agility and scalability. However, while early NoSQL data store
>implementations did not include transaction support, the need for
>transactions soon emerged in Big Data applications when accessing
>shared data; for  example, transactions are very important  for
>modern, scalable systems that process content incrementally.
>
>
>NoSQL datastores -including HBase- don¹t provide transactional
>frameworks to coordinate the access to the underlying data for
>preserving consistency. By using Omid, Big Data applications that need
>to bundle multiple read and write operations on HBase into logically
>indivisible units of work can execute transactions with ACID
>properties, just as they would use transactions in the relational
>database world. Omid extends the HBase key-value access APl with
>transaction semantics. It can be exercised either directly, or via
>higher level data management API¹s. For example, Apache Phoenix
>(SQL-on-top-of-HBase) might use Omid as its transaction management
>component.
>
>
>The following features make Omid an attractive choice for system
>designers and other projects in the Apache community:
>
>
>* Semantics. Omid implements Snapshot Isolation (SI,) supported by
>major SQL and NoSQL technologies (e.g. Google Percolator).
>
>
>* Performance and Scalability. Omid  provides a highly scalable,
>lock-free implementation of SI. To the best of our knowledge, it is
>also one of the few open source NoSQL transactional platforms that can
>execute more than 100K transactions per second [1]. A new prototype
>still in development can go even further, up to ~380K TPS.
>
>
>* Reliability.  Omid has a high-availability (HA) mode, in which the
>core service performing writeset conflict resolution operates as
>primary-backup process pair with automatic failover. The HA support
>has zero overhead on the mainstream operation.
>
>
>* Adaptability. Omid current version provides transactions on data
>stored in Apache HBase. However, Omid¹s components are generic enough
>to be adapted to any other key-value NoSQL datasource that supports
>MVCC.
>
>
>* Development. Omid provides a very simple interface that mimics
>standard HBase APIs, making it developer friendly. Only minimal
>extensions to the standard interfaces have been introduced to enable
>transactions.
>
>
>* Simplicity. Omid leverages the HBase infrastructure for managing its
>own metadata. It entails no additional services apart from those
>provided and used by HBase.
>
>
>* Track Record. As we have mentioned, Omid is already in use by
>very-large-scale production systems at Yahoo. Also, Hortonworks is
>integrating Omid in a metastore implementation for Hive based on
>HBase.
>
>[1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>
>
>=== Current Status ===
>Current Omid implementation is available in both, Yahoo¹s internal
>Github repository for internal use at Yahoo as well as in Yahoo¹s
>Github public repository (https://github.com/yahoo/omid.git). Both
>repositories are managed by Omid¹s current developers at Yahoo.
>
>As it is mentioned above, Yahoo is currently using Omid for providing
>transactions in Sieve, a web-scale content management platform that
>powers Yahoo¹s next-generation search and personalization products.
>
>
>==== Meritocracy ====
>The first version of Omid was originally created in 2011 by Maysam
>Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>
>
>During the years after its inception, Omid has matured to operate at
>Web scale and has been used internally by strategic projects at Yahoo
>such as Sieve. The current base of committers belong to the Yahoo team
>that took over the initial Omid prototype and rewrote it to meet the
>high availability and scalability requirements of the Sieve project.
>This base of committers has recently incorporated Hortonworks members
>that helped in the Omid adaptation to HBase 1.x versions.
>
>
>With this initial committer base, we aim to form a larger community
>that can collaborate with new ideas over the current code base. This
>new community will run the project following the "Apache Way"
>(http://apache.org/foundation/governance/). Users and new contributors
>will be treated with respect and welcomed. To grow the community, we
>will encourage contributors to provide patches, review code, propose
>new features improvements, talk at conferences such as Hadoop Summit,
>HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>offered according to meritocracy.
>
>==== Community ====
>
>The public Yahoo Omid repository at Github currently has 241 Stars and
>93 forks, which means that there is an important interest for the
>project in the open-source community, at least compared with other
>similar projects (See https://github.com/yahoo/omid.git).
>
>
>Recently, Hortonworks contributors to the Apache Hive project which
>are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
>manifested interest in using Omid. We started with them a fruitful
>collaboration that resulted in Omid supporting HBase 1.x versions.
>
>
>Salesforce is also interested in collaborating in doing a Proof of
>Concept for integrating Omid as a pluggable transaction manager in
>Apache Phoenix.
>
>
>Yahoo, Hortonworks and Salesforce participants will constitute the
>initial set of committers and mentors for the proposal.
>
>==== Core Developers ====
>The core developers of Omid are all skilled software developers and
>research engineers at Yahoo Inc. and Hortonworks with years of
>experience in their fields. At this moment, developers are
>distributed across the U.S. and Israel. The aim is to incorporate more
>committers from different organizations and locations over time.
>
>
>The current set of developers includes experienced committers from
>Apache HBase, Hive and Hadoop projects that have been working with us
>in the current codebase found in Github.
>
>Finally, some of the core developers are currently NOT affiliated with
>the ASF and would require new ICLAs to be filed.
>
>
>=== Alignment ===
>Omid enhances the already successful Apache HBase datastore project
>with transactions. We have collaborated with other developers inside
>and outside Yahoo who are involved in the Apache HBase community, so
>we have had reliable feedback from them.
>
>Although Omid brings value to HBase, the design of the current
>version provides a general transaction scheme that can potentially be
>adapted to other MVCC key-value datastores such as Apache Cassandra.
>
>
>Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>top of HBase that can potentially integrate Omid in order to provide
>the well-known concept of transactions to Phoenix-based applications.
>
>
>=== Known Risks ===
>==== Orphaned products ====
>Yahoo’s Research and Search organizations have been taking care of
>Omid development since the first prototype was created in 2011. Yahoo
>has a long history of participating in open-source projects, and has
>also been a long-time contributor to the Apache community. For example,
>in Apache, Yahoo is an important contributor to many projects in the
>Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>open-sourced other well-known projects outside Hadoop, such as
>ZooKeeper or BookKeeper. So it is in Yahoo’s best interest to make
>Omid a successful open-source Apache product as well. If this happens,
>we are sure that a larger community will form around the project in
>a relatively short period of time, contributing to the diversification
>and stabilization of the base of committers.
>
>
>==== Inexperience with Open Source ====
>This project has long-standing, experienced mentors and interested
>contributors from Apache HBase, Hive and Phoenix to help us move
>through the open source process. We are actively working with
>experienced Apache community members to improve our project and
>extend its testing.
>
>==== Homogeneous Developers ====
>Omid has been supported by Yahoo since its inception in 2011. However,
>the current committers are employed by several different companies, as
>shown in the Affiliations section.
>
>
>==== Reliance on Salaried Developers ====
>
>All the current developers are paid by their employers to contribute
>to this project. Yahoo developers will also continue maintaining the
>internal Omid repository at their company.
>
>Of course, other developers are welcome to contribute to this project
>after it is open sourced in Apache.
>
>==== Relationships with Other Apache Products ====
>
>The current Omid incarnation serves transactional contexts to
>applications storing their data in HBase. However, Omid’s design could
>potentially be adapted to serve transactions on top of other MVCC-based
>key-value datastores in the Apache community, such as Cassandra.
>
>
>Many other Apache projects, such as Apache Spark, Apache Phoenix,
>Apache Storm and Apache Flink, could potentially benefit from Omid as
>a transactional framework to get transactional contexts. In
>particular, Apache Phoenix -a SQL layer on top of HBase- might use
>Omid as its transaction management component. Once we open source Omid
>as an Apache project, we expect to generate more interest in the
>surrounding communities.
>
>
>Very recently, a new incubator proposal for a similar project called
>Tephra has been submitted to the ASF. We think this is good for the
>Apache community, and we believe that there’s room for both proposals,
>because each design is based on different principles (e.g. Omid does
>not require its server-side component to maintain the state of ongoing
>transactions) and because both Tephra and Omid have gained some
>traction in the open-source community.
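To make the design difference above concrete, here is a minimal Java sketch of commit-time write-write conflict detection under Snapshot Isolation in which the server keeps no record of in-flight transactions, only a logical clock and the last commit timestamp of each row. It is an illustration under stated assumptions, not Omid's TSO code: the names `SimpleConflictDetector`, `begin` and `tryCommit` are hypothetical, and the sketch runs in a single process and ignores persistence, high availability and bounding the per-row map.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical, simplified timestamp oracle with Snapshot Isolation
 * write-write conflict detection. It keeps no per-transaction state: only
 * a logical clock and, for each row, the commit timestamp of the last
 * transaction that wrote it.
 */
class SimpleConflictDetector {
    private long clock = 0;
    private final Map<String, Long> lastCommitByRow = new HashMap<>();

    /** Starts a transaction by handing out a fresh start timestamp. */
    synchronized long begin() {
        return ++clock;
    }

    /**
     * Commits a transaction that started at startTs and wrote writeSet.
     * Returns the commit timestamp, or empty if some row in writeSet was
     * committed by another transaction after startTs (first committer wins).
     */
    synchronized OptionalLong tryCommit(long startTs, Collection<String> writeSet) {
        for (String row : writeSet) {
            Long last = lastCommitByRow.get(row);
            if (last != null && last > startTs) {
                return OptionalLong.empty(); // write-write conflict: abort
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            lastCommitByRow.put(row, commitTs);
        }
        return OptionalLong.of(commitTs);
    }
}

class ConflictDetectorDemo {
    public static void main(String[] args) {
        SimpleConflictDetector tso = new SimpleConflictDetector();
        long t1 = tso.begin();
        long t2 = tso.begin(); // t2's snapshot does not include t1's writes
        System.out.println(tso.tryCommit(t1, Arrays.asList("row1")).isPresent());         // true
        System.out.println(tso.tryCommit(t2, Arrays.asList("row1", "row2")).isPresent()); // false
    }
}
```

Because the detector only remembers the latest commit timestamp per row, a client that crashes mid-transaction leaves nothing on the server side that must be cleaned up; the trade-off is that the map grows with the number of distinct rows written, which a real implementation would have to bound.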
>
>
>With regard to the Apache projects that Omid uses, apart from HBase,
>Omid relies on Apache ZooKeeper and Curator in order to coordinate
>the (re)connection of transaction managers (acting as clients) to the
>conflict resolution component for transactions (server side). They’re
>also used to coordinate the master and backup replicas in high
>availability scenarios.
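The coordination role described above can be illustrated with a small, self-contained Java sketch of primary-backup failover driven by a shared registry that clients consult when they (re)connect. The registry is an in-memory stand-in for a ZooKeeper node, and all names (`MasterRegistry`, `tryBecomeMaster`, the `tso-a:1234` addresses) are hypothetical. A real deployment would rely on ZooKeeper's versioned writes and watches, or on Curator recipes, plus failure detection that this sketch omits; it is not how Omid's HA support is actually implemented.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical in-memory stand-in for a ZooKeeper node that records which
 * replica is the current master. Not the ZooKeeper or Curator API.
 */
final class MasterRegistry {
    /** Immutable view of the current master and the epoch it started. */
    static final class MasterInfo {
        final String address;
        final long epoch;

        MasterInfo(String address, long epoch) {
            this.address = address;
            this.epoch = epoch;
        }
    }

    private final AtomicReference<MasterInfo> current = new AtomicReference<>();

    /**
     * A replica claims mastership only if the registered master is still the
     * one it last observed (null if none). The compare-and-set mimics a
     * conditional, versioned write, so two replicas cannot both win.
     * AtomicReference compares by identity, so pass the exact object
     * returned by currentMaster().
     */
    boolean tryBecomeMaster(MasterInfo observed, String myAddress) {
        long nextEpoch = (observed == null) ? 1 : observed.epoch + 1;
        return current.compareAndSet(observed, new MasterInfo(myAddress, nextEpoch));
    }

    /** Clients read this at startup and again whenever their connection drops. */
    MasterInfo currentMaster() {
        return current.get();
    }
}

class FailoverDemo {
    public static void main(String[] args) {
        MasterRegistry registry = new MasterRegistry();
        registry.tryBecomeMaster(null, "tso-a:1234"); // primary registers itself
        MasterRegistry.MasterInfo seen = registry.currentMaster();

        boolean backupWon = registry.tryBecomeMaster(seen, "tso-b:1234"); // failover
        boolean staleWon = registry.tryBecomeMaster(seen, "tso-a:1234");  // stale view loses

        MasterRegistry.MasterInfo now = registry.currentMaster();
        // Prints: tso-b:1234 epoch=2 backupWon=true staleWon=false
        System.out.println(now.address + " epoch=" + now.epoch
                + " backupWon=" + backupWon + " staleWon=" + staleWon);
    }
}
```

Tagging each master with an increasing epoch is one common way to let clients, and a recovering old primary, notice that a failover has happened.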
>
>
>==== An Excessive Fascination with the Apache Brand ====
>
>We are applying to the Incubator process because we think that it is
>the logical next step for the Omid project after we open-sourced the
>code in Github some years ago. Yahoo has a long-standing history of
>contributing to Apache projects. The developers and contributors
>understand the implications of making it an Apache project, and
>strongly believe that the growing community can benefit from the
>Apache environment, ecosystem, and infrastructure.
>
>
>=== Documentation ===
>Current documentation about the project is available in the wiki of
>Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
>be moved under https://omid.incubator.apache.org/docs if the project
>is accepted into the Apache Incubator.
>
>=== Initial Source ===
>Initial source code is currently hosted in Github for general viewing
>and contribution:
>
>https://github.com/yahoo/omid.git
>
>
>Omid source code is written in Java (99%) with some shell scripts
>(1%) used to configure and launch the main components.
>
>
>The code will be moved to Apache http://git.apache.org/ if accepted as
>an Incubator project.
>
>=== Source and Intellectual Property Submission Plan ===
>
>The current license for the Omid code published in Github is Apache
>2.0. If Omid meets the conditions for becoming an Incubator
>project in the ASF, the source code will be transitioned via the
>Software Grant Agreement onto the ASF infrastructure and in turn made
>available under the Apache License, version 2.0.
>
>=== External Dependencies ===
>
>
>The required external dependencies that are not Apache projects are
>all under the Apache License or other compatible licenses:
>
>Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>
>JDK7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK License]
>
>Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>
>Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>
>TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>
>SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>
>Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>
>Google Protocol Buffers v2.5.0
>(https://developers.google.com/protocol-buffers/) [BSD License]
>
>Mockito v1.9.5 (http://mockito.org/) [MIT License]
>
>LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>[Apache 2.0]
>
>Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>(http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>
>C. Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>
>Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>
>
>=== Cryptography ===
>The Omid project does not itself use cryptography. However, Apache HBase
>-the datastore on top of which Omid works in its current version- uses
>standard APIs and tools for SSH and SSL communication where necessary.
>
>=== Required Resources ===
>We request that the following resources be created for the project to use:
>
>==== Mailing lists ====
>
>omid-private (moderated subscriptions)
>
>omid-commits (commit notification)
>
>omid-dev (technical discussions)
>
>==== Git repository ====
>https://github.com/apache/incubator-omid
>
>==== Documentation ====
>https://omid.incubator.apache.org/docs/
>
>==== JIRA instance ====
>https://issues.apache.org/jira/browse/omid
>
>=== Initial Committers ===
>
>* Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>
>
>* Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>
>
>* Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>
>
>* Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>
>
>* Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>
>
>* Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>
>* Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>
>
>* Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>
>
>* Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>
>
>* Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>
>* James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>
>
>=== Additional Interested Contributors ===
>* Ivan Kelly (ivank<AT>apache<DOT>org)
>
>* Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>
>
>=== Affiliations ===
>
>* Edward Bortnikov, Yahoo Inc.
>
>
>* Daniel Dai, Hortonworks
>
>
>* Flavio P. Junqueira, Confluent
>
>
>* Igor Katkov, Yahoo Inc.
>
>
>* Ivan Kelly, Midokura
>
>
>* Francis C. Liu, Yahoo Inc.
>
>
>* Sameer Paranjpye, Arimo
>
>* Francisco Perez-Sorrosal, Yahoo Inc.
>
>
>* Ohad Shacham, Yahoo Inc.
>
>
>* Maysam Yabandeh, Dropbox Inc.
>
>
>=== Sponsors ===
>
>==== Champion ====
>
>Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>
>==== Nominated Mentors ====
>
>Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>
>Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>
>Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>
>Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>
>James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>
>
>==== Sponsoring Entity ====
>Apache Incubator PMC
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>For additional commands, e-mail: general-help@incubator.apache.org
>

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Daniel Dai <da...@gmail.com>.
Thanks for the comments. Though there are already lots of +1s, this
thread is not intended for voting. I will send out the voting email
shortly.

Daniel

On Mon, Mar 21, 2016 at 1:08 PM, James Taylor <ja...@apache.org> wrote:
> That'd be great to allow different transaction frameworks to plug into
> Phoenix. I suspect that transactions are in the same boat as secondary
> indexing, where a one-size-fits-all approach is not feasible across the
> variety of use cases we see. Having a pluggable mechanism would be a good
> solution. I've filed PHOENIX-2788 [1] for this work. Though it of course
> helps that a transaction layer works with HBase, much of the integration
> work is at the Phoenix level. To get an idea, see [2]. There are several
> features missing in HBase that would be precursors to HBASE-11447 IMHO,
> namely support for undo of a Delete [3] and finer timestamp granularity for
> Cells [4].
>
>     James
>
> [1] https://issues.apache.org/jira/browse/PHOENIX-2788
> [2] https://github.com/apache/phoenix/pull/133
> [3] https://issues.apache.org/jira/browse/HBASE-11292
> [4] https://issues.apache.org/jira/browse/HBASE-8927
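The pluggable approach described in this message can be sketched as a small provider interface that the SQL layer programs against, with each transaction framework (Omid, Tephra, ...) supplying an implementation selected by name. Everything below (`TransactionProvider`, `Txn`, `TransactionProviders`, `CountingProvider`) is hypothetical and illustrative; it is neither Phoenix's API nor the design tracked in PHOENIX-2788, and the toy provider does no real transactional work.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handle for one transaction, independent of the framework behind it. */
interface Txn {
    long startTimestamp();
    void commit();
    void rollback();
}

/** Hypothetical plug-in point: each transaction framework supplies one of these. */
interface TransactionProvider {
    String name();
    Txn begin();
}

/** Chooses a provider by name, e.g. from a table or connection property. */
final class TransactionProviders {
    private final Map<String, TransactionProvider> byName = new HashMap<>();

    void register(TransactionProvider provider) {
        byName.put(provider.name(), provider);
    }

    TransactionProvider get(String name) {
        TransactionProvider p = byName.get(name);
        if (p == null) {
            throw new IllegalArgumentException("No transaction provider named " + name);
        }
        return p;
    }
}

/** Toy provider used only to exercise the plumbing; it just counts calls. */
final class CountingProvider implements TransactionProvider {
    private final String name;
    private long nextTs = 0;
    int commits = 0;
    int rollbacks = 0;

    CountingProvider(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Txn begin() {
        final long ts = ++nextTs;
        return new Txn() {
            @Override public long startTimestamp() { return ts; }
            @Override public void commit() { commits++; }
            @Override public void rollback() { rollbacks++; }
        };
    }
}

class ProviderDemo {
    public static void main(String[] args) {
        TransactionProviders providers = new TransactionProviders();
        CountingProvider omid = new CountingProvider("omid");
        providers.register(omid);
        providers.register(new CountingProvider("tephra"));

        // The framework name would come from configuration rather than code.
        Txn txn = providers.get("omid").begin();
        // ... reads and writes through the chosen framework would go here ...
        txn.commit();
        System.out.println("omid commits: " + omid.commits); // omid commits: 1
    }
}
```

Keeping the SQL layer dependent only on the interface is what lets different frameworks be swapped per use case; the harder integration work James mentions (undoing Deletes, timestamp granularity) would sit behind such an interface.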
>
>
> On Mon, Mar 21, 2016 at 12:48 PM, Henry Saputra <he...@gmail.com> wrote:
>
>> Hi Pierre,
>>
>> Thanks for your reply. Yes, I remember Trafodion, but since it is a more
>> complete SQL + transaction solution, I did not mention it as a
>> comparison.
>>
>> But the comment is valid: there is already precedent for transaction
>> support for NoSQL in Apache, so there is no reason to "reject" these
>> proposals or to require immediate consolidation of such projects in the
>> Incubator.
>>
>>
>> - Henry
>>
>> On Sun, Mar 20, 2016 at 1:34 PM, Pierre Smits <pi...@gmail.com> wrote:
>>
>> > Hi Henry,
>> >
>> > It seems you (and several others) are forgetting Trafodion, which also
>> > provides transactions on N*SQL solutions; see http://trafodion.apache.org
>> >
>> > Best regards,
>> >
>> > Pierre Smits
>> >
>> > ORRTIZ.COM <http://www.orrtiz.com>
>> > OFBiz based solutions & services
>> >
>> > OFBiz Extensions Marketplace
>> > http://oem.ofbizci.net/oci-2/
>> >
>> > On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>> >
>> > > I know the Apache Incubator does not play favorites, but it is getting
>> > > awkward that TWO transaction engines for HBase are coming to the
>> > > Incubator at the same time.
>> > >
>> > > As most people know, the other one is Tephra, which came to the
>> > > Incubator just a few weeks ago.
>> > >
>> > > As a member of the IPMC, I would like to see Omid provide a more
>> > > detailed comparison of the differences the project brings, in terms of
>> > > approach and possible integrations with other ASF projects.
>> > >
>> > > If possible, I would prefer to see the Omid team work together with
>> > > Tephra to build one solid transaction engine for HBase and, later,
>> > > other NoSQL databases.
>> > >
>> > >
>> > > - Henry
>> > >
>> > > On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I would like to propose Omid as an Apache Incubator project:
>> > > >
>> > > > https://wiki.apache.org/incubator/OmidProposal
>> > > >
>> > > > I've posted posted the text of the proposal below:
>> > > >
>> > > > Thanks,
>> > > > Daniel
>> > > >
>> > > > = Omid Proposal =
>> > > >
>> > > > === Abstract ===
>> > > >
>> > > > Omid is a flexible, reliable, high performant and scalable ACID
>> > > > transactional framework that allows client applications to execute
>> > > > transactions on top of MVCC key/value-based NoSQL datastores
>> > > > (currently Apache HBase) providing Snapshot Isolation guarantees on
>> > > > the accessed data.
>> > > >
>> > > >
>> > > > === Proposal ===
>> > > >
>> > > > Omid is a flexible open-source transactional framework that provides
>> > > > ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>> > > > datastores. In particular, the current codebase brings the concept of
>> > > > transactions to the popular Apache HBase datastore. Omid offers great
>> > > > performance, it is highly available, and scalable. Omid's current
>> > > > version is able to scale to thousands of clients triggering
>> concurrent
>> > > > transactions on application data stored in HBase. Omid can scale
>> > > > beyond 100K transactions per second on mid-range hardware while
>> > > > incurring in a minimal impact on the speed of data access in the
>> > > > datastore. We’re currently experimenting with a prototype version
>> that
>> > > > can improve the performance up to ~380K TPS.
>> > > >
>> > > >
>> > > > Omid has been publicly available as an open-source project in Github
>> > > > under Apache License Version 2.0 since 2011 [1]. During these years,
>> > > > it has generated certain interest in the open source community,
>> > > > especially since the public presentation of the first version in
>> > > > Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars
>> and
>> > > > 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>> > > > Foundation with the aim to transfer the Omid project -including its
>> > > > source code and documentation- to Apache in order to start the build
>> > > > of a stable open source community around it.
>> > > >
>> > > >
>> > > > [1] https://github.com/yahoo/omid
>> > > >
>> > > > [2] Omid presentation at Hadoop Summit 2013:
>> > > >
>> > > >
>> > >
>> >
>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>> > > >
>> > > >
>> > > > === Background ===
>> > > >
>> > > > An Omid prototype was first released as an open-source project back
>> in
>> > > > 2011. Inspired by Google Percolator [1], it offered a lock-free
>> > > > approach to transactions in NoSQL datastores (See [2]). However,
>> > > > during these years, the design of Omid has evolved significantly.
>> > > > Whilst the current open-sourced version maintains many aspects of the
>> > > > original implementation, it is the result of a major redesign of the
>> > > > first prototype released in 2011.
>> > > >
>> > > >
>> > > > Omid has now a more decentralized design that does not sacrifice the
>> > > > consistency and performance of the original version. The current
>> > > > design also enables Omid to scale to thousands of clients executing
>> > > > transactions concurrently on application data stored in HBase.
>> > > > Internally, Omid still utilizes a lock-free approach to support
>> > > > multiple concurrent clients. Its design also relies on a centralized
>> > > > conflict detection component, the TSO, which now resolves in an
>> > > > efficient manner writeset collisions among concurrent transactions
>> > > > without having to piggyback commit information to the clients.
>> Another
>> > > > important benefit of Omid is that it doesn't require any modification
>> > > > of the underlying key-value datastore, HBase in this case. Moreover,
>> > > > the recently added high availability algorithm allows to eliminate
>> the
>> > > > single point of failure represented by the TSO in those system
>> > > > deployments requiring a higher degree of dependability. Last but not
>> > > > least, the provided user API is very simple, mimicking transaction
>> > > > managers in the relational world: begin, commit, rollback.
>> > > >
>> > > >
>> > > > Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
>> > > > management platform powering some of next-generation search and
>> > > > personalization products is using Omid as a transaction manager in
>> its
>> > > > processing pipeline. Sieve essentially acts as a huge processing hub
>> > > > between content feeds and serving systems. It provides an environment
>> > > > for highly customizable, real-time, streamed information processing,
>> > > > with typical discovery-to-service latencies of just a few seconds. In
>> > > > terms of scale and availability, Omid’s new design was largely driven
>> > > > by Sieve’s requirements.
>> > > >
>> > > >
>> > > > At Yahoo, we are also making an effort to disseminate the current
>> > > > status of the project through blog entries (See [3], [4] and [5]) and
>> > > > submissions to technical and academic conferences such as ATC 2016,
>> > > > Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also
>> > > > appeared in a TechCrunch article in the last quarter of 2015 (See
>> [6])
>> > > >
>> > > >
>> > > > [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>> > > > Distributed Transactions and Notifications. USENIX Symposium on
>> > > > Operating Systems Design and Implementation, 2010
>> > > >
>> > > > [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>> > > > Omid: Lock-free transactional support for distributed data stores. In
>> > > > Proc. of ICDE, 2013.
>> > > >
>> > > > [3]
>> > > >
>> > >
>> >
>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>> > > >
>> > > > [4]
>> > > >
>> > >
>> >
>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>> > > >
>> > > > [5]
>> > > >
>> > >
>> >
>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>> > > >
>> > > > [6]
>> > > >
>> > >
>> >
>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>> > > >
>> > > >
>> > > > === Rationale ===
>> > > >
>> > > > Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>> > > > transactions is very popular and it is featured in relational
>> > > > databases. However, in the Big Data ecosystem, applications typically
>> > > > use NoSQL datastores, which do not provide ACID transactions. Such
>> > > > NoSQL datastores used to give up transactional support for greater
>> > > > agility and scalability. However, while early NoSQL data store
>> > > > implementations did not include transaction support, the need for
>> > > > transactions soon emerged in Big Data applications when accessing
>> > > > shared data; for  example, transactions are very important  for
>> > > > modern, scalable systems that process content incrementally.
>> > > >
>> > > >
>> > > > NoSQL datastores -including HBase- don’t provide transactional
>> > > > frameworks to coordinate the access to the underlying data for
>> > > > preserving consistency. By using Omid, Big Data applications that
>> need
>> > > > to bundle multiple read and write operations on HBase into logically
>> > > > indivisible units of work can execute transactions with ACID
>> > > > properties, just as they would use transactions in the relational
>> > > > database world. Omid extends the HBase key-value access APl with
>> > > > transaction semantics. It can be exercised either directly, or via
>> > > > higher level data management API’s. For example, Apache Phoenix
>> > > > (SQL-on-top-of-HBase) might use Omid as its transaction management
>> > > > component.
>> > > >
>> > > >
>> > > > The following features make Omid an attractive choice for system
>> > > > designers and other projects in the Apache community:
>> > > >
>> > > >
>> > > > * Semantics. Omid implements Snapshot Isolation (SI,) supported by
>> > > > major SQL and NoSQL technologies (e.g. Google Percolator).
>> > > >
>> > > >
>> > > > * Performance and Scalability. Omid  provides a highly scalable,
>> > > > lock-free implementation of SI. To the best of our knowledge, it is
>> > > > also one of the few open source NoSQL transactional platforms that
>> can
>> > > > execute more than 100K transactions per second [1]. A new prototype
>> > > > still in development can go even further, up to ~380K TPS.
>> > > >
>> > > >
>> > > > * Reliability.  Omid has a high-availability (HA) mode, in which the
>> > > > core service performing writeset conflict resolution operates as
>> > > > primary-backup process pair with automatic failover. The HA support
>> > > > has zero overhead on the mainstream operation.
>> > > >
>> > > >
>> > > > * Adaptability. Omid current version provides transactions on data
>> > > > stored in Apache HBase. However, Omid’s components are generic enough
>> > > > to be adapted to any other key-value NoSQL datasource that supports
>> > > > MVCC.
>> > > >
>> > > >
>> > > > * Development. Omid provides a very simple interface that mimics
>> > > > standard HBase APIs, making it developer friendly. Only minimal
>> > > > extensions to the standard interfaces have been introduced to enable
>> > > > transactions.
>> > > >
>> > > >
>> > > > * Simplicity. Omid leverages the HBase infrastructure for managing
>> its
>> > > > own metadata. It entails no additional services apart from those
>> > > > provided and used by HBase.
>> > > >
>> > > >
>> > > > * Track Record. As we have mentioned, Omid is already in use by
>> > > > very-large-scale production systems at Yahoo. Also, Hortonworks is
>> > > > integrating Omid in a metastore implementation for Hive based on
>> > > > HBase.
>> > > >
>> > > > [1] See also Haeinsa:
>> https://github.com/vcnc/haeinsa/wiki/Performance
>> > > >
>> > > >
>> > > > === Current Status ===
>> > > > Current Omid implementation is available in both, Yahoo’s internal
>> > > > Github repository for internal use at Yahoo as well as in Yahoo’s
>> > > > Github public repository (https://github.com/yahoo/omid.git). Both
>> > > > repositories are managed by Omid’s current developers at Yahoo.
>> > > >
>> > > > As it is mentioned above, Yahoo is currently using Omid for providing
>> > > > transactions in Sieve, a web-scale content management platform that
>> > > > powers Yahoo’s next-generation search and personalization products.
>> > > >
>> > > >
>> > > > ==== Meritocracy ====
>> > > > The first version of Omid was originally created in 2011 by Maysam
>> > > > Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>> > > > Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>> > > >
>> > > >
>> > > > During the years after its inception, Omid has matured to operate at
>> > > > Web scale and has been used internally by strategic projects at Yahoo
>> > > > such as Sieve. The current base of committers belong to the Yahoo
>> team
>> > > > that took over the initial Omid prototype and rewrote it to meet the
>> > > > high availability and scalability requirements of the Sieve project.
>> > > > This base of committers has recently incorporated Hortonworks members
>> > > > that helped in the Omid adaptation to HBase 1.x versions.
>> > > >
>> > > >
>> > > > With this initial committer base, we aim to form a larger community
>> > > > that can collaborate with new ideas over the current code base. This
>> > > > new community will run the project following the "Apache Way"
>> > > > (http://apache.org/foundation/governance/). Users and new
>> contributors
>> > > > will be treated with respect and welcomed. To grow the community, we
>> > > > will encourage contributors to provide patches, review code, propose
>> > > > new features improvements, talk at conferences such as Hadoop Summit,
>> > > > HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>> > > > offered according to meritocracy.
>> > > >
>> > > > ==== Community ====
>> > > >
>> > > > The public Yahoo Omid repository at Github currently has 241 Stars
>> and
>> > > > 93 forks, which means that there is an important interest for the
>> > > > project in the open-source community, at least compared with other
>> > > > similar projects (See https://github.com/yahoo/omid.git).
>> > > >
>> > > >
>> > > > Recently, Hortonworks contributors to the Apache Hive project which
>> > > > are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
>> > > > manifested interest in using Omid. We started with them a fruitful
>> > > > collaboration that resulted in Omid supporting HBase 1.x versions.
>> > > >
>> > > >
>> > > > Salesforce is also interested in collaborating in doing a Proof of
>> > > > Concept for integrating Omid as a pluggable transaction manager in
>> > > > Apache Phoenix.
>> > > >
>> > > >
>> > > > Yahoo, Hortonworks and Salesforce participants will constitute the
>> > > > initial set of committers and mentors for the proposal.
>> > > >
>> > > > ==== Core Developers ====
>> > > > The core developers of Omid are all skilled software developers and
>> > > > research engineers at Yahoo Inc. and Hortonworks with years of
>> > > > experiences in their fields. At this moment, developers are
>> > > > distributed across U.S. and Israel. The aim is to incorporate more
>> > > > committers from different organizations and locations over time.
>> > > >
>> > > >
>> > > > The current set of developers include experienced committers from
>> > > > Apache HBase, Hive and Hadoop projects that have been working with us
>> > > > in the current codebase found in Github.
>> > > >
>> > > > Finally, some of the core developers are currently NOT affiliated
>> with
>> > > > the ASF and would require new ICLAs to be filed.
>> > > >
>> > > >
>> > > > === Alignment ===
>> > > > Omid enhances with transactions the already successful Apache HBase
>> > > > datastore project. We have collaborated with other developers inside
>> > > > and outside Yahoo which are involved in the Apache HBase community,
>> so
>> > > > we have had reliable feedback from them.
>> > > >
>> > > > Although Omid brings value into HBase, the design of the current
>> > > > version provides a general transaction scheme that can potentially be
>> > > > adapted to other MVCC key-value datastores such as Apache Cassandra.
>> > > >
>> > > >
>> > > > Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>> > > > top of HBase that can potentially integrate Omid in order to provide
>> > > > the well-know concept of transactions to Phoenix-based applications.
>> > > >
>> > > >
>> > > > === Known Risks ===
>> > > > ==== Orphaned products ====
>> > > > Yahoo’s Research and Search organizations have been taking care of
>> > > > Omid development since the first prototype creation in 2011. Yahoo
>> has
>> > > > a long history participating in open-source projects, and has been
>> > > > also a long time contributor to the Apache community. For example, in
>> > > > Apache, Yahoo is an important contributor in many projects in the
>> > > > Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>> > > > open-sourced other well-known projects outside Hadoop, such as
>> > > > Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make
>> > > > Omid also a successful open-source Apache product. If this happens,
>> we
>> > > > are sure that a larger community will be formed around the project in
>> > > > a relatively short period of time, contributing to the
>> diversification
>> > > > and stabilization of the base of committers.
>> > > >
>> > > >
>> > > > ==== Inexperience with Open Source ====
>> > > > This project has long standing experienced mentors and interested
>> > > > contributors from Apache HBase, Hive and Phoenix to help us moving
>> > > > through the open source process. We are actively working with
>> > > > experienced Apache community members to improve our project and
>> > > > further testing.
>> > > >
>> > > > ==== Homogeneous Developers ====
>> > > > Omid has been supported by Yahoo since its inception in 2011.
>> However,
>> > > > all current committers are employed by their respective companies
>> > > > shown in the Affiliations section.
>> > > >
>> > > >
>> > > > ==== Reliance on Salaried Developers ====
>> > > >
>> > > > All the current developers are paid by their employers to contribute
>> > > > to this project. Yahoo developers will also continuing maintaining
>> the
>> > > > internal Omid repository at their company.
>> > > >
>> > > > Of course, other developers are welcomed to contribute to this
>> project
>> > > > after it is open sourced in Apache.
>> > > >
>> > > > ==== Relationships with Other Apache Product ====
>> > > >
>> > > > Current Omid incarnation serves transactional contexts to
>> applications
>> > > > storing their data in HBase. However Omid design potentially allows
>> to
>> > > > be adapted to serve transactions on top of other MVCC-based key-value
>> > > > datastores in Apache community such as Cassandra.
>> > > >
>> > > >
>> > > > As a transactional framework, many other Apache projects such as
>> > > > Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could
>> > > > potentially benefit from Omid to get transactional contexts. In
>> > > > particular, Apache Phoenix -a SQL layer on top of HBase- might use
>> > > > Omid as its transaction management component. Once we open source
>> Omid
>> > > > as an Apache project, we expect to generate more interest in the
>> > > > surrounded communities.
>> > > >
>> > > >
>> > > > Very recently, a new incubator proposal for a similar project called
>> > > > Tephra, has been submitted to the ASF. We think this is good for the
>> > > > Apache community, and we believe that there’s room for both proposals
>> > > > as the design of each of them is based on different principles (e.g.
>> > > > Omid does not require to maintain the state of ongoing transactions
>> on
>> > > > the server-side component) and due to the fact that both -Tephra and
>> > > > Omid- have also gained certain traction in the open-source community.
>> > > >
>> > > >
>> > > > With regard to the Apache projects that Omid uses, apart from HBase,
>> > > > Omid relies on Apache ZooKeeper and Apache Curator to coordinate the
>> > > > (re)connection of transaction managers (acting as clients) to the
>> > > > conflict resolution component for transactions (server side). They
>> > > > are also used to coordinate the master and backup replicas in
>> > > > high-availability scenarios.
>> > > >
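One common way to implement the master/backup coordination described above is a ZooKeeper ephemeral node, which is deleted automatically when the session that created it ends. Whether Omid uses exactly this mechanism is an assumption here. The stdlib-only simulation below (hypothetical class `LeaderNode`, no real ZooKeeper or Curator calls) shows the idea: only one replica can hold the node, clients read it to find the current master, and the backup can take over once the master's session expires.

```java
import java.util.Optional;

/**
 * Hypothetical in-memory stand-in for a ZooKeeper ephemeral node that names
 * the current master replica. It mimics two ZooKeeper properties: creation
 * fails if the node already exists, and the node vanishes when its owner's
 * session expires. No real ZooKeeper or Curator API is used.
 */
class LeaderNode {
    private String ownerSession;   // session that created the node, or null
    private String masterAddress;  // address published by the owner, or null

    /** Attempts to become master; succeeds only if no one holds the node. */
    synchronized boolean tryCreate(String sessionId, String address) {
        if (ownerSession != null) {
            return false;
        }
        ownerSession = sessionId;
        masterAddress = address;
        return true;
    }

    /** Simulates session expiry: the owner's ephemeral node is removed. */
    synchronized void expireSession(String sessionId) {
        if (sessionId.equals(ownerSession)) {
            ownerSession = null;
            masterAddress = null;
        }
    }

    /** What a client reads when it needs to (re)connect to the master. */
    synchronized Optional<String> currentMaster() {
        return Optional.ofNullable(masterAddress);
    }
}
```

In a real deployment the backup and the clients would be notified through ZooKeeper watches rather than by polling, and the sketch ignores the harder problem of fencing an old master that has not yet noticed its session expired.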
>> > > >
>> > > > ==== An Excessive Fascination with the Apache Brand ====
>> > > >
>> > > > We are applying to the Incubator process because we think that it is
>> > > > the logical next step for the Omid project after we open-sourced the
>> > > > code on Github some years ago. Yahoo has a long-standing history of
>> > > > contributing to Apache projects. The developers and contributors
>> > > > understand the implications of making it an Apache project, and
>> > > > strongly believe that the growing community can benefit from the
>> > > > Apache environment, ecosystem, and infrastructure.
>> > > >
>> > > >
>> > > > === Documentation ===
>> > > > Current documentation about the project is available in the wiki of
>> > > > Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
>> > > > be moved under https://omid.incubator.apache.org/docs if the project
>> > > > is accepted into the Apache Incubator.
>> > > >
>> > > > === Initial Source ===
>> > > > Initial source code is currently hosted in Github for general viewing
>> > > > and contribution:
>> > > >
>> > > > https://github.com/yahoo/omid.git
>> > > >
>> > > >
>> > > > Omid source code is written in Java (99%), with some shell scripts
>> > > > (1%) used to configure and launch the main components.
>> > > >
>> > > >
>> > > > The code will be moved to Apache http://git.apache.org/ if accepted as
>> > > > an Incubator project.
>> > > >
>> > > > === Source and Intellectual Property Submission Plan ===
>> > > >
>> > > > The Omid code currently published on Github is licensed under the
>> > > > Apache License 2.0. If Omid fulfills the conditions for becoming an
>> > > > Incubator project in the ASF, the source code will be transitioned via
>> > > > the Software Grant Agreement onto the ASF infrastructure and in turn
>> > > > made available under the Apache License, version 2.0.
>> > > >
>> > > > === External Dependencies ===
>> > > >
>> > > >
>> > > > The required external dependencies that are not Apache projects are
>> > > > all under the Apache License or other compatible licenses:
>> > > >
>> > > > Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>> > > >
>> > > > JDK7 or OpenJDK 7 (http://java.com/) [Oracle or OpenJDK JDK License]
>> > > >
>> > > > Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>> > > >
>> > > > Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>> > > >
>> > > > TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>> > > >
>> > > > SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>> > > >
>> > > > Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>> > > >
>> > > > Google Protocol Buffers v2.5.0
>> > > > (https://developers.google.com/protocol-buffers/) [BSD License]
>> > > >
>> > > > Mockito v1.9.5 (http://mockito.org/) [MIT License]
>> > > >
>> > > > LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>> > > > [Apache 2.0]
>> > > >
>> > > > Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>> > > > (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>> > > >
>> > > > JCommander v1.35 by C. Beust (http://jcommander.org/) [Apache 2.0]
>> > > >
>> > > > Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>> > > >
>> > > >
>> > > > === Cryptography ===
>> > > > The Omid project does not use cryptography itself. However, Apache
>> > > > HBase, the datastore on top of which the current version of Omid
>> > > > works, uses standard APIs and tools for SSH and SSL communication
>> > > > where necessary.
>> > > >
>> > > > === Required Resources ===
>> > > > We request that the following resources be created for the project:
>> > > >
>> > > > ==== Mailing lists ====
>> > > >
>> > > > omid-private (moderated subscriptions)
>> > > >
>> > > > omid-commits (commit notification)
>> > > > omid-dev (technical discussions)
>> > > >
>> > > > ==== Git repository ====
>> > > > https://github.com/apache/incubator-omid
>> > > >
>> > > > ==== Documentation ====
>> > > > https://omid.incubator.apache.org/docs/
>> > > >
>> > > > ==== JIRA instance ====
>> > > > https://issues.apache.org/jira/browse/omid
>> > > >
>> > > > === Initial Committers ===
>> > > >
>> > > > * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> > > >
>> > > >
>> > > > * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>> > > >
>> > > >
>> > > > * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> > > >
>> > > >
>> > > > * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> > > >
>> > > >
>> > > > * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>> > > >
>> > > >
>> > > > * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>> > > >
>> > > > * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> > > >
>> > > >
>> > > > * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>> > > >
>> > > >
>> > > > * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>> > > >
>> > > >
>> > > > * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>> > > >
>> > > > * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>> > > >
>> > > >
>> > > > === Additional Interested Contributors ===
>> > > > * Ivan Kelly (ivank<AT>apache<DOT>org)
>> > > >
>> > > > * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>> > > >
>> > > >
>> > > > === Affiliations ===
>> > > >
>> > > > * Edward Bortnikov, Yahoo Inc.
>> > > >
>> > > >
>> > > > * Daniel Dai, Hortonworks
>> > > >
>> > > >
>> > > > * Flavio P. Junqueira, Confluent
>> > > >
>> > > >
>> > > > * Igor Katkov, Yahoo Inc.
>> > > >
>> > > >
>> > > > * Ivan Kelly, Midokura
>> > > >
>> > > >
>> > > > * Francis C. Liu, Yahoo Inc.
>> > > >
>> > > >
>> > > > * Sameer Paranjpye, Arimo
>> > > >
>> > > > * Francisco Perez-Sorrosal, Yahoo Inc.
>> > > >
>> > > >
>> > > > * Ohad Shacham, Yahoo Inc.
>> > > >
>> > > >
>> > > > * Maysam Yabandeh, Dropbox Inc.
>> > > >
>> > > >
>> > > > === Sponsors ===
>> > > >
>> > > > ==== Champion ====
>> > > >
>> > > > Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> > > >
>> > > > ==== Nominated Mentors ====
>> > > >
>> > > > Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>> > > >
>> > > > Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> > > >
>> > > > Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> > > >
>> > > > Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> > > >
>> > > > James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>> > > >
>> > > >
>> > > > ==== Sponsoring Entity ====
>> > > > Apache Incubator PMC
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>



Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by James Taylor <ja...@apache.org>.
It would be great to allow different transaction frameworks to plug into
Phoenix. I suspect that transactions are in the same boat as secondary
indexing: a one-size-fits-all approach is not feasible across the variety
of use cases we see, so a pluggable mechanism would be a good solution.
I've filed PHOENIX-2788 [1] for this work. Although it of course helps
that a transaction layer works with HBase, much of the integration work
is at the Phoenix level; to get an idea, see [2]. IMHO, several features
missing in HBase would be precursors to HBASE-11447, namely support for
undoing a Delete [3] and finer timestamp granularity for Cells [4].
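Why undoing a Delete [3] matters for a transaction layer can be sketched as follows. In HBase a Delete writes a marker that masks the versions it covers; without a way to undo it, a transaction that deleted a cell and then aborted cannot simply be rolled back. One workaround a transaction layer can use is to write the delete as an ordinary version holding a tombstone at the transaction's timestamp, so that rollback removes just that one version. The class below is a hypothetical single-column model of that idea, not Phoenix, Omid or HBase code.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical single-column model: a transactional delete is stored as a
 * tombstone version (a null value) at the deleting transaction's timestamp,
 * so rolling back that transaction only has to remove that one version.
 */
class TxColumn {
    // version timestamp -> value; a null value marks a tombstone
    private final TreeMap<Long, String> versions = new TreeMap<>();

    /** Writes a value; callers are expected to pass non-null values. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Transactional delete: a tombstone version instead of a delete marker. */
    void delete(long ts) {
        versions.put(ts, null);
    }

    /** Undoes whatever the transaction with this timestamp wrote. */
    void rollback(long ts) {
        versions.remove(ts);
    }

    /** Value visible at snapshotTs, or null if absent or deleted. */
    String readAt(long snapshotTs) {
        Map.Entry<Long, String> entry = versions.floorEntry(snapshotTs);
        return entry == null ? null : entry.getValue();
    }
}
```

The cost of this approach is that every reader must interpret tombstone versions itself, which is part of why native support in HBase would be preferable.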

    James

[1] https://issues.apache.org/jira/browse/PHOENIX-2788
[2] https://github.com/apache/phoenix/pull/133
[3] https://issues.apache.org/jira/browse/HBASE-11292
[4] https://issues.apache.org/jira/browse/HBASE-8927
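On the timestamp-granularity point [4]: HBase cell timestamps are by default milliseconds, while a transaction layer that uses timestamps as version or transaction identifiers may need many distinct values within one millisecond. A minimal sketch, assuming a hypothetical scheme that reserves a fixed number of slots per millisecond inside the 64-bit timestamp (the `SLOTS_PER_MS` value is an arbitrary assumption, not taken from any project):

```java
/**
 * Hypothetical allocator that packs a millisecond clock reading and a
 * per-millisecond counter into one long, so that many transactions started
 * in the same millisecond still receive distinct, increasing timestamps.
 */
class TimestampAllocator {
    // Assumed number of distinct timestamps available per millisecond.
    static final long SLOTS_PER_MS = 1_000_000L;

    private long last = 0;

    /** Next timestamp; stays monotonic even if the wall clock goes backwards. */
    synchronized long next(long wallClockMs) {
        long candidate = wallClockMs * SLOTS_PER_MS;
        last = Math.max(candidate, last + 1);
        return last;
    }

    /** Recovers the (approximate) wall-clock millisecond of a timestamp. */
    static long toMillis(long timestamp) {
        return timestamp / SLOTS_PER_MS;
    }
}
```

Callers would pass `System.currentTimeMillis()`. With one million slots per millisecond, present-day epoch times (about 1.7 × 10^12 ms) still fit well below `Long.MAX_VALUE` (about 9.2 × 10^18), but the stored values are no longer plain milliseconds, which is exactly why Cell-level support for finer granularity is being asked for.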


On Mon, Mar 21, 2016 at 12:48 PM, Henry Saputra <he...@gmail.com>
wrote:

> Hi Pierre,
>
> Thanks for your reply. Yes, I remember Trafodion, but since it is a more
> complete SQL + transaction solution, I did not mention it for comparison.
>
> But the comment is valid: there is already precedent for transaction
> support for NoSQL in Apache, so there is no reason to "reject" such
> projects or to require their immediate consolidation in the Incubator.
>
>
> - Henry
>
> On Sun, Mar 20, 2016 at 1:34 PM, Pierre Smits <pi...@gmail.com>
> wrote:
>
> > Hi Henry,
> >
> > It seems you (and several others) are forgetting Trafodion, which also
> > provides transactions on N*SQL solutions; see http://trafodion.apache.org
> >
> > Best regards,
> >
> > Pierre Smits
> >
> > ORRTIZ.COM <http://www.orrtiz.com>
> > OFBiz based solutions & services
> >
> > OFBiz Extensions Marketplace
> > http://oem.ofbizci.net/oci-2/
> >
> > On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <henry.saputra@gmail.com>
> > wrote:
> >
> > > I know the Apache Incubator does not play favorites, but it is getting
> > > awkward that TWO transaction engines for HBase are coming to the
> > > Incubator at the same time.
> > >
> > > As most people know, the other one is Tephra, which came to the
> > > Incubator just a few weeks ago.
> > >
> > > As a member of the IPMC, I would like to see Omid provide a more
> > > detailed comparison of the differences the project brings, in terms of
> > > approach and possible integrations with other ASF projects.
> > >
> > > If possible, I would prefer to see the Omid team work together with
> > > Tephra to build one solid transaction engine for HBase and, later,
> > > NoSQL databases.
> > >
> > >
> > > - Henry
> > >
> > > On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose Omid as an Apache Incubator project:
> > > >
> > > > https://wiki.apache.org/incubator/OmidProposal
> > > >
> > > > I've posted posted the text of the proposal below:
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > = Omid Proposal =
> > > >
> > > > === Abstract ===
> > > >
> > > > Omid is a flexible, reliable, high performant and scalable ACID
> > > > transactional framework that allows client applications to execute
> > > > transactions on top of MVCC key/value-based NoSQL datastores
> > > > (currently Apache HBase) providing Snapshot Isolation guarantees on
> > > > the accessed data.
> > > >
> > > >
> > > > === Proposal ===
> > > >
> > > > Omid is a flexible open-source transactional framework that provides
> > > > ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> > > > datastores. In particular, the current codebase brings the concept of
> > > > transactions to the popular Apache HBase datastore. Omid offers great
> > > > performance, it is highly available, and scalable. Omid's current
> > > > version is able to scale to thousands of clients triggering
> concurrent
> > > > transactions on application data stored in HBase. Omid can scale
> > > > beyond 100K transactions per second on mid-range hardware while
> > > > incurring in a minimal impact on the speed of data access in the
> > > > datastore. We’re currently experimenting with a prototype version
> that
> > > > can improve the performance up to ~380K TPS.
> > > >
> > > >
> > > > Omid has been publicly available as an open-source project in Github
> > > > under Apache License Version 2.0 since 2011 [1]. During these years,
> > > > it has generated certain interest in the open source community,
> > > > especially since the public presentation of the first version in
> > > > Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars
> and
> > > > 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> > > > Foundation with the aim to transfer the Omid project -including its
> > > > source code and documentation- to Apache in order to start the build
> > > > of a stable open source community around it.
> > > >
> > > >
> > > > [1] https://github.com/yahoo/omid
> > > >
> > > > [2] Omid presentation at Hadoop Summit 2013:
> > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
> > > >
> > > >
> > > > === Background ===
> > > >
> > > > An Omid prototype was first released as an open-source project back
> in
> > > > 2011. Inspired by Google Percolator [1], it offered a lock-free
> > > > approach to transactions in NoSQL datastores (See [2]). However,
> > > > during these years, the design of Omid has evolved significantly.
> > > > Whilst the current open-sourced version maintains many aspects of the
> > > > original implementation, it is the result of a major redesign of the
> > > > first prototype released in 2011.
> > > >
> > > >
> > > > Omid has now a more decentralized design that does not sacrifice the
> > > > consistency and performance of the original version. The current
> > > > design also enables Omid to scale to thousands of clients executing
> > > > transactions concurrently on application data stored in HBase.
> > > > Internally, Omid still utilizes a lock-free approach to support
> > > > multiple concurrent clients. Its design also relies on a centralized
> > > > conflict detection component, the TSO, which now resolves in an
> > > > efficient manner writeset collisions among concurrent transactions
> > > > without having to piggyback commit information to the clients.
> Another
> > > > important benefit of Omid is that it doesn't require any modification
> > > > of the underlying key-value datastore, HBase in this case. Moreover,
> > > > the recently added high availability algorithm allows to eliminate
> the
> > > > single point of failure represented by the TSO in those system
> > > > deployments requiring a higher degree of dependability. Last but not
> > > > least, the provided user API is very simple, mimicking transaction
> > > > managers in the relational world: begin, commit, rollback.
> > > >
> > > >
> > > > Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> > > > management platform powering some of next-generation search and
> > > > personalization products is using Omid as a transaction manager in
> its
> > > > processing pipeline. Sieve essentially acts as a huge processing hub
> > > > between content feeds and serving systems. It provides an environment
> > > > for highly customizable, real-time, streamed information processing,
> > > > with typical discovery-to-service latencies of just a few seconds. In
> > > > terms of scale and availability, Omid’s new design was largely driven
> > > > by Sieve’s requirements.
> > > >
> > > >
> > > > At Yahoo, we are also making an effort to disseminate the current
> > > > status of the project through blog entries (See [3], [4] and [5]) and
> > > > submissions to technical and academic conferences such as ATC 2016,
> > > > Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also
> > > > appeared in a TechCrunch article in the last quarter of 2015 (See
> [6])
> > > >
> > > >
> > > > [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
> > > > Distributed Transactions and Notifications. USENIX Symposium on
> > > > Operating Systems Design and Implementation, 2010
> > > >
> > > > [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> > > > Omid: Lock-free transactional support for distributed data stores. In
> > > > Proc. of ICDE, 2013.
> > > >
> > > > [3]
> > > >
> > >
> >
> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> > > >
> > > > [4]
> > > >
> > >
> >
> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> > > >
> > > > [5]
> > > >
> > >
> >
> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> > > >
> > > > [6]
> > > >
> > >
> >
> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
> > > >
> > > >
> > > > === Rationale ===
> > > >
> > > > Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> > > > transactions is very popular and it is featured in relational
> > > > databases. However, in the Big Data ecosystem, applications typically
> > > > use NoSQL datastores, which do not provide ACID transactions. Such
> > > > NoSQL datastores used to give up transactional support for greater
> > > > agility and scalability. However, while early NoSQL data store
> > > > implementations did not include transaction support, the need for
> > > > transactions soon emerged in Big Data applications when accessing
> > > > shared data; for  example, transactions are very important  for
> > > > modern, scalable systems that process content incrementally.
> > > >
> > > >
> > > > NoSQL datastores -including HBase- don’t provide transactional
> > > > frameworks to coordinate the access to the underlying data for
> > > > preserving consistency. By using Omid, Big Data applications that
> need
> > > > to bundle multiple read and write operations on HBase into logically
> > > > indivisible units of work can execute transactions with ACID
> > > > properties, just as they would use transactions in the relational
> > > > database world. Omid extends the HBase key-value access APl with
> > > > transaction semantics. It can be exercised either directly, or via
> > > > higher level data management API’s. For example, Apache Phoenix
> > > > (SQL-on-top-of-HBase) might use Omid as its transaction management
> > > > component.
> > > >
> > > >
> > > > The following features make Omid an attractive choice for system
> > > > designers and other projects in the Apache community:
> > > >
> > > >
> > > > * Semantics. Omid implements Snapshot Isolation (SI,) supported by
> > > > major SQL and NoSQL technologies (e.g. Google Percolator).
> > > >
> > > >
> > > > * Performance and Scalability. Omid  provides a highly scalable,
> > > > lock-free implementation of SI. To the best of our knowledge, it is
> > > > also one of the few open source NoSQL transactional platforms that
> can
> > > > execute more than 100K transactions per second [1]. A new prototype
> > > > still in development can go even further, up to ~380K TPS.
> > > >
> > > >
> > > > * Reliability.  Omid has a high-availability (HA) mode, in which the
> > > > core service performing writeset conflict resolution operates as
> > > > primary-backup process pair with automatic failover. The HA support
> > > > has zero overhead on the mainstream operation.
> > > >
> > > >
> > > > * Adaptability. Omid current version provides transactions on data
> > > > stored in Apache HBase. However, Omid’s components are generic enough
> > > > to be adapted to any other key-value NoSQL datasource that supports
> > > > MVCC.
> > > >
> > > >
> > > > * Development. Omid provides a very simple interface that mimics
> > > > standard HBase APIs, making it developer friendly. Only minimal
> > > > extensions to the standard interfaces have been introduced to enable
> > > > transactions.
> > > >
> > > >
> > > > * Simplicity. Omid leverages the HBase infrastructure for managing
> its
> > > > own metadata. It entails no additional services apart from those
> > > > provided and used by HBase.
> > > >
> > > >
> > > > * Track Record. As we have mentioned, Omid is already in use by
> > > > very-large-scale production systems at Yahoo. Also, Hortonworks is
> > > > integrating Omid in a metastore implementation for Hive based on
> > > > HBase.
> > > >
> > > > [1] See also Haeinsa:
> https://github.com/vcnc/haeinsa/wiki/Performance
> > > >
> > > >
> > > > === Current Status ===
> > > > Current Omid implementation is available in both, Yahoo’s internal
> > > > Github repository for internal use at Yahoo as well as in Yahoo’s
> > > > Github public repository (https://github.com/yahoo/omid.git). Both
> > > > repositories are managed by Omid’s current developers at Yahoo.
> > > >
> > > > As it is mentioned above, Yahoo is currently using Omid for providing
> > > > transactions in Sieve, a web-scale content management platform that
> > > > powers Yahoo’s next-generation search and personalization products.
> > > >
> > > >
> > > > ==== Meritocracy ====
> > > > The first version of Omid was originally created in 2011 by Maysam
> > > > Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
> > > > Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
> > > >
> > > >
> > > > During the years after its inception, Omid has matured to operate at
> > > > Web scale and has been used internally by strategic projects at Yahoo
> > > > such as Sieve. The current base of committers belong to the Yahoo
> team
> > > > that took over the initial Omid prototype and rewrote it to meet the
> > > > high availability and scalability requirements of the Sieve project.
> > > > This base of committers has recently incorporated Hortonworks members
> > > > that helped in the Omid adaptation to HBase 1.x versions.
> > > >
> > > >
> > > > With this initial committer base, we aim to form a larger community
> > > > that can collaborate with new ideas over the current code base. This
> > > > new community will run the project following the "Apache Way"
> > > > (http://apache.org/foundation/governance/). Users and new
> contributors
> > > > will be treated with respect and welcomed. To grow the community, we
> > > > will encourage contributors to provide patches, review code, propose
> > > > new features improvements, talk at conferences such as Hadoop Summit,
> > > > HBaseCon, ApacheCon, etc. Committership and PMC membership will be
> > > > offered according to meritocracy.
> > > >
> > > > ==== Community ====
> > > >
> > > > The public Yahoo Omid repository at Github currently has 241 Stars
> and
> > > > 93 forks, which means that there is an important interest for the
> > > > project in the open-source community, at least compared with other
> > > > similar projects (See https://github.com/yahoo/omid.git).
> > > >
> > > >
> > > > Recently, Hortonworks contributors to the Apache Hive project which
> > > > are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
> > > > manifested interest in using Omid. We started with them a fruitful
> > > > collaboration that resulted in Omid supporting HBase 1.x versions.
> > > >
> > > >
> > > > Salesforce is also interested in collaborating in doing a Proof of
> > > > Concept for integrating Omid as a pluggable transaction manager in
> > > > Apache Phoenix.
> > > >
> > > >
> > > > Yahoo, Hortonworks and Salesforce participants will constitute the
> > > > initial set of committers and mentors for the proposal.
> > > >
> > > > ==== Core Developers ====
> > > > The core developers of Omid are all skilled software developers and
> > > > research engineers at Yahoo Inc. and Hortonworks with years of
> > > > experiences in their fields. At this moment, developers are
> > > > distributed across U.S. and Israel. The aim is to incorporate more
> > > > committers from different organizations and locations over time.
> > > >
> > > >
> > > > The current set of developers include experienced committers from
> > > > Apache HBase, Hive and Hadoop projects that have been working with us
> > > > in the current codebase found in Github.
> > > >
> > > > Finally, some of the core developers are currently NOT affiliated
> with
> > > > the ASF and would require new ICLAs to be filed.
> > > >
> > > >
> > > > === Alignment ===
> > > > Omid enhances with transactions the already successful Apache HBase
> > > > datastore project. We have collaborated with other developers inside
> > > > and outside Yahoo which are involved in the Apache HBase community,
> so
> > > > we have had reliable feedback from them.
> > > >
> > > > Although Omid brings value into HBase, the design of the current
> > > > version provides a general transaction scheme that can potentially be
> > > > adapted to other MVCC key-value datastores such as Apache Cassandra.
> > > >
> > > >
> > > > Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> > > > top of HBase that can potentially integrate Omid in order to provide
> > > > the well-know concept of transactions to Phoenix-based applications.
> > > >
> > > >
> > > > === Known Risks ===
> > > > ==== Orphaned products ====
> > > > Yahoo’s Research and Search organizations have been taking care of
> > > > Omid development since the first prototype creation in 2011. Yahoo
> has
> > > > a long history participating in open-source projects, and has been
> > > > also a long time contributor to the Apache community. For example, in
> > > > Apache, Yahoo is an important contributor in many projects in the
> > > > Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
> > > > open-sourced other well-known projects outside Hadoop, such as
> > > > Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make
> > > > Omid also a successful open-source Apache product. If this happens,
> we
> > > > are sure that a larger community will be formed around the project in
> > > > a relatively short period of time, contributing to the
> diversification
> > > > and stabilization of the base of committers.
> > > >
> > > >
> > > > ==== Inexperience with Open Source ====
> > > > This project has long standing experienced mentors and interested
> > > > contributors from Apache HBase, Hive and Phoenix to help us moving
> > > > through the open source process. We are actively working with
> > > > experienced Apache community members to improve our project and
> > > > further testing.
> > > >
> > > > ==== Homogeneous Developers ====
> > > > Omid has been supported by Yahoo since its inception in 2011.
> However,
> > > > all current committers are employed by their respective companies
> > > > shown in the Affiliations section.
> > > >
> > > >
> > > > ==== Reliance on Salaried Developers ====
> > > >
> > > > All the current developers are paid by their employers to contribute
> > > > to this project. Yahoo developers will also continuing maintaining
> the
> > > > internal Omid repository at their company.
> > > >
> > > > Of course, other developers are welcomed to contribute to this
> project
> > > > after it is open sourced in Apache.
> > > >
> > > > ==== Relationships with Other Apache Product ====
> > > >
> > > > Current Omid incarnation serves transactional contexts to
> applications
> > > > storing their data in HBase. However Omid design potentially allows
> to
> > > > be adapted to serve transactions on top of other MVCC-based key-value
> > > > datastores in Apache community such as Cassandra.
> > > >
> > > >
> > > > As a transactional framework, many other Apache projects such as
> > > > Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could
> > > > potentially benefit from Omid to get transactional contexts. In
> > > > particular, Apache Phoenix -a SQL layer on top of HBase- might use
> > > > Omid as its transaction management component. Once we open source
> Omid
> > > > as an Apache project, we expect to generate more interest in the
> > > > surrounded communities.
> > > >
> > > >
> > > > Very recently, a new incubator proposal for a similar project called
> > > > Tephra, has been submitted to the ASF. We think this is good for the
> > > > Apache community, and we believe that there’s room for both proposals
> > > > as the design of each of them is based on different principles (e.g.
> > > > Omid does not require to maintain the state of ongoing transactions
> on
> > > > the server-side component) and due to the fact that both -Tephra and
> > > > Omid- have also gained certain traction in the open-source community.
> > > >
> > > >
> > > > With regard to the Apache projects that Omid uses, apart from HBase,
> > > > Omid relies on Apache Zookeeper and Curator projects in order to
> > > > coordinate the (re)connection of transaction managers (acting as
> > > > clients) to the conflict resolution component for transactions
> (server
> > > > side.) They’re also used in order to coordinate the master and backup
> > > > replicas in high availability scenarios.
> > > >
> > > >
> > > > ==== An Excessive Fascination with the Apache Brand ====
> > > >
> > > > We are applying to the Incubator process because we think that it is
> > > > the logical next step for the  Omid project after we open-sourced the
> > > > code in Github some years ago. Yahoo has a long-standing history of
> > > > contributing to Apache projects. The developers and contributors
> > > > understand the implications of making it an Apache project, and
> > > > strongly believe that the growing community can benefit from the
> > > > Apache environment, ecosystem, and infrastrastructure.
> > > >
> > > >
> > > > === Documentation ===
> > > > Current documentation about the project is available in the wiki of
> > > > Omid’s Github repository: https://github.com/yahoo/omid/wiki . It
> will
> > > > be moved under https://omid.incubator.apache.org/docs if the project
> > > > is accepted as an Apache Incubator.
> > > >
> > > > === Initial Source ===
> > > > Initial source code is currently hosted in Github for general viewing
> > > > and contribution:
> > > >
> > > > https://github.com/yahoo/omid.git
> > > >
> > > >
> > > > Omid source code is written in Java code (99%) mixed with some shell
> > > > script (1%) in order to configure and trigger the execution of main
> > > > components.
> > > >
> > > >
> > > > The code will be moved to Apache http://git.apache.org/ if accepted
> as
> > > > an Incubator project.
> > > >
> > > > === Source and Intellectual Property Submission Plan ===
> > > >
> > > > The current Omid License for the code published in Github is Apache
> > > > 2.0. If Omid fulfills and passes the conditions for being an
> Incubator
> > > > project in the ASF, the source code will be transitioned via the
> > > > Software Grant Agreement onto the ASF infrastructure and in turn made
> > > > available under the Apache License, version 2.0.
> > > >
> > > > === External Dependencies ===
> > > >
> > > >
> > > > The required external dependencies that are not Apache projects are
> > > > all Apache licenses or other compatible Licenses:
> > > >
> > > > Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> > > >
> > > > JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
> > > >
> > > > Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> > > >
> > > > Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache
> 2.0]
> > > >
> > > > Testng v6.8.8  (http://testng.org) [Apache 2.0]
> > > >
> > > > SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
> > > >
> > > > Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
> > > >
> > > > Google Protocol Buffers v2.5.0
> > > > (https://developers.google.com/protocol-buffers/) [BSD License]
> > > >
> > > > Mockito (http://mockito.org/) v1.9.5 [MIT License]
> > > >
> > > > LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
> > > > [Apache 2.0]
> > > >
> > > > Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> > > > (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> > > >
> > > > C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> > > >
> > > > Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
> > > >
> > > >
> > > > === Cryptography ===
> > > > Omid project does not use cryptography itself. However, Apache HBase
> > > > -the datastore on top of which Omid works in its current version-
> uses
> > > > standard APIs and tools for SSH and SSL communication where
> necessary.
> > > >
> > > > === Required Resources ===
> > > > We request that following resources be created for the project to
> use:
> > > >
> > > > ==== Mailing lists ====
> > > >
> > > > omid-private (moderated subscriptions)
> > > >
> > > > omid-commits (commit notification)
> > > > omid-dev (technical discussions)
> > > >
> > > > ==== Git repository ====
> > > > https://github.com/apache/incubator-omid
> > > >
> > > > ==== Documentation ====
> > > > https://omid.incubator.apache.org/docs/
> > > >
> > > > ==== JIRA instance ====
> > > > https://issues.apache.org/jira/browse/omid
> > > >
> > > > === Initial Committers ===
> > > >
> > > > * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> > > >
> > > >
> > > > * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> > > >
> > > >
> > > > * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> > > >
> > > >
> > > > * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> > > >
> > > >
> > > > * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
> > > >
> > > >
> > > > * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
> > > >
> > > > * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> > > >
> > > >
> > > > * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
> > > >
> > > >
> > > > * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
> > > >
> > > >
> > > > * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
> > > >
> > > > * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> > > >
> > > >
> > > > === Additional Interested Contributors ===
> > > > * Ivan Kelly (ivank<AT>apache<DOT>org)
> > > >
> > > > * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
> > > >
> > > >
> > > > === Affiliations ===
> > > >
> > > > * Edward Bortnikov, Yahoo Inc.
> > > >
> > > >
> > > > * Daniel Dai, Hortonworks
> > > >
> > > >
> > > > * Flavio P. Junqueira, Confluent
> > > >
> > > >
> > > > * Igor Katkov, Yahoo Inc.
> > > >
> > > >
> > > > * Ivan Kelly, Midokura
> > > >
> > > >
> > > > * Francis C. Liu, Yahoo Inc.
> > > >
> > > >
> > > > * Sameer Paranjpye, Arimo
> > > >
> > > > * Francisco Perez-Sorrosal, Yahoo Inc.
> > > >
> > > >
> > > > * Ohad Shacham, Yahoo Inc.
> > > >
> > > >
> > > > * Maysam Yabandeh, Dropbox Inc.
> > > >
> > > >
> > > > === Sponsors ===
> > > >
> > > > ==== Champion ====
> > > >
> > > > Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> > > >
> > > > ==== Nominated Mentors ====
> > > >
> > > > Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> > > >
> > > > Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> > > >
> > > > Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> > > >
> > > > Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> > > >
> > > > James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> > > >
> > > >
> > > > ==== Sponsoring Entity ====
> > > > Apache Incubator PMC
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > > For additional commands, e-mail: general-help@incubator.apache.org
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
Hi Pierre,

Thanks for your reply. Yes, I remember Trafodion, but since it is a more
complete SQL + transaction solution, I did not include it in the comparison.

But the comment is valid: there has already been prior work on
transaction support for NoSQL in Apache, so there is no reason to "reject"
or to require immediate consolidation of such projects in the incubator.


- Henry

On Sun, Mar 20, 2016 at 1:34 PM, Pierre Smits <pi...@gmail.com>
wrote:

> Hi Henry,
>
> It seems you (and several others) are forgetting Trafodion, which also
> provides transactions on N*SQL solutions; see http://trafodion.apache.org
>
> Best regards,
>
> Pierre Smits
>
> ORRTIZ.COM <http://www.orrtiz.com>
> OFBiz based solutions & services
>
> OFBiz Extensions Marketplace
> http://oem.ofbizci.net/oci-2/
>
> On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > I know the Apache Incubator does not play favorites, but it is getting awkward
> > that TWO transaction engines for HBase are coming to the incubator at the same
> > time.
> >
> > As most people know, the other one is Tephra, which came to the incubator
> > just a few weeks ago.
> >
> > As a member of the IPMC, I would like to see Omid provide a more detailed
> > comparison of the differences the project brings, in terms of
> > approach and possible integrations with other ASF projects.
> >
> > If possible, I would prefer to see the Omid team work together with Tephra to
> > make one solid transaction engine for HBase and, later, NoSQL databases.
> >
> >
> > - Henry
> >
>

RE: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Atanu Mishra <at...@esgyn.com>.
Hi,

Yes, very true, Pierre.

Back in the summer of 2014, a proposal to implement a generic transaction
API was submitted as HBASE-11447 to allow support for multiple
implementations of transaction monitors. Some discussions were held at the
time, but a final decision on the API has not been made.

HBASE-11447 Proposal for a generic transaction API for HBase

Regards,
Atanu

-----Original Message-----
From: Pierre Smits [mailto:pierre.smits@gmail.com]
Sent: Sunday, March 20, 2016 1:35 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Hi Henry,

It seems you (and several others) are forgetting Trafodion, which also
provides transactions on N*SQL solutions; see http://trafodion.apache.org

Best regards,

Pierre Smits

On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <he...@gmail.com>
wrote:

> I know the Apache Incubator does not play favorites, but it is getting awkward
> that TWO transaction engines for HBase are coming to the incubator at the same
> time.
>
> As most people know, the other one is Tephra, which came to the incubator
> just a few weeks ago.
>
> As a member of the IPMC, I would like to see Omid provide a more detailed
> comparison of the differences the project brings, in terms of
> approach and possible integrations with other ASF projects.
>
> If possible, I would prefer to see the Omid team work together with Tephra to
> make one solid transaction engine for HBase and, later, NoSQL databases.
>
>
> - Henry
>
> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to propose Omid as an Apache Incubator project:
> >
> > https://wiki.apache.org/incubator/OmidProposal
> >
> > I've posted the text of the proposal below:
> >
> > Thanks,
> > Daniel
> >
> > = Omid Proposal =
> >
> > === Abstract ===
> >
> > Omid is a flexible, reliable, high-performance and scalable ACID
> > transactional framework that allows client applications to execute
> > transactions on top of MVCC key/value-based NoSQL datastores
> > (currently Apache HBase), providing Snapshot Isolation guarantees on
> > the accessed data.
> >
> >
> > === Proposal ===
> >
> > Omid is a flexible open-source transactional framework that provides
> > ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> > datastores. In particular, the current codebase brings the concept of
> > transactions to the popular Apache HBase datastore. Omid offers high
> > performance, high availability and scalability. Omid's current
> > version is able to scale to thousands of clients triggering concurrent
> > transactions on application data stored in HBase. Omid can scale
> > beyond 100K transactions per second on mid-range hardware while
> > incurring only a minimal impact on the speed of data access in the
> > datastore. We’re currently experimenting with a prototype version that
> > can improve performance up to ~380K TPS.
> >
> >
> > Omid has been publicly available as an open-source project on GitHub
> > under the Apache License, Version 2.0, since 2011 [1]. During these years,
> > it has generated some interest in the open source community,
> > especially since the public presentation of the first version at
> > Hadoop Summit 2013 [2]. Currently the GitHub project has 241 stars and
> > 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> > Foundation with the aim of transferring the Omid project (including its
> > source code and documentation) to Apache in order to start building
> > a stable open source community around it.
> >
> >
> > [1] https://github.com/yahoo/omid
> >
> > [2] Omid presentation at Hadoop Summit 2013:
> >
> >
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
> >
> >
> > === Background ===
> >
> > An Omid prototype was first released as an open-source project back in
> > 2011. Inspired by Google Percolator [1], it offered a lock-free
> > approach to transactions in NoSQL datastores (See [2]). However,
> > during these years, the design of Omid has evolved significantly.
> > Whilst the current open-sourced version maintains many aspects of the
> > original implementation, it is the result of a major redesign of the
> > first prototype released in 2011.
> >
> >
> > Omid now has a more decentralized design that does not sacrifice the
> > consistency and performance of the original version. The current
> > design also enables Omid to scale to thousands of clients executing
> > transactions concurrently on application data stored in HBase.
> > Internally, Omid still uses a lock-free approach to support
> > multiple concurrent clients. Its design also relies on a centralized
> > conflict detection component, the TSO, which now efficiently resolves
> > writeset collisions among concurrent transactions without having to
> > piggyback commit information to the clients. Another important
> > benefit of Omid is that it doesn't require any modification of the
> > underlying key-value datastore, HBase in this case. Moreover, the
> > recently added high availability algorithm eliminates the single
> > point of failure represented by the TSO in deployments requiring a
> > higher degree of dependability. Last but not least, the user API is
> > very simple, mimicking transaction managers in the relational world:
> > begin, commit, rollback.
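The role of a centralized conflict detector such as the TSO can be illustrated with a minimal, single-process sketch of commit-time writeset conflict detection under Snapshot Isolation. This is a hypothetical illustration, not Omid's code or API: the class name TimestampOracle and its begin/commit methods are assumptions, timestamps come from a simple in-memory counter, and persistence, high availability and all interaction with HBase are omitted. A transaction receives a start timestamp at begin; at commit it is aborted if another transaction committed a write to any key in its writeset after that start timestamp.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical sketch of a centralized Snapshot Isolation conflict detector.
// Not Omid's API: no persistence, no high availability, no HBase interaction.
class TimestampOracle {
    // Logical clock used for both start and commit timestamps.
    private long clock = 0;
    // Commit timestamp of the last committed write to each row key.
    private final Map<String, Long> lastCommit = new HashMap<>();

    // begin: hand out a fresh start timestamp (the transaction's snapshot).
    synchronized long begin() {
        return ++clock;
    }

    // commit: return empty (abort) if any key in the writeset was committed
    // by another transaction after startTs; otherwise assign a commit timestamp.
    synchronized OptionalLong commit(long startTs, Set<String> writeSet) {
        for (String key : writeSet) {
            Long committedAt = lastCommit.get(key);
            if (committedAt != null && committedAt > startTs) {
                return OptionalLong.empty(); // write-write conflict
            }
        }
        long commitTs = ++clock;
        for (String key : writeSet) {
            lastCommit.put(key, commitTs);
        }
        return OptionalLong.of(commitTs);
    }

    public static void main(String[] args) {
        TimestampOracle tso = new TimestampOracle();
        long t1 = tso.begin();
        long t2 = tso.begin(); // t2 starts before t1 commits
        OptionalLong c1 = tso.commit(t1, new HashSet<>(Arrays.asList("row-a")));
        OptionalLong c2 = tso.commit(t2, new HashSet<>(Arrays.asList("row-a", "row-b")));
        System.out.println("t1 committed: " + c1.isPresent()); // t1 committed: true
        System.out.println("t2 committed: " + c2.isPresent()); // t2 committed: false
    }
}
```

Because conflicts are checked only on writesets at commit time, readers never block writers, which matches the lock-free approach described in the proposal; a real deployment would additionally need to make commit decisions durable so that readers can tell which versions are committed.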
> >
> >
> > Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> > management platform powering some of its next-generation search and
> > personalization products, uses Omid as a transaction manager in its
> > processing pipeline. Sieve essentially acts as a huge processing hub
> > between content feeds and serving systems. It provides an environment
> > for highly customizable, real-time, streamed information processing,
> > with typical discovery-to-service latencies of just a few seconds. In
> > terms of scale and availability, Omid’s new design was largely driven
> > by Sieve’s requirements.
> >
> >
> > At Yahoo, we are also making an effort to disseminate the current
> > status of the project through blog entries (see [3], [4] and [5]) and
> > submissions to technical and academic conferences such as ATC 2016,
> > Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
> > appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
> >
> >
> > [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
> > Distributed Transactions and Notifications. USENIX Symposium on
> > Operating Systems Design and Implementation, 2010
> >
> > [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> > Omid: Lock-free transactional support for distributed data stores. In
> > Proc. of ICDE, 2013.
> >
> > [3]
> >
> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> >
> > [4]
> >
> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> >
> > [5]
> >
> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> >
> > [6]
> >
> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
> >
> >
> > === Rationale ===
> >
> > Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> > transactions is very popular and is a standard feature of relational
> > databases. However, in the Big Data ecosystem, applications typically
> > use NoSQL datastores, which do not provide ACID transactions. Such
> > NoSQL datastores traditionally gave up transactional support for
> > greater agility and scalability. However, the need for transactions
> > soon emerged in Big Data applications accessing shared data; for
> > example, transactions are very important for modern, scalable systems
> > that process content incrementally.
> >
> >
> > NoSQL datastores (including HBase) don’t provide transactional
> > frameworks to coordinate access to the underlying data and preserve
> > consistency. By using Omid, Big Data applications that need
> > to bundle multiple read and write operations on HBase into logically
> > indivisible units of work can execute transactions with ACID
> > properties, just as they would use transactions in the relational
> > database world. Omid extends the HBase key-value access API with
> > transaction semantics. It can be used either directly or via
> > higher-level data management APIs. For example, Apache Phoenix
> > (SQL on top of HBase) might use Omid as its transaction management
> > component.
> >
> >
> > The following features make Omid an attractive choice for system
> > designers and other projects in the Apache community:
> >
> >
> > * Semantics. Omid implements Snapshot Isolation (SI), which is
> > supported by major SQL and NoSQL technologies (e.g. Google Percolator).
> >
> >
> > * Performance and Scalability. Omid provides a highly scalable,
> > lock-free implementation of SI. To the best of our knowledge, it is
> > also one of the few open source NoSQL transactional platforms that can
> > execute more than 100K transactions per second [1]. A new prototype
> > still in development can go even further, up to ~380K TPS.
> >
> >
> > * Reliability. Omid has a high-availability (HA) mode, in which the
> > core service performing writeset conflict resolution operates as a
> > primary-backup process pair with automatic failover. The HA support
> > adds zero overhead to normal operation.
> >
> >
> > * Adaptability. Omid's current version provides transactions on data
> > stored in Apache HBase. However, Omid’s components are generic enough
> > to be adapted to any other key-value NoSQL datastore that supports
> > MVCC.
> >
> >
> > * Development. Omid provides a very simple interface that mimics
> > standard HBase APIs, making it developer friendly. Only minimal
> > extensions to the standard interfaces have been introduced to enable
> > transactions.
> >
> >
> > * Simplicity. Omid leverages the HBase infrastructure for managing its
> > own metadata. It requires no additional services beyond those
> > provided and used by HBase.
> >
> >
> > * Track Record. As we have mentioned, Omid is already in use by
> > very-large-scale production systems at Yahoo. Also, Hortonworks is
> > integrating Omid in a metastore implementation for Hive based on
> > HBase.
> >
> > [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
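The Semantics and Adaptability points rely on the datastore keeping multiple timestamped versions of each value (MVCC). Under Snapshot Isolation, a transaction reads, for each key, the newest version committed before its start timestamp, so it sees a consistent snapshot regardless of later commits. The following is a minimal in-memory sketch of that read rule, not how Omid or HBase implement it; MvccStore, put and snapshotRead are hypothetical names, and the sketch assumes every stored version is already committed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory multi-version store illustrating a Snapshot Isolation read.
// Assumes every stored version is already committed (no commit-status lookup).
class MvccStore {
    // row key -> (commit timestamp -> value), versions sorted by timestamp
    private final Map<String, NavigableMap<Long, String>> versions = new HashMap<>();

    // Records a committed version of key written at commitTs.
    void put(String key, long commitTs, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(commitTs, value);
    }

    // Returns the newest version committed strictly before startTs, or null if none.
    String snapshotRead(String key, long startTs) {
        NavigableMap<Long, String> byTs = versions.get(key);
        if (byTs == null) {
            return null;
        }
        Map.Entry<Long, String> entry = byTs.lowerEntry(startTs);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        MvccStore store = new MvccStore();
        store.put("row-a", 3, "v1");
        store.put("row-a", 7, "v2");
        System.out.println(store.snapshotRead("row-a", 5)); // v1 (snapshot taken before ts 7)
        System.out.println(store.snapshotRead("row-a", 9)); // v2
        System.out.println(store.snapshotRead("row-a", 2)); // null (nothing committed yet)
    }
}
```

Using the strict lowerEntry lookup keeps any version committed at or after the reader's start timestamp invisible to that reader, which is what makes the snapshot stable.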
> >
> >
> > === Current Status ===
> > The current Omid implementation is available both in Yahoo’s internal
> > Github repository, for use at Yahoo, and in Yahoo’s public
> > Github repository (https://github.com/yahoo/omid.git). Both
> > repositories are managed by Omid’s current developers at Yahoo.
> >
> > As mentioned above, Yahoo is currently using Omid to provide
> > transactions in Sieve, a web-scale content management platform that
> > powers Yahoo’s next-generation search and personalization products.
> >
> >
> > ==== Meritocracy ====
> > The first version of Omid was originally created in 2011 by Maysam
> > Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
> > Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
> >
> >
> > During the years after its inception, Omid has matured to operate at
> > Web scale and has been used internally by strategic projects at Yahoo
> > such as Sieve. The current committers are members of the Yahoo team
> > that took over the initial Omid prototype and rewrote it to meet the
> > high availability and scalability requirements of the Sieve project.
> > This committer base has recently grown to include Hortonworks members
> > who helped adapt Omid to HBase 1.x versions.
> >
> >
> > With this initial committer base, we aim to form a larger community
> > that can collaborate with new ideas over the current code base. This
> > new community will run the project following the "Apache Way"
> > (http://apache.org/foundation/governance/). Users and new contributors
> > will be treated with respect and welcomed. To grow the community, we
> > will encourage contributors to provide patches, review code, propose
> > new features and improvements, talk at conferences such as Hadoop Summit,
> > HBaseCon, ApacheCon, etc. Committership and PMC membership will be
> > offered according to meritocracy.
> >
> > ==== Community ====
> >
> > The public Yahoo Omid repository on Github currently has 241 stars and
> > 93 forks, which suggests significant interest in the project in the
> > open-source community, at least compared with other similar projects
> > (see https://github.com/yahoo/omid.git).
> >
> >
> > Recently, Hortonworks contributors to the Apache Hive project who
> > are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
> > expressed interest in using Omid. We started a fruitful collaboration
> > with them that resulted in Omid supporting HBase 1.x versions.
> >
> >
> > Salesforce is also interested in collaborating on a proof of
> > concept for integrating Omid as a pluggable transaction manager in
> > Apache Phoenix.
> >
> >
> > Yahoo, Hortonworks and Salesforce participants will constitute the
> > initial set of committers and mentors for the proposal.
> >
> > ==== Core Developers ====
> > The core developers of Omid are all skilled software developers and
> > research engineers at Yahoo Inc. and Hortonworks with years of
> > experience in their fields. At this moment, developers are
> > distributed across the U.S. and Israel. The aim is to incorporate more
> > committers from different organizations and locations over time.
> >
> >
> > The current set of developers includes experienced committers from
> > Apache HBase, Hive and Hadoop projects who have been working with us
> > on the current codebase hosted on Github.
> >
> > Finally, some of the core developers are currently NOT affiliated with
> > the ASF and would require new ICLAs to be filed.
> >
> >
> > === Alignment ===
> > Omid adds transactions to the already successful Apache HBase
> > datastore project. We have collaborated with other developers inside
> > and outside Yahoo who are involved in the Apache HBase community, so
> > we have had reliable feedback from them.
> >
> > Although Omid brings value into HBase, the design of the current
> > version provides a general transaction scheme that can potentially be
> > adapted to other MVCC key-value datastores such as Apache Cassandra.
> >
> >
> > Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> > top of HBase that can potentially integrate Omid in order to provide
> > the well-known concept of transactions to Phoenix-based applications.
> >
> >
> > === Known Risks ===
> > ==== Orphaned products ====
> > Yahoo’s Research and Search organizations have been taking care of
> > Omid development since the first prototype was created in 2011. Yahoo
> > has a long history of participating in open-source projects, and has
> > also been a long-time contributor to the Apache community. For example,
> > in Apache, Yahoo is an important contributor to many projects in the
> > Hadoop ecosystem such as HBase, Pig, Storm and YARN, and has also
> > open-sourced other well-known projects outside Hadoop, such as
> > ZooKeeper and BookKeeper. It is therefore in Yahoo's best interest to
> > make Omid a successful open-source Apache product as well. If this
> > happens, we are confident that a larger community will form around the
> > project in a relatively short period of time, contributing to the
> > diversification and stabilization of the base of committers.
> >
> >
> > ==== Inexperience with Open Source ====
> > This project has long-standing, experienced mentors and interested
> > contributors from Apache HBase, Hive and Phoenix to help us move
> > through the open source process. We are actively working with
> > experienced Apache community members to improve our project and
> > its testing.
> >
> > ==== Homogeneous Developers ====
> > Omid has been supported by Yahoo since its inception in 2011. However,
> > the current committers are employed by several different companies,
> > as shown in the Affiliations section.
> >
> >
> > ==== Reliance on Salaried Developers ====
> >
> > All the current developers are paid by their employers to contribute
> > to this project. Yahoo developers will also continue maintaining the
> > internal Omid repository at their company.
> >
> > Of course, other developers are welcome to contribute to this project
> > once it is open sourced in Apache.
> >
> > ==== Relationships with Other Apache Products ====
> >
> > The current Omid incarnation serves transactional contexts to
> > applications storing their data in HBase. However, Omid's design could
> > potentially be adapted to serve transactions on top of other MVCC-based
> > key-value datastores in the Apache community, such as Cassandra.
> >
> >
> > Many other Apache projects, such as Apache Spark, Apache Phoenix,
> > Apache Storm and Apache Flink, could potentially use Omid as a
> > transactional framework to get transactional contexts. In
> > particular, Apache Phoenix (a SQL layer on top of HBase) might use
> > Omid as its transaction management component. Once we open source Omid
> > as an Apache project, we expect to generate more interest in the
> > surrounding communities.
> >
> >
> > Very recently, a new incubator proposal for a similar project called
> > Tephra has been submitted to the ASF. We think this is good for the
> > Apache community, and we believe that there’s room for both proposals,
> > as each of them is designed around different principles (e.g.
> > Omid does not need to maintain the state of ongoing transactions in
> > the server-side component) and both Tephra and Omid have gained
> > some traction in the open-source community.
> >
> >
> > With regard to the Apache projects that Omid uses, apart from HBase,
> > Omid relies on Apache ZooKeeper and Curator in order to
> > coordinate the (re)connection of transaction managers (acting as
> > clients) to the conflict resolution component for transactions (server
> > side). They’re also used to coordinate the master and backup
> > replicas in high availability scenarios.
> >
> >
> > ==== An Excessive Fascination with the Apache Brand ====
> >
> > We are applying to the Incubator process because we think that it is
> > the logical next step for the Omid project after we open-sourced the
> > code on Github some years ago. Yahoo has a long-standing history of
> > contributing to Apache projects. The developers and contributors
> > understand the implications of making it an Apache project, and
> > strongly believe that the growing community can benefit from the
> > Apache environment, ecosystem, and infrastructure.
> >
> >
> > === Documentation ===
> > Current documentation about the project is available in the wiki of
> > Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
> > be moved under https://omid.incubator.apache.org/docs if the project
> > is accepted as an Apache Incubator project.
> >
> > === Initial Source ===
> > Initial source code is currently hosted in Github for general viewing
> > and contribution:
> >
> > https://github.com/yahoo/omid.git
> >
> >
> > Omid source code is written in Java (99%), with some shell
> > scripts (1%) used to configure and launch the main
> > components.
> >
> >
> > The code will be moved to Apache http://git.apache.org/ if accepted as
> > an Incubator project.
> >
> > === Source and Intellectual Property Submission Plan ===
> >
> > The current Omid License for the code published in Github is Apache
> > 2.0. If Omid fulfills the conditions for becoming an Incubator
> > project in the ASF, the source code will be transitioned via the
> > Software Grant Agreement onto the ASF infrastructure and in turn made
> > available under the Apache License, version 2.0.
> >
> > === External Dependencies ===
> >
> >
> > The required external dependencies that are not Apache projects are
> > all under the Apache License or other compatible licenses:
> >
> > Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> >
> > JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
> >
> > Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> >
> > Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
> >
> > TestNG v6.8.8 (http://testng.org) [Apache 2.0]
> >
> > SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
> >
> > Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
> >
> > Google Protocol Buffers v2.5.0
> > (https://developers.google.com/protocol-buffers/) [BSD License]
> >
> > Mockito (http://mockito.org/) v1.9.5 [MIT License]
> >
> > LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
> > [Apache 2.0]
> >
> > Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> > (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> >
> > C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> >
> > Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
> >
> >
> > === Cryptography ===
> > The Omid project does not use cryptography itself. However, Apache
> > HBase (the datastore on top of which the current version of Omid works)
> > uses standard APIs and tools for SSH and SSL communication where necessary.
> >
> > === Required Resources ===
> > We request that the following resources be created for the project to use:
> >
> > ==== Mailing lists ====
> >
> > omid-private (moderated subscriptions)
> >
> > omid-commits (commit notification)
> > omid-dev (technical discussions)
> >
> > ==== Git repository ====
> > https://github.com/apache/incubator-omid
> >
> > ==== Documentation ====
> > https://omid.incubator.apache.org/docs/
> >
> > ==== JIRA instance ====
> > https://issues.apache.org/jira/browse/omid
> >
> > === Initial Committers ===
> >
> > * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> >
> > * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> >
> > * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> >
> > * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> >
> > * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
> >
> >
> > * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
> >
> > * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> >
> > * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
> >
> >
> > * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
> >
> >
> > * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
> >
> > * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > === Additional Interested Contributors ===
> > * Ivan Kelly (ivank<AT>apache<DOT>org)
> >
> > * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
> >
> >
> > === Affiliations ===
> >
> > * Edward Bortnikov, Yahoo Inc.
> >
> >
> > * Daniel Dai, Hortonworks
> >
> >
> > * Flavio P. Junqueira, Confluent
> >
> >
> > * Igor Katkov, Yahoo Inc.
> >
> >
> > * Ivan Kelly, Midokura
> >
> >
> > * Francis C. Liu, Yahoo Inc.
> >
> >
> > * Sameer Paranjpye, Arimo
> >
> > * Francisco Perez-Sorrosal, Yahoo Inc.
> >
> >
> > * Ohad Shacham, Yahoo Inc.
> >
> >
> > * Maysam Yabandeh, Dropbox Inc.
> >
> >
> > === Sponsors ===
> >
> > ==== Champion ====
> >
> > Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> > ==== Nominated Mentors ====
> >
> > Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> > Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> > Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> > Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> > James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > ==== Sponsoring Entity ====
> > Apache Incubator PMC
> >
>



RE: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Sean Broeder <se...@esgyn.com>.
Hi,
I've worked on the Distributed Transaction Manager that is now part of
Trafodion for years. I have a vested interest in the DTM and in the
Trafodion project's success, but I'm not opposed to adding
additional/competing transaction managers into incubation. Ultimately, I
think it will build a better community and better products.

I also think that having the ability to plug a transaction manager (x) into
an application or (no)SQL engine is a good thing. To that end, I encourage
the other transaction manager projects to read the existing JIRA that
attempts to propose a standard API for transactions within HBase
[HBASE-11447 Proposal for a generic transaction API for HBase]. Read it,
comment, object, suggest alternatives, and let's come up with a consensus
that allows us to easily swap transaction managers to suit the application
or workload.

+1 to adding Omid (non binding)

Regards,
Sean

-----Original Message-----
From: Pierre Smits [mailto:pierre.smits@gmail.com]
Sent: Sunday, March 20, 2016 1:35 PM
To: general@incubator.apache.org
Subject: Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Hi Henry,

It seems you (and several others) are forgetting Trafodion, which also
provides transactions on N*SQL solutions; see http://trafodion.apache.org

Best regards,

Pierre Smits

ORRTIZ.COM <http://www.orrtiz.com>
OFBiz based solutions & services

OFBiz Extensions Marketplace
http://oem.ofbizci.net/oci-2/

On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <he...@gmail.com>
wrote:

> I know the Apache Incubator does not play favorites, but it is getting
> awkward that TWO transaction engines for HBase are coming to the
> Incubator at the same time.
>
> As most people know, the other one is Tephra, which came to the
> Incubator just a few weeks ago.
>
> As a member of the IPMC, I would like to see Omid provide a more detailed
> comparison of the differences the project brings, in terms of
> approach and possible integrations with other ASF projects.
>
> If possible, I would prefer to see the Omid team work together with
> Tephra to build one solid transaction engine for HBase and, later,
> NoSQL databases.
>
>
> - Henry
>
> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to propose Omid as an Apache Incubator project:
> >
> > https://wiki.apache.org/incubator/OmidProposal
> >
> > I've posted the text of the proposal below:
> >
> > Thanks,
> > Daniel
> >
> > = Omid Proposal =
> >
> > === Abstract ===
> >
> > Omid is a flexible, reliable, high-performance and scalable ACID
> > transactional framework that allows client applications to execute
> > transactions on top of MVCC key/value-based NoSQL datastores
> > (currently Apache HBase), providing Snapshot Isolation guarantees on
> > the accessed data.
> >
> >
> > === Proposal ===
> >
> > Omid is a flexible open-source transactional framework that provides
> > ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> > datastores. In particular, the current codebase brings the concept
> > of transactions to the popular Apache HBase datastore. Omid offers
> > high performance, high availability and scalability. Omid's
> > current version is able to scale to thousands of clients triggering
> > concurrent transactions on application data stored in HBase. Omid
> > can scale beyond 100K transactions per second on mid-range hardware
> > while having minimal impact on the speed of data access in
> > the datastore. We’re currently experimenting with a prototype
> > version that can raise throughput up to ~380K TPS.
> >
> >
> > Omid has been publicly available as an open-source project on Github
> > under the Apache License, Version 2.0, since 2011 [1]. Over these
> > years, it has attracted interest in the open source community,
> > especially since the public presentation of the first version at
> > Hadoop Summit 2013 [2]. The Github project currently has 241 stars and
> > 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> > Foundation with the aim of transferring the Omid project (including its
> > source code and documentation) to Apache in order to start building a
> > stable open source community around it.
> >
> >
> > [1] https://github.com/yahoo/omid
> >
> > [2] Omid presentation at Hadoop Summit 2013:
> >
> >
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
> >
> >
> > === Background ===
> >
> > An Omid prototype was first released as an open-source project back
> > in 2011. Inspired by Google Percolator [1], it offered a lock-free
> > approach to transactions in NoSQL datastores (See [2]). However,
> > during these years, the design of Omid has evolved significantly.
> > Whilst the current open-sourced version maintains many aspects of
> > the original implementation, it is the result of a major redesign of
> > the first prototype released in 2011.
> >
> >
> > Omid now has a more decentralized design that does not sacrifice the
> > consistency and performance of the original version. The current
> > design also enables Omid to scale to thousands of clients executing
> > transactions concurrently on application data stored in HBase.
> > Internally, Omid still uses a lock-free approach to support
> > multiple concurrent clients. Its design also relies on a centralized
> > conflict detection component, the TSO, which now efficiently resolves
> > writeset collisions among concurrent transactions without having to
> > piggyback commit information to the clients. Another important
> > benefit of Omid is that it doesn't require any modification of the
> > underlying key-value datastore, HBase in this case. Moreover, the
> > recently added high availability algorithm eliminates the single
> > point of failure represented by the TSO in deployments requiring a
> > higher degree of dependability. Last but not least, the user API is
> > very simple, mimicking transaction managers in the relational world:
> > begin, commit, rollback.
> >
> >
> > Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> > management platform powering some of its next-generation search and
> > personalization products, uses Omid as a transaction manager in its
> > processing pipeline. Sieve essentially acts as a huge processing hub
> > between content feeds and serving systems. It provides an environment
> > for highly customizable, real-time, streamed information processing,
> > with typical discovery-to-service latencies of just a few seconds. In
> > terms of scale and availability, Omid’s new design was largely driven
> > by Sieve’s requirements.
> >
> >
> > At Yahoo, we are also making an effort to disseminate the current
> > status of the project through blog entries (see [3], [4] and [5]) and
> > submissions to technical and academic conferences such as ATC 2016,
> > Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
> > appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
> >
> >
> > [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
> > Distributed Transactions and Notifications. USENIX Symposium on
> > Operating Systems Design and Implementation, 2010
> >
> > [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> > Omid: Lock-free transactional support for distributed data stores.
> > In Proc. of ICDE, 2013.
> >
> > [3]
> >
> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> >
> > [4]
> >
> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> >
> > [5]
> >
> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> >
> > [6]
> >
> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
> >
> >
> > === Rationale ===
> >
> > Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> > transactions is very popular and is a standard feature of relational
> > databases. However, in the Big Data ecosystem, applications typically
> > use NoSQL datastores, which do not provide ACID transactions. Such
> > NoSQL datastores traditionally gave up transactional support for
> > greater agility and scalability. However, the need for transactions
> > soon emerged in Big Data applications accessing shared data; for
> > example, transactions are very important for modern, scalable systems
> > that process content incrementally.
> >
> >
> > NoSQL datastores (including HBase) don’t provide transactional
> > frameworks to coordinate access to the underlying data and preserve
> > consistency. By using Omid, Big Data applications that need
> > to bundle multiple read and write operations on HBase into logically
> > indivisible units of work can execute transactions with ACID
> > properties, just as they would use transactions in the relational
> > database world. Omid extends the HBase key-value access API with
> > transaction semantics. It can be used either directly or via
> > higher-level data management APIs. For example, Apache Phoenix
> > (SQL on top of HBase) might use Omid as its transaction management
> > component.
> >
> >
> > The following features make Omid an attractive choice for system
> > designers and other projects in the Apache community:
> >
> >
> > * Semantics. Omid implements Snapshot Isolation (SI), which is
> > supported by major SQL and NoSQL technologies (e.g. Google Percolator).
> >
> >
> > * Performance and Scalability. Omid provides a highly scalable,
> > lock-free implementation of SI. To the best of our knowledge, it is
> > also one of the few open source NoSQL transactional platforms that
> > can execute more than 100K transactions per second [1]. A new
> > prototype still in development can go even further, up to ~380K TPS.
> >
> >
> > * Reliability. Omid has a high-availability (HA) mode, in which the
> > core service performing writeset conflict resolution operates as a
> > primary-backup process pair with automatic failover. The HA support
> > adds zero overhead to normal operation.
> >
> >
> > * Adaptability. Omid's current version provides transactions on data
> > stored in Apache HBase. However, Omid’s components are generic enough
> > to be adapted to any other key-value NoSQL datastore that supports
> > MVCC.
> >
> >
> > * Development. Omid provides a very simple interface that mimics
> > standard HBase APIs, making it developer friendly. Only minimal
> > extensions to the standard interfaces have been introduced to enable
> > transactions.
> >
> >
> > * Simplicity. Omid leverages the HBase infrastructure for managing its
> > own metadata. It requires no additional services beyond those
> > provided and used by HBase.
> >
> >
> > * Track Record. As we have mentioned, Omid is already in use by
> > very-large-scale production systems at Yahoo. Also, Hortonworks is
> > integrating Omid in a metastore implementation for Hive based on
> > HBase.
> >
> > [1] See also Haeinsa:
> > https://github.com/vcnc/haeinsa/wiki/Performance
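The Semantics and Adaptability points both rest on MVCC: under Snapshot Isolation, a transaction reads from a fixed snapshot taken at its start timestamp, so versions committed later stay invisible to it. The Java (8+) sketch below models only that read rule, in memory. It is not Omid's API; the class and method names (`SnapshotReadSketch`, `putCommitted`, `read`) are hypothetical, and it assumes every committed version carries a monotonically increasing commit timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory MVCC store illustrating Snapshot Isolation reads.
// Not Omid code: names and structure are assumptions for illustration only.
public class SnapshotReadSketch {

    // key -> (commit timestamp -> value), ordered by commit timestamp
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    // Records a value written by a transaction that committed at commitTs.
    public void putCommitted(String key, long commitTs, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(commitTs, value);
    }

    // Snapshot read: the newest version committed at or before snapshotTs.
    // Returns null when no such version exists.
    public String read(String key, long snapshotTs) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> visible = history.floorEntry(snapshotTs);
        return visible == null ? null : visible.getValue();
    }

    public static void main(String[] args) {
        SnapshotReadSketch store = new SnapshotReadSketch();
        store.putCommitted("balance", 5L, "100");
        store.putCommitted("balance", 12L, "80");

        // A transaction whose snapshot is ts=10 still sees "100",
        // even though another transaction committed "80" at ts=12.
        System.out.println(store.read("balance", 10L)); // 100
        System.out.println(store.read("balance", 15L)); // 80
        System.out.println(store.read("balance", 3L));  // null
    }
}
```

`TreeMap.floorEntry` turns each snapshot read into one ordered lookup per key; HBase likewise keeps multiple timestamped versions per cell, which is the kind of MVCC support the Adaptability point refers to.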
> >
> >
> > === Current Status ===
> > The current Omid implementation is available both in Yahoo’s internal
> > Github repository, for internal use at Yahoo, and in Yahoo’s public
> > Github repository (https://github.com/yahoo/omid.git). Both
> > repositories are managed by Omid’s current developers at Yahoo.
> >
> > As mentioned above, Yahoo currently uses Omid to provide
> > transactions in Sieve, a web-scale content management platform that
> > powers Yahoo’s next-generation search and personalization products.
> >
> >
> > ==== Meritocracy ====
> > The first version of Omid was originally created in 2011 by Maysam
> > Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and
> > Flavio Junqueira at the R&D Scalable Computing Group of Yahoo Labs in
> > Spain.
> >
> >
> > During the years after its inception, Omid has matured to operate at
> > Web scale and has been used internally by strategic projects at
> > Yahoo such as Sieve. The current base of committers belongs to the
> > Yahoo team that took over the initial Omid prototype and rewrote it
> > to meet the high availability and scalability requirements of the
> > Sieve project. This base of committers has recently incorporated
> > Hortonworks members who helped adapt Omid to HBase 1.x versions.
> >
> >
> > With this initial committer base, we aim to form a larger community
> > that can collaborate with new ideas over the current code base. This
> > new community will run the project following the "Apache Way"
> > (http://apache.org/foundation/governance/). Users and new
> > contributors will be treated with respect and welcomed. To grow the
> > community, we will encourage contributors to provide patches, review
> > code, propose new features and improvements, talk at conferences such as
> > Hadoop Summit, HBaseCon, ApacheCon, etc. Committership and PMC
> > membership will be offered according to meritocracy.
> >
> > ==== Community ====
> >
> > The public Yahoo Omid repository at Github currently has 241 Stars
> > and 93 forks, which indicates significant interest in the project in
> > the open-source community, at least compared with other similar
> > projects (See https://github.com/yahoo/omid.git).
> >
> >
> > Recently, Hortonworks contributors to the Apache Hive project who
> > are working on storing Hive metadata in HBase (Apache Jira
> > HIVE-9452) expressed interest in using Omid. We started a fruitful
> > collaboration with them that resulted in Omid supporting HBase 1.x
> > versions.
> >
> >
> > Salesforce is also interested in collaborating on a Proof of
> > Concept for integrating Omid as a pluggable transaction manager in
> > Apache Phoenix.
> >
> >
> > Yahoo, Hortonworks and Salesforce participants will constitute the
> > initial set of committers and mentors for the proposal.
> >
> > ==== Core Developers ====
> > The core developers of Omid are all skilled software developers and
> > research engineers at Yahoo Inc. and Hortonworks with years of
> > experience in their fields. At this moment, developers are
> > distributed across the U.S. and Israel. The aim is to incorporate more
> > committers from different organizations and locations over time.
> >
> >
> > The current set of developers includes experienced committers from
> > the Apache HBase, Hive and Hadoop projects who have been working with
> > us on the current codebase found in Github.
> >
> > Finally, some of the core developers are currently NOT affiliated
> > with the ASF and would require new ICLAs to be filed.
> >
> >
> > === Alignment ===
> > Omid enhances the already successful Apache HBase datastore project
> > with transactions. We have collaborated with other developers inside
> > and outside Yahoo who are involved in the Apache HBase community,
> > so we have had reliable feedback from them.
> >
> > Although Omid brings value into HBase, the design of the current
> > version provides a general transaction scheme that can potentially
> > be adapted to other MVCC key-value datastores such as Apache Cassandra.
> >
> >
> > Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> > top of HBase that can potentially integrate Omid in order to provide
> > the well-known concept of transactions to Phoenix-based applications.
> >
> >
> > === Known Risks ===
> > ==== Orphaned products ====
> > Yahoo’s Research and Search organizations have been taking care of
> > Omid development since the first prototype was created in 2011. Yahoo
> > has a long history of participating in open-source projects, and has
> > also been a long-time contributor to the Apache community. For
> > example, in Apache, Yahoo is an important contributor to many
> > projects in the Hadoop ecosystem such as HBase, Pig, Storm or YARN,
> > and has also open-sourced other well-known projects outside Hadoop,
> > such as ZooKeeper or BookKeeper. So it is in Yahoo’s best interest
> > to make Omid a successful open-source Apache product as well. If
> > this happens, we are confident that a larger community will form
> > around the project in a relatively short period of time,
> > contributing to the diversification and stabilization of the base of
> > committers.
> >
> >
> > ==== Inexperience with Open Source ====
> > This project has long-standing, experienced mentors and interested
> > contributors from Apache HBase, Hive and Phoenix to help us move
> > through the open source process. We are actively working with
> > experienced Apache community members to improve the project and
> > extend its testing.
> >
> > ==== Homogeneous Developers ====
> > Omid has been supported by Yahoo since its inception in 2011.
> > However, the current committers are employed by several different
> > companies, as shown in the Affiliations section.
> >
> >
> > ==== Reliance on Salaried Developers ====
> >
> > All the current developers are paid by their employers to contribute
> > to this project. Yahoo developers will also continue maintaining
> > the internal Omid repository at their company.
> >
> > Of course, other developers are welcome to contribute to this
> > project after it is open sourced in Apache.
> >
> > ==== Relationships with Other Apache Products ====
> >
> > The current Omid incarnation serves transactional contexts to
> > applications storing their data in HBase. However, Omid’s design
> > could potentially be adapted to serve transactions on top of other
> > MVCC-based key-value datastores in the Apache community, such as
> > Cassandra.
> >
> >
> > As a transactional framework, Omid could potentially benefit many
> > other Apache projects, such as Apache Spark, Apache Phoenix, Apache
> > Storm and Apache Flink, by providing transactional contexts. In
> > particular, Apache Phoenix, a SQL layer on top of HBase, might use
> > Omid as its transaction management component. Once we open source
> > Omid as an Apache project, we expect to generate more interest in
> > the surrounding communities.
> >
> >
> > Very recently, a new incubator proposal for a similar project called
> > Tephra has been submitted to the ASF. We think this is good for the
> > Apache community, and we believe that there’s room for both
> > proposals, since each design is based on different principles (e.g.
> > Omid does not require the server-side component to maintain the
> > state of ongoing transactions) and since both Tephra and Omid have
> > gained some traction in the open-source community.
> >
> >
> > With regard to the Apache projects that Omid uses, apart from HBase,
> > Omid relies on Apache ZooKeeper and Curator in order to coordinate
> > the (re)connection of transaction managers (acting as clients) to
> > the conflict resolution component for transactions (server side).
> > They are also used to coordinate the master and backup replicas in
> > high availability scenarios.
> >
> >
> > ==== An Excessive Fascination with the Apache Brand ====
> >
> > We are applying to the Incubator process because we think that it is
> > the logical next step for the Omid project after we open-sourced
> > the code in Github some years ago. Yahoo has a long-standing history
> > of contributing to Apache projects. The developers and contributors
> > understand the implications of making it an Apache project, and
> > strongly believe that the growing community can benefit from the
> > Apache environment, ecosystem, and infrastructure.
> >
> >
> > === Documentation ===
> > Current documentation about the project is available in the wiki of
> > Omid’s Github repository: https://github.com/yahoo/omid/wiki . It
> > will be moved under https://omid.incubator.apache.org/docs if the
> > project is accepted into the Apache Incubator.
> >
> > === Initial Source ===
> > Initial source code is currently hosted in Github for general
> > viewing and contribution:
> >
> > https://github.com/yahoo/omid.git
> >
> >
> > Omid source code is written in Java (99%) mixed with some shell
> > scripts (1%) used to configure and launch the main components.
> >
> >
> > The code will be moved to Apache http://git.apache.org/ if accepted
> > as an Incubator project.
> >
> > === Source and Intellectual Property Submission Plan ===
> >
> > The current Omid License for the code published in Github is Apache
> > 2.0. If Omid fulfills and passes the conditions for being an
> > Incubator project in the ASF, the source code will be transitioned
> > via the Software Grant Agreement onto the ASF infrastructure and in
> > turn made available under the Apache License, version 2.0.
> >
> > === External Dependencies ===
> >
> >
> > The required external dependencies that are not Apache projects are
> > all under the Apache License or other compatible licenses:
> >
> > Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> >
> > JDK7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK License]
> >
> > Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> >
> > Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
> >
> > TestNG v6.8.8 (http://testng.org) [Apache 2.0]
> >
> > SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
> >
> > Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
> >
> > Google Protocol Buffers v2.5.0
> > (https://developers.google.com/protocol-buffers/) [BSD License]
> >
> > Mockito (http://mockito.org/) v1.9.5 [MIT License]
> >
> > LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
> > [Apache 2.0]
> >
> > Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> > (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> >
> > C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> >
> > Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
> >
> >
> > === Cryptography ===
> > The Omid project does not use cryptography itself. However, Apache
> > HBase, the datastore on top of which the current version of Omid
> > works, uses standard APIs and tools for SSH and SSL communication
> > where necessary.
> >
> > === Required Resources ===
> > We request that the following resources be created for the project to use:
> >
> > ==== Mailing lists ====
> >
> > omid-private (moderated subscriptions)
> >
> > omid-commits (commit notification)
> > omid-dev (technical discussions)
> >
> > ==== Git repository ====
> > https://github.com/apache/incubator-omid
> >
> > ==== Documentation ====
> > https://omid.incubator.apache.org/docs/
> >
> > ==== JIRA instance ====
> > https://issues.apache.org/jira/browse/omid
> >
> > === Initial Committers ===
> >
> > * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> >
> > * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> >
> > * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> >
> > * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> >
> > * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
> >
> >
> > * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
> >
> > * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> >
> > * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
> >
> >
> > * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
> >
> >
> > * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
> >
> > * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > === Additional Interested Contributors ===
> > * Ivan Kelly (ivank<AT>apache<DOT>org)
> >
> > * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
> >
> >
> > === Affiliations ===
> >
> > * Edward Bortnikov, Yahoo Inc.
> >
> >
> > * Daniel Dai, Hortonworks
> >
> >
> > * Flavio P. Junqueira, Confluent
> >
> >
> > * Igor Katkov, Yahoo Inc.
> >
> >
> > * Ivan Kelly, Midokura
> >
> >
> > * Francis C. Liu, Yahoo Inc.
> >
> >
> > * Sameer Paranjpye, Arimo
> >
> > * Francisco Perez-Sorrosal, Yahoo Inc.
> >
> >
> > * Ohad Shacham, Yahoo Inc.
> >
> >
> > * Maysam Yabandeh, Dropbox Inc.
> >
> >
> > === Sponsors ===
> >
> > ==== Champion ====
> >
> > Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> > ==== Nominated Mentors ====
> >
> > Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> > Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> > Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> > Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> > James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > ==== Sponsoring Entity ====
> > Apache Incubator PMC
> >
> >
> >
>



Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Pierre Smits <pi...@gmail.com>.
Hi Henry,

It seems you (and several others) are forgetting Trafodion, which also
provides transactions on N*SQL solutions, see http://trafodion.apache.org

Best regards,

Pierre Smits

ORRTIZ.COM <http://www.orrtiz.com>
OFBiz based solutions & services

OFBiz Extensions Marketplace
http://oem.ofbizci.net/oci-2/

On Sat, Mar 19, 2016 at 12:19 AM, Henry Saputra <he...@gmail.com>
wrote:

> I know the Apache Incubator does not play favorites, but it is getting awkward
> that TWO transaction engines for HBase are coming to the Incubator at the same time.
>
> As most people know, the other one is Tephra, which came to the Incubator
> just a few weeks ago.
>
> As a member of the IPMC, I would like to see Omid provide a more detailed
> comparison of the differences the project brings, in terms of approach
> and possible integrations with other ASF projects.
>
> If possible, I would prefer to see the Omid team work together with Tephra to
> make one solid transaction engine for HBase and, later, NoSQL databases.
>
>
> - Henry
>
> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to propose Omid as an Apache Incubator project:
> >
> > https://wiki.apache.org/incubator/OmidProposal
> >
> > I've posted the text of the proposal below:
> >
> > Thanks,
> > Daniel
> >
> > = Omid Proposal =
> >
> > === Abstract ===
> >
> > Omid is a flexible, reliable, high-performance and scalable ACID
> > transactional framework that allows client applications to execute
> > transactions on top of MVCC key/value-based NoSQL datastores
> > (currently Apache HBase) providing Snapshot Isolation guarantees on
> > the accessed data.
> >
> >
> > === Proposal ===
> >
> > Omid is a flexible open-source transactional framework that provides
> > ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> > datastores. In particular, the current codebase brings the concept of
> > transactions to the popular Apache HBase datastore. Omid offers great
> > performance and is highly available and scalable. Omid's current
> > version is able to scale to thousands of clients triggering concurrent
> > transactions on application data stored in HBase. Omid can scale
> > beyond 100K transactions per second on mid-range hardware while
> > incurring minimal impact on the speed of data access in the
> > datastore. We’re currently experimenting with a prototype version that
> > can improve the performance up to ~380K TPS.
> >
> >
> > Omid has been publicly available as an open-source project in Github
> > under Apache License Version 2.0 since 2011 [1]. During these years,
> > it has generated some interest in the open source community,
> > especially since the public presentation of the first version at
> > Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and
> > 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> > Foundation with the aim of transferring the Omid project, including
> > its source code and documentation, to Apache in order to start
> > building a stable open source community around it.
> >
> >
> > [1] https://github.com/yahoo/omid
> >
> > [2] Omid presentation at Hadoop Summit 2013:
> >
> >
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
> >
> >
> > === Background ===
> >
> > An Omid prototype was first released as an open-source project back in
> > 2011. Inspired by Google Percolator [1], it offered a lock-free
> > approach to transactions in NoSQL datastores (See [2]). However,
> > during these years, the design of Omid has evolved significantly.
> > Whilst the current open-sourced version maintains many aspects of the
> > original implementation, it is the result of a major redesign of the
> > first prototype released in 2011.
> >
> >
> > Omid now has a more decentralized design that does not sacrifice the
> > consistency and performance of the original version. The current
> > design also enables Omid to scale to thousands of clients executing
> > transactions concurrently on application data stored in HBase.
> > Internally, Omid still utilizes a lock-free approach to support
> > multiple concurrent clients. Its design also relies on a centralized
> > conflict detection component, the TSO, which now efficiently resolves
> > writeset collisions among concurrent transactions without having to
> > piggyback commit information to the clients. Another important benefit
> > of Omid is that it doesn't require any modification of the underlying
> > key-value datastore, HBase in this case. Moreover, the recently added
> > high availability algorithm eliminates the single point of failure
> > represented by the TSO in those system
> > deployments requiring a higher degree of dependability. Last but not
> > least, the provided user API is very simple, mimicking transaction
> > managers in the relational world: begin, commit, rollback.
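To make the TSO's role concrete: under Snapshot Isolation, a transaction may commit only if no row in its writeset was committed by another transaction after its own start timestamp. The Java (8+) sketch below models a centralized oracle that hands out start timestamps on begin and makes that commit/abort decision. It is a hypothetical single-process model, not Omid's TSO: the class name `TsoSketch`, returning `-1` to signal an abort, the unbounded in-memory map and the `synchronized` methods are all simplifying assumptions. Clients never lock data rows here; only the oracle serializes its own decisions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified timestamp oracle with write-write conflict
// detection for Snapshot Isolation. Not Omid's actual TSO implementation.
public class TsoSketch {

    private long clock = 0;
    // Row -> commit timestamp of the last committed transaction that wrote it.
    // Unbounded here for simplicity; a real service would bound this state.
    private final Map<String, Long> lastCommit = new HashMap<>();

    // begin(): returns the transaction's start (snapshot) timestamp.
    public synchronized long begin() {
        return ++clock;
    }

    // commit(): returns a commit timestamp, or -1 (abort) if any row in the
    // writeset was committed by another transaction after startTs.
    public synchronized long commit(long startTs, Collection<String> writeSet) {
        for (String row : writeSet) {
            Long committedAt = lastCommit.get(row);
            if (committedAt != null && committedAt > startTs) {
                return -1; // write-write conflict: abort
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            lastCommit.put(row, commitTs);
        }
        return commitTs;
    }

    public static void main(String[] args) {
        TsoSketch tso = new TsoSketch();
        long t1 = tso.begin(); // 1
        long t2 = tso.begin(); // 2, concurrent with t1
        System.out.println(tso.commit(t1, Arrays.asList("rowA")));         // 3
        System.out.println(tso.commit(t2, Arrays.asList("rowA", "rowB"))); // -1
    }
}
```

Rollback needs no call to the oracle in this model: an aborted or abandoned transaction simply never obtains a commit timestamp, and the aborted commit above leaves `rowB` untouched.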
> >
> >
> > Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> > management platform powering some of its next-generation search and
> > personalization products, uses Omid as a transaction manager in its
> > processing pipeline. Sieve essentially acts as a huge processing hub
> > between content feeds and serving systems. It provides an environment
> > for highly customizable, real-time, streamed information processing,
> > with typical discovery-to-service latencies of just a few seconds. In
> > terms of scale and availability, Omid’s new design was largely driven
> > by Sieve’s requirements.
> >
> >
> > At Yahoo, we are also making an effort to disseminate the current
> > status of the project through blog entries (See [3], [4] and [5]) and
> > submissions to technical and academic conferences such as ATC 2016,
> > Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
> > appeared in a TechCrunch article in the last quarter of 2015 (See [6]).
> >
> >
> > [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
> > Distributed Transactions and Notifications. USENIX Symposium on
> > Operating Systems Design and Implementation, 2010
> >
> > [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> > Omid: Lock-free transactional support for distributed data stores. In
> > Proc. of ICDE, 2013.
> >
> > [3]
> >
> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> >
> > [4]
> >
> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> >
> > [5]
> >
> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> >
> > [6]
> >
> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
> >
> >
> > === Rationale ===
> >
> > Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> > transactions is very popular and is a standard feature of relational
> > databases. However, in the Big Data ecosystem, applications typically
> > use NoSQL datastores, which do not provide ACID transactions. Such
> > NoSQL datastores traditionally gave up transactional support for
> > greater agility and scalability. However, while early NoSQL data store
> > implementations did not include transaction support, the need for
> > transactions soon emerged in Big Data applications when accessing
> > shared data; for example, transactions are very important for
> > modern, scalable systems that process content incrementally.
> >
> >
> > NoSQL datastores, including HBase, don’t provide transactional
> > frameworks to coordinate access to the underlying data for
> > preserving consistency. By using Omid, Big Data applications that need
> > to bundle multiple read and write operations on HBase into logically
> > indivisible units of work can execute transactions with ACID
> > properties, just as they would use transactions in the relational
> > database world. Omid extends the HBase key-value access API with
> > transaction semantics. It can be exercised either directly, or via
> > higher-level data management APIs. For example, Apache Phoenix
> > (SQL-on-top-of-HBase) might use Omid as its transaction management
> > component.
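The sketch below illustrates one way a multi-row writeset can become visible all at once, which is what "logically indivisible units of work" requires. It is an illustrative model only, not a description of Omid's actual protocol or API: it assumes each tentative write is tagged with its transaction's start timestamp, and that a single commit record (start timestamp to commit timestamp) decides visibility for the whole writeset. All names (`CommitTableSketch`, `write`, `commit`, `rollback`, `read`) are hypothetical; read-your-own-writes and cleanup of aborted data are omitted. Java 8+.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: tentative writes become visible only through a commit
// record, so a multi-row writeset is published (or discarded) as a unit.
public class CommitTableSketch {

    // One tentative cell version, tagged with its writer's start timestamp.
    static final class Version {
        final long writerStartTs;
        final String value;

        Version(long writerStartTs, String value) {
            this.writerStartTs = writerStartTs;
            this.value = value;
        }
    }

    private final Map<String, List<Version>> cells = new HashMap<>();
    // startTs -> commitTs, only for transactions that committed.
    private final Map<Long, Long> commitTable = new HashMap<>();

    public void write(long startTs, String row, String value) {
        cells.computeIfAbsent(row, r -> new ArrayList<>()).add(new Version(startTs, value));
    }

    // A single map update makes every row written under startTs visible.
    public void commit(long startTs, long commitTs) {
        commitTable.put(startTs, commitTs);
    }

    // Nothing to undo in this model: without a commit record the tentative
    // writes stay invisible (a real system would also clean them up).
    public void rollback(long startTs) {
    }

    // Newest value whose writer committed at or before snapshotTs.
    public String read(String row, long snapshotTs) {
        List<Version> history = cells.get(row);
        if (history == null) {
            return null;
        }
        String result = null;
        long newestCommitTs = Long.MIN_VALUE;
        for (Version v : history) {
            Long commitTs = commitTable.get(v.writerStartTs);
            if (commitTs != null && commitTs <= snapshotTs && commitTs > newestCommitTs) {
                newestCommitTs = commitTs;
                result = v.value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CommitTableSketch store = new CommitTableSketch();

        // Transaction A (start ts 1) updates two rows and commits at ts 2.
        store.write(1, "alice", "50");
        store.write(1, "bob", "150");
        store.commit(1, 2);

        // Transaction B (start ts 3) updates both rows, then rolls back.
        store.write(3, "alice", "0");
        store.write(3, "bob", "200");
        store.rollback(3);

        System.out.println(store.read("alice", 10) + " " + store.read("bob", 10)); // 50 150
        System.out.println(store.read("alice", 1)); // null: snapshot predates A's commit
    }
}
```

Because visibility hinges on one commit record per transaction, readers see either all of transaction A's rows or none of them, and transaction B's rolled-back writes never surface.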
> >
> >
> > The following features make Omid an attractive choice for system
> > designers and other projects in the Apache community:
> >
> >
> > * Semantics. Omid implements Snapshot Isolation (SI), which is supported
> > by major SQL and NoSQL technologies (e.g. Google Percolator).
> >
> >
> > * Performance and Scalability. Omid provides a highly scalable,
> > lock-free implementation of SI. To the best of our knowledge, it is
> > also one of the few open source NoSQL transactional platforms that can
> > execute more than 100K transactions per second [1]. A new prototype
> > still in development can go even further, up to ~380K TPS.
> >
> >
> > * Reliability. Omid has a high-availability (HA) mode, in which the
> > core service performing writeset conflict resolution operates as a
> > primary-backup process pair with automatic failover. The HA support
> > adds zero overhead to normal operation.
> >
> >
> > * Adaptability. Omid’s current version provides transactions on data
> > stored in Apache HBase. However, Omid’s components are generic enough
> > to be adapted to any other key-value NoSQL datastore that supports
> > MVCC.
> >
> >
> > * Development. Omid provides a very simple interface that mimics
> > standard HBase APIs, making it developer friendly. Only minimal
> > extensions to the standard interfaces have been introduced to enable
> > transactions.
> >
> >
> > * Simplicity. Omid leverages the HBase infrastructure for managing its
> > own metadata. It entails no additional services apart from those
> > provided and used by HBase.
> >
> >
> > * Track Record. As we have mentioned, Omid is already in use by
> > very-large-scale production systems at Yahoo. Also, Hortonworks is
> > integrating Omid in a metastore implementation for Hive based on
> > HBase.
> >
> > [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
> >
> >
> > === Current Status ===
> > The current Omid implementation is available both in Yahoo’s internal
> > Github repository, for internal use at Yahoo, and in Yahoo’s public
> > Github repository (https://github.com/yahoo/omid.git). Both
> > repositories are managed by Omid’s current developers at Yahoo.
> >
> > As mentioned above, Yahoo currently uses Omid to provide transactions
> > in Sieve, a web-scale content management platform that powers
> > Yahoo’s next-generation search and personalization products.
> >
> >
> > ==== Meritocracy ====
> > The first version of Omid was originally created in 2011 by Maysam
> > Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
> > Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
> >
> >
> > During the years after its inception, Omid has matured to operate at
> > Web scale and has been used internally by strategic projects at Yahoo
> > such as Sieve. The current base of committers belongs to the Yahoo team
> > that took over the initial Omid prototype and rewrote it to meet the
> > high availability and scalability requirements of the Sieve project.
> > This base of committers has recently incorporated Hortonworks members
> > who helped adapt Omid to HBase 1.x versions.
> >
> >
> > With this initial committer base, we aim to form a larger community
> > that can collaborate with new ideas over the current code base. This
> > new community will run the project following the "Apache Way"
> > (http://apache.org/foundation/governance/). Users and new contributors
> > will be treated with respect and welcomed. To grow the community, we
> > will encourage contributors to provide patches, review code, propose
> > new features and improvements, talk at conferences such as Hadoop Summit,
> > HBaseCon, ApacheCon, etc. Committership and PMC membership will be
> > offered according to meritocracy.
> >
> > ==== Community ====
> >
> > The public Yahoo Omid repository at Github currently has 241 Stars and
> > 93 forks, which indicates significant interest in the project in the
> > open-source community, at least compared with other similar projects
> > (See https://github.com/yahoo/omid.git).
> >
> >
> > Recently, Hortonworks contributors to the Apache Hive project who
> > are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
> > expressed interest in using Omid. We started a fruitful collaboration
> > with them that resulted in Omid supporting HBase 1.x versions.
> >
> >
> > Salesforce is also interested in collaborating on a Proof of
> > Concept for integrating Omid as a pluggable transaction manager in
> > Apache Phoenix.
> >
> >
> > Yahoo, Hortonworks and Salesforce participants will constitute the
> > initial set of committers and mentors for the proposal.
> >
> > ==== Core Developers ====
> > The core developers of Omid are all skilled software developers and
> > research engineers at Yahoo Inc. and Hortonworks with years of
> > experience in their fields. At this moment, developers are
> > distributed across the U.S. and Israel. The aim is to incorporate more
> > committers from different organizations and locations over time.
> >
> >
> > The current set of developers includes experienced committers from
> > the Apache HBase, Hive and Hadoop projects who have been working with
> > us on the current codebase found in Github.
> >
> > Finally, some of the core developers are currently NOT affiliated with
> > the ASF and would require new ICLAs to be filed.
> >
> >
> > === Alignment ===
> > Omid enhances the already successful Apache HBase datastore project
> > with transactions. We have collaborated with other developers inside
> > and outside Yahoo who are involved in the Apache HBase community, so
> > we have had reliable feedback from them.
> >
> > Although Omid brings value into HBase, the design of the current
> > version provides a general transaction scheme that can potentially be
> > adapted to other MVCC key-value datastores such as Apache Cassandra.
> >
> >
> > Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> > top of HBase that can potentially integrate Omid in order to provide
> > the well-known concept of transactions to Phoenix-based applications.
> >
> >
> > === Known Risks ===
> > ==== Orphaned products ====
> > Yahoo’s Research and Search organizations have been taking care of
> > Omid development since the first prototype was created in 2011. Yahoo
> > has a long history of participating in open-source projects, and has
> > also been a long-time contributor to the Apache community. For
> > example, in Apache, Yahoo is an important contributor to many projects
> > in the Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
> > open-sourced other well-known projects outside Hadoop, such as
> > ZooKeeper or BookKeeper. So it is in Yahoo’s best interest to make
> > Omid a successful open-source Apache product as well. If this happens,
> > we are confident that a larger community will form around the project
> > in a relatively short period of time, contributing to the
> > diversification and stabilization of the base of committers.
> >
> >
> > ==== Inexperience with Open Source ====
> > This project has long-standing, experienced mentors and interested
> > contributors from Apache HBase, Hive and Phoenix to help us move
> > through the open source process. We are actively working with
> > experienced Apache community members to improve the project and
> > extend its testing.
> >
> > ==== Homogeneous Developers ====
> > Omid has been supported by Yahoo since its inception in 2011. However,
> > the current committers are employed by several different companies, as
> > shown in the Affiliations section.
> >
> >
> > ==== Reliance on Salaried Developers ====
> >
> > All the current developers are paid by their employers to contribute
> > to this project. Yahoo developers will also continue maintaining the
> > internal Omid repository at their company.
> >
> > Of course, other developers are welcome to contribute to this project
> > after it is open sourced in Apache.
> >
> > ==== Relationships with Other Apache Products ====
> >
> > The current Omid incarnation serves transactional contexts to
> > applications storing their data in HBase. However, Omid’s design could
> > potentially be adapted to serve transactions on top of other
> > MVCC-based key-value datastores in the Apache community, such as
> > Cassandra.
> >
> >
> > As a transactional framework, Omid could potentially benefit many other
> > Apache projects, such as Apache Spark, Apache Phoenix, Apache Storm and
> > Apache Flink, by providing transactional contexts. In particular,
> > Apache Phoenix, a SQL layer on top of HBase, might use Omid as its
> > transaction management component. Once we open source Omid as an
> > Apache project, we expect to generate more interest in the
> > surrounding communities.
> >
> >
> > Very recently, a new incubator proposal for a similar project called
> > Tephra has been submitted to the ASF. We think this is good for the
> > Apache community, and we believe that there’s room for both proposals,
> > since each design is based on different principles (e.g. Omid does not
> > require the server-side component to maintain the state of ongoing
> > transactions) and since both Tephra and Omid have gained some traction
> > in the open-source community.
> >
> >
> > With regard to the Apache projects that Omid uses, apart from HBase,
> > Omid relies on Apache ZooKeeper and Curator in order to coordinate the
> > (re)connection of transaction managers (acting as clients) to the
> > conflict resolution component for transactions (server side). They
> > are also used to coordinate the master and backup replicas in high
> > availability scenarios.
> >
> >
> > ==== An Excessive Fascination with the Apache Brand ====
> >
> > We are applying to the Incubator process because we think that it is
> > the logical next step for the Omid project after we open-sourced the
> > code in Github some years ago. Yahoo has a long-standing history of
> > contributing to Apache projects. The developers and contributors
> > understand the implications of making it an Apache project, and
> > strongly believe that the growing community can benefit from the
> > Apache environment, ecosystem, and infrastructure.
> >
> >
> > === Documentation ===
> > Current documentation about the project is available in the wiki of
> > Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
> > be moved under https://omid.incubator.apache.org/docs if the project
> > is accepted into the Apache Incubator.
> >
> > === Initial Source ===
> > Initial source code is currently hosted in Github for general viewing
> > and contribution:
> >
> > https://github.com/yahoo/omid.git
> >
> >
> > Omid source code is written in Java (99%), with some shell scripts
> > (1%) used to configure and launch the main components.
> >
> >
> > The code will be moved to Apache http://git.apache.org/ if accepted as
> > an Incubator project.
> >
> > === Source and Intellectual Property Submission Plan ===
> >
> > The current license for the Omid code published in Github is Apache
> > 2.0. If Omid meets the conditions for becoming an Incubator project in
> > the ASF, the source code will be transitioned via the Software Grant
> > Agreement onto the ASF infrastructure and in turn made available under
> > the Apache License, version 2.0.
> >
> > === External Dependencies ===
> >
> >
> > The required external dependencies that are not Apache projects are
> > all under the Apache License or other compatible licenses:
> >
> > Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> >
> > JDK7 or OpenJDK 7 (http://java.com/) [Oracle or OpenJDK JDK License]
> >
> > Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> >
> > Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
> >
> > TestNG v6.8.8 (http://testng.org) [Apache 2.0]
> >
> > SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
> >
> > Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
> >
> > Google Protocol Buffers v2.5.0
> > (https://developers.google.com/protocol-buffers/) [BSD License]
> >
> > Mockito (http://mockito.org/) v1.9.5 [MIT License]
> >
> > LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
> > [Apache 2.0]
> >
> > Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> > (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> >
> > C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> >
> > Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
> >
> >
> > === Cryptography ===
> > The Omid project does not use cryptography itself. However, Apache HBase
> > (the datastore on top of which Omid works in its current version) uses
> > standard APIs and tools for SSH and SSL communication where necessary.
> >
> > === Required Resources ===
> > We request that the following resources be created for the project to use:
> >
> > ==== Mailing lists ====
> >
> > omid-private (moderated subscriptions)
> >
> > omid-commits (commit notification)
> > omid-dev (technical discussions)
> >
> > ==== Git repository ====
> > https://github.com/apache/incubator-omid
> >
> > ==== Documentation ====
> > https://omid.incubator.apache.org/docs/
> >
> > ==== JIRA instance ====
> > https://issues.apache.org/jira/browse/omid
> >
> > === Initial Committers ===
> >
> > * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> > * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> > * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> > * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> > * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
> >
> > * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
> >
> > * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> > * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
> >
> > * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
> >
> > * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
> >
> > * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > === Additional Interested Contributors ===
> > * Ivan Kelly (ivank<AT>apache<DOT>org)
> >
> > * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
> >
> >
> > === Affiliations ===
> >
> > * Edward Bortnikov, Yahoo Inc.
> >
> > * Daniel Dai, Hortonworks
> >
> > * Flavio P. Junqueira, Confluent
> >
> > * Igor Katkov, Yahoo Inc.
> >
> > * Ivan Kelly, Midokura
> >
> > * Francis C. Liu, Yahoo Inc.
> >
> > * Sameer Paranjpye, Arimo
> >
> > * Francisco Perez-Sorrosal, Yahoo Inc.
> >
> > * Ohad Shacham, Yahoo Inc.
> >
> > * Maysam Yabandeh, Dropbox Inc.
> >
> >
> > === Sponsors ===
> >
> > ==== Champion ====
> >
> > Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >
> > ==== Nominated Mentors ====
> >
> > Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >
> > Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >
> > Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >
> > Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >
> > James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >
> >
> > ==== Sponsoring Entity ====
> > Apache Incubator PMC
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
>
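The proposal text quoted above refers to a server-side conflict resolution component and says that Omid's server side does not need to track the state of ongoing transactions. To make that idea concrete, here is a minimal, hypothetical Java sketch (not Omid's actual TSO code) of a centralized timestamp oracle that enforces the Snapshot Isolation write-write conflict rule: a transaction that started at timestamp startTs may commit only if no row in its write set was committed by another transaction after startTs. The class name SimpleTso, its methods, and the in-memory map of per-row last-commit timestamps are assumptions made for this example. A real service would also need bounded memory, persistence and failover. The sketch targets Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical, simplified timestamp oracle (TSO) that detects write-write
 * conflicts under Snapshot Isolation. Illustration only: not Omid's code,
 * with unbounded in-memory state and no persistence or failover.
 */
class SimpleTso {
    // Logical clock used for both start and commit timestamps.
    private long clock = 0;
    // Commit timestamp of the last committed write to each row.
    // Note: no record of in-flight transactions is kept.
    private final Map<String, Long> lastCommit = new HashMap<>();

    /** Starts a transaction by handing out a fresh start timestamp. */
    synchronized long begin() {
        return ++clock;
    }

    /**
     * Attempts to commit a transaction that started at startTs and wrote
     * the rows in writeSet. Returns the commit timestamp, or an empty
     * OptionalLong if another transaction committed a write to any of
     * those rows after startTs; the caller must then roll back.
     */
    synchronized OptionalLong commit(long startTs, List<String> writeSet) {
        for (String row : writeSet) {
            Long committedAt = lastCommit.get(row);
            if (committedAt != null && committedAt > startTs) {
                return OptionalLong.empty(); // write-write conflict: abort
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            lastCommit.put(row, commitTs);
        }
        return OptionalLong.of(commitTs);
    }

    public static void main(String[] args) {
        SimpleTso tso = new SimpleTso();
        long t1 = tso.begin(); // start timestamp 1
        long t2 = tso.begin(); // start timestamp 2, concurrent with t1
        OptionalLong c1 = tso.commit(t1, Arrays.asList("row-a"));
        OptionalLong c2 = tso.commit(t2, Arrays.asList("row-a", "row-b"));
        System.out.println("t1 committed: " + c1.isPresent()); // true
        System.out.println("t2 committed: " + c2.isPresent()); // false
    }
}
```

Running main prints that t1 committed and t2 did not: t1 committed a write to row-a after t2 had already taken its start timestamp, so t2 must abort. The oracle keeps only per-row commit timestamps, not a list of in-flight transactions. That is the property the quoted proposal contrasts with Tephra. How Omid actually bounds, persists and replicates this state is not shown here.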

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Flavio Junqueira <fp...@apache.org>.
Those are great observations, Nick. Merging projects is often hard even if the developers are willing to do it, though. The projects could be already running in production and major changes could be disruptive to the point of making the merged project not viable.

Attempting to merge projects through a conversation between the two parties sounds reasonable, but I suspect that in many cases the two parties will prefer to at least start independently. Perhaps the incubator can bridge the conversation, make sure that the differences are sorted out before graduation, and assess during the process whether a merge is possible.

-Flavio 


> On 19 Mar 2016, at 11:23, Nick Burch <ni...@apache.org> wrote:
> 
> On Fri, 18 Mar 2016, Greg Trasuk wrote:
>> I don’t think it’s the Incubator’s job to choose which competing projects should join the foundation.  All we’re here to do is to make sure that a community knows how to act like an Apache community, and that the artifacts are licensed properly.
> 
> This is only my view, and I know that some key incubator folks think it's too prescriptive, but I have seen it work.
> 
> TL;DR - Alternate ideas and approaches Good, Confusion or Corporatism Bad
> 
> Where we have two different communities, working in the same space, but in different languages or different approaches, then that's fine. The ASF doesn't pick "winners", it picks "runners". So, having a Batch implementation of the Foo protocol in C, and having a proposed podling for a Streaming implementation of the Foo protocol in Java is fine.
> 
> Where we have two different companies doing rival implementations who refuse to co-operate, that's an issue. Two companies who are competitors, who both read the "Foo protocol" spec / Foo paper, and who found rival projects to implement Foo in Java, is a problem. They don't have a technical distinction, just a refusal to co-operate and a refusal to take off $DAYJOB hats and a refusal to work for the best interests of the community. That's an issue for the incubator and the ASF.
> 
> If we have a similar proposed project coming in, I would expect the proposed project to have a chat with the existing one to see if a merger is possible. If they're in the same language, and take similar approaches, then a merger could deliver a better community with more features, which would be better for everyone.
> 
> However, if the two communities had a chat, and decided they really were different + could explain that, then in my book that's fine. Document and explain those, so potential new community members can pick the "right" one for them. Maybe collaborate on some common code / tests / etc, don't bad-mouth each other, and help new people pick the appropriate one for them, then that's fine.
> 
> AcmeCorp and Contoso both want to bring a Java project for doing Foo, and won't co-operate because they're competitors = red flag
> 
> AcmeCorp founds a Ruby project for Foo, grows it, brings it to the ASF, then a formerly Contoso-backed Java project for Foo comes along, fine.
> 
> AcmeCorp did a "Foo for 1-3 machines that's easy to get started with" and want to bring that, while Contoso have been working on a Foo that's a bit tough to set up for small clusters, but scales brilliantly past 3 racks, that's fine. They can share some Foo compliance tests, and new community members can consider their deployment sizes and pick the "right" one to join for them.
> 
> 
> Only my view, though at least some others share it, I hope that helps at least a little?
> 
> Nick


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Nick Burch <ni...@apache.org>.
On Fri, 18 Mar 2016, Greg Trasuk wrote:
> I don’t think it’s the Incubator’s job to choose which competing 
> projects should join the foundation.  All we’re here to do is to make 
> sure that a community knows how to act like an Apache community, and 
> that the artifacts are licensed properly.

This is only my view, and I know that some key incubator folks think 
it's too prescriptive, but I have seen it work.

TL;DR - Alternate ideas and approaches Good, Confusion or Corporatism Bad

Where we have two different communities, working in the same space, but in 
different languages or different approaches, then that's fine. The ASF 
doesn't pick "winners", it picks "runners". So, having a Batch 
implementation of the Foo protocol in C, and having a proposed podling for 
a Streaming implementation of the Foo protocol in Java is fine.

Where we have two different companies doing rival implementations who 
refuse to co-operate, that's an issue. Two companies who are competitors, 
who both read the "Foo protocol" spec / Foo paper, and who found rival 
projects to implement Foo in Java, is a problem. They don't have a 
technical distinction, just a refusal to co-operate and a refusal to take 
off $DAYJOB hats and a refusal to work for the best interests of the 
community. That's an issue for the incubator and the ASF.

If we have a similar proposed project coming in, I would expect the 
proposed project to have a chat with the existing one to see if a merger 
is possible. If they're in the same language, and take similar approaches, 
then a merger could deliver a better community with more features, which 
would be better for everyone.

However, if the two communities had a chat, and decided they really were 
different + could explain that, then in my book that's fine. Document and 
explain those, so potential new community members can pick the "right" one 
for them. Maybe collaborate on some common code / tests / etc, don't 
bad-mouth each other, and help new people pick the appropriate one for 
them, then that's fine.

AcmeCorp and Contoso both want to bring a Java project for doing Foo, and 
won't co-operate because they're competitors = red flag

AcmeCorp founds a Ruby project for Foo, grows it, brings it to the ASF, then 
a formerly Contoso-backed Java project for Foo comes along, fine.

AcmeCorp did a "Foo for 1-3 machines that's easy to get started with" and 
want to bring that, while Contoso have been working on a Foo that's a bit 
tough to set up for small clusters, but scales brilliantly past 3 racks, 
that's fine. They can share some Foo compliance tests, and new community 
members can consider their deployment sizes and pick the "right" one to 
join for them.


Only my view, though at least some others share it, I hope that helps at 
least a little?

Nick

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Greg Trasuk <tr...@stratuscom.com>.
I don’t think it’s the Incubator’s job to choose which competing projects should join the foundation.  All we’re here to do is to make sure that a community knows how to act like an Apache community, and that the artifacts are licensed properly.

It's probably worth pointing out to both projects that the other one is out there, just because it's possible they could work together, and a larger community is usually more stable. But I certainly wouldn't want to see the Incubator turn down a project just because it's similar to one that's already part of Apache. Would we have turned down Tomcat because we already had an HTTP server?

Cheers,

Greg Trasuk.

> On Mar 18, 2016, at 7:19 PM, Henry Saputra <he...@gmail.com> wrote:
> 
> I know the Apache Incubator does not play favorites, but it is getting
> awkward that TWO transaction engines for HBase are coming to the Incubator
> at the same time.
> 
> As most people know, the other one is Tephra, which came to the Incubator
> just a few weeks ago.
> 
> As a member of the IPMC, I would like to see Omid provide a more detailed
> comparison of the differences the project brings, in terms of approach
> and possible integrations with other ASF projects.
> 
> If possible, I would prefer to see the Omid team work together with Tephra
> to make one solid transaction engine for HBase and, later, for NoSQL
> databases.
> 
> 
> - Henry
> 
> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I would like to propose Omid as an Apache Incubator project:
>> 
>> https://wiki.apache.org/incubator/OmidProposal
>> 
>> I've posted the text of the proposal below:
>> 
>> Thanks,
>> Daniel
>> 
>> = Omid Proposal =
>> 
>> === Abstract ===
>> 
>> Omid is a flexible, reliable, high-performance and scalable ACID
>> transactional framework that allows client applications to execute
>> transactions on top of MVCC key/value-based NoSQL datastores
>> (currently Apache HBase) providing Snapshot Isolation guarantees on
>> the accessed data.
>> 
>> 
>> === Proposal ===
>> 
>> Omid is a flexible open-source transactional framework that provides
>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>> datastores. In particular, the current codebase brings the concept of
>> transactions to the popular Apache HBase datastore. Omid offers great
>> performance, is highly available, and is scalable. Omid's current
>> version is able to scale to thousands of clients triggering concurrent
>> transactions on application data stored in HBase. Omid can scale
>> beyond 100K transactions per second on mid-range hardware while
>> incurring minimal impact on the speed of data access in the
>> datastore. We’re currently experimenting with a prototype version that
>> can improve the performance up to ~380K TPS.
>> 
>> 
>> Omid has been publicly available as an open-source project in Github
>> under Apache License Version 2.0 since 2011 [1]. During these years,
>> it has generated some interest in the open source community,
>> especially since the public presentation of the first version at
>> Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and
>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>> Foundation with the aim to transfer the Omid project -including its
>> source code and documentation- to Apache in order to start the build
>> of a stable open source community around it.
>> 
>> 
>> [1] https://github.com/yahoo/omid
>> 
>> [2] Omid presentation at Hadoop Summit 2013:
>> 
>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>> 
>> 
>> === Background ===
>> 
>> An Omid prototype was first released as an open-source project back in
>> 2011. Inspired by Google Percolator [1], it offered a lock-free
>> approach to transactions in NoSQL datastores (See [2]). However,
>> during these years, the design of Omid has evolved significantly.
>> Whilst the current open-sourced version maintains many aspects of the
>> original implementation, it is the result of a major redesign of the
>> first prototype released in 2011.
>> 
>> 
>> Omid now has a more decentralized design that does not sacrifice the
>> consistency and performance of the original version. The current
>> design also enables Omid to scale to thousands of clients executing
>> transactions concurrently on application data stored in HBase.
>> Internally, Omid still utilizes a lock-free approach to support
>> multiple concurrent clients. Its design also relies on a centralized
>> conflict detection component, the TSO, which now efficiently resolves
>> write-set collisions among concurrent transactions without having to
>> piggyback commit information to the clients. Another important
>> benefit of Omid is that it doesn't require any modification of the
>> underlying key-value datastore, HBase in this case. Moreover, the
>> recently added high availability algorithm eliminates the single
>> point of failure represented by the TSO in system deployments that
>> require a higher degree of dependability. Last but not least, the
>> provided user API is very simple, mimicking transaction managers in
>> the relational world: begin, commit, rollback.
>> 
>> 
>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
>> management platform powering some of the next-generation search and
>> personalization products, uses Omid as a transaction manager in its
>> processing pipeline. Sieve essentially acts as a huge processing hub
>> between content feeds and serving systems. It provides an environment
>> for highly customizable, real-time, streamed information processing,
>> with typical discovery-to-service latencies of just a few seconds. In
>> terms of scale and availability, Omid’s new design was largely driven
>> by Sieve’s requirements.
>> 
>> 
>> At Yahoo, we are also making an effort to disseminate the current
>> status of the project through blog entries (See [3], [4] and [5]) and
>> submissions to technical and academic conferences such as ATC 2016,
>> Hadoop Summit 2016, HBaseCon 2016. Last but not least, Omid also
>> appeared in a TechCrunch article in the last quarter of 2015 (See [6]).
>> 
>> 
>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>> Distributed Transactions and Notifications. USENIX Symposium on
>> Operating Systems Design and Implementation, 2010
>> 
>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>> Omid: Lock-free transactional support for distributed data stores. In
>> Proc. of ICDE, 2013.
>> 
>> [3]
>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>> 
>> [4]
>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>> 
>> [5]
>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>> 
>> [6]
>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>> 
>> 
>> === Rationale ===
>> 
>> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>> transactions is very popular and it is featured in relational
>> databases. However, in the Big Data ecosystem, applications typically
>> use NoSQL datastores, which do not provide ACID transactions. Such
>> NoSQL datastores used to give up transactional support for greater
>> agility and scalability. However, while early NoSQL data store
>> implementations did not include transaction support, the need for
>> transactions soon emerged in Big Data applications when accessing
>> shared data; for example, transactions are very important for
>> modern, scalable systems that process content incrementally.
>> 
>> 
>> NoSQL datastores -including HBase- don’t provide transactional
>> frameworks to coordinate the access to the underlying data for
>> preserving consistency. By using Omid, Big Data applications that need
>> to bundle multiple read and write operations on HBase into logically
>> indivisible units of work can execute transactions with ACID
>> properties, just as they would use transactions in the relational
>> database world. Omid extends the HBase key-value access API with
>> transaction semantics. It can be exercised either directly, or via
>> higher level data management APIs. For example, Apache Phoenix
>> (SQL-on-top-of-HBase) might use Omid as its transaction management
>> component.
>> 
>> 
>> The following features make Omid an attractive choice for system
>> designers and other projects in the Apache community:
>> 
>> 
>> * Semantics. Omid implements Snapshot Isolation (SI), which is supported by
>> major SQL and NoSQL technologies (e.g. Google Percolator).
>> 
>> 
>> * Performance and Scalability. Omid provides a highly scalable,
>> lock-free implementation of SI. To the best of our knowledge, it is
>> also one of the few open source NoSQL transactional platforms that can
>> execute more than 100K transactions per second [1]. A new prototype
>> still in development can go even further, up to ~380K TPS.
>> 
>> 
>> * Reliability. Omid has a high-availability (HA) mode, in which the
>> core service performing write-set conflict resolution operates as a
>> primary-backup process pair with automatic failover. The HA support
>> has zero overhead on the mainstream operation.
>> 
>> 
>> * Adaptability. Omid's current version provides transactions on data
>> stored in Apache HBase. However, Omid’s components are generic enough
>> to be adapted to any other key-value NoSQL datasource that supports
>> MVCC.
>> 
>> 
>> * Development. Omid provides a very simple interface that mimics
>> standard HBase APIs, making it developer friendly. Only minimal
>> extensions to the standard interfaces have been introduced to enable
>> transactions.
>> 
>> 
>> * Simplicity. Omid leverages the HBase infrastructure for managing its
>> own metadata. It entails no additional services apart from those
>> provided and used by HBase.
>> 
>> 
>> * Track Record. As we have mentioned, Omid is already in use by
>> very-large-scale production systems at Yahoo. Also, Hortonworks is
>> integrating Omid in a metastore implementation for Hive based on
>> HBase.
>> 
>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>> 
>> 
>> === Current Status ===
>> The current Omid implementation is available both in Yahoo's internal
>> Github repository for internal use at Yahoo as well as in Yahoo’s
>> Github public repository (https://github.com/yahoo/omid.git). Both
>> repositories are managed by Omid’s current developers at Yahoo.
>> 
>> As mentioned above, Yahoo is currently using Omid to provide
>> transactions in Sieve, a web-scale content management platform that
>> powers Yahoo’s next-generation search and personalization products.
>> 
>> 
>> ==== Meritocracy ====
>> The first version of Omid was originally created in 2011 by Maysam
>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>> 
>> 
>> During the years after its inception, Omid has matured to operate at
>> Web scale and has been used internally by strategic projects at Yahoo
>> such as Sieve. The current base of committers belongs to the Yahoo team
>> that took over the initial Omid prototype and rewrote it to meet the
>> high availability and scalability requirements of the Sieve project.
>> This base of committers has recently incorporated Hortonworks members
>> who helped adapt Omid to HBase 1.x versions.
>> 
>> 
>> With this initial committer base, we aim to form a larger community
>> that can collaborate with new ideas over the current code base. This
>> new community will run the project following the "Apache Way"
>> (http://apache.org/foundation/governance/). Users and new contributors
>> will be treated with respect and welcomed. To grow the community, we
>> will encourage contributors to provide patches, review code, propose
>> new features and improvements, talk at conferences such as Hadoop Summit,
>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>> offered according to meritocracy.
>> 
>> ==== Community ====
>> 
>> The public Yahoo Omid repository at Github currently has 241 Stars and
>> 93 forks, which indicates significant interest in the
>> project in the open-source community, at least compared with other
>> similar projects (See https://github.com/yahoo/omid.git).
>> 
>> 
>> Recently, Hortonworks contributors to the Apache Hive project who
>> are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
>> expressed interest in using Omid. We started a fruitful collaboration
>> with them that resulted in Omid supporting HBase 1.x versions.
>> 
>> 
>> Salesforce is also interested in collaborating on a Proof of
>> Concept for integrating Omid as a pluggable transaction manager in
>> Apache Phoenix.
>> 
>> 
>> Yahoo, Hortonworks and Salesforce participants will constitute the
>> initial set of committers and mentors for the proposal.
>> 
>> ==== Core Developers ====
>> The core developers of Omid are all skilled software developers and
>> research engineers at Yahoo Inc. and Hortonworks with years of
>> experience in their fields. At this moment, developers are
>> distributed across the U.S. and Israel. The aim is to incorporate more
>> committers from different organizations and locations over time.
>> 
>> 
>> The current set of developers includes experienced committers from
>> Apache HBase, Hive and Hadoop projects that have been working with us
>> in the current codebase found in Github.
>> 
>> Finally, some of the core developers are currently NOT affiliated with
>> the ASF and would require new ICLAs to be filed.
>> 
>> 
>> === Alignment ===
>> Omid enhances the already successful Apache HBase datastore project
>> with transactions. We have collaborated with other developers inside
>> and outside Yahoo who are involved in the Apache HBase community, so
>> we have had reliable feedback from them.
>> 
>> Although Omid brings value into HBase, the design of the current
>> version provides a general transaction scheme that can potentially be
>> adapted to other MVCC key-value datastores such as Apache Cassandra.
>> 
>> 
>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>> top of HBase that can potentially integrate Omid in order to provide
>> the well-known concept of transactions to Phoenix-based applications.
>> 
>> 
>> === Known Risks ===
>> ==== Orphaned products ====
>> Yahoo’s Research and Search organizations have been taking care of
>> Omid development since the first prototype creation in 2011. Yahoo has
>> a long history participating in open-source projects, and has been
>> also a long time contributor to the Apache community. For example, in
>> Apache, Yahoo is an important contributor in many projects in the
>> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>> open-sourced other well-known projects outside Hadoop, such as
>> ZooKeeper or BookKeeper. So it is in the best interest of Yahoo to make
>> Omid also a successful open-source Apache product. If this happens, we
>> are sure that a larger community will be formed around the project in
>> a relatively short period of time, contributing to the diversification
>> and stabilization of the base of committers.
>> 
>> 
>> ==== Inexperience with Open Source ====
>> This project has long-standing, experienced mentors and interested
>> contributors from Apache HBase, Hive and Phoenix to help us move
>> through the open source process. We are actively working with
>> experienced Apache community members to improve our project and
>> further its testing.
>> 
>> ==== Homogeneous Developers ====
>> Omid has been supported by Yahoo since its inception in 2011. However,
>> the current committers are employed by the different companies listed
>> in the Affiliations section.
>> 
>> 
>> ==== Reliance on Salaried Developers ====
>> 
>> All the current developers are paid by their employers to contribute
>> to this project. Yahoo developers will also continue maintaining the
>> internal Omid repository at their company.
>> 
>> Of course, other developers are welcome to contribute to this project
>> after it is open sourced in Apache.
>> 
>> ==== Relationships with Other Apache Products ====
>> 
>> The current Omid incarnation serves transactional contexts to applications
>> storing their data in HBase. However, Omid's design could potentially be
>> adapted to serve transactions on top of other MVCC-based key-value
>> datastores in the Apache community, such as Cassandra.
>> 
>> 
>> As a transactional framework, Omid could potentially benefit many other
>> Apache projects, such as Apache Spark, Apache Phoenix, Apache Storm and
>> Apache Flink, by providing them with transactional contexts. In
>> particular, Apache Phoenix (a SQL layer on top of HBase) might use
>> Omid as its transaction management component. Once we open source Omid
>> as an Apache project, we expect to generate more interest in the
>> surrounding communities.
>> 
>> 
>> Very recently, a new incubator proposal for a similar project called
>> Tephra has been submitted to the ASF. We think this is good for the
>> Apache community, and we believe that there's room for both proposals,
>> as each design is based on different principles (e.g. Omid does not
>> need to maintain the state of ongoing transactions in its server-side
>> component) and both Tephra and Omid have gained some traction in the
>> open-source community.
>> 
>> 
>> With regard to the Apache projects that Omid uses, apart from HBase,
>> Omid relies on Apache ZooKeeper and Curator to coordinate the
>> (re)connection of transaction managers (acting as clients) to the
>> conflict resolution component for transactions (server side). They
>> are also used to coordinate the master and backup replicas in high
>> availability scenarios.
>> 
>> 
>> ==== An Excessive Fascination with the Apache Brand ====
>> 
>> We are applying to the Incubator process because we think that it is
>> the logical next step for the Omid project after we open-sourced the
>> code in Github some years ago. Yahoo has a long-standing history of
>> contributing to Apache projects. The developers and contributors
>> understand the implications of making it an Apache project, and
>> strongly believe that the growing community can benefit from the
>> Apache environment, ecosystem, and infrastructure.
>> 
>> 
>> === Documentation ===
>> Current documentation about the project is available in the wiki of
>> Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
>> be moved under https://omid.incubator.apache.org/docs if the project
>> is accepted into the Apache Incubator.
>> 
>> === Initial Source ===
>> Initial source code is currently hosted in Github for general viewing
>> and contribution:
>> 
>> https://github.com/yahoo/omid.git
>> 
>> 
>> Omid source code is written in Java (99%), with some shell scripts
>> (1%) used to configure and launch the main components.
>> 
>> 
>> The code will be moved to Apache http://git.apache.org/ if accepted as
>> an Incubator project.
>> 
>> === Source and Intellectual Property Submission Plan ===
>> 
>> The current license for the Omid code published on Github is the
>> Apache License 2.0. If Omid is accepted as an Incubator project in
>> the ASF, the source code will be transitioned via the Software Grant
>> Agreement onto the ASF infrastructure and in turn made available
>> under the Apache License, version 2.0.
>> 
>> === External Dependencies ===
>> 
>> 
>> The required external dependencies that are not Apache projects are
>> all under the Apache License or other compatible licenses:
>> 
>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>> 
>> JDK 7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK license]
>> 
>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>> 
>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>> 
>> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>> 
>> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>> 
>> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>> 
>> Google Protocol Buffers v2.5.0
>> (https://developers.google.com/protocol-buffers/) [BSD License]
>> 
>> Mockito v1.9.5 (http://mockito.org/) [MIT License]
>> 
>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>> [Apache 2.0]
>> 
>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>> 
>> C. Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>> 
>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>> 
>> 
>> === Cryptography ===
>> The Omid project does not use cryptography itself. However, Apache
>> HBase, the datastore on top of which the current version of Omid
>> works, uses standard APIs and tools for SSH and SSL communication
>> where necessary.
>> 
>> === Required Resources ===
>> We request that the following resources be created for the project to use:
>> 
>> ==== Mailing lists ====
>> 
>> omid-private (moderated subscriptions)
>> 
>> omid-commits (commit notification)
>> 
>> omid-dev (technical discussions)
>> 
>> ==== Git repository ====
>> https://github.com/apache/incubator-omid
>> 
>> ==== Documentation ====
>> https://omid.incubator.apache.org/docs/
>> 
>> ==== JIRA instance ====
>> https://issues.apache.org/jira/browse/omid
>> 
>> === Initial Committers ===
>> 
>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> 
>> * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>> 
>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> 
>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> 
>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>> 
>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>> 
>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> 
>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>> 
>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>> 
>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>> 
>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>> 
>> 
>> === Additional Interested Contributors ===
>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>> 
>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>> 
>> 
>> === Affiliations ===
>> 
>> * Edward Bortnikov, Yahoo Inc.
>> 
>> * Daniel Dai, Hortonworks
>> 
>> * Flavio P. Junqueira, Confluent
>> 
>> * Igor Katkov, Yahoo Inc.
>> 
>> * Ivan Kelly, Midokura
>> 
>> * Francis C. Liu, Yahoo Inc.
>> 
>> * Sameer Paranjpye, Arimo
>> 
>> * Francisco Perez-Sorrosal, Yahoo Inc.
>> 
>> * Ohad Shacham, Yahoo Inc.
>> 
>> * Maysam Yabandeh, Dropbox Inc.
>> 
>> 
>> === Sponsors ===
>> 
>> ==== Champion ====
>> 
>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> 
>> ==== Nominated Mentors ====
>> 
>> Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>> 
>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> 
>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> 
>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> 
>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>> 
>> 
>> ==== Sponsoring Entity ====
>> Apache Incubator PMC
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 




RE: [MARKETING] [Caution: Suspicious URL]: Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Dor Ben Dov <do...@amdocs.com>.
Andrew, 

Do you think Cloudera will include this new version in their bundle, the same way Hortonworks does?

Dor

-----Original Message-----
From: Andrew Purtell [mailto:andrew.purtell@gmail.com] 
Sent: Saturday, 19 March 2016 22:59
To: general@incubator.apache.org
Subject: [MARKETING] [Caution: Suspicious URL]: Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Apache Phoenix just released version 4.7.0 with big news: transaction support, using Tephra. There's already some interest in a successful Tephra incubation beyond the podling. That said, the new code in Phoenix can be made pluggable to support more than one transaction oracle. Omid might be able to provide a workable integration to stand in for Tephra. Collaboration between, or even a joining of, the two communities could be good, but even if not, as a potential downstream consumer it's good to have options (provided, of course, that the number of alternatives stays reasonably bounded). I think it would be good to see Omid get in. I think an Omid podling would find interested collaborators in the Phoenix and HBase communities right away. 


> On Mar 19, 2016, at 12:20 PM, Henry Saputra <he...@gmail.com> wrote:
> 
> Thanks for the great explanation, Flavio.
> 
> As many have mentioned before, it is definitely OK to have similar 
> projects in the ASF. There is precedent for this, and I didn't expect
> the Incubator to reject good projects coming in.
> 
> My intention was to avoid splitting resources where both projects have 
> very similar goals and approaches. But maybe the two projects have
> subtle differences that make them worth pursuing as independent efforts.
> 
> I was just playing devil's advocate a bit to see whether there is
> potential to collaborate.
> 
> - Henry
> 
>> On Saturday, March 19, 2016, Flavio Junqueira <fp...@apache.org> wrote:
>> 
>> I understand the concern, so let me try to offer some facts and see 
>> if we can make progress from there.
>> 
>> Omid has been around for some time now, and its initial design 
>> appeared in a couple of research papers that I actually co-authored. 
>> The architecture is based on the idea of having a centralized 
>> transaction status oracle that shares transaction status data with 
>> clients for scalability. The current Omid project evolved out of that 
>> initial work and it is a much improved version over that first 
>> iteration, with the improvements focusing on scalability. It 
>> currently runs in production at scale at Yahoo!, and according to the
>> proposal there is interest from other companies. The project proposal
>> also links to a series of blog posts about that experience.
>> 
>> Tephra has a very similar architecture. The description here says 
>> that it has a transaction server, which sounds like the TSO in the 
>> original Omid papers. I haven't spent enough time understanding the 
>> precise protocol they use, but I must say that the protocol is very 
>> important for correctness and scalability. Having two protocols with 
>> different properties could justify the presence of two projects, but 
>> they both promise snapshot isolation so I suspect they will be doing very similar things.
>> 
>> Overall, as I see it, it would be very unfair to reject the Omid 
>> proposal on the basis that Tephra was incubated a couple of weeks 
>> ago. I'd much rather see how the two communities evolve and have the 
>> mentors of the projects fostering collaboration and possibly a merge 
>> of the two projects before graduation. Why not think of a general 
>> transaction status oracle with different protocol implementations 
>> assuming it makes sense? I wouldn't like to see any of the two 
>> blocked upfront on the basis that they are in the same space, though. 
>> We could postpone this decision until graduation when we'll have more 
>> knowledge about the projects and the growth of the two communities.
>> 
>> -Flavio
>> 
>>>> On 18 Mar 2016, at 23:19, Henry Saputra <henry.saputra@gmail.com> wrote:
>>> 
>>> I know the Apache Incubator does not play favorites, but it is getting
>>> awkward that TWO transaction engines for HBase are coming to the
>>> Incubator at the same time.
>>> 
>>> As most people know, the other one is Tephra, which came to the
>>> Incubator just a few weeks ago.
>>> 
>>> As a member of the IPMC, I would like to see Omid provide a more
>>> detailed comparison of the differences the project brings, in terms
>>> of approach and possible integrations with other ASF projects.
>>> 
>>> If possible, I would prefer to see the Omid team work together with
>>> Tephra to build one solid transaction engine for HBase and, later,
>>> for other NoSQL databases.
>>> 
>>> 
>>> - Henry
>>> 
>>>> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <daijyc@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I would like to propose Omid as an Apache Incubator project:
>>>> 
>>>> https://wiki.apache.org/incubator/OmidProposal
>>>> 
>>>> I've posted the text of the proposal below:
>>>> 
>>>> Thanks,
>>>> Daniel
>>>> 
>>>> = Omid Proposal =
>>>> 
>>>> === Abstract ===
>>>> 
>>>> Omid is a flexible, reliable, high-performance and scalable ACID 
>>>> transactional framework that allows client applications to execute 
>>>> transactions on top of MVCC key/value-based NoSQL datastores 
>>>> (currently Apache HBase), providing Snapshot Isolation guarantees on 
>>>> the accessed data.
>>>> 
>>>> 
>>>> === Proposal ===
>>>> 
>>>> Omid is a flexible open-source transactional framework that 
>>>> provides ACID transactions with Snapshot Isolation guarantees on 
>>>> top of NoSQL datastores. In particular, the current codebase brings 
>>>> the concept of transactions to the popular Apache HBase datastore. 
>>>> Omid offers high performance, high availability, and scalability.
>>>> Omid's current version is able to scale to thousands of clients
>>>> triggering concurrent transactions on application data stored in
>>>> HBase. Omid can scale beyond 100K transactions per second on
>>>> mid-range hardware while having minimal impact on the speed of data
>>>> access in the datastore. We’re currently experimenting with a
>>>> prototype version that can raise throughput to ~380K TPS.
>>>> 
>>>> 
>>>> Omid has been publicly available as an open-source project on 
>>>> Github under the Apache License Version 2.0 since 2011 [1]. Over 
>>>> these years, it has generated some interest in the open-source 
>>>> community, especially since the public presentation of the first 
>>>> version at Hadoop Summit 2013 [2]. Currently the Github project has 
>>>> 241 stars and 93 forks. Yahoo Inc. submits this proposal to the
>>>> Apache Software Foundation with the aim of transferring the Omid
>>>> project, including its source code and documentation, to Apache in
>>>> order to build a stable open-source community around it.
>>>> 
>>>> 
>>>> [1] https://github.com/yahoo/omid
>>>> 
>>>> [2] Omid presentation at Hadoop Summit 2013:
>>>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>>>> 
>>>> 
>>>> === Background ===
>>>> 
>>>> An Omid prototype was first released as an open-source project back 
>>>> in 2011. Inspired by Google Percolator [1], it offered a lock-free 
>>>> approach to transactions in NoSQL datastores (see [2]). However, 
>>>> the design of Omid has evolved significantly over the years.
>>>> Whilst the current open-source version retains many aspects of 
>>>> the original implementation, it is the result of a major redesign 
>>>> of the first prototype released in 2011.
>>>> 
>>>> 
>>>> Omid now has a more decentralized design that does not sacrifice 
>>>> the consistency and performance of the original version. The 
>>>> current design also enables Omid to scale to thousands of clients 
>>>> executing transactions concurrently on application data stored in
>>>> HBase. Internally, Omid still uses a lock-free approach to support 
>>>> multiple concurrent clients. Its design also relies on a 
>>>> centralized conflict detection component, the TSO, which now 
>>>> efficiently resolves write-set collisions among concurrent
>>>> transactions without having to piggyback commit information to the
>>>> clients. Another important benefit of Omid is that it doesn't
>>>> require any modification of the underlying key-value datastore,
>>>> HBase in this case. Moreover, the recently added high-availability
>>>> algorithm eliminates the single point of failure represented by the
>>>> TSO in deployments that require a higher degree of dependability.
>>>> Last but not least, the user API is very simple, mimicking
>>>> transaction managers in the relational world: begin, commit,
>>>> rollback.
>>>> 
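To make the commit-time conflict detection described above concrete, here is a minimal sketch in Python (chosen for brevity; Omid itself is written in Java). It assumes a single, in-memory, single-threaded oracle: `begin()` hands out a start timestamp, and `commit()` aborts a transaction if any row in its write set was committed by another transaction after that start timestamp, which is the write-write check Snapshot Isolation requires. `TimestampOracle` and its methods are hypothetical names, not Omid's API; the real TSO's data structures, persistence and client protocol differ.

```python
class TimestampOracle:
    """Hypothetical in-memory stand-in for a TSO (not Omid's API).

    Single-threaded for clarity; a real oracle must make the
    check-and-update in commit() atomic.
    """

    def __init__(self):
        self._clock = 0
        # Row key -> commit timestamp of the last transaction that wrote it.
        self._last_commit = {}

    def _next_timestamp(self):
        self._clock += 1
        return self._clock

    def begin(self):
        # The start timestamp defines the snapshot the transaction reads from.
        return self._next_timestamp()

    def commit(self, start_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Snapshot Isolation write-write check: abort if another transaction
        # committed a write to any of these rows after this one started.
        for row in write_set:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = self._next_timestamp()
        for row in write_set:
            self._last_commit[row] = commit_ts
        return commit_ts
```

For example, if two transactions begin with start timestamps 1 and 2 and both write `row-a`, the first to commit receives timestamp 3 and the second is aborted, because `row-a` was committed at 3, after its start timestamp of 2. Transactions with disjoint write sets never conflict. In this sketch the oracle keeps only the last commit timestamp per row, so it holds no state for transactions still in flight, and a rollback needs no oracle involvement: the client simply discards its buffered writes.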
>>>> 
>>>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content 
>>>> management platform powering some of its next-generation search and 
>>>> personalization products, uses Omid as the transaction manager in 
>>>> its processing pipeline. Sieve essentially acts as a huge 
>>>> processing hub between content feeds and serving systems. It 
>>>> provides an environment for highly customizable, real-time, 
>>>> streamed information processing, with typical discovery-to-service 
>>>> latencies of just a few seconds. In terms of scale and 
>>>> availability, Omid’s new design was largely driven by Sieve’s
>>>> requirements.
>>>> 
>>>> 
>>>> At Yahoo, we are also making an effort to disseminate the current 
>>>> status of the project through blog entries (see [3], [4] and [5]) 
>>>> and submissions to technical and academic conferences such as ATC 
>>>> 2016, Hadoop Summit 2016 and HBaseCon 2016. Omid also appeared in a
>>>> TechCrunch article in the last quarter of 2015 (see [6]).
>>>> 
>>>> 
>>>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using 
>>>> Distributed Transactions and Notifications. USENIX Symposium on 
>>>> Operating Systems Design and Implementation, 2010
>>>> 
>>>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>>>> Omid: Lock-free transactional support for distributed data stores. 
>>>> In Proc. of ICDE, 2013.
>>>> 
>>>> [3] http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>>>> 
>>>> [4] http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>>>> 
>>>> [5] http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>>>> 
>>>> [6] http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>>>> 
>>>> 
>>>> === Rationale ===
>>>> 
>>>> Programming with ACID (Atomicity, Consistency, Isolation, 
>>>> Durability) transactions is very popular and is a standard feature
>>>> of relational databases. However, in the Big Data ecosystem,
>>>> applications typically use NoSQL datastores, which do not provide
>>>> ACID transactions; these datastores traditionally gave up
>>>> transactional support for greater agility and scalability. Yet
>>>> while early NoSQL datastore implementations did not include
>>>> transaction support, the need for transactions soon emerged in Big
>>>> Data applications that access shared data; for example,
>>>> transactions are very important for modern, scalable systems that
>>>> process content incrementally.
>>>> 
>>>> 
>>>> NoSQL datastores, including HBase, don’t provide transactional 
>>>> frameworks to coordinate access to the underlying data and preserve
>>>> consistency. By using Omid, Big Data applications that need to
>>>> bundle multiple read and write operations on HBase into logically
>>>> indivisible units of work can execute transactions with ACID
>>>> properties, just as they would use transactions in the relational
>>>> database world. Omid extends the HBase key-value access API with
>>>> transaction semantics. It can be used either directly or via
>>>> higher-level data management APIs. For example, Apache Phoenix
>>>> (SQL on top of HBase) might use Omid as its transaction management
>>>> component.
>>>> 
>>>> 
>>>> The following features make Omid an attractive choice for system 
>>>> designers and other projects in the Apache community:
>>>> 
>>>> 
>>>> * Semantics. Omid implements Snapshot Isolation (SI), which is
>>>> supported by major SQL and NoSQL technologies (e.g. Google Percolator).
>>>> 
>>>> 
>>>> * Performance and Scalability. Omid provides a highly scalable, 
>>>> lock-free implementation of SI. To the best of our knowledge, it is 
>>>> also one of the few open-source NoSQL transactional platforms that 
>>>> can execute more than 100K transactions per second [1]. A new 
>>>> prototype still in development can go even further, up to ~380K TPS.
>>>> 
>>>> 
>>>> * Reliability. Omid has a high-availability (HA) mode, in which 
>>>> the core service performing write-set conflict resolution operates 
>>>> as a primary-backup process pair with automatic failover. The HA 
>>>> support adds no overhead to normal operation.
>>>> 
>>>> 
>>>> * Adaptability. The current version of Omid provides transactions on
>>>> data stored in Apache HBase. However, Omid’s components are generic 
>>>> enough to be adapted to any other key-value NoSQL datastore that 
>>>> supports MVCC.
>>>> 
>>>> 
>>>> * Development. Omid provides a very simple interface that mimics 
>>>> standard HBase APIs, making it developer friendly. Only minimal 
>>>> extensions to the standard interfaces have been introduced to 
>>>> enable transactions.
>>>> 
>>>> 
>>>> * Simplicity. Omid leverages the HBase infrastructure for managing 
>>>> its own metadata. It entails no additional services apart from 
>>>> those provided and used by HBase.
>>>> 
>>>> 
>>>> * Track Record. As we have mentioned, Omid is already in use by 
>>>> very-large-scale production systems at Yahoo. Also, Hortonworks is 
>>>> integrating Omid in a metastore implementation for Hive based on 
>>>> HBase.
>>>> 
>>>> [1] See also Haeinsa: 
>>>> https://github.com/vcnc/haeinsa/wiki/Performance
>>>> 
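The Adaptability point rests on the datastore keeping several timestamped versions per cell (MVCC), which is what lets a transaction read a consistent snapshot without blocking writers. Below is a minimal Python sketch under a simplifying assumption: each version is tagged with the commit timestamp of its writer, so a snapshot read returns the newest version at or before the reader's start timestamp. `MultiVersionStore` and its methods are hypothetical, and the sketch ignores how a real system records whether a version's writer has actually committed.

```python
import bisect


class MultiVersionStore:
    """Hypothetical MVCC key-value store (illustration only)."""

    def __init__(self):
        # Key -> (sorted list of version timestamps, values in the same order).
        self._versions = {}

    def put(self, key, timestamp, value):
        timestamps, values = self._versions.setdefault(key, ([], []))
        # Keep versions sorted by timestamp, even if writes arrive out of order.
        i = bisect.bisect_right(timestamps, timestamp)
        timestamps.insert(i, timestamp)
        values.insert(i, value)

    def snapshot_read(self, key, start_ts):
        """Return the newest value with timestamp <= start_ts, or None."""
        timestamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(timestamps, start_ts)
        return values[i - 1] if i > 0 else None
```

A reader whose snapshot starts at timestamp 4 keeps seeing the version written at 3 even after a newer version is written at 5, so concurrent writers never have to lock the rows it reads.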
>>>> 
>>>> === Current Status ===
>>>> The current Omid implementation is available both in Yahoo’s
>>>> internal Github repository, for use within Yahoo, and in Yahoo’s
>>>> public Github repository (https://github.com/yahoo/omid.git). Both 
>>>> repositories are managed by Omid’s current developers at Yahoo.
>>>> 
>>>> As mentioned above, Yahoo is currently using Omid to provide 
>>>> transactions in Sieve, a web-scale content management platform
>>>> that powers Yahoo’s next-generation search and personalization products.
>>>> 
>>>> 
>>>> ==== Meritocracy ====
>>>> The first version of Omid was originally created in 2011 by Maysam 
>>>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and 
>>>> Flavio Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>>>> 
>>>> 
>>>> In the years after its inception, Omid has matured to operate 
>>>> at Web scale and has been used internally by strategic projects at 
>>>> Yahoo such as Sieve. The current base of committers belongs to the 
>>>> Yahoo team that took over the initial Omid prototype and rewrote it 
>>>> to meet the high-availability and scalability requirements of the
>>>> Sieve project. This base of committers has recently grown to include
>>>> Hortonworks members who helped adapt Omid to HBase 1.x versions.
>>>> 
>>>> 
>>>> With this initial committer base, we aim to form a larger community 
>>>> that can collaborate with new ideas over the current code base. 
>>>> This new community will run the project following the "Apache Way"
>>>> (http://apache.org/foundation/governance/). Users and new 
>>>> contributors will be treated with respect and welcomed. To grow the 
>>>> community, we will encourage contributors to provide patches, 
>>>> review code, propose new features and improvements, and talk at
>>>> conferences such as Hadoop Summit, HBaseCon, ApacheCon, etc.
>>>> Committership and PMC membership will be offered on the basis of merit.
>>>> 
>>>> ==== Community ====
>>>> 
>>>> The public Yahoo Omid repository on Github currently has 241 stars
>>>> and 93 forks, which indicates significant interest in the project
>>>> in the open-source community, at least compared with other similar
>>>> projects (see https://github.com/yahoo/omid.git).
>>>> 
>>>> 
>>>> Recently, Hortonworks contributors to the Apache Hive project who 
>>>> are working on storing Hive metadata in HBase (Apache Jira 
>>>> HIVE-9452) expressed interest in using Omid. We started a fruitful
>>>> collaboration with them that resulted in Omid supporting HBase 1.x
>>>> versions.
>>>> 
>>>> 
>>>> Salesforce is also interested in collaborating on a proof of 
>>>> concept for integrating Omid as a pluggable transaction manager in 
>>>> Apache Phoenix.
>>>> 
>>>> 
>>>> Yahoo, Hortonworks and Salesforce participants will constitute the 
>>>> initial set of committers and mentors for the proposal.
>>>> 
>>>> ==== Core Developers ====
>>>> The core developers of Omid are all skilled software developers and 
>>>> research engineers at Yahoo Inc. and Hortonworks with years of 
>>>> experience in their fields. At this moment, developers are 
>>>> distributed across the U.S. and Israel. The aim is to incorporate
>>>> more committers from different organizations and locations over time.
>>>> 
>>>> 
>>>> The current set of developers includes experienced committers from 
>>>> the Apache HBase, Hive and Hadoop projects who have been working with 
>>>> us on the current codebase found on Github.
>>>> 
>>>> Finally, some of the core developers are currently NOT affiliated 
>>>> with the ASF and would require new ICLAs to be filed.
>>>> 
>>>> 
>>>> === Alignment ===
>>>> Omid adds transactions to the already successful Apache HBase 
>>>> datastore project. We have collaborated with other developers 
>>>> inside and outside Yahoo who are involved in the Apache HBase 
>>>> community, so we have had reliable feedback from them.
>>>> 
>>>> Although Omid brings value into HBase, the design of the current 
>>>> version provides a general transaction scheme that can potentially 
>>>> be adapted to other MVCC key-value datastores such as Apache Cassandra.
>>>> 
>>>> 
>>>> Apache Phoenix is also a potential target. Phoenix is a SQL layer 
>>>> on top of HBase that could integrate Omid in order to provide the
>>>> well-known concept of transactions to Phoenix-based applications.
>>>> 
>>>> 
>>>> === Known Risks ===
>>>> ==== Orphaned products ====
>>>> Yahoo’s Research and Search organizations have been taking care of 
>>>> Omid development since the first prototype was created in 2011.
>>>> Yahoo has a long history of participating in open-source projects,
>>>> and has also been a long-time contributor to the Apache community.
>>>> For example, Yahoo is an important contributor to many Apache
>>>> projects in the Hadoop ecosystem, such as HBase, Pig, Storm and YARN,
>>>> and has also open-sourced other well-known projects outside Hadoop,
>>>> such as ZooKeeper and BookKeeper. So it is in Yahoo's best interest
>>>> to make Omid a successful open-source Apache product as well. If
>>>> this happens, we are confident that a larger community will form
>>>> around the project in a relatively short period of time,
>>>> contributing to the diversification and stabilization of the base
>>>> of committers.
>>>> 
>>>> 
>>>> ==== Inexperience with Open Source ====
>>>> This project has long-standing, experienced mentors and interested
>>>> contributors from Apache HBase, Hive and Phoenix to help us move
>>>> through the open-source process. We are actively working with
>>>> experienced Apache community members to improve the project and
>>>> test it further.
>>>> 
>>>> ==== Homogeneous Developers ====
>>>> Omid has been supported by Yahoo since its inception in 2011. 
>>>> However, the current committers are employed by several different
>>>> companies, as shown in the Affiliations section.
>>>> 
>>>> 
>>>> ==== Reliance on Salaried Developers ====
>>>> 
>>>> All the current developers are paid by their employers to 
>>>> contribute to this project. Yahoo developers will also continue 
>>>> maintaining the internal Omid repository at their company.
>>>> 
>>>> Of course, other developers are welcome to contribute to this 
>>>> project after it is open-sourced in Apache.
>>>> 
>>>> ==== Relationships with Other Apache Products ====
>>>> 
>>>> The current Omid incarnation provides transactional contexts to 
>>>> applications storing their data in HBase. However, Omid's design 
>>>> could potentially be adapted to serve transactions on top of other
>>>> MVCC-based key-value datastores in the Apache community, such as
>>>> Cassandra.
>>>> 
>>>> 
>>>> Many other Apache projects, such as Apache Spark, Apache Phoenix,
>>>> Apache Storm and Apache Flink, could potentially benefit from Omid
>>>> as a transactional framework. In particular, Apache Phoenix, a SQL
>>>> layer on top of HBase, might use Omid as its transaction management
>>>> component. Once we open-source Omid as an Apache project, we expect
>>>> to generate more interest in the surrounding communities.
>>>> 
>>>> 
>>>> Very recently, a new incubator proposal for a similar project 
>>>> called Tephra has been submitted to the ASF. We think this is good 
>>>> for the Apache community, and we believe that there’s room for both 
>>>> proposals, as each is designed around different principles (e.g.
>>>> Omid does not need to maintain the state of ongoing transactions 
>>>> in the server-side component) and both Tephra and Omid have already
>>>> gained some traction in the open-source community.
>>>> 
>>>> 
>>>> With regard to the Apache projects that Omid uses, apart from 
>>>> HBase, Omid relies on Apache ZooKeeper and Apache Curator to
>>>> coordinate the (re)connection of transaction managers (acting as
>>>> clients) to the transaction conflict-resolution component (server
>>>> side). They are also used to coordinate the master and backup
>>>> replicas in high-availability scenarios.
>>>> 
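The proposal does not spell out how ZooKeeper and Curator coordinate failover between the master and backup replicas. As a generic illustration only, and not Omid's actual HA protocol, here is a minimal Python sketch of lease-based primary-backup failover with epoch fencing: a replica becomes primary only when no other replica holds a valid lease, each change of primary bumps an epoch, and clients ignore a replica whose epoch is stale. `CoordinationService` is a hypothetical in-memory stand-in for a ZooKeeper-like registry, and time is passed in explicitly to keep the example deterministic.

```python
class CoordinationService:
    """Hypothetical stand-in for a ZooKeeper-like leader registry."""

    def __init__(self):
        self.epoch = 0
        self.leader = None
        self.lease_expiry = 0.0

    def try_acquire(self, replica, now, lease_seconds):
        """Acquire or renew the primary lease; return the epoch, or None."""
        held_by_other = self.leader is not None and self.leader != replica
        if held_by_other and now < self.lease_expiry:
            return None  # another replica still holds a valid lease
        if self.leader != replica:
            # A new primary gets a new epoch, fencing off the old one.
            self.epoch += 1
            self.leader = replica
        self.lease_expiry = now + lease_seconds
        return self.epoch

    def accepts(self, replica, epoch):
        # Clients (re)connect only to the registered primary at the current epoch.
        return replica == self.leader and epoch == self.epoch
```

If the primary stops renewing its lease, the backup can take over once the lease expires; anything tagged with the old primary's epoch is then rejected, which helps keep two replicas from acting as primary at the same time. A production design must also account for clock skew and session expiry, which this sketch ignores.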
>>>> 
>>>> ==== An Excessive Fascination with the Apache Brand ====
>>>> 
>>>> We are applying to the Incubator process because we think that it 
>>>> is the logical next step for the Omid project after we 
>>>> open-sourced the code on Github some years ago. Yahoo has a 
>>>> long-standing history of contributing to Apache projects. The 
>>>> developers and contributors understand the implications of making 
>>>> it an Apache project, and strongly believe that the growing 
>>>> community can benefit from the Apache environment, ecosystem, and
>>>> infrastructure.
>>>> 
>>>> 
>>>> === Documentation ===
>>>> Current documentation about the project is available in the wiki of 
>>>> Omid’s Github repository: https://github.com/yahoo/omid/wiki . It 
>>>> will be moved under https://omid.incubator.apache.org/docs if the 
>>>> project is accepted into the Apache Incubator.
>>>> 
>>>> === Initial Source ===
>>>> Initial source code is currently hosted in Github for general 
>>>> viewing and contribution:
>>>> 
>>>> https://github.com/yahoo/omid.git
>>>> 
>>>> 
>>>> Omid source code is written in Java (99%), with some shell scripts
>>>> (1%) used to configure and launch the main components.
>>>> 
>>>> 
>>>> The code will be moved to Apache's Git hosting (http://git.apache.org/)
>>>> if Omid is accepted as an Incubator project.
>>>> 
>>>> === Source and Intellectual Property Submission Plan ===
>>>> 
>>>> The current license for the Omid code published on Github is the
>>>> Apache License 2.0. If Omid is accepted as an Incubator project in
>>>> the ASF, the source code will be transitioned via the Software Grant
>>>> Agreement onto the ASF infrastructure and in turn made available
>>>> under the Apache License, version 2.0.
>>>> 
>>>> === External Dependencies ===
>>>> 
>>>> 
>>>> The required external dependencies that are not Apache projects are 
>>>> all under the Apache License or other compatible licenses:
>>>> 
>>>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>>>> 
>>>> JDK 7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK license]
>>>> 
>>>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>>>> 
>>>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>>>> 
>>>> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>>>> 
>>>> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>>>> 
>>>> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>>>> 
>>>> Google Protocol Buffers v2.5.0
>>>> (https://developers.google.com/protocol-buffers/) [BSD License]
>>>> 
>>>> Mockito v1.9.5 (http://mockito.org/) [MIT License]
>>>> 
>>>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>>>> [Apache 2.0]
>>>> 
>>>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>>>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>>>> 
>>>> C. Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>>>> 
>>>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>>>> 
>>>> 
>>>> === Cryptography ===
>>>> The Omid project does not use cryptography itself. However, Apache 
>>>> HBase, the datastore on top of which the current version of Omid
>>>> works, uses standard APIs and tools for SSH and SSL communication
>>>> where necessary.
>>>> 
>>>> === Required Resources ===
>>>> We request that the following resources be created for the project to use:
>>>> 
>>>> ==== Mailing lists ====
>>>> 
>>>> omid-private (moderated subscriptions)
>>>> 
>>>> omid-commits (commit notification)
>>>> 
>>>> omid-dev (technical discussions)
>>>> 
>>>> ==== Git repository ====
>>>> https://github.com/apache/incubator-omid
>>>> 
>>>> ==== Documentation ====
>>>> https://omid.incubator.apache.org/docs/
>>>> 
>>>> ==== JIRA instance ====
>>>> https://issues.apache.org/jira/browse/omid
>>>> 
>>>> === Initial Committers ===
>>>> 
>>>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>> 
>>>> * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>> 
>>>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>> 
>>>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>> 
>>>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>> 
>>>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>>>> 
>>>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> === Additional Interested Contributors ===
>>>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>>>> 
>>>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>>>> 
>>>> 
>>>> === Affiliations ===
>>>> 
>>>> * Edward Bortnikov, Yahoo Inc.
>>>> 
>>>> * Daniel Dai, Hortonworks
>>>> 
>>>> * Flavio P. Junqueira, Confluent
>>>> 
>>>> * Igor Katkov, Yahoo Inc.
>>>> 
>>>> * Ivan Kelly, Midokura
>>>> 
>>>> * Francis C. Liu, Yahoo Inc.
>>>> 
>>>> * Sameer Paranjpye, Arimo
>>>> 
>>>> * Francisco Perez-Sorrosal, Yahoo Inc.
>>>> 
>>>> * Ohad Shacham, Yahoo Inc.
>>>> 
>>>> * Maysam Yabandeh, Dropbox Inc.
>>>> 
>>>> 
>>>> === Sponsors ===
>>>> 
>>>> ==== Champion ====
>>>> 
>>>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>> 
>>>> ==== Nominated Mentors ====
>>>> 
>>>> Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>> 
>>>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>> 
>>>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>> 
>>>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>> 
>>>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> ==== Sponsoring Entity ====
>>>> Apache Incubator PMC
>>>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
That seems to be a good approach for Apache Phoenix, to enable possibly
different transaction engines.

- Henry

On Sat, Mar 19, 2016 at 1:59 PM, Andrew Purtell <an...@gmail.com> wrote:

> Apache Phoenix just released version 4.7.0 with big news: transaction
> support, using Tephra. There's some interest in a successful Tephra
> incubation beyond the podling already. That said, that new code in Phoenix
> can be made pluggable to support more than one transaction oracle. Omid
> might be able to provide workable integration to stand in for Tephra.
> Collaboration between, or even a joining of, the two communities could be
> good, but even if not, as a potential downstream consumer it's good to have
> options! (Provided the number of alternatives is kept within reason, of
> course.) I think it would be good to see Omid get in. I think an Omid
> podling would find interested collaborators in the Phoenix and HBase
> communities right away.
>
>
> > On Mar 19, 2016, at 12:20 PM, Henry Saputra <he...@gmail.com> wrote:
> >
> > Thanks for the great explanation, Flavio.
> >
> > As many have mentioned before, it is definitely OK to have similar projects
> > in the ASF. There is precedent for this, and I didn't expect the incubator
> > to reject good projects coming in.
> >
> > My intention was to avoid splitting resources between two projects with
> > very similar goals and approaches. But maybe the two projects have subtle
> > differences that make them worth pursuing as independent efforts.
> >
> > Just playing devil's advocate a bit to see if there is potential to collaborate.
> >
> > - Henry
> >
> >> On Saturday, March 19, 2016, Flavio Junqueira <fp...@apache.org> wrote:
> >>
> >> I understand the concern, so let me try to offer some facts and see if we
> >> can make progress from there.
> >>
> >> Omid has been around for some time now, and its initial design appeared in
> >> a couple of research papers that I actually co-authored. The architecture
> >> is based on the idea of having a centralized transaction status oracle that
> >> shares transaction status data with clients for scalability. The current
> >> Omid project evolved out of that initial work and it is a much improved
> >> version over that first iteration, with the improvements focusing on
> >> scalability. It currently runs in production at scale at Yahoo! and there
> >> is interest from other companies according to the proposal. The proposal
> >> also links to a series of blog posts about the experience.
> >>
> >> Tephra has a very similar architecture. The description here says that it
> >> has a transaction server, which sounds like the TSO in the original Omid
> >> papers. I haven't spent enough time understanding the precise protocol they
> >> use, but I must say that the protocol is very important for correctness and
> >> scalability. Having two protocols with different properties could justify
> >> the presence of two projects, but they both promise snapshot isolation so I
> >> suspect they will be doing very similar things.
> >>
> >> Overall, as I see it, it would be very unfair to reject the Omid proposal
> >> on the basis that Tephra was incubated a couple of weeks ago. I'd much
> >> rather see how the two communities evolve and have the mentors of the
> >> projects fostering collaboration and possibly a merge of the two projects
> >> before graduation. Why not think of a general transaction status oracle
> >> with different protocol implementations, assuming it makes sense? I
> >> wouldn't like to see either of the two blocked upfront on the basis that
> >> they are in the same space, though. We could postpone this decision until
> >> graduation, when we'll have more knowledge about the projects and the
> >> growth of the two communities.
> >>
> >> -Flavio
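The centralized transaction status oracle described in the message above can be illustrated with a toy model. This is a minimal sketch, not Omid's code or API: the class `ToyTSO`, its methods `begin`, `commit` and `visible`, and the in-memory dictionaries are all hypothetical, and the conflict rule is the textbook snapshot-isolation write-write check (first committer wins) rather than a transcription of Omid's or Tephra's actual protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class Conflict(Exception):
    """Raised when a commit would violate snapshot isolation (write-write conflict)."""


@dataclass
class ToyTSO:
    """Hypothetical centralized timestamp and status oracle (not Omid's API)."""

    clock: int = 0
    # row key -> commit timestamp of the most recent committed write to that row
    last_commit: Dict[str, int] = field(default_factory=dict)
    # start timestamp -> commit timestamp; the transaction status shared with clients
    commit_table: Dict[int, int] = field(default_factory=dict)

    def begin(self) -> int:
        """Hand out a fresh start timestamp, which fixes the transaction's snapshot."""
        self.clock += 1
        return self.clock

    def commit(self, start_ts: int, write_set: Set[str]) -> int:
        """Commit if no row in write_set was committed after start_ts; else raise."""
        for row in write_set:
            if self.last_commit.get(row, 0) > start_ts:
                raise Conflict(f"row {row!r} was committed after snapshot {start_ts}")
        self.clock += 1
        commit_ts = self.clock
        for row in write_set:
            self.last_commit[row] = commit_ts
        self.commit_table[start_ts] = commit_ts
        return commit_ts

    def visible(self, writer_start_ts: int, reader_start_ts: int) -> bool:
        """A version is in a reader's snapshot only if its writer committed before the reader began."""
        commit_ts = self.commit_table.get(writer_start_ts)
        return commit_ts is not None and commit_ts < reader_start_ts


if __name__ == "__main__":
    tso = ToyTSO()
    t1, t2 = tso.begin(), tso.begin()  # two concurrent transactions
    print(tso.commit(t1, {"row-a"}))   # the first committer wins
    try:
        tso.commit(t2, {"row-a"})      # same row, overlapping snapshot: aborted
    except Conflict as err:
        print("aborted:", err)
```

In this toy model the oracle only needs write sets and timestamps to detect conflicts, while clients decide what is visible by consulting the shared commit table. A real system would also have to persist that status data and handle oracle failover, which this sketch ignores.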
> >>
> >>> On 18 Mar 2016, at 23:19, Henry Saputra <henry.saputra@gmail.com> wrote:
> >>>
> >>> I know the Apache Incubator does not play favorites, but it is getting
> >>> awkward that TWO transaction engines for HBase are coming to the
> >>> incubator at the same time.
> >>>
> >>> As most people know, the other one is Tephra, which just came to the
> >>> incubator a few weeks ago.
> >>>
> >>> As a member of the IPMC, I would like to see Omid provide a more detailed
> >>> comparison of the differences the project brings, in terms of approach
> >>> and possible integrations with other ASF projects.
> >>>
> >>> If possible, I would prefer to see the Omid team work together with
> >>> Tephra to make one solid transaction engine for HBase and, later, for
> >>> other NoSQL databases.
> >>>
> >>>
> >>> - Henry
> >>>
> >>> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <daijyc@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to propose Omid as an Apache Incubator project:
> >>>>
> >>>> https://wiki.apache.org/incubator/OmidProposal
> >>>>

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Andrew Purtell <an...@gmail.com>.
Apache Phoenix just released version 4.7.0 with big news: transaction support, using Tephra. There's some interest in a successful Tephra incubation beyond the podling already. That said, that new code in Phoenix can be made pluggable to support more than one transaction oracle. Omid might be able to provide workable integration to stand in for Tephra. Collaboration between, or even a joining of, the two communities could be good, but even if not, as a potential downstream consumer it's good to have options! (Provided the number of alternatives is kept within reason, of course.) I think it would be good to see Omid get in. I think an Omid podling would find interested collaborators in the Phoenix and HBase communities right away.


> On Mar 19, 2016, at 12:20 PM, Henry Saputra <he...@gmail.com> wrote:
> 
> Thanks for the great explanation, Flavio.
> 
> As many have mentioned before, it is definitely OK to have similar projects
> in the ASF. We have had precedents before, and I didn't expect the Incubator
> to reject good projects coming in.
> 
> My intention was to avoid a split of resources where both projects have
> very similar goals and approaches. But maybe the two projects have subtle
> differences that make them worth pursuing as independent efforts.
> 
> Just playing devil's advocate a bit to see if there is potential to collaborate.
> 
> - Henry
> 
>> On Saturday, March 19, 2016, Flavio Junqueira <fp...@apache.org> wrote:
>> 
>> I understand the concern, so let me try to offer some facts and see if we
>> can make progress from there.
>> 
>> Omid has been around for some time now, and its initial design appeared in
>> a couple of research papers that I actually co-authored. The architecture
>> is based on the idea of having a centralized transaction status oracle that
>> shares transaction status data with clients for scalability. The current
>> Omid project evolved out of that initial work and it is a much improved
>> version over that first iteration, with the improvements focusing on
>> scalability. It currently runs in production at scale at Yahoo!, and
>> according to the proposal there is interest from other companies. The
>> project proposal links to a series of blog posts about the experience.
>> 
>> Tephra has a very similar architecture. The description here says that it
>> has a transaction server, which sounds like the TSO in the original Omid
>> papers. I haven't spent enough time understanding the precise protocol they
>> use, but I must say that the protocol is very important for correctness and
>> scalability. Having two protocols with different properties could justify
>> the presence of two projects, but they both promise snapshot isolation so I
>> suspect they will be doing very similar things.
>> 
>> Overall, as I see it, it would be very unfair to reject the Omid proposal
>> on the basis that Tephra was incubated a couple of weeks ago. I'd much
>> rather see how the two communities evolve and have the mentors of the
>> projects fostering collaboration and possibly a merge of the two projects
>> before graduation. Why not think of a general transaction status oracle
>> with different protocol implementations assuming it makes sense? I wouldn't
>> like to see either of the two blocked upfront on the basis that they are in
>> the same space, though. We could postpone this decision until graduation
>> when we'll have more knowledge about the projects and the growth of the two
>> communities.
>> 
>> -Flavio
>> 
>>>> On 18 Mar 2016, at 23:19, Henry Saputra <henry.saputra@gmail.com> wrote:
>>> 
>>> I know the Apache Incubator does not play favorites, but it is getting
>>> awkward that TWO transaction engines for HBase are coming to the Incubator
>>> at the same time.
>>> 
>>> As most people know, the other one is Tephra, which just came to the
>>> Incubator a few weeks ago.
>>> 
>>> As a member of the IPMC, I would like to see Omid provide a more detailed
>>> comparison of the differences the project brings, in terms of
>>> approach and possible integrations with other ASF projects.
>>> 
>>> If possible, I would prefer to see the Omid team work together with Tephra
>>> to make one solid transaction engine for HBase and, later, for NoSQL
>>> databases.
>>> 
>>> 
>>> - Henry
>>> 
>>>> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <daijyc@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I would like to propose Omid as an Apache Incubator project:
>>>> 
>>>> https://wiki.apache.org/incubator/OmidProposal
>>>> 
>>>> I've posted the text of the proposal below:
>>>> 
>>>> Thanks,
>>>> Daniel
>>>> 
>>>> = Omid Proposal =
>>>> 
>>>> === Abstract ===
>>>> 
>>>> Omid is a flexible, reliable, high-performance and scalable ACID
>>>> transactional framework that allows client applications to execute
>>>> transactions on top of MVCC key/value-based NoSQL datastores
>>>> (currently Apache HBase) providing Snapshot Isolation guarantees on
>>>> the accessed data.
>>>> 
>>>> 
>>>> === Proposal ===
>>>> 
>>>> Omid is a flexible open-source transactional framework that provides
>>>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>>>> datastores. In particular, the current codebase brings the concept of
>>>> transactions to the popular Apache HBase datastore. Omid offers high
>>>> performance and is highly available and scalable. Omid's current
>>>> version is able to scale to thousands of clients triggering concurrent
>>>> transactions on application data stored in HBase. Omid can scale
>>>> beyond 100K transactions per second on mid-range hardware while
>>>> incurring minimal impact on the speed of data access in the
>>>> datastore. We’re currently experimenting with a prototype version that
>>>> improves performance to up to ~380K TPS.
>>>> 
>>>> 
>>>> Omid has been publicly available as an open-source project on GitHub
>>>> under the Apache License, Version 2.0, since 2011 [1]. Over these years,
>>>> it has generated some interest in the open-source community,
>>>> especially since the public presentation of the first version at
>>>> Hadoop Summit 2013 [2]. The GitHub project currently has 241 stars and
>>>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>>>> Foundation with the aim of transferring the Omid project (including its
>>>> source code and documentation) to Apache in order to start building
>>>> a stable open-source community around it.
>>>> 
>>>> 
>>>> [1] https://github.com/yahoo/omid
>>>> 
>>>> [2] Omid presentation at Hadoop Summit 2013:
>>>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>>>> 
>>>> 
>>>> === Background ===
>>>> 
>>>> An Omid prototype was first released as an open-source project back in
>>>> 2011. Inspired by Google Percolator [1], it offered a lock-free
>>>> approach to transactions in NoSQL datastores (see [2]). Since then,
>>>> the design of Omid has evolved significantly.
>>>> Whilst the current open-source version maintains many aspects of the
>>>> original implementation, it is the result of a major redesign of the
>>>> first prototype released in 2011.
>>>> 
>>>> 
>>>> Omid now has a more decentralized design that does not sacrifice the
>>>> consistency and performance of the original version. The current
>>>> design also enables Omid to scale to thousands of clients executing
>>>> transactions concurrently on application data stored in HBase.
>>>> Internally, Omid still uses a lock-free approach to support
>>>> multiple concurrent clients. Its design also relies on a centralized
>>>> conflict detection component, the TSO (transaction status oracle),
>>>> which now efficiently resolves writeset collisions among concurrent
>>>> transactions without having to piggyback commit information to the
>>>> clients. Another important benefit of Omid is that it doesn't require
>>>> any modification of the underlying key-value datastore, HBase in this
>>>> case. Moreover, the recently added high availability algorithm
>>>> eliminates the single point of failure represented by the TSO in
>>>> deployments requiring a higher degree of dependability. Last but not
>>>> least, the provided user API is very simple, mimicking transaction
>>>> managers in the relational world: begin, commit, rollback.
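The conflict-detection role of a centralized TSO can be illustrated with a toy, single-process sketch. It assumes first-committer-wins Snapshot Isolation: one counter hands out both start and commit timestamps, and a commit is refused if any key in the transaction's writeset was committed by another transaction after this one started. `ToyTSO`, `begin`, `commit` and `TransactionAborted` are hypothetical names, not Omid's API, and a production oracle would also need persistence, bounded memory and failure handling, none of which is shown here.

```python
import itertools


class TransactionAborted(Exception):
    """Raised when a commit conflicts with a writer that committed first."""


class ToyTSO:
    """Hypothetical in-memory conflict detector; not Omid's actual API.

    Only the latest commit timestamp per key is kept, so no state is
    held for transactions that are still running.
    """

    def __init__(self):
        self._clock = itertools.count(1)  # single source of timestamps
        self._last_commit = {}  # key -> commit timestamp of its latest writer

    def begin(self):
        # The start timestamp identifies the snapshot the transaction reads.
        return next(self._clock)

    def commit(self, start_ts, writeset):
        # First-committer-wins: if another transaction committed a write to
        # the same key after start_ts, this transaction must abort.
        for key in writeset:
            if self._last_commit.get(key, 0) > start_ts:
                raise TransactionAborted(f"write-write conflict on {key!r}")
        commit_ts = next(self._clock)
        for key in writeset:
            self._last_commit[key] = commit_ts
        return commit_ts
```

For example, if two transactions begin before either commits and both write `row1`, the first `commit` succeeds and the second raises `TransactionAborted`. In this sketch a rollback needs no call to the oracle at all: the client simply discards its tentative writes.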
>>>> 
>>>> 
>>>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
>>>> management platform powering some of its next-generation search and
>>>> personalization products, uses Omid as a transaction manager in its
>>>> processing pipeline. Sieve essentially acts as a huge processing hub
>>>> between content feeds and serving systems. It provides an environment
>>>> for highly customizable, real-time, streamed information processing,
>>>> with typical discovery-to-service latencies of just a few seconds. In
>>>> terms of scale and availability, Omid’s new design was largely driven
>>>> by Sieve’s requirements.
>>>> 
>>>> 
>>>> At Yahoo, we are also making an effort to disseminate the current
>>>> status of the project through blog entries (See [3], [4] and [5]) and
>>>> submissions to technical and academic conferences such as ATC 2016,
>>>> Hadoop Summit 2016 and HBaseCon 2016. Omid also appeared in a
>>>> TechCrunch article in the last quarter of 2015 (see [6]).
>>>> 
>>>> 
>>>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>>>> Distributed Transactions and Notifications. USENIX Symposium on
>>>> Operating Systems Design and Implementation, 2010
>>>> 
>>>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>>>> Omid: Lock-free transactional support for distributed data stores. In
>>>> Proc. of ICDE, 2013.
>>>> 
>>>> [3] http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>>>> 
>>>> [4] http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>>>> 
>>>> [5] http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>>>> 
>>>> [6] http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>>>> 
>>>> 
>>>> === Rationale ===
>>>> 
>>>> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>>>> transactions is popular and is a standard feature of relational
>>>> databases. However, in the Big Data ecosystem, applications typically
>>>> use NoSQL datastores, which do not provide ACID transactions. Early
>>>> NoSQL datastores gave up transactional support for greater agility
>>>> and scalability, but the need for transactions soon emerged in Big
>>>> Data applications that access shared data; for example, transactions
>>>> are very important for modern, scalable systems that process content
>>>> incrementally.
>>>> 
>>>> 
>>>> NoSQL datastores, including HBase, don’t provide transactional
>>>> frameworks to coordinate access to the underlying data while
>>>> preserving consistency. By using Omid, Big Data applications that need
>>>> to bundle multiple read and write operations on HBase into logically
>>>> indivisible units of work can execute transactions with ACID
>>>> properties, just as they would use transactions in the relational
>>>> database world. Omid extends the HBase key-value access API with
>>>> transaction semantics. It can be used either directly or via
>>>> higher-level data management APIs. For example, Apache Phoenix
>>>> (SQL on top of HBase) might use Omid as its transaction management
>>>> component.
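How a transaction can read a consistent snapshot from a multi-versioned store such as HBase can be sketched as follows. For illustration only, the sketch assumes that every cell version is tagged with its writer's start timestamp and that a separate commit table maps a writer's start timestamp to its commit timestamp. A reader that began at `start_ts` sees its own writes, plus the newest version whose writer committed before `start_ts`. `ToySnapshotStore` and its methods are hypothetical and do not reflect Omid's or HBase's actual APIs.

```python
class ToySnapshotStore:
    """Hypothetical multi-version key-value store with snapshot reads."""

    def __init__(self):
        # key -> list of (writer_start_ts, value) in write order; several
        # versions of a key coexist, as in an MVCC datastore.
        self.versions = {}
        # writer_start_ts -> commit_ts, recorded only once a writer commits.
        self.commit_table = {}

    def write(self, start_ts, key, value):
        # Tentative write: invisible to other transactions until committed.
        self.versions.setdefault(key, []).append((start_ts, value))

    def mark_committed(self, start_ts, commit_ts):
        self.commit_table[start_ts] = commit_ts

    def read(self, start_ts, key):
        candidates = self.versions.get(key, [])
        # A transaction always sees its own most recent write.
        for writer_ts, value in reversed(candidates):
            if writer_ts == start_ts:
                return value
        # Otherwise return the committed version with the highest commit
        # timestamp that is still older than this transaction's snapshot.
        best_ts, best_value = None, None
        for writer_ts, value in candidates:
            commit_ts = self.commit_table.get(writer_ts)
            if commit_ts is not None and commit_ts < start_ts:
                if best_ts is None or commit_ts >= best_ts:
                    best_ts, best_value = commit_ts, value
        return best_value
```

With this rule, uncommitted writes and writes committed after a reader's snapshot stay invisible to that reader, which is what keeps all the reads inside one transaction mutually consistent.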
>>>> 
>>>> 
>>>> The following features make Omid an attractive choice for system
>>>> designers and other projects in the Apache community:
>>>> 
>>>> 
>>>> * Semantics. Omid implements Snapshot Isolation (SI), which is supported
>>>> by major SQL and NoSQL technologies (e.g., Google Percolator).
>>>> 
>>>> 
>>>> * Performance and Scalability. Omid provides a highly scalable,
>>>> lock-free implementation of SI. To the best of our knowledge, it is
>>>> also one of the few open source NoSQL transactional platforms that can
>>>> execute more than 100K transactions per second [1]. A new prototype
>>>> still in development can go even further, up to ~380K TPS.
>>>> 
>>>> 
>>>> * Reliability. Omid has a high-availability (HA) mode, in which the
>>>> core service performing writeset conflict resolution operates as a
>>>> primary-backup process pair with automatic failover. The HA support
>>>> adds zero overhead to normal operation.
>>>> 
>>>> 
>>>> * Adaptability. The current version of Omid provides transactions on
>>>> data stored in Apache HBase. However, Omid’s components are generic
>>>> enough to be adapted to any other key-value NoSQL datastore that
>>>> supports MVCC.
>>>> 
>>>> 
>>>> * Development. Omid provides a very simple interface that mimics
>>>> standard HBase APIs, making it developer friendly. Only minimal
>>>> extensions to the standard interfaces have been introduced to enable
>>>> transactions.
>>>> 
>>>> 
>>>> * Simplicity. Omid leverages the HBase infrastructure for managing its
>>>> own metadata. It entails no additional services apart from those
>>>> provided and used by HBase.
>>>> 
>>>> 
>>>> * Track Record. As mentioned above, Omid is already in use by
>>>> very-large-scale production systems at Yahoo. Also, Hortonworks is
>>>> integrating Omid in a metastore implementation for Hive based on
>>>> HBase.
>>>> 
>>>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>>>> 
>>>> 
>>>> === Current Status ===
>>>> The current Omid implementation is available both in Yahoo’s internal
>>>> GitHub repository, for internal use at Yahoo, and in Yahoo’s public
>>>> GitHub repository (https://github.com/yahoo/omid.git). Both
>>>> repositories are managed by Omid’s current developers at Yahoo.
>>>> 
>>>> As mentioned above, Yahoo is currently using Omid to provide
>>>> transactions in Sieve, a web-scale content management platform that
>>>> powers Yahoo’s next-generation search and personalization products.
>>>> 
>>>> 
>>>> ==== Meritocracy ====
>>>> The first version of Omid was originally created in 2011 by Maysam
>>>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>>>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>>>> 
>>>> 
>>>> During the years after its inception, Omid has matured to operate at
>>>> Web scale and has been used internally by strategic projects at Yahoo
>>>> such as Sieve. The current base of committers belongs to the Yahoo team
>>>> that took over the initial Omid prototype and rewrote it to meet the
>>>> high availability and scalability requirements of the Sieve project.
>>>> This base of committers has recently incorporated Hortonworks members
>>>> who helped adapt Omid to HBase 1.x versions.
>>>> 
>>>> 
>>>> With this initial committer base, we aim to form a larger community
>>>> that can collaborate with new ideas over the current code base. This
>>>> new community will run the project following the "Apache Way"
>>>> (http://apache.org/foundation/governance/). Users and new contributors
>>>> will be treated with respect and welcomed. To grow the community, we
>>>> will encourage contributors to provide patches, review code, propose
>>>> new features and improvements, and give talks at conferences such as Hadoop Summit,
>>>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>>>> offered according to meritocracy.
>>>> 
>>>> ==== Community ====
>>>> 
>>>> The public Yahoo Omid repository on GitHub currently has 241 stars and
>>>> 93 forks, which indicates significant interest in the project from the
>>>> open-source community, at least compared with other similar projects
>>>> (see https://github.com/yahoo/omid.git).
>>>> 
>>>> 
>>>> Recently, Hortonworks contributors to the Apache Hive project who
>>>> are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
>>>> expressed interest in using Omid. We started a fruitful collaboration
>>>> with them that resulted in Omid supporting HBase 1.x versions.
>>>> 
>>>> 
>>>> Salesforce is also interested in collaborating on a proof of
>>>> concept for integrating Omid as a pluggable transaction manager in
>>>> Apache Phoenix.
>>>> 
>>>> 
>>>> Yahoo, Hortonworks and Salesforce participants will constitute the
>>>> initial set of committers and mentors for the proposal.
>>>> 
>>>> ==== Core Developers ====
>>>> The core developers of Omid are all skilled software developers and
>>>> research engineers at Yahoo Inc. and Hortonworks with years of
>>>> experience in their fields. At this moment, developers are
>>>> distributed across the U.S. and Israel. The aim is to incorporate more
>>>> committers from different organizations and locations over time.
>>>> 
>>>> 
>>>> The current set of developers includes experienced committers from
>>>> Apache HBase, Hive and Hadoop projects that have been working with us
>>>> in the current codebase found in Github.
>>>> 
>>>> Finally, some of the core developers are currently NOT affiliated with
>>>> the ASF and would require new ICLAs to be filed.
>>>> 
>>>> 
>>>> === Alignment ===
>>>> Omid adds transactions to the already successful Apache HBase
>>>> datastore project. We have collaborated with other developers inside
>>>> and outside Yahoo who are involved in the Apache HBase community, so
>>>> we have had reliable feedback from them.
>>>> 
>>>> Although Omid brings value into HBase, the design of the current
>>>> version provides a general transaction scheme that can potentially be
>>>> adapted to other MVCC key-value datastores such as Apache Cassandra.
>>>> 
>>>> 
>>>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>>>> top of HBase that can potentially integrate Omid in order to provide
>>>> the well-known concept of transactions to Phoenix-based applications.
>>>> 
>>>> 
>>>> === Known Risks ===
>>>> ==== Orphaned products ====
>>>> Yahoo’s Research and Search organizations have been taking care of
>>>> Omid development since the first prototype creation in 2011. Yahoo has
>>>> a long history of participating in open-source projects, and has
>>>> also been a long-time contributor to the Apache community. For example,
>>>> in Apache, Yahoo is an important contributor to many projects in the
>>>> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>>>> open-sourced other well-known projects outside Hadoop, such as
>>>> ZooKeeper or BookKeeper. So it is in Yahoo’s best interest to make
>>>> Omid a successful open-source Apache product as well. If this happens, we
>>>> are sure that a larger community will be formed around the project in
>>>> a relatively short period of time, contributing to the diversification
>>>> and stabilization of the base of committers.
>>>> 
>>>> 
>>>> ==== Inexperience with Open Source ====
>>>> This project has long-standing, experienced mentors and interested
>>>> contributors from Apache HBase, Hive and Phoenix to help us move
>>>> through the open source process. We are actively working with
>>>> experienced Apache community members to improve our project and to
>>>> test it further.
>>>> 
>>>> ==== Homogeneous Developers ====
>>>> Omid has been supported by Yahoo since its inception in 2011. However,
>>>> the current committers are employed by the various companies
>>>> shown in the Affiliations section.
>>>> 
>>>> 
>>>> ==== Reliance on Salaried Developers ====
>>>> 
>>>> All the current developers are paid by their employers to contribute
>>>> to this project. Yahoo developers will also continue maintaining the
>>>> internal Omid repository at their company.
>>>> 
>>>> Of course, other developers are welcome to contribute to this project
>>>> after it is open sourced in Apache.
>>>> 
>>>> ==== Relationships with Other Apache Products ====
>>>> 
>>>> The current Omid incarnation serves transactional contexts to
>>>> applications storing their data in HBase. However, Omid’s design could
>>>> potentially be adapted to serve transactions on top of other MVCC-based
>>>> key-value datastores in the Apache community, such as Cassandra.
>>>> 
>>>> 
>>>> As a transactional framework, Omid could potentially benefit many
>>>> other Apache projects, such as Apache Spark, Apache Phoenix, Apache
>>>> Storm and Apache Flink, by providing them with transactional contexts.
>>>> In particular, Apache Phoenix (a SQL layer on top of HBase) might use
>>>> Omid as its transaction management component. Once we open source Omid
>>>> as an Apache project, we expect to generate more interest in the
>>>> surrounding communities.
>>>> 
>>>> 
>>>> Very recently, a new incubator proposal for a similar project called
>>>> Tephra has been submitted to the ASF. We think this is good for the
>>>> Apache community, and we believe that there’s room for both proposals,
>>>> as each design is based on different principles (e.g.,
>>>> Omid does not need to maintain the state of ongoing transactions in
>>>> the server-side component) and both Tephra and Omid have gained
>>>> some traction in the open-source community.
>>>> 
>>>> 
>>>> With regard to the Apache projects that Omid uses, apart from HBase,
>>>> Omid relies on Apache ZooKeeper and Curator in order to
>>>> coordinate the (re)connection of transaction managers (acting as
>>>> clients) to the conflict resolution component for transactions (server
>>>> side). They’re also used to coordinate the master and backup
>>>> replicas in high availability scenarios.
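One common way to build this kind of primary-backup coordination is a lease: the primary keeps renewing it, a backup can take over only after it expires, and clients ask the coordination service who the current primary is before (re)connecting. The sketch below simulates that idea in a single process with an explicit clock. `ToyLeaseService` and `connect` are hypothetical stand-ins; they do not model the ZooKeeper or Curator APIs, nor Omid's actual HA protocol.

```python
class ToyLeaseService:
    """Hypothetical stand-in for a coordination service holding a primary lease."""

    def __init__(self, lease_duration):
        self.lease_duration = lease_duration
        self.holder = None
        self.expires_at = 0

    def acquire_or_renew(self, node_id, now):
        """Return True if node_id holds the lease after this call."""
        # The current holder may renew; anyone may take over once it expires.
        if self.holder in (None, node_id) or now >= self.expires_at:
            self.holder = node_id
            self.expires_at = now + self.lease_duration
            return True
        return False

    def primary(self, now):
        """The node clients should talk to, or None while the lease is vacant."""
        return self.holder if now < self.expires_at else None


def connect(lease_service, now):
    # Clients look up the current primary before (re)connecting.
    primary = lease_service.primary(now)
    if primary is None:
        raise ConnectionError("no primary holds the lease; retry later")
    return primary
```

For example, with a lease of 10 time units, if `tso-a` acquires it at time 0 and then crashes, `tso-b` is refused until time 10 and can take over afterwards, at which point `connect` returns `tso-b`. A real deployment must also stop a deposed primary from serving requests once its lease lapses and keep timestamps increasing across the switch; the sketch leaves both out.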
>>>> 
>>>> 
>>>> ==== An Excessive Fascination with the Apache Brand ====
>>>> 
>>>> We are applying to the Incubator process because we think that it is
>>>> the logical next step for the Omid project after we open-sourced the
>>>> code on GitHub some years ago. Yahoo has a long-standing history of
>>>> contributing to Apache projects. The developers and contributors
>>>> understand the implications of making it an Apache project, and
>>>> strongly believe that the growing community can benefit from the
>>>> Apache environment, ecosystem, and infrastructure.
>>>> 
>>>> 
>>>> === Documentation ===
>>>> Current documentation about the project is available in the wiki of
>>>> Omid’s GitHub repository: https://github.com/yahoo/omid/wiki . It will
>>>> be moved under https://omid.incubator.apache.org/docs if the project
>>>> is accepted into the Apache Incubator.
>>>> 
>>>> === Initial Source ===
>>>> Initial source code is currently hosted on GitHub for general viewing
>>>> and contribution:
>>>> 
>>>> https://github.com/yahoo/omid.git
>>>> 
>>>> 
>>>> Omid’s source code is written in Java (99%) with some shell
>>>> scripts (1%) used to configure and launch the main
>>>> components.
>>>> 
>>>> 
>>>> The code will be moved to Apache http://git.apache.org/ if accepted as
>>>> an Incubator project.
>>>> 
>>>> === Source and Intellectual Property Submission Plan ===
>>>> 
>>>> The current Omid License for the code published in Github is Apache
>>>> 2.0. If Omid meets the conditions for becoming an Incubator
>>>> project in the ASF, the source code will be transitioned via the
>>>> Software Grant Agreement onto the ASF infrastructure and in turn made
>>>> available under the Apache License, version 2.0.
>>>> 
>>>> === External Dependencies ===
>>>> 
>>>> 
>>>> The required external dependencies that are not Apache projects are
>>>> all under the Apache License or other compatible licenses:
>>>> 
>>>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>>>> 
>>>> JDK7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK License]
>>>> 
>>>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>>>> 
>>>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>>>> 
>>>> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>>>> 
>>>> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>>>> 
>>>> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>>>> 
>>>> Google Protocol Buffers v2.5.0
>>>> (https://developers.google.com/protocol-buffers/) [BSD License]
>>>> 
>>>> Mockito v1.9.5 (http://mockito.org/) [MIT License]
>>>> 
>>>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>>>> [Apache 2.0]
>>>> 
>>>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>>>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>>>> 
>>>> C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>>>> 
>>>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>>>> 
>>>> 
>>>> === Cryptography ===
>>>> The Omid project does not use cryptography itself. However, Apache
>>>> HBase, the datastore on top of which the current version of Omid works,
>>>> uses standard APIs and tools for SSH and SSL communication where necessary.
>>>> 
>>>> === Required Resources ===
>>>> We request that the following resources be created for the project to use:
>>>> 
>>>> ==== Mailing lists ====
>>>> 
>>>> omid-private (moderated subscriptions)
>>>> 
>>>> omid-commits (commit notification)
>>>> 
>>>> omid-dev (technical discussions)
>>>> 
>>>> ==== Git repository ====
>>>> https://github.com/apache/incubator-omid
>>>> 
>>>> ==== Documentation ====
>>>> https://omid.incubator.apache.org/docs/
>>>> 
>>>> ==== JIRA instance ====
>>>> https://issues.apache.org/jira/browse/omid
>>>> 
>>>> === Initial Committers ===
>>>> 
>>>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>> 
>>>> 
>>>> * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>> 
>>>> 
>>>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>>>> 
>>>> 
>>>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>> 
>>>> 
>>>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>>>> 
>>>> 
>>>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>>>> 
>>>> 
>>>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>>>> 
>>>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> === Additional Interested Contributors ===
>>>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>>>> 
>>>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>>>> 
>>>> 
>>>> === Affiliations ===
>>>> 
>>>> * Edward Bortnikov, Yahoo Inc.
>>>> 
>>>> 
>>>> * Daniel Dai, Hortonworks
>>>> 
>>>> 
>>>> * Flavio P. Junqueira, Confluent
>>>> 
>>>> 
>>>> * Igor Katkov, Yahoo Inc.
>>>> 
>>>> 
>>>> * Ivan Kelly, Midokura
>>>> 
>>>> 
>>>> * Francis C. Liu, Yahoo Inc.
>>>> 
>>>> 
>>>> * Sameer Paranjpye, Arimo
>>>> 
>>>> * Francisco Perez-Sorrosal, Yahoo Inc.
>>>> 
>>>> 
>>>> * Ohad Shacham, Yahoo Inc.
>>>> 
>>>> 
>>>> * Maysam Yabandeh, Dropbox Inc.
>>>> 
>>>> 
>>>> === Sponsors ===
>>>> 
>>>> ==== Champion ====
>>>> 
>>>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>> 
>>>> ==== Nominated Mentors ====
>>>> 
>>>> Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>> 
>>>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>> 
>>>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>> 
>>>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>> 
>>>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>> 
>>>> 
>>>> ==== Sponsoring Entity ====
>>>> Apache Incubator PMC
>>>> 


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
Thanks for the great explanation, Flavio.

As many have mentioned before, it is definitely OK to have similar projects
in the ASF. We have had precedents before, and I didn't expect the Incubator
to reject good projects coming in.

My intention was to avoid a split of resources where both projects have
very similar goals and approaches. But maybe the two projects have subtle
differences that make them worth pursuing as independent efforts.

Just playing devil's advocate a bit to see if there is potential to collaborate.

- Henry

On Saturday, March 19, 2016, Flavio Junqueira <fp...@apache.org> wrote:

> I understand the concern, so let me try to offer some facts and see if we
> can make progress from there.
>
> Omid has been around for some time now, and its initial design appeared in
> a couple of research papers that I actually co-authored. The architecture
> is based on the idea of having a centralized transaction status oracle that
> shares transaction status data with clients for scalability. The current
> Omid project evolved out of that initial work and it is a much improved
> version over that first iteration, with the improvements focusing on
> scalability. It currently runs in production at scale at Yahoo! and there
> is interest from other companies according to the proposal. There is a
> series of blog posts about the experience in the project proposal.
>
> Tephra has a very similar architecture. The description here says that it
> has a transaction server, which sounds like the TSO in the original Omid
> papers. I haven't spent enough time understanding the precise protocol they
> use, but I must say that the protocol is very important for correctness and
> scalability. Having two protocols with different properties could justify
> the presence of two projects, but they both promise snapshot isolation so I
> suspect they will be doing very similar things.
>
> Overall, as I see it, it would be very unfair to reject the Omid proposal
> on the basis that Tephra was incubated a couple of weeks ago. I'd much
> rather see how the two communities evolve and have the mentors of the
> projects fostering collaboration and possibly a merge of the two projects
> before graduation. Why not think of a general transaction status oracle
> with different protocol implementations assuming it makes sense? I wouldn't
> like to see any of the two blocked upfront on the basis that they are in
> the same space, though. We could postpone this decision until graduation
> when we'll have more knowledge about the projects and the growth of the two
> communities.
>
> -Flavio
>
> > On 18 Mar 2016, at 23:19, Henry Saputra <henry.saputra@gmail.com
> <javascript:;>> wrote:
> >
> > I know Apache incubator does not play favorite but it is getting awkward
> > that TWO transaction engine for HBase coming to incubator at the same
> time.
> >
> > As most people know, the other one is Tephra, that just coming to
> incubator
> > few weeks ago.
> >
> > As member of IPMC, I would like to see Omid provide some more details
> > comparisons about the difference that the project bring,  in term of
> > approach and possible integrations with other ASF projects.
> >
> > If possible, I would prefer to see Omid team work together with Tephra to
> > work on working together to make one solid transaction engine for HBase
> and
> > later NoSQL databases.
> >
> >
> > - Henry
> >
> > On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <daijyc@gmail.com
> <javascript:;>> wrote:
> >
> >> Hi,
> >>
> >> I would like to propose Omid as an Apache Incubator project:
> >>
> >> https://wiki.apache.org/incubator/OmidProposal
> >>
> >> I've posted posted the text of the proposal below:
> >>
> >> Thanks,
> >> Daniel
> >>
> >> = Omid Proposal =
> >>
> >> === Abstract ===
> >>
> >> Omid is a flexible, reliable, high performant and scalable ACID
> >> transactional framework that allows client applications to execute
> >> transactions on top of MVCC key/value-based NoSQL datastores
> >> (currently Apache HBase) providing Snapshot Isolation guarantees on
> >> the accessed data.
> >>
> >>
> >> === Proposal ===
> >>
> >> Omid is a flexible open-source transactional framework that provides
> >> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> >> datastores. In particular, the current codebase brings the concept of
> >> transactions to the popular Apache HBase datastore. Omid offers high
> >> performance, high availability, and scalability. Omid's current
> >> version is able to scale to thousands of clients triggering concurrent
> >> transactions on application data stored in HBase. Omid can scale
> >> beyond 100K transactions per second on mid-range hardware while
> >> incurring minimal impact on the speed of data access in the
> >> datastore. We’re currently experimenting with a prototype version that
> >> can raise throughput to ~380K TPS.
> >>
> >>
> >> Omid has been publicly available as an open-source project on GitHub
> >> under the Apache License Version 2.0 since 2011 [1]. Over these years,
> >> it has generated some interest in the open source community,
> >> especially since the public presentation of the first version at
> >> Hadoop Summit 2013 [2]. The GitHub project currently has 241 stars and
> >> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> >> Foundation with the aim of transferring the Omid project (including its
> >> source code and documentation) to Apache in order to start building
> >> a stable open source community around it.
> >>
> >>
> >> [1] https://github.com/yahoo/omid
> >>
> >> [2] Omid presentation at Hadoop Summit 2013:
> >>
> >> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
> >>
> >>
> >> === Background ===
> >>
> >> An Omid prototype was first released as an open-source project back in
> >> 2011. Inspired by Google Percolator [1], it offered a lock-free
> >> approach to transactions in NoSQL datastores (See [2]). However,
> >> during these years, the design of Omid has evolved significantly.
> >> Whilst the current open-sourced version maintains many aspects of the
> >> original implementation, it is the result of a major redesign of the
> >> first prototype released in 2011.
> >>
> >>
> >> Omid now has a more decentralized design that does not sacrifice the
> >> consistency and performance of the original version. The current
> >> design also enables Omid to scale to thousands of clients executing
> >> transactions concurrently on application data stored in HBase.
> >> Internally, Omid still uses a lock-free approach to support
> >> multiple concurrent clients. Its design also relies on a centralized
> >> conflict detection component, the TSO, which now efficiently resolves
> >> writeset collisions among concurrent transactions without having to
> >> piggyback commit information to the clients. Another important
> >> benefit of Omid is that it doesn't require any modification of the
> >> underlying key-value datastore, HBase in this case. Moreover, the
> >> recently added high availability algorithm eliminates the single
> >> point of failure represented by the TSO in system deployments that
> >> require a higher degree of dependability. Last but not least, the
> >> user API is very simple, mimicking transaction managers in the
> >> relational world: begin, commit, rollback.
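The conflict-detection role of the TSO described above can be illustrated with a minimal, single-process sketch in Java. It is not Omid's code: ToyConflictOracle and its methods are hypothetical names, and timestamps come from a plain in-memory counter. The oracle hands out start timestamps, refuses a commit if any row in the write set was committed by another transaction after the caller's start timestamp (first committer wins), and keeps no per-transaction state, so a rollback needs no call to it at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, single-process stand-in for a centralized conflict-detection
 * service. Not Omid's TSO implementation; it only illustrates the idea.
 */
class ToyConflictOracle {
    // Logical clock used for both start and commit timestamps.
    private long clock = 0;
    // For each row, the commit timestamp of the last transaction that wrote it.
    private final Map<String, Long> lastCommit = new HashMap<>();

    /** Begins a transaction by handing out a fresh start timestamp. */
    synchronized long begin() {
        return ++clock;
    }

    /**
     * Tries to commit the transaction that started at startTs and wrote the
     * rows in writeSet. Returns its commit timestamp, or -1 if another
     * transaction committed a write to one of those rows after startTs.
     */
    synchronized long commit(long startTs, Set<String> writeSet) {
        for (String row : writeSet) {
            Long committedAt = lastCommit.get(row);
            if (committedAt != null && committedAt > startTs) {
                return -1; // write-write conflict: the first committer won, this one aborts
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            lastCommit.put(row, commitTs); // remember the newest committed writer of each row
        }
        return commitTs;
    }
}
```

In this sketch clients never lock rows; the only point of serialization is the oracle object itself, and a transaction that gets -1 back simply discards its buffered writes, which is all a rollback amounts to here.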
> >>
> >>
> >> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> >> management platform powering some of its next-generation search and
> >> personalization products, uses Omid as the transaction manager in its
> >> processing pipeline. Sieve essentially acts as a huge processing hub
> >> between content feeds and serving systems. It provides an environment
> >> for highly customizable, real-time, streamed information processing,
> >> with typical discovery-to-service latencies of just a few seconds. In
> >> terms of scale and availability, Omid’s new design was largely driven
> >> by Sieve’s requirements.
> >>
> >>
> >> At Yahoo, we are also making an effort to disseminate the current
> >> status of the project through blog entries (see [3], [4] and [5]) and
> >> submissions to technical and academic conferences such as ATC 2016,
> >> Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
> >> appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
> >>
> >>
> >> [1] D. Peng and F. Dabek. Large-scale Incremental Processing Using
> >> Distributed Transactions and Notifications. In Proc. of the USENIX
> >> Symposium on Operating Systems Design and Implementation (OSDI), 2010.
> >>
> >> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> >> Omid: Lock-free transactional support for distributed data stores. In
> >> Proc. of ICDE, 2013.
> >>
> >> [3]
> >> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> >>
> >> [4]
> >> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> >>
> >> [5]
> >> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> >>
> >> [6]
> >> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
> >>
> >>
> >> === Rationale ===
> >>
> >> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> >> transactions is very popular and is a standard feature of relational
> >> databases. However, in the Big Data ecosystem, applications typically
> >> use NoSQL datastores, which do not provide ACID transactions. Early
> >> NoSQL datastores gave up transactional support in exchange for greater
> >> agility and scalability. However, the need for transactions soon
> >> emerged in Big Data applications that access shared data; for example,
> >> transactions are very important for modern, scalable systems that
> >> process content incrementally.
> >>
> >>
> >> NoSQL datastores, including HBase, don’t provide transactional
> >> frameworks to coordinate access to the underlying data and preserve
> >> consistency. By using Omid, Big Data applications that need
> >> to bundle multiple read and write operations on HBase into logically
> >> indivisible units of work can execute transactions with ACID
> >> properties, just as they would use transactions in the relational
> >> database world. Omid extends the HBase key-value access API with
> >> transaction semantics. It can be used either directly or via
> >> higher-level data management APIs. For example, Apache Phoenix
> >> (SQL on top of HBase) might use Omid as its transaction management
> >> component.
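As a rough illustration of bundling several reads and writes into one indivisible unit of work, the sketch below shows the begin/commit/rollback pattern with a retry loop (Java 8 or later, for the lambda). All names (ToyTxStore, Tx, runInTransaction) are hypothetical and do not correspond to Omid's or HBase's client API: writes are buffered on the client, reads see a snapshot taken at begin plus the transaction's own writes, and commit applies either all buffered writes or none of them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory store illustrating begin/commit/rollback units of work. */
class ToyTxStore {
    private final Map<String, Long> values = new HashMap<>();
    private final Map<String, Long> lastCommit = new HashMap<>();
    private long clock = 0;

    /** Client-side transaction state: start timestamp, snapshot, buffered writes. */
    static final class Tx {
        final long startTs;
        private final Map<String, Long> snapshot;
        private final Map<String, Long> writes = new HashMap<>();

        private Tx(long startTs, Map<String, Long> snapshot) {
            this.startTs = startTs;
            this.snapshot = snapshot;
        }

        /** Reads see the transaction's own writes first, then its snapshot (0 if absent). */
        long get(String row) {
            if (writes.containsKey(row)) {
                return writes.get(row);
            }
            return snapshot.getOrDefault(row, 0L);
        }

        /** Writes are only buffered until commit. */
        void put(String row, long value) {
            writes.put(row, value);
        }
    }

    synchronized Tx begin() {
        return new Tx(++clock, new HashMap<>(values)); // copying gives a consistent snapshot
    }

    /** Applies every buffered write, or none if a concurrent commit touched the same rows. */
    synchronized boolean commit(Tx tx) {
        for (String row : tx.writes.keySet()) {
            Long committedAt = lastCommit.get(row);
            if (committedAt != null && committedAt > tx.startTs) {
                return false;
            }
        }
        long commitTs = ++clock;
        for (Map.Entry<String, Long> w : tx.writes.entrySet()) {
            values.put(w.getKey(), w.getValue());
            lastCommit.put(w.getKey(), commitTs);
        }
        return true;
    }

    /** Rolling back just discards the client-side write buffer. */
    void rollback(Tx tx) {
        tx.writes.clear();
    }

    /** Runs body as one unit of work, retrying on conflict up to maxAttempts times. */
    boolean runInTransaction(Consumer<Tx> body, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Tx tx = begin();
            try {
                body.accept(tx);
            } catch (RuntimeException e) {
                rollback(tx);
                throw e;
            }
            if (commit(tx)) {
                return true;
            }
        }
        return false;
    }

    synchronized long valueOf(String row) {
        return values.getOrDefault(row, 0L);
    }
}
```

For example, `store.runInTransaction(tx -> { tx.put("alice", tx.get("alice") - 30); tx.put("bob", tx.get("bob") + 30); }, 3)` moves 30 units between two rows as one unit of work; if a concurrent transaction commits to either row first, the attempt is rejected and retried from a fresh snapshot.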
> >>
> >>
> >> The following features make Omid an attractive choice for system
> >> designers and other projects in the Apache community:
> >>
> >>
> >> * Semantics. Omid implements Snapshot Isolation (SI), which is supported by
> >> major SQL and NoSQL technologies (e.g., Google Percolator).
> >>
> >>
> >> * Performance and Scalability. Omid provides a highly scalable,
> >> lock-free implementation of SI. To the best of our knowledge, it is
> >> also one of the few open source NoSQL transactional platforms that can
> >> execute more than 100K transactions per second [1]. A new prototype,
> >> still in development, can go even further, up to ~380K TPS.
> >>
> >>
> >> * Reliability. Omid has a high-availability (HA) mode, in which the
> >> core service performing writeset conflict resolution operates as a
> >> primary-backup process pair with automatic failover. The HA support
> >> adds no overhead to normal operation.
> >>
> >>
> >> * Adaptability. Omid's current version provides transactions on data
> >> stored in Apache HBase. However, Omid’s components are generic enough
> >> to be adapted to any other key-value NoSQL datastore that supports
> >> MVCC.
> >>
> >>
> >> * Development. Omid provides a very simple interface that mimics
> >> standard HBase APIs, making it developer-friendly. Only minimal
> >> extensions to the standard interfaces have been introduced to enable
> >> transactions.
> >>
> >>
> >> * Simplicity. Omid leverages the HBase infrastructure for managing its
> >> own metadata. It requires no additional services beyond those
> >> provided and used by HBase.
> >>
> >>
> >> * Track Record. As we have mentioned, Omid is already in use by
> >> very-large-scale production systems at Yahoo. Also, Hortonworks is
> >> integrating Omid in a metastore implementation for Hive based on
> >> HBase.
> >>
> >> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
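To make the Snapshot Isolation semantics listed above more concrete, here is a small MVCC visibility sketch in Java (Java 8 or later). It assumes, for illustration only, that each write is stored under its writer's start timestamp and that a separate commit table maps start timestamps to commit timestamps; ToySnapshotStore and its methods are hypothetical. A reader that started at timestamp T sees, for each row, the newest version whose writer committed before T, plus its own writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory MVCC store used to illustrate Snapshot Isolation reads.
 * Versions are keyed by the writer's start timestamp; a separate commit table
 * maps a writer's start timestamp to its commit timestamp.
 */
class ToySnapshotStore {
    // row -> (writer start timestamp -> value)
    private final Map<String, NavigableMap<Long, String>> versions = new HashMap<>();
    // writer start timestamp -> commit timestamp; absent means not (yet) committed
    private final Map<Long, Long> commitTable = new HashMap<>();

    void write(long writerStartTs, String row, String value) {
        versions.computeIfAbsent(row, r -> new TreeMap<>()).put(writerStartTs, value);
    }

    void markCommitted(long writerStartTs, long commitTs) {
        commitTable.put(writerStartTs, commitTs);
    }

    /** Returns the value of row visible to a transaction that started at readerStartTs, or null. */
    String read(String row, long readerStartTs) {
        NavigableMap<Long, String> cell = versions.get(row);
        if (cell == null) {
            return null;
        }
        for (Map.Entry<Long, String> version : cell.descendingMap().entrySet()) {
            long writer = version.getKey();
            if (writer == readerStartTs) {
                return version.getValue(); // a transaction always sees its own writes
            }
            Long commitTs = commitTable.get(writer);
            if (commitTs != null && commitTs < readerStartTs) {
                return version.getValue(); // committed before this snapshot was taken
            }
            // otherwise: uncommitted, aborted, or committed too late; try an older version
        }
        return null;
    }
}
```

Scanning versions newest-first by writer start timestamp is enough here because, with first-committer-wins conflict detection, two committed writers of the same row cannot have overlapped in time, so their start and commit orders agree.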
> >>
> >>
> >> === Current Status ===
> >> The current Omid implementation is available both in Yahoo’s internal
> >> GitHub repository, for internal use at Yahoo, and in Yahoo’s public
> >> GitHub repository (https://github.com/yahoo/omid.git). Both
> >> repositories are managed by Omid’s current developers at Yahoo.
> >>
> >> As mentioned above, Yahoo is currently using Omid to provide
> >> transactions in Sieve, a web-scale content management platform that
> >> powers Yahoo’s next-generation search and personalization products.
> >>
> >>
> >> ==== Meritocracy ====
> >> The first version of Omid was originally created in 2011 by Maysam
> >> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
> >> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
> >>
> >>
> >> In the years after its inception, Omid has matured to operate at
> >> Web scale and has been used internally by strategic projects at Yahoo
> >> such as Sieve. The current base of committers belongs to the Yahoo team
> >> that took over the initial Omid prototype and rewrote it to meet the
> >> high availability and scalability requirements of the Sieve project.
> >> This base of committers has recently incorporated Hortonworks members
> >> who helped adapt Omid to HBase 1.x versions.
> >>
> >>
> >> With this initial committer base, we aim to form a larger community
> >> that can contribute new ideas to the current code base. This
> >> new community will run the project following the "Apache Way"
> >> (http://apache.org/foundation/governance/). Users and new contributors
> >> will be treated with respect and welcomed. To grow the community, we
> >> will encourage contributors to provide patches, review code, propose
> >> new features and improvements, and talk at conferences such as Hadoop Summit,
> >> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
> >> offered according to meritocracy.
> >>
> >> ==== Community ====
> >>
> >> The public Yahoo Omid repository on GitHub currently has 241 stars and
> >> 93 forks, which indicates significant interest in the project in the
> >> open-source community, at least compared with other similar projects
> >> (see https://github.com/yahoo/omid.git).
> >>
> >>
> >> Recently, Hortonworks contributors to the Apache Hive project who
> >> are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
> >> expressed interest in using Omid. We started a fruitful collaboration
> >> with them that resulted in Omid supporting HBase 1.x versions.
> >>
> >>
> >> Salesforce is also interested in collaborating on a proof of
> >> concept for integrating Omid as a pluggable transaction manager in
> >> Apache Phoenix.
> >>
> >>
> >> Yahoo, Hortonworks and Salesforce participants will constitute the
> >> initial set of committers and mentors for the proposal.
> >>
> >> ==== Core Developers ====
> >> The core developers of Omid are all skilled software developers and
> >> research engineers at Yahoo Inc. and Hortonworks with years of
> >> experience in their fields. At this moment, developers are
> >> distributed across the U.S. and Israel. The aim is to incorporate more
> >> committers from different organizations and locations over time.
> >>
> >>
> >> The current set of developers includes experienced committers from
> >> Apache HBase, Hive and Hadoop projects that have been working with us
> >> in the current codebase found in Github.
> >>
> >> Finally, some of the core developers are currently NOT affiliated with
> >> the ASF and would require new ICLAs to be filed.
> >>
> >>
> >> === Alignment ===
> >> Omid enhances the already successful Apache HBase datastore project
> >> with transactions. We have collaborated with other developers inside
> >> and outside Yahoo who are involved in the Apache HBase community, so
> >> we have had reliable feedback from them.
> >>
> >> Although Omid brings value into HBase, the design of the current
> >> version provides a general transaction scheme that can potentially be
> >> adapted to other MVCC key-value datastores such as Apache Cassandra.
> >>
> >>
> >> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> >> top of HBase that can potentially integrate Omid in order to provide
> >> the well-known concept of transactions to Phoenix-based applications.
> >>
> >>
> >> === Known Risks ===
> >> ==== Orphaned products ====
> >> Yahoo’s Research and Search organizations have been taking care of
> >> Omid development since the first prototype creation in 2011. Yahoo has
> >> a long history of participating in open-source projects, and has also
> >> been a long-time contributor to the Apache community. For example, in
> >> Apache, Yahoo is an important contributor to many projects in the
> >> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
> >> open-sourced other well-known projects outside Hadoop, such as
> >> ZooKeeper or BookKeeper. So it is in Yahoo's best interest to make
> >> Omid a successful open-source Apache product as well. If this happens, we
> >> are sure that a larger community will be formed around the project in
> >> a relatively short period of time, contributing to the diversification
> >> and stabilization of the base of committers.
> >>
> >>
> >> ==== Inexperience with Open Source ====
> >> This project has long-standing, experienced mentors and interested
> >> contributors from Apache HBase, Hive and Phoenix to help us move
> >> through the open source process. We are actively working with
> >> experienced Apache community members to improve our project and
> >> its testing.
> >>
> >> ==== Homogeneous Developers ====
> >> Omid has been supported by Yahoo since its inception in 2011. However,
> >> all current committers are employed by their respective companies
> >> shown in the Affiliations section.
> >>
> >>
> >> ==== Reliance on Salaried Developers ====
> >>
> >> All the current developers are paid by their employers to contribute
> >> to this project. Yahoo developers will also continue maintaining the
> >> internal Omid repository at their company.
> >>
> >> Of course, other developers are welcome to contribute to this project
> >> after it is open sourced in Apache.
> >>
> >> ==== Relationships with Other Apache Products ====
> >>
> >> The current Omid incarnation serves transactional contexts to applications
> >> storing their data in HBase. However, Omid's design could potentially be
> >> adapted to serve transactions on top of other MVCC-based key-value
> >> datastores in the Apache community, such as Cassandra.
> >>
> >>
> >> As a transactional framework, Omid could potentially benefit many other
> >> Apache projects, such as Apache Spark, Apache Phoenix, Apache Storm and
> >> Apache Flink, by providing them with transactional contexts. In
> >> particular, Apache Phoenix, a SQL layer on top of HBase, might use
> >> Omid as its transaction management component. Once we open source Omid
> >> as an Apache project, we expect to generate more interest in the
> >> surrounding communities.
> >>
> >>
> >> Very recently, a new incubator proposal for a similar project called
> >> Tephra has been submitted to the ASF. We think this is good for the
> >> Apache community, and we believe that there’s room for both proposals,
> >> because each design is based on different principles (e.g.,
> >> Omid does not need to maintain the state of ongoing transactions in
> >> the server-side component) and because both Tephra and
> >> Omid have already gained some traction in the open-source community.
> >>
> >>
> >> With regard to the Apache projects that Omid uses, apart from HBase,
> >> Omid relies on Apache ZooKeeper and Curator to coordinate the
> >> (re)connection of transaction managers (acting as clients) to the
> >> conflict resolution component for transactions (server side). They’re
> >> also used to coordinate the master and backup replicas in high
> >> availability scenarios.
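A common way to build a primary-backup pair on top of a coordination service is a time-bounded lease plus an epoch number that fences off a stale primary. The Java sketch below illustrates only that generic pattern; it is not Omid's HA algorithm, and the in-memory ToyLeaseRegistry (a hypothetical class) merely stands in for what ZooKeeper and Curator would provide. Time is passed in explicitly to keep the example deterministic.

```java
/**
 * Hypothetical in-memory stand-in for a coordination service such as ZooKeeper.
 * It grants a time-bounded lease to one replica at a time and increments an
 * epoch number whenever the lease is acquired after being free or expired.
 */
class ToyLeaseRegistry {
    private final long leaseMillis;
    private String holder = null;
    private long expiresAt = 0;
    private long epoch = 0;

    ToyLeaseRegistry(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /**
     * Acquires or renews the lease for replica at time nowMillis. Returns the
     * epoch under which replica may act as primary, or -1 if another replica
     * still holds a valid lease.
     */
    synchronized long acquire(String replica, long nowMillis) {
        boolean expired = holder == null || nowMillis >= expiresAt;
        if (!expired && !holder.equals(replica)) {
            return -1; // another replica is still primary
        }
        if (expired) {
            epoch++; // new leadership term: fences off the previous primary
            holder = replica;
        }
        expiresAt = nowMillis + leaseMillis; // fresh lease or renewal
        return epoch;
    }

    /** A replica should answer clients only while its epoch is current and its lease is valid. */
    synchronized boolean mayServe(String replica, long epochSeen, long nowMillis) {
        return replica.equals(holder) && epochSeen == epoch && nowMillis < expiresAt;
    }
}
```

Because a replica checks mayServe before answering, a primary that stalls past its lease and is replaced stops serving clients even if it resumes later. A production implementation would also have to handle clock skew, session expiry and network partitions, which this toy ignores.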
> >>
> >>
> >> ==== An Excessive Fascination with the Apache Brand ====
> >>
> >> We are applying to the Incubator process because we think that it is
> >> the logical next step for the Omid project after we open-sourced the
> >> code on GitHub some years ago. Yahoo has a long-standing history of
> >> contributing to Apache projects. The developers and contributors
> >> understand the implications of making it an Apache project, and
> >> strongly believe that the growing community can benefit from the
> >> Apache environment, ecosystem, and infrastructure.
> >>
> >>
> >> === Documentation ===
> >> Current documentation about the project is available in the wiki of
> >> Omid’s GitHub repository: https://github.com/yahoo/omid/wiki . It will
> >> be moved under https://omid.incubator.apache.org/docs if the project
> >> is accepted into the Apache Incubator.
> >>
> >> === Initial Source ===
> >> Initial source code is currently hosted on GitHub for general viewing
> >> and contribution:
> >>
> >> https://github.com/yahoo/omid.git
> >>
> >>
> >> Omid source code is written in Java (99%), with some shell
> >> script (1%) used to configure and launch the main
> >> components.
> >>
> >>
> >> The code will be moved to Apache http://git.apache.org/ if accepted as
> >> an Incubator project.
> >>
> >> === Source and Intellectual Property Submission Plan ===
> >>
> >> The code currently published on GitHub is licensed under the Apache
> >> License 2.0. If Omid meets the conditions for becoming an Incubator
> >> project in the ASF, the source code will be transitioned via the
> >> Software Grant Agreement onto the ASF infrastructure and in turn made
> >> available under the Apache License, version 2.0.
> >>
> >> === External Dependencies ===
> >>
> >>
> >> The required external dependencies that are not Apache projects are
> >> all under the Apache License or other compatible licenses:
> >>
> >> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> >>
> >> JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
> >>
> >> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> >>
> >> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
> >>
> >> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
> >>
> >> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
> >>
> >> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
> >>
> >> Google Protocol Buffers v2.5.0
> >> (https://developers.google.com/protocol-buffers/) [BSD License]
> >>
> >> Mockito v1.9.5 (http://mockito.org/) [MIT License]
> >>
> >> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
> >> [Apache 2.0]
> >>
> >> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> >> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> >>
> >> C. Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> >>
> >> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
> >>
> >>
> >> === Cryptography ===
> >> The Omid project does not use cryptography itself. However, Apache HBase,
> >> the datastore on top of which Omid works in its current version, uses
> >> standard APIs and tools for SSH and SSL communication where necessary.
> >>
> >> === Required Resources ===
> >> We request that the following resources be created for the project to use:
> >>
> >> ==== Mailing lists ====
> >>
> >> omid-private (moderated subscriptions)
> >>
> >> omid-commits (commit notification)
> >> omid-dev (technical discussions)
> >>
> >> ==== Git repository ====
> >> https://github.com/apache/incubator-omid
> >>
> >> ==== Documentation ====
> >> https://omid.incubator.apache.org/docs/
> >>
> >> ==== JIRA instance ====
> >> https://issues.apache.org/jira/browse/omid
> >>
> >> === Initial Committers ===
> >>
> >> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >>
> >>
> >> * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >>
> >>
> >> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >>
> >>
> >> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >>
> >>
> >> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
> >>
> >>
> >> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
> >>
> >> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >>
> >>
> >> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
> >>
> >>
> >> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
> >>
> >>
> >> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
> >>
> >> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >>
> >>
> >> === Additional Interested Contributors ===
> >> * Ivan Kelly (ivank<AT>apache<DOT>org)
> >>
> >> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
> >>
> >>
> >> === Affiliations ===
> >>
> >> * Edward Bortnikov, Yahoo Inc.
> >>
> >>
> >> * Daniel Dai, Hortonworks
> >>
> >>
> >> * Flavio P. Junqueira, Confluent
> >>
> >>
> >> * Igor Katkov, Yahoo Inc.
> >>
> >>
> >> * Ivan Kelly, Midokura
> >>
> >>
> >> * Francis C. Liu, Yahoo Inc.
> >>
> >>
> >> * Sameer Paranjpye, Arimo
> >>
> >> * Francisco Perez-Sorrosal, Yahoo Inc.
> >>
> >>
> >> * Ohad Shacham, Yahoo Inc.
> >>
> >>
> >> * Maysam Yabandeh, Dropbox Inc.
> >>
> >>
> >> === Sponsors ===
> >>
> >> ==== Champion ====
> >>
> >> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
> >>
> >> ==== Nominated Mentors ====
> >>
> >> Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
> >>
> >> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
> >>
> >> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
> >>
> >> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
> >>
> >> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
> >>
> >>
> >> ==== Sponsoring Entity ====
> >> Apache Incubator PMC
> >>
> >>
> >>
>
>
>
>

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Chris Douglas <cd...@apache.org>.
+1 -C



On Sat, Mar 19, 2016 at 7:26 AM, Flavio Junqueira <fp...@apache.org> wrote:
> I understand the concern, so let me try to offer some facts and see if we can make progress from there.
>
> Omid has been around for some time now, and its initial design appeared in a couple of research papers that I actually co-authored. The architecture is based on the idea of having a centralized transaction status oracle that shares transaction status data with clients for scalability. The current Omid project evolved out of that initial work and is a much improved version of that first iteration, with the improvements focusing on scalability. It currently runs in production at scale at Yahoo!, and according to the proposal there is interest from other companies. The project proposal also links to a series of blog posts about the experience.
>
> Tephra has a very similar architecture. The description here says that it has a transaction server, which sounds like the TSO in the original Omid papers. I haven't spent enough time understanding the precise protocol they use, but I must say that the protocol is very important for correctness and scalability. Having two protocols with different properties could justify the presence of two projects, but they both promise snapshot isolation, so I suspect they will be doing very similar things.
>
> Overall, as I see it, it would be very unfair to reject the Omid proposal on the basis that Tephra was incubated a couple of weeks ago. I'd much rather see how the two communities evolve and have the mentors of the projects foster collaboration and possibly a merge of the two projects before graduation. Why not think of a general transaction status oracle with different protocol implementations, assuming it makes sense? I wouldn't like to see either of the two blocked upfront on the basis that they are in the same space, though. We could postpone this decision until graduation, when we'll have more knowledge about the projects and the growth of the two communities.
>
> -Flavio
>
>> On 18 Mar 2016, at 23:19, Henry Saputra <he...@gmail.com> wrote:
>>
>> I know the Apache Incubator does not play favorites, but it is getting awkward
>> that TWO transaction engines for HBase are coming to the Incubator at the same time.
>>
>> As most people know, the other one is Tephra, which came to the Incubator
>> just a few weeks ago.
>>
>> As a member of the IPMC, I would like to see Omid provide a more detailed
>> comparison of the differences the project brings, in terms of
>> approach and possible integrations with other ASF projects.
>>
>> If possible, I would prefer to see the Omid team work together with Tephra
>> to make one solid transaction engine for HBase and, later, for other
>> NoSQL databases.
>>
>>
>> - Henry
>>
>> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would like to propose Omid as an Apache Incubator project:
>>>
>>> https://wiki.apache.org/incubator/OmidProposal
>>>
>>> I've posted the text of the proposal below:
>>>
>>> Thanks,
>>> Daniel
>>>
>>> = Omid Proposal =
>>>
>>> === Abstract ===
>>>
>>> Omid is a flexible, reliable, high-performance and scalable ACID
>>> transactional framework that allows client applications to execute
>>> transactions on top of MVCC key/value-based NoSQL datastores
>>> (currently Apache HBase), providing Snapshot Isolation guarantees on
>>> the accessed data.
>>>
>>>
>>> === Proposal ===
>>>
>>> Omid is a flexible open-source transactional framework that provides
>>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>>> datastores. In particular, the current codebase brings the concept of
>>> transactions to the popular Apache HBase datastore. Omid offers high
>>> performance, high availability, and scalability. Omid's current
>>> version is able to scale to thousands of clients triggering concurrent
>>> transactions on application data stored in HBase. Omid can scale
>>> beyond 100K transactions per second on mid-range hardware while
>>> incurring minimal impact on the speed of data access in the
>>> datastore. We’re currently experimenting with a prototype version that
>>> can raise throughput to ~380K TPS.
>>>
>>>
>>> Omid has been publicly available as an open-source project on GitHub
>>> under the Apache License Version 2.0 since 2011 [1]. Over these years,
>>> it has generated some interest in the open source community,
>>> especially since the public presentation of the first version at
>>> Hadoop Summit 2013 [2]. The GitHub project currently has 241 stars and
>>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>>> Foundation with the aim of transferring the Omid project (including its
>>> source code and documentation) to Apache in order to start building
>>> a stable open source community around it.
>>>
>>>
>>> [1] https://github.com/yahoo/omid
>>>
>>> [2] Omid presentation at Hadoop Summit 2013:
>>>
>>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>>>
>>>
>>> === Background ===
>>>
>>> An Omid prototype was first released as an open-source project back in
>>> 2011. Inspired by Google Percolator [1], it offered a lock-free
>>> approach to transactions in NoSQL datastores (See [2]). However,
>>> during these years, the design of Omid has evolved significantly.
>>> Whilst the current open-sourced version maintains many aspects of the
>>> original implementation, it is the result of a major redesign of the
>>> first prototype released in 2011.
>>>
>>>
>>> Omid now has a more decentralized design that does not sacrifice the
>>> consistency and performance of the original version. The current
>>> design also enables Omid to scale to thousands of clients executing
>>> transactions concurrently on application data stored in HBase.
>>> Internally, Omid still uses a lock-free approach to support
>>> multiple concurrent clients. Its design also relies on a centralized
>>> conflict detection component, the TSO, which now efficiently resolves
>>> writeset collisions among concurrent transactions without having to
>>> piggyback commit information to the clients. Another important
>>> benefit of Omid is that it doesn't require any modification of the
>>> underlying key-value datastore, HBase in this case. Moreover, the
>>> recently added high availability algorithm eliminates the single
>>> point of failure represented by the TSO in system deployments that
>>> require a higher degree of dependability. Last but not least, the
>>> user API is very simple, mimicking transaction managers in the
>>> relational world: begin, commit, rollback.
>>>
>>>
>>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
>>> management platform powering some of its next-generation search and
>>> personalization products, uses Omid as the transaction manager in its
>>> processing pipeline. Sieve essentially acts as a huge processing hub
>>> between content feeds and serving systems. It provides an environment
>>> for highly customizable, real-time, streamed information processing,
>>> with typical discovery-to-service latencies of just a few seconds. In
>>> terms of scale and availability, Omid’s new design was largely driven
>>> by Sieve’s requirements.
>>>
>>>
>>> At Yahoo, we are also making an effort to disseminate the current
>>> status of the project through blog entries (see [3], [4] and [5]) and
>>> submissions to technical and academic conferences such as ATC 2016,
>>> Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
>>> appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
>>>
>>>
>>> [1] D. Peng and F. Dabek. Large-scale Incremental Processing Using
>>> Distributed Transactions and Notifications. In Proc. of the USENIX
>>> Symposium on Operating Systems Design and Implementation (OSDI), 2010.
>>>
>>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>>> Omid: Lock-free transactional support for distributed data stores. In
>>> Proc. of ICDE, 2013.
>>>
>>> [3]
>>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>>>
>>> [4]
>>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>>>
>>> [5]
>>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>>>
>>> [6]
>>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>>>
>>>
>>> === Rationale ===
>>>
>>> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>>> transactions is very popular and is a standard feature of relational
>>> databases. However, in the Big Data ecosystem, applications typically
>>> use NoSQL datastores, which do not provide ACID transactions. Early
>>> NoSQL datastores gave up transactional support in exchange for greater
>>> agility and scalability. However, the need for transactions soon
>>> emerged in Big Data applications that access shared data; for example,
>>> transactions are very important for modern, scalable systems that
>>> process content incrementally.
>>>
>>>
>>> NoSQL datastores, including HBase, don’t provide transactional
>>> frameworks to coordinate access to the underlying data and preserve
>>> consistency. By using Omid, Big Data applications that need
>>> to bundle multiple read and write operations on HBase into logically
>>> indivisible units of work can execute transactions with ACID
>>> properties, just as they would use transactions in the relational
>>> database world. Omid extends the HBase key-value access API with
>>> transaction semantics. It can be used either directly or via
>>> higher-level data management APIs. For example, Apache Phoenix
>>> (SQL on top of HBase) might use Omid as its transaction management
>>> component.
>>>
>>>
>>> The following features make Omid an attractive choice for system
>>> designers and other projects in the Apache community:
>>>
>>>
>>> * Semantics. Omid implements Snapshot Isolation (SI), which is supported
>>> by major SQL and NoSQL technologies (e.g. Google Percolator).
>>>
>>>
>>> * Performance and Scalability. Omid provides a highly scalable,
>>> lock-free implementation of SI. To the best of our knowledge, it is
>>> also one of the few open source NoSQL transactional platforms that can
>>> execute more than 100K transactions per second [1]. A new prototype
>>> still in development can go even further, up to ~380K TPS.
>>>
>>>
>>> * Reliability. Omid has a high-availability (HA) mode, in which the
>>> core service performing writeset conflict resolution operates as a
>>> primary-backup process pair with automatic failover. The HA support
>>> adds zero overhead to normal operation.
>>>
>>>
>>> * Adaptability. Omid’s current version provides transactions on data
>>> stored in Apache HBase. However, Omid’s components are generic enough
>>> to be adapted to any other key-value NoSQL datastore that supports
>>> MVCC.
>>>
>>>
>>> * Development. Omid provides a very simple interface that mimics
>>> standard HBase APIs, making it developer friendly. Only minimal
>>> extensions to the standard interfaces have been introduced to enable
>>> transactions.
>>>
>>>
>>> * Simplicity. Omid leverages the HBase infrastructure for managing its
>>> own metadata. It entails no additional services apart from those
>>> provided and used by HBase.
>>>
>>>
>>> * Track Record. As we have mentioned, Omid is already in use by
>>> very-large-scale production systems at Yahoo. Also, Hortonworks is
>>> integrating Omid into an HBase-based metastore implementation for Hive.
>>>
>>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>>>
>>>
>>> === Current Status ===
>>> The current Omid implementation is available both in Yahoo’s internal
>>> GitHub repository, for internal use at Yahoo, and in Yahoo’s public
>>> GitHub repository (https://github.com/yahoo/omid.git). Both
>>> repositories are managed by Omid’s current developers at Yahoo.
>>>
>>> As mentioned above, Yahoo is currently using Omid to provide
>>> transactions in Sieve, a web-scale content management platform that
>>> powers Yahoo’s next-generation search and personalization products.
>>>
>>>
>>> ==== Meritocracy ====
>>> The first version of Omid was originally created in 2011 by Maysam
>>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>>>
>>>
>>> During the years after its inception, Omid has matured to operate at
>>> Web scale and has been used internally by strategic projects at Yahoo
>>> such as Sieve. The current base of committers belongs to the Yahoo team
>>> that took over the initial Omid prototype and rewrote it to meet the
>>> high availability and scalability requirements of the Sieve project.
>>> This base of committers has recently been joined by Hortonworks members
>>> who helped adapt Omid to HBase 1.x versions.
>>>
>>>
>>> With this initial committer base, we aim to form a larger community
>>> that can collaborate on new ideas building on the current code base. This
>>> new community will run the project following the "Apache Way"
>>> (http://apache.org/foundation/governance/). Users and new contributors
>>> will be welcomed and treated with respect. To grow the community, we
>>> will encourage contributors to provide patches, review code, propose
>>> new features and improvements, talk at conferences such as Hadoop Summit,
>>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>>> offered according to meritocracy.
>>>
>>> ==== Community ====
>>>
>>> The public Yahoo Omid repository on GitHub currently has 241 stars and
>>> 93 forks, which indicates significant interest in the project in the
>>> open-source community, at least compared with other similar projects
>>> (see https://github.com/yahoo/omid.git).
>>>
>>>
>>> Recently, Hortonworks contributors to the Apache Hive project who
>>> are working on storing Hive metadata in HBase (Apache JIRA HIVE-9452)
>>> expressed interest in using Omid. We started a fruitful collaboration
>>> with them that resulted in Omid supporting HBase 1.x versions.
>>>
>>>
>>> Salesforce is also interested in collaborating on a proof of
>>> concept for integrating Omid as a pluggable transaction manager in
>>> Apache Phoenix.
>>>
>>>
>>> Yahoo, Hortonworks and Salesforce participants will constitute the
>>> initial set of committers and mentors for the proposal.
>>>
>>> ==== Core Developers ====
>>> The core developers of Omid are all skilled software developers and
>>> research engineers at Yahoo Inc. and Hortonworks with years of
>>> experience in their fields. At this moment, developers are
>>> distributed across the U.S. and Israel. The aim is to incorporate more
>>> committers from different organizations and locations over time.
>>>
>>>
>>> The current set of developers includes experienced committers from
>>> the Apache HBase, Hive and Hadoop projects who have been working with us
>>> on the current codebase found on GitHub.
>>>
>>> Finally, some of the core developers are currently NOT affiliated with
>>> the ASF and would require new ICLAs to be filed.
>>>
>>>
>>> === Alignment ===
>>> Omid adds transactions to the already successful Apache HBase
>>> datastore project. We have collaborated with other developers inside
>>> and outside Yahoo who are involved in the Apache HBase community, so
>>> we have had reliable feedback from them.
>>>
>>> Although Omid brings value to HBase, the design of the current
>>> version provides a general transaction scheme that can potentially be
>>> adapted to other MVCC key-value datastores such as Apache Cassandra.
>>>
>>>
>>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>>> top of HBase that can potentially integrate Omid in order to provide
>>> the well-known concept of transactions to Phoenix-based applications.
>>>
>>>
>>> === Known Risks ===
>>> ==== Orphaned products ====
>>> Yahoo’s Research and Search organizations have been taking care of
>>> Omid development since the first prototype was created in 2011. Yahoo
>>> has a long history of participating in open-source projects, and has
>>> also been a long-time contributor to the Apache community. For example,
>>> Yahoo is an important contributor to many Apache projects in the
>>> Hadoop ecosystem, such as HBase, Pig, Storm and YARN, and has also
>>> open-sourced other well-known projects outside Hadoop, such as
>>> ZooKeeper and BookKeeper. So it is in Yahoo’s best interest to make
>>> Omid a successful open-source Apache product as well. If this happens, we
>>> are sure that a larger community will be formed around the project in
>>> a relatively short period of time, contributing to the diversification
>>> and stabilization of the base of committers.
>>>
>>>
>>> ==== Inexperience with Open Source ====
>>> This project has long-standing, experienced mentors and interested
>>> contributors from Apache HBase, Hive and Phoenix to help us move
>>> through the open source process. We are actively working with
>>> experienced Apache community members to improve our project and its
>>> testing.
>>>
>>> ==== Homogeneous Developers ====
>>> Omid has been supported by Yahoo since its inception in 2011. However,
>>> the current committers are employed by the different companies
>>> shown in the Affiliations section.
>>>
>>>
>>> ==== Reliance on Salaried Developers ====
>>>
>>> All the current developers are paid by their employers to contribute
>>> to this project. Yahoo developers will also continue maintaining the
>>> internal Omid repository at their company.
>>>
>>> Of course, other developers are welcome to contribute to this project
>>> after it is open sourced in Apache.
>>>
>>> ==== Relationships with Other Apache Products ====
>>>
>>> The current Omid incarnation serves transactional contexts to applications
>>> storing their data in HBase. However, Omid’s design could potentially
>>> be adapted to serve transactions on top of other MVCC-based key-value
>>> datastores in the Apache community, such as Cassandra.
>>>
>>>
>>> As Omid is a transactional framework, many other Apache projects such
>>> as Apache Spark, Apache Phoenix, Apache Storm and Apache Flink could
>>> potentially benefit from it to get transactional contexts. In
>>> particular, Apache Phoenix -a SQL layer on top of HBase- might use
>>> Omid as its transaction management component. Once we open source Omid
>>> as an Apache project, we expect to generate more interest in the
>>> surrounding communities.
>>>
>>>
>>> Very recently, a new incubator proposal for a similar project called
>>> Tephra has been submitted to the ASF. We think this is good for the
>>> Apache community, and we believe that there’s room for both proposals,
>>> as each design is based on different principles (e.g.
>>> Omid does not need to maintain the state of ongoing transactions in
>>> the server-side component) and because both Tephra and
>>> Omid have already gained some traction in the open-source community.
>>>
>>>
>>> With regard to the Apache projects that Omid uses, apart from HBase,
>>> Omid relies on the Apache ZooKeeper and Curator projects in order to
>>> coordinate the (re)connection of transaction managers (acting as
>>> clients) to the conflict resolution component for transactions (server
>>> side.) They’re also used in order to coordinate the master and backup
>>> replicas in high availability scenarios.
>>>
>>>
>>> ==== An Excessive Fascination with the Apache Brand ====
>>>
>>> We are applying to the Incubator process because we think that it is
>>> the logical next step for the Omid project after we open-sourced the
>>> code on GitHub some years ago. Yahoo has a long-standing history of
>>> contributing to Apache projects. The developers and contributors
>>> understand the implications of making it an Apache project, and
>>> strongly believe that the growing community can benefit from the
>>> Apache environment, ecosystem, and infrastructure.
>>>
>>>
>>> === Documentation ===
>>> Current documentation about the project is available in the wiki of
>>> Omid’s GitHub repository: https://github.com/yahoo/omid/wiki . It will
>>> be moved under https://omid.incubator.apache.org/docs if the project
>>> is accepted into the Apache Incubator.
>>>
>>> === Initial Source ===
>>> Initial source code is currently hosted on GitHub for general viewing
>>> and contribution:
>>>
>>> https://github.com/yahoo/omid.git
>>>
>>>
>>> Omid source code is written in Java (99%), with some shell
>>> scripts (1%) used to configure and trigger the execution of the main
>>> components.
>>>
>>>
>>> The code will be moved to Apache http://git.apache.org/ if accepted as
>>> an Incubator project.
>>>
>>> === Source and Intellectual Property Submission Plan ===
>>>
>>> The current license for the Omid code published on GitHub is Apache
>>> 2.0. If Omid fulfills and passes the conditions for being an Incubator
>>> project in the ASF, the source code will be transitioned via the
>>> Software Grant Agreement onto the ASF infrastructure and in turn made
>>> available under the Apache License, version 2.0.
>>>
>>> === External Dependencies ===
>>>
>>>
>>> The required external dependencies that are not Apache projects are
>>> all under the Apache License or other compatible licenses:
>>>
>>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>>>
>>> JDK 7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK License]
>>>
>>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>>>
>>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>>>
>>> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
>>>
>>> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
>>>
>>> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
>>>
>>> Google Protocol Buffers v2.5.0
>>> (https://developers.google.com/protocol-buffers/) [BSD License]
>>>
>>> Mockito v1.9.5 (http://mockito.org/) [MIT License]
>>>
>>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>>> [Apache 2.0]
>>>
>>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>>>
>>> C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>>>
>>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>>>
>>>
>>> === Cryptography ===
>>> The Omid project does not itself use cryptography. However, Apache HBase
>>> -the datastore on top of which Omid works in its current version- uses
>>> standard APIs and tools for SSH and SSL communication where necessary.
>>>
>>> === Required Resources ===
>>> We request that the following resources be created for the project to use:
>>>
>>> ==== Mailing lists ====
>>>
>>> omid-private (moderated subscriptions)
>>>
>>> omid-commits (commit notification)
>>> omid-dev (technical discussions)
>>>
>>> ==== Git repository ====
>>> https://github.com/apache/incubator-omid
>>>
>>> ==== Documentation ====
>>> https://omid.incubator.apache.org/docs/
>>>
>>> ==== JIRA instance ====
>>> https://issues.apache.org/jira/browse/omid
>>>
>>> === Initial Committers ===
>>>
>>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>
>>>
>>> * Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>
>>>
>>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>
>>>
>>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>
>>>
>>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>>>
>>>
>>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>>>
>>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>
>>>
>>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>>>
>>>
>>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>>>
>>>
>>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>>>
>>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>
>>>
>>> === Additional Interested Contributors ===
>>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>>>
>>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>>>
>>>
>>> === Affiliations ===
>>>
>>> * Edward Bortnikov, Yahoo Inc.
>>>
>>>
>>> * Daniel Dai, Hortonworks
>>>
>>>
>>> * Flavio P. Junqueira, Confluent
>>>
>>>
>>> * Igor Katkov, Yahoo Inc.
>>>
>>>
>>> * Ivan Kelly, Midokura
>>>
>>>
>>> * Francis C. Liu, Yahoo Inc.
>>>
>>>
>>> * Sameer Paranjpye, Arimo
>>>
>>> * Francisco Perez-Sorrosal, Yahoo Inc.
>>>
>>>
>>> * Ohad Shacham, Yahoo Inc.
>>>
>>>
>>> * Maysam Yabandeh, Dropbox Inc.
>>>
>>>
>>> === Sponsors ===
>>>
>>> ==== Champion ====
>>>
>>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>>
>>> ==== Nominated Mentors ====
>>>
>>> Alan Gates, Hortonworks (gates<AT>hortonworks<DOT>com)
>>>
>>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>>
>>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>>
>>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>>
>>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org)
>>>
>>>
>>> ==== Sponsoring Entity ====
>>> Apache Incubator PMC
>>>


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Flavio Junqueira <fp...@apache.org>.
I understand the concern, so let me try to offer some facts and see if we can make progress from there. 

Omid has been around for some time now, and its initial design appeared in a couple of research papers that I co-authored. The architecture is based on the idea of a centralized transaction status oracle that shares transaction status data with clients for scalability. The current Omid project evolved out of that initial work and is much improved over that first iteration, with the improvements focusing on scalability. It currently runs in production at scale at Yahoo!, and according to the proposal there is interest from other companies. The proposal also links to a series of blog posts about this experience.
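
For readers less familiar with this kind of design, here is a minimal, single-process Java sketch of the core check that a centralized transaction status oracle performs under Snapshot Isolation. The oracle hands out start timestamps. At commit time, it aborts a transaction if any row in its writeset was committed by another transaction after that transaction started (first-committer-wins). The class name `SimpleTso`, the unbounded in-memory map and the `-1` abort convention are assumptions made for illustration. This is not Omid's actual TSO code or API. A real oracle also has to bound its memory, persist commit decisions and survive failover.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, single-process sketch of a centralized transaction status
 * oracle doing writeset conflict detection for Snapshot Isolation.
 * Illustrative only: not Omid's actual TSO implementation or API.
 */
public class SimpleTso {
    // Logical clock shared by start and commit timestamps.
    private long clock = 0;
    // Row key -> commit timestamp of the last transaction that wrote it.
    private final Map<String, Long> lastCommit = new HashMap<>();

    /** Begin: return a start timestamp identifying the snapshot to read. */
    public synchronized long begin() {
        return ++clock;
    }

    /**
     * Commit: abort (return -1) if any row in the writeset was committed by
     * another transaction after startTs (first-committer-wins); otherwise
     * assign a commit timestamp and record it for every written row.
     */
    public synchronized long commit(long startTs, Set<String> writeSet) {
        for (String row : writeSet) {
            Long committedAt = lastCommit.get(row);
            if (committedAt != null && committedAt > startTs) {
                return -1; // write-write conflict with a concurrent transaction
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            lastCommit.put(row, commitTs);
        }
        return commitTs;
    }

    public static void main(String[] args) {
        SimpleTso tso = new SimpleTso();
        long t1 = tso.begin(); // 1
        long t2 = tso.begin(); // 2, concurrent with t1
        long c1 = tso.commit(t1, new HashSet<>(Arrays.asList("row-a")));
        long c2 = tso.commit(t2, new HashSet<>(Arrays.asList("row-a", "row-b")));
        System.out.println(c1 + " " + c2); // prints "3 -1": t2 conflicts on row-a
    }
}
```

In this sketch, the oracle only ever sees row identifiers and timestamps, never the data itself. The data stays in the datastore.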

Tephra has a very similar architecture. The description here says that it has a transaction server, which sounds like the TSO in the original Omid papers. I haven't spent enough time understanding the precise protocol they use, but I must say that the protocol is very important for correctness and scalability. Having two protocols with different properties could justify the presence of two projects, but they both promise snapshot isolation so I suspect they will be doing very similar things.
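
Because both projects promise Snapshot Isolation, both must enforce the same read rule on top of an MVCC store. A transaction that started at timestamp `startTs` sees, for each row, the version written by the most recent transaction that committed before `startTs`. Below is a minimal Java sketch of that rule. It assumes that cell versions are tagged with the writer's start timestamp and resolved through a separate commit table, which maps a writer's start timestamp to its commit timestamp. The class and method names, and this storage layout, are assumptions for illustration. They do not describe either Omid's or Tephra's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the Snapshot Isolation read rule over MVCC versions.
 * Versions carry the writer's start timestamp; a commit table maps writer
 * start timestamps to commit timestamps. Illustrative only.
 */
public class SnapshotRead {
    /** One MVCC cell version: the writer's start timestamp and the value. */
    static final class Version {
        final long writerStartTs;
        final String value;

        Version(long writerStartTs, String value) {
            this.writerStartTs = writerStartTs;
            this.value = value;
        }
    }

    /** Returns the value visible to a reader that started at readerStartTs, or null. */
    static String read(List<Version> versions, Map<Long, Long> commitTable, long readerStartTs) {
        String visible = null;
        long bestCommitTs = -1;
        for (Version v : versions) {
            Long commitTs = commitTable.get(v.writerStartTs);
            // Skip writes that never committed, or that committed after the snapshot.
            if (commitTs == null || commitTs >= readerStartTs) {
                continue;
            }
            // Keep the most recently committed version inside the snapshot.
            if (commitTs > bestCommitTs) {
                bestCommitTs = commitTs;
                visible = v.value;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Version> row = new ArrayList<>();
        row.add(new Version(1, "v1")); // committed at 2
        row.add(new Version(3, "v3")); // committed at 6
        row.add(new Version(5, "v5")); // never committed
        Map<Long, Long> commitTable = new HashMap<>();
        commitTable.put(1L, 2L);
        commitTable.put(3L, 6L);
        System.out.println(read(row, commitTable, 4)); // prints "v1"
        System.out.println(read(row, commitTable, 7)); // prints "v3"
    }
}
```

The uncommitted version `v5` is never returned. A reader with an older snapshot keeps seeing `v1` even after `v3` commits, which is what lets readers proceed without taking locks.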

Overall, as I see it, it would be very unfair to reject the Omid proposal on the basis that Tephra was incubated a couple of weeks ago. I'd much rather see how the two communities evolve, and have the mentors of the projects foster collaboration and possibly a merge of the two projects before graduation. Why not think of a general transaction status oracle with different protocol implementations, assuming it makes sense? I wouldn't like to see either of the two blocked upfront on the basis that they are in the same space, though. We could postpone this decision until graduation, when we'll have more knowledge about the projects and the growth of the two communities.

-Flavio

> On 18 Mar 2016, at 23:19, Henry Saputra <he...@gmail.com> wrote:
> 
> I know the Apache Incubator does not play favorites, but it is getting
> awkward that TWO transaction engines for HBase are coming to the
> Incubator at the same time.
> 
> As most people know, the other one is Tephra, which came to the
> Incubator just a few weeks ago.
> 
> As a member of the IPMC, I would like to see Omid provide a more detailed
> comparison of the differences the project brings, in terms of
> approach and possible integrations with other ASF projects.
> 
> If possible, I would prefer to see the Omid team work together with
> Tephra to build one solid transaction engine for HBase and, later,
> other NoSQL databases.
> 
> 
> - Henry
> 
> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I would like to propose Omid as an Apache Incubator project:
>> 
>> https://wiki.apache.org/incubator/OmidProposal
>> 
>> I've posted the text of the proposal below:
>> 
>> Thanks,
>> Daniel
>> 
>> = Omid Proposal =
>> 
>> === Abstract ===
>> 
>> Omid is a flexible, reliable, high-performance and scalable ACID
>> transactional framework that allows client applications to execute
>> transactions on top of MVCC key/value-based NoSQL datastores
>> (currently Apache HBase) providing Snapshot Isolation guarantees on
>> the accessed data.
>> 
>> 
>> === Proposal ===
>> 
>> Omid is a flexible open-source transactional framework that provides
>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>> datastores. In particular, the current codebase brings the concept of
>> transactions to the popular Apache HBase datastore. Omid offers great
>> performance and is highly available and scalable. Omid's current
>> version is able to scale to thousands of clients triggering concurrent
>> transactions on application data stored in HBase. Omid can scale
>> beyond 100K transactions per second on mid-range hardware while
>> incurring minimal impact on the speed of data access in the
>> datastore. We’re currently experimenting with a prototype version that
>> can improve the performance up to ~380K TPS.
>> 
>> 
>> Omid has been publicly available as an open-source project on GitHub
>> under Apache License Version 2.0 since 2011 [1]. During these years,
>> it has generated some interest in the open source community,
>> especially since the public presentation of the first version at
>> Hadoop Summit 2013 [2]. Currently the GitHub project has 241 stars and
>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>> Foundation with the aim to transfer the Omid project -including its
>> source code and documentation- to Apache in order to start building
>> a stable open source community around it.
>> 
>> 
>> [1] https://github.com/yahoo/omid
>> 
>> [2] Omid presentation at Hadoop Summit 2013:
>> 
>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>> 
>> 
>> === Background ===
>> 
>> An Omid prototype was first released as an open-source project back in
>> 2011. Inspired by Google Percolator [1], it offered a lock-free
>> approach to transactions in NoSQL datastores (See [2]). However,
>> during these years, the design of Omid has evolved significantly.
>> Whilst the current open-sourced version maintains many aspects of the
>> original implementation, it is the result of a major redesign of the
>> first prototype released in 2011.
>> 
>> 
>> Omid now has a more decentralized design that does not sacrifice the
>> consistency and performance of the original version. The current
>> design also enables Omid to scale to thousands of clients executing
>> transactions concurrently on application data stored in HBase.
>> Internally, Omid still uses a lock-free approach to support
>> multiple concurrent clients. Its design also relies on a centralized
>> conflict detection component, the TSO, which now efficiently resolves
>> writeset collisions among concurrent transactions without having to
>> piggyback commit information to the clients. Another important
>> benefit of Omid is that it doesn't require any modification
>> of the underlying key-value datastore, HBase in this case. Moreover,
>> the recently added high availability algorithm eliminates the
>> single point of failure represented by the TSO in deployments
>> requiring a higher degree of dependability. Last but not
>> least, the provided user API is very simple, mimicking transaction
>> managers in the relational world: begin, commit, rollback.
>> 
>> 
>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
>> management platform powering some of Yahoo’s next-generation search and
>> personalization products, uses Omid as a transaction manager in its
>> processing pipeline. Sieve essentially acts as a huge processing hub
>> between content feeds and serving systems. It provides an environment
>> for highly customizable, real-time, streamed information processing,
>> with typical discovery-to-service latencies of just a few seconds. In
>> terms of scale and availability, Omid’s new design was largely driven
>> by Sieve’s requirements.
>> 
>> 
>> At Yahoo, we are also making an effort to disseminate the current
>> status of the project through blog entries (see [3], [4] and [5]) and
>> submissions to technical and academic conferences such as ATC 2016,
>> Hadoop Summit 2016 and HBaseCon 2016. Last but not least, Omid also
>> appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
>> 
>> 
>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>> Distributed Transactions and Notifications. USENIX Symposium on
>> Operating Systems Design and Implementation, 2010
>> 
>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>> Omid: Lock-free transactional support for distributed data stores. In
>> Proc. of ICDE, 2013.
>> 
>> [3]
>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>> 
>> [4]
>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>> 
>> [5]
>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>> 
>> [6]
>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>> 
>> 
>> === Rationale ===
>> 
>> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>> transactions is very popular and is a standard feature of relational
>> databases. However, in the Big Data ecosystem, applications typically
>> use NoSQL datastores, which do not provide ACID transactions. Early
>> NoSQL datastores gave up transactional support for greater agility
>> and scalability, but the need for transactions soon emerged in Big
>> Data applications that access shared data; for example, transactions
>> are very important for modern, scalable systems that process content
>> incrementally.
>> 
>> 
>> NoSQL datastores -including HBase- don’t provide transactional
>> frameworks to coordinate access to the underlying data and preserve
>> consistency. By using Omid, Big Data applications that need
>> to bundle multiple read and write operations on HBase into logically
>> indivisible units of work can execute transactions with ACID
>> properties, just as they would use transactions in the relational
>> database world. Omid extends the HBase key-value access API with
>> transaction semantics. It can be used either directly or via
>> higher-level data management APIs. For example, Apache Phoenix
>> (SQL-on-top-of-HBase) might use Omid as its transaction management
>> component.
>> 
>> 
>> The following features make Omid an attractive choice for system
>> designers and other projects in the Apache community:
>> 
>> 
>> * Semantics. Omid implements Snapshot Isolation (SI), which is supported
>> by major SQL and NoSQL technologies (e.g. Google Percolator).
>> 
>> 
>> * Performance and Scalability. Omid provides a highly scalable,
>> lock-free implementation of SI. To the best of our knowledge, it is
>> also one of the few open source NoSQL transactional platforms that can
>> execute more than 100K transactions per second [1]. A new prototype
>> still in development can go even further, up to ~380K TPS.
>> 
>> 
>> * Reliability. Omid has a high-availability (HA) mode, in which the
>> core service performing writeset conflict resolution operates as a
>> primary-backup process pair with automatic failover. The HA support
>> adds zero overhead to normal operation.
>> 
>> 
>> * Adaptability. Omid’s current version provides transactions on data
>> stored in Apache HBase. However, Omid’s components are generic enough
>> to be adapted to any other key-value NoSQL datastore that supports
>> MVCC.
>> 
>> 
>> * Development. Omid provides a very simple interface that mimics
>> standard HBase APIs, making it developer friendly. Only minimal
>> extensions to the standard interfaces have been introduced to enable
>> transactions.
>> 
>> 
>> * Simplicity. Omid leverages the HBase infrastructure for managing its
>> own metadata. It entails no additional services apart from those
>> provided and used by HBase.
>> 
>> 
>> * Track Record. As we have mentioned, Omid is already in use by
>> very-large-scale production systems at Yahoo. Also, Hortonworks is
>> integrating Omid into an HBase-based metastore implementation for Hive.
>> 
>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>> 
>> 
>> === Current Status ===
>> The current Omid implementation is available both in Yahoo’s internal
>> GitHub repository, for internal use at Yahoo, and in Yahoo’s public
>> GitHub repository (https://github.com/yahoo/omid.git). Both
>> repositories are managed by Omid’s current developers at Yahoo.
>> 
>> As mentioned above, Yahoo is currently using Omid to provide
>> transactions in Sieve, a web-scale content management platform that
>> powers Yahoo’s next-generation search and personalization products.
>> 
>> 
>> ==== Meritocracy ====
>> The first version of Omid was originally created in 2011 by Maysam
>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>> 
>> 
>> During the years after its inception, Omid has matured to operate at
>> Web scale and has been used internally by strategic projects at Yahoo
>> such as Sieve. The current base of committers belongs to the Yahoo team
>> that took over the initial Omid prototype and rewrote it to meet the
>> high availability and scalability requirements of the Sieve project.
>> This base of committers has recently been joined by Hortonworks members
>> who helped adapt Omid to HBase 1.x versions.
>> 
>> 
>> With this initial committer base, we aim to form a larger community
>> that can collaborate on new ideas building on the current code base. This
>> new community will run the project following the "Apache Way"
>> (http://apache.org/foundation/governance/). Users and new contributors
>> will be welcomed and treated with respect. To grow the community, we
>> will encourage contributors to provide patches, review code, propose
>> new features and improvements, talk at conferences such as Hadoop Summit,
>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>> offered according to meritocracy.
>> 
>> ==== Community ====
>> 
>> The public Yahoo Omid repository on GitHub currently has 241 stars and
>> 93 forks, which indicates significant interest in the project in the
>> open-source community, at least compared with other similar projects
>> (see https://github.com/yahoo/omid.git).
>> 
>> 
>> Recently, Hortonworks contributors to the Apache Hive project who
>> are working on storing Hive metadata in HBase (Apache JIRA HIVE-9452)
>> expressed interest in using Omid. We started a fruitful collaboration
>> with them that resulted in Omid supporting HBase 1.x versions.
>> 
>> 
>> Salesforce is also interested in collaborating on a proof of
>> concept for integrating Omid as a pluggable transaction manager in
>> Apache Phoenix.
>> 
>> 
>> Yahoo, Hortonworks and Salesforce participants will constitute the
>> initial set of committers and mentors for the proposal.
>> 
>> ==== Core Developers ====
>> The core developers of Omid are all skilled software developers and
>> research engineers at Yahoo Inc. and Hortonworks with years of
>> experience in their fields. At this moment, developers are
>> distributed across the U.S. and Israel. The aim is to incorporate more
>> committers from different organizations and locations over time.
>> 
>> 
>> The current set of developers includes experienced committers from
>> the Apache HBase, Hive and Hadoop projects who have been working with us
>> on the current codebase found on GitHub.
>> 
>> Finally, some of the core developers are currently NOT affiliated with
>> the ASF and would require new ICLAs to be filed.
>> 
>> 
>> === Alignment ===
>> Omid adds transactions to the already successful Apache HBase
>> datastore project. We have collaborated with other developers inside
>> and outside Yahoo who are involved in the Apache HBase community, so
>> we have had reliable feedback from them.
>> 
>> Although Omid brings value to HBase, the design of the current
>> version provides a general transaction scheme that can potentially be
>> adapted to other MVCC key-value datastores such as Apache Cassandra.
>> 
>> 
>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>> top of HBase that can potentially integrate Omid in order to provide
>> the well-known concept of transactions to Phoenix-based applications.
>> 
>> 
>> === Known Risks ===
>> ==== Orphaned products ====
>> Yahoo’s Research and Search organizations have been taking care of
>> Omid development since the first prototype creation in 2011. Yahoo has
>> a long history participating in open-source projects, and has been
>> also a long time contributor to the Apache community. For example, in
>> Apache, Yahoo is an important contributor in many projects in the
>> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>> open-sourced other well-known projects outside Hadoop, such as
>> Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make
>> Omid also a successful open-source Apache product. If this happens, we
>> are sure that a larger community will be formed around the project in
>> a relatively short period of time, contributing to the diversification
>> and stabilization of the base of committers.
>> 
>> 
>> ==== Inexperience with Open Source ====
>> This project has long standing experienced mentors and interested
>> contributors from Apache HBase, Hive and Phoenix to help us moving
>> through the open source process. We are actively working with
>> experienced Apache community members to improve our project and
>> further testing.
>> 
>> ==== Homogeneous Developers ====
>> Omid has been supported by Yahoo since its inception in 2011. However,
>> all current committers are employed by their respective companies
>> shown in the Affiliations section.
>> 
>> 
>> ==== Reliance on Salaried Developers ====
>> 
>> All the current developers are paid by their employers to contribute
>> to this project. Yahoo developers will also continuing maintaining the
>> internal Omid repository at their company.
>> 
>> Of course, other developers are welcomed to contribute to this project
>> after it is open sourced in Apache.
>> 
>> ==== Relationships with Other Apache Product ====
>> 
>> Current Omid incarnation serves transactional contexts to applications
>> storing their data in HBase. However Omid design potentially allows to
>> be adapted to serve transactions on top of other MVCC-based key-value
>> datastores in Apache community such as Cassandra.
>> 
>> 
>> As a transactional framework, many other Apache projects such as
>> Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could
>> potentially benefit from Omid to get transactional contexts. In
>> particular, Apache Phoenix -a SQL layer on top of HBase- might use
>> Omid as its transaction management component. Once we open source Omid
>> as an Apache project, we expect to generate more interest in the
>> surrounded communities.
>> 
>> 
>> Very recently, a new incubator proposal for a similar project called
>> Tephra, has been submitted to the ASF. We think this is good for the
>> Apache community, and we believe that there’s room for both proposals
>> as the design of each of them is based on different principles (e.g.
>> Omid does not require to maintain the state of ongoing transactions on
>> the server-side component) and due to the fact that both -Tephra and
>> Omid- have also gained certain traction in the open-source community.
>> 
>> 
>> With regard to the Apache projects that Omid uses, apart from HBase,
>> Omid relies on Apache Zookeeper and Curator projects in order to
>> coordinate the (re)connection of transaction managers (acting as
>> clients) to the conflict resolution component for transactions (server
>> side.) They’re also used in order to coordinate the master and backup
>> replicas in high availability scenarios.
>> 
>> 
>> ==== An Excessive Fascination with the Apache Brand ====
>> 
>> We are applying to the Incubator process because we think that it is
>> the logical next step for the  Omid project after we open-sourced the
>> code in Github some years ago. Yahoo has a long-standing history of
>> contributing to Apache projects. The developers and contributors
>> understand the implications of making it an Apache project, and
>> strongly believe that the growing community can benefit from the
>> Apache environment, ecosystem, and infrastrastructure.
>> 
>> 
>> === Documentation ===
>> Current documentation about the project is available in the wiki of
>> Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will
>> be moved under https://omid.incubator.apache.org/docs if the project
>> is accepted as an Apache Incubator.
>> 
>> === Initial Source ===
>> Initial source code is currently hosted in Github for general viewing
>> and contribution:
>> 
>> https://github.com/yahoo/omid.git
>> 
>> 
>> Omid source code is written in Java code (99%) mixed with some shell
>> script (1%) in order to configure and trigger the execution of main
>> components.
>> 
>> 
>> The code will be moved to Apache http://git.apache.org/ if accepted as
>> an Incubator project.
>> 
>> === Source and Intellectual Property Submission Plan ===
>> 
>> The current Omid License for the code published in Github is Apache
>> 2.0. If Omid fulfills and passes the conditions for being an Incubator
>> project in the ASF, the source code will be transitioned via the
>> Software Grant Agreement onto the ASF infrastructure and in turn made
>> available under the Apache License, version 2.0.
>> 
>> === External Dependencies ===
>> 
>> 
>> The required external dependencies that are not Apache projects are
>> all Apache licenses or other compatible Licenses:
>> 
>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>> 
>> JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
>> 
>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>> 
>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>> 
>> Testng v6.8.8  (http://testng.org) [Apache 2.0]
>> 
>> SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
>> 
>> Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
>> 
>> Google Protocol Buffers v2.5.0
>> (https://developers.google.com/protocol-buffers/) [BSD License]
>> 
>> Mockito (http://mockito.org/) v1.9.5 [MIT License]
>> 
>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>> [Apache 2.0]
>> 
>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>> 
>> C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>> 
>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>> 
>> 
>> === Cryptography ===
>> Omid project does not use cryptography itself. However, Apache HBase
>> -the datastore on top of which Omid works in its current version- uses
>> standard APIs and tools for SSH and SSL communication where necessary.
>> 
>> === Required Resources ===
>> We request that following resources be created for the project to use:
>> 
>> ==== Mailing lists ====
>> 
>> omid-private (moderated subscriptions)
>> 
>> omid-commits (commit notification)
>> omid-dev (technical discussions)
>> 
>> ==== Git repository ====
>> https://github.com/apache/incubator-omid
>> 
>> ==== Documentation ====
>> https://omid.incubator.apache.org/docs/
>> 
>> ==== JIRA instance ====
>> https://issues.apache.org/jira/browse/omid
>> 
>> === Initial Committers ===
>> 
>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> 
>> 
>> * Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
>> 
>> 
>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> 
>> 
>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> 
>> 
>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>> 
>> 
>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>> 
>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> 
>> 
>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>> 
>> 
>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>> 
>> 
>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>> 
>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)
>> 
>> 
>> === Additional Interested Contributors ===
>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>> 
>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>> 
>> 
>> === Affiliations ===
>> 
>> * Edward Bortnikov, Yahoo Inc.
>> 
>> 
>> * Daniel Dai, Hortonworks
>> 
>> 
>> * Flavio P. Junqueira, Confluent
>> 
>> 
>> * Igor Katkov, Yahoo Inc.
>> 
>> 
>> * Ivan Kelly, Midokura
>> 
>> 
>> * Francis C. Liu, Yahoo Inc.
>> 
>> 
>> * Sameer Paranjpye, Arimo
>> 
>> * Francisco Perez-Sorrosal, Yahoo Inc.
>> 
>> 
>> * Ohad Shacham, Yahoo Inc.
>> 
>> 
>> * Maysam Yabandeh, Dropbox Inc.
>> 
>> 
>> === Sponsors ===
>> 
>> ==== Champion ====
>> 
>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>> 
>> ==== Nominated Mentors ====
>> 
>> Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
>> 
>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>> 
>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>> 
>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>> 
>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)
>> 
>> 
>> ==== Sponsoring Entity ====
>> Apache Incubator PMC
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.
I know the Apache Incubator does not play favorites, but it is getting awkward
that TWO transaction engines for HBase are coming to the Incubator at the same time.

As most people know, the other one is Tephra, which came to the Incubator
just a few weeks ago.

As a member of the IPMC, I would like to see Omid provide a more detailed
comparison of the differences the project brings, in terms of
approach and possible integrations with other ASF projects.

If possible, I would prefer to see the Omid team work together with Tephra to
build one solid transaction engine for HBase and, later, for other NoSQL
databases.


- Henry

On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:

> Hi,
>
> I would like to propose Omid as an Apache Incubator project:
>
> https://wiki.apache.org/incubator/OmidProposal

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Josh Elser <el...@apache.org>.
+1

Glad to see this hit the Incubator. Feel free to add me as a mentor if you'd
like another.

Daniel Dai wrote:
> Hi,
>
> I would like to propose Omid as an Apache Incubator project:
>
> https://wiki.apache.org/incubator/OmidProposal



Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Stack <st...@duboce.net>.
+1 (binding)
Good luck lads,
St.Ack

On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <da...@gmail.com> wrote:

> Hi,
>
> I would like to propose Omid as an Apache Incubator project:
>
> https://wiki.apache.org/incubator/OmidProposal
>
> I've posted posted the text of the proposal below:
>
> Thanks,
> Daniel
>
> = Omid Proposal =
>
> === Abstract ===
>
> Omid is a flexible, reliable, high performant and scalable ACID
> transactional framework that allows client applications to execute
> transactions on top of MVCC key/value-based NoSQL datastores
> (currently Apache HBase) providing Snapshot Isolation guarantees on
> the accessed data.
>
>
> === Proposal ===
>
> Omid is a flexible open-source transactional framework that provides
> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> datastores. In particular, the current codebase brings the concept of
> transactions to the popular Apache HBase datastore. Omid offers great
> performance, it is highly available, and scalable. Omid's current
> version is able to scale to thousands of clients triggering concurrent
> transactions on application data stored in HBase. Omid can scale
> beyond 100K transactions per second on mid-range hardware while
> incurring in a minimal impact on the speed of data access in the
> datastore. We’re currently experimenting with a prototype version that
> can improve the performance up to ~380K TPS.
>
>
> Omid has been publicly available as an open-source project in Github
> under Apache License Version 2.0 since 2011 [1]. During these years,
> it has generated certain interest in the open source community,
> especially since the public presentation of the first version in
> Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and
> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> Foundation with the aim to transfer the Omid project -including its
> source code and documentation- to Apache in order to start the build
> of a stable open source community around it.
>
>
> [1] https://github.com/yahoo/omid
>
> [2] Omid presentation at Hadoop Summit 2013:
>
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>
>
> === Background ===
>
> An Omid prototype was first released as an open-source project back in
> 2011. Inspired by Google Percolator [1], it offered a lock-free
> approach to transactions in NoSQL datastores (See [2]). However,
> during these years, the design of Omid has evolved significantly.
> Whilst the current open-sourced version maintains many aspects of the
> original implementation, it is the result of a major redesign of the
> first prototype released in 2011.
>
>
> Omid has now a more decentralized design that does not sacrifice the
> consistency and performance of the original version. The current
> design also enables Omid to scale to thousands of clients executing
> transactions concurrently on application data stored in HBase.
> Internally, Omid still utilizes a lock-free approach to support
> multiple concurrent clients. Its design also relies on a centralized
> conflict detection component, the TSO, which now resolves in an
> efficient manner writeset collisions among concurrent transactions
> without having to piggyback commit information to the clients. Another
> important benefit of Omid is that it doesn't require any modification
> of the underlying key-value datastore, HBase in this case. Moreover,
> the recently added high availability algorithm allows to eliminate the
> single point of failure represented by the TSO in those system
> deployments requiring a higher degree of dependability. Last but not
> least, the provided user API is very simple, mimicking transaction
> managers in the relational world: begin, commit, rollback.
>
>
> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content
> management platform powering some of next-generation search and
> personalization products is using Omid as a transaction manager in its
> processing pipeline. Sieve essentially acts as a huge processing hub
> between content feeds and serving systems. It provides an environment
> for highly customizable, real-time, streamed information processing,
> with typical discovery-to-service latencies of just a few seconds. In
> terms of scale and availability, Omid’s new design was largely driven
> by Sieve’s requirements.
>
>
> At Yahoo, we are also making an effort to disseminate the current
> status of the project through blog entries (See [3], [4] and [5]) and
> submissions to technical and academic conferences such as ATC 2016,
> Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also
> appeared in a TechCrunch article in the last quarter of 2015 (See [6])
>
>
> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
> Distributed Transactions and Notifications. USENIX Symposium on
> Operating Systems Design and Implementation, 2010
>
> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> Omid: Lock-free transactional support for distributed data stores. In
> Proc. of ICDE, 2013.
>
> [3]
> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
>
> [4]
> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
>
> [5]
> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>
> [6]
> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>
>
> === Rationale ===
>
> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> transactions is very popular and it is featured in relational
> databases. However, in the Big Data ecosystem, applications typically
> use NoSQL datastores, which do not provide ACID transactions. Such
> NoSQL datastores used to give up transactional support for greater
> agility and scalability. However, while early NoSQL data store
> implementations did not include transaction support, the need for
> transactions soon emerged in Big Data applications when accessing
> shared data; for  example, transactions are very important  for
> modern, scalable systems that process content incrementally.
>
>
> NoSQL datastores -including HBase- don’t provide transactional
> frameworks to coordinate the access to the underlying data for
> preserving consistency. By using Omid, Big Data applications that need
> to bundle multiple read and write operations on HBase into logically
> indivisible units of work can execute transactions with ACID
> properties, just as they would use transactions in the relational
> database world. Omid extends the HBase key-value access APl with
> transaction semantics. It can be exercised either directly, or via
> higher level data management API’s. For example, Apache Phoenix
> (SQL-on-top-of-HBase) might use Omid as its transaction management
> component.
>
>
> The following features make Omid an attractive choice for system
> designers and other projects in the Apache community:
>
>
> * Semantics. Omid implements Snapshot Isolation (SI,) supported by
> major SQL and NoSQL technologies (e.g. Google Percolator).
>
>
> * Performance and Scalability. Omid  provides a highly scalable,
> lock-free implementation of SI. To the best of our knowledge, it is
> also one of the few open source NoSQL transactional platforms that can
> execute more than 100K transactions per second [1]. A new prototype
> still in development can go even further, up to ~380K TPS.
>
>
> * Reliability.  Omid has a high-availability (HA) mode, in which the
> core service performing writeset conflict resolution operates as
> primary-backup process pair with automatic failover. The HA support
> has zero overhead on the mainstream operation.
>
>
> * Adaptability. Omid current version provides transactions on data
> stored in Apache HBase. However, Omid’s components are generic enough
> to be adapted to any other key-value NoSQL datasource that supports
> MVCC.
>
>
> * Development. Omid provides a very simple interface that mimics
> standard HBase APIs, making it developer friendly. Only minimal
> extensions to the standard interfaces have been introduced to enable
> transactions.
>
>
> * Simplicity. Omid leverages the HBase infrastructure for managing its
> own metadata. It entails no additional services apart from those
> provided and used by HBase.
>
>
> * Track Record. As we have mentioned, Omid is already in use by
> very-large-scale production systems at Yahoo. Also, Hortonworks is
> integrating Omid in a metastore implementation for Hive based on
> HBase.
>
> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>

Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by "Gangumalla, Uma" <um...@intel.com>.
Proposal looks very good and clear. This project should be a good addition.

+1 (non-binding)

Regards,
Uma

On 3/17/16, 1:17 PM, "Daniel Dai" <da...@gmail.com> wrote:

>Hi,
>
>I would like to propose Omid as an Apache Incubator project:
>
>https://wiki.apache.org/incubator/OmidProposal


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] [PROPOSAL] Omid for Apache Incubator

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1 (binding)

Regards
JB

On 03/17/2016 09:17 PM, Daniel Dai wrote:
> Hi,
>
> I would like to propose Omid as an Apache Incubator project:
>
> https://wiki.apache.org/incubator/OmidProposal

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
