Posted to server-dev@james.apache.org by "Ioan Eugen Stan (Commented) (JIRA)" <ji...@apache.org> on 2012/01/27 12:09:40 UTC

[jira] [Commented] (MAILBOX-103) [gsoc2011] Design and implement Distributed UID generation

    [ https://issues.apache.org/jira/browse/MAILBOX-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194605#comment-13194605 ] 

Ioan Eugen Stan commented on MAILBOX-103:
-----------------------------------------

We can use ZooKeeper to implement this. Full thread: http://mail-archives.apache.org/mod_mbox/zookeeper-user/201201.mbox/%3CCAFvdMiCeMRxJaRg56zAFMRQSMB_oxRMzAYJ7e%3DJOQVf94Wscdg%40mail.gmail.com%3E

Use plain ZooKeeper and rely on the znode version for sequence generation, for both UIDs and ModSeq values.
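A minimal sketch of the znode-version idea, assuming one znode per mailbox and per sequence (the paths `/mailboxes/<id>/uid` and `/mailboxes/<id>/modseq` are hypothetical). With the real ZooKeeper Java client, each call would be `zk.setData(path, new byte[0], -1)`, and the returned `Stat.getVersion()` would be the new, unique value: ZooKeeper increments a znode's data version by one on every successful `setData`, and an expected version of `-1` makes the write unconditional. So that the example runs without an ensemble, `SimulatedZnode` is a hypothetical in-memory stand-in that mimics only those two rules.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory stand-in for a ZooKeeper znode's data version.
 * Mimics two ZooKeeper rules: every successful setData() bumps the version
 * by one, and an expected version of -1 matches any current version.
 */
class SimulatedZnode {
    private int version = 0; // a freshly created znode has data version 0

    synchronized int setData(byte[] data, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            // ZooKeeper would fail the write with a BadVersion error here
            throw new IllegalStateException("BadVersion: expected " + expectedVersion + ", was " + version);
        }
        version++;
        return version; // corresponds to Stat.getVersion() after the write
    }
}

/** Hypothetical sequencer handing out UIDs and ModSeqs from per-mailbox znode versions. */
class ZnodeVersionSequencer {
    private final ConcurrentHashMap<String, SimulatedZnode> znodes = new ConcurrentHashMap<>();

    private long next(String path) {
        SimulatedZnode node = znodes.computeIfAbsent(path, p -> new SimulatedZnode());
        // Real client equivalent (assumption): zk.setData(path, new byte[0], -1).getVersion()
        return node.setData(new byte[0], -1);
    }

    long nextUid(String mailboxId) {
        return next("/mailboxes/" + mailboxId + "/uid");
    }

    long nextModSeq(String mailboxId) {
        return next("/mailboxes/" + mailboxId + "/modseq");
    }
}
```

Because the first write moves the version from 0 to 1, the first UID is 1, which suits IMAP's non-zero UIDs. The version is a signed 32-bit `int`, though, so one znode yields at most 2^31 - 1 values, fewer than the full 32-bit UID range and far fewer than CONDSTORE allows for ModSeq; a real design would have to handle exhausting it.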

This should scale well to millions of mailboxes with a single ZooKeeper
ensemble [1][2]. Beyond that, we can use multiple ZooKeeper ensembles,
each managing a shard of the mailboxes.

The first thing that comes to mind is the way Debian stores packages
[3]: the first letter of a package name is used as a directory, so all
packages whose names start with the same letter are grouped into a
single directory.
In the same way, one ensemble could handle all mailboxes whose
identifiers start with 0-4 and another those that start with 5-9.
Assuming mailbox identifiers are generated uniformly, this splits the
load in half and gives us horizontal scalability.
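The Debian-style split can be sketched as a small router. This assumes mailbox identifiers are decimal strings with roughly uniform leading digits; `PrefixEnsembleRouter` and the ensemble connect strings are hypothetical. Leading digits are divided into contiguous ranges, so with two ensembles 0-4 goes to the first and 5-9 to the second.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical router mapping a mailbox to a ZooKeeper ensemble by the first
 * digit of its id, in the spirit of Debian's pool/<letter>/ layout.
 * Assumes ids are decimal strings with roughly uniform leading digits.
 */
class PrefixEnsembleRouter {
    // One ZooKeeper connect string per ensemble (values are placeholders).
    private final List<String> ensembles;

    PrefixEnsembleRouter(List<String> ensembles) {
        if (ensembles.isEmpty() || ensembles.size() > 10) {
            // Only ten leading digits exist, so more than ten ensembles could not all get a shard.
            throw new IllegalArgumentException("need between 1 and 10 ensembles");
        }
        this.ensembles = new ArrayList<>(ensembles);
    }

    String ensembleFor(String mailboxId) {
        if (mailboxId.isEmpty()) {
            throw new IllegalArgumentException("empty mailbox id");
        }
        int digit = Character.digit(mailboxId.charAt(0), 10);
        if (digit < 0) {
            throw new IllegalArgumentException("mailbox id must start with a decimal digit: " + mailboxId);
        }
        // Contiguous ranges: digit * n / 10 maps 0-4 -> 0 and 5-9 -> 1 when n = 2.
        int index = digit * ensembles.size() / 10;
        return ensembles.get(index);
    }
}
```

For example, `new PrefixEnsembleRouter(Arrays.asList("zk-a:2181", "zk-b:2181")).ensembleFor("5123")` returns `"zk-b:2181"`. Routing by a leading digit keeps the mapping stateless, but adding an ensemble changes the ranges each one owns, so the counters of affected mailboxes would have to be migrated. Sequentially assigned numeric ids generally do not have uniform leading digits, so hashing the id first would make the uniformity assumption easier to meet.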

[1] http://zookeeper.apache.org/doc/current/zookeeperOver.html#fg_zkPerfReliability
[2] http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
[3] ftp://ftp.be.debian.org/debian/pool/main

                
> [gsoc2011] Design and implement Distributed UID generation
> ----------------------------------------------------------
>
>                 Key: MAILBOX-103
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-103
>             Project: James Mailbox
>          Issue Type: New Feature
>          Components: hbase
>    Affects Versions: 0.4
>            Reporter: Eric Charles
>             Fix For: 0.4
>
>
> Context: IMAP4rev1 (RFC 3501) requires that every message be identified by a stable 32-bit Unique Identifier (UID) assigned in incremental sequence. This is currently achieved in the James IMAP subproject (http://james.apache.org/imap) with a UidProvider interface implemented in memory. This implementation does not allow the solution to work in a distributed setting.
> Task: A DistributedUidProvider must be designed and implemented. The design can rely on a distributed memory cache such as Hazelcast, or on any other solution (Hadoop, HBase, Cassandra, ...).
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.

        
