Posted to server-dev@james.apache.org by "ioan eugen stan (JIRA)" <ji...@apache.org> on 2011/05/30 17:36:54 UTC

[jira] [Commented] (MAILBOX-72) Requirements for a distributed mailbox implementation

    [ https://issues.apache.org/jira/browse/MAILBOX-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041169#comment-13041169 ] 

ioan eugen stan commented on MAILBOX-72:
----------------------------------------

This is a summary of a discussion available on the server-dev@james.apache.org mailing list.

For best results when implementing mailbox storage over a distributed environment (namely HBase/HDFS), the following entities and the operations they need to support must be taken into consideration:

- mailbox (immutable: create/read/delete/query)
- message (immutable: create/read/delete/query)

- message flags (create/read/update/delete/query)
- subscriptions (create/read/update/delete/query)
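The split above between immutable and mutable entities could be expressed roughly as follows. This is only a sketch to make the operation sets concrete: the names ImmutableEntityStore, MutableEntityStore and InMemoryStore are hypothetical and are not taken from the James Mailbox API, and the in-memory TreeMap merely stands in for a real HBase table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Operations for immutable entities (mailbox, message): there is no update. */
interface ImmutableEntityStore<K, V> {
    void create(K key, V value);
    Optional<V> read(K key);
    void delete(K key);
    List<V> query(Predicate<V> filter);
}

/** Mutable entities (message flags, subscriptions) additionally allow update. */
interface MutableEntityStore<K, V> extends ImmutableEntityStore<K, V> {
    void update(K key, V value);
}

/** Hypothetical in-memory stand-in for a table; keys are kept sorted. */
final class InMemoryStore<K extends Comparable<K>, V> implements MutableEntityStore<K, V> {
    private final Map<K, V> rows = new TreeMap<>();

    public void create(K key, V value) {
        // Reject a second create for the same key instead of silently overwriting.
        if (rows.putIfAbsent(key, value) != null) {
            throw new IllegalStateException("already exists: " + key);
        }
    }

    public Optional<V> read(K key) {
        return Optional.ofNullable(rows.get(key));
    }

    public void delete(K key) {
        rows.remove(key);
    }

    public List<V> query(Predicate<V> filter) {
        List<V> out = new ArrayList<>();
        for (V value : rows.values()) {
            if (filter.test(value)) {
                out.add(value);
            }
        }
        return out;
    }

    public void update(K key, V value) {
        // Only existing rows can be updated.
        if (rows.replace(key, value) == null) {
            throw new IllegalStateException("missing: " + key);
        }
    }
}
```

Code that handles mailboxes or messages would be handed only an ImmutableEntityStore, so an update is not even expressible there, while flags and subscriptions would use the MutableEntityStore view.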

Important things regarding HBase:

- cells are versioned
- rows are sorted by row key - very important
- the columns of a column family are physically stored together, so they should share the same access pattern (just read, or read/write)
- column families should be declared when the table is created (adding one later means altering the table schema)
- columns may be added on the fly to column families.
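To illustrate why the sort order of row keys matters so much, here is a small model of a sorted table. It does not use the HBase client API at all: a java.util.TreeMap stands in for the table, and the class name SortedRowsSketch and its methods are made up for this example. The point is that all rows sharing a key prefix sit next to each other, so they can be read with one contiguous scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a table whose rows are kept sorted by row key. */
final class SortedRowsSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Returns the keys of all rows that start with the prefix, in sorted order. */
    List<String> scanPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        // Start at the first key >= prefix; matching keys are contiguous from there.
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // past the end of the prefix range, stop scanning
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedRowsSketch table = new SortedRowsSketch();
        table.put("org.apache@bob", "inbox");
        table.put("com.example@carol", "inbox");
        table.put("org.apache@alice", "inbox");
        // Prints only the org.apache rows, alice before bob.
        System.out.println(table.scanPrefix("org.apache@"));
    }
}
```

With a key layout like this, listing everything that belongs to one domain or one user is a short range scan rather than a full table scan.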

Some useful tips for choosing keys and column names:

- you can use a reverse domain name to keep things in a proper sorted order (like org.apache@username)
- you can use a reverse-order timestamp (Long.MAX_VALUE - epoch) to keep the newest records first (get the latest emails first)
- use the binary representation instead of the string representation if the key is an integer value.
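The three tips above can be combined into one row key. The following is a minimal sketch that uses only the Java standard library. The class RowKeys, its method names, and the choice of a single zero byte as separator are assumptions made for illustration, not an agreed key format for the James mailbox.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeys {
    private RowKeys() {}

    /** Turns "username@apache.org" into "org.apache@username". */
    static String reverseDomainUser(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("not an address: " + address);
        }
        String user = address.substring(0, at);
        String[] labels = address.substring(at + 1).split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.append('@').append(user).toString();
    }

    /** Newer timestamps give smaller values, so they sort first. */
    static long reverseTimestamp(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    /** Key layout (an assumption): reversed address, 0x00 separator, 8-byte big-endian reverse timestamp. */
    static byte[] messageKey(String address, long epochMillis) {
        byte[] prefix = reverseDomainUser(address).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + 1 + Long.BYTES)
                .put(prefix)
                .put((byte) 0)
                .putLong(reverseTimestamp(epochMillis)) // binary, not a decimal string
                .array();
    }

    public static void main(String[] args) {
        System.out.println(reverseDomainUser("username@apache.org"));
        System.out.println(messageKey("username@apache.org", System.currentTimeMillis()).length);
    }
}
```

The long is written as 8 big-endian bytes because, for non-negative values, byte-wise comparison of that form matches numeric order. A decimal string would need zero padding to sort correctly and would take more space.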

A paper detailing a sample data schema for Cassandra is available in [1]. 

Some reading about big data column stores:

[1] http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
[2] Hadoop: The Definitive Guide, second edition, the HBase chapter.
[3] http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf
[4] http://en.wikipedia.org/wiki/Column-oriented_DBMS


> Requirements for a distributed mailbox implementation
> -----------------------------------------------------
>
>                 Key: MAILBOX-72
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-72
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>
> This JIRA will collect some generic technical requirements regarding a distributed mailbox implementation, whatever the implementation technology is.
> The implementer is responsible for enforcing those requirements.

