Posted to server-dev@james.apache.org by "ioan eugen stan (JIRA)" <ji...@apache.org> on 2011/05/30 17:36:54 UTC
[jira] [Commented] (MAILBOX-72) Requirements for a distributed
mailbox implementation
[ https://issues.apache.org/jira/browse/MAILBOX-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041169#comment-13041169 ]
ioan eugen stan commented on MAILBOX-72:
----------------------------------------
This is a summary of a discussion available on the server-dev@james.apache.org mailing list.
For best results when implementing mailbox storage in a distributed environment (namely HBase/HDFS), the following entities, and the operations each must support, should be taken into consideration:
- mailbox (immutable: create/read/delete/query)
- message (immutable: create/read/delete/query)
- message flags (create/read/update/delete/query)
- subscriptions (create/read/update/delete/query)
Important things regarding HBase:
- cells are versioned
- rows are sorted by row key - very important
- all columns of a column family are physically stored together, so they should share the same access pattern (read-only, or read/write)
- all column families must be created with the table
- columns may be added on the fly to column families.
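To make the first two points concrete, here is a minimal in-memory sketch of how sorted row keys and versioned cells behave. It is a toy model built on java.util.TreeMap, not the HBase client API; the class name SortedTableSketch, its methods, and the sample row keys and column names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of an HBase-style table (hypothetical, not the HBase API).
// Layout: row key -> "family:qualifier" -> timestamp (newest first) -> value.
class SortedTableSketch {
    // Rows are kept sorted by key. String order matches HBase's byte order
    // only for plain ASCII keys, which is all this sketch uses.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Writing never overwrites: each put adds a new version of the cell.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column,
                    c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if it does not exist.
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue();
    }

    // Because rows are sorted, all keys sharing a prefix are contiguous and
    // can be read with one range scan starting at the prefix.
    List<String> scanPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // left the contiguous block of matching keys
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedTableSketch table = new SortedTableSketch();
        table.put("org.apache@bob", "flags:seen", 1L, "false");
        table.put("org.apache@bob", "flags:seen", 2L, "true");
        table.put("org.apache@alice", "flags:seen", 1L, "false");
        table.put("com.example@carol", "flags:seen", 1L, "false");
        System.out.println(table.get("org.apache@bob", "flags:seen"));
        System.out.println(table.scanPrefix("org.apache@"));
    }
}
```

The point of the sketch is that a good row key turns the most common query (for example, everything belonging to one domain or user) into a single contiguous scan.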
Some useful tips for choosing keys and column names:
- you can use a reversed domain name to keep related rows in a proper sorted order (like org.apache@username)
- you can use a reverse-order timestamp (Long.MAX_VALUE - epoch) to keep the newest records first (get the latest emails first).
- if the key is an integer value, use its binary form instead of its string representation.
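The three tips above can be combined into a single row key. The following is a rough sketch only: the class RowKeySketch, its method names, the 0x00 separator byte and the overall key layout are assumptions made for illustration, not a schema agreed on in the discussion. It uses only the JDK (no HBase dependency) and assumes non-negative epoch values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row key helpers illustrating the tips; not part of James or HBase.
class RowKeySketch {
    // "username@apache.org" -> "org.apache@username", so that all users of
    // a domain sort next to each other.
    static String reverseDomainUser(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("not an address: " + address);
        }
        String user = address.substring(0, at);
        String[] labels = address.substring(at + 1).split("\\.");
        StringBuilder key = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            key.append(labels[i]);
            if (i > 0) {
                key.append('.');
            }
        }
        return key.append('@').append(user).toString();
    }

    // Newer timestamps map to smaller values, so they sort first.
    static long reverseTimestamp(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    // Assumed layout: UTF-8 prefix, a 0x00 separator, then the reverse
    // timestamp as 8 big-endian bytes (binary, not its decimal string).
    static byte[] messageRowKey(String address, long epochMillis) {
        byte[] prefix = reverseDomainUser(address).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + 1 + Long.BYTES)
                .put(prefix)
                .put((byte) 0)
                .putLong(reverseTimestamp(epochMillis))
                .array();
    }

    // Unsigned lexicographic byte comparison, the order in which a store
    // with byte-sorted row keys would see these keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = messageRowKey("username@apache.org", 1_000L);
        byte[] newer = messageRowKey("username@apache.org", 2_000L);
        System.out.println(reverseDomainUser("username@apache.org"));
        System.out.println("newer sorts first: " + (compareKeys(newer, older) < 0));
    }
}
```

The fixed-width binary encoding matters because decimal strings do not sort numerically ("10" sorts before "9"), whereas 8 big-endian bytes of a non-negative long sort in the same order as the number itself.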
A paper detailing a sample data schema for Cassandra is available in [1].
Some reading about big-data column stores:
[1] http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
[2] Hadoop: The Definitive Guide, second edition, the HBase chapter.
[3] http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf
[4] http://en.wikipedia.org/wiki/Column-oriented_DBMS
> Requirements for a distributed mailbox implementation
> -----------------------------------------------------
>
> Key: MAILBOX-72
> URL: https://issues.apache.org/jira/browse/MAILBOX-72
> Project: James Mailbox
> Issue Type: New Feature
> Reporter: Eric Charles
> Assignee: Norman Maurer
>
> This JIRA will collect some generic technical requirements regarding a distributed mailbox implementation, whatever the implementation technology is.
> The implementor is responsible for enforcing those requirements.