Posted to server-dev@james.apache.org by "Robert Burrell Donkin (JIRA)" <ji...@apache.org> on 2011/04/06 20:29:05 UTC

[jira] [Commented] (MAILBOX-44) [gsoc2011] Design and implement a distributed mailbox using Hadoop

    [ https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016477#comment-13016477 ] 

Robert Burrell Donkin commented on MAILBOX-44:
----------------------------------------------

The Structure Of An Email
------------------------------------
Numerous RFCs describe the structure that emails should have. Although variations are encountered out in the wild, wild web, it's important to read these standards to begin to understand the data structure used by mail.
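To make the basic RFC 5322 layout concrete (header fields, one empty line, then the body, with long header fields "folded" onto continuation lines that start with whitespace), here is a minimal sketch in Java 16+. The class and method names are hypothetical, and it assumes CRLF line endings and at least one header field. It deliberately ignores encoded words, comments, obsolete syntax and malformed input, all of which a real parser such as Mime4J has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the RFC 5322 message layout; not Mime4J's API.
// Assumes CRLF line endings and at least one header field.
public class MessageStructure {

    record Parsed(List<Map.Entry<String, String>> headers, String body) {}

    static Parsed parse(String raw) {
        // The header section ends at the first empty line (CRLF CRLF).
        int sep = raw.indexOf("\r\n\r\n");
        String head = sep >= 0 ? raw.substring(0, sep) : raw;
        String body = sep >= 0 ? raw.substring(sep + 4) : "";

        // Unfolding (RFC 5322 section 2.2.3): drop every CRLF that is followed
        // by whitespace, so a folded field becomes a single line again.
        String unfolded = head.replaceAll("\r\n(?=[ \t])", "");

        List<Map.Entry<String, String>> headers = new ArrayList<>();
        for (String line : unfolded.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Field name before the first colon, field body after it.
                headers.add(Map.entry(line.substring(0, colon),
                                      line.substring(colon + 1).trim()));
            }
        }
        return new Parsed(headers, body);
    }

    public static void main(String[] args) {
        String raw = "From: alice@example.org\r\n"
                + "Subject: a folded\r\n subject line\r\n"
                + "\r\n"
                + "Hello, world.\r\n";
        Parsed p = parse(raw);
        p.headers().forEach(h -> System.out.println(h.getKey() + " => " + h.getValue()));
        System.out.println("body: " + p.body().trim());
    }
}
```

The folded "Subject" field in the example comes back as the single value "a folded subject line", which is the kind of normalisation a storage layer would typically want before indexing headers.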

Take a look at the Mime4J mail parser (http://james.apache.org/mime4j/index.html). Here's a selection of RFCs to skim:

http://tools.ietf.org/html/rfc5322
http://tools.ietf.org/html/rfc5335
(and, for historical reasons, also:
 http://tools.ietf.org/html/rfc2822
 http://tools.ietf.org/html/rfc822)

http://tools.ietf.org/html/rfc2045
http://tools.ietf.org/html/rfc2184
http://tools.ietf.org/html/rfc2231
http://tools.ietf.org/html/rfc2046
http://tools.ietf.org/html/rfc2646
http://tools.ietf.org/html/rfc3676
http://tools.ietf.org/html/rfc3798
http://tools.ietf.org/html/rfc5147
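As an illustration of the MIME structure in RFC 2045/2046, here is a minimal sketch that splits a multipart body into its body parts. Per RFC 2046 section 5.1.1, each part is introduced by a delimiter line "--" + boundary, the CRLF before a delimiter belongs to the delimiter rather than to the preceding part, and the close delimiter "--" + boundary + "--" ends the list, with any preamble and epilogue discarded. The class name is hypothetical, and the boundary is passed in directly: extracting it from the Content-Type parameters (RFC 2045, RFC 2231) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of RFC 2046 multipart splitting; not Mime4J's API.
// Each returned string is one body part: its own headers, an empty line, its body.
public class MultipartSplitter {

    static List<String> splitMultipart(String body, String boundary) {
        String delimiter = "\r\n--" + boundary;
        List<String> parts = new ArrayList<>();
        // Prepend CRLF so a delimiter on the very first line also matches.
        String text = "\r\n" + body;
        int pos = text.indexOf(delimiter);          // skips the preamble
        while (pos >= 0) {
            int afterDelim = pos + delimiter.length();
            if (text.startsWith("--", afterDelim)) {
                break;                              // close delimiter; epilogue ignored
            }
            // End of the delimiter line (tolerates trailing transport padding).
            int lineEnd = text.indexOf("\r\n", afterDelim);
            if (lineEnd < 0) {
                break;                              // malformed: unterminated delimiter line
            }
            int next = text.indexOf(delimiter, lineEnd);
            if (next < 0) {
                break;                              // malformed: no close delimiter
            }
            // An empty part shares its CRLF with the next delimiter.
            parts.add(next == lineEnd ? "" : text.substring(lineEnd + 2, next));
            pos = next;
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "This is the preamble.\r\n"
                + "--XYZ\r\n"
                + "Content-Type: text/plain\r\n"
                + "\r\n"
                + "part one\r\n"
                + "--XYZ\r\n"
                + "\r\n"
                + "part two\r\n"
                + "--XYZ--\r\n"
                + "epilogue\r\n";
        List<String> parts = splitMultipart(body, "XYZ");
        System.out.println(parts.size() + " parts");
        for (String p : parts) {
            System.out.println("[" + p.replace("\r\n", "\\r\\n") + "]");
        }
    }
}
```

Note that the second part in the example has no header fields, so it begins with the empty line that separates its (empty) header section from its body. Each part can then be parsed again with the same header/body rules, which is why a MIME message is naturally a tree.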

> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports maildir, SQL database (via JPA) and Java Content Repository (JCR) as technologies for mail storage. This flexibility is achieved thanks to an API design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of Hadoop HDFS. The James mailbox API will be used. A first step is to design how to interact with Hadoop (native API, Gora incubator at Apache,...) and deal with specific performance questions related to mail loading/parsing in a distributed system (use map/reduce or not, use existing local Lucene indexes for search,...). The second step is to implement the HDFS mailbox (the maildir mailbox is similar because it stores mails as files and can be an inspiration). A single James server will still be deployed because we don't have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 
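The idea described in the issue, protocol code talking to an abstract store while the backend varies, can be sketched roughly as below. Every name here (MessageStore, InMemoryMessageStore, append, get) is hypothetical and is NOT the James mailbox API; an in-memory map stands in for a maildir, JPA, JCR or HDFS backend. The per-mailbox UID counter lives inside a single JVM, which illustrates why, without distributed UID generation, only one server can allocate UIDs for a mailbox.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a storage abstraction; not the James mailbox API.
public class StorageSketch {

    /** What a protocol layer (e.g. IMAP) would depend on: no storage details. */
    interface MessageStore {
        long append(String mailbox, byte[] message);  // returns the new message's UID
        byte[] get(String mailbox, long uid);         // null if no such message
    }

    /** One possible backend; a file- or HDFS-backed store would implement the same interface. */
    static class InMemoryMessageStore implements MessageStore {
        private final Map<String, Map<Long, byte[]>> messages = new ConcurrentHashMap<>();
        // UIDs come from a counter held in this one process, so two servers
        // sharing the same storage could hand out the same UID.
        private final Map<String, AtomicLong> nextUid = new ConcurrentHashMap<>();

        @Override
        public long append(String mailbox, byte[] message) {
            long uid = nextUid.computeIfAbsent(mailbox, m -> new AtomicLong(1)).getAndIncrement();
            messages.computeIfAbsent(mailbox, m -> new ConcurrentHashMap<>())
                    .put(uid, message.clone());
            return uid;
        }

        @Override
        public byte[] get(String mailbox, long uid) {
            Map<Long, byte[]> box = messages.get(mailbox);
            byte[] msg = box == null ? null : box.get(uid);
            return msg == null ? null : msg.clone();
        }
    }

    public static void main(String[] args) {
        MessageStore store = new InMemoryMessageStore();
        long uid = store.append("INBOX", "Subject: hi\r\n\r\nbody\r\n".getBytes());
        System.out.println("stored UID " + uid + ": " + new String(store.get("INBOX", uid)));
    }
}
```

Swapping the backend means writing another MessageStore implementation; the UID counter is the piece that would need a distributed replacement before several servers could share one store.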

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org