Posted to server-dev@james.apache.org by Manish Rai Jain <ma...@gmail.com> on 2005/06/13 04:04:07 UTC

James-Imap proposal [Jason Webb]

Hello Sir,
I am a student at Nanyang Technological University, Singapore. I have
created a proposal for this project. Could you please review this proposal
for the SEARCH command of IMAP (RFC 3501)?
***********************
SEARCH command implementation for James-imap server.
According to RFC 3501 IMAPv4

Implementation Schedule (assume 1 week less without a db scheme)
Week 1 and 2: 
1. Installation of the James-IMAP server and familiarization. 
2. Familiarization with RFC 3501 (IMAPv4) and RFC 2822 (Internet Message Format).
3. Source code familiarization.
Week 3: 
4. Implementation of recognition functions for the 'search keys'
[RFC 3501 'SEARCH command' section]
5. Implementation of QUERY creation functions (for MySQL db) using
these search keys.
Week 4 and 5: 
6. Implementation of an indexing thread (DAEMON specified below) and debugging. 
Week 6:
7. Integration and full module testing and debugging. 
Week 7 onwards:
8. The remaining time can be used to debug and polish the SEARCH command
implementation as well as other implemented IMAP commands (as by that
time, the student will have a good understanding of the IMAP server
code). The schedule leaves plenty of time for improving the
JAMES-IMAP server implementation.

Procedure (without a database), slow:
a. Recognise the command's 'search keywords'.
b. Parse the data associated with the keywords, and store the 'keyword -
data' pairs.
[USE Database]
c. Do a priority-based search. 
WHAT and WHY?
 RFC 3501 specifies that 
	"When multiple keys are specified, the result is the intersection
      (AND function) of all the messages that match those keys."
Set priorities for the KEYWORDS according to how cheap they are to
process and compare. This means that time-consuming text-based
searches will be given lower priority, so that they run only on the
messages that have already matched the cheaper keys.
d. Store each result's Message Sequence Number and return. 
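
A minimal sketch of the priority-based search in steps a to d, in Python for brevity. The SearchKey class, the cost ranks and the dict-based message layout are assumptions made for illustration; they are not part of James or of RFC 3501.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A message is modelled as a plain dict; the field names are assumptions.
Message = Dict[str, object]


@dataclass
class SearchKey:
    name: str                            # e.g. "UNSEEN", "FROM", "BODY"
    cost: int                            # hypothetical cost rank, lower runs first
    matches: Callable[[Message], bool]   # predicate applied to one message


def search(mailbox: List[Message], keys: List[SearchKey]) -> List[int]:
    """Return the 1-based message sequence numbers that match ALL keys."""
    ordered = sorted(keys, key=lambda k: k.cost)
    # Start with every message as a candidate, then narrow key by key.
    candidates = list(range(1, len(mailbox) + 1))
    for key in ordered:
        candidates = [msn for msn in candidates if key.matches(mailbox[msn - 1])]
        if not candidates:
            break  # the intersection is already empty, skip the costly keys
    return candidates


mailbox = [
    {"seen": True, "from": "alice@example.org", "body": "lunch on friday?"},
    {"seen": False, "from": "bob@example.org", "body": "imap search proposal"},
    {"seen": False, "from": "alice@example.org", "body": "search keys in RFC 3501"},
]
keys = [
    SearchKey("BODY", 10, lambda m: "search" in m["body"]),
    SearchKey("UNSEEN", 1, lambda m: not m["seen"]),
    SearchKey("FROM", 5, lambda m: "alice" in m["from"]),
]
print(search(mailbox, keys))  # prints [3]
```

The expensive BODY predicate is evaluated only for the one message that survived the UNSEEN and FROM checks.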

Procedure (with a database, e.g. MySQL), fast:
a. Create queries from the command (ANDing if multiple keys).
b. Execute the queries, get the Message Sequence Numbers and return them.
SEARCHING using db: full-text search using MySQL db (or another efficient db)
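
The query creation step could look roughly like the sketch below. The table name, the column names (msn, sender, subject, body) and the key-to-fragment mapping are all hypothetical. MATCH ... AGAINST is MySQL's full-text syntax; it matches on words, so it only approximates the substring matching that RFC 3501 defines for BODY.

```python
from typing import List, Tuple

# Hypothetical mapping from IMAP search keys to parameterised SQL fragments.
FRAGMENTS = {
    "FROM": "sender LIKE %s",
    "SUBJECT": "subject LIKE %s",
    "BODY": "MATCH(body) AGAINST (%s)",  # MySQL full-text search
}


def build_query(table: str, keys: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Build one SELECT whose WHERE clause ANDs all search keys together."""
    clauses: List[str] = []
    params: List[str] = []
    for name, value in keys:
        fragment = FRAGMENTS[name]
        clauses.append(fragment)
        # LIKE needs explicit wildcards for a substring match.
        params.append(f"%{value}%" if "LIKE" in fragment else value)
    where = " AND ".join(clauses) if clauses else "1=1"
    # The table name must come from trusted code; only values are parameters.
    return f"SELECT msn FROM {table} WHERE {where} ORDER BY msn", params


sql, params = build_query("folder_inbox", [("FROM", "alice"), ("BODY", "proposal")])
print(sql)
print(params)
```

The values are passed separately as parameters (the %s placeholder style used by the common Python MySQL drivers) rather than being pasted into the SQL string.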
INDEXING HOW? When a mail comes in or a mail is deleted, moved, etc.,
put the MSN (Message Sequence Number) along with the action (done) in a
LOGGER and 'REQUEST action' from an 'indexing daemon'. This
DAEMON (thread) will determine whether the mailbox requesting the action
is open (or active). If yes, it will give the action a unique Action
Sequence Number and put it in an ACTIVE queue. This queue will be
serviced as soon as possible (prioritized). Otherwise, the daemon will
put the action (giving it a unique Action Sequence Number) at the end of
a WAITING queue and continue servicing the queues. When it reaches this
action id, it will consult the LOGGER and index the mail (or remove its
index entry). Indexing is done on a per-folder basis, i.e. one (MySQL)
table for each folder (note that each subfolder is considered different
from its parent folder and from the other subfolders). In this way,
several users who share a folder can be given search results without
indexing the mails separately for each user, and access to the tables
can be restricted (as with folders).
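
A single-threaded sketch of the two-queue scheme described above. The class and method names are assumptions, and a real daemon thread would also need locking (or thread-safe queues).

```python
from collections import deque
from dataclasses import dataclass
from itertools import count
from typing import Deque, List, Optional, Set


@dataclass
class Action:
    seq: int      # unique Action Sequence Number
    folder: str
    msn: int      # Message Sequence Number the action refers to
    kind: str     # "add" or "remove"


class IndexingDaemon:
    def __init__(self, open_folders: Set[str]) -> None:
        self.open_folders = open_folders
        self.active: Deque[Action] = deque()
        self.waiting: Deque[Action] = deque()
        self.log: List[Action] = []   # stands in for the LOGGER
        self._seq = count(1)

    def request(self, folder: str, msn: int, kind: str) -> Action:
        """Log the action and queue it by whether its folder is open."""
        action = Action(next(self._seq), folder, msn, kind)
        self.log.append(action)
        queue = self.active if folder in self.open_folders else self.waiting
        queue.append(action)
        return action

    def next_action(self) -> Optional[Action]:
        """Serve the ACTIVE queue first; fall back to WAITING when it is empty."""
        if self.active:
            return self.active.popleft()
        if self.waiting:
            return self.waiting.popleft()
        return None


daemon = IndexingDaemon(open_folders={"INBOX"})
daemon.request("Archive", 7, "add")     # folder not open, goes to WAITING
daemon.request("INBOX", 3, "remove")    # folder open, goes to ACTIVE
daemon.request("INBOX", 4, "add")

order = []
action = daemon.next_action()
while action is not None:
    order.append(action.seq)
    action = daemon.next_action()
print(order)  # prints [2, 3, 1]
```

Actions for the open INBOX are served before the earlier request for the closed Archive folder.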

Optional: Separate plugins can be implemented to extract text/plain
data from attached documents such as .pdf, .doc, etc. This data can
then be indexed up to a certain size (e.g. 100 KB). The plugin will
determine what gets indexed using string-matching patterns and other
algorithms. This is similar to the Google Desktop Search facility.

WHY MySQL?
We choose a relational database because a schema can better represent
the various parts of a mail. It has also been tested and found to
perform better than a non-relational db (like BerkeleyDB) as well as
servers not using a db. [Ref. "An Analysis of Database-Driven Mail
Servers" by Nick Elprin and Bryan Parno, Harvard College]

What will be indexed?
Persistent data [body, header, date, etc.]. Flags may optionally be
stored in the db, if that is found to be optimal. Otherwise, the flags
will be checked (from the mailbox itself) against the list of MSNs
retrieved from the db (for searches with multiple keywords).
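
For the case where flags are not stored in the db, the check against the mailbox could be sketched as follows. The function name and the dict that maps each MSN to its set of flags are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def filter_by_flags(
    msns: Iterable[int],
    mailbox_flags: Dict[int, Set[str]],
    required: Set[str],
    forbidden: FrozenSet[str] = frozenset(),
) -> List[int]:
    """Keep only the MSNs whose mailbox flags satisfy the flag search keys."""
    kept = []
    for msn in msns:
        flags = mailbox_flags.get(msn, set())
        # All required flags must be set and no forbidden flag may be set.
        if required <= flags and not (forbidden & flags):
            kept.append(msn)
    return kept


# MSNs returned by the db query for the text-based keys (made-up values).
from_db = [2, 5, 9]
flags = {2: {"\\Seen"}, 5: set(), 9: {"\\Seen", "\\Flagged"}}
print(filter_by_flags(from_db, flags, required={"\\Seen"}))  # prints [2, 9]
```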
****************
I am very much interested in the project and it would be great if you
could provide me with some ideas/suggestions on this proposal. A fast
reply would surely help.
Thanks
Regards
Manish Rai Jain

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


RE: James-Imap proposal [Jason Webb]

Posted by Jason Webb <jw...@inovem.com>.
Be aware that the James IMAP server needs to support both db: & file:
repositories and that most people use the file repositories as their primary
store.
However that doesn't stop you providing optimizations if a DB repository is
used.

Good plan so far, but note that you will have to do the file
repository work first. This is the reason I mentioned Lucene, but this can be
argued about later as I'm not really hung up on it. Once this is done
we can look at the interfaces etc., so we can make better use of the DB
functionality. MS SQL Server & Oracle, for example, have excellent full-text
services that might give us a serious speed boost. MS Exchange is very slow
on searches so I'm hoping we outdo them in that department.

As a general comment I'm pleased with the general level of entries so far.
Most people have done their research and seem to know what they are up
against.

-- Jason 

