Posted to server-dev@james.apache.org by "Ioan Eugen Stan (Created) (JIRA)" <ji...@apache.org> on 2012/03/28 22:11:27 UTC
[jira] [Created] (MAILBOX-173) Distribuited mailbox indexing over
HBase/HDFS
Distribuited mailbox indexing over HBase/HDFS
---------------------------------------------
Key: MAILBOX-173
URL: https://issues.apache.org/jira/browse/MAILBOX-173
Project: James Mailbox
Issue Type: New Feature
Components: hbase, lucene, store
Reporter: Ioan Eugen Stan
Assignee: Ioan Eugen Stan
James provides a module called Lucene Mailbox Index that knows how to index emails. Indexing is done by providing a suitable Lucene Directory implementation that stores the index and allows searching. Lucene comes with a file-system Directory, a JDBC Directory, and a few other implementations that store the index in a file system or in a database.
To provide distributed search, we should write a Directory implementation that stores the index in HBase. Such an implementation is described very well in [1].
[1] http://www.infoq.com/articles/LuceneHbase
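To make the idea concrete: a Lucene index is a flat set of named files, so one possible way to keep it in a key-value store is to split each file into fixed-size blocks and store one block per row. The sketch below only illustrates that assumption. A java.util.TreeMap stands in for an HBase table (HBase also keeps rows sorted by key), and the class name BlockStore, the block size, and the row-key layout are all hypothetical. It does not use the real Lucene Directory or HBase client APIs, and it assumes file names contain no "/".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for an HBase-backed index store; not the Lucene or HBase API. */
final class BlockStore {
    // Assumed block size, kept tiny so the example is easy to follow.
    static final int BLOCK_SIZE = 4;

    // Sorted map playing the role of an HBase table: row key -> block bytes.
    private final TreeMap<String, byte[]> table = new TreeMap<>();

    /** Row key layout (an assumption): "<fileName>/<zero-padded block number>". */
    static String rowKey(String fileName, int block) {
        return String.format("%s/%08d", fileName, block);
    }

    /** Replaces the file: splits the content into blocks and stores one block per row. */
    void writeFile(String fileName, byte[] content) {
        deleteFile(fileName);
        for (int offset = 0, block = 0; offset < content.length; offset += BLOCK_SIZE, block++) {
            int end = Math.min(offset + BLOCK_SIZE, content.length);
            table.put(rowKey(fileName, block), Arrays.copyOfRange(content, offset, end));
        }
    }

    /** Reassembles a file by scanning its rows in key order. */
    byte[] readFile(String fileName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : rowsOf(fileName).values()) {
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    /** Removes every row that belongs to the file. */
    void deleteFile(String fileName) {
        rowsOf(fileName).clear();
    }

    int rowCount() {
        return table.size();
    }

    // All rows whose key starts with "<fileName>/"; '0' is the character right after '/'.
    private Map<String, byte[]> rowsOf(String fileName) {
        return table.subMap(fileName + "/", fileName + "0");
    }

    public static void main(String[] args) {
        BlockStore store = new BlockStore();
        store.writeFile("_0.cfs", "hello lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.rowCount() + " rows");
        System.out.println(new String(store.readFile("_0.cfs"), StandardCharsets.UTF_8));
    }
}
```

A real implementation would also need to record file lengths and modification times, and to handle locking, none of which this sketch attempts.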
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Ioan Eugen Stan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241022#comment-13241022 ]
Ioan Eugen Stan commented on MAILBOX-173:
-----------------------------------------
The Directory implementation should accept a mailbox-id as a parameter and use it as a prefix for that mailbox's index, so that the search space is limited to a single mailbox.
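A minimal sketch of the prefix idea, assuming a row-key layout of "<mailboxId>/<fileName>" (a hypothetical layout, not taken from the James code base) and mailbox ids that contain no "/". Because the keys are sorted, everything that belongs to one mailbox forms a contiguous range, so a prefix scan touches only that mailbox's rows. A TreeMap again stands in for the HBase table, and MailboxScopedIndex is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: index files keyed by mailbox id so scans stay within one mailbox. */
final class MailboxScopedIndex {
    // Sorted map standing in for an HBase table: row key -> file content.
    private final TreeMap<String, byte[]> table = new TreeMap<>();

    /** Assumed row key layout: "<mailboxId>/<fileName>". */
    static String rowKey(String mailboxId, String fileName) {
        return mailboxId + "/" + fileName;
    }

    void put(String mailboxId, String fileName, byte[] content) {
        table.put(rowKey(mailboxId, fileName), content);
    }

    /** Lists the index files of one mailbox using a prefix range scan. */
    List<String> listFiles(String mailboxId) {
        String start = mailboxId + "/";
        // '0' is the character right after '/', so this bound ends the prefix range.
        String stop = mailboxId + "0";
        List<String> names = new ArrayList<>();
        for (String key : table.subMap(start, stop).keySet()) {
            names.add(key.substring(start.length()));
        }
        return names;
    }

    public static void main(String[] args) {
        MailboxScopedIndex index = new MailboxScopedIndex();
        index.put("1", "segments_1", new byte[0]);
        index.put("1", "_0.cfs", new byte[0]);
        index.put("10", "_0.cfs", new byte[0]);
        // Mailbox "10" is not picked up when scanning mailbox "1".
        System.out.println(index.listFiles("1")); // prints [_0.cfs, segments_1]
    }
}
```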
> Labels: gsoc, mentor
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Mihai Soloi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400729#comment-13400729 ]
Mihai Soloi commented on MAILBOX-173:
-------------------------------------
Implemented an HBaseDirectory along with HBase-backed IndexInput and IndexOutput classes. I am looking into HBASE-3529 for a better approach to distributed searching. I also emailed the Lucene dev mailing list about a checksum problem that occurs when trying to get an already open IndexReader.
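For readers unfamiliar with the pieces named above: an output writes a file's bytes sequentially and an input reads them back with random access. The sketch below shows one way such a pair could sit on top of block storage. It is not the code from this issue and does not extend the Lucene classes. BlockOutput, BlockInput, and the block-number keys are hypothetical, and a TreeMap stands in for the rows of a single file.

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical block-buffered writer; mirrors the idea of an IndexOutput, not its API. */
final class BlockOutput {
    private final SortedMap<Integer, byte[]> blocks;
    private final byte[] buffer;
    private int used = 0;
    private int blockNo = 0;

    BlockOutput(SortedMap<Integer, byte[]> blocks, int blockSize) {
        this.blocks = blocks;
        this.buffer = new byte[blockSize];
    }

    /** Buffers one byte and stores the block as soon as it is full. */
    void writeByte(byte b) {
        buffer[used++] = b;
        if (used == buffer.length) {
            storeBlock();
        }
    }

    /** Stores the final, possibly partial block. Call once, after the last write. */
    void close() {
        if (used > 0) {
            storeBlock();
        }
    }

    private void storeBlock() {
        blocks.put(blockNo++, Arrays.copyOf(buffer, used));
        used = 0;
    }
}

/** Hypothetical random-access reader over the same blocks; not the IndexInput API. */
final class BlockInput {
    private final SortedMap<Integer, byte[]> blocks;
    private final int blockSize;
    private long position = 0;

    BlockInput(SortedMap<Integer, byte[]> blocks, int blockSize) {
        this.blocks = blocks;
        this.blockSize = blockSize;
    }

    void seek(long newPosition) {
        position = newPosition;
    }

    /** Maps the file position to (block number, offset) and returns that byte. */
    byte readByte() {
        byte[] block = blocks.get((int) (position / blockSize));
        int offset = (int) (position % blockSize);
        if (block == null || offset >= block.length) {
            throw new IllegalStateException("read past end of file");
        }
        position++;
        return block[offset];
    }

    public static void main(String[] args) {
        SortedMap<Integer, byte[]> rows = new TreeMap<>();
        BlockOutput out = new BlockOutput(rows, 4);
        for (char c : "abcdefghij".toCharArray()) {
            out.writeByte((byte) c);
        }
        out.close();
        BlockInput in = new BlockInput(rows, 4);
        in.seek(5);
        System.out.println((char) in.readByte()); // prints f
    }
}
```

Only one block has to be fetched per read here, which is the main reason for choosing fixed-size blocks over storing each file as a single value.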
> Labels: gsoc, gsoc2012, mentor
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Ioan Eugen Stan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242166#comment-13242166 ]
Ioan Eugen Stan commented on MAILBOX-173:
-----------------------------------------
Hi,
You should check out the mailbox project. lucene-mailbox is responsible for indexing and searching. It exposes an API that the James server calls when a search is performed (usually triggered by an IMAP SEARCH command). An indexing mailet is usually responsible for indexing the document.
Good luck,
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247629#comment-13247629 ]
Mihai Soloi commented on MAILBOX-173:
-------------------------------------
Proposal submitted! Please take a look at it on Google Melange: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/mihaisoloi/1
Thank you Eugen for the help so far!
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247632#comment-13247632 ]
Mihai Soloi commented on MAILBOX-173:
-------------------------------------
Any input or suggestions are highly appreciated.
[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242149#comment-13242149 ]
Mihai Soloi commented on MAILBOX-173:
-------------------------------------
Hi Eugen, I've read the article and I am interested in this subject. I am currently playing with the James Server and will look into how search should be implemented.
[jira] [Updated] (MAILBOX-173) [gsoc2012] Distribuited mailbox
indexing over HBase/HDFS
Posted by "Ioan Eugen Stan (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ioan Eugen Stan updated MAILBOX-173:
------------------------------------
Summary: [gsoc2012] Distribuited mailbox indexing over HBase/HDFS (was: Distribuited mailbox indexing over HBase/HDFS)