Posted to server-dev@james.apache.org by "Ioan Eugen Stan (Created) (JIRA)" <ji...@apache.org> on 2012/03/28 22:11:27 UTC

[jira] [Created] (MAILBOX-173) Distribuited mailbox indexing over HBase/HDFS

Distribuited mailbox indexing over HBase/HDFS
---------------------------------------------

                 Key: MAILBOX-173
                 URL: https://issues.apache.org/jira/browse/MAILBOX-173
             Project: James Mailbox
          Issue Type: New Feature
          Components: hbase, lucene, store
            Reporter: Ioan Eugen Stan
            Assignee: Ioan Eugen Stan


James provides a module called Lucene Mailbox Index that knows how to index emails. Indexing is done by providing a suitable Lucene Directory implementation that stores the index and allows searching. Lucene comes with a file-system Directory, a JDBC Directory, and a few other implementations that store the index on a file system or in a database.

To provide distributed search, we should write a Directory implementation that stores the index in HBase. Such an implementation is described very well here [1].

[1] http://www.infoq.com/articles/LuceneHbase
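
As a rough illustration of what storing the index in a key-value store could involve, here is a minimal sketch that splits each index file into fixed-size blocks and serves random-access reads from them. It is not taken from the linked article or from any James code: the class name BlockFileStore, the key layout, and the block size are all assumptions, and an in-memory sorted map stands in for an HBase table so the example has no dependencies.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: index files split into fixed-size blocks, one block per "row". */
final class BlockFileStore {
    private final int blockSize;
    // Key: fileName + '#' + zero-padded block number. The map stands in for an HBase table.
    private final Map<String, byte[]> blocks = new TreeMap<>();
    private final Map<String, Long> lengths = new TreeMap<>();

    BlockFileStore(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = blockSize;
    }

    private static String key(String file, long block) {
        return String.format("%s#%010d", file, block);
    }

    /** Writes a whole file once, split into blocks; rewriting an existing file is rejected. */
    void write(String file, byte[] data) {
        if (lengths.containsKey(file)) throw new IllegalStateException("file exists: " + file);
        for (int off = 0, n = 0; off < data.length; off += blockSize, n++) {
            int end = Math.min(off + blockSize, data.length);
            blocks.put(key(file, n), Arrays.copyOfRange(data, off, end));
        }
        lengths.put(file, (long) data.length);
    }

    long length(String file) {
        Long len = lengths.get(file);
        if (len == null) throw new IllegalArgumentException("no such file: " + file);
        return len;
    }

    /** Random-access read of len bytes starting at pos, touching only the blocks it needs. */
    byte[] read(String file, long pos, int len) {
        if (pos < 0 || len < 0 || pos + len > length(file)) {
            throw new IndexOutOfBoundsException("read past end of " + file);
        }
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            long abs = pos + copied;
            byte[] block = blocks.get(key(file, abs / blockSize));
            int inBlock = (int) (abs % blockSize);
            int n = Math.min(len - copied, block.length - inBlock);
            System.arraycopy(block, inBlock, out, copied, n);
            copied += n;
        }
        return out;
    }
}
```

A real Directory would wrap this kind of storage behind Lucene's IndexInput and IndexOutput classes. The write-once check reflects the fact that Lucene does not modify an index file after it has been written.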

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Ioan Eugen Stan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241022#comment-13241022 ] 

Ioan Eugen Stan commented on MAILBOX-173:
-----------------------------------------

The Directory implementation should accept a mailbox ID as a parameter and use it as a prefix for that mailbox's index entries, so that a search is limited to a single mailbox.
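
The prefixing idea can be sketched as follows. This is only an illustration under stated assumptions: the class MailboxPrefixedIndex, the '/' separator, and the key layout are hypothetical, and a sorted in-memory map stands in for HBase's sorted row-key space. The point is that a range scan over the prefix returns only one mailbox's entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: a sorted map stands in for an HBase table's row-key space. */
final class MailboxPrefixedIndex {
    // Separator between mailbox ID and index file name; assumed never to occur in a mailbox ID.
    private static final char SEP = '/';

    private final TreeMap<String, byte[]> rows = new TreeMap<>();

    static String rowKey(String mailboxId, String fileName) {
        return mailboxId + SEP + fileName;
    }

    void put(String mailboxId, String fileName, byte[] content) {
        rows.put(rowKey(mailboxId, fileName), content.clone());
    }

    /** Lists the index files of one mailbox with a range scan over its key prefix. */
    List<String> listFiles(String mailboxId) {
        String start = mailboxId + SEP;
        // Exclusive stop key: the smallest string greater than every key starting with the prefix.
        String stop = mailboxId + (char) (SEP + 1);
        List<String> names = new ArrayList<>();
        for (String key : rows.subMap(start, true, stop, false).keySet()) {
            names.add(key.substring(start.length()));
        }
        return names;
    }
}
```

Including the separator in the scan prefix matters: without it, a scan for mailbox "mb1" would also return the rows of mailbox "mb10".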
                
> [gsoc2012] Distribuited mailbox indexing over HBase/HDFS
> --------------------------------------------------------
>
>                 Key: MAILBOX-173
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-173
>             Project: James Mailbox
>          Issue Type: New Feature
>          Components: hbase, lucene, store
>            Reporter: Ioan Eugen Stan
>            Assignee: Ioan Eugen Stan
>              Labels: gsoc, mentor
>
> James provide a module called Lucene Mailbox Index that knows how to index emails. Indexing is done by providing a suitable Lucene Directory implementation that will store the index and allow searching. Lucene comes with File system directory JDBC Directory and a few other implementations to store the index in a file-system or in a database.
> In order to provide distributed search we should implement a Directory implementation that will store the index in HBase. Such an implementation is described very well here [1].
> [1] http://www.infoq.com/articles/LuceneHbase



[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Mihai Soloi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400729#comment-13400729 ] 

Mihai Soloi commented on MAILBOX-173:
-------------------------------------

Implemented an HBaseDirectory along with HBase-backed IndexInput and IndexOutput. Looking into HBASE-3529 for a better approach to distributed searching. Also emailed the Lucene dev mailing list about problems with the checksum when trying to get an already open IndexReader.
                



[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Ioan Eugen Stan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242166#comment-13242166 ] 

Ioan Eugen Stan commented on MAILBOX-173:
-----------------------------------------

Hi,

You should check out the mailbox project. lucene-mailbox is responsible for indexing and searching. It exposes an API that James server calls when a search is performed (usually triggered by an IMAP SEARCH command). An indexing mailet is usually responsible for indexing the document.

Good luck, 
                



[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247629#comment-13247629 ] 

Mihai Soloi commented on MAILBOX-173:
-------------------------------------

Proposal submitted! Please take a look at it on Google Melange: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/mihaisoloi/1

Thank you Eugen for the help so far!
                



[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247632#comment-13247632 ] 

Mihai Soloi commented on MAILBOX-173:
-------------------------------------

Any input or suggestions are highly appreciated.
                



[jira] [Commented] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Mihai Soloi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242149#comment-13242149 ] 

Mihai Soloi commented on MAILBOX-173:
-------------------------------------

Hi Eugen, I've read the article and I am interested in this subject. I am currently playing with the James Server and will look into how search should be implemented.
                



[jira] [Updated] (MAILBOX-173) [gsoc2012] Distribuited mailbox indexing over HBase/HDFS

Posted by "Ioan Eugen Stan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAILBOX-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ioan Eugen Stan updated MAILBOX-173:
------------------------------------

    Summary: [gsoc2012] Distribuited mailbox indexing over HBase/HDFS  (was: Distribuited mailbox indexing over HBase/HDFS)
    
