Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/06/30 17:44:50 UTC

[jira] Created: (NUTCH-837) Remove search servers and Lucene dependencies

Remove search servers and Lucene dependencies 
----------------------------------------------

                 Key: NUTCH-837
                 URL: https://issues.apache.org/jira/browse/NUTCH-837
             Project: Nutch
          Issue Type: Task
          Components: searcher, web gui
    Affects Versions: 1.1
            Reporter: Julien Nioche
             Fix For: 2.0


One of the main aspects of 2.0 is the delegation of indexing and search to external resources like SOLR. We can simplify the code a lot by getting rid of:
* search servers
* indexing and analysis with Lucene
* search-side functionality: ontologies, clustering, etc.
In the short term only SOLR / SOLRCloud will be supported, but the plan would be to add other systems as well.
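As a rough illustration of the delegation described above: with Lucene indexing gone, Nutch's indexing step reduces to handing parsed pages to Solr as documents, and Solr owns analysis and search from then on. The sketch below only builds the request body and assumes a Solr version whose update handler accepts a JSON array of documents. The helper `to_solr_update` and the field names `id`, `url`, `title` and `content` are hypothetical, not Nutch's actual indexer or schema.

```python
import json
from typing import Iterable, Mapping


def to_solr_update(pages: Iterable[Mapping[str, str]]) -> str:
    """Serialize crawled pages as a JSON array of Solr documents.

    Hypothetical field mapping (not Nutch's real schema): the page URL
    doubles as the unique key ("id").
    """
    docs = []
    for page in pages:
        docs.append({
            "id": page["url"],
            "url": page["url"],
            "title": page.get("title", ""),
            "content": page.get("content", ""),
        })
    return json.dumps(docs)


if __name__ == "__main__":
    body = to_solr_update([{"url": "http://nutch.apache.org/", "title": "Apache Nutch"}])
    print(body)
    # The body would then be POSTed with Content-Type: application/json to the
    # Solr update handler (assumed: something like http://localhost:8983/solr/update).
```

Using the URL as the unique key means a re-crawled page replaces its previous version in the index instead of being duplicated.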

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-837:
------------------------------------

    Attachment:     (was: NUTCH-837.patch)


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] 

Andrzej Bialecki  commented on NUTCH-837:
-----------------------------------------

bq. So, I think we should still have a Nutch webapp and in my mind it's a must-have for a 2.0 release...

I agree. But for the moment it's better to delete the old webapp stuff that we know for sure doesn't work with the current Nutch, and it will be completely reimplemented anyway.


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884739#action_12884739 ] 

Julien Nioche commented on NUTCH-837:
-------------------------------------

Comments on the latest patch:
* default.properties : some entries can be removed
{code}
docs.dir = ./docs
docs.src = ${basedir}/src/web
xmlcatalog.dir = ${basedir}/src/xmlcatalog
build.webapps = ${build.dir}/webapps
web.src.dir = ./src/web
src.webapps = ./src/webapps
{code}
* docs/: still there
* src/web/: ditto

apart from that +1
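The first review item amounts to deleting the listed keys from `default.properties`. A minimal sketch of doing that mechanically; it assumes simple one-line `key = value` entries (no `key: value` separators, no line continuations), and the helper name `drop_properties` is hypothetical.

```python
from typing import AbstractSet, Iterable, List

# Keys named in the review as leftovers of the removed docs/webapp build.
OBSOLETE_KEYS = frozenset({
    "docs.dir", "docs.src", "xmlcatalog.dir",
    "build.webapps", "web.src.dir", "src.webapps",
})


def drop_properties(lines: Iterable[str],
                    keys: AbstractSet[str] = OBSOLETE_KEYS) -> List[str]:
    """Return the properties lines minus entries whose key is in `keys`.

    Comments, blank lines and all other entries are kept unchanged.
    Assumes one-line `key = value` entries only.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")):
            key = stripped.split("=", 1)[0].strip()
            if key in keys:
                continue  # obsolete entry: drop it
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = ["# build settings", "docs.dir = ./docs", "build.dir = ./build"]
    print(drop_properties(sample))
```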


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884734#action_12884734 ] 

Julien Nioche commented on NUTCH-837:
-------------------------------------

:-)


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884296#action_12884296 ] 

Chris A. Mattmann commented on NUTCH-837:
-----------------------------------------

hahah uh oh!

I'll try and take a look before next Tuesday...


[jira] Resolved: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  resolved NUTCH-837.
-------------------------------------

    Resolution: Fixed

Committed in r960064. Thanks for the review!


[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-837:
------------------------------------

    Attachment: NUTCH-837.patch

Updated patch against r959954 (after NUTCH-836).


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884712#action_12884712 ] 

Chris A. Mattmann commented on NUTCH-837:
-----------------------------------------

I'm not sure I agree :) 

The Nutch webapp is just a set of web pages that let someone know that Search is working. They are decent web pages, have a great look and feel and are something I've seen nearly every newbie Nutch user I've been around leverage to tell whether or not Nutch installed correctly.

I'm also a fan of the "let's not lose functionality on a technology upgrade task" mantra. That is, we are reorganizing the architecture of Nutch to improve it, not to take away functionality. We should at least support the baseline of functionality that was present in 1.x.

That said, I'm not sure the existing webapp should be maintained in its current form. Maybe we should take a pass at updating the webapp to work with the Nutch 2.0 architecture underneath. I'm happy to pick up a shovel and dig on that one.

Cheers,
Chris


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884715#action_12884715 ] 

Julien Nioche commented on NUTCH-837:
-------------------------------------

Thanks for your comments Chris

{quote}
The Nutch webapp is just a set of web pages that let someone know that Search is working. They are decent web pages, have a great look and feel and are something I've seen nearly every newbie Nutch user I've been around leverage to tell whether or not Nutch installed correctly.
{quote}

Well, the SOLR webapps would be just as good, if not better, for debugging: you get all sorts of stats and can debug your queries, etc. The front end and its configuration are also a common source of trouble for beginners.
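For instance, Solr's standard `/select` handler accepts `debugQuery=true`, which adds the parsed query and per-document score explanations to the response. A small sketch that builds such a request URL; the default base URL (assumed to be the single-core address of Solr's bundled example) and the helper name `solr_debug_url` are assumptions.

```python
from urllib.parse import urlencode


def solr_debug_url(query: str,
                   base: str = "http://localhost:8983/solr",
                   rows: int = 10) -> str:
    """Build a /select URL asking Solr to explain how each hit was scored."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",          # JSON response instead of the default XML
        "debugQuery": "true",  # include parsed query and score explanations
    }
    return f"{base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_debug_url("title:nutch"))
```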

{quote}
I'm also a fan of the "let's not lose functionality on a technology upgrade task" mantra. That is, we are reorganizing the architecture of Nutch to improve it, not to take away functionality. We should at least support the baseline of functionality that was present in 1.x.
{quote}

I don't think it is completely lost; we still have the webapps from SOLR :-) 
Regardless of the debugging aspect mentioned earlier, I really think that any real application based on Nutch would customise the front end anyway. 

{quote}
That said, I'm not sure the existing webapp should be maintained in its current form. Maybe we should take a pass at updating the webapp to work with the Nutch 2.0 architecture underneath. I'm happy to pick up a shovel and dig on that one.
{quote}

This would indeed need doing, i.e. getting the cached data or inlinks straight from the webtable via GORA. Speaking of which, we should probably think in terms of "what functionality do we have in Nutch that is currently missing in SOLR?", one example being the ability to get the cached page from HDFS/GORA/etc. without having to store the content in the index.
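The idea of serving the cached page from the crawl store, rather than storing full content in the Solr index, can be sketched as a join on the document ID at display time. Everything here is hypothetical: a plain dict wrapped in `ContentStore` stands in for the GORA-backed webtable, and Solr hits are assumed to carry the page URL in an `id` field.

```python
from typing import Dict, List, Mapping, Optional


class ContentStore:
    """Hypothetical stand-in for the GORA-backed webtable: URL -> raw page content."""

    def __init__(self, pages: Optional[Dict[str, bytes]] = None) -> None:
        self._pages: Dict[str, bytes] = dict(pages or {})

    def get(self, url: str) -> Optional[bytes]:
        return self._pages.get(url)


def attach_cached_content(hits: List[Mapping[str, object]],
                          store: ContentStore) -> List[Dict[str, object]]:
    """Add a "cached" field to each Solr hit by looking the page up in the store.

    The index holds only searchable fields; the page body is fetched by the
    hit's "id" (assumed to be the URL) only when a result is displayed.
    """
    results = []
    for hit in hits:
        result = dict(hit)  # copy so the caller's hits are left untouched
        # None when the page is missing from the store (e.g. not yet fetched).
        result["cached"] = store.get(str(hit["id"]))
        results.append(result)
    return results
```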


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884700#action_12884700 ] 

Julien Nioche commented on NUTCH-837:
-------------------------------------

Hi Chris, 

My position on this is that we simply wouldn't replace it. We delegate the search to SOLR and expect people to reuse existing front ends for SOLR or build custom ones (as I expect real-world deployments of Nutch would do anyway). Maintaining the webapps takes effort that I doubt we can afford, given the limited number of active committers we have. I'd rather we focused on crawl-related functionality.  

WDYT? 

J.


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884718#action_12884718 ] 

Chris A. Mattmann commented on NUTCH-837:
-----------------------------------------

Hey Julien,

Yep that's the point. Solr != Nutch, so Solr's Webapp can't be expected to be = Nutch's webapp. The example you cited about cached data is a great one, because Solr's webapp doesn't really support that (nor should it IMHO).

So, I think we should still have a Nutch webapp and in my mind it's a must-have for a 2.0 release...not to worry though I'm volunteering to help do it! :)

Cheers,
Chris


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884731#action_12884731 ] 

Chris A. Mattmann commented on NUTCH-837:
-----------------------------------------

Okey dok, I created NUTCH-841 to track it. Julien, Andrzej, you have my +1 to take your axe to the old one :)


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884996#action_12884996 ] 

Hudson commented on NUTCH-837:
------------------------------

Integrated in Nutch-trunk #1197 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1197/])
    

[jira] Issue Comment Edited: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] 

Andrzej Bialecki  edited comment on NUTCH-837 at 7/2/10 11:55 AM:
------------------------------------------------------------------

bq. So, I think we should still have a Nutch webapp and in my mind it's a must-have for a 2.0 release...

I agree. But for the moment it's better to delete the old webapp stuff that we know for sure doesn't work with the current Nutch, and it will be completely reimplemented anyway. Refactoring it to work with the new Solr-based app is likely not worth it - we can achieve a similar effect to the current webapp by just tweaking the styling of the Solritas handler.


[jira] Assigned: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  reassigned NUTCH-837:
---------------------------------------

    Assignee: Andrzej Bialecki 


[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-837:
------------------------------------

    Attachment: NUTCH-837.patch

Warning - Nutch veterans may want to sit down before reading, because it looks like half of Nutch code is deleted in this patch... ;)

This patch implements the changes. All tests (that remain) pass, and a full crawl cycle plus Solr indexing works as before. There is no single entry point in Nutch for searching at this moment; we may want to add a minimal test search setup based on Solr in another patch.
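Pending such a setup, a smoke test that the crawl actually reached Solr can be as small as checking `numFound` in the JSON response to a match-all query. A sketch with the HTTP fetch left out so it runs offline; the helper name `index_has_documents` is hypothetical.

```python
import json


def index_has_documents(solr_json: str) -> bool:
    """Return True if a Solr /select JSON response reports at least one match.

    Expects the usual response shape {"response": {"numFound": N, ...}};
    a response without that section is treated as "no documents".
    """
    payload = json.loads(solr_json)
    return payload.get("response", {}).get("numFound", 0) > 0


if __name__ == "__main__":
    sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 2, "start": 0, "docs": []}}'
    print(index_has_documents(sample))
```

In practice the JSON would come from a request such as `/select?q=*:*&rows=0&wt=json`; `rows=0` skips returning documents, since only the count matters here.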


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884671#action_12884671 ] 

Julien Nioche commented on NUTCH-837:
-------------------------------------

I think we can also get rid of:

* docs/
* WAR-related tasks in Ant
* src/web/
* src/xmlcatalog/
* src/engines/


[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884691#action_12884691 ] 

Chris A. Mattmann commented on NUTCH-837:
-----------------------------------------

Hey Julien:

How are we going to replace the Nutch webapp? 

Cheers,
Chris
