Posted to dev@nutch.apache.org by "Dan Rosher (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 16:47:57 UTC

[jira] [Created] (NUTCH-1294) IndexClean job with solr implementation.

IndexClean job with solr implementation.
----------------------------------------

                 Key: NUTCH-1294
                 URL: https://issues.apache.org/jira/browse/NUTCH-1294
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: nutchgora
            Reporter: Dan Rosher
            Priority: Minor
             Fix For: nutchgora
         Attachments: NUTCH-1294.patch

I started by copying and altering the trunk version of SolrClean, though it was inadequate for our needs. We needed to mark particular pages as gone even though they might still be visible on the web. This implementation abstracts the index cleaning process, provides a Solr implementation, and adds a clean index plugin extension that allows others to tailor how pages are removed from their store.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1294:
----------------------------------------

    Attachment: NUTCH-1294-v2.patch

New patch which makes trivial accommodations for the associated class(es) in conf/log4j.properties and adds the relevant CLI configuration to bin/nutch.
                


[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Rosher updated NUTCH-1294:
------------------------------

    Attachment:     (was: NUTCH-1294.patch)
    


[jira] [Commented] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266889#comment-13266889 ] 

Lewis John McGibbney commented on NUTCH-1294:
---------------------------------------------

I think this is a really neat patch. The new extension point is a great addition to this often-desired aspect of maintaining your index. The script in bin/nutch needs to be updated with the correct command, and the patch needs to be tested before we commit. I would be happy to get this tested once the blocker NUTCH-1205 has been resolved (which looks to be very soon). It would be great to get this into 2.0. Thanks, Dan.
                


[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1294:
----------------------------------------

    Fix Version/s:     (was: nutchgora)
                   2.1

Still not tested thoroughly enough, so setting the fix version to 2.1.
                


[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Rosher updated NUTCH-1294:
------------------------------

    Attachment: NUTCH-1294.patch
    


[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1294:
----------------------------------------

    Fix Version/s:     (was: 2.1)
                   2.2
    


[jira] [Commented] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267729#comment-13267729 ] 

Lewis John McGibbney commented on NUTCH-1294:
---------------------------------------------

Meant to say, I'm still testing this out, but ended up identifying some peculiarities in the gora-cassandra backend when browsing through some debug logs ;0)

Generally speaking, I think this looks OK, but it would be great if others could provide some comments if and when you guys get around to it.
                


[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

Posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Rosher updated NUTCH-1294:
------------------------------

    Attachment: NUTCH-1294.patch
    
