Posted to dev@nutch.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2007/08/22 15:42:32 UTC

[jira] Created: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
--------------------------------------------------------------------

                 Key: NUTCH-544
                 URL: https://issues.apache.org/jira/browse/NUTCH-544
             Project: Nutch
          Issue Type: Improvement
            Reporter: Dawid Weiss
            Priority: Minor


This issue upgrades the Carrot2 search results clustering plugin to the newest stable version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney resolved NUTCH-544.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0
         Assignee: Doğacan Güney

Latest patch committed in rev. 570327.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: clustering-upgrade-2.1.patch2, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521791 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

Yes, absolutely. It's actually my fault I didn't notice those issues (NUTCH-237 and NUTCH-397); apologies.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522041 ] 

Doğacan Güney commented on NUTCH-544:
-------------------------------------

> Doğacan, would it be a problem if we threw in BeanShell and Dom4j JARs? We have been talking about this with Staszek -- this 
> would allow us to instantiate clustering algorithms dynamically and would effectively provide alternatives for Nutch users to 
> use Lingo, STC or Lingo3G (our commercial clusterer).
>
> I'm asking because I remember at the beginning there were concerns about the size of Nutch when compiled with all plugin 
> dependencies etc.

I'd rather not comment on that, since I wasn't around during those discussions. As far as I am concerned, though, we can add those two JARs, because being able to use different clustering algorithms sounds useful. (I don't understand why we need BeanShell and Dom4j to provide these alternatives, though. Can you elaborate a bit?)

> Same patch, but I added an optional parameter that allows custom clustering processes to be used.

I took a quick look at the code but couldn't find it. Is this a configuration parameter (via nutch-site.xml)?


I'll leave the patch here for a few days and commit it if there are no objections.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521842 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

OK, this patch does the following:

- upgrades the Carrot2 libraries to 2.1 (the most recent stable version);
- fixes tests that were not being run properly;
- fixes some multiple-initialization issues.

It is ready for review and commit.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521843 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

Not exactly; the initialization issue is still present, but I'll create another JIRA entry for it and fix it there (it's not related to the upgrade, but rather to the webapp).

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521784 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

I've started working on this and will send a patch for review soon (tested against the current trunk). I didn't know which version to set for "Affects Version", so please feel free to edit that field on this issue.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment: clustering-upgrade-2.1.patch

Same patch, but I added an optional parameter that allows custom clustering processes to be used.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521792 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

Doğacan, would it be a problem if we threw in the BeanShell and Dom4j JARs? We have been talking about this with Staszek: it would allow us to instantiate clustering algorithms dynamically and would effectively give Nutch users the choice of Lingo, STC or Lingo3G (our commercial clusterer).

I'm asking because I remember that early on there were concerns about the size of Nutch when compiled with all plugin dependencies, etc.


> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522992 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

Hey Doğacan, will you find a spare minute to commit this patch sometime this week? Thanks a bunch!

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment: clustering-upgrade-2.1.patch2

The same patch, with one extra line of logging added (reporting which clustering algorithm is used).

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch2, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522047 ] 

Dawid Weiss commented on NUTCH-544:
-----------------------------------

This parameter is in the code. It is specific to the plugin, not to the extension point, so I didn't add it to nutch-default.xml. I'll document the configuration and process switching on the Wiki; I think it makes more sense to have it there.

http://wiki.apache.org/nutch/ClusteringPlugin

Switching clustering algorithms isn't very intuitive, because each algorithm comes with its own JARs and Nutch's plugin system requires all JARs to be declared explicitly in the plugin's descriptor. In the end I went for a workaround: the clustering plugin embeds a default clustering process (which uses the Lingo algorithm), and if another clustering process is to be used, all of its required classes must be present on the classpath (for example, by placing them in the container's shared classes). This worked quite well for me, since you don't have to modify Nutch's WAR at all. As I said, I'll write a longer explanation of this on the Wiki.
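A minimal Java sketch of the fallback described above: try to load a clustering process by class name from the classpath, and use an embedded default when that class (or one of its own dependencies) cannot be found. The ClusteringProcess interface, DefaultLingoProcess and ClusteringProcessLoader are hypothetical names for illustration only; they are not the plugin's or Carrot2's actual API, and how the plugin reads the class name from its configuration is not shown here.

```java
/**
 * Hypothetical sketch: resolve a clustering process by class name and fall
 * back to an embedded default when it is not available on the classpath.
 * All type names here are illustrative, not the real plugin API.
 */
public class ClusteringProcessLoader {

    /** Hypothetical stand-in for a clustering process. */
    public interface ClusteringProcess {
        String name();
    }

    /** Hypothetical embedded default; the thread says the bundled process uses Lingo. */
    public static class DefaultLingoProcess implements ClusteringProcess {
        @Override
        public String name() {
            return "lingo (embedded default)";
        }
    }

    /**
     * Returns a new instance of the named class if it can be loaded and
     * instantiated through the given class loader; otherwise the default.
     */
    public static ClusteringProcess load(String className, ClassLoader loader) {
        if (className == null || className.isEmpty()) {
            return new DefaultLingoProcess();
        }
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            return clazz.asSubclass(ClusteringProcess.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // Class not on the classpath, one of its dependencies missing
            // (NoClassDefFoundError is a LinkageError), or not a ClusteringProcess.
            return new DefaultLingoProcess();
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ClusteringProcessLoader.class.getClassLoader();
        // No custom process configured: the embedded default is used.
        System.out.println(load(null, cl).name());
        // Custom process requested but absent from the classpath: falls back.
        System.out.println(load("com.example.MissingProcess", cl).name());
    }
}
```

Falling back silently can hide a misspelled class name, so it is worth logging which process was actually chosen; the later clustering-upgrade-2.1.patch2 adds a log line reporting the clustering algorithm in use.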

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Closed: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-544.
-------------------------------


> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: clustering-upgrade-2.1.patch2, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment: libs-packed.tar.gz

The lib folder (binary files to be replaced).

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment: clustering-upgrade-2.1.patch

svn diff of the patch. Binary files are not included (is there a way to include them with Subversion?); I'll post them in a separate bundle.

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523448 ] 

Hudson commented on NUTCH-544:
------------------------------

Integrated in Nutch-Nightly #192 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/192/])

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: clustering-upgrade-2.1.patch2, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Commented: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521786 ] 

Doğacan Güney commented on NUTCH-544:
-------------------------------------

Hi Dawid,

Thanks for working on this. I am going to close NUTCH-237 and NUTCH-397 as duplicates; is that OK with you?

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment:     (was: clustering-upgrade-2.1.patch)

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: clustering-upgrade-2.1.patch2, libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.


[jira] Updated: (NUTCH-544) Upgrade Carrot2 clustering plugin to the newest stable release (2.1)

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-544:
------------------------------

    Attachment:     (was: clustering-upgrade-2.1.patch)

> Upgrade Carrot2 clustering plugin to the newest stable release (2.1)
> --------------------------------------------------------------------
>
>                 Key: NUTCH-544
>                 URL: https://issues.apache.org/jira/browse/NUTCH-544
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: libs-packed.tar.gz
>
>
> This issue upgrades Carrot2 search results clustering plugin to the newest stable version.
