Posted to commits@nutch.apache.org by ab...@apache.org on 2005/09/19 16:12:08 UTC

svn commit: r290163 - in /lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2: ./ lib/

Author: ab
Date: Mon Sep 19 07:11:07 2005
New Revision: 290163

URL: http://svn.apache.org/viewcvs?rev=290163&view=rev
Log:
Update of the clustering plugin, contributed by Dawid Weiss.

Carrot2 components updated to the newest stable versions. Improvements in
tokenizers (speedups) and stop-word handling. The internal API changed
slightly (an update is needed if anyone uses this code as glue for other
Carrot2 components). Support added for Danish, Finnish, Norwegian (bokmaal)
and Swedish.

Added:
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-collections-3.1-patched.jar   (with props)
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/log4j-1.2.11.jar   (with props)
Removed:
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-collections-3.0.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/log4j-1.2.8.jar
Modified:
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-filter-lingo.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-local-core.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-snowball-stemmers.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-common.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-tokenizer.jar
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.CONTRIBUTORS
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.LICENSE
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-pool.LICENSE
    lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/plugin.xml

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-filter-lingo.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-filter-lingo.jar?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
Binary files - no diff available.

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-local-core.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-local-core.jar?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
Binary files - no diff available.

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-snowball-stemmers.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-snowball-stemmers.jar?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
Binary files - no diff available.

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-common.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-common.jar?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
Binary files - no diff available.

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-tokenizer.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2-util-tokenizer.jar?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
Binary files - no diff available.

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.CONTRIBUTORS
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.CONTRIBUTORS?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
--- lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.CONTRIBUTORS (original)
+++ lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.CONTRIBUTORS Mon Sep 19 07:11:07 2005
@@ -5,9 +5,10 @@
 #
 # First name, surname name; Duties; Active from; Institution
 
-Dawid Weiss; Project administrator, various components, core; 2002; Poznan University of Technology, Poland
-Stanisław, Osiński; Lingo clustering component, ODP Input; 2003; Poznan University of Technology, Poland
+Dawid Weiss; Project administrator, various components, core; 2002; Poland
+Stanisław, Osiński; Lingo clustering component, ODP Input; 2003; Poland
+
 Michał, Wróblewski [*]; AHC clustering components; 2003; Poznan University of Technology, Poland
 Paweł, Kowalik [*]; Inductive search engine wrapper; 2003; Poznan University of Technology, Poland
-Steven, Schockaert; Fuzzy Ants clustering component; 2004; University of Gent, Belgium
-Lang [,] Ngo Chi; Fuzzy Rough set clustering component; 2004; Warsaw University, Poland
+Steven, Schockaert [*]; Fuzzy Ants clustering component; 2004; University of Gent, Belgium
+Lang, Ngo Chi [*]; Fuzzy Rough set clustering component; 2004; Warsaw University, Poland

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.LICENSE
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.LICENSE?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
--- lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.LICENSE (original)
+++ lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/carrot2.LICENSE Mon Sep 19 07:11:07 2005
@@ -1,6 +1,6 @@
 
 Carrot2 Project
-Copyright (C) 2002-2004, Dawid Weiss
+Copyright (C) Dawid Weiss, Stanislaw Osinski
 Portions (C) Contributors listed in carrot2.CONTRIBUTORS file.
 
 Redistribution and use in source and binary forms, with or without modification,

Added: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-collections-3.1-patched.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-collections-3.1-patched.jar?rev=290163&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-collections-3.1-patched.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-pool.LICENSE
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-pool.LICENSE?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
--- lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-pool.LICENSE (original)
+++ lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/commons-pool.LICENSE Mon Sep 19 07:11:07 2005
@@ -1,6 +1,6 @@
 /*
- * $Revision: 1.1 $
- * $Date: 2004/08/09 23:23:53 $
+ * $Revision: 1.2 $
+ * $Date: 2004/06/19 16:26:16 $
  *
  * ====================================================================
  *

Added: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/log4j-1.2.11.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/log4j-1.2.11.jar?rev=290163&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/lib/log4j-1.2.11.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/plugin.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/plugin.xml?rev=290163&r1=290162&r2=290163&view=diff
==============================================================================
--- lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/plugin.xml (original)
+++ lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2/plugin.xml Mon Sep 19 07:11:07 2005
@@ -18,11 +18,11 @@
       <library name="carrot2-util-tokenizer.jar"/>
 
       <library name="colt-1.0.3.jar"/>
-      <library name="commons-collections-3.0.jar"/>
+      <library name="commons-collections-3.1-patched.jar"/>
       <library name="commons-pool-1.1.jar"/>
       <library name="FSA.jar"/>
       <library name="Jama-1.0.1-patched.jar"/>
-      <library name="log4j-1.2.8.jar"/>
+      <library name="log4j-1.2.11.jar"/>
 
       <library name="nekohtml-0.9.2.jar"/>
    </runtime>
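
The hunk above renames two <library> entries so that they match the jars
added under lib/. A check like the following could catch a descriptor that
still names a removed jar. It is only a sketch, not part of this commit: the
helper names (declared_libraries, missing_libraries) and the trimmed SAMPLE
descriptor are hypothetical, and it assumes plugin.xml declares no XML
namespace and that jar names are compared as bare file names.

```python
import xml.etree.ElementTree as ET


def declared_libraries(plugin_xml_text):
    """Return the name attribute of every <library> element, in document order."""
    root = ET.fromstring(plugin_xml_text)
    # Assumption: no XML namespace, so the tag is literally "library".
    return [el.get("name") for el in root.iter("library") if el.get("name")]


def missing_libraries(plugin_xml_text, available_files):
    """Return declared library names that are absent from available_files."""
    available = set(available_files)
    return [name for name in declared_libraries(plugin_xml_text)
            if name not in available]


# Hypothetical, trimmed descriptor holding only the two entries changed above.
SAMPLE = """<plugin id="clustering-carrot2">
   <runtime>
      <library name="commons-collections-3.1-patched.jar"/>
      <library name="log4j-1.2.11.jar"/>
   </runtime>
</plugin>"""

if __name__ == "__main__":
    # Simulate a lib/ directory that still holds the old commons-collections jar.
    stale_lib_dir = ["commons-collections-3.0.jar", "log4j-1.2.11.jar"]
    print(missing_libraries(SAMPLE, stale_lib_dir))
```

In practice the second argument could come from os.listdir() on the plugin's
lib/ directory; an empty result means every declared jar is present.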



Re: svn commit: r290163 - in /lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2: ./ lib/

Posted by Andrzej Bialecki <ab...@getopt.org>.
Piotr Kosiorowski wrote:
> Hi Andrzej,
> Is anything related to clustering commits left? Or should we proceed 
> with 0.7.1 release?

I will commit the PDFBox update today, and then I don't have anything 
more...


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: svn commit: r290163 - in /lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2: ./ lib/

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hi Andrzej,
Is anything related to clustering commits left? Or should we proceed 
with 0.7.1 release?
Piotr