Posted to dev@nutch.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2010/02/05 13:09:27 UTC

[jira] Created: (NUTCH-787) Upgrade Lucene to 3.0.0.

Upgrade Lucene to 3.0.0.
------------------------

                 Key: NUTCH-787
                 URL: https://issues.apache.org/jira/browse/NUTCH-787
             Project: Nutch
          Issue Type: Task
          Components: build
            Reporter: Dawid Weiss
            Priority: Trivial




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.1.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847709#action_12847709 ] 

Hudson commented on NUTCH-787:
------------------------------

Integrated in Nutch-trunk #1101 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1101/])
     Upgrade to Lucene 3.0.1.


> Upgrade Lucene to 3.0.1.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Assignee: Andrzej Bialecki 
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-787:
--------------------------------

    Fix Version/s: 1.1

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847315#action_12847315 ] 

Andrzej Bialecki  commented on NUTCH-787:
-----------------------------------------

Using Lucene 3.0.1 artifacts I verified that your patch passes all tests and produces correct searchable indexes. I'll commit this shortly.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830085#action_12830085 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

Just did an initial check -- this should be doable, although it will result in a sizeable patch due to API changes and removed deprecated APIs. I think it still makes sense to try and push the 3.0 version of Lucene into Nutch, so I will keep working on this and seek help in reviewing the patch (and incompatible changes) once it's ready.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>


[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.1.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-787:
------------------------------------

    Assignee: Andrzej Bialecki 
     Summary: Upgrade Lucene to 3.0.1.  (was: Upgrade Lucene to 3.0.0.)

We're shooting at 3.0.1 now.

> Upgrade Lucene to 3.0.1.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Assignee: Andrzej Bialecki 
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-787:
------------------------------

    Attachment: NUTCH-787.patch

This patch moves Nutch from Lucene 2.9.1 to Lucene 3.0.0. All tests pass. The patch does not contain binary files (the Lucene JARs); these must be added and removed manually, as listed below.

D       src/plugin/summary-lucene/lib/lucene-highlighter-2.9.1.jar
A       src/plugin/summary-lucene/lib/lucene-highlighter-3.0.0.jar
D       src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-2.9.1.jar
A       src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-3.0.0.jar
D       lib/lucene-misc-2.9.1.jar
A       lib/lucene-core-3.0.0.jar
D       lib/lucene-core-2.9.1.jar
A       lib/lucene-misc-3.0.0.jar


> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.1.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847325#action_12847325 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

Thanks Andrzej.

> Upgrade Lucene to 3.0.1.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Assignee: Andrzej Bialecki 
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-787:
------------------------------

    Attachment:     (was: NUTCH-787.patch)

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846428#action_12846428 ] 

Andrzej Bialecki  commented on NUTCH-787:
-----------------------------------------

Lucene 3.0.1 is out now. I'll test this patch with the 3.0.1 artifacts and report back.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Updated: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated NUTCH-787:
------------------------------

    Attachment: NUTCH-787.patch

Text-patch of changes porting the code to Lucene 3.0.0.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846434#action_12846434 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

I'll be happy to help if I can. I admit I only ran the build tests -- some empirical crawls and other types of jobs would be more than desirable, but I don't have the infrastructure to do them.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830900#action_12830900 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

The failing test in TestIndexSorter is caused by the change of implementation inside Lucene. In Lucene 2.9, SegmentMerger calls IndexReader#document(int, FieldSelector), but in 3.0 this has been changed to a call to document(int):

        Document doc = reader.document(docCount);

Now, IndexSorter in Nutch overrides both methods and delegates to the superclass (IndexReader) after mapping old IDs to new IDs, but IndexReader re-delegates back to the overridden method, so the mapping is applied a second time and the IDs are effectively remapped back to their original values.
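To make the delegation problem concrete, here is a minimal plain-Java sketch. BaseReader, BuggySortingReader, FixedSortingReader and DoubleRemapDemo are hypothetical stand-ins, not the real Lucene IndexReader or Nutch IndexSorter classes; the sketch only assumes, as described above, that the one-argument document method re-delegates to the two-argument one. FixedSortingReader shows one possible way to apply the mapping exactly once; it is not necessarily how the patch resolves it.

```java
// Hypothetical stand-ins; none of these names are Lucene or Nutch classes.

/** Plays the role described for IndexReader: document(int) re-delegates to document(int, selector). */
class BaseReader {
    private final String[] docs;

    BaseReader(String[] docs) {
        this.docs = docs;
    }

    String document(int n) {
        // Virtual call: if a subclass overrides the two-argument method, that override runs.
        return document(n, null);
    }

    String document(int n, Object selector) {
        return docs[n];
    }
}

/** Remaps IDs in BOTH overrides, so a document(int) call gets remapped twice. */
class BuggySortingReader extends BaseReader {
    private final int[] idMap;

    BuggySortingReader(String[] docs, int[] idMap) {
        super(docs);
        this.idMap = idMap;
    }

    @Override
    String document(int n) {
        return super.document(idMap[n]); // mapping #1; BaseReader then calls back into the override below
    }

    @Override
    String document(int n, Object selector) {
        return super.document(idMap[n], selector); // mapping #2
    }
}

/** One possible fix: remap in exactly one place. */
class FixedSortingReader extends BaseReader {
    private final int[] idMap;

    FixedSortingReader(String[] docs, int[] idMap) {
        super(docs);
        this.idMap = idMap;
    }

    @Override
    String document(int n) {
        return document(n, null); // no mapping here; route through the single mapping point
    }

    @Override
    String document(int n, Object selector) {
        return super.document(idMap[n], selector);
    }
}

class DoubleRemapDemo {
    public static void main(String[] args) {
        String[] docs = {"doc-A", "doc-B", "doc-C"};
        int[] swap = {1, 0, 2}; // ID 0 <-> ID 1, ID 2 unchanged
        System.out.println(new BuggySortingReader(docs, swap).document(0)); // doc-A: remapped back
        System.out.println(new FixedSortingReader(docs, swap).document(0)); // doc-B: mapped once
    }
}
```

With a swap, applying the mapping twice restores the original ID, which matches the symptom described; for a general permutation the mapping is simply applied twice and the result is wrong in a less obvious way.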


> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830534#action_12830534 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

Definitely not an easy thing to do. I need to stop for today; the code compiles. Here's a brief summary of the changes:

- modified all filters and streams to use token attributes instead of raw Tokens. In many places I tried to be minimally intrusive so that the patch can be easily reviewed and accepted; improvements enabled by the new API can follow,

- replaced deprecated constants with their new equivalents (UN_TOKENIZED, etc.),

- there are no compressed fields any more, so this stuff is commented out.
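The switch from raw Tokens to token attributes changes the consumer pattern: instead of each call returning a fresh Token object, incrementToken() advances the stream and updates attribute objects shared by the whole filter chain. The plain-Java sketch below mimics that shape with hypothetical classes (TermAttr, AttrStream, WordSource, LowerCaseStage, AttrDemo); they are not Lucene classes. In Lucene 3.0 the real counterparts would be TokenStream, TokenFilter and a TermAttribute obtained via addAttribute.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins that mimic the shape of an attribute-based token stream; not Lucene classes.

/** A single mutable term slot shared by every stage of the chain. */
class TermAttr {
    String term = "";
}

/** Attribute-based stream: incrementToken() updates shared attributes instead of returning a Token. */
abstract class AttrStream {
    final TermAttr termAtt;

    AttrStream(TermAttr termAtt) {
        this.termAtt = termAtt;
    }

    /** Advances to the next token; returns false at the end of the stream. */
    abstract boolean incrementToken();
}

/** Source stage: writes each word into the shared attribute. */
class WordSource extends AttrStream {
    private final Iterator<String> words;

    WordSource(List<String> words) {
        super(new TermAttr());
        this.words = words.iterator();
    }

    @Override
    boolean incrementToken() {
        if (!words.hasNext()) {
            return false;
        }
        termAtt.term = words.next(); // overwrite in place; no new token object is created
        return true;
    }
}

/** Filter stage: reuses the upstream attribute instance and edits it in place. */
class LowerCaseStage extends AttrStream {
    private final AttrStream input;

    LowerCaseStage(AttrStream input) {
        super(input.termAtt); // same attribute instance as the upstream stage
        this.input = input;
    }

    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        termAtt.term = termAtt.term.toLowerCase(Locale.ROOT);
        return true;
    }
}

class AttrDemo {
    /** Consumer loop: read the value after each step, because the attribute is reused on the next call. */
    static List<String> drain(AttrStream stream) {
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(stream.termAtt.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        AttrStream chain = new LowerCaseStage(new WordSource(List.of("Nutch", "Lucene")));
        System.out.println(drain(chain)); // [nutch, lucene]
    }
}
```

Sharing one attribute instance across stages is what lets a filter modify the term without allocating a new object per token; the flip side is that consumers must not hold on to the attribute between calls.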
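For the renamed constants, the Lucene 2.x to 3.0 correspondences for Field.Index are UN_TOKENIZED to NOT_ANALYZED, TOKENIZED to ANALYZED and NO_NORMS to NOT_ANALYZED_NO_NORMS, while Field.Store.COMPRESS has no direct replacement. The helper below is a hypothetical text-level sketch (LuceneConstantMigration is an illustrative name, not part of the patch): it applies those renames to source lines and flags compressed-field usages for manual review. It is no substitute for recompiling against the 3.0 JARs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: rewrites removed Field.Index constants and flags Field.Store.COMPRESS. */
class LuceneConstantMigration {
    // Old (deprecated in 2.x, removed in 3.0) -> new constant names.
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        RENAMES.put("Field.Index.UN_TOKENIZED", "Field.Index.NOT_ANALYZED");
        RENAMES.put("Field.Index.TOKENIZED", "Field.Index.ANALYZED");
        RENAMES.put("Field.Index.NO_NORMS", "Field.Index.NOT_ANALYZED_NO_NORMS");
    }

    /** Applies every rename to one line; already-migrated names are left unchanged. */
    static String rewrite(String line) {
        String out = line;
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    /** Reports lines that use compressed stored fields, which need a manual decision. */
    static List<String> findRemovedUsages(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains("Field.Store.COMPRESS")) {
                hits.add("line " + (i + 1) + ": " + lines.get(i).trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "doc.add(new Field(\"url\", url, Field.Store.YES, Field.Index.UN_TOKENIZED));",
            "doc.add(new Field(\"content\", text, Field.Store.COMPRESS, Field.Index.TOKENIZED));");
        for (String line : source) {
            System.out.println(rewrite(line));
        }
        System.out.println(findRemovedUsages(source));
    }
}
```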

It would be great if as many people with Lucene/Nutch knowledge as possible could go through the patch and point out potential problems. At the moment one core test fails for me: TestIndexSorter. I don't know whether the difference in boosts results from Lucene changes or from a bug I introduced somewhere along the way. 



> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830902#action_12830902 ] 

Dawid Weiss commented on NUTCH-787:
-----------------------------------

O.K. I think this is ready for review, testing and integration. All built-in tests pass; it would be good if people could test it against their own indexes.

> Upgrade Lucene to 3.0.0.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Priority: Trivial
>         Attachments: NUTCH-787.patch
>
>


[jira] Closed: (NUTCH-787) Upgrade Lucene to 3.0.1.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-787.
-----------------------------------

    Resolution: Fixed

Committed. Thanks Dawid!

> Upgrade Lucene to 3.0.1.
> ------------------------
>
>                 Key: NUTCH-787
>                 URL: https://issues.apache.org/jira/browse/NUTCH-787
>             Project: Nutch
>          Issue Type: Task
>          Components: build
>            Reporter: Dawid Weiss
>            Assignee: Andrzej Bialecki 
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: NUTCH-787.patch
>
>

