Posted to dev@nutch.apache.org by "Filipe Antunes (JIRA)" <ji...@apache.org> on 2009/04/07 14:08:12 UTC

[jira] Created: (NUTCH-732) Subcollection plugin not working on Nutch-1.0

Subcollection plugin not working on Nutch-1.0
---------------------------------------------

                 Key: NUTCH-732
                 URL: https://issues.apache.org/jira/browse/NUTCH-732
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 1.0.0
         Environment: Mac OS X 10.5 intel
            Reporter: Filipe Antunes
            Priority: Critical


I am trying to get subcollections working using Nutch 1.0.
I configured subcollections.xml, then added the plugin in nutch-site.xml.
When indexing finished, I opened the index in Luke (the Lucene index browser) to check whether it was built properly.
The subcollection field is populated as it should be, but searching for any subcollection in Luke's Search tab returns no results.
If I search on the url field, I can see that every record has a subcollection associated with it, yet I can't search using the subcollection field.
Search examples in Luke:
subcollection:sub1 -> no results
url:sub1 -> results with field subcollection populated -> sub1

Same results using:
./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"

If I use "explain", the subcollection field is there with the correct word.

It makes no sense, so I believe it's a bug.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (NUTCH-732) Subcollection plugin not working on Nutch-1.0

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-732.
-----------------------------------

         Assignee: Andrzej Bialecki 
    Fix Version/s: 1.1
       Resolution: Fixed

Fixed in rev. 938592. Thanks!


[jira] Updated: (NUTCH-732) Subcollection plugin not working on Nutch-1.0

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-732:
------------------------------------

    Attachment: sub.patch

Turns out this was due to the way the list of applicable collections is built, and how that field is added to the indexing backend. First, the value is built with a leading space, creating collection names like ' nutch' instead of 'nutch'. Then, instead of tokenizing this field, the plugin passes it as-is, so the leading space is kept and prevents a query from matching it.
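
The cause described above can be reproduced with a small sketch. It assumes a simplified model in which an untokenized field is stored as one exact term and a field query matches only on exact term equality; the class and method names (SubcollectionBugSketch, buildFieldValue, matches) are hypothetical and are not Nutch's actual code.

```java
import java.util.List;

public class SubcollectionBugSketch {

    // Hypothetical reproduction of the buggy pattern: each matching
    // collection name is appended with a space in front of it, so the
    // resulting value always starts with a space.
    static String buildFieldValue(List<String> collectionNames) {
        String value = "";
        for (String name : collectionNames) {
            value += " " + name;
        }
        return value;
    }

    // Simplified model of an untokenized field: the whole value is one
    // term, and a query term matches only if it is exactly identical.
    static boolean matches(String storedTerm, String queryTerm) {
        return storedTerm.equals(queryTerm);
    }

    public static void main(String[] args) {
        String stored = buildFieldValue(List.of("sub1"));
        System.out.println("[" + stored + "]");       // prints "[ sub1]"
        System.out.println(matches(stored, "sub1"));  // prints "false"
    }
}
```

In this model the stored term is " sub1", so a query for sub1 never matches, which mirrors the "subcollection:sub1 -> no results" symptom from the report.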

I changed the logic that appends collection names, and made the field tokenized.
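
The two changes described above can be sketched under the same simplified model: names are joined with a space only between them (no leading space), and the field is tokenized on whitespace so each collection name becomes its own term. String.join and whitespace splitting are stand-ins chosen for illustration, not necessarily what sub.patch or the Lucene analyzer actually do; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SubcollectionFixSketch {

    // Join collection names with a single space between them, so no
    // leading or trailing space is produced.
    static String buildFieldValue(List<String> collectionNames) {
        return String.join(" ", collectionNames);
    }

    // Simplified stand-in for a tokenized field: trim, then split on runs
    // of whitespace so each collection name is a separate term.
    static List<String> tokenize(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // A field query matches if any indexed term equals the query term.
    static boolean matches(String fieldValue, String queryTerm) {
        return tokenize(fieldValue).contains(queryTerm);
    }

    public static void main(String[] args) {
        String value = buildFieldValue(List.of("sub1", "sub2"));
        System.out.println("[" + value + "]");         // prints "[sub1 sub2]"
        System.out.println(matches(value, "sub1"));    // prints "true"
        System.out.println(matches(value, "sub2"));    // prints "true"
        // Tokenizing also tolerates an old value with a leading space.
        System.out.println(matches(" sub1", "sub1"));  // prints "true"
    }
}
```

Either change alone would make sub1 searchable in this model; doing both also lets a page that belongs to several subcollections match a query on any one of them.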

I'll commit the patch shortly.
