Posted to dev@nutch.apache.org by "Filipe Antunes (JIRA)" <ji...@apache.org> on 2009/04/07 14:08:12 UTC
[jira] Created: (NUTCH-732) Subcollection plugin not working on Nutch-1.0
Subcollection plugin not working on Nutch-1.0
---------------------------------------------
Key: NUTCH-732
URL: https://issues.apache.org/jira/browse/NUTCH-732
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.0.0
Environment: Mac OS X 10.5 intel
Reporter: Filipe Antunes
Priority: Critical
I am trying to get subcollections working with Nutch-1.0.
I configured subcollections.xml, then added the plugin in nutch-site.xml.
When indexing finished, I opened the index in Luke to check that it had been built properly.
The subcollection field is populated as it should be, but searching for any subcollection on Luke's search tab returns no results.
If I search on the url field, I can see that every record has a subcollection associated with it, yet I can't search using the subcollection field.
Search examples in Luke:
subcollection:sub1 -> no results
url:sub1 -> results with field subcollection populated -> sub1
Same results using:
./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
If I use "explain", the subcollection field is there with the correct word.
It makes no sense, so I believe it's a bug.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (NUTCH-732) Subcollection plugin not working on Nutch-1.0
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki closed NUTCH-732.
-----------------------------------
Assignee: Andrzej Bialecki
Fix Version/s: 1.1
Resolution: Fixed
Fixed in rev. 938592. Thanks!
[jira] Updated: (NUTCH-732) Subcollection plugin not working on Nutch-1.0
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-732:
------------------------------------
Attachment: sub.patch
Turns out this was due to the way the list of applicable collections is built, and how that field is added to the indexing backend. First, the code puts a space in front of each name, creating collection names like ' nutch' instead of 'nutch'. Then, instead of tokenizing this field, it passes the value as is, so the leading space is kept and prevents the query from matching.
I changed the collection name appending logic and made the field tokenized.
I'll commit the patch shortly.
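For readers following along, here is a minimal sketch of the failure mode described above. It is not the code from sub.patch and does not use the Nutch or Lucene APIs; the class and method names are hypothetical, and the "index" is modelled as a plain list of terms. It only illustrates why a value stored as ' sub1' in an untokenized field never matches a query for 'sub1', and why building the list without the leading space and tokenizing the field both avoid the problem.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Nutch subcollection plugin code.
class SubcollectionFieldSketch {

    // Buggy variant: a space is added before every name, including the first,
    // so a single collection "sub1" becomes " sub1".
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // Fixed variant: the separator appears only between names.
    static String joinClean(List<String> names) {
        return String.join(" ", names);
    }

    // Models an untokenized field: the whole value is stored as one term.
    static List<String> termsUntokenized(String value) {
        return List.of(value);
    }

    // Models a tokenized field: the value is split on whitespace into terms.
    static List<String> termsTokenized(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Models an exact term query against the stored terms.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        List<String> names = List.of("sub1");

        String buggy = joinWithLeadingSpace(names);
        String clean = joinClean(names);

        // Leading space kept in an untokenized field: the term is " sub1".
        System.out.println(matches(termsUntokenized(buggy), "sub1"));
        // Clean value, tokenized field: the term is "sub1".
        System.out.println(matches(termsTokenized(clean), "sub1"));
        // Tokenizing alone also drops the stray space.
        System.out.println(matches(termsTokenized(buggy), "sub1"));
    }
}
```

In this model the first check fails and the other two succeed, which matches the symptom in the report: the field looks populated when a document is displayed, but a term query on it finds nothing.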