Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2010/04/27 20:04:35 UTC
[jira] Updated: (NUTCH-732) Subcollection plugin not working on Nutch-1.0
[ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-732:
------------------------------------
Attachment: sub.patch
It turns out this was caused by the way the list of applicable collections is built, and by how that field is added to the indexing backend. First, the code adds a leading space, creating collection names like ' nutch' instead of 'nutch'. Second, instead of tokenizing the field it passes the value as is, so the leading space is kept and a query for the plain collection name never matches.
I changed the logic that appends collection names, and made the field tokenized.
I'll commit the patch shortly.
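The two problems described above can be illustrated with a minimal, self-contained Java sketch. This is not the code from sub.patch and it does not use the Nutch or Lucene APIs: the class and method names (SubcollectionFieldSketch, joinWithLeadingSpace, joinClean, matchesUntokenized, matchesTokenized) are hypothetical, and splitting on whitespace only stands in for what a tokenizing analyzer would do.

```java
import java.util.Arrays;
import java.util.List;

public class SubcollectionFieldSketch {

    // Assumed shape of the bug: a space is written before every name,
    // including the first, so a single collection becomes " sub1".
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // One possible fix: put the separator only between names.
    static String joinClean(List<String> names) {
        return String.join(" ", names);
    }

    // Untokenized field: the whole value is a single term,
    // so a query term must equal it exactly.
    static boolean matchesUntokenized(String fieldValue, String queryTerm) {
        return fieldValue.equals(queryTerm);
    }

    // Tokenized field (simplified): the value is split on whitespace
    // and the query term is compared against each token.
    static boolean matchesTokenized(String fieldValue, String queryTerm) {
        for (String token : fieldValue.trim().split("\\s+")) {
            if (token.equals(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String buggy = joinWithLeadingSpace(Arrays.asList("sub1"));
        String fixed = joinClean(Arrays.asList("sub1", "sub2"));

        // The leading space makes the exact-match lookup fail.
        System.out.println(matchesUntokenized(buggy, "sub1"));
        // Tokenizing tolerates the stray space.
        System.out.println(matchesTokenized(buggy, "sub1"));
        // A tokenized field also lets each of several collections match.
        System.out.println(matchesTokenized(fixed, "sub2"));
    }
}
```

In this simplified model the stored value still displays as "sub1" to the eye, which is consistent with the reporter seeing the field populated while a query on subcollection:sub1 returns nothing.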
> Subcollection plugin not working on Nutch-1.0
> ---------------------------------------------
>
> Key: NUTCH-732
> URL: https://issues.apache.org/jira/browse/NUTCH-732
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.0.0
> Environment: Mac OS X 10.5 intel
> Reporter: Filipe Antunes
> Priority: Critical
> Attachments: sub.patch
>
>
> I am trying to get subcollections working with Nutch-1.0.
> I configured subcollections.xml, then added the plugin in nutch-site.xml.
> When indexing finished, I opened the index in Luke (the Lucene index browser) to check whether it was working properly.
> The subcollection field is populated as it should be, but searching for any subcollection on Luke's search tab returns no results.
> If I search on the url field, I can see that every record has a subcollection associated with it, yet I can't search using the subcollection field.
> Search examples in Luke:
> subcollection:sub1 -> no results
> url:sub1 -> results with field subcollection populated -> sub1
> Same results using:
> ./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
> If I use "explain", the subcollection field is there with the correct word.
> It makes no sense, so I believe it's a bug.