Posted to agent@nutch.apache.org by Filipe Antunes <fa...@tecnica.cc> on 2009/04/08 16:09:53 UTC

Subcollection plugin not working

I'm using Nutch 1.0.
My subcollections.xml config file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<subcollections>
  <subcollection>
    <name>sub1</name>
    <id>sub1</id>
    <whitelist>
      http://www.apache.org/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub2</name>
    <id>sub2</id>
    <whitelist>
      http://www.mysql.com/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub3</name>
    <id>sub3</id>
    <whitelist>
      http://www.redhat.com/
    </whitelist>
    <blacklist />
  </subcollection>
</subcollections>
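To sanity-check which pages the configuration above should place in each subcollection, here is a minimal Python sketch (not Nutch code) that parses the file and matches URLs against it. It assumes that each non-empty line of a <whitelist> or <blacklist> is one URL pattern, and that a pattern matches when it appears anywhere in the URL; the plugin's actual matching rule may differ. The helper names load_subcollections and subcollections_for are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample config mirroring the subcollections.xml above (XML declaration omitted).
SUBCOLLECTIONS_XML = """
<subcollections>
  <subcollection>
    <name>sub1</name>
    <id>sub1</id>
    <whitelist>
      http://www.apache.org/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub2</name>
    <id>sub2</id>
    <whitelist>
      http://www.mysql.com/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub3</name>
    <id>sub3</id>
    <whitelist>
      http://www.redhat.com/
    </whitelist>
    <blacklist />
  </subcollection>
</subcollections>
"""


def _entries(elem):
    """Split a <whitelist>/<blacklist> body into trimmed, non-empty lines."""
    if elem is None or elem.text is None:
        return []  # e.g. an empty <blacklist />
    return [line.strip() for line in elem.text.splitlines() if line.strip()]


def load_subcollections(xml_text):
    """Return a list of (name, whitelist, blacklist) tuples."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for sub in root.findall("subcollection"):
        name = sub.findtext("name", default="").strip()
        result.append((name, _entries(sub.find("whitelist")),
                       _entries(sub.find("blacklist"))))
    return result


def subcollections_for(url, subcollections):
    """Names of subcollections whose whitelist matches url and blacklist does not.

    ASSUMPTION: a pattern matches when it occurs anywhere in the URL
    (substring match); this is an illustration, not the plugin's code.
    """
    names = []
    for name, whitelist, blacklist in subcollections:
        if any(pattern in url for pattern in blacklist):
            continue  # blacklisted URLs are excluded outright
        if any(pattern in url for pattern in whitelist):
            names.append(name)
    return names


if __name__ == "__main__":
    subs = load_subcollections(SUBCOLLECTIONS_XML)
    print(subcollections_for("http://www.apache.org/foundation/", subs))
```

With this config each site maps to exactly one subcollection, and name and id are identical, so either value would show up as sub1, sub2 or sub3 in the index.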


After indexing, and after making sure the subcollection plugin was enabled
in nutch-site.xml, I checked the index with Luke.
The subcollection field was populated as expected with sub1, sub2 and sub3.
The problem is that when I search for anything associated with a
subcollection, I get zero results (in Luke).
The command line gives the same result:
./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 apache"
Total hits: 0
After running a normal search and following the explain link in the
search results, the subcollection value is also correct, but any search
of the form "subcollection:sub1 text" returns no results.
Could this be a bug?


-- 

CONFIDENTIALITY NOTICE: This message, as well as any attached
files, is confidential and intended exclusively for the individual(s)
named as addressees. If you are not the intended recipient, you are
kindly requested not to make any use whatsoever of its contents, to
destroy the message and to notify the sender.
DISCLAIMER: The sender of this message cannot ensure the security of
its electronic transmission and consequently does not accept liability
for anything that may affect the integrity of its content.