Posted to user@nutch.apache.org by Mark Jones <ma...@quovadx.com> on 2006/08/15 00:28:59 UTC

searching on multiple subcollections

Hi,
I noticed that the "subcollection" field holds multiple space-separated
subcollection values for documents that belong to more than one
subcollection. I read the sources and tried several query syntaxes,
without success.
Is there a way to search for membership in multiple subcollections?
 
A site:
site.foo.com/index.html
site.foo.com/faq.html
site.foo.com/contact.html
site.foo.com/bar/bar1.html
site.foo.com/foo/foo1.html
 
A subcollections.xml:
<subcollections>
    <subcollection>
        <name>foosite</name>
        <id>foosite</id>
        <whitelist>http://site.foo.com</whitelist>
        <blacklist />
    </subcollection>
    <subcollection>
        <name>foobar</name>
        <id>foobar</id>
        <whitelist>http://site.foo.com/bar</whitelist>
        <blacklist />
    </subcollection>
</subcollections>
 
subcollection field for site.foo.com/bar/bar1.html
"foosite foobar"
 
subcollection field for site.foo.com/foo/foo1.html
"foosite"
 
I would like to search for documents in subcollections
foosite AND foobar
foosite OR foobar
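
To make the two cases concrete, here is a small Python sketch of the
semantics being asked for. It is only a model of the desired behaviour,
not Nutch code or Nutch query syntax. The helper names
(subcollection_field, in_all, in_any) are hypothetical, the URLs are
written with an explicit http:// prefix, and it assumes a document
belongs to a subcollection when its URL contains one of that
subcollection's whitelist entries.

```python
# Illustrative model only: not Nutch's implementation or query syntax.
# Assumption: membership means the URL contains a whitelist entry.
WHITELISTS = {
    "foosite": ["http://site.foo.com"],
    "foobar": ["http://site.foo.com/bar"],
}


def subcollection_field(url):
    """Build the space-separated subcollection value for a URL."""
    return " ".join(
        name
        for name, entries in WHITELISTS.items()
        if any(entry in url for entry in entries)
    )


def in_all(field, wanted):
    """AND: the document is in every requested subcollection."""
    return set(wanted) <= set(field.split())


def in_any(field, wanted):
    """OR: the document is in at least one requested subcollection."""
    return bool(set(wanted) & set(field.split()))


bar1 = subcollection_field("http://site.foo.com/bar/bar1.html")
foo1 = subcollection_field("http://site.foo.com/foo/foo1.html")

print(bar1, "| AND:", in_all(bar1, ["foosite", "foobar"]))
print(foo1, "| AND:", in_all(foo1, ["foosite", "foobar"]))
print(foo1, "| OR:", in_any(foo1, ["foosite", "foobar"]))
```

In this model the AND search should return only bar1.html, while the OR
search should return every page on the site.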
 
Thanks,
 
Mark
 
Mark Jones
Sr. Systems Integration Specialist
Mark.Jones@quovadx.com