Posted to user@nutch.apache.org by ".: Abhishek :." <ab...@gmail.com> on 2011/02/07 02:27:34 UTC

Indexing question - Setting low boost

Hi all,

 I was looking at the following example,

 http://wiki.apache.org/nutch/WritingPluginExample

 In the example, the author sets a boost of 5.0f for the recommended tag.

 In the same way, can I set a boost value such that a tag or piece of
content is never indexed at all? If so, what would that boost value be? On a
related note, what content does Lucene index by default?

 Thanks a bunch for all your time and patience. Have a good day.

Cheers,
Abi

Re: Indexing question - Setting low boost

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi folks,

 Sorry for asking another question along the same lines. I am getting lost
among the various fragments of the source code.

 I am extending IndexingFilter to check the content for certain words and,
if they are present, return null. When I test this IndexingFilter (the test
class is modeled on TestMoreIndexingFilter), I see that I have to get the
content from the Parse object, as in parse.getText(). The content I get here
is "foo bar", which is passed in from the test class.

 Now, if I want the content of the URL that is passed in as a Text
(new Text("http://nutch.apache.org/index.html")), which of the arguments
should I be using?

 The filter method in the implementation is as follows,
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
            CrawlDatum datum, Inlinks inlinks) throws IndexingException {

        // Text extracted by the parser for this document.
        String content = parse.getText();
        System.out.println("Content : " + content);
        System.out.println("Contains : " + content.contains("nutch"));
        if (content.contains("nutch")) {
            System.out.println("Nutch keyword found! Hence not indexing the doc :)");
            // Returning null is meant to exclude the document from indexing.
            return null;
        }

        return doc;
    }

 The test method is as follows,

public void testRemoveIndex(){

        Configuration conf = NutchConfiguration.create();
        RemoveIndexingPlugin filter = new RemoveIndexingPlugin();
        filter.setConf(conf);
        assertNotNull(filter);
        NutchDocument doc = new NutchDocument();
        // "foo bar" is the text that parse.getText() returns inside the filter.
        ParseImpl parse = new ParseImpl("foo bar", new ParseData());
        try{
            filter.filter(doc, parse, new Text("http://nutch.apache.org/index.html"),
                    new CrawlDatum(), new Inlinks());
        }catch(Exception e){
            e.printStackTrace();
            fail(e.getMessage());
        }

    }


Am I doing something wrong, or do I have to write my own utility method to
get the content from the Text (the URL)? This looks quite different from the
HTMLParseFilter implementation. Should I be chaining the indexing filters or
something similar?

 I think I lack the depth of Nutch knowledge needed for this. Your
guidance would be much appreciated. Thanks!

./Abi


On Tue, Feb 8, 2011 at 9:23 AM, .: Abhishek :. <ab...@gmail.com> wrote:

> Thanks Arkadi. Thanks all for your patience and guidance.

Re: Indexing question - Setting low boost

Posted by ".: Abhishek :." <ab...@gmail.com>.
Thanks Arkadi. Thanks all for your patience and guidance.


On Tue, Feb 8, 2011 at 8:48 AM, <Ar...@csiro.au> wrote:

> You can exclude documents by returning NULL from an index filter.
>
> Regards,
>
> Arkadi

RE: Indexing question - Setting low boost

Posted by Ar...@csiro.au.
You can exclude documents by returning NULL from an index filter.

Regards,

Arkadi
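
A minimal sketch of the contract described above, written with stand-in types so that it runs without Nutch on the classpath. `Doc`, `DocFilter`, `rejectKeyword`, and `runFilters` are hypothetical names chosen for illustration and are not Nutch classes or methods; the only behaviour taken from this thread is that a filter returning null causes the document to be skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in types only: none of these are Nutch classes.
class NullFilterSketch {

    /** Hypothetical stand-in for a document: a bag of named fields. */
    static final class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for an indexing filter: return null to drop the document. */
    interface DocFilter {
        Doc filter(Doc doc, String parsedText, String url);
    }

    /** Builds a filter that drops any document whose parsed text contains the keyword. */
    static DocFilter rejectKeyword(String keyword) {
        return (doc, parsedText, url) ->
            parsedText != null && parsedText.contains(keyword) ? null : doc;
    }

    /** Runs the filters in order and returns null as soon as one of them rejects the document. */
    static Doc runFilters(List<DocFilter> filters, Doc doc, String parsedText, String url) {
        for (DocFilter f : filters) {
            doc = f.filter(doc, parsedText, url);
            if (doc == null) {
                return null; // rejected: the caller should not index this document
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<DocFilter> filters = new ArrayList<>();
        filters.add(rejectKeyword("nutch"));

        String url = "http://nutch.apache.org/index.html";
        Doc kept = runFilters(filters, new Doc(), "foo bar", url);
        Doc dropped = runFilters(filters, new Doc(), "about nutch", url);

        System.out.println("kept is null: " + (kept == null));
        System.out.println("dropped is null: " + (dropped == null));
    }
}
```

In this sketch the check looks only at the parsed text, so the first document passes even though its URL string contains "nutch".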

>-----Original Message-----
>From: .: Abhishek :. [mailto:ab1sh3k@gmail.com]
>Sent: Tuesday, February 08, 2011 11:44 AM
>To: markus.jelsma@openindex.io; user@nutch.apache.org
>Subject: Re: Indexing question - Setting low boost
>
>Hi folks,
>
> Some help would be appreciated. Thanks a bunch..
>
>Cheers,
>Abi

Re: Indexing question - Setting low boost

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi folks,

 Some help would be appreciated. Thanks a bunch..

Cheers,
Abi


On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <ab...@gmail.com> wrote:

> Hi,
>
>  Thanks again for your time and patience.
>
>  The boost makes sense now. I am kind of not sure how to exclude the entire
> document because there are only two methods,
>
>    - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>    CrawlDatum datum, Inlinks inlinks)
>        throws IndexingException
>    - public void addIndexBackendOptions(Configuration conf)
>
>
>  May be should I add nothing in the document and/or return a null??
>
> ./Abi

Re: Indexing question - Setting low boost

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi,

 Thanks again for your time and patience.

 The boost makes sense now. I am not sure how to exclude the entire
document, because the interface has only two methods:

   - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
     CrawlDatum datum, Inlinks inlinks) throws IndexingException
   - public void addIndexBackendOptions(Configuration conf)


 Should I add nothing to the document and/or return null?

./Abi

On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <ma...@openindex.io> wrote:

> Hi,
>
> A high boost depends on the index and query time boosts on other fields. If
> the
> highest boost on a field is N, then N*100 will certainly do the trick.
>
> I haven't studied the LuceneWriter but storing and indexing parameters are
> very familiar. Storing a field means it can be retrieved along with the
> document if it's queried. Having it indexed just means it can be queried.
> But
> this is about fields, not on the entire document itself.
>
> In an indexing filter you want to exclude the entire document.
>
> Cheers,

Re: Indexing question - Setting low boost

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

How high the boost needs to be depends on the index-time and query-time
boosts on other fields. If the highest boost on a field is N, then N*100 will
certainly do the trick.

I haven't studied the LuceneWriter, but storing and indexing parameters are
very familiar. Storing a field means it can be retrieved along with the
document when the document is returned by a query. Having it indexed just
means it can be queried. But this is about fields, not the entire document.

In an indexing filter you want to exclude the entire document.

Cheers,
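
To make the stored-versus-indexed distinction concrete, here is a toy in-memory model. It is not Lucene and not LuceneWriter; the class `StoredVsIndexedSketch` and its methods are hypothetical and exist only to illustrate that a stored field can be read back from a hit, while an indexed field can be matched by a search.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy model, not Lucene: it only illustrates "stored" versus "indexed".
class StoredVsIndexedSketch {

    // "field:term" -> ids of documents containing that term in that field
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> field values kept for retrieval
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    /** Adds one field of one document, optionally storing it and/or indexing it. */
    void addField(int docId, String field, String value, boolean store, boolean index) {
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
        if (index) {
            for (String term : value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents whose indexed field contains the term. */
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), new HashSet<>());
    }

    /** Returns the stored value of a field, or null if it was not stored. */
    String storedValue(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch idx = new StoredVsIndexedSketch();
        idx.addField(1, "note", "internal only", true, false); // stored, not indexed
        idx.addField(1, "body", "crawler text", false, true);  // indexed, not stored

        System.out.println("search note:internal -> " + idx.search("note", "internal"));
        System.out.println("stored note          -> " + idx.storedValue(1, "note"));
        System.out.println("search body:crawler  -> " + idx.search("body", "crawler"));
        System.out.println("stored body          -> " + idx.storedValue(1, "body"));
    }
}
```

Either way the document itself is still in the index, which is why excluding a whole document needs a different mechanism than field options.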

> Hi Markus,
> 
>  Thanks for the quick reply.
> 
>  Could you tell me a possible a value for the high boost such that its to
> be negated? or Is there a way I can calculate or find that out.
> 
>  Also, for the other approach on using indexing filter does the ("...",
> LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); does the work?
> 
> Thanks,
> Abi

Re: Indexing question - Setting low boost

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi Markus,

 Thanks for the quick reply.

 Could you suggest a possible value for the high boost so that the match is
effectively negated? Or is there a way I can calculate or find that out?

 Also, for the other approach using an indexing filter, would ("...",
LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the job?

Thanks,
Abi


On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <ma...@openindex.io> wrote:

> Hi,
>
> A negative boost does not exist and a very low boost is still a boost. In
> queries, you can work around the problem by giving a very high boost do
> documents that do not match; the negation parameter with a high boost will
> do
> the trick.
>
> If you don't want to index certain documents then you'll need an indexing
> filter. That's a different approach.
>
> Cheers,

Re: Indexing question - Setting low boost

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

A negative boost does not exist, and a very low boost is still a boost. In
queries, you can work around the problem by giving a very high boost to
documents that do not match; the negation parameter with a high boost will do
the trick.

If you don't want to index certain documents then you'll need an indexing 
filter. That's a different approach.

Cheers,

> Hi all,
> 
>  I was looking at the following example,
> 
>  http://wiki.apache.org/nutch/WritingPluginExample
> 
>  In the example, the author sets a boost of 5.0f for the recommended tag.
> 
>  In this same way, can I also set a boost value such that a tag or content
> is never indexed at all? If so, what would be the boost value? On a related
> note, what are the default content that are usually(by default) indexed by
> Lucene?
> 
>  Thanks a bunch for all your time and patience. Have a good day.
> 
> Cheers,
> Abi