Posted to user@nutch.apache.org by Beats <ta...@yahoo.com> on 2009/07/10 09:01:51 UTC

Indexing each item in a separate page

Hi,

I'm new to Nutch.
I'm trying to crawl and index an RSS feed using the feed plugin.

What I want is to parse the RSS page and index each item's content
separately, so that when a user searches, the content of the matching
item is searched and displayed (not the content of the whole RSS feed page).

Any suggestions would be appreciated.

Thanks in advance,

Beats
-- 
View this message in context: http://www.nabble.com/indexing-each-item-in-seperate-page-tp24422674p24422674.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Indexing each item in a separate page

Posted by Doğacan Güney <do...@gmail.com>.
On Fri, Jul 10, 2009 at 13:21, Beats<ta...@yahoo.com> wrote:
>
> Hi,
>
> Thanks for the help,
>
> but it is giving a parsing error. Are there other changes to be made?
>
> The error is:
> fetcher.Fetcher (Fetcher.java:output(796)) - Error parsing:
> http://www.indeed.co.in/rss: failed(2,0)
>

See http://www.indeed.co.in/robots.txt

/rss is disallowed there, so Nutch doesn't crawl it.
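
A quick way to check this kind of block outside Nutch is to run the URL through a robots.txt parser. The following is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt body, the "Nutch" agent string, and the /jobs path are assumptions made up for illustration; they are not copied from the real site or from a Nutch configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for illustration only; this is NOT the
# real file served by www.indeed.co.in.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /rss
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Nutch" is an assumed agent name; /jobs is a made-up allowed path.
    for url in ("http://www.indeed.co.in/rss", "http://www.indeed.co.in/jobs"):
        print(url, "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "Nutch", url) else "blocked")
```

To test against the live rules instead, the parser's set_url() and read() methods can fetch the real robots.txt before calling can_fetch().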




-- 
Doğacan Güney

Re: Indexing each item in a separate page

Posted by Beats <ta...@yahoo.com>.
Hi,

Thanks for the help,

but it is giving a parsing error. Are there other changes to be made?

The error is:
fetcher.Fetcher (Fetcher.java:output(796)) - Error parsing:
http://www.indeed.co.in/rss: failed(2,0)


Doğacan Güney-3 wrote:
> 
> On Fri, Jul 10, 2009 at 10:01, Beats<ta...@yahoo.com> wrote:
>>
>> Hi,
>>
>> I'm new to Nutch.
>> I'm trying to crawl and index an RSS feed using the feed plugin.
>>
>> What I want is to parse the RSS page and index each item's content
>> separately, so that when a user searches, the content of the matching
>> item is searched and displayed (not the content of the whole RSS feed page).
>>
> 
> Try using the feed plugin. It extracts each item in the RSS feed as a
> separate page.
> 



Re: Indexing each item in a separate page

Posted by Doğacan Güney <do...@gmail.com>.
On Fri, Jul 10, 2009 at 10:01, Beats<ta...@yahoo.com> wrote:
>
> Hi,
>
> I'm new to Nutch.
> I'm trying to crawl and index an RSS feed using the feed plugin.
>
> What I want is to parse the RSS page and index each item's content
> separately, so that when a user searches, the content of the matching
> item is searched and displayed (not the content of the whole RSS feed page).
>

Try using the feed plugin. It extracts each item in the RSS feed as a separate page.
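
Conceptually, per-item extraction means turning one feed document into one record per <item>, each with its own URL and content. The sketch below illustrates that idea in Python with the standard library's xml.etree.ElementTree. It is not the Nutch feed plugin's code; the sample feed, the split_feed function, and the field names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>Content of the first item.</description>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <description>Content of the second item.</description>
    </item>
  </channel>
</rss>
"""


def split_feed(feed_xml: str) -> list:
    """Return one dict per <item>, so each item can be indexed on its own."""
    root = ET.fromstring(feed_xml)
    docs = []
    for item in root.iter("item"):
        docs.append({
            # The item's own link serves as the document URL,
            # rather than the URL of the feed page.
            "url": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "content": item.findtext("description", default=""),
        })
    return docs


if __name__ == "__main__":
    for doc in split_feed(SAMPLE_FEED):
        print(doc["url"], "|", doc["title"], "|", doc["content"])
```

A search over records like these matches the text of a single item instead of the text of the whole feed page.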




-- 
Doğacan Güney