Posted to user@nutch.apache.org by "Thumuluri, Sai" <Sa...@VerizonWireless.com> on 2010/09/07 21:08:20 UTC

Solr and Nutch

Hi - I am trying to crawl using Nutch and index content using Solr. I
have some custom metadata in my html source files that I need to extract
from Nutch to Solr - has anyone done this successfully and if so - can
you please direct as to how to accomplish this?

Thanks,
Sai


RE: Solr and Nutch

Posted by Markus Jelsma <ma...@buyways.nl>.
You will need to configure Nutch to extract those elements and assign them to a field. Then add that field to your Solr schema, and don't forget to add both the Nutch (source) field and the Solr (destination) field to Nutch's solrmapping configuration.
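To make the first step above concrete: assuming the Nutch 1.x plugin model, HTML parse filters receive the parsed page as a DOM (`DocumentFragment`), so extracting custom metadata amounts to walking that DOM for `<meta>` elements and copying selected values into named fields. The Java sketch below shows only that idea. It uses the JDK's own XML DOM parser on well-formed markup as a stand-in for Nutch's HTML parser. `MetaFieldExtractor` and the field names are hypothetical, and a real plugin would implement Nutch's parse-filter extension point rather than this class.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch (not part of Nutch): pull selected
 * <meta name="..." content="..."> values out of a DOM and assign each to a field.
 */
public class MetaFieldExtractor {

    /** Returns lower-cased meta name -> content for every meta whose name is in wanted. */
    public static Map<String, String> extract(Document doc, Set<String> wanted) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // HTML meta names are case-insensitive, so normalise before matching.
            String name = meta.getAttribute("name").toLowerCase(Locale.ROOT);
            if (wanted.contains(name)) {
                fields.put(name, meta.getAttribute("content"));
            }
        }
        return fields;
    }

    /** Parses well-formed (XHTML-style) markup; real-world HTML needs a lenient parser. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<meta name=\"department\" content=\"billing\"/>"
                + "<meta name=\"Doc-Owner\" content=\"jdoe\"/>"
                + "<meta name=\"keywords\" content=\"ignored\"/>"
                + "</head><body><p>hello</p></body></html>";
        Map<String, String> fields = extract(parse(page), Set.of("department", "doc-owner"));
        System.out.println(fields); // prints {department=billing, doc-owner=jdoe}
    }
}
```

Matching against an explicit set of wanted names keeps unrelated tags such as `keywords` out of the index.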
-----Original message-----
From: Thumuluri, Sai <Sa...@VerizonWireless.com>
Sent: Tue 07-09-2010 21:09
To: user@nutch.apache.org; 
Subject: Solr and Nutch

Hi - I am trying to crawl using Nutch and index content using Solr. I
have some custom metadata in my html source files that I need to extract
from Nutch to Solr - has anyone done this successfully and if so - can
you please direct as to how to accomplish this?

Thanks,
Sai


Re: Solr and Nutch

Posted by Yavuz Selim YILMAZ <yv...@gmail.com>.
Also, for HTML, must the metadata be in the "head", or can it be in the "body"?
--

Yavuz Selim YILMAZ


2010/9/8 Yavuz Selim YILMAZ <yv...@gmail.com>

> For more than one field, do I need to define a new plugin for each new
> metadata element?
>
> Different pages have different extra metadata; would that then be
> configured in schema.xml and solrmapping.xml?
> --
>
> Yavuz Selim YILMAZ
>
>

Re: Solr and Nutch

Posted by André Ricardo <an...@gmail.com>.
One plugin can add multiple different fields.

In schema.xml you can declare the new fields coming from Nutch, but I
don't really know about solrmapping.xml.
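The point that one plugin can add several fields also covers the case of pages carrying different extra metadata: the plugin can be configured with a list of keys and simply skip the ones a given page lacks. Below is a minimal Java sketch of that design. The class, keys, and field names are hypothetical; it mimics the role of an indexing step that copies parse-time metadata into document fields, not Nutch's actual interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: one filter that copies several metadata entries into
 * several document fields. Models the role of an indexing filter, not its API.
 */
public class MultiFieldFilter {

    // metadata key (as extracted at parse time) -> index field name
    private final Map<String, String> keyToField;

    public MultiFieldFilter(Map<String, String> keyToField) {
        this.keyToField = new LinkedHashMap<>(keyToField);
    }

    /** Adds one field per configured key that the page actually has; absent keys are skipped. */
    public Map<String, String> filter(Map<String, String> pageMetadata) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : keyToField.entrySet()) {
            String value = pageMetadata.get(e.getKey());
            if (value != null && !value.isEmpty()) {
                doc.put(e.getValue(), value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        MultiFieldFilter f = new MultiFieldFilter(Map.of(
                "department", "meta_department",
                "doc-owner", "meta_owner"));
        // Page A carries both custom tags, page B only one of them.
        System.out.println(f.filter(Map.of("department", "billing", "doc-owner", "jdoe")));
        System.out.println(f.filter(Map.of("department", "sales")));
    }
}
```

Because the key list is configuration rather than code, adding another metadata element means one more map entry (plus the matching schema and mapping entries), not another plugin.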


On 10/09/08 07:35, Yavuz Selim YILMAZ wrote:
> For more than one field, do I need to define a new plugin for each new
> metadata element?
>
> Different pages have different extra metadata; would that then be
> configured in schema.xml and solrmapping.xml?
> --
>
> Yavuz Selim YILMAZ
>
>

RE: Solr and Nutch

Posted by "Thumuluri, Sai" <Sa...@VerizonWireless.com>.
Thank you. We are able to see the metadata on the Nutch side using bin/nutch org.apache.nutch.parse.ParserChecker *, but cannot see it on the Solr side. We have added the metadata fields to solrmapping and also checked our schema.xml on both Nutch and Solr. Are there any additional configuration files involved?

Thanks,
Sai Thumuluri
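If the metadata is visible in ParserChecker output but absent in Solr, the values are being lost somewhere between parsing and indexing. One cheap thing to rule out is a naming mismatch between the mapping file and Solr's schema.xml. The Java sketch below cross-checks the two. It assumes the mapping file lists `<field source="..." dest="..."/>` entries and that schema.xml declares `<field name="..."/>` entries. It ignores `dynamicField` patterns, and `MappingSchemaCheck` is a hypothetical helper, not a Nutch or Solr tool.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: report mapped destination fields that the Solr schema
 * does not declare. Assumed formats: <field source=".." dest=".."/> in the
 * mapping file, <field name=".."/> in schema.xml. dynamicField is not checked.
 */
public class MappingSchemaCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Collects one attribute from every <field> element in the document. */
    static List<String> fieldAttr(Document doc, String attr) {
        List<String> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            String v = ((Element) nodes.item(i)).getAttribute(attr);
            if (!v.isEmpty()) {
                out.add(v);
            }
        }
        return out;
    }

    /** Returns the dest names from the mapping that schema.xml does not declare. */
    public static List<String> missingInSchema(String mappingXml, String schemaXml) {
        Set<String> declared = new HashSet<>(fieldAttr(parse(schemaXml), "name"));
        List<String> missing = new ArrayList<>();
        for (String dest : fieldAttr(parse(mappingXml), "dest")) {
            if (!declared.contains(dest)) {
                missing.add(dest);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String mapping = "<mapping><fields>"
                + "<field dest=\"content\" source=\"content\"/>"
                + "<field dest=\"department\" source=\"department\"/>"
                + "</fields><uniqueKey>id</uniqueKey></mapping>";
        String schema = "<schema name=\"nutch\"><fields>"
                + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "<field name=\"content\" type=\"text\" indexed=\"true\" stored=\"true\"/>"
                + "</fields></schema>";
        System.out.println(missingInSchema(mapping, schema)); // prints [department]
    }
}
```

An empty result only shows the two files agree on names; it does not prove the values ever reach the indexing step.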

-----Original Message-----
From: André Ricardo [mailto:andric87@gmail.com] 
Sent: Wednesday, September 08, 2010 9:49 AM
To: user@nutch.apache.org
Subject: Re: Solr and Nutch


One plugin can add multiple different fields.

In schema.xml you can declare the new fields coming from Nutch, but I
don't really know about solrmapping.xml.

Re: Solr and Nutch

Posted by Yavuz Selim YILMAZ <yv...@gmail.com>.
For more than one field, do I need to define a new plugin for each new
metadata element?

Different pages have different extra metadata; would that then be
configured in schema.xml and solrmapping.xml?
--

Yavuz Selim YILMAZ


2010/9/7 André Ricardo <an...@gmail.com>

> Hello Sai,
>
> First you need to extract those elements as Markus said. Please see this
> page on how to add custom metadata to Nutch:
> http://wiki.apache.org/nutch/HowToMakeCustomSearch
>
> Then you need to add that custom metadata to schema.xml in Solr:
> http://wiki.apache.org/nutch/RunningNutchAndSolr
>
>
> Cheers,
> André
>

Re: Solr and Nutch

Posted by André Ricardo <an...@gmail.com>.
Hello Sai,

First you need to extract those elements as Markus said. Please see this
page on how to add custom metadata to Nutch:
http://wiki.apache.org/nutch/HowToMakeCustomSearch

Then you need to add that custom metadata to schema.xml in Solr:
http://wiki.apache.org/nutch/RunningNutchAndSolr


Cheers,
André

On Tue, Sep 7, 2010 at 8:08 PM, Thumuluri, Sai <
Sai.Thumuluri@verizonwireless.com> wrote:

> Hi - I am trying to crawl using Nutch and index content using Solr. I
> have some custom metadata in my html source files that I need to extract
> from Nutch to Solr - has anyone done this successfully and if so - can
> you please direct as to how to accomplish this?
>
> Thanks,
> Sai
>
>