Posted to solr-user@lucene.apache.org by abhayd <aj...@hotmail.com> on 2011/07/29 17:26:18 UTC

combining xml and nutch index in solr

hi 

I have an XML file with details such as url, category, subcategory and
title.

We crawl the URLs from that XML file using Nutch. Is there any way for us
to merge both?

The schema would look like this:

url
category
subcategory
title
crawl_data_summary_from_nutch
crawl_data_body_content_from_nutch
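A minimal Python sketch of reading the metadata file into one record per URL. It assumes, hypothetically, that the file looks like `<items><item><url>…</url><category>…</category>…</item></items>`. The element names are guesses, not the poster's actual format. The two crawl_* fields are left out here because Nutch would supply them.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout; adjust the tag names to the real file.
SAMPLE = """<items>
  <item>
    <url>http://example.com/a</url>
    <category>phones</category>
    <subcategory>android</subcategory>
    <title>Page A</title>
  </item>
</items>"""

# Fields that come from the XML file (the crawl_* fields come from Nutch).
XML_FIELDS = ("url", "category", "subcategory", "title")


def parse_metadata(xml_text):
    """Return one dict per <item>, keyed by the schema field names."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        record = {}
        for name in XML_FIELDS:
            # findtext returns the default when the child element is missing.
            record[name] = item.findtext(name, default="").strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in parse_metadata(SAMPLE):
        print(rec)
```

Missing child elements become empty strings, so every record has the same set of keys.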

Any solution for this?

thanks
abhay


--
View this message in context: http://lucene.472066.n3.nabble.com/combining-xml-and-nutch-index-in-solr-tp3209911p3209911.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: combining xml and nutch index in solr

Posted by abhayd <aj...@hotmail.com>.
hi
thanks, that's exactly what I want.

As far as I know, we cannot update a Solr index with partial values: Solr
does not update the existing record in place, the whole document gets
recreated.

So I'm not sure how the solrindex command will work here.
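In the Solr 3.x releases of this period, adding a document whose unique key matches an existing one replaces the whole document rather than individual fields. One way around that is to merge both sources per URL on the client side and send each complete document once. The sketch below assumes `url` is the schema's uniqueKey. `merge_by_url` is a hypothetical helper, not part of Solr or Nutch.

```python
def merge_by_url(xml_records, nutch_records):
    """Combine metadata and crawl fields into one complete document per URL.

    Both inputs are lists of dicts that carry a "url" key. Nutch fields are
    copied onto the metadata record for the same URL, overwriting any field
    with the same name. URLs found in only one source are kept as well.
    """
    merged = {}
    for rec in xml_records:
        merged[rec["url"]] = dict(rec)
    for rec in nutch_records:
        # setdefault creates a bare record for URLs missing from the XML file.
        merged.setdefault(rec["url"], {"url": rec["url"]}).update(rec)
    return list(merged.values())


if __name__ == "__main__":
    xml_side = [{"url": "http://example.com/a", "category": "phones",
                 "subcategory": "android", "title": "Page A"}]
    nutch_side = [{"url": "http://example.com/a",
                   "crawl_data_summary_from_nutch": "summary text",
                   "crawl_data_body_content_from_nutch": "body text"}]
    for doc in merge_by_url(xml_side, nutch_side):
        print(doc)
```

Because each merged dict is complete, it no longer matters that a re-add replaces the old document.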


Re: combining xml and nutch index in solr

Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, Jul 29, 2011 at 8:56 PM, abhayd <aj...@hotmail.com> wrote:
> hi
>
> I have a xml file which has url, category,subcategory, title kind of
> details.
>
> and we crawl the urls in xml using Nutch. Anyway for use to merge both?
[...]

Not sure that I follow your requirements, and it has
been some time since I used Nutch. But, if I understand
correctly, you should be able to do the following:
* Populate the Solr index from the XML file, leaving
  the crawl_data_summary and crawl_data_body_content
  fields blank.
* Crawl the URLs with Nutch, and use its solrindex command
  to fill these two fields.
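For the first step, the metadata could be sent to Solr as an XML update message. The sketch below only builds the `<add><doc><field name=…>` payload. POSTing it to the update handler (for example `http://localhost:8983/solr/update`, an assumed local URL) and committing are left out. Empty values are skipped, so the crawl fields are simply absent from the document. Keep in mind the caveat raised earlier in the thread: a later add with the same unique key replaces the whole document. `build_add_message` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Serialize records as a Solr XML update message: <add><doc>...</doc></add>."""
    add = ET.Element("add")
    for rec in records:
        doc = ET.SubElement(add, "doc")
        for name, value in rec.items():
            # Skip empty values so those fields are absent from the document.
            if not value:
                continue
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    # ElementTree escapes &, < and > in text content during serialization.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    records = [{"url": "http://example.com/a?x=1&y=2", "category": "phones",
                "subcategory": "android", "title": "Page A",
                "crawl_data_summary_from_nutch": ""}]
    print(build_add_message(records))
```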

Regards,
Gora