Posted to solr-user@lucene.apache.org by abhayd <aj...@hotmail.com> on 2011/07/29 17:26:18 UTC
combining xml and nutch index in solr
Hi,
I have an XML file that contains details such as url, category, subcategory, and title.
We also crawl the URLs listed in that XML file using Nutch. Is there any way for us to merge both?
For example, the schema would look like:
url
category
subcategory
title
crawl_data_summary_from_nutch
crawl_data_body_content_from_nutch
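
To make the field list above concrete, here is a minimal sketch of reading such an XML file into one record per URL, with the two crawl fields left empty. The element names (`items`, `item`) and the sample values are assumptions for illustration only; the actual layout of the XML file is not given in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout of the XML file; the real element names may differ.
SAMPLE_XML = """
<items>
  <item>
    <url>http://example.com/a</url>
    <category>books</category>
    <subcategory>fiction</subcategory>
    <title>Page A</title>
  </item>
</items>
"""

METADATA_FIELDS = ("url", "category", "subcategory", "title")
CRAWL_FIELDS = (
    "crawl_data_summary_from_nutch",
    "crawl_data_body_content_from_nutch",
)


def parse_metadata(xml_text):
    """Return one dict per <item>, with the crawl fields left empty."""
    root = ET.fromstring(xml_text.strip())
    docs = []
    for item in root.iter("item"):
        # Copy the metadata fields that come from the XML file.
        doc = {name: (item.findtext(name) or "").strip() for name in METADATA_FIELDS}
        # Placeholders for the fields that the crawl is expected to fill.
        doc.update({name: "" for name in CRAWL_FIELDS})
        docs.append(doc)
    return docs


docs = parse_metadata(SAMPLE_XML)
```

Each resulting dict has all six fields, so it matches the shape of the schema sketched above.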
Any solution for this?
thanks
abhay
--
View this message in context: http://lucene.472066.n3.nabble.com/combining-xml-and-nutch-index-in-solr-tp3209911p3209911.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: combining xml and nutch index in solr
Posted by abhayd <aj...@hotmail.com>.
Hi,
Thanks, that's exactly what I want.
As far as I know, we cannot update the Solr index with partial values: Solr does
not update the existing index record, it recreates it.
So I'm not sure how the solrindex command will work here.
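
The concern can be illustrated with a toy model. This is not Solr code: `ToyIndex` is a hypothetical in-memory stand-in that only mimics the "re-adding a document with the same key replaces the whole record" behaviour described above.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed on a unique field (here: url)."""

    def __init__(self, key="url"):
        self.key = key
        self.docs = {}

    def add(self, doc):
        # Overwrite semantics: the new document replaces the old one entirely.
        self.docs[doc[self.key]] = dict(doc)


index = ToyIndex()

# First pass: metadata from the XML file.
index.add({"url": "http://example.com/a", "category": "books", "title": "Page A"})

# Second pass: a document that carries only the crawl field.
index.add({"url": "http://example.com/a",
           "crawl_data_summary_from_nutch": "summary text"})

stored = index.docs["http://example.com/a"]
metadata_lost = "category" not in stored and "title" not in stored
```

Under these overwrite semantics the second add drops `category` and `title`, which is why the second pass would need to send the complete document rather than only the new fields.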
--
View this message in context: http://lucene.472066.n3.nabble.com/combining-xml-and-nutch-index-in-solr-tp3209911p3218125.html
Re: combining xml and nutch index in solr
Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, Jul 29, 2011 at 8:56 PM, abhayd <aj...@hotmail.com> wrote:
> hi
>
> I have a xml file which has url, category,subcategory, title kind of
> details.
>
> and we crawl the urls in xml using Nutch. Anyway for use to merge both?
[...]
Not sure that I follow your requirements, and it has
been some time since I used Nutch. But, if I understand
correctly, you should be able to do the following:
* Populate the Solr index from the XML file, leaving
the crawl_data_summary and crawl_data_body_content
fields blank.
* Crawl the URLs with Nutch, and use its solrindex command
to fill these two fields.
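
As a rough picture of what the two steps are meant to produce, the sketch below joins metadata records and crawl records on url in plain Python. It is only an illustration under stated assumptions: it does not call Nutch or Solr, it does not describe how the solrindex command behaves, and the crawl record keys (`summary`, `content`) and the function name `merge_by_url` are hypothetical.

```python
def merge_by_url(metadata_docs, crawl_docs):
    """Join XML metadata with crawl output on url, returning complete documents."""
    crawl_by_url = {d["url"]: d for d in crawl_docs}
    merged = []
    for meta in metadata_docs:
        # URLs that were not crawled keep empty crawl fields.
        crawl = crawl_by_url.get(meta["url"], {})
        doc = dict(meta)
        doc["crawl_data_summary"] = crawl.get("summary", "")
        doc["crawl_data_body_content"] = crawl.get("content", "")
        merged.append(doc)
    return merged


metadata = [
    {"url": "http://example.com/a", "category": "books",
     "subcategory": "fiction", "title": "Page A"},
    {"url": "http://example.com/b", "category": "music",
     "subcategory": "jazz", "title": "Page B"},
]
crawled = [
    {"url": "http://example.com/a", "summary": "short summary",
     "content": "full page text"},
]

full_docs = merge_by_url(metadata, crawled)
```

Each entry in `full_docs` carries both the metadata and the crawl fields, so it could be indexed as a single complete document per URL.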
Regards,
Gora