Posted to solr-user@lucene.apache.org by celebis <ce...@gmail.com> on 2015/01/13 15:14:41 UTC

Getting error while indexing XML files on Hadoop


Hi to all from Istanbul, Turkey,

I'm new to both Solr and Hadoop.

I’m trying to index XML files (ipod_other.xml from LucidWorks’ example
files, converted into sequence file format) using the SolrXMLIngestMapper jars.
I’ve modified the schema.xml file by adding the fields that appear in the
ipod_other.xml file.

*Here’s my command:*
hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob
-Dlww.commit.on.close=true -cls
com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c hdp1  -i
/user/hadoop/output/1420812982906sfu/part-r-00000 -of
com.lucidworks.hadoop.io.LWMapRedOutputFormat -s
http://dc2vmhadappt01:8983/solr


In the end I constantly get a "Didn’t ingest any documents, failing" error.

Can anybody help me out with this problem? Any help is appreciated.

Thanks

*Here are the additions to the schema.xml:*

<field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false" /> 
<field name="name" multiValued="true" stored="true"  type="text_en"
indexed="true"/>
<field name="sku" type="text_en_splitting_tight" indexed="true"
stored="true" omitNorms="true"/>
<field name="manu" type="text_general" indexed="true" stored="true"
omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true"
multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true"
termVectors="true" termPositions="true" termOffsets="true" />

<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price"  type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<field name="inStock" type="boolean" indexed="true" stored="true" />

<field name="store" type="location" indexed="true" stored="true"/>

<dynamicField name="*_dt"  type="date"    indexed="true"  stored="true"/>

<field name="data_source" stored="false" type="text_en" indexed="true"/> 


*And here is the ipod_other.xml file:*

<add>

<doc>
  <field name="id">F8V7067-APL-KIT</field>
  <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
  <field name="manu">Belkin</field>
  <field name="cat">electronics</field>
  <field name="cat">connector</field>
  <field name="features">car power adapter, white</field>
  <field name="weight">4</field>
  <field name="price">19.95</field>
  <field name="popularity">1</field>
  <field name="inStock">false</field>
  
  <field name="store">45.17614,-93.87341</field>
  <field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
</doc>

<doc>
  <field name="id">IW-02</field>
  <field name="name">iPod &amp; iPod Mini USB 2.0 Cable</field>
  <field name="manu">Belkin</field>
  <field name="cat">electronics</field>
  <field name="cat">connector</field>
  <field name="features">car power adapter for iPod, white</field>
  <field name="weight">2</field>
  <field name="price">11.50</field>
  <field name="popularity">1</field>
  <field name="inStock">false</field>
  
  <field name="store">37.7752,-122.4232</field>
  <field name="manufacturedate_dt">2006-02-14T23:55:59Z</field>
</doc>


</add>
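
To rule out a plain schema mismatch, here is a minimal Python sketch that parses a Solr <add> file and lists any field name not covered by the schema additions above. It is not part of the LucidWorks tooling: the function names are made up for illustration, and the field list and the *_dt pattern are simply copied by hand from this post. It only checks field names, so it says nothing about the sequence file conversion or the Hadoop job itself.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Assumption: explicit field names copied by hand from the schema.xml additions above.
DECLARED_FIELDS = {
    "id", "name", "sku", "manu", "cat", "features", "includes",
    "weight", "price", "popularity", "inStock", "store", "data_source",
}

# Assumption: dynamic field patterns copied by hand from the schema.xml additions above.
DYNAMIC_PATTERNS = ["*_dt"]


def is_declared(field_name):
    """Return True if the name matches an explicit field or a dynamic pattern."""
    if field_name in DECLARED_FIELDS:
        return True
    return any(fnmatch.fnmatchcase(field_name, p) for p in DYNAMIC_PATTERNS)


def count_docs(xml_text):
    """Count the <doc> elements directly under the root <add> element."""
    return len(ET.fromstring(xml_text).findall("doc"))


def undeclared_fields(xml_text):
    """Return the sorted field names used in the file but not covered by the lists above."""
    root = ET.fromstring(xml_text)
    missing = set()
    for doc in root.iter("doc"):
        for field in doc.iter("field"):
            # A <field> without a name attribute is reported under a placeholder.
            name = field.get("name") or "<unnamed>"
            if not is_declared(name):
                missing.add(name)
    return sorted(missing)
```

Reading ipod_other.xml into a string and passing it to undeclared_fields() should give an empty list if every field in the file is covered; count_docs() on the same string confirms how many documents the parser actually sees.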





