Posted to solr-user@lucene.apache.org by zhk011 <zh...@hotmail.com> on 2012/12/11 02:48:04 UTC

How to parse XML attributes with prefix using DIH?

Hi there,

I'm new to Solr and DIH, and I've recently been planning to use Solr/DIH to
index some local XML files. Following the DIH example page on the Solr wiki,
most things work fine, but I found that XML attributes with a prefix cannot be
parsed. Take the following XML file to be indexed, for instance:
-----------------------------------------------------------
<book xmlns:bk='urn:samples' bk:genre='novel' self='test1'>
  <id>test</id>
  <title >Pride And Prejudice</title>
</book>
-----------------------------------------------------------

The relevant part of my data-config.xml looks like this:
-----------------------------------------------------------
<field column="tsip.action" xpath="/book/@xmlns:bk"/>
<field column="tsip.cc" xpath="/book/@bk:genre"/>
<field column="tsip.se" xpath="/book/@self"/>
<field column="tsip.ki" xpath="/book/id"/>

-----------------------------------------------------------

And all the columns have corresponding field definitions in schema.xml.

But in the indexed result, only the following fields contain values:
-----------------------------------------------------------
<doc>
<str name="tsip.se">test</str>
<str name="tsip.ki">test</str>
<date name="timestamp">2012-12-11T09:26:42.716Z</date>
</doc>
-----------------------------------------------------------

This means I cannot get values for the prefixed attributes, i.e. the
tsip.action and tsip.cc columns.

What configuration do I need so that DIH parses these prefixed attributes?
Thanks.



--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-XML-attributes-with-prefix-using-DIH-tp4025888.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to parse XML attributes with prefix using DIH?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I believe DIH completely ignores namespaces/prefixes. Try skipping those
and just use the local names.
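
A minimal sketch of what that suggestion could look like in data-config.xml,
assuming DIH matches attributes by local name only (this is not confirmed in
the thread, so check it against your Solr version). The column names are the
ones from the original question; only the xpath for the prefixed attribute
changes:
-----------------------------------------------------------
<!-- Assumption: the "bk:" prefix is dropped and the attribute is matched as "genre" -->
<field column="tsip.cc" xpath="/book/@genre"/>
<!-- Unprefixed attribute and child element, unchanged from the original config -->
<field column="tsip.se" xpath="/book/@self"/>
<field column="tsip.ki" xpath="/book/id"/>
-----------------------------------------------------------
The xmlns:bk entry is a namespace declaration rather than an ordinary
attribute, so it may not be reachable through an attribute xpath at all;
tsip.action is left out of the sketch for that reason.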