Posted to solr-user@lucene.apache.org by P Williams <wi...@gmail.com> on 2011/11/03 16:46:03 UTC

Re: DIH doesn't handle bound namespaces?

Hi Gary,

From
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource

*It does not support namespaces, but it can handle xmls with namespaces.
When you provide the xpath, just drop the namespace and give the rest (eg
if the tag is '<dc:subject>' the mapping should just contain 'subject').
Easy, isn't it? And you didn't need to write one line of code! Enjoy*

You should be able to use xpath="//titleInfo/title" without modifying your
XML; there is no need to remove the namespace prefixes from it.
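
For illustration, here is a minimal sketch of a data-config.xml entity that
follows the wiki advice quoted above. The file path, the entity name and the
"title" column are placeholders I made up, not taken from Gary's setup, and I
have not run this against his file, so treat it as a starting point only.

```xml
<!-- Hypothetical data-config.xml sketch; path, entity name and column are placeholders. -->
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="modsRecord"
            processor="XPathEntityProcessor"
            url="/path/to/record.xml"
            forEach="/mods">
      <!-- The source element is <mods:title>; the xpath leaves out the "mods:" prefix. -->
      <field column="title" xpath="/mods/titleInfo/title"/>
    </entity>
  </document>
</dataConfig>
```

The only point of the sketch is that the xpath names titleInfo and title
without the prefix while the source XML keeps its mods: prefixes.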

I hope that answers your question.

Regards,
Tricia

On Mon, Oct 31, 2011 at 9:24 AM, Moore, Gary <Ga...@ars.usda.gov> wrote:

> I'm trying to import some MODS XML using DIH.  The XML uses bound
> namespacing:
>
> <mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xmlns:mods="http://www.loc.gov/mods/v3"
>      xmlns:xlink="http://www.w3.org/1999/xlink"
>      xmlns="http://www.loc.gov/mods/v3"
>      xsi:schemaLocation="http://www.loc.gov/mods/v3
> http://www.loc.gov/mods/v3/mods-3-4.xsd"
>      version="3.4">
>   <mods:titleInfo>
>      <mods:title>Malus domestica: Arnold</mods:title>
>   </mods:titleInfo>
> </mods>
>
> However, XPathEntityProcessor doesn't seem to handle xpaths of the type
> xpath="//mods:titleInfo/mods:title".
>
> If I remove the namespaces from the source XML:
>
> <mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xmlns:mods="http://www.loc.gov/mods/v3"
>      xmlns:xlink="http://www.w3.org/1999/xlink"
>      xmlns="http://www.loc.gov/mods/v3"
>      xsi:schemaLocation="http://www.loc.gov/mods/v3
> http://www.loc.gov/mods/v3/mods-3-4.xsd"
>      version="3.4">
>   <titleInfo>
>      <title>Malus domestica: Arnold</title>
>   </titleInfo>
> </mods>
>
> then xpath="//titleInfo/title" works just fine.  Can anyone confirm that
> this is the case and, if so, recommend a solution?
> Thanks
> Gary
>
>
> Gary Moore
> Technical Lead
> LCA Digital Commons Project
> NAL/ARS/USDA
>
>

Re: DIH doesn't handle bound namespaces?

Posted by Lance Norskog <go...@gmail.com>.
Yes, the xpath thing is a custom lightweight thing for high-speed use.

There is a separate full XSL processor.
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I think this lets you run real XSL on input files. I assume it lets you
throw in your favorite XSL implementation.
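
If the XSL route is what you need, one possible shape for it is a stylesheet
that strips namespaces before the xpath mapping runs. This is only a sketch:
the stylesheet is a generic XSLT 1.0 pattern, not something from the DIH
docs, and I have not tested it with DIH.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical strip-namespaces.xsl: rewrites the input without prefixes or namespaces. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild every element using only its local name. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Rebuild attributes by local name too (xsi:schemaLocation becomes schemaLocation). -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <!-- Copy text, comments and processing instructions unchanged. -->
  <xsl:template match="text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

As far as I can tell from the wiki page linked above, the entity would point
at a file like this through its xsl attribute, but check that page for the
exact behaviour. Note that two attributes sharing a local name under
different prefixes would collide after this transform.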

On Thu, Nov 3, 2011 at 12:45 PM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : *It does not support namespaces , but it can handle xmls with namespaces .
>
> The real crux of the issue is that XPathEntityProcessor is terribly named.
> it should have been called "LimitedXPathishSyntaxEntityProcessor" or
> something like that because it doesn't support full xpath syntax...
>
> "The XPathEntityProcessor implements a streaming parser which supports a
> subset of xpath syntax. Complete xpath syntax is not supported but most of
> the common use cases are covered..."
>
> ...i thought there was a DIH FAQ about this, but if not there really
> should be.
>
>
> -Hoss
>



-- 
Lance Norskog
goksron@gmail.com

Re: DIH doesn't handle bound namespaces?

Posted by Chris Hostetter <ho...@fucit.org>.
: *It does not support namespaces , but it can handle xmls with namespaces .

The real crux of the issue is that XPathEntityProcessor is terribly named.
it should have been called "LimitedXPathishSyntaxEntityProcessor" or 
something like that because it doesn't support full xpath syntax...

"The XPathEntityProcessor implements a streaming parser which supports a 
subset of xpath syntax. Complete xpath syntax is not supported but most of 
the common use cases are covered..."

...i thought there was a DIH FAQ about this, but if not there really 
should be.


-Hoss