Posted to solr-user@lucene.apache.org by pg...@ucla.edu on 2010/11/03 01:37:10 UTC

Re: xpath processing

<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"  
xmlns:xlink="http://www.w3.org/1999/xlink"  
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
xsi:schemaLocation="http://www.loc.gov/mods/v3         
http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
     <mods:titleInfo>
         <mods:title>Any place I hang my hat is home</mods:title>
     </mods:titleInfo>
     <mods:titleInfo type="uniform">
         <mods:title>St. Louis woman</mods:title>
         <mods:partName>Any place I hang my hat is home</mods:partName>
     </mods:titleInfo>
     <mods:titleInfo type="alternative">
         <mods:title>Free an' easy that's my style</mods:title>
     </mods:titleInfo>
     <mods:name type="personal">
         <mods:namePart>Arlen, Harold</mods:namePart>
         <mods:namePart type="date">1905-1986</mods:namePart>
         <mods:role>
             <mods:roleTerm authority="marcrelator"  
type="text">creator</mods:roleTerm>
         </mods:role>
     </mods:name>
     <mods:name type="personal">
         <mods:namePart>Mercer, Johnny</mods:namePart>
         <mods:namePart type="date">1909-</mods:namePart>
     </mods:name>
     <mods:name type="personal">
         <mods:namePart>Davison, R.</mods:namePart>
     </mods:name>
     <mods:name type="personal">
         <mods:namePart>Bontemps, Arna Wendell</mods:namePart>
         <mods:namePart type="date">1902-1973</mods:namePart>
     </mods:name>
     <mods:name type="personal">
         <mods:namePart>Cullen, Countee</mods:namePart>
         <mods:namePart type="date">1903-1946</mods:namePart>
     </mods:name>
     <mods:typeOfResource>notated music</mods:typeOfResource>
     <mods:originInfo>
         <mods:place>
             <mods:placeTerm authority="marccountry"  
type="code">nyu</mods:placeTerm>
         </mods:place>
         <mods:place>
             <mods:placeTerm type="text">New York</mods:placeTerm>
         </mods:place>
         <mods:publisher>De Sylva, Brown &amp; Henderson, Inc.</mods:publisher>
         <mods:dateIssued>c1946</mods:dateIssued>
         <mods:dateIssued encoding="marc">1946</mods:dateIssued>
         <mods:issuance>monographic</mods:issuance>
         <mods:dateOther type="normalized">1946</mods:dateOther>
         <mods:dateOther type="normalized">1946</mods:dateOther>
     </mods:originInfo>
     <mods:language>
         <mods:languageTerm authority="iso639-2b"  
type="code">eng</mods:languageTerm>
     </mods:language>
     <mods:physicalDescription>
         <mods:form authority="marcform">print</mods:form>
         <mods:extent>1 vocal score (5 p.) : ill. ; 31 cm.</mods:extent>
     </mods:physicalDescription>
     <mods:note type="statement of responsibility">music by Harold  
Arlen ; lyrics by Johnny Mercer.</mods:note>
     <mods:note>For voice and piano.</mods:note>
     <mods:note>Includes chord symbols.</mods:note>
     <mods:note>Illustration by R. Davison.</mods:note>
     <mods:note>First line: Free an' easy that's my style.</mods:note>
     <mods:note>"Edward Gross presents St. Louis Woman ... Book by  
Arna Bontemps &amp; Countee Cullen" -- Cover.</mods:note>
     <mods:note>Publisher's advertising includes musical incipits.</mods:note>
     <mods:subject authority="lcsh">
         <mods:topic>Motion picture music</mods:topic>
         <mods:topic>Excerpts</mods:topic>
         <mods:topic>Vocal scores with piano</mods:topic>
     </mods:subject>
     <mods:classification authority="lcc">M1 .S8</mods:classification>
     <mods:identifier type="music plate">1403-4 De Sylva, Brown  
Henderson, Inc.</mods:identifier>
     <mods:location>
         <mods:physicalLocation>Lilly Library, Indiana University  
Bloomington</mods:physicalLocation>
     </mods:location>
     <mods:recordInfo>
         <mods:recordContentSource  
authority="marcorg">IUL</mods:recordContentSource>
         <mods:recordCreationDate  
encoding="marc">990316</mods:recordCreationDate>
         <mods:recordIdentifier>LL-SSM-ALC4888</mods:recordIdentifier>
     </mods:recordInfo>
</mods:mods>

Above is my sample XML.

<dataConfig>
<dataSource name="myfilereader" type="FileDataSource"/>
<document>
<entity name="f" rootEntity="false" dataSource="null"  
processor="FileListEntityProcessor" fileName=".*xml" recursive="true"  
baseDir="C:\test_xml">
<entity name="x" dataSource="myfilereader"  
processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"  
stream="false" forEach="/mods"  
transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
<field column="id" template="${f.file}"/>
<field column="collectionKey" template="uw"/>
<field column="collectionName" template="University of Washington  
Pacific Northwest Sheet Music Collection"/>
<field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
<field column="fileName" template="${f.file}"/>
<field column="fileSize" template="${f.fileSize}"/>
<field column="fileLastModified" template="${f.fileLastModified}"/>
<field column="nameNamePart_keyword" xpath="/mods/name/namePart[@type  
!= 'date']"/>
</entity>
</entity>
</document>
</dataConfig>

Above is the data config file.
The namePart element in the above XML may or may not have a type attribute.
How can I get data from the namePart elements that have no type attribute?
xpath="/mods/name/namePart[@type != 'date']" is not working. I
don't get any errors, but there is no namePart_keyword in the index.


Quoting Ken Stanley <do...@gmail.com>:

> On Fri, Oct 22, 2010 at 11:52 PM, <pg...@ucla.edu> wrote:
>
>>
>>
>> <dataConfig>
>> <dataSource name="myfilereader" type="FileDataSource"/>
>> <document>
>> <entity name="f" rootEntity="false" dataSource="null"
>> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
>> baseDir="C:\data\sample_records\mods\starr">
>> <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
>> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
>> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
>> <field column="id" template="${f.file}"/>
>> <field column="collectionKey" template="starr"/>
>> <field column="collectionName" template="starr"/>
>> <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
>> <field column="fileName" template="${f.file}"/>
>> <field column="fileSize" template="${f.fileSize}"/>
>> <field column="fileLastModified" template="${f.fileLastModified}"/>
>> <field column="classification_keyword" xpath="/mods/classification"/>
>> <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
>> <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"
>> />
>> </entity>
>> </entity>
>> </document>
>> </dataConfig>
>
>
> The documentation says you don't need a dataSource for your
> XPathEntityProcessor entity; in my configuration, I have mine set to the
> name of the top-level FileListEntityProcessor. Everything else looks fine.
> Can you provide one record from your data? Also, are you getting any errors
> in your log?
>
> - Ken
>



Re: xpath processing

Posted by Lance Norskog <go...@gmail.com>.
The XPathEntityProcessor has an option (the xsl attribute) to run a real
XSL transformation as a preprocessing step, before its own XPath parsing.
I guess you could write an XSL stylesheet that pulls your fields out into
a simpler XML document in the /a/b/c format that the XPath parser supports.
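
A minimal sketch of that approach, under stated assumptions: the
stylesheet below uses standard XPath 1.0, where not(@type) selects
namePart elements that carry no type attribute at all, and writes them
into a flat document. The output element names (record, name) and the
stylesheet file name are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- flatten-mods.xsl (hypothetical name): reduce a MODS record to flat XML -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3"
    exclude-result-prefixes="mods">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/mods:mods">
    <record>
      <!-- Keep only namePart elements that have no type attribute,
           which skips the type="date" entries -->
      <xsl:for-each select="mods:name/mods:namePart[not(@type)]">
        <name><xsl:value-of select="."/></name>
      </xsl:for-each>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

The inner entity could then point at the stylesheet through the xsl
attribute and read the flattened output with plain /a/b/c paths. The
path C:\xsl\flatten-mods.xsl is an assumption, as is the exact form of
path or URL that a given Solr version accepts there, so check that
against your installation. Since a record can hold several names, the
target field would also need to be multiValued in the schema.

```xml
<dataConfig>
<dataSource name="myfilereader" type="FileDataSource"/>
<document>
<entity name="f" rootEntity="false" dataSource="null"
        processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
        baseDir="C:\test_xml">
<!-- forEach and xpath now refer to the stylesheet output, not to MODS -->
<entity name="x" dataSource="myfilereader"
        processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
        xsl="C:\xsl\flatten-mods.xsl"
        stream="false" forEach="/record"
        transformer="TemplateTransformer">
<field column="id" template="${f.file}"/>
<field column="nameNamePart_keyword" xpath="/record/name"/>
</entity>
</entity>
</document>
</dataConfig>
```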






-- 
Lance Norskog
goksron@gmail.com