Posted to solr-user@lucene.apache.org by "Twomey, David" <da...@novartis.com> on 2012/04/30 22:46:17 UTC
correct XPATH syntax
Is this possible in DataImportHandler
I want the following XML to all collapse into one Author field
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Sørlie</LastName>
<ForeName>T</ForeName>
<Initials>T</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Perou</LastName>
<ForeName>C M</ForeName>
<Initials>CM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Tibshirani</LastName>
<ForeName>R</ForeName>
<Initials>R</Initials>
</Author>
...
So my XPATH is like
Re: correct XPATH syntax
Posted by Lance Norskog <go...@gmail.com>.
The XPath implementation in DIH is very minimal: it is tuned for
speed, not features. The XSL option lets you do everything you could
want, at the cost of a slower engine.
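As a rough sketch of what the XSL option looks like in data-config.xml: XPathEntityProcessor accepts an xsl attribute naming a stylesheet that is applied to the input before the xpath fields are evaluated. The stylesheet path below is a placeholder and not something from this thread, and only two fields are shown; the remaining fields would stay as in the existing configuration. This is untested against the MEDLINE files discussed here.

```xml
<!-- Sketch only. The xsl path is hypothetical; name, url and forEach are
     copied from the data-config quoted elsewhere in this thread. -->
<entity name="medlineFiles" processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        xsl="/path/to/medline-authors.xsl"
        forEach="/MedlineCitationSet/MedlineCitation">
  <field column="pmid"
         xpath="/MedlineCitationSet/MedlineCitation/PMID" />
  <!-- Assumes the stylesheet has already reduced each Author element
       to the text that should be indexed. -->
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" />
</entity>
```

The trade-off is the one described above: the transformation runs over the whole input before extraction starts, so it is likely to cost more time and memory than plain XPath extraction on large files.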
On Thu, May 3, 2012 at 7:30 AM, lboutros <bo...@gmail.com> wrote:
> ok, not that easy :)
>
> I did not test it myself but it seems that you could use an XSL
> preprocessing with the 'xsl' option in your XPathEntityProcessor :
>
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
> You could transform the author part as you wish and then import the author
> field with your actual configuration.
>
> Ludovic.
>
> -----
> Jouve
> France.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html
> Sent from the Solr - User mailing list archive at Nabble.com.
--
Lance Norskog
goksron@gmail.com
Re: correct XPATH syntax
Posted by lboutros <bo...@gmail.com>.
ok, not that easy :)
I did not test it myself but it seems that you could use an XSL
preprocessing with the 'xsl' option in your XPathEntityProcessor :
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
You could transform the author part as you wish and then import the author
field with your actual configuration.
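A minimal sketch of such a preprocessing stylesheet, assuming the goal is "LastName Initials" per author as asked elsewhere in this thread. It is plain XSLT 1.0: everything is copied unchanged except each Author element, which is rewritten to hold only the last name and initials. The element names come from the XML posted in this thread; the stylesheet itself is an illustration and has not been run against the real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each Author with "LastName Initials"; ForeName is dropped.
       normalize-space() trims the result if one of the two is missing. -->
  <xsl:template match="AuthorList/Author">
    <Author>
      <xsl:value-of select="normalize-space(concat(LastName, ' ', Initials))"/>
    </Author>
  </xsl:template>
</xsl:stylesheet>
```

With this in place, the existing xpath ending in /AuthorList/Author would see one short text value per author, so the author field stays multi-valued. Producing a single comma-separated string instead would need a template on AuthorList rather than on Author.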
Ludovic.
-----
Jouve
France.
Re: correct XPATH syntax
Posted by "Twomey, David" <da...@novartis.com>.
Is what I want even possible with XPathEntityProcessor?
It sort of works now: I didn't realize that "flatten" is an attribute of the field rather than of the entity.
BUT it's still not what I would like.
The XML looks like below and it's nested within /MedlineCitationSet/MedlineCitation/Article/
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Starremans</LastName>
<ForeName>Patrick G J F</ForeName>
<Initials>PG</Initials>
</Author><Author ValidYN="Y">
<LastName>van der Kemp</LastName>
<ForeName>Annemiete W C M</ForeName>
<Initials>AW</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Knoers</LastName>
<ForeName>Nine V A M</ForeName>
<Initials>NV</Initials>
</Author>
<Author ValidYN="Y">
<LastName>van den Heuvel</LastName>
<ForeName>Lambertus P W J</ForeName>
<Initials>LP</Initials>
</Author>
</AuthorList>
What I would like to see in the index author field is
<author>Starremans PG, Van der Kemp AW, etc.</author>
Note: "LastName Initials", with no ForeName.
When I set Xpath like this
<field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" flatten="true" />
I get this in the index
<arr name="author">
<str>Starremans Patrick G J F PG</str>
<str>Van der Kemp Annemiete W C M AW</str>
.
.
</arr>
note: the forename field is included
My author field in the schema.xml is
<field name="author" type="textgen" indexed="true" stored="true" multiValued="true" required="false"/>
So is this even possible with XPathEntityProcessor?
Thanks
David
On 5/3/12 8:40 AM, "lboutros" <bo...@gmail.com> wrote:
Hi David,
what do you want to do with the 'commonField' option ?
Is it possible to have the part of the schema for the author field please ?
Is the author field stored ?
Ludovic.
-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct XPATH syntax
Posted by lboutros <bo...@gmail.com>.
Hi David,
what do you want to do with the 'commonField' option ?
Is it possible to have the part of the schema for the author field please ?
Is the author field stored ?
Ludovic.
-----
Jouve
France.
Re: correct XPATH syntax
Posted by "Twomey, David" <da...@novartis.com>.
Ludovic,
Thanks for your help. I tried your suggestion but it didn't work for
Authors. Below are 3 snippets from data-config.xml, the XML file and the
XML response from the DB
Data-config:
<entity name="medlineFiles" processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
        logTemplate=" processing ${medlineFileList.fileAbsolutePath}"
        logLevel="info"
        flatten="true"
        stream="true">
  <field column="pmid"
         xpath="/MedlineCitationSet/MedlineCitation/PMID"
         commonField="true" />
  <field column="journal_name"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title"
         commonField="true" />
  <field column="title"
         xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle"
         commonField="true" />
  <field column="abstract"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText"
         commonField="true" />
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author"
         commonField="false" />
  <field column="year"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year"
         commonField="true" />
</entity>
XML Snippet for Author:
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Malathi</LastName>
<ForeName>K</ForeName>
<Initials>K</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Xiao</LastName>
<ForeName>Y</ForeName>
<Initials>Y</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Mitchell</LastName>
<ForeName>A P</ForeName>
<Initials>AP</Initials>
</Author>
</AuthorList>
Response from SOLR:
<arr name="author">
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
</arr>
<str name="journal_name">Journal of cancer research and clinical oncology</str>
Thanks
David
On 5/1/12 8:05 AM, "lboutros" <bo...@gmail.com> wrote:
>Hi David,
>
>I think you should add this option : flatten=true
>
>and then could you try to use this XPath :
>
>/MedlineCitationSet/MedlineCitation/AuthorList/Author
>
>see here for the description :
>
>http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
>I don't think that the commonField option is needed here; I think you
>should remove it.
>
>Ludovic.
>
>-----
>Jouve
>France.
>--
>View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html
>Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct XPATH syntax
Posted by lboutros <bo...@gmail.com>.
Hi David,
I think you should add this option : flatten=true
and then could you try to use this XPath :
/MedlineCitationSet/MedlineCitation/AuthorList/Author
see here for the description :
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
I don't think that the commonField option is needed here; I think you
should remove it.
Ludovic.
-----
Jouve
France.
Re: correct XPATH syntax
Posted by "Twomey, David" <da...@novartis.com>.
Answering my own question: I think I can do this by writing a script that
concatenates the LastName, ForeName and Initials and applying it to xpath =
/AuthorList/Author
Yes?
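One way such a script could look inside DIH itself is a ScriptTransformer, which runs a JavaScript function on each row and is referenced as transformer="script:functionName". The sketch below is untested and rests on several assumptions: the column names author_last and author_initials and the function name joinAuthors are made up for illustration; both helper columns are assumed to arrive as java.util.List values in document order, which needs them to be treated as multi-valued; and every Author is assumed to have both a LastName and an Initials element, otherwise the two lists fall out of step. It joins LastName and Initials only, so ForeName could be added the same way if wanted.

```xml
<dataConfig>
  <!-- dataSource and the outer medlineFileList entity are omitted here;
       they would stay as in the existing configuration. -->
  <script><![CDATA[
    // Hypothetical helper: pairs up the i-th last name with the i-th
    // initials and stores the results in the 'author' column.
    function joinAuthors(row) {
      var last = row.get('author_last');
      var init = row.get('author_initials');
      if (last == null || init == null) {
        return row;                       // nothing to join for this record
      }
      var authors = new java.util.ArrayList();
      for (var i = 0; i < last.size() && i < init.size(); i++) {
        authors.add(last.get(i) + ' ' + init.get(i));
      }
      row.put('author', authors);
      row.remove('author_last');          // helper columns are not indexed
      row.remove('author_initials');
      return row;
    }
  ]]></script>
  <document>
    <entity name="medlineFiles" processor="XPathEntityProcessor"
            url="${medlineFileList.fileAbsolutePath}"
            forEach="/MedlineCitationSet/MedlineCitation"
            transformer="script:joinAuthors">
      <field column="author_last"
             xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author/LastName" />
      <field column="author_initials"
             xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author/Initials" />
    </entity>
  </document>
</dataConfig>
```

Other transformers can be kept by listing them comma-separated in the same transformer attribute. If the two lists cannot be trusted to line up, a preprocessing step that handles each Author element as a unit is the safer route.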
On 4/30/12 4:49 PM, "Twomey, David" <da...@novartis.com> wrote:
>Sorry hit send too soon. Continued the email below
>
>On 4/30/12 4:46 PM, "Twomey, David" <da...@novartis.com> wrote:
>
>>
>>Is this possible in DataImportHandler
>>
>>I want the following XML to all collapse into one multi-valued Author
>>field
>>
>><AuthorList CompleteYN="Y">
>> <Author ValidYN="Y">
>> <LastName>Sørlie</LastName>
>> <ForeName>T</ForeName>
>> <Initials>T</Initials>
>> </Author>
>> <Author ValidYN="Y">
>> <LastName>Perou</LastName>
>> <ForeName>C M</ForeName>
>> <Initials>CM</Initials>
>> </Author>
>> <Author ValidYN="Y">
>> <LastName>Tibshirani</LastName>
>> <ForeName>R</ForeName>
>> <Initials>R</Initials>
>> </Author>
>>...
>>
>>So my XPATH is like
>>xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??"
>>commonField="true" />
>
>>
>
Re: correct XPATH syntax
Posted by "Twomey, David" <da...@novartis.com>.
Sorry hit send too soon. Continued the email below
On 4/30/12 4:46 PM, "Twomey, David" <da...@novartis.com> wrote:
>
>Is this possible in DataImportHandler
>
>I want the following XML to all collapse into one multi-valued Author field
>
><AuthorList CompleteYN="Y">
> <Author ValidYN="Y">
> <LastName>Sørlie</LastName>
> <ForeName>T</ForeName>
> <Initials>T</Initials>
> </Author>
> <Author ValidYN="Y">
> <LastName>Perou</LastName>
> <ForeName>C M</ForeName>
> <Initials>CM</Initials>
> </Author>
> <Author ValidYN="Y">
> <LastName>Tibshirani</LastName>
> <ForeName>R</ForeName>
> <Initials>R</Initials>
> </Author>
>...
>
>So my XPATH is like
>xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??"
>commonField="true" />
>