Posted to solr-user@lucene.apache.org by Hans-Peter Stricker <st...@epublius.de> on 2013/05/29 14:35:02 UTC

Problem with xpath expression in data-config.xml

Replacing the contents of 
solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml

by

<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="beautybooks88 " pk="link"
                url="http://beautybooks88.blogspot.com/feeds/posts/default"
                processor="XPathEntityProcessor" forEach="/feed/entry"
                transformer="DateFormatTransformer">
            <field column="source" xpath="/feed/title" commonField="true" />
            <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />

            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
            <field column="description" xpath="/feed/entry/content" stripHTML="true" />
            <field column="creator" xpath="/feed/entry/author" />
            <field column="item-subject" xpath="/feed/entry/category/@term" />
            <field column="date" xpath="/feed/entry/updated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
        </entity>
    </document>
</dataConfig>

and running the full dataimport from 
http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.

1) How could I have found the cause faster than I did? Which log files 
should I have looked at?

2) If you remove the first occurrence of /@href above, the import succeeds. 
(Note that the same pattern works for column "link".) Why is that?

Best regards and thanks in advance

Hans-Peter 



Re: Problem with xpath expression in data-config.xml

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Ah, I missed that part.

The problem is that you have forEach="/feed/entry" but want to read
/feed/link as a common field. You need forEach="/feed | /feed/entry",
which should let you read both /feed/link and /feed/entry/link.
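
As a rough, untested sketch of that suggestion, the entity from the original post could look like this. Only the forEach value differs from the posted config; the remaining entry fields are left out for brevity, and whether this fully resolves the import error is an assumption rather than something confirmed in this thread.

```xml
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <!-- forEach lists both record types, separated by "|", so that
             /feed/... fields can be read in addition to /feed/entry/... fields -->
        <entity name="beautybooks88 " pk="link"
                url="http://beautybooks88.blogspot.com/feeds/posts/default"
                processor="XPathEntityProcessor"
                forEach="/feed | /feed/entry"
                transformer="DateFormatTransformer">
            <!-- feed-level values, marked as common fields as in the original post -->
            <field column="source" xpath="/feed/title" commonField="true" />
            <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />

            <!-- entry-level values, unchanged from the original post -->
            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
        </entity>
    </document>
</dataConfig>
```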


-- 
Regards,
Shalin Shekhar Mangar.

Re: Problem with xpath expression in data-config.xml

Posted by Hans-Peter Stricker <st...@epublius.de>.
Thanks for analyzing the problem. But please let me note that I came to a somewhat different conclusion.

For the moment, define "title" to be the unique key: 

solr-4.3.0\example\example-DIH\solr\rss\conf\schema.xml

<uniqueKey>title</uniqueKey> 

solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml

[BAD CASE] (irrespective of the predicate @rel='self')
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="beautybooks88 " pk="title" url="http://beautybooks88.blogspot.com/feeds/posts/default" processor="XPathEntityProcessor" forEach="/feed/entry" transformer="DateFormatTransformer">
            <field column="title" xpath="/feed/entry/title" />
            <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
        </entity>
    </document>
</dataConfig>

[GOOD CASE]
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="beautybooks88 " pk="title" url="http://beautybooks88.blogspot.com/feeds/posts/default" processor="XPathEntityProcessor" forEach="/feed/entry" transformer="DateFormatTransformer">
            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
        </entity>
    </document>
</dataConfig>

Conclusion: It has nothing to do with the number of occurrences of the pattern.

Re: Problem with xpath expression in data-config.xml

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I created https://issues.apache.org/jira/browse/SOLR-4875




-- 
Regards,
Shalin Shekhar Mangar.

Re: Problem with xpath expression in data-config.xml

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker
<st...@epublius.de> wrote:

> 1) How could I have found the reason faster than I did - by looking into
> which log files,....?
>
>
DIH uses the same log file as Solr. The name and location of the log file
depend on your logging configuration.


> 2) If you remove the first occurrence of /@href above, the import
> succeeds. (Note that the same pattern works for column "link".) What's the
> reason why?!!
>

I think there is a bug here. In my tests, xpath="/root/a/@y"
works, and xpath="/root/a[@x='1']/@y" also works. But if you use them
together, the one which is defined last returns null. I'll open an issue.
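
A minimal sketch of a config that combines the two expressions might look like the following. The entity name, the url, the forEach value, the column names and the sample input in the comment are all hypothetical and chosen only for illustration; only the two xpath values come from the test described above.

```xml
<!-- Hypothetical input assumed to be served at the url below:
     <root><a x="1" y="one"/><a x="2" y="two"/></root> -->
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="repro" url="http://localhost/sample.xml"
                processor="XPathEntityProcessor" forEach="/root">
            <!-- reported to work when it is the only attribute xpath on "a" -->
            <field column="all_y" xpath="/root/a/@y" />
            <!-- also reported to work alone; when both fields are present,
                 the one defined last is reported to come back null -->
            <field column="y_where_x_is_1" xpath="/root/a[@x='1']/@y" />
        </entity>
    </document>
</dataConfig>
```

Swapping the order of the two field elements should, per the description above, move the null to the other column.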


-- 
Regards,
Shalin Shekhar Mangar.