Posted to solr-user@lucene.apache.org by Hans-Peter Stricker <st...@epublius.de> on 2013/05/29 14:35:02 UTC
Problem with xpath expression in data-config.xml
Replacing the contents of
solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml
by
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88 " pk="link"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="source" xpath="/feed/title" commonField="true" />
      <field column="source-link" xpath="/feed/link[@rel='self']/@href"
             commonField="true" />
      <field column="title" xpath="/feed/entry/title" />
      <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
      <field column="description" xpath="/feed/entry/content"
             stripHTML="true"/>
      <field column="creator" xpath="/feed/entry/author" />
      <field column="item-subject" xpath="/feed/entry/category/@term"/>
      <field column="date" xpath="/feed/entry/updated"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
    </entity>
  </document>
</dataConfig>
and running the full dataimport from
http://localhost:8983/solr/#/rss/dataimport//dataimport results in an error.
1) How could I have found the reason faster than I did - by looking into
which log files,....?
2) If you remove the first occurrence of /@href above, the import succeeds.
(Note that the same pattern works for column "link".) What's the reason
why?!!
Best regards and thanks in advance
Hans-Peter
Re: Problem with xpath expression in data-config.xml
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Ah, I missed that part.
The problem is that you have forEach="/feed/entry" but you want to read
/feed/link as a common field. You need to have
forEach="/feed | /feed/entry", which should let you read both /feed/link
and /feed/entry/link.
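
For illustration, here is a sketch of the [BAD CASE] config from this thread with only the forEach value changed as suggested above. It has not been run against this feed, so treat it as a starting point rather than a confirmed fix. All names and paths are taken from the configs already posted in the thread.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- Only change: forEach now lists /feed as well as /feed/entry,
         so that the common field under /feed can be read. -->
    <entity name="beautybooks88 " pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor"
            forEach="/feed | /feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="source-link" xpath="/feed/link[@rel='self']/@href"
             commonField="true" />
    </entity>
  </document>
</dataConfig>
```

Since forEach demarcates records, the /feed match may produce a row of its own in addition to the entry rows; whether that matters depends on your schema and uniqueKey.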
On Thu, May 30, 2013 at 1:25 PM, Hans-Peter Stricker <st...@epublius.de> wrote:
> Conclusion: It has nothing to do with the number of occurrences of the
> pattern.
--
Regards,
Shalin Shekhar Mangar.
Re: Problem with xpath expression in data-config.xml
Posted by Hans-Peter Stricker <st...@epublius.de>.
Thanks for analyzing the problem. Please note, however, that I came to a somewhat different conclusion.
Define for the moment "title" to be the primary unique key:
solr-4.3.0\example\example-DIH\solr\rss\conf\schema.xml
<uniqueKey>title</uniqueKey>
solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml
[BAD CASE] (irrespective of the predicate @rel='self')
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88 " pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
    </entity>
  </document>
</dataConfig>
[GOOD CASE]
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88 " pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
    </entity>
  </document>
</dataConfig>
Conclusion: It has nothing to do with the number of occurrences of the pattern.
Re: Problem with xpath expression in data-config.xml
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I created https://issues.apache.org/jira/browse/SOLR-4875
On Wed, May 29, 2013 at 9:15 PM, Shalin Shekhar Mangar <shalinmangar@gmail.com> wrote:
> I think there is a bug here. In my tests, xpath="/root/a/@y"
> works, xpath="/root/a[@x='1']/@y" also works. But if you use them together
> the one which is defined last returns null. I'll open an issue.
--
Regards,
Shalin Shekhar Mangar.
Re: Problem with xpath expression in data-config.xml
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, May 29, 2013 at 6:05 PM, Hans-Peter Stricker <st...@epublius.de> wrote:
> 1) How could I have found the reason faster than I did - by looking into
> which log files,....?
DIH uses the same log file as solr. The name/location of the log file
depends on your logging configuration.
> 2) If you remove the first occurrence of /@href above, the import
> succeeds. (Note that the same pattern works for column "link".) What's the
> reason why?!!
>
I think there is a bug here. In my tests, xpath="/root/a/@y"
works, xpath="/root/a[@x='1']/@y" also works. But if you use them together
the one which is defined last returns null. I'll open an issue.
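
A minimal config along these lines might look like the sketch below. It is a guess at the shape of such a test, not the actual test that was run: the entity name, column names, the file path, and the sample input in the comment are all hypothetical, and only the two xpath expressions come from the description above.

```xml
<!-- Hypothetical sample input, saved as /path/to/sample.xml:
     <root><a x="1" y="first"/><a x="2" y="second"/></root> -->
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="repro" processor="XPathEntityProcessor"
            url="/path/to/sample.xml" forEach="/root">
      <!-- Attribute read without a predicate on the element -->
      <field column="plain" xpath="/root/a/@y" />
      <!-- Same attribute, with a predicate on the element;
           per the report above, whichever field is defined last returns null -->
      <field column="filtered" xpath="/root/a[@x='1']/@y" />
    </entity>
  </document>
</dataConfig>
```

Swapping the order of the two field definitions should show whether the null follows the position, as described.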
--
Regards,
Shalin Shekhar Mangar.