Posted to solr-user@lucene.apache.org by Antonio Eggberg <an...@yahoo.se> on 2009/07/22 10:46:06 UTC
DIH example explanation
Hi,
I am looking at the slashdot example and am having a hard time understanding the following passage from the wiki:
==
"You can use this feature for indexing from REST APIs such as RSS/Atom feeds, XML data feeds, other Solr servers or even well-formed XHTML documents. Our XPath support has its limitations (no wildcards, only full paths, etc.), but we have tried to make sure that common use cases are covered, and since it's based on a streaming parser, it is extremely fast and consumes a constant amount of memory even for large XML files. It does not support namespaces, but it can handle XML files with namespaces. When you provide the xpath, just drop the namespace and give the rest (e.g. if the tag is '<dc:subject>' the mapping should just contain 'subject'). Easy, isn't it? And you didn't need to write one line of code! Enjoy"
==
How does <dc:subject> become the field subject, and why is it mapped with xpath="/RDF/item/subject"? What is the secret?
I am trying to index Atom files and need to understand how this works, because my files use namespaces and I'm not sure how to proceed. Is there an Atom example anywhere?
Thanks again for the clarification.
Anton
__________________________________________________________
Take a vacation! - search for trips at Kelkoo.
Compare prices on flights and hotel rooms here:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052
Re: DIH example explanation
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
Any string that is templatized in DIH can contain variables of the form ${a.b}.
For instance, look at the following:
url="http://xyz.com/atom/${dataimporter.request.foo}"
If you pass the parameter foo=bar when you invoke the command, the URL
that is invoked becomes
http://xyz.com/atom/bar
The variable can come from many places; see
http://wiki.apache.org/solr/DataImportHandler#head-86408ce7721ea6f9a3f05b12ace8742fd41737d4
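To make the substitution concrete, here is a minimal Python sketch of how a ${a.b} placeholder could be resolved against request parameters. It is not DIH's own code (DIH is written in Java): `resolve`, `request_variables`, `invocation_urls`, the `feed` parameter name and the `localhost:8983` endpoint are all hypothetical, and turning unknown variables into an empty string is an assumption of the sketch. `command=full-import` and `clean=false` are standard DIH request parameters. The last helper applies the foo=bar idea to the case of a text file listing Atom feed URLs: one import request per listed feed.

```python
import re
from urllib.parse import urlencode

# ${name} placeholders; names may be dotted, e.g. ${dataimporter.request.foo}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

# Hypothetical location of the DataImportHandler; adjust to your solrconfig.xml.
DIH_ENDPOINT = "http://localhost:8983/solr/dataimport"


def resolve(template: str, variables: dict) -> str:
    """Replace every ${a.b} with the value registered under the dotted name a.b.

    Assumption of this sketch: unknown names become an empty string.
    """
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), template)


def request_variables(params: dict) -> dict:
    """Expose request parameters under the dataimporter.request.* prefix."""
    return {f"dataimporter.request.{name}": value for name, value in params.items()}


def invocation_urls(feed_list: str, param: str = "feed") -> list:
    """Build one full-import request per non-blank, non-comment line of a feed list.

    'feed' is a hypothetical parameter name; the entity's url attribute
    would then read url="${dataimporter.request.feed}".
    """
    urls = []
    for line in feed_list.splitlines():
        feed = line.strip()
        if not feed or feed.startswith("#"):
            continue
        # clean=false keeps documents indexed by earlier runs.
        query = urlencode({"command": "full-import", "clean": "false", param: feed})
        urls.append(f"{DIH_ENDPOINT}?{query}")
    return urls


template = "http://xyz.com/atom/${dataimporter.request.foo}"
print(resolve(template, request_variables({"foo": "bar"})))  # prints http://xyz.com/atom/bar
```

The sketch only builds the request URLs; the external program that maintains the feed list could then request each one in turn.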
On Wed, Jul 22, 2009 at 4:30 PM, Antonio Eggberg <an...@yahoo.se> wrote:
> :)
>
> Thank you, Paul! It works! I have one more stupid question about the wiki.
>
> "url (required) : The url used to invoke the REST API. (Can be templatized)."
>
> How do you templatize the URL? My URLs are updated all the time by an external program, i.e. the list of Atom sites is kept in a text file. Should I use some form of transformer to process it? Any hints?
>
> Thanks.
> Anton
>
> --- On Wed 2009-07-22, Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com> wrote:
>
>> From: Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>
>> Subject: Re: DIH example explanation
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, 22 July 2009 10:52
>> The point is that the namespace is ignored when DIH reads the XML, so
>> just use the part after the colon (:) in your XPath expressions and it
>> should just work.
--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: DIH example explanation
Posted by Antonio Eggberg <an...@yahoo.se>.
:)
Thank you, Paul! It works! I have one more stupid question about the wiki.
"url (required) : The url used to invoke the REST API. (Can be templatized)."
How do you templatize the URL? My URLs are updated all the time by an external program, i.e. the list of Atom sites is kept in a text file. Should I use some form of transformer to process it? Any hints?
Thanks.
Anton
Re: DIH example explanation
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
The point is that the namespace is ignored when DIH reads the XML, so
just use the part after the colon (:) in your XPath expressions and it
should just work.
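The same rule can be illustrated with a small Python sketch that uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for DIH's streaming parser (this is not DIH's implementation). Each tag is reduced to its local name, so in the slashdot RSS feed `<rdf:RDF>`, `<item>`, `<dc:subject>` form the path /RDF/item/subject, and in an Atom feed a `<dc:subject>` inside `<entry>` becomes /feed/entry/subject. The field name itself comes from the column side of the mapping, not from the tag. The `for_each` argument is loosely modeled on the `forEach` attribute of DIH's XPathEntityProcessor; the function names and the sample feed are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the namespace: ElementTree reports <dc:subject> as '{uri}subject'."""
    return tag.rsplit("}", 1)[-1]


def stream_records(source, for_each: str, fields: dict):
    """Yield one dict per element whose namespace-free full path equals for_each.

    fields maps a column name to a full path such as '/feed/entry/title'.
    Only exact full paths are matched (no wildcards).
    """
    path = []      # local names of the currently open elements
    record = None  # the record being filled, or None outside for_each
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(local_name(elem.tag))
            if "/" + "/".join(path) == for_each:
                record = {}
            continue
        current = "/" + "/".join(path)
        text = (elem.text or "").strip()
        if record is not None and text:
            for column, xpath in fields.items():
                if current == xpath:
                    record.setdefault(column, []).append(text)
        if current == for_each and record is not None:
            yield record
            record = None
            elem.clear()  # release the finished entry's children
        path.pop()


ATOM = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry>
    <title>First post</title>
    <dc:subject>solr</dc:subject>
  </entry>
  <entry>
    <title>Second post</title>
    <dc:subject>dih</dc:subject>
  </entry>
</feed>"""

mapping = {"title": "/feed/entry/title", "subject": "/feed/entry/subject"}
for rec in stream_records(io.BytesIO(ATOM), "/feed/entry", mapping):
    print(rec)
# {'title': ['First post'], 'subject': ['solr']}
# {'title': ['Second post'], 'subject': ['dih']}
```

Because only the path of open elements is kept and each finished entry is cleared, memory use stays small for long feeds, which is the property the wiki attributes to DIH's streaming parser (ElementTree does still keep the emptied elements attached to the root).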
On Wed, Jul 22, 2009 at 2:16 PM, Antonio Eggberg <an...@yahoo.se> wrote:
> How does <dc:subject> become the field subject, and why is it mapped with xpath="/RDF/item/subject"? What is the secret?
>
> I am trying to index Atom files and need to understand how this works, because my files use namespaces and I'm not sure how to proceed. Is there an Atom example anywhere?
--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com