Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/06/20 02:01:32 UTC
Re: XPath query support in Solr Cell
: Date: Wed, 20 May 2009 16:45:25 -0400
: From: Eric Pugh
: Subject: XPath query support in Solr Cell
Not sure if you figured this out, but your error is coming from curl, not
from Solr. curl has a "feature" (URL globbing) where it can hit multiple URLs
that differ only by a sequential number in a range written in square
brackets, so it is trying to parse your [1] as a range. Check the "URL"
section of "man curl" for all the details.
Full URI escaping of the square brackets (to %5B and %5D) should work
however ... it works for me anyway.
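Here is a rough sketch of the escaping, in case it helps. The encode_brackets helper is a made-up name for illustration, not part of curl or Solr, and the parameters are copied from your command; the actual curl call is commented out because it assumes Solr is running on localhost:8983 with tutorial.pdf in the current directory.

```shell
# Hypothetical helper: percent-encode square brackets so that curl's
# URL globbing parser never sees them.
encode_brackets() {
  printf '%s' "$1" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

xpath='/xhtml:html/xhtml:body/xhtml:div[1]'
base='http://localhost:8983/solr/update/extract'
params='ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126'

# Quoting the URL also removes the need to backslash-escape each "&".
url="${base}?${params}&ext.xpath=$(encode_brackets "$xpath")"
printf '%s\n' "$url"

# Assumes a local Solr instance and tutorial.pdf in the current directory:
# curl "$url" -F "tutorial=@tutorial.pdf"
```

Another option is to leave the brackets as they are and pass -g (--globoff) to curl, which switches off the globbing parser entirely.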
: So I am trying to filter down what I am indexing, and the basic XPath queries
: don't work. For example, working with tutorial.pdf this indexes all the
: <div/>:
:
: curl
: http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/descendant:node\(\)
: -F "tutorial=@tutorial.pdf"
:
: However, if I want to only index the first div, I expect to do this:
:
: budapest:site epugh$ curl
: http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/xhtml:div[1]
: -F "tutorial=@tutorial.pdf"
:
: But I keep getting back an issue from curl. My attempts to escape the [1]
: have failed. Any suggestions?
:
: curl: (3) [globbing] error: bad range specification after pos 174
:
: Eric
:
: PS,
: Also, this site seems to be okay as a place to upload your html and practice
: xpath:
:
: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm
:
: I did have to strip out the namespace stuff though.
:
:
:
:
: -----------------------------------------------------
: Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
: http://www.opensourceconnections.com
: Free/Busy: http://tinyurl.com/eric-cal
:
:
:
-Hoss