Posted to user@any23.apache.org by Brian Sletten <br...@gmail.com> on 2012/09/13 00:50:53 UTC

Non-HTML XPathExtraction

Greetings.

I am interested in something similar to the XPathExtractor but for regular XML documents, not HTML.  Is there such a thing?  It seems that the SingleDocumentExtraction/XPathExtractor pair is based on the assumption of HTML.  I've been spelunking in the code this afternoon and it appears as if it might be possible if you were able to feed a non-HTMLDocumentImpl into the process.

Before I spend any more time, I thought I'd ask. Congrats on the new home and status. This is a tremendously useful infrastructure. Glad to see it getting the recognition it deserves.

Regards,

Brian

Re: Non-HTML XPathExtraction

Posted by Michele Mostarda <mi...@gmail.com>.

Hi Brian,

On 13 September 2012 00:50, Brian Sletten <br...@gmail.com> wrote:

> Greetings.
>
> I am interested in something similar to the XPathExtractor but for regular
> XML documents, not HTML.  Is there such a thing?  It seems that the
> SingleDocumentExtraction/XPathExtractor pair is based on the assumption of
> HTML.  I've been spelunking in the code this afternoon and it appears as if
> it might be possible if you were able to feed a non-HTMLDocumentImpl into
> the process.
>

Currently Any23 doesn't handle generic XML. The XPathExtractor was meant to
extract fragments of well-known HTML pages.
For your purpose, why not just use XSLT [1]?
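
To illustrate the XSLT route, here is a minimal sketch that uses only the
JDK's built-in javax.xml.transform API and does not involve Any23 at all.
The sample XML, the http://example.org/ subject URIs, the choice of the
dcterms:title predicate, and the class name XsltToTriples are all
assumptions made up for this example. The stylesheet walks a generic XML
document and writes one N-Triples-style line per <book> element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToTriples {

    // Hypothetical generic (non-HTML) XML input.
    static final String SAMPLE_XML =
        "<books>"
        + "<book id=\"b1\"><title>Dune</title></book>"
        + "<book id=\"b2\"><title>Emma</title></book>"
        + "</books>";

    // XSLT 1.0 stylesheet using the text output method.
    // Each book becomes: <subject> <predicate> "literal" .
    // Note: literal values are not escaped here; real data would need that.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"books/book\">"
        + "<xsl:text>&lt;http://example.org/book/</xsl:text>"
        + "<xsl:value-of select=\"@id\"/>"
        + "<xsl:text>&gt; &lt;http://purl.org/dc/terms/title&gt; \"</xsl:text>"
        + "<xsl:value-of select=\"title\"/>"
        + "<xsl:text>\" .&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the output.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.print(transform(SAMPLE_XML, STYLESHEET));
    }
}
```

The text produced this way could then be handed to whatever RDF parser
you already use. The XPath expressions in the stylesheet (books/book,
@id, title) play the role that the extraction rules would play in an
XPathExtractor-style setup.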


>
> Before I spend any more time, I thought I'd ask. Congrats on the new home
> and status. This is a tremendously useful infrastructure. Glad to see it
> getting the recognition it deserves.
>

Thanks a lot!


>
> Regards,
>
> Brian


All the best,
Mic

[1] http://en.wikipedia.org/wiki/XSLT


-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: me@michelemostarda.com
site : http://www.michelemostarda.com