Posted to users@servicemix.apache.org by Stefan Klinger <kl...@cs.york.ac.uk> on 2006/04/24 16:44:28 UTC

retrieving html using http get

Hello,

I have built my own lightweight binding components, similar to the
o.a.s.components.http.* classes, to use with http get in order to
retrieve html pages via the ESB. The problem is that the HttpMarshaler/
HttpClientMarshaler expect the content of the http request/response
body to be xml in order to include it in the NormalizedMessage
content. Unfortunately, this does not work with html, as it is not xml
compliant, so I put the response body in as an attachment. I have tested
the new components and they seem to work fine.

I wonder whether, for a more generic solution, the lightweight Marshalers
should support other Mime-Types. The http requests/responses would be
transformed into the NormalizedMessage content only if the content-type
http header is "text/xml"; otherwise the Marshalers would use a dummy
content and attach the request/response to the message. I'd be happy to
work on this.
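
The content-type rule proposed above could be sketched roughly as follows.
This is only an illustration, not ServiceMix code: SketchMessage is a
hypothetical stand-in for NormalizedMessage, and the "<payload/>" dummy
content, the attachment name "body" and the UTF-8 decoding are all
assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a JBI NormalizedMessage; not the real API. */
final class SketchMessage {
    String content;                                   // expected to be XML
    final Map<String, byte[]> attachments = new LinkedHashMap<>();
}

final class ContentTypeMarshalerSketch {
    /** Assumed placeholder used when the body is not XML. */
    static final String DUMMY_CONTENT = "<payload/>";

    /** True only for "text/xml", ignoring parameters such as "; charset=...". */
    static boolean isXml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/xml");
    }

    /** XML bodies become the content; anything else becomes an attachment. */
    static SketchMessage toMessage(String contentType, byte[] body) {
        SketchMessage msg = new SketchMessage();
        if (isXml(contentType)) {
            msg.content = new String(body, StandardCharsets.UTF_8); // assumes UTF-8
        } else {
            msg.content = DUMMY_CONTENT;
            msg.attachments.put("body", body);
        }
        return msg;
    }
}
```

With this shape, a "text/html" response keeps its bytes untouched in the
attachment, while existing "text/xml" traffic is handled as before.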

I know that the HttpComponent already allows for MultiMimeMessage Post
Requests; however, it expects at least one part of it to be xml (which I
don't like). I would much prefer the SoapReader to check whether
there is an xml part available: if yes, use it as content; otherwise
attach everything and use a dummy message content.
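
As a rough sketch of the part-selection rule described above (not the
actual SoapReader code): SketchPart and Selection are hypothetical types,
the "<payload/>" dummy content is an assumption, and only "text/xml" is
treated as xml here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical MIME part: a content type plus a body. */
final class SketchPart {
    final String contentType;
    final String body;

    SketchPart(String contentType, String body) {
        this.contentType = contentType;
        this.body = body;
    }
}

final class PartSelectionSketch {
    /** Assumed placeholder used when no xml part is present. */
    static final String DUMMY_CONTENT = "<payload/>";

    /** Result of the selection: message content plus the attached parts. */
    static final class Selection {
        final String content;
        final List<SketchPart> attachments;

        Selection(String content, List<SketchPart> attachments) {
            this.content = content;
            this.attachments = attachments;
        }
    }

    static boolean isXml(SketchPart part) {
        String mime = part.contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/xml");
    }

    /** First xml part (wherever it appears) becomes the content; the rest are attached. */
    static Selection select(List<SketchPart> parts) {
        SketchPart xmlPart = null;
        List<SketchPart> attachments = new ArrayList<>();
        for (SketchPart part : parts) {
            if (xmlPart == null && isXml(part)) {
                xmlPart = part;
            } else {
                attachments.add(part);
            }
        }
        String content = (xmlPart != null) ? xmlPart.body : DUMMY_CONTENT;
        return new Selection(content, attachments);
    }
}
```

The point of the loop is that the xml part no longer has to be the first
one, and a message with no xml part at all is still accepted.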

Any comments are most welcome.

Stefan

Re: retrieving html using http get

Posted by Antoni Reus <ar...@ibit.org>.
Hi,

AFAIK, the NormalizedMessage content, or "payload", must always be XML;
this is a requirement of the JBI spec.

If you can serve your pages as XHTML, that could be a good solution, since
XHTML is xml compliant.
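
To illustrate the difference, here is a minimal sketch that checks
well-formedness with the JDK's own XML parser. The class and method names
are made up for the example; it only shows that XHTML-style markup parses
as xml while typical html (for instance an unclosed <br>) does not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheckSketch {
    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello<br/></p></body></html>";
        String html = "<html><body><p>Hello<br></p></body></html>";
        System.out.println(isWellFormedXml(xhtml));
        System.out.println(isWellFormedXml(html));
    }
}
```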


Cheers.

-- Antoni Reus



Re: retrieving html using http get

Posted by Guillaume Nodet <gn...@gmail.com>.
See comments inline ...

On 4/25/06, Stefan Klinger <kl...@cs.york.ac.uk> wrote:
> Thanks Guillaume,
>
> I forgot to check but there is already a Jira
> (http://issues.apache.org/activemq/browse/SM-44) which states that Mime
> encodings will not be supported by HttpInOutBinding. Is that still the case?

Yes

>
> I have had a quick look at the html parser below; it seems to do the
> trick quite nicely. It is also available as a Maven package at
> http://www.ibiblio.org/maven/nekohtml so it should integrate well into
> the build.
>
> So the questions that remain are:
>
> - Should the lightweight components support mime?

Well, if someone contributes a patch for that -- with tests ;) -- I'd
be happy to commit it.  However, the servicemix-http component already
has more features than the lightweight ones and I'd rather work on
that one.

> - Should they support http get?

Same answer.

> - Should they recognise the text/html content-type and parse the
> document using nekohtml?

Why not.  As long as it does not break compatibility, I have no objections.

>
> or
>
> - Should this be only part of the http component?

I'd rather add new features to the servicemix-http component.
If this breaks compatibility somewhat, we can add a mechanism to
handle more than one set of consumer / provider processors, the way it's
done in servicemix-jms.  This way, we could have one set for standard
web services (soap or not), which is the current one, another for
REST style services (with support for GET and other http methods),
another for html, ....
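
One possible shape for such a mechanism, purely as a sketch: the Processor
interface, the registry class and the style names "soap" and "rest" are
all hypothetical and do not reflect how servicemix-jms or servicemix-http
are actually implemented. The idea shown is only that an endpoint with no
explicit style keeps getting the current processor set, so existing
configurations are unaffected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProcessorRegistrySketch {
    /** Hypothetical processor: handles a request body and returns a reply. */
    interface Processor {
        String process(String body);
    }

    private final Map<String, Processor> processors = new LinkedHashMap<>();
    private final String defaultStyle;

    /** The default style is what endpoints get when they configure nothing. */
    ProcessorRegistrySketch(String defaultStyle) {
        this.defaultStyle = defaultStyle;
    }

    void register(String style, Processor processor) {
        processors.put(style, processor);
    }

    /** Looks up the processor for a style; null falls back to the default. */
    Processor lookup(String style) {
        String key = (style == null) ? defaultStyle : style;
        Processor processor = processors.get(key);
        if (processor == null) {
            throw new IllegalArgumentException("No processor set for style: " + key);
        }
        return processor;
    }
}
```

A registry built with "soap" as the default and a second "rest" entry
would then serve both kinds of endpoint side by side.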

>
> I have also raised a jira for the SoapReader
> (http://issues.apache.org/activemq/browse/SM-409) so it does not assume
> by default that the first part of multi mime is xml. Once I have figured
> out how to add a patch, I will add it.

Simply attach a file to the jira.  The preferred way is a diff or patch
(you can create one using svn on the command line or with a gui like
tortoise or eclipse).

Cheers,
Guillaume Nodet

>
> Stefan
>
>

Re: retrieving html using http get

Posted by Stefan Klinger <kl...@cs.york.ac.uk>.
Thanks Guillaume,

I forgot to check but there is already a Jira 
(http://issues.apache.org/activemq/browse/SM-44) which states that Mime 
encodings will not be supported by HttpInOutBinding. Is that still the case?

I have had a quick look at the html parser below; it seems to do the
trick quite nicely. It is also available as a Maven package at
http://www.ibiblio.org/maven/nekohtml so it should integrate well into
the build.

So the questions that remain are:

- Should the lightweight components support mime?
- Should they support http get?
- Should they recognise the text/html content-type and parse the 
document using nekohtml?

or

- Should this be only part of the http component?

I have also raised a jira for the SoapReader 
(http://issues.apache.org/activemq/browse/SM-409) so it does not assume 
by default that the first part of multi mime is xml. Once I have figured 
out how to add a patch, I will add it.

Stefan




Re: retrieving html using http get

Posted by Guillaume Nodet <gn...@gmail.com>.
FYI, there are also some html parsers that can generate valid xml content.
Take a look at http://people.apache.org/~andyc/neko/doc/html/index.html

Cheers,
Guillaume Nodet


Re: retrieving html using http get

Posted by Guillaume Nodet <gn...@gmail.com>.
Feel free to raise a jira and contribute a patch on servicemix-soap.
The dummy message is also needed for GET requests, so I guess we could
also use it in this case ...

Cheers,
Guillaume Nodet
