Posted to users@cxf.apache.org by Brice <br...@gmail.com> on 2008/08/01 12:19:09 UTC

Re: Just to be sure about Gzip

OK thx Google,

I found out that the GZip support is fixed for the 2.1.2 release:

https://issues.apache.org/jira/browse/CXF-1387

However I wonder if this patch is available in 2.0.8.



On Thu, Jul 31, 2008 at 12:45, Brice <br...@gmail.com> wrote:

> We are using CXF as a REST client that connects to a partner service.
> The responses are really huge POX documents, so we would like our
> partner to compress them with GZip.
>
> I googled around and haven't found anything really useful yet, except
> that XFire 1.2 supported HTTP GZip.
>
> Since HTTP GZip is part of the RFC and it is the server that sends the
> gzipped response, I'm guessing the user side (as a client) doesn't have
> to set any related parameters!?
> Am I wrong?
>
> And as a general question for web services: when it is CXF that sends the
> XML documents (either SOAP or REST), how can we configure CXF to gzip them
> (or use any other compression algorithm)?
>
> Thx in advance.
>
> --
> Bryce
>



-- 
Bryce

Re: Just to be sure about Gzip

Posted by Brice <br...@gmail.com>.
Indeed, thx for the explanation. I was thinking about the incoming messages
because I'm using CXF as a REST client, but I guess the same principle
applies in reverse order.

As for the release ETA, it's a bit late for us but we can manage. As for
patches, as much as I would like to participate, I'm clearly overbooked
here ;)

In the meantime we might compile the trunk at home to try out the gzip
support and to see what we have to do on the REST side, as Sergey seems to
have changed/added several things in that area.


Anyway, thx for your involvement ;)

On Fri, Aug 1, 2008 at 18:00, Daniel Kulp <dk...@apache.org> wrote:

>
> On Aug 1, 2008, at 11:44 AM, Brice wrote:
>
>> OK, thx for the update, indeed our application will have to handle a big
>> load for two reasons:
>>  first : our application and the server will have to handle a lot of
>> request (~50 eventually more) per seconds
>>  second : the documents might weight from 200KB to several MB
>>
>> I guess we will switch to the 2.1.2. As this release is not yet ready (7
>> issues left), do you have an ETA and if not do you think the actual code
>> on
>> the trunk is stable enough to go in production?
>>
>
> I wouldn't count on JIRA for this.   For patch releases, I tend to do "time
> based" releases instead of feature/issue based releases.   Generally, I try
> to do patch releases about every 8 weeks or so.   Whatever JIRA items have
> been fixed in that 8 weeks is included.     Thus, if you REALLY want
> something included, submit a patch.   :-)    Otherwise it might not get done
> in time and may get pushed out.
>
> 2.1.1 was done mid June so probably mid August.
>
>
>  And finally just for my own information, when you talk about streaming,
>> I'm
>> guessing that the gzipped payload is stremed to the "xml handler" using
>> stax
>> to map the objects with Jaxb. Am I correct?
>>
>
>
> It's mostly for the writing side of things.   Normally, the path is
> basically:
> JAXB -> Stax -> HttpOutputStream
> While JAXB is writing to the Stax writer, it's streaming it directly onto
> the wire.
>
> Ian's patch had this cool little feature where messages less than 1K in
> size (configurable) don't get gzip compressed.   However, to determine if it
> is 1K in size, he basically did:
> JAXB -> Stax -> Cache   (byte[] for the first 64K, file on disk if larger)
> then check the cache size and do:
> (big) Cache -> GZipOutputStream -> HttpOutputStream
> or
> (little) Cache -> HttpOutputStream
>
> that's a bit inefficient as it involves using the disk for the large
> messages, etc.
>
> The stuff on trunk sets it up like:
>
> JAXB -> Stax -> 1K buffer (-> GZIPOut if 1k exceeded) -> HttpOutputStream
>
> If the 1K buffer never overflows, it's flushed right out to the
> HttpOutputStream at the end.   If it does overflow, the gzip stream is
> injected and from there on, all writing is immediately streamed out via
> gzip.   The large buffer/cache of the whole message does not take place.
>
> Does that help explain it?
>
>
> Dan
>
>
>
>
>
>>
>> Thx :)
>>
>>
>> On Fri, Aug 1, 2008 at 17:19, Daniel Kulp <dk...@apache.org> wrote:
>>
>>
>>> On Aug 1, 2008, at 6:38 AM, Ian Roberts wrote:
>>>
>>> Brice wrote:
>>>
>>>>
>>>>  OK thx Google,
>>>>> I found out that the GZip support is fixed for the 2.1.2 release:
>>>>> https://issues.apache.org/jira/browse/CXF-1387
>>>>> However I wonder if this patch is available in 2.0.8.
>>>>>
>>>>>
>>>> Not that I'm aware of, but if you're prepared to build CXF from source
>>>> you
>>>> should just be able to apply the patch from the JIRA issue directly onto
>>>> the
>>>> 2.0.x source code (I originally wrote the patch against the 2.0 branch
>>>> as
>>>> that's what I use).
>>>>
>>>>
>>> Keep in mind, what's on trunk is a bit different than your original
>>> patch.
>>>  The original patch was an EXCELLENT starting point, but what's on trunk
>>> works quite a bit better for large messages in that it can stream instead
>>> of
>>> doing a "buffer than send" thing.     That's partially why I didn't push
>>> it
>>> to 2.0.x.    The classes used for the threshold management and such
>>> aren't
>>> available there so there is quite a bit more work to find everything that
>>> would need to be merged.
>>>
>>> ---
>>> Daniel Kulp
>>> dkulp@apache.org
>>> http://www.dankulp.com/blog
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Bryce
>>
>
> ---
> Daniel Kulp
> dkulp@apache.org
> http://www.dankulp.com/blog
>
>
>
>
>


-- 
Bryce

Re: Just to be sure about Gzip

Posted by Brice <br...@gmail.com>.
> To support it on the server side you can use a servlet filter rather than
> the interceptors, but for the client side CXF needs to do the zipping and
> setting of all the relevant HTTP headers itself.  Before I upgraded the
> interceptors to work on the server side as well as the client I used to use
> http://sourceforge.net/projects/pjl-comp-filter in my web.xml.
>

This part of the application does not actually run inside a servlet. We
are using CXF as a REST client; the following code, based on some examples
out there, was written against CXF 2.0.4.

// Build a client proxy for the service interface over CXF's HTTP binding
final JaxWsProxyFactoryBean sf = new JaxWsProxyFactoryBean();
sf.setServiceClass(this.serviceInterface);
sf.getServiceFactory().setWrapped(false);
sf.getClientFactoryBean().setBindingId(HttpBindingFactory.HTTP_BINDING_ID);
sf.setAddress(this.httpAddress);

// Create the proxy and get the underlying CXF Client from it
final Object object = sf.create();
final Client client = ClientProxy.getClient(object);

So we don't have any web.xml on the classpath. Too bad, the tip seemed good.


Regarding the modification of the input interceptor:
I cannot work on it indefinitely, as I'm staffed on other things, but I
would like to implement the same intelligent mechanism that is used to
write a gzipped stream. My rough plan is as follows:
 1 - Create an AbstractThresholdInputStream
 2 - Optimize the GZipInInterceptor
 3 - ? Well, that's where I'm confused. Is it possible to get some guidance?

Thx a lot,
Regards,


On Mon, Aug 4, 2008 at 16:42, Ian Roberts <i....@dcs.shef.ac.uk> wrote:

> Brice wrote:
>
>> It's mostly for the writing side of things.
>>>
>>>
>> Ok I just get this part, while looking at the code. Indeed the
>> GZipInInterceptor doesn't have a GZipThresholdOutputStream nor a
>> AbstractThresholdOutputStream exists.
>> I just looked a the code, well for now the code is a bit obscure.
>>
>> However on a different view and with my limited understanding, I really
>> don't get why CXF requires to write gzip interceptors, why couldn't we
>> rely
>> on the transport layer wether it is jetty or commons-http.
>>
>
> To support it on the server side you can use a servlet filter rather than
> the interceptors, but for the client side CXF needs to do the zipping and
> setting of all the relevant HTTP headers itself.  Before I upgraded the
> interceptors to work on the server side as well as the client I used to use
> http://sourceforge.net/projects/pjl-comp-filter in my web.xml.
>
>
> Ian
>
> --
> Ian Roberts               | Department of Computer Science
> i.roberts@dcs.shef.ac.uk  | University of Sheffield, UK
>



-- 
Bryce

Re: Just to be sure about Gzip

Posted by Ian Roberts <i....@dcs.shef.ac.uk>.
Brice wrote:
>> It's mostly for the writing side of things.
>>
> 
> Ok I just get this part, while looking at the code. Indeed the
> GZipInInterceptor doesn't have a GZipThresholdOutputStream nor a
> AbstractThresholdOutputStream exists.
> I just looked a the code, well for now the code is a bit obscure.
> 
> However on a different view and with my limited understanding, I really
> don't get why CXF requires to write gzip interceptors, why couldn't we rely
> on the transport layer wether it is jetty or commons-http.

To support it on the server side you can use a servlet filter rather 
than the interceptors, but for the client side CXF needs to do the 
zipping and setting of all the relevant HTTP headers itself.  Before I 
upgraded the interceptors to work on the server side as well as the 
client I used to use http://sourceforge.net/projects/pjl-comp-filter in 
my web.xml.
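
As a rough illustration of the client-side part, here is a minimal Java
sketch using only java.util.zip. The class and method names
(GzipResponseDecoder, decode) are hypothetical and this is not CXF's
interceptor code. It only assumes the standard HTTP convention that the
client advertises "Accept-Encoding: gzip" on the request and the server
labels a compressed body with "Content-Encoding: gzip".

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of CXF: decodes a response body based on
// the Content-Encoding header value the server sent back.
class GzipResponseDecoder {

    // Value a client would send in its Accept-Encoding request header.
    static final String ACCEPT_ENCODING_VALUE = "gzip";

    // Wraps the body in a GZIPInputStream only when the server says the
    // body is gzipped; otherwise the original stream is returned untouched.
    static InputStream decode(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }
}
```

The caller would pass the Content-Encoding header of the response together
with the raw body stream and hand the returned stream to the XML parser.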

Ian

-- 
Ian Roberts               | Department of Computer Science
i.roberts@dcs.shef.ac.uk  | University of Sheffield, UK

Re: Just to be sure about Gzip

Posted by Brice <br...@gmail.com>.
>
> It's mostly for the writing side of things.
>

OK, I get this part now that I have looked at the code. Indeed, the
GZipInInterceptor has no equivalent of the GZipThresholdOutputStream or the
AbstractThresholdOutputStream.
For now the code is still a bit obscure to me.

However, from a different angle and with my limited understanding, I really
don't get why CXF needs its own gzip interceptors. Why couldn't we rely
on the transport layer, whether it is Jetty or commons-http?

Thx


-- 
Bryce

Re: Just to be sure about Gzip

Posted by Daniel Kulp <dk...@apache.org>.
On Aug 1, 2008, at 11:44 AM, Brice wrote:
> OK, thx for the update, indeed our application will have to handle a  
> big
> load for two reasons:
>  first : our application and the server will have to handle a lot of
> request (~50 eventually more) per seconds
>  second : the documents might weight from 200KB to several MB
>
> I guess we will switch to the 2.1.2. As this release is not yet  
> ready (7
> issues left), do you have an ETA and if not do you think the actual  
> code on
> the trunk is stable enough to go in production?

I wouldn't count on JIRA for this.   For patch releases, I tend to do  
"time based" releases instead of feature/issue based releases.    
Generally, I try to do patch releases about every 8 weeks or so.    
Whatever JIRA items have been fixed in that 8 weeks is included.      
Thus, if you REALLY want something included, submit a patch.   :-)     
Otherwise it might not get done in time and may get pushed out.

2.1.1 was done mid June so probably mid August.


> And finally just for my own information, when you talk about  
> streaming, I'm
> guessing that the gzipped payload is stremed to the "xml handler"  
> using stax
> to map the objects with Jaxb. Am I correct?


It's mostly for the writing side of things.   Normally, the path is  
basically:
JAXB -> Stax -> HttpOutputStream
While JAXB is writing to the Stax writer, it's streaming it directly  
onto the wire.

Ian's patch had this cool little feature where messages less than 1K  
in size (configurable) don't get gzip compressed.   However, to  
determine if it is 1K in size, he basically did:
JAXB -> Stax -> Cache   (byte[] for the first 64K, file on disk if larger)
then check the cache size and do:
(big) Cache -> GZipOutputStream -> HttpOutputStream
or
(little) Cache -> HttpOutputStream

that's a bit inefficient as it involves using the disk for the large  
messages, etc.
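
The "cache, then decide" flow described above can be sketched in Java as
follows. This is only an illustration under simplifying assumptions: the
class name BufferThenSend is hypothetical, the whole message is assumed to
be already cached in a byte[] (the disk fallback for messages over 64K is
left out), and a plain OutputStream stands in for the HttpOutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the "buffer then send" approach, not the actual patch.
class BufferThenSend {

    // Sends a fully cached message; returns true if it was gzip compressed.
    static boolean send(byte[] cachedMessage, int threshold, OutputStream wire) throws IOException {
        if (cachedMessage.length < threshold) {
            // (little) Cache -> HttpOutputStream
            wire.write(cachedMessage);
            return false;
        }
        // (big) Cache -> GZipOutputStream -> HttpOutputStream
        GZIPOutputStream gzip = new GZIPOutputStream(wire);
        gzip.write(cachedMessage);
        gzip.finish(); // write the gzip trailer without closing the wire
        return true;
    }
}
```

The decision is trivial because the full size is known up front, but the
price is that nothing reaches the wire until the whole message is cached.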

The stuff on trunk sets it up like:

JAXB -> Stax -> 1K buffer (-> GZIPOut if 1k exceeded) -> HttpOutputStream

If the 1K buffer never overflows, it's flushed right out to the  
HttpOutputStream at the end.   If it does overflow, the gzip stream is  
injected and from there on, all writing is immediately streamed out  
via gzip.   The large buffer/cache of the whole message does not take  
place.
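
For readers who want to see the idea in code, here is a minimal Java sketch
of such a threshold stream. It is not the class that is on trunk: the name
ThresholdGzipStream and its methods are hypothetical, and a plain
OutputStream stands in for the HttpOutputStream. A real implementation
would also have to set the Content-Encoding header at the moment the buffer
overflows, before any bytes go out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: buffers up to `threshold` bytes, then switches to gzip.
class ThresholdGzipStream extends OutputStream {
    private final OutputStream wire;   // stands in for the HttpOutputStream
    private final int threshold;       // e.g. 1024 bytes
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private OutputStream target;       // null while we are still buffering
    private boolean gzipped;
    private boolean closed;

    ThresholdGzipStream(OutputStream wire, int threshold) {
        this.wire = wire;
        this.threshold = threshold;
    }

    boolean isGzipped() {
        return gzipped;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (target == null && buffer.size() + len > threshold) {
            // Overflow: inject the gzip stream and replay what was buffered.
            gzipped = true;
            target = new GZIPOutputStream(wire);
            buffer.writeTo(target);
            buffer = null;
        }
        if (target != null) {
            target.write(b, off, len);   // streamed out via gzip from now on
        } else {
            buffer.write(b, off, len);   // still below the threshold
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (target == null) {
            // Never overflowed: flush the small message uncompressed.
            buffer.writeTo(wire);
            wire.close();
        } else {
            target.close();              // writes the gzip trailer, closes the wire
        }
    }
}
```

At most `threshold` bytes are ever held in memory, which is the point of
the design: large messages are never cached as a whole.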

Does that help explain it?


Dan



>
>
> Thx :)
>
>
> On Fri, Aug 1, 2008 at 17:19, Daniel Kulp <dk...@apache.org> wrote:
>
>>
>> On Aug 1, 2008, at 6:38 AM, Ian Roberts wrote:
>>
>> Brice wrote:
>>>
>>>> OK thx Google,
>>>> I found out that the GZip support is fixed for the 2.1.2 release:
>>>> https://issues.apache.org/jira/browse/CXF-1387
>>>> However I wonder if this patch is available in 2.0.8.
>>>>
>>>
>>> Not that I'm aware of, but if you're prepared to build CXF from  
>>> source you
>>> should just be able to apply the patch from the JIRA issue  
>>> directly onto the
>>> 2.0.x source code (I originally wrote the patch against the 2.0  
>>> branch as
>>> that's what I use).
>>>
>>
>> Keep in mind, what's on trunk is a bit different than your original  
>> patch.
>>  The original patch was an EXCELLENT starting point, but what's on  
>> trunk
>> works quite a bit better for large messages in that it can stream  
>> instead of
>> doing a "buffer than send" thing.     That's partially why I didn't  
>> push it
>> to 2.0.x.    The classes used for the threshold management and such  
>> aren't
>> available there so there is quite a bit more work to find  
>> everything that
>> would need to be merged.
>>
>> ---
>> Daniel Kulp
>> dkulp@apache.org
>> http://www.dankulp.com/blog
>>
>>
>>
>>
>>
>
>
> -- 
> Bryce

---
Daniel Kulp
dkulp@apache.org
http://www.dankulp.com/blog





Re: Just to be sure about Gzip

Posted by Brice <br...@gmail.com>.
OK, thx for the update. Indeed, our application will have to handle a big
load, for two reasons:
  first : our application and the server will have to handle a lot of
requests (~50 per second, eventually more)
  second : the documents might weigh from 200KB to several MB

I guess we will switch to 2.1.2. As this release is not ready yet (7
issues left), do you have an ETA? If not, do you think the current code on
the trunk is stable enough to go into production?

And finally, just for my own information: when you talk about streaming, I'm
guessing that the gzipped payload is streamed to the "xml handler", using StAX
to map the objects with JAXB. Am I correct?

Thx :)


On Fri, Aug 1, 2008 at 17:19, Daniel Kulp <dk...@apache.org> wrote:

>
> On Aug 1, 2008, at 6:38 AM, Ian Roberts wrote:
>
>  Brice wrote:
>>
>>> OK thx Google,
>>> I found out that the GZip support is fixed for the 2.1.2 release:
>>> https://issues.apache.org/jira/browse/CXF-1387
>>> However I wonder if this patch is available in 2.0.8.
>>>
>>
>> Not that I'm aware of, but if you're prepared to build CXF from source you
>> should just be able to apply the patch from the JIRA issue directly onto the
>> 2.0.x source code (I originally wrote the patch against the 2.0 branch as
>> that's what I use).
>>
>
> Keep in mind, what's on trunk is a bit different than your original patch.
>   The original patch was an EXCELLENT starting point, but what's on trunk
> works quite a bit better for large messages in that it can stream instead of
> doing a "buffer than send" thing.     That's partially why I didn't push it
> to 2.0.x.    The classes used for the threshold management and such aren't
> available there so there is quite a bit more work to find everything that
> would need to be merged.
>
> ---
> Daniel Kulp
> dkulp@apache.org
> http://www.dankulp.com/blog
>
>
>
>
>


-- 
Bryce

Re: Just to be sure about Gzip

Posted by Daniel Kulp <dk...@apache.org>.
On Aug 1, 2008, at 6:38 AM, Ian Roberts wrote:

> Brice wrote:
>> OK thx Google,
>> I found out that the GZip support is fixed for the 2.1.2 release:
>> https://issues.apache.org/jira/browse/CXF-1387
>> However I wonder if this patch is available in 2.0.8.
>
> Not that I'm aware of, but if you're prepared to build CXF from  
> source you should just be able to apply the patch from the JIRA  
> issue directly onto the 2.0.x source code (I originally wrote the  
> patch against the 2.0 branch as that's what I use).

Keep in mind, what's on trunk is a bit different than your original  
patch.   The original patch was an EXCELLENT starting point, but  
what's on trunk works quite a bit better for large messages in that it  
can stream instead of doing a "buffer then send" thing.     That's  
partially why I didn't push it to 2.0.x.    The classes used for the  
threshold management and such aren't available there so there is quite  
a bit more work to find everything that would need to be merged.

---
Daniel Kulp
dkulp@apache.org
http://www.dankulp.com/blog





Re: Just to be sure about Gzip

Posted by Brice <br...@gmail.com>.
Yes indeed, until the new CXF versions are released, that's the only option,
or to migrate to a pre-2.1.2 build.

I hope the remaining bugs will be fixed soon :P
Anyway, thx guys for your support and work, again!

On Fri, Aug 1, 2008 at 12:38, Ian Roberts <i....@dcs.shef.ac.uk> wrote:

> Brice wrote:
>
>> OK thx Google,
>>
>> I found out that the GZip support is fixed for the 2.1.2 release:
>>
>> https://issues.apache.org/jira/browse/CXF-1387
>>
>> However I wonder if this patch is available in 2.0.8.
>>
>
> Not that I'm aware of, but if you're prepared to build CXF from source you
> should just be able to apply the patch from the JIRA issue directly onto the
> 2.0.x source code (I originally wrote the patch against the 2.0 branch as
> that's what I use).
>
> Ian
>
> --
> Ian Roberts               | Department of Computer Science
> i.roberts@dcs.shef.ac.uk  | University of Sheffield, UK
>



-- 
Bryce

Re: Just to be sure about Gzip

Posted by Ian Roberts <i....@dcs.shef.ac.uk>.
Brice wrote:
> OK thx Google,
> 
> I found out that the GZip support is fixed for the 2.1.2 release:
> 
> https://issues.apache.org/jira/browse/CXF-1387
> 
> However I wonder if this patch is available in 2.0.8.

Not that I'm aware of, but if you're prepared to build CXF from source 
you should just be able to apply the patch from the JIRA issue directly 
onto the 2.0.x source code (I originally wrote the patch against the 2.0 
branch as that's what I use).

Ian

-- 
Ian Roberts               | Department of Computer Science
i.roberts@dcs.shef.ac.uk  | University of Sheffield, UK