Posted to dev@cxf.apache.org by Alessio Soldano <as...@redhat.com> on 2014/07/14 09:54:09 UTC

Performance issue with Content-ID computation of multipart MTOM/XOP messages

Hi,
while running some performance benchmarks here, we noticed a lot of time
spent computing the Content-ID of multipart MTOM/XOP messages, which is
quite unexpected (at least to me). We have a client consuming a WSDL
that references an external XSD. That XSD contains a type with
base64-encoded data. The schema declares elementFormDefault="qualified",
attributeFormDefault="unqualified" and
targetNamespace="org:foo:PurchaseOrder".
The problem is in AttachmentUtil's createContentID:

     public static String createContentID(String ns)
             throws UnsupportedEncodingException {
         String cid = "cxf.apache.org";
         String name = ATT_UUID + "-" + String.valueOf(++counter);
         if (ns != null && (ns.length() > 0)) {
             try {
                 URI uri = new URI(ns);
                 String host = uri.toURL().getHost();
                 cid = host;
             } catch (Exception e) {
                 cid = ns;
             }
         }
         return URLEncoder.encode(name, "UTF-8") + "@"
             + URLEncoder.encode(cid, "UTF-8");
     }

If the code inside the 'if' block is executed, a URL is created from
the namespace string, which in my case is something like
"org:foo:PurchaseOrder" (note that I can't change that; it's part of the
benchmark sources). Building a URL from a String is potentially very
expensive because of the URLStreamHandler lookup involved. In my
case, the method tries to locate a URLStreamHandler named something
like "xyz.org.Handler", which obviously does not exist; that causes a
ClassNotFoundException (CNFE) to be initialized, thrown and caught in
the catch block above. That badly affects performance.
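
To make the failure path concrete, here is a minimal sketch (not CXF
code; the class and method names are made up for illustration) that
repeats the same URI-to-URL conversion on two kinds of namespace. It
assumes a stock JDK with no extra protocol handlers registered, in which
case toURL() should reject the "org" scheme with a MalformedURLException.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

class ToUrlFailureDemo {

    /** Returns the host of ns, or null if ns cannot be turned into a URL. */
    static String hostViaUrl(String ns) {
        try {
            // Same two steps as createContentID: parse as URI, then convert to URL.
            return new URI(ns).toURL().getHost();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // "org:foo:PurchaseOrder" parses fine as an opaque URI with scheme "org",
            // but no URLStreamHandler is available for that scheme, so toURL() fails.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostViaUrl("http://cxf.apache.org/ns"));
        System.out.println(hostViaUrl("org:foo:PurchaseOrder"));
    }
}
```

Every call with such a namespace pays for the handler lookup and for
building an exception, which is the cost described above.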

Now, I have a few questions:
1) Do we really need that mechanism for computing the Content-ID from
the host of the URL generated from the namespace? Is there a spec
requiring that?
2) If that's required, would you mind me trying to add some preliminary
checks to avoid the URL generation when it's clearly going to raise an
exception (for instance by parsing the string with a pre-compiled
regular expression)?
3) Any different idea / solution?
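
As a rough illustration of the kind of preliminary check meant in 2),
here is a sketch with a regular expression compiled once. The pattern
and the names are hypothetical, and it assumes that only http/https
namespaces with an authority part are worth converting to a URL; the
real rule would need to be agreed on.

```java
import java.util.regex.Pattern;

class NamespacePrecheck {

    // Assumption: only http(s) namespaces with a non-empty authority are
    // candidates for host extraction. Compiled once, so each call only
    // pays for the match itself.
    private static final Pattern HTTP_LIKE =
        Pattern.compile("https?://[^/?#]+.*", Pattern.CASE_INSENSITIVE);

    /** True if ns looks like an http(s) URL and is worth parsing further. */
    static boolean looksLikeHttpUrl(String ns) {
        return ns != null && HTTP_LIKE.matcher(ns).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHttpUrl("http://cxf.apache.org/ns"));
        System.out.println(looksLikeHttpUrl("org:foo:PurchaseOrder"));
    }
}
```

With a check like this, createContentID could skip the URL conversion
entirely whenever the check returns false and use the fallback value
directly.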

Thanks
Alessio

-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Performance issue with Content-ID computation of multipart MTOM/XOP messages

Posted by Alessio Soldano <as...@redhat.com>.
On 14/07/14 09:54, Alessio Soldano wrote:
> 2) if that's required, would you mind me trying to add some 
> preliminary checks to avoid the URL generation when that's clearly 
> going to raise an exception (for instance by parsing the string using 
> a pre-computed regular expression) ?
Apache Commons UrlValidator [1] might be a solution for this, btw.

[1] 
http://commons.apache.org/proper/commons-validator/apidocs/src-html/org/apache/commons/validator/routines/UrlValidator.html

-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Performance issue with Content-ID computation of multipart MTOM/XOP messages

Posted by Daniel Kulp <dk...@apache.org>.
I’m not aware of anything that would require this at all. Looking at the logs (I had to go back to the SVN repo for this), this seems to have been part of the original set of imports done by Dan Diephouse way back in 2006. Thus, it’s likely something that came from XFire.


Dan



-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com


Re: Performance issue with Content-ID computation of multipart MTOM/XOP messages

Posted by Alessio Soldano <as...@redhat.com>.
On 14/07/14 11:58, Aki Yoshida wrote:
> I'm not sure whether it is really necessary to make the cid part
> depend on the namespace string.
>
> If we only need to guarantee uniqueness within a document, a single
> thread calling the createContentID method will get a series of unique
> IDs. However, as the static variable counter is not synchronously
> updated, currently two threads may get the same ID value but this
> situation is not relevant as long as these two threads are working on
> two different documents. And even if two threads may be working on the
> same document, using the namespace depending value for the cid part
> won't decrease the collision chance very much as they are likely to be
> using the same namespace value. If we need to guarantee uniqueness
> among multiple documents, we will need a different mechanism anyway.
> So, I see not much benefit in using the namespace depending variable
> here.
Thanks for the consideration, Aki. This is basically my first question;
the reason I asked about an existing spec requirement is also that in
most cases the namespace there will simply be null, so we're already
skipping the if block very often...
I don't see much benefit in the namespace usage here either.

Cheers
Alessio

-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Performance issue with Content-ID computation of multipart MTOM/XOP messages

Posted by Aki Yoshida <el...@gmail.com>.
I'm not sure whether it is really necessary to make the cid part
depend on the namespace string.

If we only need to guarantee uniqueness within a document, a single
thread calling the createContentID method will get a series of unique
IDs. However, as the static variable counter is not updated
atomically, two threads may currently get the same ID value, but this
is not a problem as long as the two threads are working on
two different documents. And even if two threads are working on the
same document, using a namespace-dependent value for the cid part
won't reduce the chance of a collision very much, as they are likely
to be using the same namespace value. If we need to guarantee
uniqueness across multiple documents, we will need a different
mechanism anyway. So I don't see much benefit in using the
namespace-dependent value here.
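
For illustration only, a sketch of what dropping the namespace could
look like, with the counter made atomic so that two threads never get
the same value. This is not the CXF implementation: the class name is
hypothetical, and ATT_UUID and COUNTER are stand-ins for the
AttachmentUtil fields of similar name.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

class SimpleContentIds {

    // Hypothetical stand-ins for AttachmentUtil's ATT_UUID and counter.
    private static final String ATT_UUID = UUID.randomUUID().toString();
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** Builds a Content-ID with a fixed domain part and no namespace lookup. */
    static String createContentID() {
        // incrementAndGet() is atomic, unlike ++counter on a plain static int.
        // The name holds only hex digits, '-' and decimal digits, so it needs
        // no URL encoding here.
        String name = ATT_UUID + "-" + COUNTER.incrementAndGet();
        return name + "@cxf.apache.org";
    }

    public static void main(String[] args) {
        System.out.println(createContentID());
        System.out.println(createContentID());
    }
}
```

The IDs are then unique within one JVM regardless of which document a
thread is working on, and no URI or URL parsing happens at all.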

regards, aki


Re: Performance issue with Content-ID computation of multipart MTOM/XOP messages

Posted by Sergey Beryozkin <sb...@gmail.com>.
Hi Alessio
On 14/07/14 08:54, Alessio Soldano wrote:
> 2) if that's required, would you mind me trying to add some preliminary
> checks to avoid the URL generation when that's clearly going to raise an
> exception (for instance by parsing the string using a pre-computed
> regular expression) ?

Doing some basic manual checks would indeed be faster. You can simply
try URI.getScheme and/or URI.getAuthority and do some basic checks
around them; there is no need to convert to a URL, for sure...
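
A minimal sketch of that idea, staying with java.net.URI and never
creating a URL. The class and method names are hypothetical, and the
fallback rules are an assumption modelled on the existing method (use
the host when there is one, otherwise the raw namespace); edge cases
such as file: URIs may behave differently from the current code.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriHostLookup {

    /** Picks the domain part of a Content-ID without building a URL. */
    static String domainFor(String ns, String fallback) {
        if (ns == null || ns.isEmpty()) {
            return fallback;
        }
        try {
            // getHost() is null for opaque URIs such as "org:foo:PurchaseOrder"
            // and for URIs without a server-based authority.
            String host = new URI(ns).getHost();
            return host != null ? host : ns;
        } catch (URISyntaxException e) {
            // Not a valid URI at all: keep the raw namespace.
            return ns;
        }
    }

    public static void main(String[] args) {
        System.out.println(domainFor("http://cxf.apache.org/ns", "cxf.apache.org"));
        System.out.println(domainFor("org:foo:PurchaseOrder", "cxf.apache.org"));
    }
}
```

Parsing a URI involves no URLStreamHandler lookup, so the common
non-URL namespace no longer goes through an exception path.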

Thanks, Sergey
