You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafficserver.apache.org by Leif Hedstrom <zw...@apache.org> on 2010/12/03 01:47:11 UTC

SDK and string "copies"

Hi,

here's an idea, that could optimize memory allocation pressure 
significantly (I think) when writing plugins. Here's the deal; A few 
plugin APIs will make copies of the strings from the internal marshal 
buffers, before returning them, with the only purpose of making them 
NULL terminated. The odd thing, to me at least, is that all these APIs 
already take an argument (an int*), which gets set with the length of 
the string.

I think it'd make much more sense to consistently make all APIs that 
deals with strings like this not be NULL terminated. This would require 
no API changes, just that the "semantics" would change. And I think the 
requirement to 'release' strings could then go away entirely (since they 
are just const char* pointers into the marshal buffers).

Below are a few APIs that would change their "behavior", in that they no 
longer would gurantee NULL terminated strings (which is something most 
APIs don't anyways).

Thoughts on this?

-- leif

const char * TSUrlSchemeGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlUserGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlPasswordGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlHostGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlPathGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlHttpParamsGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlHttpQueryGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSUrlHttpFragmentGet(TSMBuffer bufp, TSMLoc obj, int *length);

const char *TSMimeHdrFieldNameGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc 
field, int *length);
const char *TSHttpHdrMethodGet(TSMBuffer bufp, TSMLoc obj, int *length);
const char *TSHttpHdrReasonGet(TSMBuffer bufp, TSMLoc obj, int *length);

TSReturnCode TSMimeHdrFieldValueStringGet(TSMBuffer bufp, TSMLoc hdr, 
TSMLoc field, int idx, const char **value_ptr, int *value_len_ptr);



Re: SDK and string "copies"

Posted by Bryan Call <bc...@yahoo-inc.com>.
+1 - simplifies ts code and increases performance

-Bryan

On 12/02/2010 04:47 PM, Leif Hedstrom wrote:
> Hi,
>
> here's an idea, that could optimize memory allocation pressure
> significantly (I think) when writing plugins. Here's the deal; A few
> plugin APIs will make copies of the strings from the internal marshal
> buffers, before returning them, with the only purpose of making them
> NULL terminated. The odd thing, to me at least, is that all these APIs
> already take an argument (an int*), which gets set with the length of
> the string.
>
> I think it'd make much more sense to consistently make all APIs that
> deals with strings like this not be NULL terminated. This would require
> no API changes, just that the "semantics" would change. And I think the
> requirement to 'release' strings could then go away entirely (since they
> are just const char* pointers into the marshal buffers).
>
> Below are a few APIs that would change their "behavior", in that they no
> longer would gurantee NULL terminated strings (which is something most
> APIs don't anyways).
>
> Thoughts on this?
>
> -- leif
>
> const char * TSUrlSchemeGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlUserGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPasswordGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHostGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPathGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpParamsGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpQueryGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpFragmentGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> const char *TSMimeHdrFieldNameGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc
> field, int *length);
> const char *TSHttpHdrMethodGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSHttpHdrReasonGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> TSReturnCode TSMimeHdrFieldValueStringGet(TSMBuffer bufp, TSMLoc hdr,
> TSMLoc field, int idx, const char **value_ptr, int *value_len_ptr);
>
>


Re: SDK and string "copies"

Posted by Leif Hedstrom <zw...@apache.org>.
On 12/02/2010 08:05 PM, Jason wrote:
> ditto +1, just have to make a point of it in the docs

Absolutely, I will file separate bug(s) for that (doing this change will 
definitely eliminate the requirement for ever having to call 
TSHandleStringRelease(), which actually simplifies the usage of the APIs 
a fair amount.

Fwiw, most of the doxygen docs for this already said the strings aren't 
guaranteed to be NULL terminated. And most of the APIs I listed have a 
note like this in the SDK doc:

     Note: the returned string is not guaranteed to be null-terminated.

That note is obviously a lie, since up until now, they were guaranteed 
to be NULL terminated, in a very expensive way (for this handful of 
APIs). There's even a big section already in the docs, that says:

     No null-terminated strings
     In Traffic Server 5.2 (sic) and newer, you cannot assume that the 
string data contained in
     marshal buffers (data such as URLs and MIME fields) is stored in 
null-terminated string
     copies. This means that your plugins should always use the length 
parameter when
     retrieving or manipulating these strings.


I believe this whole nonsense of doing copies of strings on the heap, 
and NULL terminate them, was some attempt to make the APIs compatible 
with much older versions of Inktomi Traffic Server (which we don't need 
to deal with obviously).

Cheers,

-- leif

Re: SDK and string "copies"

Posted by Jason <ja...@gmail.com>.
ditto +1, just have to make a point of it in the docs


On Dec 2, 2010, at 9:23 PM, Eric Balsa wrote:

> I'm +1 for making stuff non-null and returning a pointer straight into
> the MBuf like you propose.
> 
> The only problem I see is that plugin writers will get direct access
> to the MBuf and one could cast and change data directly (but these are
> getters!). Before, you were operating on a copy of that data. However,
> I think the benefits (plugins without mallocs) outweigh the
> babysitting of developers who should know better than to cast away
> const.
> 
> --Eric
> 
> On Thu, Dec 2, 2010 at 4:47 PM, Leif Hedstrom <zw...@apache.org> wrote:
>> Hi,
>> 
>> here's an idea, that could optimize memory allocation pressure significantly
>> (I think) when writing plugins. Here's the deal; A few plugin APIs will make
>> copies of the strings from the internal marshal buffers, before returning
>> them, with the only purpose of making them NULL terminated. The odd thing,
>> to me at least, is that all these APIs already take an argument (an int*),
>> which gets set with the length of the string.
>> 
>> I think it'd make much more sense to consistently make all APIs that deals
>> with strings like this not be NULL terminated. This would require no API
>> changes, just that the "semantics" would change. And I think the requirement
>> to 'release' strings could then go away entirely (since they are just const
>> char* pointers into the marshal buffers).
>> 
>> Below are a few APIs that would change their "behavior", in that they no
>> longer would gurantee NULL terminated strings (which is something most APIs
>> don't anyways).
>> 
>> Thoughts on this?
>> 
>> -- leif
>> 
>> const char * TSUrlSchemeGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlUserGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlPasswordGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlHostGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlPathGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlHttpParamsGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlHttpQueryGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSUrlHttpFragmentGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> 
>> const char *TSMimeHdrFieldNameGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc field,
>> int *length);
>> const char *TSHttpHdrMethodGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> const char *TSHttpHdrReasonGet(TSMBuffer bufp, TSMLoc obj, int *length);
>> 
>> TSReturnCode TSMimeHdrFieldValueStringGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc
>> field, int idx, const char **value_ptr, int *value_len_ptr);
>> 
>> 
>> 


Re: SDK and string "copies"

Posted by Eric Balsa <er...@apache.org>.
I'm +1 for making stuff non-null and returning a pointer straight into
the MBuf like you propose.

The only problem I see is that plugin writers will get direct access
to the MBuf and one could cast and change data directly (but these are
getters!). Before, you were operating on a copy of that data. However,
I think the benefits (plugins without mallocs) outweigh the
babysitting of developers who should know better than to cast away
const.

--Eric

On Thu, Dec 2, 2010 at 4:47 PM, Leif Hedstrom <zw...@apache.org> wrote:
> Hi,
>
> here's an idea, that could optimize memory allocation pressure significantly
> (I think) when writing plugins. Here's the deal; A few plugin APIs will make
> copies of the strings from the internal marshal buffers, before returning
> them, with the only purpose of making them NULL terminated. The odd thing,
> to me at least, is that all these APIs already take an argument (an int*),
> which gets set with the length of the string.
>
> I think it'd make much more sense to consistently make all APIs that deals
> with strings like this not be NULL terminated. This would require no API
> changes, just that the "semantics" would change. And I think the requirement
> to 'release' strings could then go away entirely (since they are just const
> char* pointers into the marshal buffers).
>
> Below are a few APIs that would change their "behavior", in that they no
> longer would gurantee NULL terminated strings (which is something most APIs
> don't anyways).
>
> Thoughts on this?
>
> -- leif
>
> const char * TSUrlSchemeGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlUserGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPasswordGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHostGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPathGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpParamsGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpQueryGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpFragmentGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> const char *TSMimeHdrFieldNameGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc field,
> int *length);
> const char *TSHttpHdrMethodGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSHttpHdrReasonGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> TSReturnCode TSMimeHdrFieldValueStringGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc
> field, int idx, const char **value_ptr, int *value_len_ptr);
>
>
>

Re: SDK and string "copies"

Posted by Manish Pandey <ma...@gmail.com>.
Seems like an great idea for simplifying memory management. The overhead of
dealing with non-NULL termination should be minor.

Isn't there an equivalent of TSHttpBodyGet(TSMBuffer bufp, TSMLoc obj, int
*length);

Manish

On Thu, Dec 2, 2010 at 4:47 PM, Leif Hedstrom <zw...@apache.org> wrote:

> Hi,
>
> here's an idea, that could optimize memory allocation pressure
> significantly (I think) when writing plugins. Here's the deal; A few plugin
> APIs will make copies of the strings from the internal marshal buffers,
> before returning them, with the only purpose of making them NULL terminated.
> The odd thing, to me at least, is that all these APIs already take an
> argument (an int*), which gets set with the length of the string.
>
> I think it'd make much more sense to consistently make all APIs that deals
> with strings like this not be NULL terminated. This would require no API
> changes, just that the "semantics" would change. And I think the requirement
> to 'release' strings could then go away entirely (since they are just const
> char* pointers into the marshal buffers).
>
> Below are a few APIs that would change their "behavior", in that they no
> longer would gurantee NULL terminated strings (which is something most APIs
> don't anyways).
>
> Thoughts on this?
>
> -- leif
>
> const char * TSUrlSchemeGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlUserGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPasswordGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHostGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlPathGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpParamsGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpQueryGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSUrlHttpFragmentGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> const char *TSMimeHdrFieldNameGet(TSMBuffer bufp, TSMLoc hdr, TSMLoc field,
> int *length);
> const char *TSHttpHdrMethodGet(TSMBuffer bufp, TSMLoc obj, int *length);
> const char *TSHttpHdrReasonGet(TSMBuffer bufp, TSMLoc obj, int *length);
>
> TSReturnCode TSMimeHdrFieldValueStringGet(TSMBuffer bufp, TSMLoc hdr,
> TSMLoc field, int idx, const char **value_ptr, int *value_len_ptr);
>
>
>

Re: SDK and string "copies"

Posted by "Alan M. Carroll" <am...@network-geographics.com>.
+1

Thursday, December 2, 2010, 6:47:11 PM, you wrote:

> I think it'd make much more sense to consistently make all APIs that 
> deals with strings like this not be NULL terminated. This would require 
> no API changes, just that the "semantics" would change. And I think the 
> requirement to 'release' strings could then go away entirely (since they 
> are just const char* pointers into the marshal buffers).

> Below are a few APIs that would change their "behavior", in that they no 
> longer would gurantee NULL terminated strings (which is something most 
> APIs don't anyways).