Posted to dev@httpd.apache.org by Konstantin Chuguev <Ch...@Clickstream.com> on 2008/04/28 16:41:44 UTC
"Better" mod_unique_id
Hi,
I'm developing a solution generating unique IDs for the requests to
websites that are not only clustered but also geographically
dispersed. This implies the following:
- the website's virtual host section on each Apache server has the
same ServerName which is mapped by DNS to different IP addresses using
various methods, geo-proximity, round-robin, etc.
- the virtual host's IP address is normally but not necessarily *;
- the actual IP address the Apache listens to for this virtual host is
normally, but not necessarily, an intranet address (behind a load
balancer).
After analysing the format of the ID generated by mod_unique_id, and
reading the module's source code, I have a feeling that this module
has serious flaws if used in my situation.
No offence to the authors, I'm sure the module serves its purpose just
right for the majority of its users. But as it seems that it doesn't
do this in my case, I thought I'd better ask if someone knows why.
I understand that the module is relatively old and likely has been
ported from a pre-2.0 version, when no APR library existed, and this
might explain its design. I'd be glad if someone could either confirm
this or explain why it has been done like that.
Now to the point of my question. The unique_id_rec structure that
contains the binary representation of the unique ID consists of the
following fields:
    unsigned int stamp;
    unsigned int in_addr;
    unsigned int pid;
    unsigned short counter;
    unsigned int thread_index;
1. Why use an unsigned int timestamp when apr_time_t exists, which is
64 bits wide and appears to have at least microsecond resolution?
Admittedly, the unsigned short counter helps when more than one request
reaches the same IP address / PID / thread within a second, but I can
still hardly see this as the better design.
2. Why use unsigned int pid plus unsigned int thread_index when
long r->connection->id exists? thread_index is in fact produced by
htonl((unsigned int)r->connection->id), and the MPMs seem to ensure
that the child_id is already included there! Although it is just 4
bytes long compared to the 8-byte pid/thread_index combination, it is
still guaranteed to be unique among all worker threads of the Apache
server in the system. And I don't think this particular field needs
converting to network byte order.
3. Using unsigned int in_addr with the server-side IPv4 address works
well only within a single cluster on an IPv4 network. What if only IPv6
is being used in the intranet? What if multiple dispersed clusters
with exactly the same intranet IP addressing schemes serve the same
website? Please correct me if I'm wrong but I think the following
structure would represent the unique website more correctly:
- union {struct in_addr, struct in6_addr} local_ip_addr: the IP
address of the local side of the HTTP connection;
- union {struct in_addr, struct in6_addr} dns_ip_addr: one (any?) of
the IP addresses that are mapped to the website's domain name in DNS.
The latter can be omitted if the former IP address is public.
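
To make points 1 and 3 concrete, here is a simplified model of the current record. It is only a sketch, not the module's code: the fixed-width types, the names legacy_id, legacy_id_make and legacy_id_equal, and the idea that the counter simply equals the request number are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for unique_id_rec; fixed-width types are an assumption. */
typedef struct {
    uint32_t stamp;         /* whole seconds */
    uint32_t in_addr;       /* server-side IPv4 address */
    uint32_t pid;
    uint16_t counter;       /* 16 bits, so it wraps after 65536 increments */
    uint32_t thread_index;
} legacy_id;

/* Hypothetical helper: ID of the n-th request one thread serves in second `now`. */
static legacy_id legacy_id_make(uint32_t now, uint32_t addr, uint32_t pid,
                                uint32_t thread, uint32_t n)
{
    /* The cast truncates n to its low 16 bits, modelling counter wrap-around. */
    legacy_id id = { now, addr, pid, (uint16_t)n, thread };
    return id;
}

/* Field-by-field comparison (memcmp would also compare padding bytes). */
static int legacy_id_equal(const legacy_id *a, const legacy_id *b)
{
    assert(a != 0 && b != 0);
    return a->stamp == b->stamp && a->in_addr == b->in_addr &&
           a->pid == b->pid && a->counter == b->counter &&
           a->thread_index == b->thread_index;
}
```

In this model, request 0 and request 65536 within the same second yield equal IDs, and so do two hosts in different clusters that happen to share the same intranet address, PID and thread index.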
Does anyone see any flaws in the design where the following structure
is used?
    apr_time_t stamp;      // 8 bytes, converted to network byte order
    long connection_id;    // 4 or 8 bytes depending on architecture; doesn't need htonl
    union {struct in_addr, struct in6_addr} local_ip_addr;   // 4 to 16 bytes
    [union {struct in_addr, struct in6_addr} dns_ip_addr;]   // optional: 0 to 16 bytes
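
As a rough illustration of how such a record might be laid out and serialised, here is a minimal sketch in plain C. Several things in it are assumptions that go beyond the proposal: the names proposed_id, put_be64 and proposed_id_pack, the explicit address-length byte (added so a reader can tell a 4-byte address from a 16-byte one), widening connection_id to 64 bits, and writing every integer big-endian for simplicity. The optional dns_ip_addr is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ADDR_V4 = 4, ADDR_V6 = 16 };  /* address length doubles as the family tag */

typedef struct {
    int64_t stamp;          /* microseconds since the epoch, like apr_time_t */
    int64_t connection_id;  /* widened to 64 bits (assumption) */
    uint8_t addr_len;       /* ADDR_V4 or ADDR_V6 */
    uint8_t addr[16];       /* local address; only the first addr_len bytes are used */
} proposed_id;

/* Write v into out[0..7], most significant byte first. */
static void put_be64(uint8_t *out, uint64_t v)
{
    assert(out != NULL);
    for (int i = 7; i >= 0; --i) {
        out[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Serialise id into out. Returns the number of bytes written
 * (8 + 8 + 1 + addr_len), or 0 if addr_len is invalid or cap is too small. */
static size_t proposed_id_pack(const proposed_id *id, uint8_t *out, size_t cap)
{
    size_t need = 8 + 8 + 1 + (size_t)id->addr_len;

    if ((id->addr_len != ADDR_V4 && id->addr_len != ADDR_V6) || cap < need)
        return 0;
    put_be64(out, (uint64_t)id->stamp);
    put_be64(out + 8, (uint64_t)id->connection_id);
    out[16] = id->addr_len;
    memcpy(out + 17, id->addr, id->addr_len);
    return need;
}
```

With this layout an IPv4 record packs into 21 bytes and an IPv6 record into 33 bytes, before any text encoding is applied.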
Comments and suggestions are appreciated.
Konstantin Chuguev
Software Developer
Clickstream Technologies PLC, 58 Davies Street, London, W1K 5JF,
Registered in England No. 3774129
Re: "Better" mod_unique_id
Posted by Konstantin Chuguev <Ch...@Clickstream.com>.
Hi Ian,
It's a shame I wasn't aware of UUIDs. They look like a very credible
solution. RFC 4122 even defines a URN namespace for them, and support
is available out of the box on many platforms. I think I'll stick with
UUIDs until someone convinces me they are unsuitable for some reason.
Thanks a lot for the hint.
Konstantin.
On 29 Apr 2008, at 10:53, Ian Holsman wrote:
> Hi Konstantin.
>
> I'm about to look at the same issue for my employer.
>
> For my version I was planning to use apr_uuid_get, which uses the
> uuid_create / uuid_generate functions to generate a unique value.
>
> Have you looked at this function?
>
> regards
> Ian
>
Konstantin Chuguev
Software Developer
Clickstream Technologies PLC, 58 Davies Street, London, W1K 5JF,
Registered in England No. 3774129
Re: "Better" mod_unique_id
Posted by Ian Holsman <li...@holsman.net>.
Hi Konstantin.
I'm about to look at the same issue for my employer.
For my version I was planning to use apr_uuid_get, which uses the
uuid_create / uuid_generate functions to generate a unique value.
Have you looked at this function?
regards
Ian