Posted to dev@httpd.apache.org by Thomas <al...@gmx.net> on 2013/02/12 16:31:54 UTC

mod_firehose and pcap format

Looking at mod_firehose from trunk, is there any effort under way, or already
concluded, to convert the output of mod_firehose or its parser program
firehose to the pcap format? I know it was briefly discussed during the
mod_firehose integration proposal, but I have not seen any result there. I
realize mod_firehose actually aims for something simpler than a full-blown
tcpdump/wireshark-compatible dump, but it would still be neat to be able to
do it.

Re: mod_firehose and pcap format

Posted by Graham Leggett <mi...@sharp.fm>.
On 12 Feb 2013, at 5:31 PM, Thomas <al...@gmx.net> wrote:

> Looking at mod_firehose from trunk, is there any effort under way, or already concluded, to convert the output of mod_firehose or its parser program firehose to the pcap format? I know it was briefly discussed during the mod_firehose integration proposal, but I have not seen any result there. I realize mod_firehose actually aims for something simpler than a full-blown tcpdump/wireshark-compatible dump, but it would still be neat to be able to do it.


Firehose aims to give you a view of requests inside an HTTP stream rather than packets over a wire. It lets you see the individual buckets as they were recorded, and it also records enough information to reconstruct the original requests and responses in a usable form.

Firehose was designed for an extremely high load environment, where it is more important to deliver the response to the audience at GbE and 10GbE speeds than it is to wait for disks and processes to record the firehose packet to a pipe or file. Firehose may drop buckets, and this has to be detectable by the application reading the raw firehose. It was used to detect "one in a billion" request failures: live traffic was recorded until the problem could be found, and then the original traffic could be "played back" to determine whether the bug was fixed. We were dealing with hundreds of gigabytes of recorded request data that was analysed directly on live servers (that volume of data is completely impractical to copy around), hence the focus on efficient processing.

The pcap format (as I've read it in the past) just captures packet streams; no relationship between packets is captured. This maps well onto the world of packet-based networking, but not onto the world of streams, which is what HTTP is. Firehose cares that a bucket has been dropped, while pcap doesn't (by design, packet-based networks drop packets).

Software that analyses pcap files expects to find network packets inside. While I could fake a TCP packet encapsulated in pcap, lots of questions emerge. Do I fake TCP retransmission behaviour if firehose drops a bucket? Do I fake IPv4 or IPv6? How do I map the recording of requests to a fake TCP stream when multiple requests can run over the same connection? You reach a point where a "designed for packets" encapsulation format works too hard against you: you're recording streams with potential holes in them, you care where the holes are, and humans want to read this too.

The firehose format as it stands now is an extension to chunked encoding. A single line gives the length and additional parameters, followed by the binary chunk, followed by CRLF. The additional parameters give you the number in the sequence (allowing you to detect dropped buckets in the stream), and a UUID allowing you to reconstitute either a request or a connection. The current format is also human readable, which you will want if you care about the buckets being sent over the wire and whether they are excessively fragmented. The size of each bucket is carefully controlled to ensure that it can be written to a pipe atomically, which is why, even if httpd sends an 8000 byte bucket, firehose will read fewer bytes to ensure a fit. mod_firehose also takes care of the case where pipes are involved and nobody is listening: the show must go on whether firehose works or not.
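
To make the framing above concrete, here is a minimal Python sketch of a chunked-style record stream with sequence numbers and a UUID. It is not the real mod_firehose wire format: the header layout (hex length, hex sequence number, UUID, separated by spaces), the 512 byte MAX_RECORD limit standing in for the pipe's atomic write size, the assumption that sequence numbers are counted per UUID, and the function names frame, parse and find_gaps are all made up for illustration.

```python
import io

# Hypothetical stand-in for the pipe's atomic write limit; not taken from mod_firehose.
MAX_RECORD = 512


def frame(payload, stream_id, start_seq=0, max_record=MAX_RECORD):
    """Split payload into records that each fit within max_record bytes.

    Assumed layout (illustrative only):
        b"<hex length> <hex sequence> <uuid>\r\n" + data + b"\r\n"
    """
    records = []
    seq = start_seq
    offset = 0
    while offset < len(payload):
        # Reserve room for the largest possible header plus the trailing CRLF.
        overhead = len(f"{max_record:x} {seq:x} {stream_id}\r\n") + 2
        size = max_record - overhead
        if size <= 0:
            raise ValueError("max_record is too small to hold any data")
        data = payload[offset:offset + size]
        header = f"{len(data):x} {seq:x} {stream_id}\r\n".encode("ascii")
        records.append(header + data + b"\r\n")
        offset += len(data)
        seq += 1
    return records


def parse(stream):
    """Yield (sequence, stream_id, data) from a binary file-like object."""
    while True:
        line = stream.readline()
        if not line:
            return
        fields = line.rstrip(b"\r\n").decode("ascii").split(" ")
        length_hex, seq_hex, stream_id = fields
        # The payload is read by length, so it may contain any bytes, including CRLF.
        data = stream.read(int(length_hex, 16))
        stream.read(2)  # consume the CRLF that terminates the chunk
        yield int(seq_hex, 16), stream_id, data


def find_gaps(records):
    """Return (stream_id, expected_seq, seen_seq) for every hole in a stream."""
    expected = {}
    gaps = []
    for seq, stream_id, _data in records:
        if stream_id in expected and seq != expected[stream_id]:
            gaps.append((stream_id, expected[stream_id], seq))
        expected[stream_id] = seq + 1
    return gaps
```

A reader would pass the output of parse() to find_gaps() and then decide what to do with each hole, for example by refusing to replay a request whose buckets are incomplete. Grouping the parsed records by UUID gives back the per-request or per-connection byte stream.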

Regards,
Graham