Posted to users@tomcat.apache.org by Daniel Andres Pelaez Lopez <es...@gmail.com> on 2023/06/26 16:47:32 UTC

Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skipping ChunkedInputFilter

Hi Tomcat community,

I have a requirement where we want to manually decode a Chunked Transfer
Encoding (CTE) stream using CoyoteInputStream so that we have access to the
chunk size. This means I want to call CoyoteInputStream.read and get the
raw CTE bytes, chunk headers included. In other words, we want to decode the
CTE by hand, bypassing Tomcat's default handling.

The current flow from the point of view of CoyoteInputStream is:
CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.

ChunkedInputFilter handles the CTE decoding and the read method only
returns the chunks, with no other information, like chunk size.
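
For reference, the chunk framing itself is simple to parse. The sketch below
is illustrative only: ManualChunkDecoder is a hypothetical name, and it
assumes the raw, still-encoded bytes are available as a plain InputStream,
which is exactly what Tomcat does not hand to the application through
CoyoteInputStream. It follows the HTTP/1.1 chunked wire format (hex size
line, CRLF, data, CRLF, terminated by a zero-size chunk), ignores chunk
extensions, and skips trailers.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes chunked framing by hand and records each chunk size. */
final class ManualChunkDecoder {

    /** Decoded body plus the size each chunk declared on the wire. */
    static final class Result {
        final byte[] body;
        final List<Integer> chunkSizes;

        Result(byte[] body, List<Integer> chunkSizes) {
            this.body = body;
            this.chunkSizes = chunkSizes;
        }
    }

    static Result decode(InputStream raw) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<>();
        while (true) {
            String sizeLine = readLine(raw);
            // Drop optional chunk extensions that follow ';'
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Last chunk: skip optional trailer fields up to the empty line
                while (!readLine(raw).isEmpty()) {
                    // trailer field ignored in this sketch
                }
                return new Result(body.toByteArray(), sizes);
            }
            byte[] data = raw.readNBytes(size);
            if (data.length < size) {
                throw new EOFException("truncated chunk");
            }
            body.write(data);
            sizes.add(size);
            if (!readLine(raw).isEmpty()) {
                throw new IOException("missing CRLF after chunk data");
            }
        }
    }

    /** Reads one CRLF-terminated line as ASCII, without the terminator. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("expected LF after CR");
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("stream ended inside a line");
    }
}
```

Fed the bytes "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n", decode returns the
body "Wikipedia" together with the chunk sizes 4 and 5, which is the
per-chunk information ChunkedInputFilter does not pass along.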

I found that the method Request.setInputBuffer might allow setting a
different InputBuffer implementation, for instance, the
IdentityInputFilter, which I understand returns all the stream bytes with
no decoding. However, I am not sure if this is the right way or what
consequences it might have.

I would like to know if there are other ways to override the CTE behavior,
any help would be appreciated.

BTW: We are using Spring Boot with Tomcat embedded.

--
Daniel Andrés Pelaez López

Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skipping ChunkedInputFilter

Posted by Daniel Andres Pelaez Lopez <es...@gmail.com>.
On Wed, Jun 28, 2023 at 7:15, Christopher Schultz (<
chris@christopherschultz.net>) wrote:

> Daniel,
>
> On 6/27/23 15:40, Daniel Andres Pelaez Lopez wrote:
> > You are right, the CMAF format of the segment might bring the fragment
> size
> > information, but as you state, we might need to parse the segment as it
> is
> > being uploaded to figure out the fragment size, that's an option over the
> > table, but being fast is also important here, as we are creating low
> > latency streams (under 3 seconds glass to glass). Seems easier to just
> read
> > the chunk size from the CTE, as this DASH server example shows
> > https://gitlab.com/fflabs/dash_server/-/blob/master/dash_server.py#L62
> > that's a DASH server in Python with pretty low-level network access.
>
> You are only trying to optimize one link, here, right? The
> Tomcat-to-client link? The video-generator-to-Tomcat link need not be
> particularly optimized, correct?
>
> It might be easier to parse the CMAF as it arrives and store your own
> mapping metadata. It will require less hacking of Tomcat (which would
> not work if you ever had to switch server vendors, for example) and it
> would also work with *any* client, not just the specially-coded client
> that already does this particularly-convenient upload-chunking you are
> trying to capture.
>
> >> 1. The line which sets the output buffer size. If you use the default
> >> buffer size, Tomcat may (okay, WILL) "chunk" the response in the middle
> >> of your video-chunk if a video-chunk can get bigger than the current
> >> buffer size. So you need to make sure that doesn't happen.
> >>
> >> Or, maybe it's okay if that happens, but you want to minimize the number
> >> of times that happens or you waste bytes, cycles, etc.
> >
> > This is great info, I didn't know, as we would like to transfer full
> > fragments, we might need to increase that above the max, I have seen 20
> kb
> > fragments.
>
> While that's "not very big", I believe the default buffer size is 8kb so
> you might have been bitten by this.
>
> > Streaming video is hard and harder in low latency glass to glass, so,
> seems
> > like optimizations on how to transfer the video are important, for
> > instance, the HLS spec mentions how those fragments/byteranges should be
> > returned
> >
> https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis#section-6.2.6
> > (partial segments = fragments):
> >
> >     When processing requests for a URI or a byte range of a URI that
> >     includes one or more Partial Segments that are not yet completely
> >     available to be sent - such as requests made in response to an EXT-X-
> >     PRELOAD-HINT tag - the server MUST refrain from transmitting any
> >     bytes belonging to a Partial Segment until all bytes of that Partial
> >     Segment can be transmitted at the full speed of the link to the
> >     client.  If the requested range includes more than one Partial
> >     Segment then the server MUST enforce this delivery guarantee for each
> >     Partial Segment in turn.  This enables the client to perform accurate
> >     Adaptive Bit Rate (ABR) measurements
>
> Yeah, it's not surprising that I'm ignorant of all that stuff. :)
>
> > Our understanding of that statement is that we must have the whole
> > chunk/fragment/partial segment ready before transmitting it through the
> > network, as a chunk.
>
> But I think it was mostly written to ensure that no other delay factors
> would come into play during transmission -- such as waiting on some
> OTHER network resource to provide the source data. I mean... you are
> still going to be waiting on the disk/NAS/etc. right? Or are you reading
> everything into memory before this?
>
> I'd still argue that math is fast and networks are slow, but it's not my
> project :)
>
> > Regarding using org.apache.coyote.Request.setInputBuffer as a workaround,
> > seems like we don't have access to org.apache.coyote.Request directly, we
> > have access to org.apache.catalina.connector.RequestFacade, which doesn't
> > offer any way to access the
> > underlying org.apache.catalina.connector.Request, and therefore
> > org.apache.coyote.Request. Any way to have access to
> > org.apache.coyote.Request?
>
> Yeah, this is the part where I think you need some support added into
> Tomcat itself at the source level.
>
> Tomcat doesn't expose the coyote.Request object to applications for two
> reasons: (1) it's contrary to the spec and (2) it's potentially
> dangerous, since it offers access to Tomcat internals.
>
> So we'll have to come up with the most convenient thing in Tomcat that
> can reasonably help you out, here.
>
> It's possible that we won't come up with anything, because it will break
> too much encapsulation / protection and you may have to resort to my
> proposal above, which is to instrument the upload differently and
> capture your metadata in a more product-agnostic way.
>
> -chris
>
>

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Christopher,

Thanks for the help and insights. We are going to evaluate what other
options we have, but it seems that hacking Tomcat is not the right path.
-- 
Daniel Andrés Pelaez López
Master’s Degree in IT Architectures, Universidad de los Andes.
Software Construction Specialist, Universidad de los Andes.
Bachelor's Degree in Computer Sciences, Universidad del Quindio.
e. estigma88@gmail.com

Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skipping ChunkedInputFilter

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Daniel,

On 6/27/23 15:40, Daniel Andres Pelaez Lopez wrote:
> You are right, the CMAF format of the segment might bring the fragment size
> information, but as you state, we might need to parse the segment as it is
> being uploaded to figure out the fragment size, that's an option over the
> table, but being fast is also important here, as we are creating low
> latency streams (under 3 seconds glass to glass). Seems easier to just read
> the chunk size from the CTE, as this DASH server example shows
> https://gitlab.com/fflabs/dash_server/-/blob/master/dash_server.py#L62
> that's a DASH server in Python with pretty low-level network access.

You are only trying to optimize one link, here, right? The 
Tomcat-to-client link? The video-generator-to-Tomcat link need not be 
particularly optimized, correct?

It might be easier to parse the CMAF as it arrives and store your own 
mapping metadata. It will require less hacking of Tomcat (which would 
not work if you ever had to switch server vendors, for example) and it 
would also work with *any* client, not just the specially-coded client 
that already does this particularly-convenient upload-chunking you are 
trying to capture.
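
As a rough illustration of what "parse the CMAF as it arrives" could look
like, here is a minimal sketch. It relies on the ISO BMFF box layout (a
4-byte big-endian size followed by a 4-byte type, with size 1 meaning a
64-bit size follows and size 0 meaning "to the end of the data"). It
assumes, and this is only an assumption about the packager's output, that
every fragment starts with a top-level 'moof' box and runs until the next
'moof'. FragmentSizeScanner is a hypothetical name, and the sketch works on
a complete byte[] rather than incrementally on the upload stream.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds fragment lengths by walking top-level ISO BMFF boxes. */
final class FragmentSizeScanner {

    /**
     * Returns the byte length of each fragment. A fragment is assumed to start
     * at a top-level 'moof' box and to run until the next 'moof' or the end of
     * the data. Boxes before the first 'moof' (for example 'styp') are skipped.
     */
    static List<Long> fragmentSizes(byte[] segment) {
        ByteBuffer buf = ByteBuffer.wrap(segment); // ByteBuffer is big-endian by default
        List<Long> sizes = new ArrayList<>();
        long current = -1; // -1 means no 'moof' has been seen yet
        while (buf.remaining() >= 8) {
            int start = buf.position();
            long size = buf.getInt() & 0xFFFFFFFFL;
            byte[] typeBytes = new byte[4];
            buf.get(typeBytes);
            String type = new String(typeBytes, StandardCharsets.US_ASCII);
            if (size == 1) {
                if (buf.remaining() < 8) {
                    throw new IllegalArgumentException("truncated 64-bit box size");
                }
                size = buf.getLong(); // 64-bit size follows the type field
            } else if (size == 0) {
                size = segment.length - start; // box extends to the end of the data
            }
            if (size < 8 || start + size > segment.length) {
                throw new IllegalArgumentException("malformed or truncated box: " + type);
            }
            if (type.equals("moof")) {
                if (current >= 0) {
                    sizes.add(current); // close the previous fragment
                }
                current = 0;
            }
            if (current >= 0) {
                current += size;
            }
            buf.position((int) (start + size));
        }
        if (current >= 0) {
            sizes.add(current);
        }
        return sizes;
    }
}
```

The resulting list is the kind of "mapping metadata" that could be stored
next to the segment and consulted when serving it, independently of how the
uploader happened to chunk the request.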

>> 1. The line which sets the output buffer size. If you use the default
>> buffer size, Tomcat may (okay, WILL) "chunk" the response in the middle
>> of your video-chunk if a video-chunk can get bigger than the current
>> buffer size. So you need to make sure that doesn't happen.
>>
>> Or, maybe it's okay if that happens, but you want to minimize the number
>> of times that happens or you waste bytes, cycles, etc.
> 
> This is great info, I didn't know, as we would like to transfer full
> fragments, we might need to increase that above the max, I have seen 20 kb
> fragments.

While that's "not very big", I believe the default buffer size is 8kb so 
you might have been bitten by this.
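
To make the buffer-size point concrete, here is a small sketch of the write
side. It is written against plain java.io streams so it runs on its own;
FragmentWriter and the fragmentSizes list are hypothetical names. In a
servlet, out would be response.getOutputStream(), and
response.setBufferSize(...) would be called beforehand with a value at
least as large as the biggest fragment, so that the only flushes are the
explicit ones after each fragment.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

/** Hypothetical helper: writes a stored segment one fragment at a time. */
final class FragmentWriter {

    /**
     * Copies the segment to out fragment by fragment, flushing after each one.
     * fragmentSizes holds the previously recorded length of every fragment.
     */
    static void writeFragments(InputStream segment, List<Integer> fragmentSizes, OutputStream out)
            throws IOException {
        for (int size : fragmentSizes) {
            byte[] fragment = segment.readNBytes(size);
            if (fragment.length < size) {
                throw new EOFException("segment is shorter than the recorded fragment sizes");
            }
            out.write(fragment);
            // One flush per fragment; with a large enough response buffer this
            // is the only point at which buffered data is pushed out.
            out.flush();
        }
    }
}
```

If the response buffer is smaller than a fragment, the container may push
out data before the explicit flush, which is the mid-fragment split
described above.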

> Streaming video is hard and harder in low latency glass to glass, so, seems
> like optimizations on how to transfer the video are important, for
> instance, the HLS spec mentions how those fragments/byteranges should be
> returned
> https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis#section-6.2.6
> (partial segments = fragments):
> 
>     When processing requests for a URI or a byte range of a URI that
>     includes one or more Partial Segments that are not yet completely
>     available to be sent - such as requests made in response to an EXT-X-
>     PRELOAD-HINT tag - the server MUST refrain from transmitting any
>     bytes belonging to a Partial Segment until all bytes of that Partial
>     Segment can be transmitted at the full speed of the link to the
>     client.  If the requested range includes more than one Partial
>     Segment then the server MUST enforce this delivery guarantee for each
>     Partial Segment in turn.  This enables the client to perform accurate
>     Adaptive Bit Rate (ABR) measurements

Yeah, it's not surprising that I'm ignorant of all that stuff. :)

> Our understanding of that statement is that we must have the whole
> chunk/fragment/partial segment ready before transmitting it through the
> network, as a chunk.

But I think it was mostly written to ensure that no other delay factors 
would come into play during transmission -- such as waiting on some 
OTHER network resource to provide the source data. I mean... you are 
still going to be waiting on the disk/NAS/etc. right? Or are you reading 
everything into memory before this?

I'd still argue that math is fast and networks are slow, but it's not my 
project :)

> Regarding using org.apache.coyote.Request.setInputBuffer as a workaround,
> seems like we don't have access to org.apache.coyote.Request directly, we
> have access to org.apache.catalina.connector.RequestFacade, which doesn't
> offer any way to access the
> underlying org.apache.catalina.connector.Request, and therefore
> org.apache.coyote.Request. Any way to have access to
> org.apache.coyote.Request?

Yeah, this is the part where I think you need some support added into 
Tomcat itself at the source level.

Tomcat doesn't expose the coyote.Request object to applications for two 
reasons: (1) it's contrary to the spec and (2) it's potentially 
dangerous, since it offers access to Tomcat internals.

So we'll have to come up with the most convenient thing in Tomcat that 
can reasonably help you out, here.

It's possible that we won't come up with anything, because it will break 
too much encapsulation / protection and you may have to resort to my 
proposal above, which is to instrument the upload differently and 
capture your metadata in a more product-agnostic way.

-chris


Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skipping ChunkedInputFilter

Posted by Daniel Andres Pelaez Lopez <es...@gmail.com>.
On Tue, Jun 27, 2023 at 13:48, Christopher Schultz (<
chris@christopherschultz.net>) wrote:

> Daniel,
>
> On 6/27/23 12:56, Daniel Andres Pelaez Lopez wrote:
> > Christopher,
> >
> > On Tue, Jun 27, 2023 at 9:33, Christopher Schultz (<
> > chris@christopherschultz.net>) wrote:
> >
> >> Daniel,
> >>
> >> On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
> >>> On Mon, Jun 26, 2023 at 14:53, Mark Thomas (<ma...@apache.org>) wrote:
> >>>
> >>>> On 26/06/2023 20:34, Christopher Schultz wrote:
> >>>>> Daniel,
> >>>>>
> >>>>> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
> >>>>>> Hi Tomcat community,
> >>>>>>
> >>>>>> I have a requirement where we want to manually decode a Chunked
> >> Transfer
> >>>>>> Encoding (CTE) stream using CoyoteInputStream to have access to the
> >>>> chunk
> >>>>>> size. This means I want to use CoyoteInputStream.read method and get
> >> the
> >>>>>> whole CTE bytes. Saying it in another way: we want to decode the CTE
> >> at
> >>>>>> hand skipping Tomcat defaults.
> >>>>>
> >>>>> Dumb question: why?
> >>>>
> >>>> Not a dumb question at all. It is the key question. I'm curious as to
> >>>> what the answer is.
> >>>>
> >>>> Mark
> >>>>
> >>>>
> >>> Not a dumb question at all. Let me expand on the use case: we are
> >>> working on an HTTP origin (Tomcat) for video streaming using HLS and
> >>> DASH. Our video packager generates video segments of X size, and each
> >>> segment is also divided into fragments (CMAF). The segment size is
> >>> fixed, but the fragment size is variable. Our packager transfers the
> >>> segment while it generates it, one fragment at a time (a chunk), using
> >>> CTE, to the HTTP origin (Tomcat). Now, video players want to download
> >>> the segment, but per the HLS spec, we are required to transfer the
> >>> segment to the video player as we received it, one fragment at a time.
> >>> To be able to send one fragment at a time, we need to know its size,
> >>> which is carried in the CTE (each chunk declares the chunk size).
> >>>
> >>> Our current implementation sends the segment using CTE to the video
> >>> players, but we cannot guarantee we are sending a fragment by chunk.
> >>>
> >>> This is why having access to each chunk and its size will help us.
> >>
> >> Thanks for the details. I think I've got it, but I want to clarify a
> >> little bit.
> >>
> >> Is your video-chunk-generator producing anything HTTP-related? It almost
> >> sounds like Tomcat is a reverse-proxy and your video-generator is the
> >> origin. Maybe you are just generating byte[] from the video-generator?
> >>
> >> Or maybe your video-generator is UPLOADING the chunks to the HTTP
> >> server? It's not entirely clear to me, and the details matter.
> >>
> >
> > Thanks for staying in the conversation.
> >
> > The packager (video-chunk-generator) sends an HTTP PUT with
> > Transfer-Encoding: chunked header, the content is a video segment, where
> > each chunk is a fragment, so, yes, the video-chunk-generator uploads the
> > segment in chunks to the Tomcat server (origin)
> >
> > Sorry for the confusion regarding the word "origin", that is a video
> > streaming term that doesn't matter for the question.
>
> Yes, that's important information to have: in HTTP, the "origin" is the
> web server which actually has the desired resource. Contrast that with a
> reverse proxy, etc.
>
> >> It sounds like you are trying to optimize things such that video-chunk
> >> size ends up being equal to the HTTP-chunk size. Is that the real goal?
> >>
> >
> > The video-chunk-generator does it for us, it sends each video fragment as
> > an HTTP chunk. What we want to optimize is not the transfer from the
> > video-chunk-generator to the server, but from the server to its clients.
> > Clients will do an HTTP GET against the server to grab the segment, that
> > GET we want to optimize in a way that we keep the fragment-by-chunk
> > strategy, using Transfer-Encoding: chunked. This is why, accessing each
> > chunk size when the video-chunk-generator does the PUT, and saving that
> > info in the server, we can use it when clients do a GET, to assure we
> > transfer the same way we received.
>
> Is there no way to observe the video-chunk-size by looking at the raw
> bytes of the video file itself? Take the MP3 audio format, with which
> I'm more familiar. MP3 frame lengths can be computed based upon some
> information at the start of each frame including the version number, bit
> rate, sample rate, etc. So by reading a few bytes into the file, you
> know how big each chunk would need to be. Then you can push the bytes
> and go to the next chunk, etc.
>
> If you can do that with your files, there is no reason to record the
> chunk-sizes that you got at the time of upload unless you just want the
> download to be as absolutely screaming-fast as possible and you don't
> want to perform any mathematical operations at all during the download
> (though you will presumably have to read a file from storage, which has
> a much higher cost than a little bit of math IMHO).
>

You are right, the CMAF format of the segment might carry the fragment size
information, but as you state, we would need to parse the segment as it is
being uploaded to figure out the fragment size. That's an option on the
table, but being fast is also important here, as we are creating low
latency streams (under 3 seconds glass to glass). It seems easier to just
read the chunk size from the CTE, as this DASH server example shows:
https://gitlab.com/fflabs/dash_server/-/blob/master/dash_server.py#L62
That's a DASH server in Python with pretty low-level network access.


>
> Let's assume you CAN determine chunk-size from your source file. You can
> get Tomcat to chunk your file the same way just like this:
>
> public void doGet(HttpServletRequest request, HttpServletResponse
> response) throws IOException {
>
>    response.setHeader("Transfer-Encoding", "chunked");
>    response.setBufferSize(MAXIMUM_VIDEO_FRAME_SIZE); // This is important
>
>    InputStream video = ...; // You figure this out
>    OutputStream out = response.getOutputStream();
>
>    boolean eof = false;
>
>    byte[] buffer = new byte[1024]; // Or something appropriate
>
>    while(!eof) {
>      int c = video.read(buffer); // Start of the next video-chunk
>
>      if(-1 == c) {
>        eof = true;
>      } else {
>        // getChunkSize is a placeholder: compute the video-chunk length
>        // from the header bytes just read
>        int chunkSize = getChunkSize(buffer);
>
>        chunkSize -= c; // We have already read c bytes from video
>
>        out.write(buffer, 0, c);
>
>        // Copy the rest of this video-chunk from the source
>        for(int i=0; i<chunkSize; ++i) { // TODO: Optimize this copy operation
>          out.write(video.read());
>        }
>
>        out.flush(); // This triggers Tomcat to generate a chunked
>                     // response
>      }
>    }
> }
>
> There are lots of way the above code can fail, etc. and so it needs to
> be much more robust, but I just wanted you to get the general idea.
>
> There are two very important things in the code:
>
> 1. The line which sets the output buffer size. If you use the default
> buffer size, Tomcat may (okay, WILL) "chunk" the response in the middle
> of your video-chunk if a video-chunk can get bigger than the current
> buffer size. So you need to make sure that doesn't happen.
>
> Or, maybe it's okay if that happens, but you want to minimize the number
> of times that happens or you waste bytes, cycles, etc.
>

This is great info that I didn't know. Since we would like to transfer full
fragments, we might need to raise the buffer size above the largest
fragment; I have seen 20 KB fragments.


>
> 2. You must call ServletOutputStream.flush, which is how Tomcat knows to
> actually chunk the response.
>

Yes, we are doing that today.


>
> >> In that case, you want to force the chunk size to something specific,
> >> rather than just trying to see what the chunk size is.
> >>
> >> How you do that depends on whether your video-generator is sending data
> >> in the *request entity* in e.g. PUT or POST or if you are fetching the
> >> data in a *response entity*.
> >>
> >> I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I
> >> want to be sure. Might this be easier to do on the client to force a
> >> certain chunk-size?
> >
> > You are right, we want to inspect the chunk-size of an upload to Tomcat.
> We
> > have no control over the video-chunk-generator, so, the only way to know
> > the fragment/chunk size they are sending is by inspecting the CTE.
>
> The only way to know the chunk size THEY are sending is to inspect and
> record it. But you don't really care what they send; instead you care
> what chunk-size to use for your Tomcat responses. They *should* be the
> same thing, but I wanted to re-frame (hah!) the problem to be more
> accurate, because I think you are trying to solve problem X (how to
> observe inbound chunk size) when you really want to solve problem Y
> (optimize outbound chunk size).


> >> Finally... for video, perhaps a Websocket connection would be better
> >> since there is less protocol-overhead once the ws connection is
> >> established?
> >
> > True, but the video-chunk-generator only offers two ways of transfer:
> HTTP
> > PUT or writing to disk. The second option was discarded as we will need
> to
> > listen to file system events and do some magic there, which we don't need
> > to do for the HTTP PUT, as the protocol/Tomcat guarantee when the
> transfer
> > starts and ends.
>
> Sounds good to me. Plus, if you use HTTP then you can de-couple the
> services easily at any time.
>
> >>>>>> The current flow from the point of view of CoyoteInputStream is:
> >>>>>> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
> >>>>>>
> >>>>>> ChunkedInputFilter handles the CTE decoding and the read method only
> >>>>>> returns the chunks, with no other information, like chunk size.
> >>>>>>
> >>>>>> I found that the method Request.setInputBuffer might allow to set a
> >>>>>> different InputBuffer implementation, for instance, the
> >>>>>> IdentityInputFilter, which I understand returns all the stream
> bytes,
> >>>>>> with
> >>>>>> no decoding. However, not sure if this is the right way and which
> >>>>>> consequences might have.
> >>>>>>
> >>>>>> I would like to know if there are other ways to override the CTE
> >>>>>> behavior,
> >>>>>> any help would be appreciated.
> >>>>>
> >>>>> A problem I can see is that you are working with a blocking streaming
> >>>>> interface e.g. read(byte[]) and you also want to get the chunk size.
> >>>>> When? The chunk-size can change for every chunk, so if you call
> >>>>> getChunkSize() before the read() and after the read(), they may be
> >>>>> different if the read() returns data from multiple chunks. It may
> have
> >>>>> changed multiple times between read() was called and when it
> completed.
> >>>>>
> >>>>> If you want to always size the byte[] to read full chunks at once ...
> >>>>> I guess I would again ask "why?"
> >>>>>
> >>>>> Would it be sufficient for ChunkedInputFilter to maybe send an
> >>>>> event-notification each time a chunk boundary was crossed? For
> example:
> >>>>>
> >>>>> public interface ChunkListener {
> >>>>>      public void chunkStarted(ChunkedInputFilter source, long offset,
> >> long
> >>>>> length);
> >>>>>      public void chunkFinished(ChunkedInputFilter source, long
> offset,
> >>>>> long length);
> >>>>> }
> >>>>>
> >>>>> Then, every time the Filter begins or ends a chunk it could notify
> your
> >>>>> code and you can do whatever you want with that information.
> >>>
> >>>> You might be able to subclass the (somewhat confusingly-named)
> >>>>> ChunkInputFilter and bolt-on your own logic like what I have above.
> >>>>>
> >>>
> >>>
> >>> Yes, a listener like that looks great. Any more clues on how to inject
> my
> >>> own ChunkInputFilter implementation in Tomcat configuration? seems
> quite
> >>> hard to do it well.  Also, the listener must be linked by HTTP request.
> >>
> >> I think doing so would require some internal support for messing-around
> >> with the chain of objects that handle the requests. I don't think you
> >> can do this "on your own". One option would be for us to add the ability
> >> to register a "ChunkListener" with the ChunkInputFilter but honestly
> >> this is a pretty odd use-case and having that code running on every
> >> server worldwide seems like a waste. The other option would be to allow
> >> you to specify your own ChunkInputFilter class at some point during
> >> server initialization, which seems like a much better option.
> >>
> >
> > I totally agree Tomcat shouldn't add anything specific regarding this
> > uncommon use case, I am happy having a workaround. Specifying my own
> > ChunkInputFilter seems the way to go, I have access to the Request object
> > (which Spring Boot can inject), so, using Request.setInputBuffer should
> be
> > enough? I am a little concerned about playing with Tomcat defaults, but
> not
> > many options on my plate.
>
> One more frame-challenge (a bit of an intentional joke, there) for you:
> why bother "optimizing" the HTTP chunk-size? Most networking components
> and software work with buffers of fixed sizes and end up naturally
> filling and emptying those buffers on a schedule that is pretty regular.
> By introducing an artificial "chunk size" which likely doesn't match any
> of those, you are definitely making things more complicated... but is it
> actually *improving* anything?
>
> If you have a 1MB video (small, I know) and it's video-chunked into
> segments of weird sizes like 1243, 6873, 2341, 7654, and 8790 bytes,
> does it matter to the client/recipient if they get HTTP-chunks of those
> exact same sizes or if they get HTTP-chunks which are all, say, 4096
> bytes in size (except the final chunk, which will be short)?
>
> Most media-players download several frames in advance of actually
> starting playing and continue to buffer throughout the playback.
> Additionally, any decent player will not just do something naive like this:
>
> HTTP GET /movies/guardians_of_the_galaxy.h264
>
> And download the entire file. Instead, the player will most likely make
> a range-request like this:
>
> HTTP GET /movies/guardians_of_the_galaxy.h264
> Range: bytes=0-1023
>
> Then the server sends the first 1k of data and the client decides what
> to do, next. The client makes many of these requests as playback
> continues. This allows the user to pause, scrub-around the timeline,
> rewind, etc. without ever downloading the entire file each time.
>
> I'm making a lot of assumptions about your usage of this service, but I
> think you may be trying to solve a problem that doesn't need to be
> solved... at least not the way you think it needs to be solved.
>

Streaming video is hard, and harder still at low glass-to-glass latency, so
optimizations in how the video is transferred seem important. For
instance, the HLS spec describes how those fragments/byte ranges should be
returned
https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis#section-6.2.6
(partial segments = fragments):

   When processing requests for a URI or a byte range of a URI that
   includes one or more Partial Segments that are not yet completely
   available to be sent - such as requests made in response to an EXT-X-
   PRELOAD-HINT tag - the server MUST refrain from transmitting any
   bytes belonging to a Partial Segment until all bytes of that Partial
   Segment can be transmitted at the full speed of the link to the
   client.  If the requested range includes more than one Partial
   Segment then the server MUST enforce this delivery guarantee for each
   Partial Segment in turn.  This enables the client to perform accurate
   Adaptive Bit Rate (ABR) measurements

Our understanding of that statement is that we must have the whole
chunk/fragment/partial segment ready before transmitting it through the
network, as a chunk.

Regarding using org.apache.coyote.Request.setInputBuffer as a workaround,
it seems we don't have access to org.apache.coyote.Request directly. We
have access to org.apache.catalina.connector.RequestFacade, which doesn't
offer any way to reach the underlying
org.apache.catalina.connector.Request, and therefore
org.apache.coyote.Request. Is there any way to get access to
org.apache.coyote.Request?


> -chris
>

-- 
Daniel Andrés Pelaez López
Master’s Degree in IT Architectures, Universidad de los Andes.
Software Construction Specialist, Universidad de los Andes.
Bachelor's Degree in Computer Sciences, Universidad del Quindio.
e. estigma88@gmail.com

Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skipping ChunkedInputFilter

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Daniel,

On 6/27/23 12:56, Daniel Andres Pelaez Lopez wrote:
> Christopher,
> 
> On Tue, Jun 27, 2023 at 9:33, Christopher Schultz (<
> chris@christopherschultz.net>) wrote:
> 
>> Daniel,
>>
>> On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
>>> On Mon, Jun 26, 2023 at 14:53, Mark Thomas (<ma...@apache.org>) wrote:
>>>
>>>> On 26/06/2023 20:34, Christopher Schultz wrote:
>>>>> Daniel,
>>>>>
>>>>> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
>>>>>> Hi Tomcat community,
>>>>>>
>>>>>> I have a requirement where we want to manually decode a Chunked
>> Transfer
>>>>>> Encoding (CTE) stream using CoyoteInputStream to have access to the
>>>> chunk
>>>>>> size. This means I want to use CoyoteInputStream.read method and get
>> the
>>>>>> whole CTE bytes. Saying it in another way: we want to decode the CTE
>> at
>>>>>> hand skipping Tomcat defaults.
>>>>>
>>>>> Dumb question: why?
>>>>
>>>> Not a dumb question at all. It is the key question. I'm curious as to
>>>> what the answer is.
>>>>
>>>> Mark
>>>>
>>>>
>>> Not a dumb question at all. Let me expand on the use case: we are working
>>> on an HTTP origin (Tomcat) for video streaming using HLS and DASH. Our
>>> video packager generates video segments of X size, and each segment is
>>> also divided into fragments (CMAF). The segment size is fixed, but the
>>> fragment size is variable. Our packager transfers the segment while it
>>> generates it, one fragment at a time (a chunk), using CTE, to the HTTP
>>> origin (Tomcat). Now, video players want to download the segment, but per
>>> the HLS spec, we are required to transfer the segment to the video player
>>> as we received it, one fragment at a time. To be able to send one fragment
>>> at a time, we need to know its size, which is carried in the CTE (each
>>> chunk declares the chunk size).
>>>
>>> Our current implementation sends the segment using CTE to the video
>>> players, but we cannot guarantee we are sending a fragment by chunk.
>>>
>>> This is why having access to each chunk and its size will help us.
>>
>> Thanks for the details. I think I've got it, but I want to clarify a
>> little bit.
>>
>> Is your video-chunk-generator producing anything HTTP-related? It almost
>> sounds like Tomcat is a reverse-proxy and your video-generator is the
>> origin. Maybe you are just generating byte[] from the video-generator?
>>
>> Or maybe your video-generator is UPLOADING the chunks to the HTTP
>> server? It's not entirely clear to me, and the details matter.
>>
> 
> Thanks for staying in the conversation.
> 
> The packager (video-chunk-generator) sends an HTTP PUT with
> Transfer-Encoding: chunked header, the content is a video segment, where
> each chunk is a fragment, so, yes, the video-chunk-generator uploads the
> segment in chunks to the Tomcat server (origin)
> 
> Sorry for the confusion regarding the word "origin", that is a video
> streaming term that doesn't matter for the question.

Yes, that's important information to have: in HTTP, the "origin" is the 
web server which actually has the desired resource. Contrast that with a 
reverse proxy, etc.

>> It sounds like you are trying to optimize things such that video-chunk
>> size ends up being equal to the HTTP-chunk size. Is that the real goal?
>>
> 
> The video-chunk-generator does it for us, it sends each video fragment as
> an HTTP chunk. What we want to optimize is not the transfer from the
> video-chunk-generator to the server, but from the server to its clients.
> Clients will do an HTTP GET against the server to grab the segment, that
> GET we want to optimize in a way that we keep the fragment-by-chunk
> strategy, using Transfer-Encoding: chunked. This is why, accessing each
> chunk size when the video-chunk-generator does the PUT, and saving that
> info in the server, we can use it when clients do a GET, to assure we
> transfer the same way we received.

Is there no way to observe the video-chunk-size by looking at the raw 
bytes of the video file itself? Take the MP3 audio format, with which 
I'm more familiar. MP3 frame lengths can be computed based upon some 
information at the start of each frame including the version number, bit 
rate, sample rate, etc. So by reading a few bytes into the file, you 
know how big each chunk would need to be. Then you can flush the bytes 
and go to the next chunk, etc.
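To make the MP3 comparison concrete, here is a minimal sketch of that calculation for the MPEG-1 Layer III case only (other versions and layers use different constants). The class and method names are made up for illustration, and the bit rate, sample rate and padding flag are assumed to have been parsed from the frame header already.

```java
final class Mp3FrameLength {

    private Mp3FrameLength() {
    }

    /**
     * Frame length in bytes for an MPEG-1 Layer III frame.
     *
     * @param bitrateBps   bit rate in bits per second, e.g. 128000
     * @param sampleRateHz sample rate in Hz, e.g. 44100
     * @param padded       true if the padding bit is set in the frame header
     */
    static int mpeg1Layer3FrameLength(int bitrateBps, int sampleRateHz, boolean padded) {
        if (bitrateBps <= 0 || sampleRateHz <= 0) {
            throw new IllegalArgumentException("bit rate and sample rate must be positive");
        }
        // 1152 samples per frame / 8 bits per byte = 144; integer division truncates
        return (144 * bitrateBps) / sampleRateHz + (padded ? 1 : 0);
    }
}
```

A CMAF fragment would need a different calculation, but the idea is the same: read a small header, compute the length, then copy that many bytes.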

If you can do that with your files, there is no reason to record the 
chunk-sizes that you got at the time of upload unless you just want the 
download to be as absolutely screaming-fast as possible and you don't 
want to perform any mathematical operations at all during the download 
(though you will presumably have to read a file from storage, which has 
a much higher cost than a little bit of math IMHO).

Let's assume you CAN determine chunk-size from your source file. You can 
get Tomcat to chunk your file the same way just like this:

public void doGet(HttpServletRequest request, HttpServletResponse 
response) throws IOException {

   response.setHeader("Transfer-Encoding", "chunked");
   response.setBufferSize(MAXIMUM_VIDEO_FRAME_SIZE); // This is important

   InputStream video = ...; // You figure this out
   OutputStream out = response.getOutputStream();

   boolean eof = false;

   byte[] buffer = new byte[1024]; // Or something appropriate

   while(!eof) {
     int c = video.read(buffer);

     if(-1 == c) {
       eof = true;
     } else {
       // getChunkSize() is yours to write: it parses the video-chunk
       // header in buffer and returns the total size of that chunk
       int chunkSize = getChunkSize(buffer);

       chunkSize -= c; // We have already read c bytes from video

       out.write(buffer, 0, c);

       // Copy the rest of this video-chunk
       for(int i=0; i<chunkSize && !eof; ++i) { // TODO: Optimize this copy operation
         int b = video.read();
         if(-1 == b) {
           eof = true;
         } else {
           out.write(b);
         }
       }

       out.flush(); // This triggers Tomcat to generate a chunked
                    // response
     }
   }
}

There are lots of ways the above code can fail, etc. and so it needs to 
be much more robust, but I just wanted you to get the general idea.

There are two very important things in the code:

1. The line which sets the output buffer size. If you use the default 
buffer size, Tomcat may (okay, WILL) "chunk" the response in the middle 
of your video-chunk if a video-chunk can get bigger than the current 
buffer size. So you need to make sure that doesn't happen.

Or, maybe it's okay if that happens, but you want to minimize the number 
of times that happens or you waste bytes, cycles, etc.

2. You must call ServletOutputStream.flush, which is how Tomcat knows to 
actually chunk the response.
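If the fragment sizes were instead recorded at upload time (that is Daniel's plan, not something Tomcat does for you), the same flush-per-fragment idea could look roughly like the sketch below. FragmentReplayer and replay are hypothetical names, and the code is written against plain java.io streams so it can be tried outside a container; in a servlet, out would be response.getOutputStream() and the response buffer would still need to be at least as large as the biggest fragment.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

final class FragmentReplayer {

    private FragmentReplayer() {
    }

    /**
     * Copies the stored segment to out one recorded fragment at a time,
     * flushing after each fragment so that every fragment is handed to
     * the container as one unit.
     */
    static void replay(InputStream in, OutputStream out, List<Integer> fragmentSizes)
            throws IOException {
        for (int size : fragmentSizes) {
            byte[] fragment = in.readNBytes(size); // blocks until size bytes or EOF
            if (fragment.length < size) {
                throw new EOFException("segment ended in the middle of a fragment");
            }
            out.write(fragment);
            out.flush(); // one flush per fragment
        }
    }
}
```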

>> In that case, you want to force the chunk size to something specific,
>> rather than just trying to see what the chunk size is.
>>
>> How you do that depends on whether your video-generator is sending data
>> in the *request entity* in e.g. PUT or POST or if you are fetching the
>> data in a *response entity*.
>>
>> I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I
>> want to be sure. Might this be easier to do on the client to force a
>> certain chunk-size?
> 
> You are right, we want to inspect the chunk-size of an upload to Tomcat. We
> have no control over the video-chunk-generator, so, the only way to know
> the fragment/chunk size they are sending is by inspecting the CTE.

The only way to know the chunk size THEY are sending is to inspect and 
record it. But you don't really care what they send; instead you care 
what chunk-size to use for your Tomcat responses. They *should* be the 
same thing, but I wanted to re-frame (hah!) the problem to be more 
accurate, because I think you are trying to solve problem X (how to 
observe inbound chunk size) when you really want to solve problem Y 
(optimize outbound chunk size).
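For reference, this is roughly what "inspecting the CTE" means on the wire: each chunk is a hexadecimal size line, CRLF, that many bytes of data, CRLF, and the body ends with a zero-size chunk followed by optional trailer lines and an empty line. The sketch below decodes such a raw stream and records the sizes. It is only an illustration of the framing under those assumptions: ChunkSizeRecorder is a made-up name, and a servlet running in Tomcat never sees these raw bytes because ChunkedInputFilter has already decoded them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

final class ChunkSizeRecorder {

    private ChunkSizeRecorder() {
    }

    /** Decodes a raw chunked body, writes the payload to payload, returns the chunk sizes. */
    static List<Integer> decode(InputStream raw, OutputStream payload) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        while (true) {
            String line = readLine(raw);
            int semi = line.indexOf(';'); // ignore any chunk extensions
            String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) {
                throw new IOException("negative chunk size");
            }
            if (size == 0) {
                // last chunk: skip trailer lines up to the terminating empty line
                while (!readLine(raw).isEmpty()) {
                    // discard trailer field
                }
                return sizes;
            }
            byte[] data = raw.readNBytes(size);
            if (data.length < size) {
                throw new EOFException("truncated chunk");
            }
            payload.write(data);
            if (!readLine(raw).isEmpty()) {
                throw new IOException("missing CRLF after chunk data");
            }
            sizes.add(size);
        }
    }

    /** Reads one CRLF-terminated line, without the CRLF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("unexpected end of stream");
    }
}
```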

>> Finally... for video, perhaps a Websocket connection would be better
>> since there is less protocol-overhead once the ws connection is
>> established?
> 
> True, but the video-chunk-generator only offers two ways of transfer: HTTP
> PUT or writing to disk. The second option was discarded as we will need to
> listen to file system events and do some magic there, which we don't need
> to do for the HTTP PUT, as the protocol/Tomcat guarantee when the transfer
> starts and ends.

Sounds good to me. Plus, if you use HTTP then you can de-couple the 
services easily at any time.

>>>>>> The current flow from the point of view of CoyoteInputStream is:
>>>>>> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
>>>>>>
>>>>>> ChunkedInputFilter handles the CTE decoding and the read method only
>>>>>> returns the chunks, with no other information, like chunk size.
>>>>>>
>>>>>> I found that the method Request.setInputBuffer might allow to set a
>>>>>> different InputBuffer implementation, for instance, the
>>>>>> IdentityInputFilter, which I understand returns all the stream bytes,
>>>>>> with
>>>>>> no decoding. However, not sure if this is the right way and which
>>>>>> consequences might have.
>>>>>>
>>>>>> I would like to know if there are other ways to override the CTE
>>>>>> behavior,
>>>>>> any help would be appreciated.
>>>>>
>>>>> A problem I can see is that you are working with a blocking streaming
>>>>> interface e.g. read(byte[]) and you also want to get the chunk size.
>>>>> When? The chunk-size can change for every chunk, so if you call
>>>>> getChunkSize() before the read() and after the read(), they may be
>>>>> different if the read() returns data from multiple chunks. It may have
>>>>> changed multiple times between read() was called and when it completed.
>>>>>
>>>>> If you want to always size byte byte[] to read full-chunks at once ...
>> I
>>>>> guess I would again ask "why?"
>>>>>
>>>>> Would it be sufficient for ChunkedInputFilter to maybe send an
>>>>> event-notification each time a chunk boundary was crossed? For example:
>>>>>
>>>>> public interface ChunkListener {
>>>>>      public void chunkStarted(ChunkedInputFilter source, long offset,
>> long
>>>>> length);
>>>>>      public void chunkFinished(ChunkedInputFilter source, long offset,
>>>>> long length);
>>>>> }
>>>>>
>>>>> Then, every time the Filter begins or ends a chunk it could notify your
>>>>> code and you can do whatever you want with that information.
>>>
>>>> You might be able to subclass the (somewhat confusingly-named)
>>>>> ChunkInputFilter and bolt-on your own logic like what I have above.
>>>>>
>>>
>>>
>>> Yes, a listener like that looks great. Any more clues on how to inject my
>>> own ChunkInputFilter implementation in Tomcat configuration? seems quite
>>> hard to do it well.  Also, the listener must be linked by HTTP request.
>>
>> I think doing so would require some internal support for messing-around
>> with the chain of objects that handle the requests. I don't think you
>> can do this "on your own". One option would be for us to add the ability
>> to register a "ChunkListener" with the ChunkInputFilter but honestly
>> this is a pretty odd use-case and having that code running on every
>> server worldwide seems like a waste. The other option would be to allow
>> you to specify your own ChunkInputFilter class at some point during
>> server initialization, which seems like a much better option.
>>
> 
> I totally agree Tomcat shouldn't add anything specific regarding this
> uncommon use case, I am happy having a workaround. Specifying my own
> ChunkInputFilter seems the way to go, I have access to the Request object
> (which Spring Boot can inject), so, using Request.setInputBuffer should be
> enough? I am a little concerned about playing with Tomcat defaults, but not
> many options on my plate.

One more frame-challenge (a bit of an intentional joke, there) for you: 
why bother "optimizing" the HTTP chunk-size? Most networking components 
and software work with buffers of fixed sizes and end up naturally 
filling and emptying those buffers on a schedule that is pretty regular. 
By introducing an artificial "chunk size" which likely doesn't match any 
of those, you are definitely making things more complicated... but is it 
actually *improving* anything?

If you have a 1MB video (small, I know) and it's video-chunked into 
segments of weird sizes like 1243, 6873, 2341, 7654, and 8790 bytes, 
does it matter to the client/recipient if they get HTTP-chunks of those 
exact same sizes or if they get HTTP-chunks which are all, say, 4096 
bytes in size (except the final chunk, which will be short)?

Most media-players download several frames in advance of actually 
starting playing and continue to buffer throughout the playback. 
Additionally, any decent player will not just do something naive like this:

HTTP GET /movies/guardians_of_the_galaxy.h264

And download the entire file. Instead, the player will most likely make 
a range-request like this:

HTTP GET /movies/guardians_of_the_galaxy.h264
Range: bytes=0-1023

Then the server sends the first 1k of data and the client decides what 
to do, next. The client makes many of these requests as playback 
continues. This allows the user to pause, scrub-around the timeline, 
rewind, etc. without ever downloading the entire file each time.
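From the client side, such a range request can be built with the JDK's own HTTP client. This is only a sketch: RangeRequests and rangeRequest are hypothetical names, the URL in the usage note is a placeholder, and whether a given player actually fetches video this way is an assumption on my part.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RangeRequests {

    private RangeRequests() {
    }

    /** Builds a GET for the inclusive byte range first..last of the resource. */
    static HttpRequest rangeRequest(URI resource, long first, long last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("invalid byte range");
        }
        return HttpRequest.newBuilder(resource)
                .header("Range", "bytes=" + first + "-" + last)
                .GET()
                .build();
    }
}
```

Calling rangeRequest(URI.create("http://example.invalid/movies/video.h264"), 0, 1023) produces the request shown above; sending it with java.net.http.HttpClient should yield a 206 Partial Content response if the server honors the range.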

I'm making a lot of assumptions about your usage of this service, but I 
think you may be trying to solve a problem that doesn't need to be 
solved... at least not the way you think it needs to be solved.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Posted by Daniel Andres Pelaez Lopez <es...@gmail.com>.
Christopher,

El mar, 27 jun 2023 a las 9:33, Christopher Schultz (<
chris@christopherschultz.net>) escribió:

> Daniel,
>
> On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
> > El lun, 26 jun 2023 a las 14:53, Mark Thomas (<ma...@apache.org>)
> escribió:
> >
> >> On 26/06/2023 20:34, Christopher Schultz wrote:
> >>> Daniel,
> >>>
> >>> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
> >>>> Hi Tomcat community,
> >>>>
> >>>> I have a requirement where we want to manually decode a Chunked
> Transfer
> >>>> Encoding (CTE) stream using CoyoteInputStream to have access to the
> >> chunk
> >>>> size. This means I want to use CoyoteInputStream.read method and get
> the
> >>>> whole CTE bytes. Saying it in another way: we want to decode the CTE
> at
> >>>> hand skipping Tomcat defaults.
> >>>
> >>> Dumb question: why?
> >>
> >> Not a dumb question at all. It is the key question. I'm curious as to
> >> what the answer is.
> >>
> >> Mark
> >>
> >>
> > Not dumb question at all. Let me expand the use case: we are working on
> an
> > HTTP origin (Tomcat) for video streaming using HLS and DASH. Our video
> > packager generates video segments of X size, and each segment is also
> > divided into fragments (CMAF). The segment size is fixed, but the
> fragment
> > size is variable. Our packager transfers the segment meanwhile it
> generates
> > it, a fragment at the time (a chunk), in a CTE, to the HTTP origin
> > (Tomcat). Now, video players want to download the segment, but as for the
> > HLS spec, we require to transfer the segment to the video player, as we
> > received, a fragment a the time. To be able of sending a fragment at the
> > time, we need to know its size, which is implicit inside the CTE (each
> > chunk declares the chunk size).
> >
> > Our current implementation sends the segment using CTE to the video
> > players, but we cannot guarantee we are sending a fragment by chunk.
> >
> > This is why having access to each chunk and its size will help us.
>
> Thanks for the details. I think I've got it, but I want to clarify a
> little bit.
>
> Is your video-chunk-generator producing anything HTTP-related? It almost
> sounds like Tomcat is a reverse-proxy and your video-generator is the
> origin. Maybe you are just generating byte[] from the video-generator?
>
> Or maybe your video-generator is UPLOADING the chunks to the HTTP
> server? It's not entirely clear to me, and the details matter.
>

Thanks for staying in the conversation.

The packager (video-chunk-generator) sends an HTTP PUT with
Transfer-Encoding: chunked header, the content is a video segment, where
each chunk is a fragment, so, yes, the video-chunk-generator uploads the
segment in chunks to the Tomcat server (origin)

Sorry for the confusion regarding the word "origin", that is a video
streaming term that doesn't matter for the question.


> It sounds like you are trying to optimize things such that video-chunk
> size ends up being equal to the HTTP-chunk size. Is that the real goal?
>

The video-chunk-generator does it for us, it sends each video fragment as
an HTTP chunk. What we want to optimize is not the transfer from the
video-chunk-generator to the server, but from the server to its clients.
Clients will do an HTTP GET against the server to grab the segment, that
GET we want to optimize in a way that we keep the fragment-by-chunk
strategy, using Transfer-Encoding: chunked. This is why, accessing each
chunk size when the video-chunk-generator does the PUT, and saving that
info in the server, we can use it when clients do a GET, to assure we
transfer the same way we received.


>
> In that case, you want to force the chunk size to something specific,
> rather than just trying to see what the chunk size is.
>
> How you do that depends on whether your video-generator is sending data
> in the *request entity* in e.g. PUT or POST or if you are fetching the
> data in a *response entity*.
>
> I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I
> want to be sure. Might this be easier to do on the client to force a
> certain chunk-size?
>

You are right, we want to inspect the chunk-size of an upload to Tomcat. We
have no control over the video-chunk-generator, so, the only way to know
the fragment/chunk size they are sending is by inspecting the CTE.


>
> Finally... for video, perhaps a Websocket connection would be better
> since there is less protocol-overhead once the ws connection is
> established?
>

True, but the video-chunk-generator only offers two ways of transfer: HTTP
PUT or writing to disk. The second option was discarded as we will need to
listen to file system events and do some magic there, which we don't need
to do for the HTTP PUT, as the protocol/Tomcat guarantee when the transfer
starts and ends.


>
> >>>> The current flow from the point of view of CoyoteInputStream is:
> >>>> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
> >>>>
> >>>> ChunkedInputFilter handles the CTE decoding and the read method only
> >>>> returns the chunks, with no other information, like chunk size.
> >>>>
> >>>> I found that the method Request.setInputBuffer might allow to set a
> >>>> different InputBuffer implementation, for instance, the
> >>>> IdentityInputFilter, which I understand returns all the stream bytes,
> >>>> with
> >>>> no decoding. However, not sure if this is the right way and which
> >>>> consequences might have.
> >>>>
> >>>> I would like to know if there are other ways to override the CTE
> >>>> behavior,
> >>>> any help would be appreciated.
> >>>
> >>> A problem I can see is that you are working with a blocking streaming
> >>> interface e.g. read(byte[]) and you also want to get the chunk size.
> >>> When? The chunk-size can change for every chunk, so if you call
> >>> getChunkSize() before the read() and after the read(), they may be
> >>> different if the read() returns data from multiple chunks. It may have
> >>> changed multiple times between read() was called and when it completed.
> >>>
> >>> If you want to always size byte byte[] to read full-chunks at once ...
> I
> >>> guess I would again ask "why?"
> >>>
> >>> Would it be sufficient for ChunkedInputFilter to maybe send an
> >>> event-notification each time a chunk boundary was crossed? For example:
> >>>
> >>> public interface ChunkListener {
> >>>     public void chunkStarted(ChunkedInputFilter source, long offset,
> long
> >>> length);
> >>>     public void chunkFinished(ChunkedInputFilter source, long offset,
> >>> long length);
> >>> }
> >>>
> >>> Then, every time the Filter begins or ends a chunk it could notify your
> >>> code and you can do whatever you want with that information.
> >
> >> You might be able to subclass the (somewhat confusingly-named)
> >>> ChunkInputFilter and bolt-on your own logic like what I have above.
> >>>
> >
> >
> > Yes, a listener like that looks great. Any more clues on how to inject my
> > own ChunkInputFilter implementation in Tomcat configuration? seems quite
> > hard to do it well.  Also, the listener must be linked by HTTP request.
>
> I think doing so would require some internal support for messing-around
> with the chain of objects that handle the requests. I don't think you
> can do this "on your own". One option would be for us to add the ability
> to register a "ChunkListener" with the ChunkInputFilter but honestly
> this is a pretty odd use-case and having that code running on every
> server worldwide seems like a waste. The other option would be to allow
> you to specify your own ChunkInputFilter class at some point during
> server initialization, which seems like a much better option.
>

I totally agree Tomcat shouldn't add anything specific regarding this
uncommon use case, I am happy having a workaround. Specifying my own
ChunkedInputFilter seems the way to go, I have access to the Request object
(which Spring Boot can inject), so, using Request.setInputBuffer should be
enough? I am a little concerned about playing with Tomcat defaults, but not
many options on my plate.


> -chris
>


-- 
Daniel Andrés Pelaez López
Master’s Degree in IT Architectures, Universidad de los Andes.
Software Construction Specialist, Universidad de los Andes.
Bachelor's Degree in Computer Sciences, Universidad del Quindio.
e. estigma88@gmail.com

Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Daniel,

On 6/26/23 16:15, Daniel Andres Pelaez Lopez wrote:
> El lun, 26 jun 2023 a las 14:53, Mark Thomas (<ma...@apache.org>) escribió:
> 
>> On 26/06/2023 20:34, Christopher Schultz wrote:
>>> Daniel,
>>>
>>> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
>>>> Hi Tomcat community,
>>>>
>>>> I have a requirement where we want to manually decode a Chunked Transfer
>>>> Encoding (CTE) stream using CoyoteInputStream to have access to the
>> chunk
>>>> size. This means I want to use CoyoteInputStream.read method and get the
>>>> whole CTE bytes. Saying it in another way: we want to decode the CTE at
>>>> hand skipping Tomcat defaults.
>>>
>>> Dumb question: why?
>>
>> Not a dumb question at all. It is the key question. I'm curious as to
>> what the answer is.
>>
>> Mark
>>
>>
> Not dumb question at all. Let me expand the use case: we are working on an
> HTTP origin (Tomcat) for video streaming using HLS and DASH. Our video
> packager generates video segments of X size, and each segment is also
> divided into fragments (CMAF). The segment size is fixed, but the fragment
> size is variable. Our packager transfers the segment meanwhile it generates
> it, a fragment at the time (a chunk), in a CTE, to the HTTP origin
> (Tomcat). Now, video players want to download the segment, but as for the
> HLS spec, we require to transfer the segment to the video player, as we
> received, a fragment a the time. To be able of sending a fragment at the
> time, we need to know its size, which is implicit inside the CTE (each
> chunk declares the chunk size).
> 
> Our current implementation sends the segment using CTE to the video
> players, but we cannot guarantee we are sending a fragment by chunk.
> 
> This is why having access to each chunk and its size will help us.

Thanks for the details. I think I've got it, but I want to clarify a 
little bit.

Is your video-chunk-generator producing anything HTTP-related? It almost 
sounds like Tomcat is a reverse-proxy and your video-generator is the 
origin. Maybe you are just generating byte[] from the video-generator?

Or maybe your video-generator is UPLOADING the chunks to the HTTP 
server? It's not entirely clear to me, and the details matter.

It sounds like you are trying to optimize things such that video-chunk 
size ends up being equal to the HTTP-chunk size. Is that the real goal?

In that case, you want to force the chunk size to something specific, 
rather than just trying to see what the chunk size is.

How you do that depends on whether your video-generator is sending data 
in the *request entity* in e.g. PUT or POST or if you are fetching the 
data in a *response entity*.

I *think* you want to inspect chunk-size of an upload-to-Tomcat, but I 
want to be sure. Might this be easier to do on the client to force a 
certain chunk-size?

Finally... for video, perhaps a Websocket connection would be better 
since there is less protocol-overhead once the ws connection is established?

>>>> The current flow from the point of view of CoyoteInputStream is:
>>>> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
>>>>
>>>> ChunkedInputFilter handles the CTE decoding and the read method only
>>>> returns the chunks, with no other information, like chunk size.
>>>>
>>>> I found that the method Request.setInputBuffer might allow to set a
>>>> different InputBuffer implementation, for instance, the
>>>> IdentityInputFilter, which I understand returns all the stream bytes,
>>>> with
>>>> no decoding. However, not sure if this is the right way and which
>>>> consequences might have.
>>>>
>>>> I would like to know if there are other ways to override the CTE
>>>> behavior,
>>>> any help would be appreciated.
>>>
>>> A problem I can see is that you are working with a blocking streaming
>>> interface e.g. read(byte[]) and you also want to get the chunk size.
>>> When? The chunk-size can change for every chunk, so if you call
>>> getChunkSize() before the read() and after the read(), they may be
>>> different if the read() returns data from multiple chunks. It may have
>>> changed multiple times between read() was called and when it completed.
>>>
>>> If you want to always size byte byte[] to read full-chunks at once ... I
>>> guess I would again ask "why?"
>>>
>>> Would it be sufficient for ChunkedInputFilter to maybe send an
>>> event-notification each time a chunk boundary was crossed? For example:
>>>
>>> public interface ChunkListener {
>>>     public void chunkStarted(ChunkedInputFilter source, long offset, long
>>> length);
>>>     public void chunkFinished(ChunkedInputFilter source, long offset,
>>> long length);
>>> }
>>>
>>> Then, every time the Filter begins or ends a chunk it could notify your
>>> code and you can do whatever you want with that information.
> 
>> You might be able to subclass the (somewhat confusingly-named)
>>> ChunkInputFilter and bolt-on your own logic like what I have above.
>>>
> 
> 
> Yes, a listener like that looks great. Any more clues on how to inject my
> own ChunkInputFilter implementation in Tomcat configuration? seems quite
> hard to do it well.  Also, the listener must be linked by HTTP request.

I think doing so would require some internal support for messing-around 
with the chain of objects that handle the requests. I don't think you 
can do this "on your own". One option would be for us to add the ability 
to register a "ChunkListener" with the ChunkedInputFilter but honestly 
this is a pretty odd use-case and having that code running on every 
server worldwide seems like a waste. The other option would be to allow 
you to specify your own ChunkedInputFilter class at some point during 
server initialization, which seems like a much better option.
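To show what the application side of such a listener might look like, here is a minimal sketch. Everything in it is hypothetical: Tomcat has no ChunkListener hook today, and the source parameter is typed as Object here only so the example compiles without Tomcat on the classpath. One instance per HTTP request is assumed, which is how the recorded sizes would stay linked to that request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface, modelled on the one proposed earlier in the thread. */
interface ChunkListener {
    void chunkStarted(Object source, long offset, long length);

    void chunkFinished(Object source, long offset, long length);
}

/** Records the size of every completed chunk, in arrival order. */
final class RecordingChunkListener implements ChunkListener {

    private final List<Long> sizes = new ArrayList<>();

    @Override
    public void chunkStarted(Object source, long offset, long length) {
        // nothing to do until the chunk has been fully read
    }

    @Override
    public void chunkFinished(Object source, long offset, long length) {
        sizes.add(length);
    }

    /** Returns an immutable snapshot of the sizes recorded so far. */
    List<Long> sizes() {
        return List.copyOf(sizes);
    }
}
```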

-chris


Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Posted by Daniel Andres Pelaez Lopez <es...@gmail.com>.
El lun, 26 jun 2023 a las 14:53, Mark Thomas (<ma...@apache.org>) escribió:

> On 26/06/2023 20:34, Christopher Schultz wrote:
> > Daniel,
> >
> > On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
> >> Hi Tomcat community,
> >>
> >> I have a requirement where we want to manually decode a Chunked Transfer
> >> Encoding (CTE) stream using CoyoteInputStream to have access to the
> chunk
> >> size. This means I want to use CoyoteInputStream.read method and get the
> >> whole CTE bytes. Saying it in another way: we want to decode the CTE at
> >> hand skipping Tomcat defaults.
> >
> > Dumb question: why?
>
> Not a dumb question at all. It is the key question. I'm curious as to
> what the answer is.
>
> Mark
>
>
Not a dumb question at all. Let me expand the use case: we are working on an
HTTP origin (Tomcat) for video streaming using HLS and DASH. Our video
packager generates video segments of X size, and each segment is also
divided into fragments (CMAF). The segment size is fixed, but the fragment
size is variable. Our packager transfers the segment while it generates
it, a fragment at a time (a chunk), in a CTE, to the HTTP origin
(Tomcat). Now, video players want to download the segment, but per the
HLS spec, we are required to transfer the segment to the video player as we
received it, a fragment at a time. To be able to send a fragment at a
time, we need to know its size, which is implicit inside the CTE (each
chunk declares the chunk size).

Our current implementation sends the segment using CTE to the video
players, but we cannot guarantee we are sending a fragment by chunk.

This is why having access to each chunk and its size will help us.



> >
> >> The current flow from the point of view of CoyoteInputStream is:
> >> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
> >>
> >> ChunkedInputFilter handles the CTE decoding and the read method only
> >> returns the chunks, with no other information, like chunk size.
> >>
> >> I found that the method Request.setInputBuffer might allow to set a
> >> different InputBuffer implementation, for instance, the
> >> IdentityInputFilter, which I understand returns all the stream bytes,
> >> with
> >> no decoding. However, not sure if this is the right way and which
> >> consequences might have.
> >>
> >> I would like to know if there are other ways to override the CTE
> >> behavior,
> >> any help would be appreciated.
> >
> > A problem I can see is that you are working with a blocking streaming
> > interface e.g. read(byte[]) and you also want to get the chunk size.
> > When? The chunk-size can change for every chunk, so if you call
> > getChunkSize() before the read() and after the read(), they may be
> > different if the read() returns data from multiple chunks. It may have
> > changed multiple times between read() was called and when it completed.
> >
> > If you want to always size byte byte[] to read full-chunks at once ... I
> > guess I would again ask "why?"
> >
> > Would it be sufficient for ChunkedInputFilter to maybe send an
> > event-notification each time a chunk boundary was crossed? For example:
> >
> > public interface ChunkListener {
> >    public void chunkStarted(ChunkedInputFilter source, long offset, long
> > length);
> >    public void chunkFinished(ChunkedInputFilter source, long offset,
> > long length);
> > }
> >
> > Then, every time the Filter begins or ends a chunk it could notify your
> > code and you can do whatever you want with that information.

> You might be able to subclass the (somewhat confusingly-named)
> > ChunkInputFilter and bolt-on your own logic like what I have above.
> >


Yes, a listener like that looks great. Any more clues on how to inject my
own ChunkedInputFilter implementation in Tomcat configuration? It seems quite
hard to do well. Also, the listener must be linked to each HTTP request.


>
> >> BTW: We are using Spring Boot with Tomcat embedded.
> >
> > That probably makes it much easier to tamper with the setup, thanks for
> > providing that information.
> >
> > -chris
> >

-- 
Daniel Andrés Pelaez López
Master’s Degree in IT Architectures, Universidad de los Andes.
Software Construction Specialist, Universidad de los Andes.
Bachelor's Degree in Computer Sciences, Universidad del Quindio.
e. estigma88@gmail.com

Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Posted by Mark Thomas <ma...@apache.org>.
On 26/06/2023 20:34, Christopher Schultz wrote:
> Daniel,
> 
> On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
>> Hi Tomcat community,
>>
>> I have a requirement where we want to manually decode a Chunked Transfer
>> Encoding (CTE) stream using CoyoteInputStream to have access to the chunk
>> size. This means I want to use CoyoteInputStream.read method and get the
>> whole CTE bytes. Saying it in another way: we want to decode the CTE at
>> hand skipping Tomcat defaults.
> 
> Dumb question: why?

Not a dumb question at all. It is the key question. I'm curious as to 
what the answer is.

Mark





Re: Tomcat 10.1.x: Using CoyoteInputStream to read a Chunked Transfer Encoding (CTE) stream, manually, skiping ChunkedInputFilter

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Daniel,

On 6/26/23 12:47, Daniel Andres Pelaez Lopez wrote:
> Hi Tomcat community,
> 
> I have a requirement where we want to manually decode a Chunked Transfer
> Encoding (CTE) stream using CoyoteInputStream to have access to the chunk
> size. This means I want to use CoyoteInputStream.read method and get the
> whole CTE bytes. Saying it in another way: we want to decode the CTE at
> hand skipping Tomcat defaults.

Dumb question: why?

> The current flow from the point of view of CoyoteInputStream is:
> CoyoteInputStream.read -> Request.read -> ChunkedInputFilter.read.
> 
> ChunkedInputFilter handles the CTE decoding and the read method only
> returns the chunks, with no other information, like chunk size.
> 
> I found that the method Request.setInputBuffer might allow to set a
> different InputBuffer implementation, for instance, the
> IdentityInputFilter, which I understand returns all the stream bytes, with
> no decoding. However, not sure if this is the right way and which
> consequences might have.
> 
> I would like to know if there are other ways to override the CTE behavior,
> any help would be appreciated.

A problem I can see is that you are working with a blocking streaming 
interface, e.g. read(byte[]), and you also want to get the chunk size. 
When? The chunk size can change for every chunk, so if you call 
getChunkSize() before the read() and again after the read(), the two values 
may be different if the read() returns data from multiple chunks. It may 
have changed multiple times between when read() was called and when it 
completed.

If you want to always size the byte[] to read full chunks at once ... I 
guess I would again ask "why?"
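
To make the "which chunk size?" question concrete, here is a minimal, 
self-contained sketch that walks a raw chunked body and lists the size of 
every chunk. It is not Tomcat code: the class ChunkSizes and its methods 
are made-up names for illustration, and it assumes the whole body is 
already in memory. It drops chunk extensions and does not parse trailers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tomcat.
class ChunkSizes {

    // Returns the declared size of every chunk in a raw chunked body,
    // including the terminating zero-length chunk.
    static List<Integer> chunkSizes(byte[] raw) {
        List<Integer> sizes = new ArrayList<>();
        int pos = 0;
        while (pos < raw.length) {
            int lineEnd = indexOfCrlf(raw, pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("missing CRLF after chunk size");
            }
            String sizeLine = new String(raw, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');          // drop any chunk extension
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // size is hexadecimal
            sizes.add(size);
            if (size == 0) {
                break;                                 // last chunk; trailers not parsed
            }
            pos = lineEnd + 2 + size + 2;              // skip CRLF, chunk data, CRLF
        }
        return sizes;
    }

    // Index of the next CRLF at or after 'from', or -1 if there is none.
    static int indexOfCrlf(byte[] raw, int from) {
        for (int i = from; i + 1 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(chunkSizes(raw)); // prints [4, 5, 0]
    }
}
```

A single read(byte[]) with a 9-byte buffer over the decoded form of that 
body could return "Wikipedia" in one call, which spans a 4-byte chunk and a 
5-byte chunk, so there is no single chunk size to report for that read.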

Would it be sufficient for ChunkedInputFilter to maybe send an 
event-notification each time a chunk boundary was crossed? For example:

public interface ChunkListener {
    public void chunkStarted(ChunkedInputFilter source, long offset, long length);
    public void chunkFinished(ChunkedInputFilter source, long offset, long length);
}

Then, every time the Filter begins or ends a chunk it could notify your 
code and you can do whatever you want with that information.
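
As a rough illustration of the listener idea, the sketch below decodes a 
chunked stream and fires callbacks at each chunk boundary. Everything in it 
is an assumption made for the example: NotifyingChunkedStream is a 
standalone decoder, not Tomcat's ChunkedInputFilter, and the listener drops 
the "source" parameter so the code has no Tomcat dependency. It does not 
consume trailers or the final CRLF.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener; offset is the position of the chunk in the decoded body.
interface ChunkListener {
    void chunkStarted(long offset, long length);
    void chunkFinished(long offset, long length);
}

// Hypothetical standalone decoder (NOT Tomcat's ChunkedInputFilter).
class NotifyingChunkedStream extends InputStream {
    private final InputStream raw;
    private final ChunkListener listener;
    private long offset;     // decoded bytes delivered before the current chunk
    private long length;     // declared size of the current chunk
    private long remaining;  // bytes of the current chunk not yet returned
    private boolean done;    // true once the zero-length chunk has been seen

    NotifyingChunkedStream(InputStream raw, ChunkListener listener) {
        this.raw = raw;
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        if (done) return -1;
        if (remaining == 0) {                       // at a chunk boundary
            length = readChunkSize();
            if (length == 0) { done = true; return -1; }
            remaining = length;
            listener.chunkStarted(offset, length);
        }
        int b = raw.read();
        if (b < 0) throw new EOFException("truncated chunk data");
        if (--remaining == 0) {                     // last byte of this chunk
            if (raw.read() != '\r' || raw.read() != '\n') {
                throw new IOException("expected CRLF after chunk data");
            }
            listener.chunkFinished(offset, length);
            offset += length;
        }
        return b;
    }

    // Reads "<hex-size>[;extension]CRLF" and returns the size.
    private long readChunkSize() throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = raw.read()) != '\r') {
            if (b < 0) throw new EOFException("truncated chunk header");
            line.append((char) b);
        }
        if (raw.read() != '\n') throw new IOException("expected LF after CR");
        int semi = line.indexOf(";");
        String hex = semi >= 0 ? line.substring(0, semi) : line.toString();
        return Long.parseLong(hex.trim(), 16);
    }
}

class ChunkListenerDemo {
    // Decodes a chunked body and returns the listener events plus the decoded text.
    static List<String> decode(String wire) {
        List<String> events = new ArrayList<>();
        ChunkListener listener = new ChunkListener() {
            public void chunkStarted(long offset, long length) {
                events.add("start " + offset + "+" + length);
            }
            public void chunkFinished(long offset, long length) {
                events.add("end " + offset + "+" + length);
            }
        };
        byte[] bytes = wire.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new NotifyingChunkedStream(new ByteArrayInputStream(bytes), listener)) {
            events.add("body " + new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [start 0+4, end 0+4, start 4+5, end 4+5, body Wikipedia]
        System.out.println(decode("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"));
    }
}
```

Note that the caller still gets the decoded bytes through the ordinary 
read() calls; the chunk boundaries arrive separately through the callbacks, 
so a read that spans several chunks simply triggers several events.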

You might be able to subclass the (somewhat confusingly-named) 
ChunkedInputFilter and bolt on your own logic like what I have above.

> BTW: We are using Spring Boot with Tomcat embedded.

That probably makes it much easier to tamper with the setup. Thanks for 
providing that information.

-chris
