Posted to users@tomcat.apache.org by Saurav Sarkar <sa...@gmail.com> on 2019/01/02 17:20:42 UTC

Parsing of multi part content

Hi All,

This is regarding reading multipart content on the Java server side.

HttpServletRequest has a getParts() API for reading the parts of a
multipart request:
https://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getParts()

Each Part has a getInputStream() method that can be used to read the
content of that part.

Tomcat also provides an implementation for this API.

But this API parses the multipart content and keeps it in memory. If the
size increases, the content can be offloaded to disk.
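
The "memory first, then disk" behaviour described above can be sketched in plain Java. This is only an illustration of the idea, not Tomcat's actual code: the class name ThresholdBuffer and its methods are hypothetical, and the threshold plays a role similar to the fileSizeThreshold setting of the Servlet 3.0 multipart configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: buffers in memory up to a threshold, then spills to a temp file. */
class ThresholdBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream file;   // non-null once the content has been spilled to disk
    private Path path;           // location of the temp file, if any
    private long written;

    ThresholdBuffer(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Switch to disk the first time the total would exceed the threshold.
        if (file == null && written + len > threshold) {
            spill();
        }
        (file != null ? file : memory).write(buf, off, len);
        written += len;
    }

    private void spill() throws IOException {
        path = Files.createTempFile("upload-", ".tmp");
        file = Files.newOutputStream(path);
        memory.writeTo(file);    // move what was buffered so far
        memory = null;
    }

    boolean isInMemory() {
        return file == null;
    }

    Path getPath() {
        return path;
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
        }
    }
}
```

Either way the whole part is stored somewhere before the application sees it, which is the extra cost the question is about.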

Why does the getParts() API, or any multipart parsing, need to load the
content into memory? Why can't the content be streamed directly? Loading
the content into memory and reading/writing to disk adds extra cost. This
will be especially costly when large files are uploaded.

Is there no way to at least avoid loading the file content?

This may not be a question specific to Tomcat; it probably applies to
any servlet container.

Best Regards,

Saurav

Re: Parsing of multi part content

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Saurav,

On 1/7/19 13:13, Saurav Sarkar wrote:
> Thanks a lot Chris for the reply.
> 
> I think even if I parse the request myself, I always have to load
> the content into memory or onto disk.
> 
> Because in order to extract the uploaded file from the request, I
> have to go through the whole request stream and trim off the
> boundaries.

Exactly. Neither you *nor Tomcat* really has many options here.

-chris



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Parsing of multi part content

Posted by Saurav Sarkar <sa...@gmail.com>.
Thanks a lot Chris for the reply.

I think even if I parse the request myself, I always have to load the
content into memory or onto disk, because in order to extract the
uploaded file from the request I have to go through the whole request
stream and trim off the boundaries.

Best Regards,
Saurav


Re: Parsing of multi part content

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Saurav,

On 1/2/19 12:20, Saurav Sarkar wrote:
> Hi All,
> 
> This is regarding reading multipart content on the Java server
> side.
> 
> HttpServletRequest has a getParts() API for reading the parts of a
> multipart request:
> https://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getParts()
> 
> Each Part has a getInputStream() method that can be used to read
> the content of that part.
> 
> Tomcat also provides an implementation for this API.
> 
> But this API parses the multipart content and keeps it in memory.
> If the size increases, the content can be offloaded to disk.
> 
> Why does the getParts() API, or any multipart parsing, need to load
> the content into memory? Why can't the content be streamed directly?
> Loading the content into memory and reading/writing to disk adds
> extra cost. This will be especially costly when large files are
> uploaded.

True. You can always limit the part-size or request-size, but you
can't stream huge uploads if you want to use getParts().

> Is there no way to at least avoid loading the file content?

Yes, there is a way.

Instead of calling HttpServletRequest.getParameter* or
HttpServletRequest.getPart*, you can call
HttpServletRequest.getInputStream and parse everything yourself.
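
As a rough illustration of what "parse everything yourself" could look like, here is a minimal sketch in plain Java with no servlet dependency. The class StreamingMultipart, its PartSink callback and the method names are hypothetical; they are not part of the Servlet API or Tomcat. The sketch assumes a well-formed multipart/form-data body in which every part carries at least one header line, and it assumes the boundary string has already been taken from the request's Content-Type header. Only a lookahead window as long as the delimiter is held at any time; each part body is written straight to whatever stream the callback returns.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of the Servlet API or Tomcat. */
class StreamingMultipart {

    /** Hypothetical callback: returns the stream that should receive one part's body. */
    interface PartSink {
        OutputStream open(String rawHeaders) throws IOException;
    }

    /** Copies bytes to out until delim is seen; holds at most delim.length bytes at a time. */
    static void copyUntil(InputStream in, byte[] delim, OutputStream out) throws IOException {
        byte[] window = new byte[delim.length];
        int filled = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (filled == window.length) {
                // The oldest byte can no longer be part of the delimiter: emit it.
                out.write(window[0]);
                System.arraycopy(window, 1, window, 0, window.length - 1);
                filled--;
            }
            window[filled++] = (byte) b;
            if (filled == window.length && Arrays.equals(window, delim)) {
                return; // delimiter consumed, everything before it was written
            }
        }
        throw new EOFException("stream ended before delimiter");
    }

    /** Walks the parts in request order and streams each body to the sink. */
    static void parse(InputStream in, String boundary, PartSink sink) throws IOException {
        byte[] first = ("--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        byte[] next = ("\r\n--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        byte[] blank = "\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);

        // Skip the (normally empty) preamble up to the first boundary.
        copyUntil(in, first, new ByteArrayOutputStream());
        while (true) {
            int c1 = in.read();
            int c2 = in.read();
            if (c1 == '-' && c2 == '-') {
                return; // closing "--boundary--"
            }
            if (c1 != '\r' || c2 != '\n') {
                throw new IOException("malformed boundary line");
            }
            // Part headers are small, so buffering them is acceptable.
            ByteArrayOutputStream headers = new ByteArrayOutputStream();
            copyUntil(in, blank, headers);
            String raw = new String(headers.toByteArray(), StandardCharsets.ISO_8859_1);
            try (OutputStream body = sink.open(raw)) {
                copyUntil(in, next, body);
            }
        }
    }
}
```

In a servlet, the input would be the stream returned by HttpServletRequest.getInputStream, ideally wrapped in a BufferedInputStream because the sketch reads one byte at a time. Parts are still seen strictly in the order the client sent them.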

> This may not be a question specific to Tomcat; it probably applies
> to any servlet container.

Correct, this is applicable to any servlet container.

The multipart code in Tomcat parses everything to memory/disk at once
because servlet code needs to be able to call
HttpServletRequest.getParameter(String) in any order regardless of
how the request data is actually ordered. Also, getParts must
return before the calling code can actually do anything with the data.
There is no "register a stream handler for a multipart request part
called 'foo'" or anything like that.

If you want those semantics, you'll have to parse the request yourself.
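
For illustration only, "register a stream handler for a part called 'foo'" semantics might look something like the sketch below if you built them on top of your own parser. Everything here is hypothetical (the PartHandlers class, the PartHandler callback, the on and dispatch methods); nothing like it exists in the Servlet API. It assumes some streaming parser of your own calls dispatch with each part's Content-Disposition value and a stream over that part's body.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical API sketch; nothing like this exists in the Servlet specification. */
class PartHandlers {

    /** Hypothetical callback invoked with a part's body as it arrives. */
    interface PartHandler {
        void handle(InputStream body) throws IOException;
    }

    // Matches name="..." but not filename="...".
    private static final Pattern NAME = Pattern.compile("[;\\s]name=\"([^\"]*)\"");

    private final Map<String, PartHandler> handlers = new HashMap<>();

    /** Registers a handler for the part with the given name. */
    PartHandlers on(String name, PartHandler handler) {
        handlers.put(name, handler);
        return this;
    }

    /**
     * Meant to be called by a streaming parser when a part starts.
     * Returns false if no handler is registered for that part.
     */
    boolean dispatch(String contentDisposition, InputStream body) throws IOException {
        Matcher m = NAME.matcher(contentDisposition);
        if (!m.find()) {
            return false;
        }
        PartHandler handler = handlers.get(m.group(1));
        if (handler == null) {
            return false;
        }
        handler.handle(body);
        return true;
    }
}
```

Handlers have to be registered before the body is read, and they fire in the order the client sent the parts, which is exactly what getParameter(String) and getParts() cannot require of their callers.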

-chris
