Posted to c-dev@axis.apache.org by Manjula Peiris <ma...@wso2.com> on 2008/03/13 12:48:57 UTC

Caching support for large attachments

Hi devs,

In the current MTOM implementation we keep the whole attachment in
memory until it is passed to the receiver. JIRA issue [1] asks for
some form of caching to handle large attachments. Axis2/Java does this
by writing the attachment to a file when it exceeds a certain
threshold.

What we can do in our implementation is, after extracting the binary
content, write it to a file and keep the file name inside the
data_handler instead of the whole buffer. The service or the client
will then get the file name, instead of the buffered stream, when it
receives an attachment. This will not prevent buffering the attachment
at the transport level, but it will avoid keeping it inside the
om_tree until it reaches the receiver.
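
A minimal sketch of the proposal above, assuming POSIX mkstemp() is available: once the MIME parser has extracted a part's binary content, the bytes go to a uniquely named temp file and only the name is kept. The struct file_data_handler_t and the function cache_part_to_file() are hypothetical illustrations, not the Axis2/C axiom_data_handler API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a data handler that refers to a cached
 * attachment by file name instead of holding the bytes in memory. */
typedef struct file_data_handler {
    char file_name[256]; /* path of the temp file holding the part */
    size_t size;         /* number of bytes written to it */
} file_data_handler_t;

/* Hypothetical helper: write the extracted binary content of one MIME
 * part to a new temp file under `dir` and record its name in `dh`.
 * Returns 0 on success, -1 on failure (no file is left behind). */
int cache_part_to_file(const unsigned char *data, size_t len,
                       const char *dir, file_data_handler_t *dh)
{
    int n = snprintf(dh->file_name, sizeof dh->file_name,
                     "%s/axis2c_att_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof dh->file_name)
        return -1;

    int fd = mkstemp(dh->file_name); /* creates a uniquely named file */
    if (fd < 0)
        return -1;

    FILE *fp = fdopen(fd, "wb");
    if (!fp) {
        close(fd);
        unlink(dh->file_name);
        return -1;
    }
    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len) {
        unlink(dh->file_name);
        return -1;
    }
    dh->size = len;
    return 0;
}
```

A service could then open dh.file_name directly and use ordinary file operations, instead of receiving the bytes through the om_tree.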

Before implementing this, I would like to hear your suggestions.

[1] https://issues.apache.org/jira/browse/AXIS2C-672 

Thanks,
-Manjula

-- 
Manjula Peiris: http://manjula-peiris.blogspot.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: axis-c-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: axis-c-dev-help@ws.apache.org

Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Fri, 2008-03-14 at 16:35 -0400, Thilina Gunarathne wrote:
> Hi,
> What we did in Axis2/Java to overcome this is to read the data into a
> byte[] buffer up to a certain size (the size threshold). If there is
> more data available in the MIME part (i.e. we have not yet encountered
> the boundary), then we know this attachment is bigger than the
> threshold. So we create the temp file, pump the content of the buffer
> into the file, and then pump the rest of the stream into the file.

Here, are you pumping the rest of the whole stream, or only until you
find the end of that particular MIME part?

If it is the latter, how do you find the end of the MIME part? If the
MIME boundary is split between a part that has already been written to
the file and a part that is yet to be read, won't there be a problem?

-Manjula.

> In this way we do not need to know the size of the attachment up
> front. BTW, we do all of the above while parsing the MIME message, at
> the MIME parser level.



> 
> >  > This has the plus point that the attachment size will be
> >  > limited only by the available free space in the temp directory.
> >  > Will that be possible in Axis2/C? Or is that what you have in mind? :)
> >
> >  Yes, this is possible.
> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
> MIME part up front, since that reads the attachment into memory. Maybe
> you can have a larger limit with C than in Java, but ultimately you'll
> come to a situation where you do not have enough memory to hold that
> MIME part at parsing time, unless you write it to a file while
> parsing.
> 
> thanks,
> Thilina
> 
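
The threshold scheme Thilina describes above can be sketched in C as a per-part sink. This is a hypothetical illustration (part_sink_t and its functions are not Axis2/C or Axis2/Java API), assuming the MIME parser feeds it body bytes with the boundary already stripped, so a boundary split across two reads is the parser's concern rather than the sink's. tmpfile() is used for brevity; code that must hand a file name to the service would create the file with mkstemp() instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-part sink: keeps bytes in memory until `threshold`
 * would be exceeded, then moves everything to a temp file. */
typedef struct part_sink {
    unsigned char *buf; /* in-memory bytes (NULL once spilled) */
    size_t len;         /* total bytes received so far */
    size_t threshold;   /* max bytes kept in memory */
    FILE *file;         /* non-NULL once spilled to disk */
} part_sink_t;

int part_sink_init(part_sink_t *s, size_t threshold)
{
    s->buf = malloc(threshold ? threshold : 1);
    s->len = 0;
    s->threshold = threshold;
    s->file = NULL;
    return s->buf ? 0 : -1;
}

/* Append a piece of already-parsed part body (boundary excluded). */
int part_sink_write(part_sink_t *s, const unsigned char *data, size_t n)
{
    if (!s->file && s->len + n > s->threshold) {
        /* The part is bigger than the threshold: create the temp file,
         * pump the buffered bytes into it and stop buffering. */
        s->file = tmpfile();
        if (!s->file)
            return -1;
        if (fwrite(s->buf, 1, s->len, s->file) != s->len)
            return -1;
        free(s->buf);
        s->buf = NULL;
    }
    if (s->file) {
        /* Pump the rest of the part straight to the file. */
        if (fwrite(data, 1, n, s->file) != n)
            return -1;
    } else {
        memcpy(s->buf + s->len, data, n);
    }
    s->len += n;
    return 0;
}

void part_sink_free(part_sink_t *s)
{
    free(s->buf);
    if (s->file)
        fclose(s->file);
}
```

Memory use per part is bounded by the threshold, and the size of the part never needs to be known up front.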

Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

Hi Senaka,
Looks like you are confused by my mail...

> >>  BTW, this whole discussion is about the in path, that is, reading an
> >>  incoming message. How about the out path? We have the same problems
> >>  when sending attachments. Right now, we read the whole file into memory
> >>  and only then send it over the wire.
> > hmm... Why not write it in chunks? Read a chunk from the file, then
> > write it to the output stream. Use the size of the file for the
> > Content-Length calculation in the non-chunked case. But mostly people
> > will use chunking when using MTOM.
>
> No, chunking is not required.
In the above I'm talking about two levels of chunks: read a chunk into
the buffer and write that chunk to the stream, irrespective of whether
HTTP chunking is enabled or not.
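
Reading a chunk into a buffer and writing it to the stream, independently of HTTP chunked transfer encoding, can be sketched with standard C I/O. The helper name copy_in_chunks() and the 4096-byte buffer size are assumptions for illustration; `out` stands in for whatever writer the transport actually exposes.

```c
#include <stdio.h>

#define COPY_CHUNK_SIZE 4096 /* arbitrary fixed buffer size */

/* Hypothetical helper: copy all of `in` to `out` through one
 * fixed-size buffer, so memory use stays constant however large the
 * attachment file is.
 * Returns the number of bytes copied, or -1 on a read/write error. */
long copy_in_chunks(FILE *in, FILE *out)
{
    unsigned char chunk[COPY_CHUNK_SIZE];
    long total = 0;
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```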

>You also don't need to write the entire data
> to be sent to the stream at once.
Isn't this what I was saying? :)
>Because any HTTP Receiver will pull
> from the stream until it sees a valid ending character sequence.
>
> I believe that you should be able to write part by part to the stream, and
> send it, then reuse the buffer and write part 2, and send and so on.
Once again, I don't see anything different from what I said... Maybe I'm
missing something.


>This
> argument can be justified because, on the receiving end, we must read the
> multipart data until we encounter the MIME boundary, unlike an ordinary
> payload, which can be terminated by a valid terminating character
> sequence.
MIME multipart/related packaging has a specified ending sequence. A MIME
boundary can appear anywhere in the middle of the message, since it's
multipart; the ending sequence is the boundary followed by "--" (that
is, "--boundary--").
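
To make the ending sequence concrete: per RFC 2046, a part delimiter line is "--" followed by the boundary, and the close delimiter adds a trailing "--". The classifier below is a hypothetical, line-oriented sketch (binary parts are not really line-structured, so a real MTOM parser scans bytes instead); it tolerates the trailing whitespace RFC 2046 permits after a delimiter.

```c
#include <string.h>

/* Hypothetical classification of one line of a multipart body. */
enum mime_line_kind {
    MIME_LINE_DATA,      /* ordinary content */
    MIME_LINE_DELIMITER, /* "--boundary": the next part starts */
    MIME_LINE_CLOSE      /* "--boundary--": end of the multipart body */
};

/* Classify one line (CRLF already stripped). `boundary` is the value
 * of the Content-Type boundary parameter. */
enum mime_line_kind classify_mime_line(const char *line, const char *boundary)
{
    size_t blen = strlen(boundary);
    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, blen) != 0)
        return MIME_LINE_DATA;

    const char *rest = line + 2 + blen;
    enum mime_line_kind kind = MIME_LINE_DELIMITER;
    if (strncmp(rest, "--", 2) == 0) {
        kind = MIME_LINE_CLOSE;
        rest += 2;
    }
    /* RFC 2046 allows trailing whitespace after a delimiter. */
    while (*rest == ' ' || *rest == '\t')
        rest++;
    return *rest == '\0' ? kind : MIME_LINE_DATA;
}
```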

> We'll only have issues if we are to write large soap payloads
This is where HTTP chunking comes into play.
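
For reference, HTTP/1.1 chunked transfer coding frames each block as its size in hexadecimal, CRLF, the data and another CRLF, and ends the body with a zero-size chunk. The two helpers below are a hypothetical sketch over a stdio FILE*, not the Axis2/C transport API; they omit chunk extensions and trailers.

```c
#include <stdio.h>

/* Hypothetical helper: write one HTTP/1.1 chunk (hex size, CRLF, data,
 * CRLF). An empty block is skipped, because a zero-size chunk would
 * end the body. Returns 0 on success, -1 on error. */
int http_write_chunk(FILE *out, const void *data, size_t len)
{
    if (len == 0)
        return 0;
    if (fprintf(out, "%zx\r\n", len) < 0)
        return -1;
    if (fwrite(data, 1, len, out) != len)
        return -1;
    return fputs("\r\n", out) == EOF ? -1 : 0;
}

/* Hypothetical helper: terminate the body with the last (zero-size)
 * chunk and an empty trailer. */
int http_end_chunks(FILE *out)
{
    return fputs("0\r\n\r\n", out) == EOF ? -1 : 0;
}
```

With chunking, a large SOAP payload can be serialized block by block without computing its total length first.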

thanks,
Thilina

> which of course can be dealt with once we've implemented Session in
> Axis2/C.
>
> Regards,
> Senaka
>
>
> >
> > thanks,
> > Thilina
> >
> >
> >>
> >>  Samisa...
> >>
> >>
> >>
> >>  > Regards,
> >>  > Senaka
> >>  >
> >>  >
> >>  >> Hi,
> >>  >>
> >>  >>>  > In the Axis2/Java case we write the attachment content directly
> >>  >>>  > from the InputStream to the file when the attachment size is
> >>  >>>  > larger than the threshold. This avoids loading the whole
> >>  >>>  > attachment into memory at all.
> >>  >>>
> >>  >>>  In this case, don't you need to do some MIME parsing to find out
> >>  >>>  the attachment size? How do you find the attachment size without
> >>  >>>  searching for the MIME boundaries?
> >>  >>>
> >>  >> Yes. MIME is a boundary-based packaging mechanism, and you do not
> >>  >> need to specify the length of each part. Even the HTTP
> >>  >> Content-Length is absent if the message is chunked.
> >>  >>
> >>  --
> >>  Samisa Abeysinghe
> >>  Software Architect; WSO2 Inc.
> >>
> >>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."



-- 
Thilina Gunarathne - http://thilinag.blogspot.com

Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

Hi Senaka,

> No, I'm not taking the discussion back to the starting point. I'm rather
> proposing an alternative implementation. According to what I mention here,
> we will still read till the end of the stream, but we will not
> buffer everything we read into memory. We will flush the buffer to a file
> once it exceeds a threshold. However, when we read beyond the buffer size,
> we will not directly copy the entire content to file without parsing it.
> Instead, we will use our fixed-size buffer to temporarily store the
> content before it is flushed, and then parse it and write it to file. Thus,
> the file will contain only the binary part. It will not contain the
> "--MIMEBoundary" statements etc. These, along with the file name(s), can be
> stored in the parsed attachment object created. Thus, the memory
> consumption will be limited to the size of the fixed buffer and we will
> use the file for storage. This mechanism gives us the added plus of not
> having to worry about re-parsing what is written to file, as it has already
> been parsed once. Please note that MIME parsing DOES NOT require us to
> store the entire content in memory.
OK... It seems like you misunderstood what we were proposing. All the
copying, buffering and file writing we talked about earlier happens after
the MIME parsing. I'm sorry, but once again I don't see much of a
difference between your proposal and what we were discussing. We also
proposed to have a file per MIME part (a.k.a. attachment). No need
to write boundaries to the file.
Maybe you need to have a look at the Axis2/Java implementation.

Also, MIME parsing can be done using a small state machine, which only
needs to buffer data up to the size of the boundary. Maybe you guys
would need a separate buffer for transport-level optimisations, but I
thought we were talking at a much higher level.
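
One way to build such a state machine, offered as a sketch rather than how Axis2/C or Axis2/Java actually do it, is a KMP-style matcher for the delimiter CRLF "--" boundary. Its only state is how many delimiter bytes are currently matched; because those held-back bytes are by definition a prefix of the delimiter, they can be re-emitted from the delimiter string itself when a match fails. This also covers a boundary split across two reads: a partial match simply carries over to the next call. All names are hypothetical, and the first delimiter of a message (which need not be preceded by CRLF) is assumed to be handled separately.

```c
#include <stdlib.h>
#include <string.h>

/* Receives body bytes (never delimiter bytes) as they are recognised. */
typedef void (*emit_fn)(const unsigned char *data, size_t len, void *ctx);

/* Hypothetical boundary scanner. */
typedef struct boundary_scanner {
    unsigned char *delim; /* CRLF "--" boundary */
    size_t dlen;
    size_t *fail;         /* KMP failure table for delim */
    size_t matched;       /* delimiter bytes matched so far: the only state */
} boundary_scanner_t;

int scanner_init(boundary_scanner_t *s, const char *boundary)
{
    size_t blen = strlen(boundary);
    s->dlen = blen + 4;
    s->delim = malloc(s->dlen);
    s->fail = malloc(s->dlen * sizeof *s->fail);
    if (!s->delim || !s->fail) {
        free(s->delim);
        free(s->fail);
        return -1;
    }
    memcpy(s->delim, "\r\n--", 4);
    memcpy(s->delim + 4, boundary, blen);
    s->matched = 0;

    /* fail[i] = length of the longest proper prefix of delim[0..i]
     * that is also a suffix of it. */
    s->fail[0] = 0;
    for (size_t i = 1, k = 0; i < s->dlen; i++) {
        while (k > 0 && s->delim[i] != s->delim[k])
            k = s->fail[k - 1];
        if (s->delim[i] == s->delim[k])
            k++;
        s->fail[i] = k;
    }
    return 0;
}

/* Feed the next chunk of input. Returns how many input bytes were
 * consumed: up to and including the delimiter if it completed in this
 * chunk (the rest belongs to the next part), otherwise `len`. */
size_t scanner_feed(boundary_scanner_t *s, const unsigned char *in,
                    size_t len, emit_fn emit, void *ctx)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        while (s->matched > 0 && c != s->delim[s->matched]) {
            /* Held-back bytes equal delim[0..matched-1]; release those
             * that can no longer start a delimiter as body data. */
            size_t keep = s->fail[s->matched - 1];
            emit(s->delim, s->matched - keep, ctx);
            s->matched = keep;
        }
        if (c == s->delim[s->matched]) {
            if (++s->matched == s->dlen) {
                s->matched = 0;
                return i + 1;
            }
        } else {
            emit(&c, 1, ctx); /* matched is 0 here */
        }
    }
    return len;
}

/* At end of input with no delimiter, held-back bytes are body data. */
void scanner_flush(boundary_scanner_t *s, emit_fn emit, void *ctx)
{
    if (s->matched > 0)
        emit(s->delim, s->matched, ctx);
    s->matched = 0;
}

void scanner_free(boundary_scanner_t *s)
{
    free(s->delim);
    free(s->fail);
}

/* Example callback: append body bytes to a small fixed buffer
 * (bytes beyond its capacity are dropped in this demo). */
typedef struct body_buf {
    unsigned char data[256];
    size_t len;
} body_buf_t;

static void append_body(const unsigned char *d, size_t n, void *ctx)
{
    body_buf_t *b = ctx;
    if (b->len + n <= sizeof b->data) {
        memcpy(b->data + b->len, d, n);
        b->len += n;
    }
}
```

Bytes handed to the emit callback are part body only, so they could go straight into a threshold buffer or a temp file.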

There is an added advantage to doing that. Mostly people send files
using MTOM and want to retrieve them as files. When we write them to
separate temp files, people can use those files directly and do
whatever they need using file operations.

thanks,
Thilina


>
> >
> > When sending, writing part by part to the stream is the same as
> > chunking, because when sending you must either specify a
> > Content-Length or mark the message as chunked.
>
> No, it is not the same as chunking. What I meant here is that you need not
> read the entire content into memory at once and write it to the stream in a
> single step. Rather, we can read part by part, write each part to the stream,
> and repeat the process until the whole large file is written. Here you
> will still be using the Content-Length. Chunking is a different
> story, where you transmit data as blocks. Using chunking we can send an
> arbitrary amount of data whose length is not pre-calculated. Now
> you might wonder how we calculate the Content-Length without reading
> the entire content into memory. Well, you can seek through the file and
> find out the size of the content to be written. Add to it the lengths of the
> standard header block and the MIME boundary demarcation strings, and you
> will get the Content-Length. This is not at all an expensive operation, as
> seeking to the end of the file does not read its contents into memory.
> The OS handles it efficiently.
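
Senaka's Content-Length calculation can be sketched as follows, assuming a simple multipart layout (each part is "--" boundary CRLF, header lines, a blank line, the body and CRLF, followed at the end by the close delimiter). The layout, out_part_t and both functions are hypothetical; a real serializer must count exactly the bytes it writes, including the SOAP part. fseek()/ftell() work with a long, so very large files may need fseeko()/ftello() or stat() on some platforms.

```c
#include <stdio.h>
#include <string.h>

/* Size of a file in bytes, obtained by seeking to its end; the contents
 * are not read into memory. Rewinds the file. Returns -1 on error. */
long file_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    rewind(fp);
    return size;
}

/* Hypothetical description of one outgoing MIME part. */
typedef struct out_part {
    const char *headers; /* header lines, each terminated by CRLF */
    long body_len;       /* e.g. from file_size() */
} out_part_t;

/* Content-Length for this assumed layout:
 *   per part:   "--" boundary CRLF headers CRLF body CRLF
 *   at the end: "--" boundary "--" CRLF                      */
long multipart_content_length(const out_part_t *parts, size_t count,
                              const char *boundary)
{
    long blen = (long)strlen(boundary);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += 2 + blen + 2                   /* "--" boundary CRLF */
               + (long)strlen(parts[i].headers) /* header lines       */
               + 2                              /* blank line         */
               + parts[i].body_len + 2;         /* body CRLF          */
    return total + 2 + blen + 2 + 2;            /* close delimiter    */
}
```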
>
> >
> > -Manjula.
>
> Regards,
> Senaka
>
>
> >
> > On Sat, 2008-03-15 at 13:39 +0530, Senaka Fernando wrote:
> >> >>>  BTW, this whole discussion is about the in path, that is, reading an
> >> >>>  incoming message. How about the out path? We have the same problems
> >> >>>  when sending attachments. Right now, we read the whole file into
> >> >>>  memory and only then send it over the wire.
> >> >> hmm... Why not write it in chunks? Read a chunk from the file, then
> >> >> write it to the output stream. Use the size of the file for the
> >> >> Content-Length calculation in the non-chunked case. But mostly people
> >> >> will use chunking when using MTOM.
> >> >
> >> > No, chunking is not required. You also don't need to write the entire
> >> > data to be sent to the stream at once. Because any HTTP receiver will
> >> > pull from the stream until it sees a valid ending character sequence.
> >>
> >> It should rather read a length equal to the Content-Length. The
> >> terminating sequence is for the headers. Sorry for the confusion.
> >> Therefore, the HTTP receiver will pull from the stream until it has read
> >> Content-Length bytes or until an error occurs.
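
Reading exactly Content-Length bytes from a socket needs a loop, because read() may return fewer bytes than requested. A minimal POSIX sketch; read_exactly() is a hypothetical helper, not part of Axis2/C.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `want` bytes (e.g. the
 * Content-Length) from fd into buf, looping over short reads.
 * Returns the number of bytes read, which is less than `want` only if
 * the peer closed the connection early, or -1 on error. */
ssize_t read_exactly(int fd, void *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break; /* EOF before all bytes arrived */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```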
> >>
> >> >
> >> > I believe that you should be able to write part by part to the stream,
> >> > send it, then reuse the buffer and write part 2, send it, and so on.
> >> > This argument can be justified because, on the receiving end, we must
> >> > read the multipart data until we encounter the MIME boundary, unlike
> >> > an ordinary payload, which can be terminated by a valid terminating
> >> > character
> >>
> >> Same here. We'll be reading a length equal to the Content-Length.
> >>
> >> > sequence. We'll only have issues if we have to write large SOAP
> >> > payloads, which of course can be dealt with once we've implemented
> >> > Session in Axis2/C.
> >> >
> >> > Regards,
> >> > Senaka
> >> >



-- 
Thilina Gunarathne - http://thilinag.blogspot.com

Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Senaka Fernando wrote:
>>>  Ideally it should be either the Content-Length or, if chunked, we'll
>>>  have to read until the end of the chunks. So, if you see Content-Length,
>>>  then there's no need to calculate the size while parsing. You already
>>>  know it. But your proposal is valid for the chunked case.
>>>       
>> I'm talking about calculating the size of an individual MIME part. I
>> think I made it clear (+how we do it in Axis2/Java) at the beginning
>> of this thread... Probably you need to reread this thread.
>>     
>
> Yep, true. The point is, as I said earlier, several small attachments ==
> one big attachment. Thus, buffering will have to happen anyway if the
> Content-Length exceeds the threshold, rather than only when a single
> attachment exceeds the threshold.
>   

No, this is a flawed view. You cannot overlay parsing and caching in the
right manner with this view.

We have parsing in place; what needs to be figured out is the overlay.
We are going back and forth in this discussion and we are not addressing
the real issue, because the discussion is dragged over and over again
into parsing problems. If there are parsing problems, could you please
point out exactly where they are, so that we can fix them and be done
with it? Otherwise we are wasting time on an already solved problem. We
should rather move on and solve the problem at hand, which is caching and
overlaying it on the current parsing logic.

Samisa...

Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

>>  Ideally it should be either the Content-Length or, if chunked, we'll
>>  have to read until the end of the chunks. So, if you see Content-Length,
>>  then there's no need to calculate the size while parsing. You already
>>  know it. But your proposal is valid for the chunked case.
> I'm talking about calculating the size of an individual MIME part. I
> think I made it clear (+how we do it in Axis2/Java) at the beginning
> of this thread... Probably you need to reread this thread.

Yep, true. The point is, as I said earlier, several small attachments ==
one big attachment. Thus, buffering will have to happen anyway if the
Content-Length exceeds the threshold, rather than only when a single
attachment exceeds the threshold.

>
>>  That will cause the stream to exhaust for the lifetime of the request,
>>  won't it? If not, can you explain the procedure?
> It works for StAX XML parsing and for Axis2/Java MIME parsing. Simply
> increase the socket timeout in the client.

+1, that sounds like a good idea.

Regards,
Senaka

>
>
> --
> Thilina Gunarathne - http://thilinag.blogspot.com
>


Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

>  Ideally it should be either the Content-Length or, if chunked, we'll
>  have to read until the end of the chunks. So, if you see Content-Length,
>  then there's no need to calculate the size while parsing. You already
>  know it. But your proposal is valid for the chunked case.
I'm talking about calculating the size of an individual MIME part. I
think I made it clear (+how we do it in Axis2/Java) at the beginning
of this thread... Probably you need to reread this thread.

>  That will cause the stream to exhaust for the lifetime of the request,
>  won't it? If not, can you explain the procedure?
It works for StAX XML parsing and for Axis2/Java MIME parsing. Simply
increase the socket timeout in the client.
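
On a POSIX client, raising the socket receive timeout could look like the sketch below. set_recv_timeout() is a hypothetical helper built on the standard SO_RCVTIMEO socket option; Axis2/C may expose its own timeout setting, which this does not attempt to reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: set how long a blocking read on `sockfd` may
 * wait for data before failing with EAGAIN/EWOULDBLOCK, so a large,
 * lazily consumed message is not cut off by a short default.
 * Returns 0 on success, -1 on error. */
int set_recv_timeout(int sockfd, long seconds)
{
    struct timeval tv;
    tv.tv_sec = (time_t)seconds;
    tv.tv_usec = 0;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```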


-- 
Thilina Gunarathne - http://thilinag.blogspot.com

Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

> Well, the Java folks have the luxury of streams. C has to manage this at
> the socket level. So my question was, how are we supposed to do that with
> the C network handler?

I'm not familiar with the workings of the C network handler. But for us the
only useful part of the input stream was the read() method, which returns
the bytes filtered according to the function of the wrapper. I'm sure you
smart people would be able to simulate that easily.
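
One way to simulate Java's stacked InputStream wrappers in C is a struct with a read() function pointer, where each wrapper holds a pointer to the stream it filters. The sketch below is hypothetical (stream_t, mem_stream_t and limit_stream_t are illustrative names, not Axis2/C types); the innermost stream reads from memory as a stand-in for the socket, and a MIMEBodyPartInputStream equivalent would be one more wrapper whose read() stops at the boundary.

```c
#include <string.h>

/* Hypothetical minimal "input stream": an object with a read() function
 * pointer, so wrappers can be stacked like Java InputStream filters. */
typedef struct stream stream_t;
struct stream {
    /* Returns the number of bytes placed in buf (0 at end of stream). */
    size_t (*read)(stream_t *self, unsigned char *buf, size_t len);
};

/* Innermost stream: bytes from a memory buffer (stands in for a socket). */
typedef struct {
    stream_t base; /* must be first so stream_t* can be cast back */
    const unsigned char *data;
    size_t len, pos;
} mem_stream_t;

static size_t mem_read(stream_t *self, unsigned char *buf, size_t len)
{
    mem_stream_t *m = (mem_stream_t *)self;
    size_t n = m->len - m->pos;
    if (n > len)
        n = len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

void mem_stream_init(mem_stream_t *m, const void *data, size_t len)
{
    m->base.read = mem_read;
    m->data = data;
    m->len = len;
    m->pos = 0;
}

/* Wrapper: passes through at most `remaining` bytes of the inner
 * stream, e.g. to honour a Content-Length. */
typedef struct {
    stream_t base;
    stream_t *inner;
    size_t remaining;
} limit_stream_t;

static size_t limit_read(stream_t *self, unsigned char *buf, size_t len)
{
    limit_stream_t *l = (limit_stream_t *)self;
    if (len > l->remaining)
        len = l->remaining;
    if (len == 0)
        return 0;
    size_t n = l->inner->read(l->inner, buf, len);
    l->remaining -= n;
    return n;
}

void limit_stream_init(limit_stream_t *l, stream_t *inner, size_t limit)
{
    l->base.read = limit_read;
    l->inner = inner;
    l->remaining = limit;
}
```

Code above the stack only sees a stream_t* and calls s->read(s, buf, len), whatever filtering the wrappers underneath perform.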

~Thilina


>
> Samisa...
>


-- 
Thilina Gunarathne - http://thilinag.blogspot.com

Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Thilina Gunarathne wrote:
> >> Also you might want to consider deferred parsing of attachments. That
> >> means reading the attachment from the stream only when needed.
> >
> > This is what the Java does.
>
> That's why I proposed that you people consider doing it in C :)
>
> In Axis2/Java we first parse only the SOAP part (the MIME part containing
> the SOAP envelope) and build the OM tree. When building, we create
> placeholder OMText elements whenever we encounter <xop:Include>
> elements. These OMText objects contain the content-id of the attachment,
> which is used when buildWithAttachments() or getDataHandler() is
> called on them. At that point we parse the rest of the MIME
> message up to that particular attachment and return the DataHandler
> for it.
>
> Also, in Axis2/Java we placed the attachment caching code in between
> the builder.getDataHandler() method presented to the Axis2 engine and
> the MIME parser. The MIME parser, which has an InputStream interface,
> returns data belonging to a
>
> Have a look at the following:
> http://svn.apache.org/viewvc/webservices/commons/trunk/modules/axiom/modules/axiom-api/src/main/java/org/apache/axiom/attachments/Attachments.java?view=markup
>
> Specifically, look at the getPart() method. Our MIME parsing logic is
> effectively a set of InputStream wrappers. MIMEBodyPartInputStream,
> which is at the end of this set of wrappers, only returns bytes belonging
> to a particular MIME part. We cache the data we get through this
> stream, in other words the already parsed MIME part.

Well, the Java folks have the luxury of streams. C has to manage this at
the socket level. So my question was, how are we supposed to do that with
the C network handler?

Samisa...


Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

>> Also you might want to consider deferred parsing of attachments. That
>> means reading the attachment from the stream only when needed.
>
> This is what the Java does.

That's why I proposed that you people consider doing it in C :)

In Axis2/Java we first parse only the SOAP part (the MIME part containing
the SOAP envelope) and build the OM tree. When building, we create
placeholder OMText elements whenever we encounter <xop:Include> elements.
These OMText objects contain the content-id of the attachment, which is
used when buildWithAttachments() or getDataHandler() is called on them. At
that point we parse the rest of the MIME message up to that particular
attachment and return the DataHandler for it.

Also in Axis2/Java we placed the attachment caching code between the
builder.getDataHandler() method presented to the Axis2 engine and the MIME
parser. The MIME parser, which has an InputStream interface, returns data
belonging to a

Have a look at the following:
http://svn.apache.org/viewvc/webservices/commons/trunk/modules/axiom/modules/axiom-api/src/main/java/org/apache/axiom/attachments/Attachments.java?view=markup

specifically the getPart() method. Our MIME parsing logic is effectively a
set of InputStream wrappers. MIMEBodyPartInputStream, which is at the end of
this set of wrappers, only returns bytes belonging to a particular MIME part.
We cache the data we get through this stream, in other words the already
parsed MIME part.
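The deferred-parsing flow described here (placeholders carrying the content-id, with the MIME message parsed only up to the requested attachment) might look roughly like this in C. It is a sketch under stated assumptions: lazy_attachments, get_part and array_parser are hypothetical names, array_parser stands in for the real MIME parser reading off the socket, and the fixed 16-entry table is only for brevity. Parts skipped on the way to the requested one have to be kept somewhere, which is exactly where caching to files would plug in.

```c
#include <stddef.h>
#include <string.h>

/* One MIME part as handed out by the parser: Content-ID plus body. */
typedef struct {
    const char *cid;
    const char *body;
} mime_part;

/* Hypothetical lazy attachment table. next_part() stands in for the real
 * MIME parser: each call parses the next part off the wire and returns 0
 * once the message is exhausted. */
typedef struct {
    int (*next_part)(void *parser, mime_part *out);
    void *parser;
    mime_part seen[16];    /* parts already pulled off the stream */
    size_t nseen;
} lazy_attachments;

/* Resolve the cid carried by an <xop:Include> placeholder. Parts parsed
 * earlier are served from `seen`; otherwise the parser is advanced only
 * as far as the requested part, keeping the parts skipped on the way. */
static const mime_part *get_part(lazy_attachments *a, const char *cid)
{
    for (size_t i = 0; i < a->nseen; i++)
        if (strcmp(a->seen[i].cid, cid) == 0)
            return &a->seen[i];

    while (a->nseen < sizeof a->seen / sizeof a->seen[0]) {
        mime_part p;
        if (!a->next_part(a->parser, &p))
            return NULL;               /* cid not in this message */
        a->seen[a->nseen++] = p;
        if (strcmp(p.cid, cid) == 0)
            return &a->seen[a->nseen - 1];
    }
    return NULL;                       /* table full in this sketch */
}

/* Demo parser over a fixed array, so the laziness can be observed. */
typedef struct {
    const mime_part *parts;
    size_t count;
    size_t pos;                        /* number of parts parsed so far */
} array_parser;

static int array_next_part(void *parser, mime_part *out)
{
    array_parser *ap = parser;
    if (ap->pos == ap->count)
        return 0;
    *out = ap->parts[ap->pos++];
    return 1;
}
```

Asking for the second part parses only the first two parts; a later attachment stays unread on the wire until something asks for it.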

thanks,
Thilina

> How can you do this in C?
>
> Samisa...
>


-- 
Thilina Gunarathne - http://thilinag.blogspot.com

Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

>> Also you might want to consider deferred parsing of attachments. That
>> means read the attachment for the stream only when needed. 

This is what the Java does. How can you do this in C?

Samisa...



Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

>>  However, the real issue is how are we going to implement "parse it for
>>  MIME, and then cache it and move on". I still think that it is better to
>>  stick to Thilina's viewpoint in having each attachment cached as a
>>  separate file. And, each attachment should be cached, even if it is small
>>  or large, when the content-length exceeds the threshold.
> What I proposed is not based on the content-length.. It's based on the
> size of a particular attachment. We calculate the size while parsing.
> If the size exceeds a certain limit then put everything to file.

Ideally it should be based on the Content-Length or, if the message is
chunked, we'll have to read until the end of the chunks. So if you see a
Content-Length, there is no need to calculate the size while parsing; you
already know it. But your proposal is valid for the chunked case.
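A small sketch of the decision described here, using hypothetical names (size_info, should_spill_to_file). Note that an HTTP Content-Length covers the whole MIME message, not one attachment, so it can tell you the message is large but not which part is; that is the per-attachment distinction Thilina draws in the quoted text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size bookkeeping for an incoming message. */
typedef struct {
    bool has_content_length;  /* false for chunked transfer encoding */
    size_t content_length;    /* HTTP body length, if known */
    size_t bytes_seen;        /* running count while reading chunks */
} size_info;

/* With a Content-Length the size is known before parsing starts; with
 * chunked encoding it only becomes known by counting as data arrives. */
static bool should_spill_to_file(const size_info *s, size_t threshold)
{
    if (s->has_content_length)
        return s->content_length > threshold;
    return s->bytes_seen > threshold;
}
```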

>
> Also you might want to consider deferred parsing of attachments. That
> means read the attachment for the stream only when needed. Similar in
> concept to StAX parsing of XML.

That will keep the stream tied up for the lifetime of the request, won't
it? If not, can you explain the procedure?

Regards,
Senaka

>
>>This is because
>>  many small attachments == one big attachment.
> Good point..
>
> thanks,
> Thilina
>
>>Thus, I'm still on the
>>  parse_1st->cache_1st->parse_2nd->cache_2nd->... approach. I don't think
>>  that a cache all at once will give us desirable results.
>>
>>
>>
>>  >>
>>  >>> Writing the partially parsed buffer was a solution to caching. Do we
>>  >>> have any other alternatives? If so, in short, what are they?
>>  >>>
>>  >>
>>  >> We can keep the current implementation and write the attachment to a
>>  >> file when it exceeds a certain threshold. This is inside mime_parser,
>>  >> which means at the transport level. So we are not keeping the whole
>>  >> binary inside om_tree during the invocation of handlers, and the
>>  >> receiver (maybe the actual service or client) can access the file
>>  >> straightaway. Even though this approach will limit the attachment size
>>  >> we can handle to the available system memory, I think it has the added
>>  >> advantage of not keeping the attachment in memory.
>>  >>
>>  >
>>  > How does this compare with what I proposed above in this reply?
>>  >
>>  > Samisa...
>>  >>
>>  >>> Samisa...
>>  >
>>  >
>>  > --
>>  > Samisa Abeysinghe
>>  > Software Architect; WSO2 Inc.
>>  >
>>  > http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>
>
>
> --
> Thilina Gunarathne - http://thilinag.blogspot.com
>



Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Tue, 2008-03-18 at 03:49 +0530, Samisa Abeysinghe wrote:
> Thilina Gunarathne wrote:
> >>  However, the real issue is how are we going to implement "parse it for
> >>  MIME, and then cache it and move on". I still think that it is better to
> >>  stick to Thilina's viewpoint in having each attachment cached as a
> >>  separate file. And, each attachment should be cached, even if it is small
> >>  or large, when the content-length exceeds the threshold.
> >>     
> > What I proposed is not based on the content-length.. It's based on the
> > size of a particular attachment. We calculate the size while parsing.
> > If the size exceeds a certain limit then put everything to file.
> >
> > Also you might want to consider deferred parsing of attachments. That
> > means read the attachment for the stream only when needed. Similar in
> > concept to StAX parsing of XML.
> >
> >   
> >> This is because
> >>  many small attachments == one big attachment.
> >>     
> > Good point..
> >   
> I do not think so. You do not get mime boundaries in the middle. So the 
> parsing and buffering implications are different.

When there are multiple attachments, don't you get MIME boundaries in the
middle of the message?

> 
> Samisa...



Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Thilina Gunarathne wrote:
>>  However, the real issue is how are we going to implement "parse it for
>>  MIME, and then cache it and move on". I still think that it is better to
>>  stick to Thilina's viewpoint in having each attachment cached as a
>>  separate file. And, each attachment should be cached, even if it is small
>>  or large, when the content-length exceeds the threshold.
>>     
> What I proposed is not based on the content-length.. It's based on the
> size of a particular attachment. We calculate the size while parsing.
> If the size exceeds a certain limit then put everything to file.
>
> Also you might want to consider deferred parsing of attachments. That
> means read the attachment for the stream only when needed. Similar in
> concept to StAX parsing of XML.
>
>   
>> This is because
>>  many small attachments == one big attachment.
>>     
> Good point..
>   
I do not think so. You do not get mime boundaries in the middle. So the 
parsing and buffering implications are different.

Samisa...



Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

>  However, the real issue is how are we going to implement "parse it for
>  MIME, and then cache it and move on". I still think that it is better to
>  stick to Thilina's viewpoint in having each attachment cached as a
>  separate file. And, each attachment should be cached, even if it is small
>  or large, when the content-length exceeds the threshold.
What I proposed is not based on the content-length. It's based on the
size of a particular attachment. We calculate the size while parsing.
If the size exceeds a certain limit, then we put everything into a file.
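The per-attachment threshold can be sketched roughly as follows, assuming a hypothetical part_read_fn that returns bytes of a single MIME part and 0 at its boundary. It follows the Axis2/Java flow described earlier in this thread: fill a buffer up to the threshold, and only if more data remains, create a temp file, pump the buffer into it, then pump the rest of the part. Standard tmpfile() is used for portability; because it gives an anonymous file, a data_handler that has to hold a file name would need a named file in a configurable directory instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader for one MIME part's body: returns the number of
 * bytes read, or 0 once the part's boundary has been reached. */
typedef size_t (*part_read_fn)(void *ctx, char *buf, size_t n);

typedef struct {
    char *mem;      /* the whole part, if it fit under the threshold */
    size_t len;     /* total part size in bytes */
    FILE *file;     /* temp file holding the part, if it did not */
} cached_part;

/* Returns 0 on success, -1 on allocation or I/O failure. */
static int read_part_with_threshold(part_read_fn rd, void *ctx,
                                    size_t threshold, cached_part *out)
{
    char chunk[4096];
    char *buf = malloc(threshold + 1);
    size_t len = 0, n;

    if (!buf)
        return -1;
    /* Read one byte past the threshold to tell "exactly full" apart from
     * "there is more data in this part". */
    while (len < threshold + 1 &&
           (n = rd(ctx, buf + len, threshold + 1 - len)) > 0)
        len += n;

    if (len <= threshold) {             /* small part: keep it in memory */
        out->mem = buf;
        out->len = len;
        out->file = NULL;
        return 0;
    }

    FILE *f = tmpfile();                /* large part: spill to disk */
    if (!f || fwrite(buf, 1, len, f) != len) {
        if (f)
            fclose(f);
        free(buf);
        return -1;
    }
    while ((n = rd(ctx, chunk, sizeof chunk)) > 0) {   /* pump the rest */
        if (fwrite(chunk, 1, n, f) != n) {
            fclose(f);
            free(buf);
            return -1;
        }
        len += n;
    }
    free(buf);
    rewind(f);
    out->mem = NULL;
    out->len = len;
    out->file = f;
    return 0;
}

/* Demo reader over a string, returning at most 3 bytes per call to
 * mimic short socket reads. */
typedef struct { const char *s; size_t len, pos; } str_reader;

static size_t str_read(void *ctx, char *buf, size_t n)
{
    str_reader *r = ctx;
    size_t left = r->len - r->pos;
    if (n > left) n = left;
    if (n > 3) n = 3;
    memcpy(buf, r->s + r->pos, n);
    r->pos += n;
    return n;
}
```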

Also you might want to consider deferred parsing of attachments. That
means read the attachment for the stream only when needed. Similar in
concept to StAX parsing of XML.

>This is because
>  many small attachments == one big attachment.
Good point..

thanks,
Thilina

>Thus, I'm still on the
>  parse_1st->cache_1st->parse_2nd->cache_2nd->... approach. I don't think
>  that a cache all at once will give us desirable results.
>
>
>
>  >>
>  >>> Writing the partially parsed buffer was a solution to caching. Do we
>  >>> have any other alternatives? If so, in short, what are they?
>  >>>
>  >>
>  >> We can keep the current implementation and write the attachment to a
>  >> file when it exceeds a certain threshold. This is inside mime_parser,
>  >> which means at the transport level. So we are not keeping the whole
>  >> binary inside om_tree during the invocation of handlers, and the
>  >> receiver (maybe the actual service or client) can access the file
>  >> straightaway. Even though this approach will limit the attachment size
>  >> we can handle to the available system memory, I think it has the added
>  >> advantage of not keeping the attachment in memory.
>  >>
>  >
>  > How does this compare with what I proposed above in this reply?
>  >
>  > Samisa...
>  >>
>  >>> Samisa...
>  >
>  >
>  > --
>  > Samisa Abeysinghe
>  > Software Architect; WSO2 Inc.
>  >
>  > http://www.wso2.com/ - "Oxygenating the Web Service Platform."



-- 
Thilina Gunarathne - http://thilinag.blogspot.com


Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

> Senaka Fernando wrote:
>> Hi Samisa,
>>
>> IIRC, this discussion is on handling attachments and thus does not relate
>> to caching. Though $subject says "Caching", what was actually discussed
>> was a mechanism to buffer the attachment in a file rather than in memory,
>> and that buffer has nothing to do with caching, which is a totally
>> different concept, as in [1].
>>
>
> If you look at Axis2/Java, caching really means what we discussed, and
> not [1]. In other words, the context is different.

Yes, and thinking about [1], it is actually the implementor's choice,
just like sessions. The WS specs do not demand such native support from
either a SOAP or a REST engine. Thus, IMHO, even in the long run [1] will
not be needed.

>
>> The previous mail I sent was a reply to Manjula's concern about handling a
>> scenario where the MIME boundary is split across two reads. Unlike in the
>> previous scenarios, a block, once read, will be flushed to a file instead
>> of being kept in memory. Thus, parsing may have to be rethought. Sorry if
>> it confused you.
>>
>> IMHO, writing a partially parsed buffer to a file is not that efficient,
>> as we will have to parse it later to discover MIME boundaries and extract
>> attachments. Thus, I still believe that real-time buffering to a file
>> while parsing is the better choice. To implement this, we will have to
>> modify our mime_parser.c, and probably the data_handler implementation.
>>
>
> That bit needs to be rationalized when implementing the caching, on top
> of the current parser logic.

+1,

Regards,
Senaka

>
> Samisa...



Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Senaka Fernando wrote:
> Hi Samisa,
>
> IIRC, this discussion is on handling attachments and thus does not relate
> to caching. Though $subject says "Caching", what was actually discussed was
> a mechanism to buffer the attachment in a file rather than in memory, and
> that buffer has nothing to do with caching, which is a totally different
> concept, as in [1].
>   

If you look at Axis2/Java, caching really means what we discussed, and
not [1]. In other words, the context is different.

> The previous mail I sent was a reply to Manjula's concern about handling a
> scenario where the MIME boundary is split across two reads. Unlike in the
> previous scenarios, a block, once read, will be flushed to a file instead
> of being kept in memory. Thus, parsing may have to be rethought. Sorry if
> it confused you.
>
> IMHO, writing a partially parsed buffer to a file is not that efficient, as
> we will have to parse it later to discover MIME boundaries and extract
> attachments. Thus, I still believe that real-time buffering to a file
> while parsing is the better choice. To implement this, we will have to
> modify our mime_parser.c, and probably the data_handler implementation.
>   

That bit needs to be rationalized when implementing the caching, on top
of the current parser logic.

Samisa...




Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Sun, 2008-03-16 at 17:53 +0530, Senaka Fernando wrote:
> Hi Samisa,
> 
> IIRC, this discussion is on handling attachments and thus does not relate
> to caching. Though $subject says "Caching", what was actually discussed was
> a mechanism to buffer the attachment in a file,

I used the word "Caching" because the JIRA issue
https://issues.apache.org/jira/browse/AXIS2C-672

uses the word caching, and our Axis2/Java uses that word for buffering the
attachment to a file. So I don't think any of the developers misunderstood
that.

>  rather than in memory, and
> that buffer has nothing to do with caching, which is a totally different
> concept, as in [1].
> 
> The previous mail I sent was a reply to Manjula's concern about handling a
> scenario where the MIME boundary is split across two reads. Unlike in the
> previous scenarios, a block, once read, will be flushed to a file instead
> of being kept in memory. Thus, parsing may have to be rethought. Sorry if
> it confused you.
> 
> IMHO, writing a partially parsed buffer to a file is not that efficient, as
> we will have to parse it later to discover MIME boundaries and extract
> attachments. Thus, I still believe that real-time buffering to a file
> while parsing is the better choice. To implement this, we will have to
> modify our mime_parser.c, and probably the data_handler implementation.
> 
> Or if not, am I misunderstanding $subject?
> 
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
> 
> Regards,
> Senaka
> 
> > Senaka Fernando wrote:
> >>> Hi Manjula, Thilina and others,
> >>>
> >>> Yep, I think I'm exactly of the same viewpoint as Thilina when it
> >>> comes to handling attachment data, at least for the chunking part. I
> >>> think I didn't get Thilina right in his first e-mail.
> >>>
> >>> However, a file per MIME part may not always be optimal. I'd rather say
> >>> each file should have a fixed maximum size, and if that is exceeded,
> >>> perhaps you can divide it in two. Also, a user should always be given
> >>> the option to choose between Thilina's method and this method through
> >>> axis2.xml (or services.xml). Thus, a user can fine-tune memory use.
> >>>
> >>> When it comes to base64-encoded binary data, you can use a mechanism
> >>> where the buffer size is always a multiple of 4, and then when you
> >>> flush, you decode it and copy it to the file, so that should
> >>> essentially be the same to a user when it comes to caching.
> >>>
> >>> OK, so Manjula, you mean when the MIME boundary appears partially in
> >>> the first read and partially in the second?
> >>>
> >>> Well, this is probably the best solution.
> >>>
> >>> You allocate a buffer large enough to hold twice the size of a MIME
> >>> boundary, and in your very first read you read 2 times the MIME
> >>> boundary size, then search for the MIME boundary. Next you do a
> >>> memmove() to move all the contents of the buffer, from the midpoint to
> >>> the end, to the beginning of the buffer. After doing this, you read a
> >>> size equivalent to half the buffer (which again is the size of the
> >>> MIME boundary marker) and store it from the midpoint of the buffer to
> >>> the end. Then you search again. You iterate this procedure until you
> >>> read less than half the size of the buffer.
> >>>
> >>
> >> If you are further interested in this mechanism, I used this approach
> >> for resending binary data in TCPMon. You may check that also.
> >>
> >> Also, strstr() has issues when you have '\0' in the middle. Thus you
> >> will have to use a temporary search marker and use that in the process.
> >> Before calling strstr(), you check whether strlen(temp) is greater than
> >> or equal to the length of the MIME boundary marker. If it is greater,
> >> you only need to search once. If it is equal, you will need to search
> >> exactly twice. If it is less, you increment temp by strlen(temp) and
> >> repeat until you cross the midpoint. This makes the search more
> >> efficient.
> >>
> >> If you want to make the search even more efficient, you can make the
> >> buffer size one less than the size of the MIME boundary marker, so when
> >> you get the equal scenario, you will have to search only once.
> >>
> >> The fact I've used here is that strstr() and strlen() behave the same in
> >> a given implementation. On Windows, if strlen() is multibyte aware, so is
> >> strstr(). So, no worries.
> >>
> >
> > We have an efficient parsing mechanism already, tested and proven to
> > work, with 1.3. Why on earth are we discussing this over and over again?
> >
> > Does caching get affected by the mime parser logic? IMHO no. They are
> > two separate concerns, so why are we wasting time discussing parsing
> > while the problem at hand is not parsing but caching?
> >
> > Writing the partially parsed buffer was a solution to caching. Do we
> > have any other alternatives? If so, in short, what are they?
> >
> > Samisa...



Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

Hi Samisa,

IIRC, this discussion is on handling attachments and thus does not relate
to caching. Though $subject says "Caching", what was actually discussed was
a mechanism to buffer the attachment in a file rather than in memory, and
that buffer has nothing to do with caching, which is a totally different
concept, as in [1].

The previous mail I sent was a reply to Manjula's concern about handling a
scenario where the MIME boundary is split across two reads. Unlike in the
previous scenarios, a block, once read, will be flushed to a file instead
of being kept in memory. Thus, parsing may have to be rethought. Sorry if
it confused you.
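For the split-boundary case, here is a sketch that swaps the strlen()/strstr() workaround quoted further down for a search with explicit lengths, so '\0' bytes in binary data are not a problem. memmem() would do this, but it is a GNU/BSD extension rather than standard C, hence the local find_bytes(). Instead of memmove()ing half of a buffer twice the boundary size, this variant keeps the last (boundary length - 1) bytes of each read, the minimum needed to catch a boundary that straddles two reads. boundary_scanner and scan_chunk are hypothetical names, and the fixed 256-byte window is only for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Binary-safe substring search, similar to the non-standard memmem():
 * lengths are explicit, so '\0' bytes in the data do not end the search
 * the way they would with strstr(). */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    if (nlen == 0)
        return hay;
    while (hlen >= nlen) {
        /* Only positions where the whole needle still fits can match. */
        const char *p = memchr(hay, needle[0], hlen - nlen + 1);
        if (!p)
            return NULL;
        if (memcmp(p, needle, nlen) == 0)
            return p;
        hlen -= (size_t)(p - hay) + 1;
        hay = p + 1;
    }
    return NULL;
}

/* Hypothetical scanner state. The window carries over the last
 * (blen - 1) bytes of earlier reads, so a boundary split between two
 * reads is still seen in one piece. */
typedef struct {
    const char *boundary;
    size_t blen;           /* must be >= 1 */
    char window[256];      /* carry-over plus the newest chunk */
    size_t wlen;
    size_t consumed;       /* stream offset of window[0] */
} boundary_scanner;

/* Feed one chunk from the transport. Returns the stream offset at which
 * the boundary starts, -1 if it has not been seen yet, or -2 if the chunk
 * does not fit in this sketch's fixed window. */
static long scan_chunk(boundary_scanner *s, const char *chunk, size_t n)
{
    if (n > sizeof s->window - s->wlen)
        return -2;
    memcpy(s->window + s->wlen, chunk, n);
    s->wlen += n;

    const char *hit = find_bytes(s->window, s->wlen, s->boundary, s->blen);
    if (hit)
        return (long)(s->consumed + (size_t)(hit - s->window));

    /* No match: keep only the tail that could still begin a boundary. */
    size_t keep = s->blen - 1 < s->wlen ? s->blen - 1 : s->wlen;
    memmove(s->window, s->window + s->wlen - keep, keep);
    s->consumed += s->wlen - keep;
    s->wlen = keep;
    return -1;
}
```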

IMHO, writing a partially parsed buffer to a file is not that efficient, as
we will have to parse it later to discover MIME boundaries and extract
attachments. Thus, I still believe that real-time buffering to a file
while parsing is the better choice. To implement this, we will have to
modify our mime_parser.c, and probably the data_handler implementation.

Or if not, am I misunderstanding $subject?

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

Regards,
Senaka

> Senaka Fernando wrote:
>>> Hi Manjula, Thilina and others,
>>>
>>> Yep, I think I'm exactly of the same viewpoint as Thilina when it comes
>>> to handling attachment data, at least for the chunking part. I think I
>>> didn't get Thilina right in his first e-mail.
>>>
>>> However, a file per MIME part may not always be optimal. I'd rather say
>>> each file should have a fixed maximum size, and if that is exceeded,
>>> perhaps you can divide it in two. Also, a user should always be given
>>> the option to choose between Thilina's method and this method through
>>> axis2.xml (or services.xml). Thus, a user can fine-tune memory use.
>>>
>>> When it comes to base64-encoded binary data, you can use a mechanism
>>> where the buffer size is always a multiple of 4, and then when you
>>> flush, you decode it and copy it to the file, so that should
>>> essentially be the same to a user when it comes to caching.
>>>
>>> OK, so Manjula, you mean when the MIME boundary appears partially in
>>> the first read and partially in the second?
>>>
>>> Well, this is probably the best solution.
>>>
>>> You allocate a buffer large enough to hold twice the size of a MIME
>>> boundary, and in your very first read you read 2 times the MIME
>>> boundary size, then search for the MIME boundary. Next you do a
>>> memmove() to move all the contents of the buffer, from the midpoint to
>>> the end, to the beginning of the buffer. After doing this, you read a
>>> size equivalent to half the buffer (which again is the size of the
>>> MIME boundary marker) and store it from the midpoint of the buffer to
>>> the end. Then you search again. You iterate this procedure until you
>>> read less than half the size of the buffer.
>>>
>>
>> If you are further interested in this mechanism, I used this approach
>> for resending binary data in TCPMon. You may check that also.
>>
>> Also, strstr() has issues when you have '\0' in the middle. Thus you
>> will have to use a temporary search marker and use that in the process.
>> Before calling strstr(), you check whether strlen(temp) is greater than
>> or equal to the length of the MIME boundary marker. If it is greater,
>> you only need to search once. If it is equal, you will need to search
>> exactly twice. If it is less, you increment temp by strlen(temp) and
>> repeat until you cross the midpoint. This makes the search more
>> efficient.
>>
>> If you want to make the search even more efficient, you can make the
>> buffer size one less than the size of the MIME boundary marker, so when
>> you get the equal scenario, you will have to search only once.
>>
>> The fact I've used here is that strstr() and strlen() behave the same in
>> a given implementation. On Windows, if strlen() is multibyte aware, so is
>> strstr(). So, no worries.
>>
>
> We have an efficient parsing mechanism already, tested and proven to
> work, with 1.3. Why on earth are we discussing this over and over again?
>
> Does caching get affected by the mime parser logic? IMHO no. They are
> two separate concerns, so why are we wasting time discussing parsing
> while the problem at hand is not parsing but caching?
>
> Writing the partially parsed buffer was a solution to caching. Do we
> have any other alternatives? If so, in short, what are they?
>
> Samisa...
>
>
>
>
>
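The base64 suggestion quoted above (decode on flush, working in groups of 4 characters) can be sketched like this. MTOM attachments themselves travel as raw binary MIME parts, so this would only matter where binary content arrives base64-encoded, for example inline in the envelope. The b64_* names are hypothetical, and rather than forcing the buffer size to a multiple of 4, this variant carries the 1 to 3 leftover characters into the next read, which has the same effect.

```c
#include <stdio.h>
#include <string.h>

/* Value of one base64 character, or -1 if it is not in the alphabet. */
static int b64_val(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode only complete 4-character groups. Returns the number of bytes
 * written to `out` (or -1 on bad input) and stores in *used how many input
 * characters were consumed, which is always a multiple of 4. */
static long b64_decode_groups(const char *in, size_t n,
                              unsigned char *out, size_t *used)
{
    size_t groups = n / 4;
    long w = 0;
    for (size_t g = 0; g < groups; g++) {
        const char *q = in + 4 * g;
        int a = b64_val(q[0]), b = b64_val(q[1]);
        int c = q[2] == '=' ? 0 : b64_val(q[2]);
        int d = q[3] == '=' ? 0 : b64_val(q[3]);
        if (a < 0 || b < 0 || c < 0 || d < 0)
            return -1;
        unsigned long v = ((unsigned long)a << 18) | ((unsigned long)b << 12)
                        | ((unsigned long)c << 6) | (unsigned long)d;
        out[w++] = (unsigned char)(v >> 16);
        if (q[2] != '=') out[w++] = (unsigned char)(v >> 8);
        if (q[3] != '=') out[w++] = (unsigned char)v;
    }
    *used = groups * 4;
    return w;
}

/* Streaming state: up to 3 characters left over from the previous read. */
typedef struct {
    char carry[3];
    size_t ncarry;
} b64_stream;

/* Append one chunk (at most 512 chars in this sketch), decode every
 * complete group, write the bytes to `f` and keep the remainder. */
static int b64_feed(b64_stream *s, const char *chunk, size_t n, FILE *f)
{
    char tmp[3 + 512];
    unsigned char bin[(3 + 512) / 4 * 3];
    size_t used;

    if (n > 512)
        return -1;
    memcpy(tmp, s->carry, s->ncarry);
    memcpy(tmp + s->ncarry, chunk, n);
    size_t total = s->ncarry + n;

    long w = b64_decode_groups(tmp, total, bin, &used);
    if (w < 0 || fwrite(bin, 1, (size_t)w, f) != (size_t)w)
        return -1;
    s->ncarry = total - used;
    memcpy(s->carry, tmp + used, s->ncarry);
    return 0;
}
```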



Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

Hi Samisa, Senaka and Thilina,

These are my view points on caching and how it can be done in Axis2/C.

For me, the main purpose of caching is to support attachments of any size,
limited only by the space available in the folder where the attachment is
going to be stored.

In the current implementation (Axis2/C 1.3), we read the whole message
and then parse it. Before 1.3 the same thing was done (I mean always
parsing full buffers, not partially read buffers), but in a very
inefficient manner.

Now, to implement caching we need to change the current logic. In simple
steps, these are the things we need to do:

1. First, parse the part containing the SOAP envelope.
2. Then read up to some threshold and search for the mime_boundary.
3. If it is not found, write one half of the buffer to the file, append
an equal amount of content from the stream, and parse again.

Step 3 is needed because, in the case of multiple attachments, MIME
boundaries can appear in the middle of the message.

We need the above step to fully support caching when there are multiple
attachments.
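Step 3 could look roughly like this in C. It is a sketch under stated assumptions, not the mime_parser.c logic: read_fn is a hypothetical transport callback, the buffer holds two halves of `half` bytes, and the delimiter (for example "\r\n--" followed by the boundary) is assumed to be no longer than one half. If no delimiter is found anywhere in the full buffer, none can start in the first half (it would have to run past the end of the buffer), so that half can safely be written to the file.

```c
#include <stdio.h>
#include <string.h>

typedef size_t (*read_fn)(void *ctx, char *buf, size_t n);

/* Naive binary-safe search (does not rely on '\0' terminators). */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/* Copy one part body from `rd` into `f`, stopping at `delim`. `buf` must
 * hold 2 * half bytes. Returns body bytes written, or -1 on error or if
 * the stream ends first. Bytes after the delimiter are left in buf; a
 * real parser would continue the next part from there. */
static long copy_part_to_file(read_fn rd, void *ctx, const char *delim,
                              size_t dlen, char *buf, size_t half, FILE *f)
{
    size_t len = 0, n;
    long written = 0;

    if (dlen == 0 || dlen > half)
        return -1;
    for (;;) {
        /* Top the buffer up to 2 * half bytes or until the stream ends. */
        while (len < 2 * half && (n = rd(ctx, buf + len, 2 * half - len)) > 0)
            len += n;

        const char *hit = find_bytes(buf, len, delim, dlen);
        if (hit) {
            size_t body = (size_t)(hit - buf);
            if (fwrite(buf, 1, body, f) != body)
                return -1;
            return written + (long)body;
        }
        if (len < 2 * half)
            return -1;                 /* stream ended, no delimiter */

        /* Flush the first half, slide the second half to the front. */
        if (fwrite(buf, 1, half, f) != half)
            return -1;
        written += (long)half;
        memmove(buf, buf + half, half);
        len = half;
    }
}

/* Demo reader over a string, returning at most 3 bytes per call. */
typedef struct { const char *s; size_t len, pos; } str_reader;

static size_t str_read(void *ctx, char *buf, size_t n)
{
    str_reader *r = ctx;
    size_t left = r->len - r->pos;
    if (n > left) n = left;
    if (n > 3) n = 3;
    memcpy(buf, r->s + r->pos, n);
    r->pos += n;
    return n;
}
```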

If we assume there is only one attachment, then we can write the whole
content to the file without parsing once it exceeds a certain threshold.
Otherwise we need step 3. So this will require changing the current logic.

What I described above is option 1.

Option 2 is, as I suggested previously, to keep the current logic and,
after parsing, write the attachment to a file if it exceeds the limit.
Please read the first mail in this thread for more on this. But this will
not achieve the main purpose of caching that I mentioned at the beginning.
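Option 2 could be sketched as below. part_handle is a hypothetical stand-in for a data_handler that holds either the bytes or, as proposed in the first mail of this thread, a file name; mkstemp() and fdopen() are POSIX, and the target directory (for example /tmp) is an assumption. The whole message is still buffered in memory before this runs, so it frees memory earlier but does not raise the maximum attachment size.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a data_handler: the part lives either in
 * memory (buf != NULL) or in the file named by file_name. */
typedef struct {
    unsigned char *buf;
    size_t len;
    char file_name[128];
} part_handle;

/* Runs after the whole message has been parsed in memory (the current
 * behavior). A part larger than `limit` is written to a new temp file in
 * `dir` and its buffer is released. Returns 0 on success, -1 on failure. */
static int offload_if_large(part_handle *h, size_t limit, const char *dir)
{
    if (h->len <= limit)
        return 0;                       /* small part: stays in memory */

    int n = snprintf(h->file_name, sizeof h->file_name,
                     "%s/att_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof h->file_name) {
        h->file_name[0] = '\0';
        return -1;
    }
    int fd = mkstemp(h->file_name);     /* creates a unique file */
    if (fd < 0) {
        h->file_name[0] = '\0';
        return -1;
    }
    FILE *f = fdopen(fd, "wb");
    if (!f) {
        close(fd);
        remove(h->file_name);
        h->file_name[0] = '\0';
        return -1;
    }
    size_t w = fwrite(h->buf, 1, h->len, f);
    if (fclose(f) != 0 || w != h->len) {
        remove(h->file_name);
        h->file_name[0] = '\0';
        return -1;
    }
    free(h->buf);                       /* only the file name is kept */
    h->buf = NULL;
    return 0;
}
```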

Thanks,
-Manjula.   

On Tue, 2008-03-18 at 03:47 +0530, Samisa Abeysinghe wrote:
> Senaka Fernando wrote:
> >> Manjula Peiris wrote:
> >>     
> >>> On Sun, 2008-03-16 at 16:26 +0530, Samisa Abeysinghe wrote:
> >>>
> >>>
> >>>       
> >>>> We have an efficient parsing mechanism already, tested and proven to
> >>>> work, with 1.3. Why on earth are we discussing this over and over
> >>>> again?
> >>>>
> >>>> Does caching get affected by the mime parser logic? IMHO no. They are
> >>>> two separate concerns, so why are we wasting time discussing parsing
> >>>> while the problem at hand is not parsing but caching?
> >>>>
> >>>>         
> >>> No, the current implementation starts parsing after reading the whole
> >>> stream. Because of that, the parsing is simple and efficient. And it
> >>> also works well for considerably large attachments (e.g. 100MB). The
> >>> only problem is that the attachment size will depend on the available
> >>> system memory.
> >>>
> >>>       
> >> Still, my argument on the separation of concerns on caching vs. parsing
> >> holds.
> >> It is a question about what takes precedence over the other. If the
> >> attachment is too large, we need to interleave the concepts, where you
> >> read a considerable amount that is ideal in size in terms of caching,
> >> parse it for MIME, and then cache it and move on.
> >>     
> >
> > Parsing will always be choice No. 1. We cache only if we can't handle it.
> >
> > However, the real issue is how are we going to implement "parse it for
> > MIME, and then cache it and move on". I still think that it is better to
> > stick to Thilina's viewpoint in having each attachment cached as a
> > separate file. And, each attachment should be cached, even if it is small
> > or large, when the content-length exceeds the threshold. This is because
> > many small attachments == one big attachment. Thus, I'm still on the
> > parse_1st->cache_1st->parse_2nd->cache_2nd->... approach. I don't think
> > that a cache all at once will give us desirable results.
> >   
> 
> I do not think you understand what I am talking about. Separate
> attachments do need to go to separate files. There is no question about
> that. The question here is not about multiple attachments. The
> question is about a "very large attachment".
> 
> Samisa...
> 


Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Senaka Fernando wrote:
>> Manjula Peiris wrote:
>>     
>>> On Sun, 2008-03-16 at 16:26 +0530, Samisa Abeysinghe wrote:
>>>
>>>
>>>       
>>>> We have an efficient parsing mechanism already, tested and proven to
>>>> work, with 1.3. Why on earth are we discussing this over and over
>>>> again?
>>>>
>>>> Does caching get affected by the mime parser logic? IMHO no. They are
>>>> two separate concerns, so why are we wasting time discussing parsing
>>>> while the problem at hand is not parsing but caching?
>>>>
>>>>         
>>> No, the current implementation starts parsing after reading the whole
>>> stream. Because of that, the parsing is simple and efficient, and it
>>> also works well for reasonably large attachments (e.g. 100MB). The only
>>> problem is that the maximum attachment size depends on the
>>> available system memory.
>>>
>>>       
>> Still, my argument on the separation of concerns on caching vs. parsing
>> holds.
>> It is a question about what takes precedence over the other. If the
>> attachment is too large, we need to interleave the concepts, where you
>> read a considerable amount that is ideal in size in terms of caching,
>> parse it for MIME, and then cache it and move on.
>>     
>
> Parsing will always be choice No. 1. We cache only if we can't handle it.
>
> However, the real issue is how we are going to implement "parse it for
> MIME, and then cache it and move on". I still think that it is better to
> stick to Thilina's viewpoint of caching each attachment as a
> separate file. And each attachment, whether small or large, should be
> cached when the content-length exceeds the threshold. This is because
> many small attachments == one big attachment. Thus, I'm still on the
> parse_1st->cache_1st->parse_2nd->cache_2nd->... approach. I don't think
> that caching everything at once will give us the desired results.
>   

I do not think you understand what I am talking about. Separate
attachments do need to go to separate files. There is no question about
that. The question here is not about multiple attachments. The
question is about a "very large attachment".

Samisa...




Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

> Manjula Peiris wrote:
>> On Sun, 2008-03-16 at 16:26 +0530, Samisa Abeysinghe wrote:
>>
>>
>>> We have an efficient parsing mechanism already, tested and proven to
>>> work, with 1.3. Why on earth are we discussing this over and over
>>> again?
>>>
>>> Does caching get affected by the mime parser logic? IMHO no. They are
>>> two separate concerns, so why are we wasting time discussing parsing
>>> while the problem at hand is not parsing but caching?
>>>
>>
>> No, the current implementation starts parsing after reading the whole
>> stream. Because of that, the parsing is simple and efficient, and it
>> also works well for reasonably large attachments (e.g. 100MB). The only
>> problem is that the maximum attachment size depends on the
>> available system memory.
>>
>
> Still, my argument on the separation of concerns on caching vs. parsing
> holds.
> It is a question about what takes precedence over the other. If the
> attachment is too large, we need to interleave the concepts, where you
> read a considerable amount that is ideal in size in terms of caching,
> parse it for MIME, and then cache it and move on.

Parsing will always be choice No. 1. We cache only if we can't handle it.

However, the real issue is how we are going to implement "parse it for
MIME, and then cache it and move on". I still think that it is better to
stick to Thilina's viewpoint of caching each attachment as a
separate file. And each attachment, whether small or large, should be
cached when the content-length exceeds the threshold. This is because
many small attachments == one big attachment. Thus, I'm still on the
parse_1st->cache_1st->parse_2nd->cache_2nd->... approach. I don't think
that caching everything at once will give us the desired results.

>>
>>> Writing the partially parsed buffer was a solution to caching. Do we
>>> have any other alternatives? If so, in short, what are they?
>>>
>>
>> We can keep the current implementation and write the attachment to a
>> file when it exceeds a certain threshold. This happens inside
>> mime_parser, i.e. at the transport level. So we are not keeping the
>> whole binary inside the om_tree during the invocation of handlers, and
>> the receiver (which may be the actual service or client) can access the
>> file straight away. Even though this approach still limits the
>> attachment size we can handle to the available system memory, I think
>> it has the added advantage of not keeping the attachment in memory once
>> it has been parsed.
>>
>
> How does this compare with what I proposed above in this reply?
>
> Samisa...
>>
>>> Samisa...
>
>
> --
> Samisa Abeysinghe
> Software Architect; WSO2 Inc.
>
> http://www.wso2.com/ - "Oxygenating the Web Service Platform."

Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Manjula Peiris wrote:
> On Sun, 2008-03-16 at 16:26 +0530, Samisa Abeysinghe wrote:
>
>   
>> We have an efficient parsing mechanism already, tested and proven to 
>> work, with 1.3. Why on earth are we discussing this over and over again?
>>
>> Does caching get affected by the mime parser logic? IMHO no. They are
>> two separate concerns, so why are we wasting time discussing parsing
>> while the problem at hand is not parsing but caching?
>>     
>
> No, the current implementation starts parsing after reading the whole
> stream. Because of that, the parsing is simple and efficient, and it
> also works well for reasonably large attachments (e.g. 100MB). The only
> problem is that the maximum attachment size depends on the
> available system memory.
>   

Still, my argument on the separation of concerns on caching vs. parsing
holds.
It is a question about what takes precedence over the other. If the
attachment is too large, we need to interleave the concepts, where you
read a considerable amount that is ideal in size in terms of caching,
parse it for MIME, and then cache it and move on.
>   
>> Writing the partially parsed buffer was a solution to caching. Do we
>> have any other alternatives? If so, in short, what are they?
>>     
>
> We can keep the current implementation and write the attachment to a
> file when it exceeds a certain threshold. This happens inside
> mime_parser, i.e. at the transport level. So we are not keeping the
> whole binary inside the om_tree during the invocation of handlers, and
> the receiver (which may be the actual service or client) can access the
> file straight away. Even though this approach still limits the
> attachment size we can handle to the available system memory, I think
> it has the added advantage of not keeping the attachment in memory once
> it has been parsed.
>   

How does this compare with what I proposed above in this reply?

Samisa...
>   
>> Samisa...


-- 
Samisa Abeysinghe
Software Architect; WSO2 Inc.

http://www.wso2.com/ - "Oxygenating the Web Service Platform."




Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Sun, 2008-03-16 at 16:26 +0530, Samisa Abeysinghe wrote:

> We have an efficient parsing mechanism already, tested and proven to 
> work, with 1.3. Why on earth are we discussing this over and over again?
> 
> Does caching get affected by the mime parser logic? IMHO no. They are 
> two separate concerns, so why are we wasting time discussing parsing 
> while the problem at hand is not parsing but caching?

No, the current implementation starts parsing after reading the whole
stream. Because of that, the parsing is simple and efficient, and it also
works well for reasonably large attachments (e.g. 100MB). The only problem
is that the maximum attachment size depends on the available system
memory.

> Writing the partially parsed buffer was a solution to caching. Do we 
> have any other alternatives? If so, in short, what are they?

We can keep the current implementation and write the attachment to a file
when it exceeds a certain threshold. This happens inside mime_parser, i.e.
at the transport level. So we are not keeping the whole binary inside the
om_tree during the invocation of handlers, and the receiver (which may be
the actual service or client) can access the file straight away. Even
though this approach still limits the attachment size we can handle to the
available system memory, I think it has the added advantage of not keeping
the attachment in memory once it has been parsed.
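A minimal C sketch of the threshold idea, in the spirit of the Axis2/Java approach described in this thread: bytes of one already-parsed MIME part body stay in a memory buffer until they would exceed the threshold; then the buffer is pumped into a temp file and the rest of the part goes straight to that file. The `part_sink` type, its functions and the `/tmp/axis2c_attXXXXXX` template are hypothetical, not Axis2/C API; `mkstemp()` and `fdopen()` are POSIX. The resulting `path` is what the proposal would keep in the data_handler instead of the buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sink for one MIME part body (not an Axis2/C API):
   memory first, temp file once the threshold would be exceeded. */
typedef struct {
    size_t threshold;     /* max bytes kept in memory */
    unsigned char *mem;   /* in-memory buffer, capacity == threshold */
    size_t mem_len;
    FILE *file;           /* non-NULL once the part has spilled to disk */
    char path[64];        /* temp file name, e.g. for the data_handler */
} part_sink;

static int part_sink_init(part_sink *s, size_t threshold)
{
    memset(s, 0, sizeof *s);
    s->threshold = threshold;
    s->mem = malloc(threshold > 0 ? threshold : 1);
    return s->mem ? 0 : -1;
}

/* Create the temp file and pump the buffered bytes into it. */
static int spill_to_file(part_sink *s)
{
    strcpy(s->path, "/tmp/axis2c_attXXXXXX");   /* hypothetical location */
    int fd = mkstemp(s->path);
    if (fd < 0)
        return -1;
    s->file = fdopen(fd, "wb");
    if (!s->file) {
        close(fd);
        return -1;
    }
    if (fwrite(s->mem, 1, s->mem_len, s->file) != s->mem_len)
        return -1;
    free(s->mem);
    s->mem = NULL;
    s->mem_len = 0;
    return 0;
}

/* Append parsed body bytes (MIME boundary markers already stripped). */
static int part_sink_write(part_sink *s, const unsigned char *data, size_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    if (len == 0)
        return 0;
    if (!s->file && s->mem_len + len > s->threshold && spill_to_file(s) != 0)
        return -1;
    if (s->file)
        return fwrite(data, 1, len, s->file) == len ? 0 : -1;
    memcpy(s->mem + s->mem_len, data, len);
    s->mem_len += len;
    return 0;
}

/* Returns 1 if the part is on disk (s->path), 0 if still in memory, -1 on error. */
static int part_sink_finish(part_sink *s)
{
    if (!s->file)
        return 0;
    int rc = fclose(s->file);
    s->file = NULL;
    return rc == 0 ? 1 : -1;
}
```

After `part_sink_finish()` returns 1 the data lives in `path` (the caller eventually `unlink()`s it); after 0 the caller owns and frees `mem`.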


Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Senaka Fernando wrote:
>> Hi Manjula, Thilina and others,
>>
>> Yep, I think I'm of exactly the same viewpoint as Thilina when it comes
>> to handling attachment data, at least for the chunking part. I think I
>> didn't get Thilina right in his first e-mail.
>>
>> However, one file per MIME part may not always be optimal. I'd rather
>> say each file should have a fixed maximum size, and if that is exceeded,
>> perhaps you can divide it into two. Also, a user should always be given
>> the option to choose between Thilina's method and this method through
>> axis2.xml (or services.xml). Thus, a user can fine-tune memory use.
>>
>> When it comes to base64-encoded binary data, you can use a mechanism
>> where the buffer size is always a multiple of 4, and then when you
>> flush, you decode it and copy it to the file, so caching should
>> essentially look the same to a user.
>>
>> OK, so Manjula, you mean when the MIME boundary appears partially in the
>> first read and partially in the second?
>>
>> Well, this is probably the best solution.
>>
>> You will allocate a buffer twice the size of the MIME boundary marker,
>> and in your very first read you will read 2 times the boundary length;
>> then you will search for the MIME boundary. Next you will do a memmove()
>> to move the contents of the buffer from the midpoint to the end over to
>> the beginning of the buffer. After doing this, you will read an amount
>> equal to half the buffer (which again is the size of the MIME boundary
>> marker) and store it from the midpoint of the buffer to the end. Then
>> you will search again. You will iterate this procedure until you read
>> less than half the size of the buffer.
>>     
>
> If you are interested in this mechanism, I used the same approach for
> resending binary data in TCPMon. You may check that too.
>
> Also, strstr() has problems when there is a '\0' in the middle of the
> buffer. Thus you will have to use a temporary search marker and use that
> in the process. Before calling strstr(), check whether strlen(temp) is
> greater than or equal to the length of the MIME boundary marker. If it is
> greater, you only need to search once. If it is equal, you will need to
> search exactly twice. If it is less, increment temp by strlen(temp) + 1
> (to step past the '\0') and repeat until you cross the midpoint. This
> makes the search even more efficient.
>
> If you want to make the search even more efficient, you can make the
> buffer size one less than the size of the MIME boundary marker, so when
> you get the equal scenario, you will have to search only once.
>
> The fact I've used here is that strstr() and strlen() behave the same in
> a given implementation. On Windows, if strlen() is multibyte aware, so is
> strstr(). So, no worries.
>   

We have an efficient parsing mechanism already, tested and proven to 
work, with 1.3. Why on earth are we discussing this over and over again?

Does caching get affected by the mime parser logic? IMHO no. They are 
two separate concerns, so why are we wasting time discussing parsing 
while the problem at hand is not parsing but caching?

Writing the partially parsed buffer was a solution to caching. Do we 
have any other alternatives? If so, in short, what are they?

Samisa...




Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

> Hi Manjula, Thilina and others,
>
> Yep, I think I'm of exactly the same viewpoint as Thilina when it comes
> to handling attachment data, at least for the chunking part. I think I
> didn't get Thilina right in his first e-mail.
>
> However, one file per MIME part may not always be optimal. I'd rather
> say each file should have a fixed maximum size, and if that is exceeded,
> perhaps you can divide it into two. Also, a user should always be given
> the option to choose between Thilina's method and this method through
> axis2.xml (or services.xml). Thus, a user can fine-tune memory use.
>
> When it comes to base64-encoded binary data, you can use a mechanism
> where the buffer size is always a multiple of 4, and then when you
> flush, you decode it and copy it to the file, so caching should
> essentially look the same to a user.
>
> OK, so Manjula, you mean when the MIME boundary appears partially in the
> first read and partially in the second?
>
> Well, this is probably the best solution.
>
> You will allocate a buffer twice the size of the MIME boundary marker,
> and in your very first read you will read 2 times the boundary length;
> then you will search for the MIME boundary. Next you will do a memmove()
> to move the contents of the buffer from the midpoint to the end over to
> the beginning of the buffer. After doing this, you will read an amount
> equal to half the buffer (which again is the size of the MIME boundary
> marker) and store it from the midpoint of the buffer to the end. Then
> you will search again. You will iterate this procedure until you read
> less than half the size of the buffer.

If you are interested in this mechanism, I used the same approach for
resending binary data in TCPMon. You may check that too.

Also, strstr() has problems when there is a '\0' in the middle of the
buffer. Thus you will have to use a temporary search marker and use that
in the process. Before calling strstr(), check whether strlen(temp) is
greater than or equal to the length of the MIME boundary marker. If it is
greater, you only need to search once. If it is equal, you will need to
search exactly twice. If it is less, increment temp by strlen(temp) + 1
(to step past the '\0') and repeat until you cross the midpoint. This
makes the search even more efficient.

If you want to make the search even more efficient, you can make the
buffer size one less than the size of the MIME boundary marker, so when
you get the equal scenario, you will have to search only once.

The fact I've used here is that strstr() and strlen() behave the same in a
given implementation. On Windows, if strlen() is multibyte aware, so is
strstr(). So, no worries.
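A small C sketch of the '\0'-safe strstr()/strlen() search described above. It assumes the buffer has one spare byte holding a terminating '\0' and that the boundary contains no NUL bytes, so a match can never straddle one. `find_boundary_nul_safe` is a hypothetical name. Note that after each segment the marker has to move past the '\0' (strlen(temp) + 1); otherwise strlen() keeps returning 0.

```c
#include <assert.h>
#include <string.h>

/* Searches buf[0..len) for boundary, hopping over embedded '\0' bytes.
   Precondition: buf[len] == '\0' (one spare byte after the data). */
static const char *find_boundary_nul_safe(const char *buf, size_t len,
                                          const char *boundary)
{
    size_t blen = strlen(boundary);
    const char *temp = buf;          /* temporary search marker */
    const char *end = buf + len;
    assert(buf[len] == '\0' && blen > 0);
    while (temp < end) {
        size_t seg = strlen(temp);   /* stops at the next '\0' */
        if (seg >= blen) {           /* only long segments can hold a match */
            const char *hit = strstr(temp, boundary);
            if (hit)
                return hit;
        }
        temp += seg + 1;             /* step over the segment and its '\0' */
    }
    return NULL;
}
```

A plain strstr() on the same buffer stops at the first '\0', which is exactly the failure this hopping avoids.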

Regards,
Senaka


Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

Hi Manjula, Thilina and others,

Yep, I think I'm of exactly the same viewpoint as Thilina when it comes
to handling attachment data, at least for the chunking part. I think I
didn't get Thilina right in his first e-mail.

However, one file per MIME part may not always be optimal. I'd rather
say each file should have a fixed maximum size, and if that is exceeded,
perhaps you can divide it into two. Also, a user should always be given
the option to choose between Thilina's method and this method through
axis2.xml (or services.xml). Thus, a user can fine-tune memory use.
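The fixed-maximum-size alternative to one file per MIME part might be sketched in C as a writer that rolls over to a new file whenever the current one is full. The `split_writer` type and the `prefix.N` naming are hypothetical. How `max_size` would be read from axis2.xml or services.xml is left out, since no parameter name is given in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical writer that caches one part as prefix.0, prefix.1, ...,
   each holding at most max_size bytes. */
typedef struct {
    const char *prefix;
    size_t max_size;
    FILE *cur;         /* file currently being filled */
    size_t cur_size;   /* bytes written to cur so far */
    unsigned nfiles;   /* number of files created */
} split_writer;

/* Close the current file (if any) and open prefix.<nfiles>. */
static int open_next(split_writer *w)
{
    char name[256];
    if (w->cur) {
        int rc = fclose(w->cur);
        w->cur = NULL;
        if (rc != 0)
            return -1;
    }
    snprintf(name, sizeof name, "%s.%u", w->prefix, w->nfiles);
    w->cur = fopen(name, "wb");
    if (!w->cur)
        return -1;
    w->nfiles++;
    w->cur_size = 0;
    return 0;
}

static int split_writer_write(split_writer *w, const unsigned char *data, size_t len)
{
    assert(w->max_size > 0);
    while (len > 0) {
        if (!w->cur || w->cur_size == w->max_size) {
            if (open_next(w) != 0)
                return -1;
        }
        size_t room = w->max_size - w->cur_size;
        size_t n = len < room ? len : room;
        if (fwrite(data, 1, n, w->cur) != n)
            return -1;
        w->cur_size += n;
        data += n;
        len -= n;
    }
    return 0;
}

static int split_writer_close(split_writer *w)
{
    int rc = w->cur ? fclose(w->cur) : 0;
    w->cur = NULL;
    return rc == 0 ? 0 : -1;
}
```

A new file is opened only when there is more data to write, so a part whose size is an exact multiple of `max_size` leaves no empty trailing file.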

When it comes to base64-encoded binary data, you can use a mechanism where
the buffer size is always a multiple of 4, and then when you flush, you
decode it and copy it to the file, so caching should essentially look the
same to a user.
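The multiple-of-4 rule works because base64 decodes in 4-character quanta. The C sketch below swaps in a slightly more forgiving variant: instead of relying on every read being a multiple of 4 characters, it decodes only whole quanta and carries the (at most 3) leftover characters into the next chunk. It assumes the standard RFC 4648 alphabet and skips CR/LF and other non-alphabet characters. `b64_sink` and its functions are hypothetical names, and the decoded bytes go to any `FILE *` cache file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int b64_val(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one 4-char quantum; returns the number of bytes (1..3) or -1. */
static int decode_quantum(const char q[4], unsigned char out[3])
{
    int v[4];
    for (int k = 0; k < 4; k++)   /* '=' padding is only valid in slots 2 and 3 */
        v[k] = (q[k] == '=' && k >= 2) ? 0 : b64_val((unsigned char)q[k]);
    if (v[0] < 0 || v[1] < 0 || v[2] < 0 || v[3] < 0)
        return -1;
    out[0] = (unsigned char)((v[0] << 2) | (v[1] >> 4));
    if (q[2] == '=') return 1;
    out[1] = (unsigned char)(((v[1] & 0x0F) << 4) | (v[2] >> 2));
    if (q[3] == '=') return 2;
    out[2] = (unsigned char)(((v[2] & 0x03) << 6) | v[3]);
    return 3;
}

typedef struct {
    char carry[4];   /* incomplete quantum left over from the last chunk */
    size_t ncarry;
    FILE *out;       /* cache file receiving decoded bytes */
} b64_sink;

/* Feed an arbitrary-sized chunk; only whole quanta are decoded and written. */
static int b64_sink_write(b64_sink *s, const char *chunk, size_t len)
{
    assert(s != NULL && s->out != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (c != '=' && b64_val((unsigned char)c) < 0)
            continue;                   /* skip CRLF line breaks etc. */
        s->carry[s->ncarry++] = c;
        if (s->ncarry == 4) {
            unsigned char bytes[3];
            int n = decode_quantum(s->carry, bytes);
            if (n < 0 || fwrite(bytes, 1, (size_t)n, s->out) != (size_t)n)
                return -1;
            s->ncarry = 0;
        }
    }
    return 0;
}
```

When the part ends, a non-zero `ncarry` means the encoded data was truncated.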

OK, so Manjula, you mean when the MIME boundary appears partially in the
first read and partially in the second?

Well, this is probably the best solution.

You will allocate a buffer twice the size of the MIME boundary marker,
and in your very first read you will read 2 times the boundary length;
then you will search for the MIME boundary. Next you will do a memmove()
to move the contents of the buffer from the midpoint to the end over to
the beginning of the buffer. After doing this, you will read an amount
equal to half the buffer (which again is the size of the MIME boundary
marker) and store it from the midpoint of the buffer to the end. Then you
will search again. You will iterate this procedure until you read less
than half the size of the buffer.
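The double-buffer scheme can be sketched in C as follows. This sketch makes a few assumptions. The stream is abstracted as a hypothetical `read_fn` callback that returns fewer bytes than requested only at end of stream (a real socket reader would need a fill loop). The function only locates the first boundary. The in-window search uses memchr()/memcmp() rather than strstr(), so embedded '\0' bytes are harmless. Consecutive windows overlap by one boundary length, so a boundary split across two reads is still seen whole in one window.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader: copies up to n bytes into buf, returns bytes read. */
typedef size_t (*read_fn)(void *ctx, unsigned char *buf, size_t n);

/* Binary-safe search for needle in hay; returns offset or -1. */
static long find_bytes(const unsigned char *hay, size_t hay_len,
                       const unsigned char *needle, size_t needle_len)
{
    const unsigned char *p = hay, *last;
    if (needle_len == 0 || hay_len < needle_len)
        return -1;
    last = hay + (hay_len - needle_len);
    while (p <= last) {
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (!p)
            return -1;
        if (memcmp(p, needle, needle_len) == 0)
            return (long)(p - hay);
        p++;
    }
    return -1;
}

/* Returns the stream offset of the first boundary, or -1 if none. */
static long find_boundary_in_stream(read_fn rd, void *ctx,
                                    const unsigned char *boundary, size_t b)
{
    long base = 0, result = -1;      /* base = stream offset of win[0] */
    assert(b > 0);
    unsigned char *win = malloc(2 * b);   /* window = 2 x boundary length */
    if (!win)
        return -1;
    size_t have = rd(ctx, win, 2 * b);    /* first read fills the window */
    for (;;) {
        long hit = find_bytes(win, have, boundary, b);
        if (hit >= 0) {
            result = base + hit;
            break;
        }
        if (have < 2 * b)                 /* short read: stream exhausted */
            break;
        memmove(win, win + b, b);         /* second half becomes first half */
        base += (long)b;
        size_t got = rd(ctx, win + b, b); /* refill the second half */
        if (got == 0)
            break;
        have = b + got;
    }
    free(win);
    return result;
}

/* In-memory reader used to exercise the sketch. */
struct mem_src { const unsigned char *data; size_t len, pos; };

static size_t mem_read(void *ctx, unsigned char *buf, size_t n)
{
    struct mem_src *s = ctx;
    size_t left = s->len - s->pos;
    if (n > left)
        n = left;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

Returning on the first hit is safe: any boundary starting before the current window was already fully contained in the previous window and would have been found there.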

HTH,
Regards,
Senaka

>
> On Sat, 2008-03-15 at 16:03 +0530, Senaka Fernando wrote:
>> Hi Manjula,
>>
>> Please read my reply inline.
>>
>> > Hi Senaka,
>> >
>> > I am confused here. I think you are taking the discussion back to the
>> > beginning, because on the receiving side we read till the end of the
>> > stream. Please see my first mail.
>>
>> No, I'm not taking the discussion back to the starting point. I'm rather
>> proposing an alternative implementation. According to what I mention
>> here, we will still read till the end of the stream. But we will not
>> buffer everything we read into memory. We will flush the buffer to a
>> file once it exceeds a threshold. However, when we read beyond the
>> buffer size, we will not directly copy the entire content to the file
>> without parsing it. Instead, we will use our fixed-size buffer to
>> temporarily store the content before it is flushed, and then parse it
>> and write it to the file. Thus, the file will contain only the binary
>> part. It will not contain the "--MIMEBoundary" statements etc. These,
>> along with the file name(s), can be stored in the parsed attachment
>> object that is created. Thus, the memory consumption will be limited to
>> the size of the fixed buffer and we will use the file for storage. This
>> mechanism gives us the added plus of not having to worry about
>> re-parsing what is written to the file, as it has already been parsed
>> once. Please note that MIME parsing DOES NOT require us to store the
>> entire content in memory.
>
> For me this is same as what Thilina is saying. So again I need to ask
> the question what happened when the mime boundary is divided between two
> reads.
>
>>
>> >
>> > When sending writing part by part to the stream is same as chunking.
>> > Because when sending either you should specify a content-length or
>> > specified it as chunked.
>>
>> No, it is not the same as chunking. What I meant here is that you need not
>> read the entire content at once to memory and write to the stream in a
>> single step. Rather we can read part by part and write it to the stream
>> and repeat the process until the whole large file is written. In here you
>> will still be using the Content Length. Chunking is a whole different
>> story where you can transmit data as blocks. Using chunking we can send an
>> arbitrary length of data of which the length is not pre-calculated. Now
>> you might wonder how do we calculate the content-length without reading
>> the entire content to the memory. Well, you can seek through the file and
>> find out the size of the content to be written. Add to it the standard
>> header block and MIME boundary demarcation string lengths and you will get
>> the Content Length. This is a not at all expensive operation as the file
>> seek will be scanning the file as a block without reading it to memory.
>> The OS will manage it's efficiency.
>
> Here also I think it is same as what Thilina is saying.
>
> Thanks,
> -Manjula.
>
>>
>> >
>> > -Manjula.
>>
>> Regards,
>> Senaka
>>

Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Sat, 2008-03-15 at 16:03 +0530, Senaka Fernando wrote:
> Hi Manjula,
> 
> Please read my reply inline.
> 
> > Hi Senaka,
> >
> > I am confused here. I think you are taking the discussion to the
> > beginning. Because in the receiving side we read till the end of the
> > stream. Please see my first mail.
> 
> No I'm not taking the discussion to the starting point. I'm rather
> proposing an alternative implementation. According to what I mention here,
> we will rather still read till the end of the stream. But, we will not
> buffer everything we read into memory. We will flush the buffer to a file
> once it exceeds a threshold. However, when we read beyond the buffer size,
> we will not directly copy the entire content to file without parsing it.
> Instead we will use our fixed-sized buffer to temporarily store the
> content before being flushed and then parse it and write it to file. Thus,
> the file will contain only the binary part. It will not contain the
> "--MIMEBoundary" statements etc. These, along with the file name(s) can be
> stored into the parsed attachment object created. Thus, the memory
> consumption will be limited to the size of the fixed buffer and we will
> use the file for storage. This mechanism gives us the added plus of not
> having to worry about re-parsing what is written to file as it has already
> being parsed once. Please note that MIME parsing DOES NOT require us to
> store the entire content in memory.

To me this is the same as what Thilina is saying. So again I need to ask:
what happens when the MIME boundary is divided between two reads?

> 
> >
> > When sending writing part by part to the stream is same as chunking.
> > Because when sending either you should specify a content-length or
> > specified it as chunked.
> 
> No, it is not the same as chunking. What I meant here is that you need not
> read the entire content at once to memory and write to the stream in a
> single step. Rather we can read part by part and write it to the stream
> and repeat the process until the whole large file is written. In here you
> will still be using the Content Length. Chunking is a whole different
> story where you can transmit data as blocks. Using chunking we can send an
> arbitrary length of data of which the length is not pre-calculated. Now
> you might wonder how do we calculate the content-length without reading
> the entire content to the memory. Well, you can seek through the file and
> find out the size of the content to be written. Add to it the standard
> header block and MIME boundary demarcation string lengths and you will get
> the Content Length. This is a not at all expensive operation as the file
> seek will be scanning the file as a block without reading it to memory.
> The OS will manage it's efficiency.

Here too, I think this is the same as what Thilina is saying.

Thanks,
-Manjula.

> 
> >
> > -Manjula.
> 
> Regards,
> Senaka

Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

Hi Manjula,

Please read my reply inline.

> Hi Senaka,
>
> I am confused here. I think you are taking the discussion to the
> beginning. Because in the receiving side we read till the end of the
> stream. Please see my first mail.

No, I'm not taking the discussion back to the starting point. Rather, I'm
proposing an alternative implementation. In what I describe here, we will
still read till the end of the stream, but we will not buffer everything
we read in memory. We will flush the buffer to a file once it exceeds a
threshold. However, when we read beyond the buffer size, we will not copy
the content straight to the file without parsing it. Instead, we will use
our fixed-size buffer to hold the content temporarily, parse it, and then
write it to the file. Thus the file will contain only the binary part; it
will not contain the "--MIMEBoundary" lines etc. These, along with the
file name(s), can be stored in the parsed attachment object that is
created. The memory consumption is therefore limited to the size of the
fixed buffer, and the file is used for storage. This mechanism has the
added benefit that we don't have to re-parse what is written to the file,
since it has already been parsed once. Please note that MIME parsing DOES
NOT require us to store the entire content in memory.
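To make the threshold idea concrete, here is a rough C sketch, assuming the data passed in is attachment content that the parser has already separated from the MIME boundary lines. Bytes stay in a fixed in-memory buffer until the next write would exceed `threshold`; at that point a temporary file is created, the buffered bytes are pumped into it, and every later write goes straight to the file. `attachment_sink` and its functions are hypothetical. `tmpfile()` gives an unnamed file; if the file name has to be stored in the data_handler, something like POSIX `mkstemp()` would be needed instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attachment sink: data stays in memory until it would exceed
 * `threshold`, after which everything is moved to a temporary file. */
typedef struct {
    unsigned char *mem;   /* in-memory buffer, capacity == threshold */
    size_t used;          /* bytes currently held in `mem` */
    size_t threshold;
    FILE *file;           /* non-NULL once the attachment has spilled to disk */
    size_t total;         /* total attachment bytes received so far */
} attachment_sink;

static int sink_init(attachment_sink *s, size_t threshold)
{
    s->mem = malloc(threshold > 0 ? threshold : 1);
    s->used = 0;
    s->threshold = threshold;
    s->file = NULL;
    s->total = 0;
    return s->mem != NULL ? 0 : -1;
}

/* Append already-parsed binary data. Returns 0 on success, -1 on I/O error. */
static int sink_write(attachment_sink *s, const void *data, size_t len)
{
    assert(s != NULL);
    if (s->file == NULL && s->used + len <= s->threshold) {
        memcpy(s->mem + s->used, data, len);     /* still under the threshold */
        s->used += len;
    } else {
        if (s->file == NULL) {
            /* Threshold exceeded: create the temp file and pump the buffer into it. */
            s->file = tmpfile();
            if (s->file == NULL)
                return -1;
            if (fwrite(s->mem, 1, s->used, s->file) != s->used)
                return -1;
            s->used = 0;
        }
        if (fwrite(data, 1, len, s->file) != len) /* rest goes straight to disk */
            return -1;
    }
    s->total += len;
    return 0;
}

static void sink_free(attachment_sink *s)
{
    free(s->mem);
    if (s->file != NULL)
        fclose(s->file);
}
```

Memory use is bounded by `threshold` no matter how large the attachment is; only attachments that fit under it stay entirely in memory.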

>
> When sending writing part by part to the stream is same as chunking.
> Because when sending either you should specify a content-length or
> specified it as chunked.

No, it is not the same as chunking. What I meant is that you need not read
the entire content into memory at once and write it to the stream in a
single step. Rather, we can read part by part, write each part to the
stream, and repeat the process until the whole large file is written. Here
you will still be using the Content-Length. Chunking is a whole different
story, where the data is transmitted as a series of blocks; with chunking
we can send an arbitrary amount of data whose length is not calculated in
advance. Now you might wonder how we calculate the Content-Length without
reading the entire content into memory. Well, you can seek to the end of
the file and find out the size of the content to be written. Add to it the
lengths of the standard header block and the MIME boundary demarcation
strings, and you get the Content-Length. This is not at all an expensive
operation, as seeking does not read the file into memory; the OS handles
it efficiently.
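A C sketch of the out path described above, assuming `before` holds everything written ahead of the attachment bytes (MIME headers, the root SOAP part, the boundary line) and `after` holds the closing boundary. The Content-Length is computed from the file size plus those string lengths, and the message is then written through one reusable 4096-byte buffer, so no chunked encoding is needed. The `FILE *out` stands in for the transport stream, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Size of an open file found by seeking to its end (fine on POSIX systems;
 * ISO C does not guarantee SEEK_END for binary streams). Returns -1 on error.
 * Leaves the file positioned at its start. */
static long file_size(FILE *f)
{
    long size;
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    if (fseek(f, 0, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Content-Length = bytes before the attachment + attachment bytes + bytes after it. */
static long mime_content_length(const char *before, FILE *attachment, const char *after)
{
    long size = file_size(attachment);
    if (size < 0)
        return -1;
    return (long)strlen(before) + size + (long)strlen(after);
}

/* Write the message part by part through one reusable buffer; the whole
 * attachment is never held in memory. Returns the number of bytes written. */
static long send_mime(FILE *out, const char *before, FILE *attachment, const char *after)
{
    unsigned char buf[4096];
    size_t n;
    long sent = 0;
    assert(out != NULL && attachment != NULL);
    sent += (long)fwrite(before, 1, strlen(before), out);
    while ((n = fread(buf, 1, sizeof buf, attachment)) > 0)
        sent += (long)fwrite(buf, 1, n, out);
    sent += (long)fwrite(after, 1, strlen(after), out);
    return sent;
}
```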

>
> -Manjula.

Regards,
Senaka

>
> On Sat, 2008-03-15 at 13:39 +0530, Senaka Fernando wrote:
>> >>>  BTW, this whole discussion is about in path, that is reading an
>> >>>  incomming message. How about the out path? We have the same
>> >>>  problems when sending attachments. Right now, we read the whole
>> >>>  file into memory and then only we send over the wire.
>> >> hmm... Why not write it in chunks.. Read a chunk from the file, then
>> >> write it to the outstream.. Use size of the file for content-type
>> >> calculation in case of non-chunking.. But mostly people will use
>> >> chunking when using MTOM..
>> >
>> > No, chunking is not required. You also don't need to write the entire
>> > data to be sent, to the stream at once. Because any HTTP Receiver will
>> > pull from the stream until it sees a valid ending character sequence.
>>
>> It should rather read a length equal to content length. And the
>> terminating sequence is for headers. Sorry for the confusion. Therefore,
>> the HTTP Receiver will pull from the stream until it reads a content
>> length or until an error occurs.
>>
>> >
>> > I believe that you should be able to write part by part to the stream,
>> > and send it, then reuse the buffer and write part 2, and send and so
>> > on. This argument can be justified, because on the receiving end, we
>> > must read the multi-part data until we encounter the mime boundary,
>> > unlike an ordinary payload where it can be terminated by a valid
>> > terminating character
>>
>> Same here. We'll be reading a length equal to content length.
>>
>> > sequence. We'll only have issues if we are to write large soap
>> > payloads which of course can be dealt with once we've implemented
>> > Session in Axis2/C.
>> >
>> > Regards,
>> > Senaka
>> >
>> >>
>> >> thanks,
>> >> Thilina
>> >>
>> >>
>> >>>
>> >>>  Samisa...
>> >>>
>> >>>
>> >>>
>> >>>  > Regards,
>> >>>  > Senaka
>> >>>  >
>> >>>  >
>> >>>  >> Hi,
>> >>>  >>
>> >>>  >>>  > In Axis2/Java case we do write the attachment content
>> directly
>> >>> from
>> >>>  >>>  > the InputStream to the File when the attachment size is
>> larger
>> >>> than
>> >>>  >>>  > the threshold.  This avoids loading the whole attachment to
>> the
>> >>>  >>> memory
>> >>>  >>>  > at all.
>> >>>  >>>
>> >>>  >>>  In this case to find out the attachment size don't you need to
>> do
>> >>> any
>> >>>  >>>  mime parsing? How do you find the attachment size with out
>> >>> searching
>> >>>  >>> for
>> >>>  >>>  the mime boundaries ?
>> >>>  >>>
>> >>>  >> Yes.. MIME is a boundary based packaging mechanism and you does
>> not
>> >>>  >> need to specify the length for each of the parts...Even the HTTP
>> >>>  >> content length is not there if the message is chunked.
>> >>>  >>
>> >>>  >> What we did in Axis2/Java to overcome this is to read the data
>> to a
>> >>>  >> byte[] buffer of up to a certain size (the size threshold). If
>> >>> there
>> >>>  >> are more data available in the mime part (if we have not
>> >>> encountered
>> >>>  >> the boundary yet) then we know this attachment is bigger than
>> the
>> >>>  >> threshold. So we create the temp file, pump the content in the
>> >>> buffer
>> >>>  >> to the file, then pump the rest of the stream to the file.. In
>> this
>> >>>  >> way we do not need to know the size of the attachment upfront..
>> BTW
>> >>> we
>> >>>  >> do all of the above while we are parsing the MIME message at the
>> >>> MIME
>> >>>  >> parser level..
>> >>>  >>
>> >>>  >>
>> >>>  >>>  > This has the plus point that the attachment size will be
>> >>>  >>>  > limited only by the available free space in the Temp
>> >>> Directory..
>> >>>  >>>  > Will that be possible in Axis2/C.. Or is that wat you have
>> in
>> >>> mind
>> >>>  >>> :)..
>> >>>  >>>
>> >>>  >>>  Yes this is possible.
>> >>>  >>>
>> >>>  >> But in Axis2/JAVA we will get a OutOfMemory if we parse a large
>> >>> MIME
>> >>>  >> part upfront, since it reads the attachment to memory. May be
>> you
>> >>> can
>> >>>  >> have a larger limit with C than in Java, but ultimately you'll
>> come
>> >>> to
>> >>>  >> a situation where you will not have enough memory to store that
>> >>> MIME
>> >>>  >> part in memory in the parsing time, unless you write in to a
>> File
>> >>>  >> while parsing,..
>> >>>  >>
>> >>>  >> thanks,
>> >>>  >> Thilina
>> >>>  >>
>> >>>  >>
>> >>>  >>>
>> >>>  >>>  >
>> >>>  >>>  > thanks,
>> >>>  >>>  > Thilina
>> >>>  >>>  >
>> >>>  >>>  >  >and keeping the file name inside
>> >>>  >>>  > >  data_handler instead of the whole buffer. So the service
>> or
>> >>> the
>> >>>  >>> client
>> >>>  >>>  > >  will get the file name instead of the buffered stream,
>> when
>> >>> it
>> >>>  >>> receives
>> >>>  >>>  > >  an attachment. This will not prevent buffering the
>> >>> attachment
>> >>> at
>> >>>  >>> the
>> >>>  >>>  > >  transport but will prevent keeping it inside the om_tree
>> >>> till
>> >>> it
>> >>>  >>> reaches
>> >>>  >>>  > >  the receiver.
>> >>>  >>>  > >
>> >>>  >>>  > >  Before implementing this I would like to know your
>> >>> suggestions
>> >>>  >>> regarding
>> >>>  >>>  > >  this.
>> >>>  >>>  > >
>> >>>  >>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>> >>>  >>>  > >
>> >>>  >>>  > >  Thanks,
>> >>>  >>>  > >  -Manjula
>> >>>  >>>  > >
>> >>>  >>>  > >  --
>> >>>  >>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
>> >>>  >>>  > >
>> >>>  >>>  > >
>> >>>  >>>  > >  ---------------------------------------------------------------------
>> >>>  >>>  > >  To unsubscribe, e-mail:
>> axis-c-dev-unsubscribe@ws.apache.org
>> >>>  >>>  > >  For additional commands, e-mail:
>> >>> axis-c-dev-help@ws.apache.org
>> >>>  >>>  > >
>> >>>  >>>  > >
>> >>>  >>>  >
>> >>>  >>>  >
>> >>>  >>>  >
>> >>>  >>>
>> >>>  >>>
>> >>>  >>>  ---------------------------------------------------------------------
>> >>>  >>>  To unsubscribe, e-mail: axis-c-dev-unsubscribe@ws.apache.org
>> >>>  >>>  For additional commands, e-mail: axis-c-dev-help@ws.apache.org
>> >>>  >>>
>> >>>  >>>
>> >>>  >>>
>> >>>  >>
>> >>>  >> --
>> >>>  >> Thilina Gunarathne - http://thilinag.blogspot.com
>> >>>  >>
>> >>>  >>
>> >>>  >>
>> >>>  >>
>> >>>  >
>> >>>  >
>> >>>  >
>> >>>  >
>> >>>  >
>> >>>  >
>> >>>
>> >>>
>> >>>  --
>> >>>  Samisa Abeysinghe
>> >>>  Software Architect; WSO2 Inc.
>> >>>
>> >>>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Thilina Gunarathne - http://thilinag.blogspot.com
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>
>



Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

Hi Senaka,

I am confused here. I think you are taking the discussion back to the
beginning, because on the receiving side we read till the end of the
stream. Please see my first mail.

When sending, writing part by part to the stream is the same as chunking,
because when sending you must either specify a Content-Length or mark the
message as chunked (Transfer-Encoding: chunked).
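To make the chunked option concrete, here is a minimal sketch of HTTP/1.1 chunked transfer coding as a sender might emit it: each piece is preceded by its size in hexadecimal, and a zero-size chunk ends the body, so the total length never has to be known in advance. `write_http_chunk()` and `end_http_chunks()` are hypothetical helpers, not Axis2/C functions, and a `FILE *` stands in for the transport's output stream.

```c
#include <stdio.h>

/* Hypothetical helper (not Axis2/C API): writes one HTTP/1.1 chunk,
 * "<size in hex>\r\n<data>\r\n". Returns 0 on success, -1 on error. */
static int write_http_chunk(FILE *out, const unsigned char *data, size_t len)
{
    if (len == 0)
        return 0; /* a zero-size chunk would end the body early */
    if (fprintf(out, "%zx\r\n", len) < 0)
        return -1;
    if (fwrite(data, 1, len, out) != len)
        return -1;
    return fputs("\r\n", out) == EOF ? -1 : 0;
}

/* Hypothetical helper: terminates a chunked body with the last chunk
 * "0\r\n" followed by the final CRLF (no trailer headers). */
static int end_http_chunks(FILE *out)
{
    return fputs("0\r\n\r\n", out) == EOF ? -1 : 0;
}
```

With Content-Length instead, the sender must know the size of the complete body, MIME boundaries and part headers included, before it writes the HTTP headers.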

-Manjula.

On Sat, 2008-03-15 at 13:39 +0530, Senaka Fernando wrote:
> >>>  BTW, this whole discussion is about the in path, that is, reading an
> >>>  incoming message. How about the out path? We have the same problems
> >>>  when sending attachments. Right now, we read the whole file into
> >>>  memory and only then send it over the wire.
> >> Hmm... Why not write it in chunks? Read a chunk from the file, then
> >> write it to the out stream. Use the size of the file for the
> >> Content-Length calculation in the non-chunked case. But mostly people
> >> will use chunking when using MTOM.
> >
> > No, chunking is not required. You also don't need to write all the data
> > to be sent to the stream at once, because any HTTP receiver will pull
> > from the stream until it sees a valid ending character sequence.
> 
> Rather, it should read a length equal to the Content-Length. The
> terminating sequence is for headers. Sorry for the confusion. Therefore,
> the HTTP receiver will pull from the stream until it has read
> Content-Length bytes or until an error occurs.
> 
> >
> > I believe that you should be able to write part by part to the stream:
> > write part 1 and send it, then reuse the buffer, write part 2, send it,
> > and so on. This argument can be justified because, on the receiving end,
> > we must read the multipart data until we encounter the MIME boundary,
> > unlike an ordinary payload, which can be terminated by a valid terminating
> 
> Same here. We'll be reading a length equal to the Content-Length.
> 
> > character sequence. We'll only have issues if we are to write large SOAP
> > payloads, which of course can be dealt with once we've implemented
> > Session in Axis2/C.
> >
> > Regards,
> > Senaka
> >
> >>
> >> thanks,
> >> Thilina
> >>
> >>
> >>>
> >>>  Samisa...
> >>>
> >>>
> >>>
> >>>  > Regards,
> >>>  > Senaka
> >>>  >
> >>>  >
> >>>  >> Hi,
> >>>  >>
> >>>  >>>  > In the Axis2/Java case we do write the attachment content directly
> >>>  >>>  > from the InputStream to the file when the attachment size is larger
> >>>  >>>  > than the threshold. This avoids loading the whole attachment into
> >>>  >>>  > memory at all.
> >>>  >>>
> >>>  >>>  In this case, to find out the attachment size, don't you need to do
> >>>  >>>  any MIME parsing? How do you find the attachment size without
> >>>  >>>  searching for the MIME boundaries?
> >>>  >>>
> >>>  >> Yes. MIME is a boundary-based packaging mechanism and you do not
> >>>  >> need to specify the length of each part. Even the HTTP
> >>>  >> Content-Length is not there if the message is chunked.
> >>>  >>
> >>>  >> What we did in Axis2/Java to overcome this is to read the data into a
> >>>  >> byte[] buffer of up to a certain size (the size threshold). If there
> >>>  >> is more data available in the MIME part (if we have not encountered
> >>>  >> the boundary yet), then we know this attachment is bigger than the
> >>>  >> threshold. So we create the temp file, pump the content in the buffer
> >>>  >> to the file, then pump the rest of the stream to the file. This way
> >>>  >> we do not need to know the size of the attachment upfront. BTW, we
> >>>  >> do all of the above while we are parsing the MIME message, at the
> >>>  >> MIME parser level.
> >>>  >>
> >>>  >>
> >>>  >>>  > This has the plus point that the attachment size will be limited
> >>>  >>>  > only by the available free space in the temp directory. Will that
> >>>  >>>  > be possible in Axis2/C? Or is that what you have in mind? :)
> >>>  >>>
> >>>  >>>  Yes, this is possible.
> >>>  >>>
> >>>  >> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
> >>>  >> MIME part upfront, since it reads the attachment into memory. Maybe
> >>>  >> you can have a larger limit with C than in Java, but ultimately you'll
> >>>  >> come to a situation where you will not have enough memory to store
> >>>  >> that MIME part in memory at parsing time, unless you write it to a
> >>>  >> file while parsing.
> >>>  >>
> >>>  >> thanks,
> >>>  >> Thilina
> >>>  >>
> >>>  >>
> >>>  >>>
> >>>  >>>  >
> >>>  >>>  > thanks,
> >>>  >>>  > Thilina
> >>>  >>>  >
> >>>  >>>  >  >and keeping the file name inside
> >>>  >>>  > >  data_handler instead of the whole buffer. So the service or the
> >>>  >>>  > >  client will get the file name instead of the buffered stream,
> >>>  >>>  > >  when it receives an attachment. This will not prevent buffering
> >>>  >>>  > >  the attachment at the transport but will prevent keeping it
> >>>  >>>  > >  inside the om_tree till it reaches the receiver.
> >>>  >>>  > >
> >>>  >>>  > >  Before implementing this I would like to know your suggestions
> >>>  >>>  > >  regarding this.
> >>>  >>>  > >
> >>>  >>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
> >>>  >>>  > >
> >>>  >>>  > >  Thanks,
> >>>  >>>  > >  -Manjula
> >>>  >>>  > >
> >>>  >>>  > >  --
> >>>  >>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
> >>>  >>>  > >
> >>>  >>>  > >
> >>>  >>>  > >
> >>>  >>>  > >
> >>>  >>>  >
> >>>  >>>  >
> >>>  >>>  >
> >>>  >>>
> >>>  >>>
> >>>  >>>
> >>>  >>>
> >>>  >>>
> >>>  >>
> >>>  >> --
> >>>  >> Thilina Gunarathne - http://thilinag.blogspot.com
> >>>  >>
> >>>  >>
> >>>  >>
> >>>  >>
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>  >
> >>>
> >>>
> >>>  --
> >>>  Samisa Abeysinghe
> >>>  Software Architect; WSO2 Inc.
> >>>
> >>>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Thilina Gunarathne - http://thilinag.blogspot.com
> >>
> >>
> >>
> >
> >
> >
> >
> 
> 
> 



Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

>>>  BTW, this whole discussion is about the in path, that is, reading an
>>>  incoming message. How about the out path? We have the same problems
>>>  when sending attachments. Right now, we read the whole file into
>>>  memory and only then send it over the wire.
>> Hmm... Why not write it in chunks? Read a chunk from the file, then
>> write it to the out stream. Use the size of the file for the
>> Content-Length calculation in the non-chunked case. But mostly people
>> will use chunking when using MTOM.
>
> No, chunking is not required. You also don't need to write all the data
> to be sent to the stream at once, because any HTTP receiver will pull
> from the stream until it sees a valid ending character sequence.

Rather, it should read a length equal to the Content-Length. The
terminating sequence is for headers. Sorry for the confusion. Therefore,
the HTTP receiver will pull from the stream until it has read
Content-Length bytes or until an error occurs.
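A receiver that follows this rule can be sketched as a loop that stops after exactly Content-Length bytes, or earlier if the stream ends or fails. This assumes the body is readable through a `FILE *`; `read_http_body()` is a hypothetical name, not an Axis2/C function, and the 4096-byte buffer size is arbitrary.

```c
#include <stdio.h>

/* Hypothetical helper: copies exactly content_length bytes of an HTTP body
 * from `in` to `out` through one fixed-size buffer. Returns the number of
 * bytes copied; a result below content_length means the stream ended early
 * or an I/O error occurred. Bytes after the body stay unread in `in`. */
static size_t read_http_body(FILE *in, FILE *out, size_t content_length)
{
    unsigned char buf[4096];
    size_t total = 0;

    while (total < content_length) {
        size_t want = content_length - total;
        if (want > sizeof buf)
            want = sizeof buf;          /* never read past the body */

        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            break;                      /* EOF or read error */
        if (fwrite(buf, 1, got, out) != got)
            break;                      /* write error */
        total += got;
    }
    return total;
}
```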

>
> I believe that you should be able to write part by part to the stream:
> write part 1 and send it, then reuse the buffer, write part 2, send it,
> and so on. This argument can be justified because, on the receiving end,
> we must read the multipart data until we encounter the MIME boundary,
> unlike an ordinary payload, which can be terminated by a valid terminating

Same here. We'll be reading a length equal to the Content-Length.

> character sequence. We'll only have issues if we are to write large SOAP
> payloads, which of course can be dealt with once we've implemented
> Session in Axis2/C.
>
> Regards,
> Senaka
>
>>
>> thanks,
>> Thilina
>>
>>
>>>
>>>  Samisa...
>>>
>>>
>>>
>>>  > Regards,
>>>  > Senaka
>>>  >
>>>  >
>>>  >> Hi,
>>>  >>
>>>  >>>  > In the Axis2/Java case we do write the attachment content directly
>>>  >>>  > from the InputStream to the file when the attachment size is larger
>>>  >>>  > than the threshold. This avoids loading the whole attachment into
>>>  >>>  > memory at all.
>>>  >>>
>>>  >>>  In this case, to find out the attachment size, don't you need to do
>>>  >>>  any MIME parsing? How do you find the attachment size without
>>>  >>>  searching for the MIME boundaries?
>>>  >>>
>>>  >> Yes. MIME is a boundary-based packaging mechanism and you do not
>>>  >> need to specify the length of each part. Even the HTTP
>>>  >> Content-Length is not there if the message is chunked.
>>>  >>
>>>  >> What we did in Axis2/Java to overcome this is to read the data into a
>>>  >> byte[] buffer of up to a certain size (the size threshold). If there
>>>  >> is more data available in the MIME part (if we have not encountered
>>>  >> the boundary yet), then we know this attachment is bigger than the
>>>  >> threshold. So we create the temp file, pump the content in the buffer
>>>  >> to the file, then pump the rest of the stream to the file. This way
>>>  >> we do not need to know the size of the attachment upfront. BTW, we
>>>  >> do all of the above while we are parsing the MIME message, at the
>>>  >> MIME parser level.
>>>  >>
>>>  >>
>>>  >>>  > This has the plus point that the attachment size will be limited
>>>  >>>  > only by the available free space in the temp directory. Will that
>>>  >>>  > be possible in Axis2/C? Or is that what you have in mind? :)
>>>  >>>
>>>  >>>  Yes, this is possible.
>>>  >>>
>>>  >> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
>>>  >> MIME part upfront, since it reads the attachment into memory. Maybe
>>>  >> you can have a larger limit with C than in Java, but ultimately you'll
>>>  >> come to a situation where you will not have enough memory to store
>>>  >> that MIME part in memory at parsing time, unless you write it to a
>>>  >> file while parsing.
>>>  >>
>>>  >> thanks,
>>>  >> Thilina
>>>  >>
>>>  >>
>>>  >>>
>>>  >>>  >
>>>  >>>  > thanks,
>>>  >>>  > Thilina
>>>  >>>  >
>>>  >>>  >  >and keeping the file name inside
>>>  >>>  > >  data_handler instead of the whole buffer. So the service or the
>>>  >>>  > >  client will get the file name instead of the buffered stream,
>>>  >>>  > >  when it receives an attachment. This will not prevent buffering
>>>  >>>  > >  the attachment at the transport but will prevent keeping it
>>>  >>>  > >  inside the om_tree till it reaches the receiver.
>>>  >>>  > >
>>>  >>>  > >  Before implementing this I would like to know your suggestions
>>>  >>>  > >  regarding this.
>>>  >>>  > >
>>>  >>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>>>  >>>  > >
>>>  >>>  > >  Thanks,
>>>  >>>  > >  -Manjula
>>>  >>>  > >
>>>  >>>  > >  --
>>>  >>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
>>>  >>>  > >
>>>  >>>  > >
>>>  >>>  > >
>>>  >>>  > >
>>>  >>>  >
>>>  >>>  >
>>>  >>>  >
>>>  >>>
>>>  >>>
>>>  >>>
>>>  >>>
>>>  >>>
>>>  >>
>>>  >> --
>>>  >> Thilina Gunarathne - http://thilinag.blogspot.com
>>>  >>
>>>  >>
>>>  >>
>>>  >>
>>>  >
>>>  >
>>>  >
>>>  >
>>>  >
>>>  >
>>>
>>>
>>>  --
>>>  Samisa Abeysinghe
>>>  Software Architect; WSO2 Inc.
>>>
>>>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Thilina Gunarathne - http://thilinag.blogspot.com
>>
>>
>>
>
>
>
>



Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

>>  BTW, this whole discussion is about the in path, that is, reading an
>>  incoming message. How about the out path? We have the same problems
>>  when sending attachments. Right now, we read the whole file into memory
>>  and only then send it over the wire.
> Hmm... Why not write it in chunks? Read a chunk from the file, then
> write it to the out stream. Use the size of the file for the
> Content-Length calculation in the non-chunked case. But mostly people
> will use chunking when using MTOM.

No, chunking is not required. You also don't need to write all the data
to be sent to the stream at once, because any HTTP receiver will pull
from the stream until it sees a valid ending character sequence.

I believe that you should be able to write part by part to the stream:
write part 1 and send it, then reuse the buffer, write part 2, send it,
and so on. This argument can be justified because, on the receiving end,
we must read the multipart data until we encounter the MIME boundary,
unlike an ordinary payload, which can be terminated by a valid terminating
character sequence. We'll only have issues if we are to write large SOAP
payloads, which of course can be dealt with once we've implemented
Session in Axis2/C.
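The part-by-part writing described above could look roughly like this on the out path: one caller-supplied buffer is refilled from the attachment file and written out, so memory use stays at the buffer size regardless of the attachment size. `stream_attachment()` is a hypothetical helper, not part of Axis2/C, and a `FILE *` stands in for the transport stream; whether the HTTP layer frames the pieces with chunked encoding or a precomputed Content-Length is a separate decision.

```c
#include <stdio.h>

/* Hypothetical sketch: copies an attachment file to an output stream,
 * reusing one buffer for every piece. Returns the number of bytes
 * written, or -1 on a read or write error. */
static long stream_attachment(FILE *file, FILE *out,
                              unsigned char *buf, size_t buf_size)
{
    long written = 0;
    size_t n;

    /* Refill the same buffer until the file is exhausted. */
    while ((n = fread(buf, 1, buf_size, file)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        written += (long)n;
    }
    /* fread() returning 0 means either EOF or an error; tell them apart. */
    return ferror(file) ? -1 : written;
}
```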

Regards,
Senaka

>
> thanks,
> Thilina
>
>
>>
>>  Samisa...
>>
>>
>>
>>  > Regards,
>>  > Senaka
>>  >
>>  >
>>  >> Hi,
>>  >>
>>  >>>  > In the Axis2/Java case we do write the attachment content directly
>>  >>>  > from the InputStream to the file when the attachment size is larger
>>  >>>  > than the threshold. This avoids loading the whole attachment into
>>  >>>  > memory at all.
>>  >>>
>>  >>>  In this case, to find out the attachment size, don't you need to do
>>  >>>  any MIME parsing? How do you find the attachment size without
>>  >>>  searching for the MIME boundaries?
>>  >>>
>>  >> Yes. MIME is a boundary-based packaging mechanism and you do not
>>  >> need to specify the length of each part. Even the HTTP
>>  >> Content-Length is not there if the message is chunked.
>>  >>
>>  >> What we did in Axis2/Java to overcome this is to read the data into a
>>  >> byte[] buffer of up to a certain size (the size threshold). If there
>>  >> is more data available in the MIME part (if we have not encountered
>>  >> the boundary yet), then we know this attachment is bigger than the
>>  >> threshold. So we create the temp file, pump the content in the buffer
>>  >> to the file, then pump the rest of the stream to the file. This way
>>  >> we do not need to know the size of the attachment upfront. BTW, we
>>  >> do all of the above while we are parsing the MIME message, at the
>>  >> MIME parser level.
>>  >>
>>  >>
>>  >>>  > This has the plus point that the attachment size will be limited
>>  >>>  > only by the available free space in the temp directory. Will that
>>  >>>  > be possible in Axis2/C? Or is that what you have in mind? :)
>>  >>>
>>  >>>  Yes, this is possible.
>>  >>>
>>  >> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
>>  >> MIME part upfront, since it reads the attachment into memory. Maybe
>>  >> you can have a larger limit with C than in Java, but ultimately you'll
>>  >> come to a situation where you will not have enough memory to store
>>  >> that MIME part in memory at parsing time, unless you write it to a
>>  >> file while parsing.
>>  >>
>>  >> thanks,
>>  >> Thilina
>>  >>
>>  >>
>>  >>>
>>  >>>  >
>>  >>>  > thanks,
>>  >>>  > Thilina
>>  >>>  >
>>  >>>  >  >and keeping the file name inside
>>  >>>  > >  data_handler instead of the whole buffer. So the service or the
>>  >>>  > >  client will get the file name instead of the buffered stream,
>>  >>>  > >  when it receives an attachment. This will not prevent buffering
>>  >>>  > >  the attachment at the transport but will prevent keeping it
>>  >>>  > >  inside the om_tree till it reaches the receiver.
>>  >>>  > >
>>  >>>  > >  Before implementing this I would like to know your suggestions
>>  >>>  > >  regarding this.
>>  >>>  > >
>>  >>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>>  >>>  > >
>>  >>>  > >  Thanks,
>>  >>>  > >  -Manjula
>>  >>>  > >
>>  >>>  > >  --
>>  >>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
>>  >>>  > >
>>  >>>  > >
>>  >>>  > >
>>  >>>  > >
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>
>>  >>>
>>  >>>
>>  >>>
>>  >>>
>>  >>
>>  >> --
>>  >> Thilina Gunarathne - http://thilinag.blogspot.com
>>  >>
>>  >>
>>  >>
>>  >>
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >
>>
>>
>>  --
>>  Samisa Abeysinghe
>>  Software Architect; WSO2 Inc.
>>
>>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Thilina Gunarathne - http://thilinag.blogspot.com
>
>
>



Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

>  BTW, this whole discussion is about the in path, that is, reading an
>  incoming message. How about the out path? We have the same problems
>  when sending attachments. Right now, we read the whole file into memory
>  and only then send it over the wire.
Hmm... Why not write it in chunks? Read a chunk from the file, then
write it to the out stream. Use the size of the file for the
Content-Length calculation in the non-chunked case. But mostly people
will use chunking when using MTOM.
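In the non-chunked case the sender has to emit a Content-Length header before any body bytes, and for an MTOM message the attachment file's size is only one term of it: the MIME delimiter lines, the part headers and the SOAP part count too. The sketch below assumes a simplified two-part layout with caller-supplied header strings; `file_size()` and `mime_content_length()` are hypothetical helpers, not Axis2/C API. Only the file's size is needed, so its contents can still be streamed afterwards.

```c
#include <stdio.h>
#include <string.h>

/* Returns the size of an open file in bytes, leaving the position unchanged,
 * or -1 on error. Open the file in binary mode ("rb") so ftell() gives a byte
 * count; files larger than LONG_MAX would need fseeko()/ftello() instead. */
static long file_size(FILE *f)
{
    long pos = ftell(f);
    if (pos < 0 || fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    if (fseek(f, pos, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Hypothetical sketch: Content-Length of a two-part MIME body whose second
 * part is an attachment streamed from `att`. Assumed layout, where each
 * header block already ends with its blank line "\r\n":
 *
 *   --B\r\n <soap headers> <soap xml> \r\n
 *   --B\r\n <att headers> <file bytes> \r\n
 *   --B--\r\n
 */
static long mime_content_length(const char *boundary, const char *soap_headers,
                                const char *soap_xml, const char *att_headers,
                                FILE *att)
{
    long att_size = file_size(att);
    if (att_size < 0)
        return -1;

    size_t b = strlen(boundary);
    size_t framing = 2 * (b + 4)   /* two "--B\r\n" delimiter lines  */
                   + (b + 6)       /* closing "--B--\r\n"             */
                   + 2 + 2;        /* CRLF after each part's content  */

    return (long)(framing + strlen(soap_headers) + strlen(soap_xml)
                  + strlen(att_headers)) + att_size;
}
```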

thanks,
Thilina


>
>  Samisa...
>
>
>
>  > Regards,
>  > Senaka
>  >
>  >
>  >> Hi,
>  >>
>  >>>  > In the Axis2/Java case we do write the attachment content directly
>  >>>  > from the InputStream to the file when the attachment size is larger
>  >>>  > than the threshold. This avoids loading the whole attachment into
>  >>>  > memory at all.
>  >>>
>  >>>  In this case, to find out the attachment size, don't you need to do
>  >>>  any MIME parsing? How do you find the attachment size without
>  >>>  searching for the MIME boundaries?
>  >>>
>  >> Yes. MIME is a boundary-based packaging mechanism and you do not
>  >> need to specify the length of each part. Even the HTTP
>  >> Content-Length is not there if the message is chunked.
>  >>
>  >> What we did in Axis2/Java to overcome this is to read the data into a
>  >> byte[] buffer of up to a certain size (the size threshold). If there
>  >> is more data available in the MIME part (if we have not encountered
>  >> the boundary yet), then we know this attachment is bigger than the
>  >> threshold. So we create the temp file, pump the content in the buffer
>  >> to the file, then pump the rest of the stream to the file. This way
>  >> we do not need to know the size of the attachment upfront. BTW, we
>  >> do all of the above while we are parsing the MIME message, at the
>  >> MIME parser level.
>  >>
>  >>
>  >>>  > This has the plus point that the attachment size will be limited
>  >>>  > only by the available free space in the temp directory. Will that
>  >>>  > be possible in Axis2/C? Or is that what you have in mind? :)
>  >>>
>  >>>  Yes, this is possible.
>  >>>
>  >> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
>  >> MIME part upfront, since it reads the attachment into memory. Maybe
>  >> you can have a larger limit with C than in Java, but ultimately you'll
>  >> come to a situation where you will not have enough memory to store
>  >> that MIME part in memory at parsing time, unless you write it to a
>  >> file while parsing.
>  >>
>  >> thanks,
>  >> Thilina
>  >>
>  >>
>  >>>
>  >>>  >
>  >>>  > thanks,
>  >>>  > Thilina
>  >>>  >
>  >>>  >  >and keeping the file name inside
>  >>>  > >  data_handler instead of the whole buffer. So the service or the
>  >>>  > >  client will get the file name instead of the buffered stream,
>  >>>  > >  when it receives an attachment. This will not prevent buffering
>  >>>  > >  the attachment at the transport but will prevent keeping it
>  >>>  > >  inside the om_tree till it reaches the receiver.
>  >>>  > >
>  >>>  > >  Before implementing this I would like to know your suggestions
>  >>>  > >  regarding this.
>  >>>  > >
>  >>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>  >>>  > >
>  >>>  > >  Thanks,
>  >>>  > >  -Manjula
>  >>>  > >
>  >>>  > >  --
>  >>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
>  >>>  > >
>  >>>  > >
>  >>>  > >
>  >>>  > >
>  >>>  >
>  >>>  >
>  >>>  >
>  >>>
>  >>>
>  >>>
>  >>>
>  >>>
>  >>
>  >> --
>  >> Thilina Gunarathne - http://thilinag.blogspot.com
>  >>
>  >>
>  >>
>  >>
>  >
>  >
>  >
>  >
>  >
>  >
>
>
>  --
>  Samisa Abeysinghe
>  Software Architect; WSO2 Inc.
>
>  http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>
>
>
>
>
>



-- 
Thilina Gunarathne - http://thilinag.blogspot.com


Re: Caching support for large attachments

Posted by Samisa Abeysinghe <sa...@wso2.com>.

Senaka Fernando wrote:
> Hi Thilina,
>
> What is the upper limit?
>
> We can also use a buffer of, say, 1MB and keep reading the content into
> it until we find the end, and in the meantime write to a file if we
> exceed this buffer. This way, we can do the necessary tracking in real
> time, meaning that the MIME parts will be parsed while they are being
> cached to the file.
>
> You can always jump to a known location in the file, so there is no need
> to re-parse. This is just a matter of altering the file pointer.
>
> Also, Manjula, isn't the whole file read into memory at once? When we use
> fileopen(), I mean. In that case you will rather have to cache to
> multiple files.
>   

I do not think this is an issue, because once the file is open you can keep
writing to it (and flushing) until the whole file has been read. That is in
the inflow.

We also have to be aware that flushing too often also slows down the
system, because each flush is disk I/O. This is where the optimal buffer
size comes into play: if we keep flushing a buffer that is too small, it
gets too slow.
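Samisa's point about flush frequency and Thilina's threshold approach (quoted below) can be combined in a small sketch: bytes stay in memory up to a threshold; once it would be exceeded, a temp file is opened, the buffered bytes are pumped into it, and later pieces go straight to the file through a larger stdio buffer so fewer writes hit the disk. The `att_cache` type and its functions are hypothetical, `tmpfile()` stands in for whatever temp-file policy Axis2/C would use, and the 64 KiB buffer size is an arbitrary assumption, not a measured optimum.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ATT_FILE_BUF_SIZE (64 * 1024)  /* assumed stdio buffer size */

typedef struct {
    unsigned char *mem;   /* in-memory copy, capacity == threshold */
    size_t mem_len;       /* bytes currently held in mem */
    size_t threshold;     /* spill to disk once this would be exceeded */
    FILE *file;           /* temp file, NULL until the threshold is crossed */
    char *io_buf;         /* stdio buffer for file, to cut down on disk writes */
    size_t total;         /* bytes received so far */
} att_cache;

static int att_cache_init(att_cache *c, size_t threshold)
{
    memset(c, 0, sizeof *c);
    c->threshold = threshold;
    c->mem = malloc(threshold > 0 ? threshold : 1);
    return c->mem != NULL ? 0 : -1;
}

/* Appends the next piece of a MIME part as the parser produces it. */
static int att_cache_write(att_cache *c, const unsigned char *data, size_t len)
{
    if (c->file == NULL && len <= c->threshold - c->mem_len) {
        memcpy(c->mem + c->mem_len, data, len);
        c->mem_len += len;
        c->total += len;
        return 0;
    }
    if (c->file == NULL) {
        /* Threshold exceeded: create the temp file and pump the buffer into it. */
        c->file = tmpfile();
        if (c->file == NULL)
            return -1;
        c->io_buf = malloc(ATT_FILE_BUF_SIZE);
        if (c->io_buf != NULL) /* setvbuf must precede any I/O on the stream */
            setvbuf(c->file, c->io_buf, _IOFBF, ATT_FILE_BUF_SIZE);
        if (fwrite(c->mem, 1, c->mem_len, c->file) != c->mem_len)
            return -1;
        free(c->mem);
        c->mem = NULL;
        c->mem_len = 0;
    }
    if (fwrite(data, 1, len, c->file) != len)
        return -1;
    c->total += len;
    return 0;
}

/* Releases everything; tmpfile() files are deleted automatically on fclose(). */
static void att_cache_free(att_cache *c)
{
    if (c->file != NULL)
        fclose(c->file);
    free(c->io_buf);    /* only after fclose(), which may still use it */
    free(c->mem);
}
```

Manjula's proposal keeps a file name in data_handler, so a real implementation would need a named temp file (for example via POSIX `mkstemp()`) rather than the anonymous `tmpfile()` used here.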

BTW, this whole discussion is about the in path, that is, reading an
incoming message. How about the out path? We have the same problems
when sending attachments. Right now, we read the whole file into memory
and only then send it over the wire.

Samisa...

> Regards,
> Senaka
>
>   
>> Hi,
>>     
>>>  > In the Axis2/Java case we do write the attachment content directly
>>>  > from the InputStream to the file when the attachment size is larger
>>>  > than the threshold. This avoids loading the whole attachment into
>>>  > memory at all.
>>>
>>>  In this case, to find out the attachment size, don't you need to do
>>>  any MIME parsing? How do you find the attachment size without
>>>  searching for the MIME boundaries?
>>>
>> Yes. MIME is a boundary-based packaging mechanism and you do not
>> need to specify the length of each part. Even the HTTP
>> Content-Length is not there if the message is chunked.
>>
>> What we did in Axis2/Java to overcome this is to read the data into a
>> byte[] buffer of up to a certain size (the size threshold). If there
>> is more data available in the MIME part (if we have not encountered
>> the boundary yet), then we know this attachment is bigger than the
>> threshold. So we create the temp file, pump the content in the buffer
>> to the file, then pump the rest of the stream to the file. This way
>> we do not need to know the size of the attachment upfront. BTW, we
>> do all of the above while we are parsing the MIME message, at the
>> MIME parser level.
>>
>>
>>>  > This has the plus point that the attachment size will be limited
>>>  > only by the available free space in the temp directory. Will that
>>>  > be possible in Axis2/C? Or is that what you have in mind? :)
>>>
>>>  Yes, this is possible.
>>>
>> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
>> MIME part upfront, since it reads the attachment into memory. Maybe
>> you can have a larger limit with C than in Java, but ultimately you'll
>> come to a situation where you will not have enough memory to store
>> that MIME part in memory at parsing time, unless you write it to a
>> file while parsing.
>>
>> thanks,
>> Thilina
>>
>>     
>>>
>>>  >
>>>  > thanks,
>>>  > Thilina
>>>  >
>>>  > >  and keeping the file name inside
>>>  > >  data_handler instead of the whole buffer. So the service or the client
>>>  > >  will get the file name instead of the buffered stream, when it receives
>>>  > >  an attachment. This will not prevent buffering the attachment at the
>>>  > >  transport but will prevent keeping it inside the om_tree till it reaches
>>>  > >  the receiver.
>>>  > >
>>>  > >  Before implementing this I would like to know your suggestions regarding
>>>  > >  this.
>>>  > >
>>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>>>  > >
>>>  > >  Thanks,
>>>  > >  -Manjula
>>>  > >
>>>  > >  --
>>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
>>
>> --
>> Thilina Gunarathne - http://thilinag.blogspot.com


-- 
Samisa Abeysinghe 
Software Architect; WSO2 Inc.

http://www.wso2.com/ - "Oxygenating the Web Service Platform."



Re: Caching support for large attachments

Posted by Senaka Fernando <se...@wso2.com>.

Hi Thilina,

What is the upper limit?

We can also use a buffer of, say, 1 MB and keep reading content into it
until we find the end, writing to a file in the meantime if we exceed
the buffer. This way, we can do the necessary tracking in real time,
meaning that the MIME parts will be parsed while they are being cached
to the file.

You can always jump to a known location in the file, so there is no need
to re-parse. This is just a matter of moving the file pointer.
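The "jump to a known location" idea can be sketched with plain C stdio, assuming all cached parts are appended to one temp file and the parser records where each part starts. `cached_part`, `cache_append` and `cache_read` are hypothetical names for illustration only. Reading a part back is then a single `fseek` to its recorded offset, with no re-parsing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of where one cached MIME part lives in the cache file. */
typedef struct {
    long offset; /* byte offset of the part's first byte */
    long length; /* number of bytes in the part */
} cached_part;

/* Append a part's bytes to the end of the cache file and record its location. */
int cache_append(FILE *cache, const void *data, size_t len, cached_part *out)
{
    if (fseek(cache, 0, SEEK_END) != 0)
        return -1;
    out->offset = ftell(cache);
    if (out->offset < 0 || fwrite(data, 1, len, cache) != len)
        return -1;
    out->length = (long)len;
    return 0;
}

/* Read up to `len` bytes of a cached part, starting `skip` bytes into it,
 * by moving the file position directly instead of re-parsing. */
size_t cache_read(FILE *cache, const cached_part *part, long skip,
                  void *dst, size_t len)
{
    if (skip < 0 || skip > part->length)
        return 0;
    if ((long)len > part->length - skip)
        len = (size_t)(part->length - skip);
    if (fseek(cache, part->offset + skip, SEEK_SET) != 0)
        return 0;
    return fread(dst, 1, len, cache);
}
```

Both functions call `fseek` before touching the stream, which also satisfies the C rule that an update-mode stream needs a positioning call when switching between writing and reading.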

Also, Manjula, isn't the whole file read into memory at once when we use
fileopen()? In that case, you would have to cache to multiple files
instead.

Regards,
Senaka

> Hi,
>>  > In the Axis2/Java case we write the attachment content directly from
>>  > the InputStream to the file when the attachment size is larger than
>>  > the threshold. This avoids ever loading the whole attachment into
>>  > memory.
>>
>>  In this case, to find out the attachment size, don't you need to do
>>  MIME parsing? How do you find the attachment size without searching
>>  for the MIME boundaries?
> Yes. MIME is a boundary-based packaging mechanism, and you do not
> need to specify the length of each part. Even the HTTP
> Content-Length is not present if the message is chunked.
>
> What we did in Axis2/Java to overcome this is to read the data into a
> byte[] buffer up to a certain size (the size threshold). If there is
> more data available in the MIME part (that is, we have not encountered
> the boundary yet), then we know this attachment is bigger than the
> threshold. So we create the temp file, pump the content of the buffer
> to the file, and then pump the rest of the stream to the file. This
> way we do not need to know the size of the attachment upfront. BTW, we
> do all of the above while parsing the MIME message, at the MIME
> parser level.
>
>>  > This has the plus point that the attachment size will be
>>  > limited only by the available free space in the temp directory.
>>  > Will that be possible in Axis2/C? Or is that what you have in mind? :)
>>
>>  Yes this is possible.
> But in Axis2/Java we will get an OutOfMemoryError if we parse a large
> MIME part upfront, since it reads the attachment into memory. Maybe you
> can have a larger limit with C than with Java, but ultimately you will
> reach a situation where there is not enough memory to hold that MIME
> part during parsing, unless you write it to a file while parsing.
>
> thanks,
> Thilina
>
>>
>>
>>
>>  >
>>  > thanks,
>>  > Thilina
>>  >
>>  > >  and keeping the file name inside
>>  > >  data_handler instead of the whole buffer. So the service or the client
>>  > >  will get the file name instead of the buffered stream, when it receives
>>  > >  an attachment. This will not prevent buffering the attachment at the
>>  > >  transport but will prevent keeping it inside the om_tree till it reaches
>>  > >  the receiver.
>>  > >
>>  > >  Before implementing this I would like to know your suggestions regarding
>>  > >  this.
>>  > >
>>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>>  > >
>>  > >  Thanks,
>>  > >  -Manjula
>>  > >
>>  > >  --
>>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/
> --
> Thilina Gunarathne - http://thilinag.blogspot.com



Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

Hi,
>  > In the Axis2/Java case we write the attachment content directly from
>  > the InputStream to the file when the attachment size is larger than
>  > the threshold. This avoids ever loading the whole attachment into
>  > memory.
>
>  In this case, to find out the attachment size, don't you need to do
>  MIME parsing? How do you find the attachment size without searching
>  for the MIME boundaries?
Yes. MIME is a boundary-based packaging mechanism, and you do not
need to specify the length of each part. Even the HTTP
Content-Length is not present if the message is chunked.

What we did in Axis2/Java to overcome this is to read the data into a
byte[] buffer up to a certain size (the size threshold). If there is
more data available in the MIME part (that is, we have not encountered
the boundary yet), then we know this attachment is bigger than the
threshold. So we create the temp file, pump the content of the buffer
to the file, and then pump the rest of the stream to the file. This
way we do not need to know the size of the attachment upfront. BTW, we
do all of the above while parsing the MIME message, at the MIME
parser level.
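The threshold-then-spill approach described above can be sketched in C roughly as follows. This assumes a hypothetical reader callback (`part_read_fn`) that returns the bytes of the current MIME part and returns 0 once the boundary has been reached (and on every call after that); boundary detection itself is left out. `part_store`, `read_part_with_threshold` and the in-memory `mem_source` reader are illustrative names, not Axis2/C or Axis2/Java API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader: copies up to `cap` bytes of the current MIME part
 * into `buf`; returns 0 once the boundary is reached (and on later calls). */
typedef size_t (*part_read_fn)(unsigned char *buf, size_t cap, void *ctx);

typedef struct {
    unsigned char *mem; /* set if the part fit within the threshold */
    FILE *file;         /* set if the part was spilled to a temp file */
    size_t total;       /* total number of bytes in the part */
} part_store;

/* Buffer up to `threshold` bytes; if the part turns out to be larger,
 * move the buffer to a temp file and stream the rest of the part there. */
int read_part_with_threshold(part_read_fn read_part, void *ctx,
                             size_t threshold, part_store *out)
{
    unsigned char chunk[4096];
    unsigned char *buf;
    size_t used = 0, n;

    memset(out, 0, sizeof *out);
    buf = malloc(threshold > 0 ? threshold : 1);
    if (!buf)
        return -1;

    /* Fill the in-memory buffer up to the threshold (or the boundary). */
    while (used < threshold &&
           (n = read_part(buf + used, threshold - used, ctx)) > 0)
        used += n;

    /* Probe for more data: if there is none, the part fits in memory. */
    n = read_part(chunk, sizeof chunk, ctx);
    if (n == 0) {
        out->mem = buf;
        out->total = used;
        return 0;
    }

    /* Bigger than the threshold: spill the buffer, then the rest of the part. */
    out->file = tmpfile();
    if (!out->file || fwrite(buf, 1, used, out->file) != used)
        goto fail;
    out->total = used;
    do {
        if (fwrite(chunk, 1, n, out->file) != n)
            goto fail;
        out->total += n;
    } while ((n = read_part(chunk, sizeof chunk, ctx)) > 0);
    free(buf);
    rewind(out->file);
    return 0;

fail:
    free(buf);
    if (out->file)
        fclose(out->file);
    memset(out, 0, sizeof *out);
    return -1;
}

void part_store_free(part_store *p)
{
    free(p->mem);
    if (p->file)
        fclose(p->file);
    memset(p, 0, sizeof *p);
}

/* Toy reader over an in-memory part, for testing only. */
typedef struct { const unsigned char *data; size_t len, pos; } mem_source;

size_t mem_read(unsigned char *buf, size_t cap, void *ctx)
{
    mem_source *s = ctx;
    size_t n = s->len - s->pos < cap ? s->len - s->pos : cap;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

The probe read after the buffer is full is what answers "is this attachment bigger than the threshold?" without knowing its size upfront: a part of exactly `threshold` bytes stays in memory, and anything larger ends up in the temp file.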

>  > This has the plus point that the attachment size will be
>  > limited only by the available free space in the temp directory.
>  > Will that be possible in Axis2/C? Or is that what you have in mind? :)
>
>  Yes this is possible.
But in Axis2/Java we will get an OutOfMemoryError if we parse a large
MIME part upfront, since it reads the attachment into memory. Maybe you
can have a larger limit with C than with Java, but ultimately you will
reach a situation where there is not enough memory to hold that MIME
part during parsing, unless you write it to a file while parsing.

thanks,
Thilina

>
>
>
>  >
>  > thanks,
>  > Thilina
>  >
>  > >  and keeping the file name inside
>  > >  data_handler instead of the whole buffer. So the service or the client
>  > >  will get the file name instead of the buffered stream, when it receives
>  > >  an attachment. This will not prevent buffering the attachment at the
>  > >  transport but will prevent keeping it inside the om_tree till it reaches
>  > >  the receiver.
>  > >
>  > >  Before implementing this I would like to know your suggestions regarding
>  > >  this.
>  > >
>  > >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>  > >
>  > >  Thanks,
>  > >  -Manjula
>  > >
>  > >  --
>  > >  Manjula Peiris: http://manjula-peiris.blogspot.com/



-- 
Thilina Gunarathne - http://thilinag.blogspot.com


Re: Caching support for large attachments

Posted by Manjula Peiris <ma...@wso2.com>.

On Fri, 2008-03-14 at 10:42 -0400, Thilina Gunarathne wrote:
> Hi Manjula,
> Sounds great..
> 
> Just a small clarification..
> >  What we can do in our implementation is after extracting the binary
> >  content write that to a file
> In the Axis2/Java case we write the attachment content directly from
> the InputStream to the file when the attachment size is larger than
> the threshold. This avoids ever loading the whole attachment into
> memory.

In this case, to find out the attachment size, don't you need to do
MIME parsing? How do you find the attachment size without searching for
the MIME boundaries? In Axis2/C, the problem is that we know the size
of an attachment only after extracting the attachments. We can't
determine it from the content length, because a message with a large
SOAP envelope, or a message with a large number of small attachments,
will also have a large content length.
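To illustrate why the size of a part is only known once its boundary has been found, here is a naive scan for the MIME part delimiter (CRLF, then `--`, then the boundary string, as defined in RFC 2046). `mime_part_length` is a hypothetical helper that works on a buffer already in memory; a real streaming parser would also have to handle a delimiter split across two reads.

```c
#include <stdio.h>
#include <string.h>

/* Return the number of bytes in `data` before the delimiter "\r\n--<boundary>",
 * or (size_t)-1 if the delimiter does not occur in the first `len` bytes. */
size_t mime_part_length(const unsigned char *data, size_t len,
                        const char *boundary)
{
    char delim[128]; /* RFC 2046 limits boundaries to 70 characters */
    size_t dlen, i;
    int w = snprintf(delim, sizeof delim, "\r\n--%s", boundary);

    if (w < 0 || (size_t)w >= sizeof delim)
        return (size_t)-1;
    dlen = (size_t)w;
    if (len < dlen)
        return (size_t)-1;
    for (i = 0; i + dlen <= len; i++)
        if (memcmp(data + i, delim, dlen) == 0)
            return i;
    return (size_t)-1;
}
```

Nothing in the HTTP Content-Length tells this function where a given part ends; only the scan does.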


> This has the plus point that the attachment size will be
> limited only by the available free space in the temp directory.
> Will that be possible in Axis2/C? Or is that what you have in mind? :)

Yes this is possible.

> 
> thanks,
> Thilina
> 
> >  and keeping the file name inside
> >  data_handler instead of the whole buffer. So the service or the client
> >  will get the file name instead of the buffered stream, when it receives
> >  an attachment. This will not prevent buffering the attachment at the
> >  transport but will prevent keeping it inside the om_tree till it reaches
> >  the receiver.
> >
> >  Before implementing this I would like to know your suggestions regarding
> >  this.
> >
> >  [1] https://issues.apache.org/jira/browse/AXIS2C-672
> >
> >  Thanks,
> >  -Manjula
> >
> >  --
> >  Manjula Peiris: http://manjula-peiris.blogspot.com/



Re: Caching support for large attachments

Posted by Thilina Gunarathne <cs...@gmail.com>.

Hi Manjula,
Sounds great..

Just a small clarification..
>  What we can do in our implementation is after extracting the binary
>  content write that to a file
In the Axis2/Java case we write the attachment content directly from
the InputStream to the file when the attachment size is larger than
the threshold. This avoids ever loading the whole attachment into
memory. This has the plus point that the attachment size will be
limited only by the available free space in the temp directory.
Will that be possible in Axis2/C? Or is that what you have in mind? :)

thanks,
Thilina

>  and keeping the file name inside
>  data_handler instead of the whole buffer. So the service or the client
>  will get the file name instead of the buffered stream, when it receives
>  an attachment. This will not prevent buffering the attachment at the
>  transport but will prevent keeping it inside the om_tree till it reaches
>  the receiver.
>
>  Before implementing this I would like to know your suggestions regarding
>  this.
>
>  [1] https://issues.apache.org/jira/browse/AXIS2C-672
>
>  Thanks,
>  -Manjula
>
>  --
>  Manjula Peiris: http://manjula-peiris.blogspot.com/



-- 
Thilina Gunarathne - http://thilinag.blogspot.com


Re: Caching support for large attachments

Posted by Dinesh Premalal <xy...@gmail.com>.

Manjula Peiris <ma...@wso2.com> writes:

> Before implementing this I would like to know your suggestions regarding
> this.

+1

thanks,
Dinesh
-- 
http://nethu.org
