Posted to commons-dev@ws.apache.org by "Andreas Veithen (JIRA)" <ji...@apache.org> on 2009/01/01 17:48:44 UTC

[jira] Commented: (WSCOMMONS-424) BufferUtils#doesDataHandlerExceedLimit needs review

    [ https://issues.apache.org/jira/browse/WSCOMMONS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660198#action_12660198 ] 

Andreas Veithen commented on WSCOMMONS-424:
-------------------------------------------

Additional issue:

If the DataHandler was constructed from an Object rather than a DataSource, a call to DataSource#getInputStream() will start a new thread and return a PipedInputStream. This is the case for Geronimo's as well as Sun's JAF implementation. The reason is that DataContentHandler only has a writeTo method and no getInputStream method. Starting a new thread just to check the size of the data is overhead that should be avoided.
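
One way to avoid the extra thread, sketched here as an illustration only, is to push the data through writeTo into an OutputStream that merely counts bytes and aborts once a limit is passed. The class name LimitCheckOutputStream and its nested exception are hypothetical and not part of AXIOM or JAF; the sketch uses only java.io.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: discards everything written to it, but counts the
// bytes and fails as soon as the count goes above the configured limit.
final class LimitCheckOutputStream extends OutputStream {

    // Hypothetical marker exception so callers can tell "limit exceeded"
    // apart from a genuine I/O failure.
    static final class LimitExceededException extends IOException {
        LimitExceededException(long limit) {
            super("Data exceeds limit of " + limit + " bytes");
        }
    }

    private final long limit;
    private long count;

    LimitCheckOutputStream(long limit) {
        this.limit = limit;
    }

    @Override
    public void write(int b) throws IOException {
        add(1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        add(len);
    }

    private void add(long n) throws IOException {
        count += n;
        if (count > limit) {
            throw new LimitExceededException(limit);
        }
    }

    long getCount() {
        return count;
    }
}
```

With JAF on the classpath, the caller would presumably pass an instance to DataHandler#writeTo(OutputStream) and treat a LimitExceededException as "exceeds limit". The write then runs in the calling thread, at the cost of serializing the object up to the limit.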

> BufferUtils#doesDataHandlerExceedLimit needs review
> ---------------------------------------------------
>
>                 Key: WSCOMMONS-424
>                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-424
>             Project: WS-Commons
>          Issue Type: Bug
>          Components: AXIOM
>            Reporter: Andreas Veithen
>
> The code in BufferUtils#doesDataHandlerExceedLimit has several issues and should be reviewed:
> 1) The code never closes the InputStream requested from the DataSource. This might have unexpected consequences if the DataSource is a FileDataSource.
> 2) The code assumes that there are DataSources that can only be read once. Indeed the code in BufferUtils#getInputStream throws an exception if the input stream returned from the DataSource doesn't support mark ("Stream does not support mark, Cannot read the stream as DataSource will be consumed."). This is plain wrong, because by definition a DataSource can be read several times (this is the very reason for the existence of this interface). If there are DataSource implementations that can be "consumed", i.e. read only once, they need to be fixed.
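> 3) The code assumes that the end of stream is reached when InputStream#available() returns 0. This is wrong: available() only reports how many bytes can be read without blocking, so it may return 0 well before the end of the stream.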
> 4) doesDataHandlerExceedLimit tries to establish a lower bound on the DataSource size by reading data from it. This is suboptimal, because in most cases this can be achieved without actually reading a single byte from the data source:
> * If the DataSource is a FileDataSource, it is possible to get the File object, and the size of the DataSource can be determined from the file size. This is much less costly than opening the file and reading data from it.
> * InputStream#available() can always be used to get a lower limit on the stream size. For a ByteArrayDataSource this actually returns the size directly.
> * InputStream#skip can be used to advance in the stream without reading from it.
> Only if the InputStream implementation returned by the DataSource implements neither available nor skip (this is possible) is it necessary to actually read the data.
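
The strategy described in point 4 can be sketched roughly as follows. This is not the AXIOM code and not a proposed patch. StreamSizeCheck and exceedsLimit are hypothetical names, the limit is assumed to be well below Long.MAX_VALUE, and the FileDataSource shortcut is left out so that the example depends only on java.io.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating the available/skip/read fallback chain.
final class StreamSizeCheck {

    private StreamSizeCheck() {}

    // Returns true if the stream holds more than 'limit' bytes.
    // The stream is consumed; the caller remains responsible for closing it.
    static boolean exceedsLimit(InputStream in, long limit) throws IOException {
        long seen = 0;
        byte[] buffer = new byte[4096];
        while (true) {
            // available() is a lower bound on the number of remaining bytes.
            if (seen + in.available() > limit) {
                return true;
            }
            // Try to advance without copying data; limit + 1 - seen is >= 1 here.
            long skipped = in.skip(limit + 1 - seen);
            if (skipped > 0) {
                seen += skipped;
            } else {
                // skip made no progress: read to tell end of stream apart
                // from a stream that simply does not support skipping.
                int toRead = (int) Math.min(buffer.length, limit + 1 - seen);
                int n = in.read(buffer, 0, toRead);
                if (n < 0) {
                    return false; // end of stream reached within the limit
                }
                seen += n;
            }
            if (seen > limit) {
                return true;
            }
        }
    }
}
```

Note that end of stream is detected only by read returning -1, never by available() returning 0. A caller would obtain a fresh stream from DataSource#getInputStream() for the check and close it afterwards, which also addresses point 1.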
