Posted to c-dev@xerces.apache.org by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org> on 2008/06/19 11:25:45 UTC

[jira] Commented: (XERCESC-1805) Accessing the HTTP Content-Type

    [ https://issues.apache.org/jira/browse/XERCESC-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606306#action_12606306 ] 

Boris Kolpackov commented on XERCESC-1805:
------------------------------------------

Hi John,

I reviewed your patch and it looks good overall. I have one suggestion,
however. It is about the getContentType() function. Its documentation
says that it returns a content type in some unspecified form. For
HTTP it is the value of the Content-Type header. For other stream types
it can be something else. As a result, I don't see how this function
can be used unless the user knows what to expect from it (that is,
the user knows the stream is HTTP and the getContentType() will
return the Content-Type header).

This made me think that perhaps placing getContentType() in the
BinInputStream interface is not the best choice. Perhaps it would
be better to create the BinHTTPInputStream interface which extends
the BinInputStream and adds a (pure virtual) getContentType(). Then
all HTTP stream implementations will derive from BinHTTPInputStream.
We will also change the makeNew() function in XMLNetAccessor to return
BinHTTPInputStream instead of BinInputStream.

Even if the user has a BinInputStream instead of BinHTTPInputStream,
he can always try to dynamic_cast to BinHTTPInputStream to see if the
content type is available. In fact I think this is the only way for him
to know what to expect from getContentType().

So here is how I suggest we change the current patch: Create 
src/xercesc/util/BinHTTPInputStream.hpp with the BinHTTPInputStream
interface. This name happens to conflict with the common implementation
that you have created for Winsock and Socket. We can rename those
files to

BinHTTPInputStreamCommon.hpp
BinHTTPInputStreamCommon.cpp 

and move them to the NetAccessor directory (they, BTW, only need
to be compiled when Socket or Winsock accessors are used so it
would make sense to move them to NetAccessor).

Let me know if you are ok with this and I will go ahead and apply
the patch and make the changes.

Boris


> Accessing the HTTP Content-Type
> -------------------------------
>
>                 Key: XERCESC-1805
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1805
>             Project: Xerces-C++
>          Issue Type: Improvement
>          Components: Miscellaneous
>    Affects Versions: 3.0.0
>            Reporter: John Snelson
>             Fix For: 3.0.0
>
>         Attachments: xercesc_3_0_content_type_patch
>
>
> A lot of algorithms need access to the HTTP "Content-Type" header, to decide how to parse a file, or what encoding it is in - for instance see XSLT 2.0's unparsed-text() function:
> http://www.w3.org/TR/xslt20/#unparsed-text
> We should add a method, BinInputStream::getContentType(), and implement it in the HTTP input stream implementations. The method should return 0 when the content type is not available, like for file input streams.
> In addition, the socket and WinSock HTTP InputStream implementations have a number of problems:
> 1) They used fixed buffers which can result in buffer overflow.
> 2) They needlessly duplicate a whole load of code that could be shared.
> 3) They transcode to the local code page rather than "ISO8859-1".
