Posted to dev@harmony.apache.org by Tim Ellison <t....@gmail.com> on 2007/08/31 00:18:11 UTC
Re: [jira] Created: (HARMONY-4699) [classlib][luni] URLConnection.getContentType() works with files incorrectly
The MIME types for a given extension are defined here [1] which we took
from httpd's view of the world. So while it would be trivial to change
them to be the same as the RI, I'm inclined to:
- leave rtf as text/rtf
- add java to our list as text/plain
- leave doc as application/msword
then figure out how to snoop the stream for other types.
[1]
http://svn.apache.org/viewvc/harmony/enhanced/classlib/trunk/depends/files/content-types.properties?revision=494047&view=markup
Thoughts?
Tim
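
For illustration, the extension-based lookup described above might look roughly like the sketch below. It assumes a simple "extension=type" properties layout, which may not match the real content-types.properties file, and the names ExtensionTypes and typeFor are made up for this example. The three entries mirror the mappings proposed in this message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper for illustration; this is not Harmony's actual code.
final class ExtensionTypes {
    // Assumed "extension=type" layout; the real properties file may differ.
    private static final String TABLE =
            "rtf=text/rtf\n"
            + "java=text/plain\n"
            + "doc=application/msword\n";

    private static final Properties TYPES = load(TABLE);

    private static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the mapped MIME type, or null when the extension is not listed. */
    static String typeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension to look up
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getProperty(ext);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"test.rtf", "Test.java", "test.doc", "test.htx"}) {
            System.out.println(name + ": " + typeFor(name));
        }
    }
}
```

A null result here is the point where stream snooping would have to take over.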
Vasily Zakharov (JIRA) wrote:
> [classlib][luni] URLConnection.getContentType() works with files incorrectly
> ----------------------------------------------------------------------------
>
> Key: HARMONY-4699
> URL: https://issues.apache.org/jira/browse/HARMONY-4699
> Project: Harmony
> Issue Type: Bug
> Components: Classlib
> Reporter: Vasily Zakharov
>
>
> In the Harmony implementation, java.net.URLConnection.getContentType() works incorrectly when it addresses a file URL:
>
> 1. For files with .rtf extension, RI returns "application/rtf", while Harmony returns "text/rtf".
>
> 2. For files with .java extension, RI returns "text/plain", while Harmony returns "content/unknown".
>
> 3. For files with .doc extension, RI returns "content/unknown", while Harmony returns "application/msword". The same is true for other known extensions.
>
> 4. For files with unrecognized extension and with HTML content, RI returns "text/html", while Harmony returns "content/unknown".
>
> Items 1 and 2 look like minor issues that would be better fixed for compatibility with the RI.
>
> Item 3 looks like a non-bug difference, as Harmony behaves clearly better than RI in these cases.
>
> Item 4 looks like a serious bug, as RI clearly looks into file content for the file type, and Harmony does not. Looks like org.apache.harmony.luni.internal.net.www.protocol.file.FileURLConnection.getContentType() needs to be fixed to use guessContentTypeFromStream() in addition to guessContentTypeFromName().
>
> The attached archive contains the reproducer with some test files it uses. Here's the reproducer code:
>
> public class Test {
>     static void printContentType(String fileName) throws java.io.IOException {
>         System.out.println(fileName + ": " + new java.net.URL("file:" + fileName).openConnection().getContentType());
>     }
>     public static void main(String[] argv) {
>         try {
>             printContentType("test.rtf");
>             printContentType("Test.java");
>             printContentType("test.doc");
>             printContentType("test.htx");
>         } catch (Exception e) {
>             e.printStackTrace(System.out);
>         }
>     }
> }
>
> Output on RI:
>
> test.rtf: application/rtf
> Test.java: text/plain
> test.doc: content/unknown
> test.htx: text/html
>
> Output on Harmony:
>
> test.rtf: text/rtf
> Test.java: content/unknown
> test.doc: application/msword
> test.htx: content/unknown
>
> This issue is a blocker for HARMONY-4696, because on the RI JEditorPane.getContentType() should be based on URLConnection.getContentType(), which currently works incorrectly in Harmony.
>
>
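
The fix suggested in item 4 of the report could be sketched along these lines. This is only a minimal illustration built on the public java.net.URLConnection helpers, not the actual FileURLConnection code. The class name ContentTypeGuess and the order of the two checks (name first, then content) are assumptions.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class ContentTypeGuess {
    static final String UNKNOWN = "content/unknown";

    /** Tries the file name first, then falls back to peeking at the content. */
    static String guess(String fileName, InputStream in) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        if (type != null) {
            return type;
        }
        // guessContentTypeFromStream() needs mark/reset support.
        // Caveat: wrapping may pull bytes from 'in' into the wrapper's buffer.
        InputStream marked = in.markSupported() ? in : new BufferedInputStream(in);
        try {
            type = URLConnection.guessContentTypeFromStream(marked);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return type != null ? type : UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println("test.htx: " + guess("test.htx", new ByteArrayInputStream(html)));
    }
}
```

Whether the name or the content should win when both give an answer is a design choice that this sketch does not settle.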
Re: [jira] Created: (HARMONY-4699) [classlib][luni] URLConnection.getContentType() works with files incorrectly
Posted by Alexey Petrenko <al...@gmail.com>.
Thanks.
SY, Alexey
2007/8/31, Tim Ellison <t....@gmail.com>:
> Alexey Petrenko wrote:
> > It looks like both application/rtf and text/rtf are valid from IANA's [1]
> > point of view.
> > So I see no harm in following the RI's behavior in this case.
> >
> > By the way, the application/rtf specification looks more recent than the text/rtf one.
>
> Fair enough, I'll change it to application/rtf
>
> Regards,
> Tim
>
> > SY, Alexey
> >
> > 1. http://www.iana.org/assignments/media-types/
> >
Re: [jira] Created: (HARMONY-4699) [classlib][luni] URLConnection.getContentType() works with files incorrectly
Posted by Tim Ellison <t....@gmail.com>.
Alexey Petrenko wrote:
> It looks like both application/rtf and text/rtf are valid from IANA's [1]
> point of view.
> So I see no harm in following the RI's behavior in this case.
>
> By the way, the application/rtf specification looks more recent than the text/rtf one.
Fair enough, I'll change it to application/rtf
Regards,
Tim
> SY, Alexey
>
> 1. http://www.iana.org/assignments/media-types/
Re: [classlib][luni] guessing content mime types
Posted by Oliver Deakin <ol...@googlemail.com>.
Hi Tim,
There is FindMimeFromData() [1] defined in urlmon.h which may be useful
- from [2] it appears that this is the system function used by IE to
determine mime types.
Regards,
Oliver
[1] http://msdn2.microsoft.com/en-us/library/ms775107.aspx
[2] http://msdn2.microsoft.com/en-us/library/ms775147.aspx
Tim Ellison wrote:
> On a related note, we do a rubbish job of guessing the content type from
> the content of files themselves via
> URLConnection#guessContentTypeFromStream(InputStream). I've added a bit
> more logic in there for the most obvious cases, but when you consider
> the info in your typical Linux 'magic' file we have a long way to go.
> My first thought was whether we could ask the platform to guess for us,
> but I don't think there is any equivalent on Windows etc?
>
> Regards,
> Tim
--
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
[classlib][luni] guessing content mime types (was: Re: [jira] Created: (HARMONY-4699) [classlib][luni] URLConnection.getContentType() works with files incorrectly)
Posted by Tim Ellison <t....@gmail.com>.
On a related note, we do a rubbish job of guessing the content type from
the content of files themselves via
URLConnection#guessContentTypeFromStream(InputStream). I've added a bit
more logic in there for the most obvious cases, but when you consider
the info in your typical Linux 'magic' file we have a long way to go.
My first thought was whether we could ask the platform to guess for us,
but I don't think there is any equivalent on Windows etc?
Regards,
Tim
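
As a rough idea of what a table-driven, 'magic'-style check could look like, here is a minimal sketch. It is not the logic actually added to guessContentTypeFromStream(). The class name MagicSniffer and the handful of signatures are chosen just for illustration, and a real 'magic' file covers far more formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class MagicSniffer {
    // Each row: leading bytes (written as ISO-8859-1 text) and the MIME type.
    private static final String[][] SIGNATURES = {
        {"%PDF-", "application/pdf"},
        {"GIF8", "image/gif"},
        {"\u0089PNG", "image/png"},
        {"PK\u0003\u0004", "application/zip"},
        {"{\\rtf", "application/rtf"},
    };
    private static final int PEEK = 8; // longer than any signature above

    /** Peeks at the first bytes and resets the stream; returns null if nothing matches. */
    static String sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[PEEK];
        int n = 0;
        try {
            in.mark(PEEK);
            while (n < PEEK) { // read() may return fewer bytes than requested
                int r = in.read(head, n, PEEK - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String[] sig : SIGNATURES) {
            byte[] magic = sig[0].getBytes(StandardCharsets.ISO_8859_1);
            if (startsWith(head, n, magic)) {
                return sig[1];
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1\\ansi hello}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(new ByteArrayInputStream(rtf)));
    }
}
```

Keeping the signatures in a table rather than in nested if statements makes it easier to add entries later.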
Re: [jira] Created: (HARMONY-4699) [classlib][luni] URLConnection.getContentType() works with files incorrectly
Posted by Alexey Petrenko <al...@gmail.com>.
It looks like both application/rtf and text/rtf are valid from IANA's [1]
point of view.
So I see no harm in following the RI's behavior in this case.
By the way, the application/rtf specification looks more recent than the text/rtf one.
SY, Alexey
1. http://www.iana.org/assignments/media-types/