Posted to user@tika.apache.org by Jean-Nicolas Boulay Desjardins <jn...@gmail.com> on 2018/04/19 00:03:45 UTC
Hex of RSS xml file is not recognized as RSS file MIME type
I converted this RSS XML content to hex:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title>W3Schools Home Page</title>
<link>https://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>https://www.w3schools.com/xml/xml_rss.asp</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>https://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>
</rss>
Then I sent it to Tika, and Tika returns: text/plain
Why am I not getting the RSS MIME type?
Re: Hex of RSS xml file is not recognized as RSS file MIME type
Posted by Jean-Nicolas Boulay Desjardins <jn...@gmail.com>.
I do the same with RDF and it works. By the way I use: tika.detect(hex)
On Thu, Apr 19, 2018 at 7:54 AM, Nick Burch <ap...@gagravarr.org> wrote:
> On Wed, 18 Apr 2018, Jean-Nicolas Boulay Desjardins wrote:
>
>> I converted this RSS XML content to hex:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <rss version="2.0">
>>
>> Then send it to Tika... Tika returns: text/plain
>>
>
> Base 64 encoded XML is no longer valid XML, so this is as expected.
>
>> Why am I not getting the rss mime type?
>>
>
> You need to send Tika the real file as-is
>
> Nick
>
Re: Hex of RSS xml file is not recognized as RSS file MIME type
Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 18 Apr 2018, Jean-Nicolas Boulay Desjardins wrote:
> I converted this RSS XML content to hex:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <rss version="2.0">
>
> Then send it to Tika... Tika returns: text/plain
Base 64 encoded XML is no longer valid XML, so this is as expected.
> Why am I not getting the rss mime type?
You need to send Tika the real file as-is
Nick
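To illustrate the advice above: a hex dump of the XML is just a run of ASCII digits and letters, so a content-based detector sees plain text. One way to get back to "the real file as-is" is to decode the hex into the original bytes before detection. The following is only a minimal sketch; the class and method names (HexDecodeSketch, hexToBytes) are hypothetical, and the Tika call is left as a comment because it assumes tika-core is on the classpath.

```java
import java.nio.charset.StandardCharsets;

public class HexDecodeSketch {

    // Turn a hex string such as "3c3f786d6c" back into the bytes it encodes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at offset " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // "3c3f786d6c" is the hex form of the five ASCII characters "<?xml".
        String hex = "3c3f786d6c";
        byte[] raw = hexToBytes(hex);
        System.out.println(new String(raw, StandardCharsets.US_ASCII)); // prints <?xml

        // Assumption: with tika-core available, detection would then run on
        // the decoded bytes instead of the hex text, for example:
        //   String type = new org.apache.tika.Tika().detect(raw);
    }
}
```

Note that the Tika facade has several detect overloads, and the one taking a String treats its argument as a file name rather than as content, so passing bytes or a stream is likely what is wanted here.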