Posted to user@tika.apache.org by Jean-Nicolas Boulay Desjardins <jn...@gmail.com> on 2018/04/19 00:03:45 UTC

Hex of RSS xml file is not recognized as RSS file MIME type

I converted this RSS XML content to hex:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title>W3Schools Home Page</title>
  <link>https://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS Tutorial</title>
    <link>https://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
  <item>
    <title>XML Tutorial</title>
    <link>https://www.w3schools.com/xml</link>
    <description>New XML tutorial on W3Schools</description>
  </item>
</channel>

</rss>

Then I sent the hex string to Tika, and Tika returned: text/plain

Why am I not getting the RSS MIME type?

Re: Hex of RSS xml file is not recognized as RSS file MIME type

Posted by Jean-Nicolas Boulay Desjardins <jn...@gmail.com>.
I did the same with RDF and it works. By the way, I am calling: tika.detect(hex)

On Thu, Apr 19, 2018 at 7:54 AM, Nick Burch <ap...@gagravarr.org> wrote:

> On Wed, 18 Apr 2018, Jean-Nicolas Boulay Desjardins wrote:
>
>> I converted this RSS XML content to hex:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <rss version="2.0">
>>
>> Then send it to Tika... Tika returns: text/plain
>>
>
> Hex-encoded XML is no longer valid XML, so this is as expected.
>
>> Why am I not getting the rss mime type?
>>
>
> You need to send Tika the real file as-is
>
> Nick
>

Re: Hex of RSS xml file is not recognized as RSS file MIME type

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 18 Apr 2018, Jean-Nicolas Boulay Desjardins wrote:
> I converted this RSS XML content to hex:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <rss version="2.0">
>
> Then send it to Tika... Tika returns: text/plain

Hex-encoded XML is no longer valid XML, so this is as expected.

> Why am I not getting the rss mime type?

You need to send Tika the real file as-is, not an encoded copy of it.

Nick
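
To illustrate the point made in this thread, here is a minimal sketch in Python using only the standard library. It does not call Tika. The sniff() function is a hypothetical toy detector written for this example, and it is far simpler than Tika's real detection rules. It only shows that a hex dump of the RSS document contains nothing but the characters 0-9 and a-f, so a content-based detector has no XML prolog or <rss> root element to match, while decoding the hex back to bytes restores both.

```python
# Shortened stand-in for the RSS document from the original post.
RSS_XML = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<rss version="2.0">\n<channel></channel>\n</rss>\n'
)


def sniff(data: bytes) -> str:
    """Toy content sniffer (hypothetical, NOT Tika's algorithm)."""
    head = data[:1024].lstrip()
    if head.startswith(b"<"):
        # Looks like markup: check for an RSS root element.
        if b"<rss" in head:
            return "application/rss+xml"
        return "application/xml"
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "application/octet-stream"
    # Plain ASCII with no markup: all we can say is "text".
    return "text/plain"


hex_text = RSS_XML.hex()            # starts with '3c3f786d6c', only 0-9 and a-f
as_sent = hex_text.encode("ascii")  # the kind of input described in the thread
decoded = bytes.fromhex(hex_text)   # the original bytes, restored

print(sniff(as_sent))   # the hex dump is just ASCII text
print(sniff(decoded))   # the decoded bytes expose the <rss> root
```

If the content only exists as a hex string, decoding it first (as bytes.fromhex does here) and passing the resulting bytes to the detector is one way to follow the advice above. In the Java facade, note that Tika.detect(String) treats its argument as a file name rather than as content, so decoded content would go through an overload such as detect(byte[]) or detect(InputStream). Whether that applies depends on which binding was used, which the thread does not say.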