Posted to dev@tika.apache.org by "Anas Hammani (Jira)" <ji...@apache.org> on 2021/03/02 16:37:00 UTC

[jira] [Created] (TIKA-3308) SVG file without xml declaration tag is detected as text/plain

Anas Hammani created TIKA-3308:
----------------------------------

             Summary: SVG file without xml declaration tag is detected as text/plain
                 Key: TIKA-3308
                 URL: https://issues.apache.org/jira/browse/TIKA-3308
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 1.25
            Reporter: Anas Hammani
         Attachments: logo-luma.svg

The SVG file attached to the issue is interpreted as *text/plain* by
{code:java}
tika.detect(filePath){code}
 

If I add
{code:java}
<?xml version="1.0" standalone="no"?>{code}
at the beginning of the file, then Tika detects it as "image/svg+xml".

 

According to the XML specification, the XML declaration is optional: a document can be well-formed without it.

[https://www.w3.org/TR/REC-xml/#sec-prolog-dtd]
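
As a quick check of that point, here is a minimal sketch that uses only the JDK's built-in XML parser (no Tika involved). The class name PrologCheck, the method rootElement, and the inline SVG string are made up for illustration; they are not taken from the attached file or from Tika. It parses the same SVG content with and without the XML declaration and prints the namespaced root element in both cases:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only; not part of Tika.
class PrologCheck {

    // Parses the given XML text and returns "{namespaceURI}localName" for its
    // root element. Throws an exception if the text is not well-formed XML.
    static String rootElement(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to read the SVG namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Element root = builder
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample content, standing in for the attached SVG.
        String bare = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"10\" height=\"10\"/>";
        String declared = "<?xml version=\"1.0\" standalone=\"no\"?>" + bare;

        System.out.println(rootElement(bare));     // {http://www.w3.org/2000/svg}svg
        System.out.println(rootElement(declared)); // {http://www.w3.org/2000/svg}svg
    }
}
```

Both inputs parse without error and yield the same root element, so the root element's namespace and name are available as a detection signal even when the declaration is missing. This only demonstrates the well-formedness point; it does not show how Tika's detector works internally.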

 

It would be great if Tika could detect a file as SVG even without the XML declaration.

 


