Posted to dev@tika.apache.org by "Gerard Bouchar (JIRA)" <ji...@apache.org> on 2018/05/22 14:48:00 UTC

[jira] [Created] (TIKA-2648) resource name based mime detection detects elements as "text/x-php" instead of "text/html"

Gerard Bouchar created TIKA-2648:
------------------------------------

             Summary: resource name based mime detection detects elements as "text/x-php" instead of "text/html" 
                 Key: TIKA-2648
                 URL: https://issues.apache.org/jira/browse/TIKA-2648
             Project: Tika
          Issue Type: Bug
            Reporter: Gerard Bouchar


When using Tika to detect a MIME type given only a URL ending in ".php" and a Content-Type hint of "text/html", it returns "text/x-php", although "text/html" would be the expected result.

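{code}
import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class Tika2648Repro {
    public static void main(String[] args) throws Exception {
        TikaConfig tika = new TikaConfig();
        Metadata metadata = new Metadata();
        String url = "https://www.facebook.com/home.php";
        metadata.set(Metadata.RESOURCE_NAME_KEY, url);
        metadata.set(Metadata.CONTENT_TYPE, "text/html");
        // null stream: only the metadata hints (name and Content-Type) are available
        MediaType type = tika.getDetector().detect(null, metadata);
        System.out.println(url + " is of type " + type.toString());
        // Prints https://www.facebook.com/home.php is of type text/x-php
    }
}
{code}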
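The report does not say why the Content-Type hint is ignored. As a rough illustration only, the following sketch models one precedence rule that would produce the same output: the name-based guess is taken first, and a later hint replaces it only when the hint is the same type or a specialization (subtype) of it. The class `HintPrecedence`, its methods, the extension mapping and the small parent table are all hypothetical; this is not Tika's implementation, and Tika's real type hierarchy may differ.

{code:java}
import java.util.Map;

// Hypothetical, simplified model of hint precedence. NOT Tika's code:
// the parent table, extension mapping and rule are assumptions for illustration.
public class HintPrecedence {
    // Assumed child -> parent relationships (a tiny stand-in for a MIME registry).
    static final Map<String, String> PARENT = Map.of(
        "text/html", "text/plain",
        "text/x-php", "text/plain");

    // True if `type` equals `ancestor` or inherits from it via PARENT.
    static boolean isSameOrSpecializationOf(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) return true;
        }
        return false;
    }

    // Keep the current guess unless the new hint is the same type or refines it.
    static String applyHint(String current, String hint) {
        if (hint == null) return current;
        if (current == null) return hint;
        return isSameOrSpecializationOf(hint, current) ? hint : current;
    }

    // Assumed name-based lookup by file extension.
    static String guessFromName(String name) {
        if (name.endsWith(".php")) return "text/x-php";
        if (name.endsWith(".html")) return "text/html";
        if (name.endsWith(".txt")) return "text/plain";
        return null;
    }

    // Name hint is applied first, then the Content-Type hint.
    static String detect(String resourceName, String contentType) {
        String guess = null;
        if (resourceName != null) guess = applyHint(guess, guessFromName(resourceName));
        guess = applyHint(guess, contentType);
        return guess;
    }

    public static void main(String[] args) {
        System.out.println(detect("https://www.facebook.com/home.php", "text/html"));
        System.out.println(detect("readme.txt", "text/html"));
    }
}
{code}

Under this model a Content-Type hint can only refine the name-based guess, never replace an unrelated one: "text/html" loses to "text/x-php" because neither type inherits from the other in the assumed table, whereas for "readme.txt" the same hint wins because "text/html" is assumed to be a subtype of "text/plain".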


