Posted to commits@tika.apache.org by ni...@apache.org on 2018/01/25 15:18:32 UTC

[tika] branch master updated (3ce43ad -> d72ae53)

This is an automated email from the ASF dual-hosted git repository.

nick pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 3ce43ad  clean up test dependencies in tika-nlp
     new db75e85  TIKA-2554 Separate out Makefile from text/plain to a specific subtype
     new 1ba30ef  TIKA-2554 Separate out Config formats from text/plain to a specific subtype
     new 9b00c93  Another now-expected difference from HTTPD
     new d72ae53  Resync with http://www.apache.org/dev/svn-eol-style.txt , adding new plain-text extensions from there

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/tika/mime/tika-mimetypes.xml        | 43 +++++++++++++++++++---
 .../java/org/apache/tika/TikaDetectionTest.java    |  3 +-
 2 files changed, 40 insertions(+), 6 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
nick@apache.org.

[tika] 03/04: Another now-expected difference from HTTPD

Posted by ni...@apache.org.

nick pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit 9b00c9300bb4a207b34daad4789bb389c7a6fc8b
Author: Nick Burch <ni...@gagravarr.org>
AuthorDate: Thu Jan 25 15:12:18 2018 +0000

    Another now-expected difference from HTTPD
---
 tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java b/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java
index a642b47..af28d73 100644
--- a/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java
+++ b/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java
@@ -762,7 +762,8 @@ public class TikaDetectionTest {
         assertEquals("text/html", tika.detect("x.htm"));
         assertEquals("text/plain", tika.detect("x.txt"));
         assertEquals("text/plain", tika.detect("x.text"));
-        assertEquals("text/plain", tika.detect("x.conf"));
+        // Differ from httpd - Use a dedicated mimetype for Config files
+        //assertEquals("text/plain", tika.detect("x.conf"));
         assertEquals("text/plain", tika.detect("x.def"));
         assertEquals("text/plain", tika.detect("x.list"));
         assertEquals("text/x-log", tika.detect("x.log"));


[tika] 01/04: TIKA-2554 Separate out Makefile from text/plain to a specific subtype

Posted by ni...@apache.org.

nick pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit db75e85fc9cf0d5f2c25f7eae2ff8deb59611b00
Author: Nick Burch <ni...@gagravarr.org>
AuthorDate: Thu Jan 25 14:42:28 2018 +0000

    TIKA-2554 Separate out Makefile from text/plain to a specific subtype
---
 .../resources/org/apache/tika/mime/tika-mimetypes.xml   | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
index 98c77ee..0a31c65 100644
--- a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
+++ b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
@@ -5945,6 +5945,22 @@
     <glob pattern="*.htm"/>
   </mime-type>
 
+  <mime-type type="text/x-makefile">
+    <_comment>Makefile build file</_comment>
+    <magic priority="20">
+      <!-- Only magic for default autoconf/automake produced ones -->
+      <match value="# Makefile.in generated by" type="string" offset="0"/>
+      <!-- Not exhaustive, and most people don't set this! -->
+      <match value="#!make" type="string" offset="0"/>
+      <match value="#!/usr/bin/make" type="string" offset="0"/>
+      <match value="#!/usr/local/bin/make" type="string" offset="0"/>
+      <match value="#!/usr/bin/env make" type="string" offset="0"/>
+    </magic>
+    <glob pattern="Makefile"/>
+    <glob pattern="GNUMakefile"/>
+    <sub-class-of type="text/plain"/>
+  </mime-type>
+
   <mime-type type="text/parityfec"/>
 
   <mime-type type="text/plain">
@@ -5973,7 +5989,6 @@
     <!-- TIKA-85: http://www.apache.org/dev/svn-eol-style.txt -->
     <glob pattern="INSTALL"/>
     <glob pattern="KEYS"/>
-    <glob pattern="Makefile"/>
     <glob pattern="README"/>
     <glob pattern="abs-linkmap"/>
     <glob pattern="abs-menulinks"/>

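
Note on the rule added in 01/04: the new text/x-makefile entry combines string magic at offset 0 with two literal file-name globs. The following is a minimal, self-contained sketch of that idea, not Tika's actual detector. The class name MakefileRuleSketch, the way the two signals are combined, and the text/plain fallback are assumptions made for illustration; only the prefixes and glob names are taken from the diff.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for the text/x-makefile rule; not Tika's real detector. */
final class MakefileRuleSketch {
    // Prefixes copied from the <match> elements in the commit (all at offset 0).
    private static final List<String> MAGIC_PREFIXES = Arrays.asList(
            "# Makefile.in generated by",
            "#!make",
            "#!/usr/bin/make",
            "#!/usr/local/bin/make",
            "#!/usr/bin/env make");

    // Literal file names copied from the <glob> elements in the commit.
    private static final Set<String> NAME_GLOBS =
            new HashSet<>(Arrays.asList("Makefile", "GNUMakefile"));

    /** True when the leading bytes start with one of the magic strings. */
    static boolean matchesMagic(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        return MAGIC_PREFIXES.stream().anyMatch(text::startsWith);
    }

    /** True when the bare file name equals one of the literal globs. */
    static boolean matchesGlob(String fileName) {
        return NAME_GLOBS.contains(fileName);
    }

    /** Assumed combination for this sketch: either signal selects the subtype. */
    static String detect(String fileName, byte[] head) {
        if (matchesMagic(head) || matchesGlob(fileName)) {
            return "text/x-makefile";
        }
        return "text/plain"; // fallback chosen for this sketch only
    }
}
```

As the comments in the diff say, the magic is not exhaustive, so in this sketch a hand-written makefile with an unusual name and no shebang line would still fall through to the fallback.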

[tika] 02/04: TIKA-2554 Separate out Config formats from text/plain to a specific subtype

Posted by ni...@apache.org.

nick pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit 1ba30ef32fee372566790650e9ab8a36bc9ab807
Author: Nick Burch <ni...@gagravarr.org>
AuthorDate: Thu Jan 25 14:46:24 2018 +0000

    TIKA-2554 Separate out Config formats from text/plain to a specific subtype
---
 .../main/resources/org/apache/tika/mime/tika-mimetypes.xml   | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
index 0a31c65..e35c133 100644
--- a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
+++ b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
@@ -5872,6 +5872,14 @@
     <sub-class-of type="text/plain"/>
   </mime-type>
 
+  <mime-type type="text/x-config">
+    <glob pattern="*.config"/>
+    <glob pattern="*.conf"/>
+    <glob pattern="*.cfg"/>
+    <glob pattern="*.xconf"/>
+    <sub-class-of type="text/plain"/>
+  </mime-type>
+
   <mime-type type="text/css">
     <_comment>Cascading Style Sheet</_comment>
     <glob pattern="*.css"/>
@@ -5980,8 +5988,6 @@
 
     <glob pattern="*.txt"/>
     <glob pattern="*.text"/>
-    <glob pattern="*.conf"/>
-    <glob pattern="*.cfg"/>
     <glob pattern="*.def"/>
     <glob pattern="*.list"/>
     <glob pattern="*.in"/>
@@ -5996,7 +6002,6 @@
     <glob pattern="*.ac"/>
     <glob pattern="*.am"/>
     <glob pattern="*.classpath"/>
-    <glob pattern="*.config"/>
     <glob pattern="*.cwiki"/>
     <glob pattern="*.data"/>
     <glob pattern="*.dcl"/>
@@ -6032,7 +6037,6 @@
     <glob pattern="*.wsdd"/>
     <glob pattern="*.xargs"/>
     <glob pattern="*.xcat"/>
-    <glob pattern="*.xconf"/>
     <glob pattern="*.xegrm"/>
     <glob pattern="*.xgrm"/>
     <glob pattern="*.xlex"/>

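
Note on the rule added in 02/04: the four config globs move from text/plain to a new text/x-config type that is declared sub-class-of text/plain. The sketch below models that with two small lookup tables, to show that a name such as x.conf now resolves to the more specific type while still counting as plain text through the parent link. It is an illustration only: the class ConfigSubtypeSketch, its methods, and the application/octet-stream fallback are assumptions of this sketch, not Tika's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of extension globs plus a sub-class-of hierarchy; not Tika code. */
final class ConfigSubtypeSketch {
    private static final Map<String, String> EXT_TO_TYPE = new HashMap<>();
    private static final Map<String, String> PARENT = new HashMap<>();

    static {
        // Globs listed under text/x-config in the commit.
        for (String ext : new String[] {"config", "conf", "cfg", "xconf"}) {
            EXT_TO_TYPE.put(ext, "text/x-config");
        }
        // Two of the globs that stay on text/plain.
        EXT_TO_TYPE.put("txt", "text/plain");
        EXT_TO_TYPE.put("text", "text/plain");
        // <sub-class-of type="text/plain"/> from the commit.
        PARENT.put("text/x-config", "text/plain");
    }

    /** Resolves a type from the text after the last dot in the file name. */
    static String detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        // Fallback for unknown extensions is an assumption of this sketch.
        return EXT_TO_TYPE.getOrDefault(ext, "application/octet-stream");
    }

    /** True when type equals ancestor or reaches it by following parent links. */
    static boolean isA(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model, code that compares the detected type against "text/plain" by string equality changes behaviour for .conf files, which matches the assertion disabled in 03/04, while code that checks the hierarchy with something like isA keeps working.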

[tika] 04/04: Resync with http://www.apache.org/dev/svn-eol-style.txt , adding new plain-text extensions from there

Posted by ni...@apache.org.

nick pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit d72ae53d2e1c767c1e5c6d150bb87b33829d10f0
Author: Nick Burch <ni...@gagravarr.org>
AuthorDate: Thu Jan 25 15:18:24 2018 +0000

    Resync with http://www.apache.org/dev/svn-eol-style.txt , adding new plain-text extensions from there
---
 .../main/resources/org/apache/tika/mime/tika-mimetypes.xml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
index e35c133..e3e61db 100644
--- a/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
+++ b/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
@@ -4329,6 +4329,7 @@
     </magic>
     <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
     <glob pattern="*.xhtml"/>
+    <glob pattern="*.xhtml2"/>
     <glob pattern="*.xht"/>
   </mime-type>
 
@@ -5995,16 +5996,23 @@
     <!-- TIKA-85: http://www.apache.org/dev/svn-eol-style.txt -->
     <glob pattern="INSTALL"/>
     <glob pattern="KEYS"/>
+    <glob pattern="LICENSE"/>
+    <glob pattern="NOTICE"/>
     <glob pattern="README"/>
     <glob pattern="abs-linkmap"/>
     <glob pattern="abs-menulinks"/>
     <glob pattern="*.aart"/>
     <glob pattern="*.ac"/>
     <glob pattern="*.am"/>
+    <glob pattern="*.apt"/>
+    <glob pattern="*.bsh"/>
     <glob pattern="*.classpath"/>
+    <glob pattern="*.cnd"/>
     <glob pattern="*.cwiki"/>
     <glob pattern="*.data"/>
     <glob pattern="*.dcl"/>
+    <glob pattern="*.dsp"/>
+    <glob pattern="*.dsw"/>
     <glob pattern="*.egrm"/>
     <glob pattern="*.ent"/>
     <glob pattern="*.ft"/>
@@ -6013,6 +6021,8 @@
     <glob pattern="*.grm"/>
     <glob pattern="*.g"/>
     <glob pattern=".htaccess"/>
+    <glob pattern="*.handlers"/>
+    <glob pattern="*.htc"/>
     <glob pattern="*.ihtml"/>
     <glob pattern="*.jmx"/>
     <glob pattern="*.junit"/>
@@ -6022,6 +6032,7 @@
     <glob pattern="*.mf"/>
     <glob pattern="*.MF"/>
     <glob pattern="*.meta"/>
+    <glob pattern="*.mdo"/>
     <glob pattern="*.n3"/>
     <glob pattern="*.pen"/>
     <glob pattern="*.pod"/>
@@ -6030,6 +6041,7 @@
     <glob pattern="*.rng"/>
     <glob pattern="*.rnx"/>
     <glob pattern="*.roles"/>
+    <glob pattern="*.schemas"/>
     <glob pattern="*.tld"/>
     <glob pattern="*.types"/>
     <glob pattern="*.vm"/>
@@ -6045,6 +6057,7 @@
     <glob pattern="*.xroles"/>
     <glob pattern="*.xsamples"/>
     <glob pattern="*.xsp"/>
+    <glob pattern="*.xtest"/>
     <glob pattern="*.xweb"/>
     <glob pattern="*.xwelcome"/>
   </mime-type>
@@ -6086,6 +6099,7 @@
     <glob pattern="*.t"/>
     <glob pattern="*.tr"/>
     <glob pattern="*.roff"/>
+    <glob pattern="*.nroff"/>
     <glob pattern="*.man"/>
     <glob pattern="*.me"/>
     <glob pattern="*.ms"/>
