Posted to dev@tika.apache.org by adityardesai <gi...@git.apache.org> on 2016/03/03 20:40:53 UTC

[GitHub] tika pull request: Fix for TIKA-1883 and 1884

GitHub user adityardesai opened a pull request:

    https://github.com/apache/tika/pull/87

    Fix for TIKA-1883 and 1884

    TIKA-1883
    Identify MIME types for empty files. This updates the Tika 1.12 source code: the Tika detector and parsers have been modified to identify empty files and classify them.
    
    TIKA-1884
    Update Tika's MIME repository with the following file types:
    1. .sfdu - Standard Formatted Data Unit
    2. .CDF - Common Data Format, with magic bytes CDF at offset 0
    
    The updated code is available at
    https://github.com/RashmiNalwad/MIME-Type-Identification-of-TREC-POLAR-DATASET
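The two checks described above (treating a zero-length input as its own case, and matching the magic bytes CDF at offset 0) can be illustrated with a minimal standalone sketch. This is not the code from the pull request and does not use the Tika API; the class name, method names, and the two MIME type strings for empty files and CDF are hypothetical placeholders chosen for illustration. The .sfdu type is left out because the description gives no magic bytes for it.

```java
import java.nio.charset.StandardCharsets;

public class MagicSketch {
    // Hypothetical placeholder labels, not taken from Tika's MIME repository.
    static final String EMPTY_TYPE = "application/x-empty-placeholder";
    static final String CDF_TYPE = "application/x-cdf-placeholder";
    // Generic fallback for unrecognised binary data.
    static final String FALLBACK_TYPE = "application/octet-stream";

    // Magic bytes "CDF", expected at offset 0 as stated in the description.
    private static final byte[] CDF_MAGIC = "CDF".getBytes(StandardCharsets.US_ASCII);

    /** Classifies the leading bytes of a file. */
    static String detect(byte[] prefix) {
        // Empty input is handled before any magic matching is attempted.
        if (prefix == null || prefix.length == 0) {
            return EMPTY_TYPE;
        }
        if (startsWith(prefix, CDF_MAGIC)) {
            return CDF_TYPE;
        }
        return FALLBACK_TYPE;
    }

    /** Returns true if data begins with magic at offset 0. */
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[0]));
        System.out.println(detect("CDF-data".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detect("other".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Checking for the empty case first keeps a zero-length file from falling through to the generic fallback, which appears to be the behaviour the TIKA-1883 part of the description is aiming to change.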
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tika master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/87.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #87
    
----
commit 1a3749fa632fdb8ad0bcb2cea673113031f9b4be
Author: Chris Mattmann <ma...@apache.org>
Date:   2015-06-25T17:54:55Z

    Fix for TIKA-1659 ZipContainerDetector does not detect all IPA files contributed by Rami Shomali <ra...@lookout.com> this closes #51.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1687594 13f79535-47bb-0310-9956-ffa450edef68

commit 90a2202b5b4a75e7f673bfb42a912cb97ae6d26e
Author: Tim Allison <ta...@apache.org>
Date:   2015-06-28T01:57:30Z

    TIKA-1663 add a DigestingParser
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1687981 13f79535-47bb-0310-9956-ffa450edef68

commit 444dadd5eb090f6e2998507e444b2014905cb90f
Author: Chris Mattmann <ma...@apache.org>
Date:   2015-06-29T05:19:48Z

    Fix for TIKA-1664: GDALParser now correctly sets nitf as a supported media type contributed by Joseph North <jo...@gmail.com> this closes #53.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688086 13f79535-47bb-0310-9956-ffa450edef68

commit 761273f9e69c4a7595e50ccd6a2d9304c398d0b1
Author: Chris Mattmann <ma...@apache.org>
Date:   2015-06-29T05:26:52Z

    Fix for TIKA-1669: xpath node test ./node() should match all contained nodes contributed by WulfB <wu...@inacta.ch> this closes #52
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688087 13f79535-47bb-0310-9956-ffa450edef68

commit fd8514c2c512d9dcc1039aadf1dbc64c1ff6d3fc
Author: Chris Mattmann <ma...@apache.org>
Date:   2015-06-29T14:34:29Z

    Rollback r1688087 as it seems to cause some tests to fail.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688239 13f79535-47bb-0310-9956-ffa450edef68

commit 2a47d9aa340d529f027c94f3c233645fb2f8bf7e
Author: Tim Allison <ta...@apache.org>
Date:   2015-06-30T00:48:03Z

    TIKA-1601: integrate Jackcess to parse MSAccess files
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688337 13f79535-47bb-0310-9956-ffa450edef68

commit 06cfbaafeb308bd979fd2214a4b1a15353a9b4ab
Author: Chris Mattmann <ma...@apache.org>
Date:   2015-07-01T13:21:41Z

    Fix for TIKA-1602: Detecting standards-non-compliant emails as message/rfc822 contributed by Jeremy B. Merrill <je...@nytimes.com> this closes #40.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688647 13f79535-47bb-0310-9956-ffa450edef68

commit 425506e90500dadcccf82fd66aa15ce14d23facc
Author: Tyler Palsulich <tp...@apache.org>
Date:   2015-07-02T08:13:00Z

    TIKA-1536. Upgrade to Java 1.7.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688779 13f79535-47bb-0310-9956-ffa450edef68

commit 4695df5672492c38a8abcd230c8545f982a7f65d
Author: Tyler Palsulich <tp...@apache.org>
Date:   2015-07-02T08:14:48Z

    TIKA-1536. Update CHANGES.txt with upgrade to Java 7.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688780 13f79535-47bb-0310-9956-ffa450edef68

commit de5a2dec6924ebe01e4bf323a98abd208cf9aa7e
Author: Nick Burch <ni...@apache.org>
Date:   2015-07-02T10:35:06Z

    Remove change comment, TIKA-1602
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688805 13f79535-47bb-0310-9956-ffa450edef68

commit 2764fb8606964c3350c781ecf5df4042706b4099
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-02T13:47:23Z

    TIKA-1673 drop source file name from embedded file path; made a few java 7 updates; added timing for embedded docs
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688827 13f79535-47bb-0310-9956-ffa450edef68

commit f2218da2a8fff4160c3a3931d4daa7acd2bef36a
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-02T13:50:25Z

    TIKA-1673 -- doh, add back dropped qmino in server's pom
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688828 13f79535-47bb-0310-9956-ffa450edef68

commit 9688c77128e5c3d53c1eeb4e72db62ff10dfafda
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-02T14:19:15Z

    TIKA-1400
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1688834 13f79535-47bb-0310-9956-ffa450edef68

commit 165eebcb5bcc2f6b5768bcdec5ce6496c3ca5093
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-07T14:21:50Z

    TIKA-1674: initial commit to add example of how to extract embedded files
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1689690 13f79535-47bb-0310-9956-ffa450edef68

commit 6029c0d5c4e07ff217fcd43a6c1e81170acf603e
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-09T13:02:27Z

    TIKA-1676
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1690090 13f79535-47bb-0310-9956-ffa450edef68

commit 320b2890038dfcb9191ba856c0e84e368851d24e
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-16T00:26:56Z

    TIKA-1681
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1691296 13f79535-47bb-0310-9956-ffa450edef68

commit 898f300c87b629590b2bd3bccca5ce0c05330438
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-16T00:55:40Z

    TIKA-1684
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1691297 13f79535-47bb-0310-9956-ffa450edef68

commit 4dcfb74cf702aa88a94f58ca79ede313dfb45852
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-16T01:17:08Z

    TIKA-1685 clean up easily cleaned up deprecations
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1691299 13f79535-47bb-0310-9956-ffa450edef68

commit 4a20585c3338ad95a37499ce11fde65ec3d3b2d7
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-16T01:53:26Z

    TIKA-1687 upgrade xerial.org's sqlite-jdbc to 3.8.10.1
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1691302 13f79535-47bb-0310-9956-ffa450edef68

commit c11eee52587227fa4579436d6f408a4758683520
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-20T15:27:32Z

    TIKA-1238: Update OutlookExtractor's codepoint detection algorithm
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1691962 13f79535-47bb-0310-9956-ffa450edef68

commit 9a8798b8e19b2149e1f5b678804f328dd0be6686
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-21T01:34:25Z

    TIKA-1678 -- initial commit.  Need to wait for fix to PDFBOX-2896 to generate test file.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692042 13f79535-47bb-0310-9956-ffa450edef68

commit 9c04fa667610016da957cf3ff2afb6d7231b02c8
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-21T12:44:59Z

    TIKA-1683 -- add encryption support for Jackcess
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692099 13f79535-47bb-0310-9956-ffa450edef68

commit 0658ee67d6b8faa31d99ce26a3d4067804315c91
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-21T12:45:57Z

    TIKA-1683 -- add encryption support for Jackcess, this time with test document
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692100 13f79535-47bb-0310-9956-ffa450edef68

commit d2e68d44cb224ec2077d5cc4961420a5cc78de36
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-22T14:30:38Z

    TIKA-1692 : allow MimeTypes to look for a registered mime type that may or may not have parameters.
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692283 13f79535-47bb-0310-9956-ffa450edef68

commit 28149a6e9c700e6c81f266273cf3381ba5805526
Author: Nick Burch <ni...@apache.org>
Date:   2015-07-23T00:07:24Z

    Tweak the getRegisteredMimeType javadocs a little bit, to try to make it clearer
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692337 13f79535-47bb-0310-9956-ffa450edef68

commit 5f3acd22272dcaf8b946feecaf0e1bbebb6489fb
Author: Nick Burch <ni...@apache.org>
Date:   2015-07-23T00:10:22Z

    Fix some javadoc warnings
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692338 13f79535-47bb-0310-9956-ffa450edef68

commit 7cdf08c56bbd381b8b216e8e20183fad149aa984
Author: Nick Burch <ni...@apache.org>
Date:   2015-07-23T00:21:03Z

    Fix some javadoc warnings
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692339 13f79535-47bb-0310-9956-ffa450edef68

commit a4baebe62c9a29c71fac00b59657f508c681edbf
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-23T01:08:59Z

    TIKA-1588 upgrade to PDFBox 1.8.10
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692341 13f79535-47bb-0310-9956-ffa450edef68

commit 98672cdd92b6325ff78c763955a7c045b364095b
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-23T13:00:00Z

    TIKA-1690: revert changes made in r1678515 that added fileUrl capability in tika-server
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692383 13f79535-47bb-0310-9956-ffa450edef68

commit 194a301ce3786a40683213aa3d0e4ae7155d747a
Author: Tim Allison <ta...@apache.org>
Date:   2015-07-23T17:26:58Z

    TIKA-1667: upgrade to POI 3.13-beta1
    
    git-svn-id: https://svn.apache.org/repos/asf/tika/trunk@1692422 13f79535-47bb-0310-9956-ffa450edef68

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tika pull request: Fix for TIKA-1883 and 1884

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/87

