Posted to commits@tika.apache.org by ta...@apache.org on 2019/07/18 18:36:59 UTC

[tika] branch master updated (477a8ca -> 620134b)

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 477a8ca  TIKA-2908 -- reorder closing of streams in tesseract parser
     new 806489b  Update changes.txt to reflect TIKA-2908.
     new 620134b  TIKA-2899 -- improve robustness of list handling in the RTFParser

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt                                                        | 5 ++++-
 tika-core/src/test/java/org/apache/tika/TikaTest.java              | 7 +++++++
 .../src/main/java/org/apache/tika/parser/rtf/TextExtractor.java    | 4 +++-
 3 files changed, 14 insertions(+), 2 deletions(-)


[tika] 02/02: TIKA-2899 -- improve robustness of list handling in the RTFParser

Posted by ta...@apache.org.

tallison pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit 620134b90b72632fee486ba3aa5b25ff6b271d22
Author: TALLISON <ta...@apache.org>
AuthorDate: Thu Jul 18 14:35:13 2019 -0400

    TIKA-2899 -- improve robustness of list handling in the RTFParser
---
 tika-core/src/test/java/org/apache/tika/TikaTest.java              | 7 +++++++
 .../src/main/java/org/apache/tika/parser/rtf/TextExtractor.java    | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tika-core/src/test/java/org/apache/tika/TikaTest.java b/tika-core/src/test/java/org/apache/tika/TikaTest.java
index 0aaaf35..91e6dc7 100644
--- a/tika-core/src/test/java/org/apache/tika/TikaTest.java
+++ b/tika-core/src/test/java/org/apache/tika/TikaTest.java
@@ -29,6 +29,7 @@ import java.io.IOException;
 import java.io.InputStream;
 import java.net.URISyntaxException;
 import java.net.URL;
+import java.nio.file.Path;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashSet;
@@ -241,6 +242,12 @@ public abstract class TikaTest {
         }
     }
 
+    protected List<Metadata> getRecursiveMetadata(Path p, boolean suppressException) throws Exception {
+        try (TikaInputStream tis = TikaInputStream.get(p)) {
+            return getRecursiveMetadata(tis, new ParseContext(), new Metadata(), suppressException);
+        }
+    }
+
     protected List<Metadata> getRecursiveMetadata(InputStream is, boolean suppressException) throws Exception {
         return getRecursiveMetadata(is, new ParseContext(), new Metadata(), suppressException);
     }
diff --git a/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java b/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java
index e2733b2..4758f2d 100644
--- a/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java
@@ -865,7 +865,6 @@ final class TextExtractor {
 
     // Handle control word that takes a parameter:
     private void processControlWord(int param, PushbackInputStream in) throws IOException, SAXException, TikaException {
-
         // TODO: afN?  (associated font number)
 
         // TODO: do these alter text output...?
@@ -1245,6 +1244,9 @@ final class TextExtractor {
             if (!ignored) {
                 endParagraph(true);
             }
+            if (inList()) { // && (groupStates.size() == 1 || groupStates.peekLast().list < 0))
+                pendingListEnd();
+            }
         } else if (equals("shptxt")) {
             pushText();
             // Text inside a shape
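
The TikaTest.java hunk above adds a Path overload that opens a stream in a try-with-resources block and delegates to the existing InputStream overload. A minimal, self-contained sketch of that pattern follows. It is an illustration only: the class name PathOverloadSketch and the countBytes helpers are hypothetical, and Files.newInputStream stands in for TikaInputStream.get(p) so the example runs without Tika on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class, not part of Tika: illustrates the Path-overload pattern only.
public class PathOverloadSketch {

    // Stand-in for the existing InputStream-based helper: counts the bytes in the stream.
    public static long countBytes(InputStream is) throws IOException {
        long total = 0;
        byte[] buf = new byte[4096];
        int read;
        while ((read = is.read(buf)) != -1) {
            total += read;
        }
        return total;
    }

    // Path overload: opens the stream, delegates to the InputStream overload,
    // and closes the stream even if the delegate throws.
    public static long countBytes(Path p) throws IOException {
        try (InputStream is = Files.newInputStream(p)) {
            return countBytes(is);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".rtf");
        try {
            Files.write(tmp, "{\\rtf1 hello}".getBytes(StandardCharsets.US_ASCII));
            System.out.println(countBytes(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Keeping the resource handling in the Path overload means callers that pass a Path never have to close anything themselves, while callers that already hold an InputStream keep ownership of it.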


[tika] 01/02: Update changes.txt to reflect TIKA-2908.

Posted by ta...@apache.org.

tallison pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit 806489b59aebe4208cd21ad40fa0cf2c9e0e35e7
Author: TALLISON <ta...@apache.org>
AuthorDate: Thu Jul 18 12:05:56 2019 -0400

    Update changes.txt to reflect TIKA-2908.
---
 CHANGES.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index c30d592..e3d3ea6 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -7,9 +7,12 @@ Release 2.0.0 - ???
 
 Release 1.22 - ???
 
-   * Known regression: PDFBOX-4587 PDF passwords with codepoints
+   * NOTE: Known regression: PDFBOX-4587 -- PDF passwords with codepoints
      between 0xF000 and 0XF0000 will cause an exception.
 
+   * Fix order of closing streams to avoid "Failed to close temporary resource"
+     exception (TIKA-2908).
+
    * Improve AutoDetectReader performance by caching encoding
      detector (TIKA-1568).