commits@tika.apache.org, 2016-12

- tika git commit: TIKA-2187 -- fixed test - posted by ta...@apache.org on 2016/12/01 00:20:56 UTC, 0 replies.
- tika git commit: TIKA-2187 -- make "ignore deleted" as the default in the experimental SAX .docx parser and update the WordExtractor to include extraction of deleted text if requested by the user. - posted by ta...@apache.org on 2016/12/01 00:25:18 UTC, 0 replies.
- [1/4] tika git commit: TIKA-2090 -- first draft - posted by ta...@apache.org on 2016/12/01 00:38:55 UTC, 0 replies.
- [2/4] tika git commit: TIKA-2090 -- add more areas where javascript might live and add ability to turn action extraction on/off - posted by ta...@apache.org on 2016/12/01 00:38:56 UTC, 0 replies.
- [3/4] tika git commit: Merge branch 'pdf_javascript' - posted by ta...@apache.org on 2016/12/01 00:38:57 UTC, 0 replies.
- [4/4] tika git commit: TIKA-2090 -- add ability to extract PDActions from PDF files - posted by ta...@apache.org on 2016/12/01 00:38:58 UTC, 0 replies.
- tika git commit: TIKA-2090: Allow extraction of PDActions (including Javascript) from PDFs (TIKA-2090). - posted by ta...@apache.org on 2016/12/01 00:47:35 UTC, 0 replies.
- svn commit: r1772248 - /tika/site/src/site/resources/doap.rdf - posted by ni...@apache.org on 2016/12/01 18:52:50 UTC, 0 replies.
- svn commit: r1772249 - /tika/site/publish/doap.rdf - posted by ni...@apache.org on 2016/12/01 18:55:09 UTC, 0 replies.
- Attached document - posted by ca...@apache.org on 2016/12/02 19:24:01 UTC, 0 replies.
- [1/7] tika git commit: TIKA-2191 -- step1 -- add other docx tests and comment/ignore where appropriate - posted by ta...@apache.org on 2016/12/06 14:06:44 UTC, 0 replies.
- [2/7] tika git commit: TIKA-2191 -- step2 -- add handling for docm files...extract macros - posted by ta...@apache.org on 2016/12/06 14:06:45 UTC, 0 replies.
- [3/7] tika git commit: TIKA-2191 -- step 3 -- clean up and tag handling - posted by ta...@apache.org on 2016/12/06 14:06:46 UTC, 0 replies.
- [4/7] tika git commit: TIKA-2191 -- step 4-- add markup for embedded pics - posted by ta...@apache.org on 2016/12/06 14:06:47 UTC, 0 replies.
- [5/7] tika git commit: TIKA-2191 -- step 5 actually extract images embedded in areas besides the body of docx/m - posted by ta...@apache.org on 2016/12/06 14:06:48 UTC, 0 replies.
- [6/7] tika git commit: TIKA-2192 - add extraction of embedded objects in DOM docx parser from more than just main document - posted by ta...@apache.org on 2016/12/06 14:06:49 UTC, 0 replies.
- [7/7] tika git commit: update changes for TIKA-2191 and TIKA-2192 - posted by ta...@apache.org on 2016/12/06 14:06:50 UTC, 0 replies.
- tika git commit: TIKA-2191 - step 6(?) add list numbering, bookmarks and styles - posted by ta...@apache.org on 2016/12/07 20:32:49 UTC, 0 replies.
- Message from "RNP002F3E9365CA" - posted by do...@apache.org on 2016/12/08 14:19:02 UTC, 0 replies.
- tika git commit: remove println...the horror...ugh - posted by ta...@apache.org on 2016/12/08 19:11:28 UTC, 0 replies.
- tika git commit: TIKA-2191: fixes after regression testing on TIKA_1302 corpus: 1) add 'cr' and 'br' and 2) add 'template' to potential main story body parts - posted by ta...@apache.org on 2016/12/12 12:20:59 UTC, 0 replies.
- tika git commit: TIKA-2191: fixes after regression testing on TIKA_1302 corpus: 1) add 'cr' and 'br' and 2) add 'template' to potential main story body parts -- git add test file. - posted by ta...@apache.org on 2016/12/12 13:26:10 UTC, 0 replies.
- tika git commit: TIKA-2191: convert Styles reader to SAX and store only styleId->styleName map. - posted by ta...@apache.org on 2016/12/12 15:42:06 UTC, 0 replies.
- tika git commit: TIKA-2195: refactor MockParser to consolidate service loading and custom mime type into tica-core/src/text - posted by ta...@apache.org on 2016/12/13 01:21:40 UTC, 0 replies.
- tika git commit: TIKA-2173: improve configuration of PDFParser via @Field - posted by ta...@apache.org on 2016/12/13 02:41:19 UTC, 0 replies.
- tika git commit: TIKA-2191 -- optimize branching in start and endElement based on corpus statistics - posted by ta...@apache.org on 2016/12/14 18:14:16 UTC, 0 replies.
- tika git commit: Update to PDFBox 2.0.4 - posted by gr...@apache.org on 2016/12/16 16:10:50 UTC, 1 reply.
- [1/2] tika git commit: TIKA-2210 -- add experimental SAX parser for pptx -- this is a first cut. More refactoring is in order. - posted by ta...@apache.org on 2016/12/17 00:47:04 UTC, 0 replies.
- [2/2] tika git commit: TIKA-2210 -- add experimental SAX parser for pptx -- this is a first cut. More refactoring is in order. - posted by ta...@apache.org on 2016/12/17 00:47:05 UTC, 0 replies.
- tika git commit: TIKA-2218 -- add a few more places where .pptx can include embedded objects - posted by ta...@apache.org on 2016/12/19 21:06:22 UTC, 0 replies.
- [1/2] tika git commit: TIKA-2218 -- add a new new locations within a pptx to check for embedded objects - posted by ta...@apache.org on 2016/12/19 21:08:36 UTC, 0 replies.
- [2/2] tika git commit: Merge remote-tracking branch 'origin/2.x' into 2.x - posted by ta...@apache.org on 2016/12/19 21:08:37 UTC, 0 replies.
- [1/2] tika git commit: TIKA-2220 - refactor new sax pptx and docx to reduce code duplication. - posted by ta...@apache.org on 2016/12/20 18:16:31 UTC, 0 replies.
- [2/2] tika git commit: TIKA-2220 - refactor new sax pptx and docx to reduce code duplication. - posted by ta...@apache.org on 2016/12/20 18:16:32 UTC, 0 replies.
- tika git commit: TIKA-2221 -- correctly catch and convert encrypted document exception to EncryptedDocumentException in WordParser via Matthew Caruana Galizia - posted by ta...@apache.org on 2016/12/20 18:27:25 UTC, 0 replies.
- tika git commit: TIKA-2221 -- correctly catch and rethrow encrypted document exception as EncryptedDocumentException in WordExtractor via Matthew Caruana Galizia - posted by ta...@apache.org on 2016/12/20 18:30:07 UTC, 0 replies.
- [1/2] tika git commit: remove printlns in ZeroSizeFileDetectorTest - posted by ta...@apache.org on 2016/12/20 19:22:29 UTC, 0 replies.
- [2/2] tika git commit: TIKA-2219 - make sure to transmit encoding name in detectAll() via Pascal Essiembre - posted by ta...@apache.org on 2016/12/20 19:22:30 UTC, 0 replies.
- tika git commit: TIKA-2219 make sure to transmit charset name in detectAll via Pascal Essiembre - posted by ta...@apache.org on 2016/12/20 19:24:57 UTC, 0 replies.
- [1/4] tika git commit: [TIKA-2189] fix for Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java - posted by ta...@apache.org on 2016/12/20 21:32:52 UTC, 0 replies.
- [2/4] tika git commit: Merge branch 'bug_TIKA-2189' of https://github.com/dasbipulkumar/tika - posted by ta...@apache.org on 2016/12/20 21:32:53 UTC, 0 replies.
- [3/4] tika git commit: clean up tabs - posted by ta...@apache.org on 2016/12/20 21:32:54 UTC, 0 replies.
- [4/4] tika git commit: TIKA-2190 -- add configurability for preserve interword spacing - posted by ta...@apache.org on 2016/12/20 21:32:55 UTC, 0 replies.
- [1/2] tika git commit: TIKA-2219 make sure to transmit charset name in detectAll via Pascal Essiembre -- fix test method to get inputstream from zip - posted by ta...@apache.org on 2016/12/20 21:34:17 UTC, 0 replies.
- [2/2] tika git commit: TIKA 2190 -- add configurability for preserve interword spacing - posted by ta...@apache.org on 2016/12/20 21:34:18 UTC, 0 replies.
- tika git commit: update OCR config to include default for output type - posted by ta...@apache.org on 2016/12/20 21:37:01 UTC, 0 replies.
- tika git commit: add comment on outputType and trigger close of TIKA-2189. This closes #139. - posted by ta...@apache.org on 2016/12/20 21:40:47 UTC, 0 replies.
- [1/2] tika git commit: TIKA-2211 modify test file to include style information to test that we're excluding it. - posted by ta...@apache.org on 2016/12/21 14:11:26 UTC, 0 replies.
- [2/2] tika git commit: TIKA-2211 -- make sure that head (