Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/01/21 14:13:50 UTC

[GitHub] [tika] tballison commented on a change in pull request #397: Tika 3272 - Remove usage of rotation.py and Python dependency

tballison commented on a change in pull request #397:
URL: https://github.com/apache/tika/pull/397#discussion_r561910483



##########
File path: tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
##########
@@ -239,25 +239,20 @@ private void parse(TikaInputStream tikaInputStream, File tmpOCROutputFile,
             if (size >= config.getMinFileSizeToOcr() && size <= config.getMaxFileSizeToOcr()) {
 
             	// Process image
-            	if (config.isEnableImageProcessing()) {
+            	if (config.isEnableImageProcessing() || config.isApplyRotation()) {

Review comment:
       I think this makes more sense than the consistency check. If the user enabled either option, run the image processing step.
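
To make the effect of the new condition concrete, here is a minimal sketch of the gating logic in isolation. The class and method names are hypothetical and do not exist in Tika; the two boolean parameters stand in for `config.isEnableImageProcessing()` and `config.isApplyRotation()` from the diff above.

```java
// Hypothetical helper, not part of TesseractOCRParser: it only mirrors the
// condition in the diff so the four flag combinations can be checked.
final class ImageProcessingGate {

    private ImageProcessingGate() {
    }

    /**
     * Returns true when the image should go through the processing step,
     * i.e. when the user enabled image processing, rotation, or both.
     */
    static boolean shouldProcessImage(boolean enableImageProcessing, boolean applyRotation) {
        return enableImageProcessing || applyRotation;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean processing : values) {
            for (boolean rotation : values) {
                System.out.printf("enableImageProcessing=%b applyRotation=%b -> process=%b%n",
                        processing, rotation, shouldProcessImage(processing, rotation));
            }
        }
    }
}
```

Only the combination where both flags are off skips the step, so a user who sets just one of the two options still gets what they asked for.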

##########
File path: tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
##########
@@ -43,7 +43,7 @@
 import org.xml.sax.helpers.DefaultHandler;
 
 import javax.imageio.ImageIO;
-import java.awt.Image;
+import java.awt.*;

Review comment:
       Please avoid wildcard imports and list each imported class explicitly.
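
For illustration, a small sketch of what explicit imports look like. Which `java.awt` classes the PR actually needs is an assumption here; the class and method below are hypothetical and only exist to give the imports something to do.

```java
// Explicit imports: one line per class, so the file's dependencies are visible
// at a glance. The classes chosen here are an assumption, not taken from the PR.
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;

// Hypothetical example class, not part of Tika.
final class ExplicitImportsExample {

    private ExplicitImportsExample() {
    }

    /** Draws the given image onto a new RGB BufferedImage of the requested size. */
    static BufferedImage toBufferedImage(Image image, int width, int height) {
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(image, 0, 0, null);
        } finally {
            // Always release the graphics context, even if drawing fails.
            g.dispose();
        }
        return out;
    }
}
```

Note that `import java.awt.*;` would not cover `java.awt.image.BufferedImage` anyway, since a wildcard import does not include subpackages. A wildcard can also introduce ambiguous names, for example `List` when `java.util.*` is imported in the same file.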

##########
File path: tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/tess4j/ImageUtil.java
##########
@@ -0,0 +1,93 @@
+package org.apache.tika.parser.ocr.tess4j;

Review comment:
       Ditto, as for ImageDeskew.java: please add the Apache License header and a comment along the lines of "copied from...".

##########
File path: tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/tess4j/ImageDeskew.java
##########
@@ -0,0 +1,135 @@
+package org.apache.tika.parser.ocr.tess4j;

Review comment:
       Please add the Apache License header.
   
   Also, please add a comment along the lines of "copied and pasted from", followed by a link to the tess4j code.
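
As a sketch of what is being asked for, the top of such a file could look like the following. The header is the standard ASF source header; the wording of the provenance comment and the class name are assumptions, and the link is left as a placeholder to be filled in with the actual tess4j source location. In a real file the `package` declaration would follow the header.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Copied and pasted from tess4j: (placeholder, link to the tess4j source file goes here)
 */
// Hypothetical class name used only so this skeleton compiles on its own.
final class LicensedFileSkeleton {
}
```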

##########
File path: tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/tess4j/ImageUtil.java
##########
@@ -0,0 +1,93 @@
+package org.apache.tika.parser.ocr.tess4j;
+
+
+import org.apache.tika.parser.ocr.TesseractOCRParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.awt.*;

Review comment:
       Please avoid wildcard imports.



