Posted to user@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2022/01/19 01:16:37 UTC

Detecting angle of rotation

Last year, I contributed some changes to Tika to remove the dependency on Python and rotation.py. Instead, Tika now uses code from Tess4J to work out how far a document is rotated, and then uses ImageMagick to correct the rotation.
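
For anyone not familiar with the second step, here is a minimal sketch of what handing a measured angle to ImageMagick could look like. This is not Tika's actual invocation: the class RotateCommand and the method buildRotateCommand are hypothetical names made up for illustration. It assumes the legacy "convert" binary is on the PATH (ImageMagick 7 installs it as "magick"), and whether you pass the measured angle or its negation depends on the sign convention of whatever measured it, so check that before relying on it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class RotateCommand {

    // Builds the argument list for: convert <in> -rotate <degrees> <out>
    // Assumption: "convert" is the ImageMagick binary available on the PATH.
    public static List<String> buildRotateCommand(Path in, Path out, double degrees) {
        return Arrays.asList(
                "convert",
                in.toString(),
                "-rotate", String.valueOf(degrees),
                out.toString());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RotateCommand <input> <output> <degrees>
        List<String> cmd = buildRotateCommand(
                Paths.get(args[0]), Paths.get(args[1]), Double.parseDouble(args[2]));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Keeping the command construction in its own method makes it easy to log or unit-test the exact arguments without ImageMagick installed.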

I just found some cases where this code is not working. I don't know enough about the math behind it, so hopefully someone can help.

Below is some test code, which is the same as what Tika uses. For the attached file it reports a rotation of -6.8 (the unredacted version reports -10), even though it should be 0. Any idea why it's not calculating the angle correctly?

package com.torchai.service.textextractor.service;

import org.apache.tika.parser.ocr.tess4j.ImageDeskew;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GetAngle {

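    private static double getAngle(Path sourceFile) throws IOException {
        BufferedImage bi = ImageIO.read(sourceFile.toFile());
        // ImageIO.read returns null when no registered reader can decode the file
        if (bi == null) {
            throw new IOException("Could not decode image: " + sourceFile);
        }
        ImageDeskew id = new ImageDeskew(bi);
        double angle = id.getSkewAngle();
        // Treat anything within +/-1 of zero as "not rotated"
        if (angle < 1.0D && angle > -1.0D) {
            angle = 0.0D;
        } else {
            System.out.println("*** angle: " + angle);
        }

        return angle;
    }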

    public static void main(String[] args) throws IOException {
        Path path = Paths.get( "/testFiles", "apache-tika-3035541828217823624.tmp");
//        Path path = Paths.get( "/testFiles", "skewed-02_image_text.png");
        System.out.println("*** path: " + path);
        System.out.println("*** getAngle: " + getAngle(path));

    }
}
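
One way to narrow this down might be to compare the reported angle against an independent estimate on the same image. The sketch below is NOT what ImageDeskew does internally. It is a simple projection-profile estimator that I am swapping in purely as a cross-check: for each candidate angle it projects the dark pixels along lines tilted by that angle and picks the angle where the profile is most sharply peaked. The class SkewCrossCheck and all of its method names are hypothetical, as are the -20..20 search range, the 0.2 step and the 128 darkness threshold. The synthetic page uses plain bars as stand-ins for text lines, and positive angles here follow Java2D's Graphics2D.rotate convention, which may or may not match the sign ImageDeskew reports.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class SkewCrossCheck {

    // Draws dark bars (stand-ins for text lines) on a white page,
    // rotated by `degrees` around the page centre.
    public static BufferedImage syntheticPage(int w, int h, double degrees) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, w, h);
        g.rotate(Math.toRadians(degrees), w / 2.0, h / 2.0);
        g.setColor(Color.BLACK);
        for (int y = h / 4; y < 3 * h / 4; y += 20) {
            g.fillRect(w / 4, y, w / 2, 6);
        }
        g.dispose();
        return img;
    }

    // Returns the candidate angle (in degrees, -20..20 in 0.2 steps) whose
    // projection profile has the largest sum of squared bin counts.
    public static double estimateSkew(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();

        // Collect coordinates of dark pixels (mean of R, G, B below 128).
        List<int[]> dark = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int lum = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                if (lum < 128) {
                    dark.add(new int[] {x, y});
                }
            }
        }

        double bestAngle = 0.0;
        long bestScore = -1;
        for (int i = -100; i <= 100; i++) {
            double degrees = i * 0.2;
            double t = Math.tan(Math.toRadians(degrees));
            // Offset by w so the bin index stays non-negative for |angle| <= 20.
            long[] bins = new long[h + 2 * w];
            for (int[] p : dark) {
                bins[(int) Math.round(p[1] - p[0] * t) + w]++;
            }
            long score = 0;
            for (long b : bins) {
                score += b * b;
            }
            if (score > bestScore) {
                bestScore = score;
                bestAngle = degrees;
            }
        }
        return bestAngle;
    }

    public static void main(String[] args) {
        System.out.println("flat page:   " + estimateSkew(syntheticPage(400, 300, 0.0)));
        System.out.println("tilted page: " + estimateSkew(syntheticPage(400, 300, 5.0)));
    }
}
```

If an estimator like this reports roughly 0 on the problem file while getSkewAngle() reports -6.8, that would at least suggest the image content (for example the redaction boxes) is influencing the detection, rather than the page actually being rotated.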




Peter Kronenberg  |  Senior AI Analytic ENGINEER