Posted to user@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2021/02/10 17:34:37 UTC

Error calling ImageMagick

I think yesterday's code introduced a bug.  The temporary file that is created for ImageMagick is not there.


[main] INFO org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract is installed and is being invoked. This can add greatly to processing time.  If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
magick: no images found for operation `-resize' at CLI arg 9 @ error/operation.c/CLIOption/5361.
[main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - ImageMagick failed (commandline: [magick, -density, 300, -depth, 4, -colorspace, gray, -filter, triangle, -resize, 200%, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp])
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
            at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
            at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
            at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
            at org.apache.tika.parser.ocr.ImagePreprocessor.process(ImagePreprocessor.java:121)
            at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:280)
            at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:248)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
            at org.apache.tika.parser.image.AbstractImageParser.parse(AbstractImageParser.java:94)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
            at org.torchai.ImageMagick.parse(ImageMagick.java:43)
            at org.torchai.ImageMagick.main(ImageMagick.java:56)
Text: MARLEY was dead, to begin with. There is no doubt whatever about
that. The register of his burial was signed by the clergyman, the clerk,
the undertaker, and the chief mourner. Scrooge signed it. And
Scrooge's name was good upon 'Change, for anything he chose to put
his hand to.


Here's the code:

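import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public static String parse(String file) throws TikaException, SAXException, IOException {

    final AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
    final ParseContext parseContext = new ParseContext();

    // Per-parse Tesseract settings; image processing is the ImageMagick
    // preprocessing step (ImagePreprocessor) that fails in the stack trace
    final TesseractOCRConfig tessConfig = new TesseractOCRConfig();
    parseContext.set(AutoDetectParser.class, parser);
    parseContext.set(TesseractOCRConfig.class, tessConfig);
    tessConfig.setEnableImageProcessing(true);

    ContentHandler contentHandler = new BodyContentHandler();
    Metadata metadata = new Metadata();

    // The stream is closed by try-with-resources
    try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {
        parser.parse(stream, contentHandler, metadata, parseContext);
    }

    return contentHandler.toString();
}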


RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
So actually, TikaInputStream.get(File) is deprecated, but TikaInputStream.get(Path) works.

In point of fact, this code doesn't appear in our product code, because there we're dealing with streams, so I'm usually just calling TikaInputStream.get(InputStream).

-----Original Message-----
From: Peter Kronenberg <pe...@torch.ai> 
Sent: Thursday, February 11, 2021 5:17 PM
To: user@tika.apache.org; tallison@apache.org
Subject: RE: Error calling ImageMagick

No, not seeing that anymore.  I thought it might have been related to the ImageMagick thing, because they both seemed to have to do with temp files.  But obviously, that wasn't really the case.
So not sure what was causing that, but I don't see it anymore.

And thanks for the coding hint.  Wasn't sure if TikaInputStream automatically did the buffering.

-----Original Message-----
From: Tim Allison <ta...@apache.org>
Sent: Thursday, February 11, 2021 4:43 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick

Are you still seeing Tesseract txt files piling up?  I'm not able to reproduce this on Windows/Linux/Mac.

This shouldn't cause a problem, but this:

try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {

is more efficient if you do this:

try (TikaInputStream stream = TikaInputStream.get(file)) {

On Wed, Feb 10, 2021 at 2:23 PM Peter Kronenberg <pe...@torch.ai> wrote:
>
> I have also noticed since yesterday that there are files in my temp 
> directory that aren’t being cleaned up.  All of these files contain 
> the output of Tesseract

RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
No, not seeing that anymore.  I thought it might have been related to the ImageMagick thing, because they both seemed to have to do with temp files.  But obviously, that wasn't really the case.  
So not sure what was causing that, but I don't see it anymore.

And thanks for the coding hint.  Wasn't sure if TikaInputStream automatically did the buffering.

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
Are you still seeing Tesseract txt files piling up?  I'm not able to
reproduce this on Windows/Linux/Mac.

This shouldn't cause a problem, but this:

try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {

is more efficient if you do this:

try (TikaInputStream stream = TikaInputStream.get(file)) {
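A rough model of why passing the file helps. As far as I know, TikaInputStream.get(InputStream) already wraps a stream that can't mark/reset in a BufferedInputStream, so the extra wrapper is redundant. More importantly, a file-backed TikaInputStream can hand its path straight to external tools such as Tesseract and ImageMagick, while a stream-backed one has to be copied ("spooled") to a temporary file first. The sketch below is a JDK-only illustration of that idea under those assumptions; SpoolingSource and its methods are hypothetical names, not Tika's API.

```java
import java.io.BufferedInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical illustration (not Tika's API): a source backed either by a file
 * on disk or by an arbitrary stream. External tools need a path, so the
 * stream-backed case must first be copied to a temporary file.
 */
public final class SpoolingSource implements Closeable {
    private final Path file;          // non-null when file-backed
    private final InputStream stream; // non-null when stream-backed
    private Path spooled;             // temp copy, created lazily

    private SpoolingSource(Path file, InputStream stream) {
        this.file = file;
        this.stream = stream;
    }

    public static SpoolingSource of(Path file) {
        return new SpoolingSource(file, null);
    }

    public static SpoolingSource of(InputStream in) {
        // Buffer only if the caller's stream can't mark/reset on its own.
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        return new SpoolingSource(null, s);
    }

    /** Returns a path an external program can read, copying the stream if needed. */
    public Path getPath() throws IOException {
        if (file != null) {
            return file; // no copy: the tool reads the original file directly
        }
        if (spooled == null) {
            spooled = Files.createTempFile("spool-", ".tmp");
            Files.copy(stream, spooled, StandardCopyOption.REPLACE_EXISTING);
        }
        return spooled;
    }

    public boolean wasSpooled() {
        return spooled != null;
    }

    @Override
    public void close() throws IOException {
        if (stream != null) {
            stream.close();
        }
        if (spooled != null) {
            Files.deleteIfExists(spooled); // remove our own temp copy
        }
    }
}
```

With a path, nothing is copied; with a stream, every caller that needs a file pays for one extra write to the temp directory.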

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
K. Thank you.  Will check.

RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
I have also noticed since yesterday that there are files in my temp directory that aren't being cleaned up.  All of these files contain the output of Tesseract.
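Leftover temp files like these often come from an output file that is only deleted on the success path. A minimal sketch of the delete-in-finally pattern, assuming the caller creates the temp file itself; TempFiles, withTempFile and PathTask are hypothetical names for illustration, not Tika's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class TempFiles {

    /** Work to run against a temporary file; may throw IOException. */
    @FunctionalInterface
    public interface PathTask<T> {
        T apply(Path tmp) throws IOException;
    }

    private TempFiles() {}

    /**
     * Creates a temp file, hands it to the task, and deletes it afterwards,
     * whether the task returns normally or throws.
     */
    public static <T> T withTempFile(String prefix, String suffix, PathTask<T> task)
            throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        try {
            return task.apply(tmp);
        } finally {
            // Runs on both the success and the failure path.
            Files.deleteIfExists(tmp);
        }
    }
}
```

The task would typically run the external tool with the temp path as its output argument and read the result back before returning.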

Re: Error calling ImageMagick

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 11 Feb 2021, Tim Allison wrote:
> I can replicate this on my Windows laptop.
>
> The weird thing is that the image file is actually there and if I pause the
> debugger at the point after ImageMagick has complained that the file isn't
> there but before Tika does the cleanup,

Windows is funny about two programs having the same file open, especially
if one of them has it open for read-write. It does mean that Windows has file
semantics that better match what people expect (on Unix, opening a file,
deleting it, and still being able to keep reading the old file confuses
developers the first time they discover it!), but it also means you have to
be more careful about closing files, simultaneous access, etc.

Nick
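To see the difference Nick describes, a small JDK-only experiment (DeleteWhileOpen is a hypothetical class name) opens a file, tries to delete it, and then keeps reading. On Linux and macOS the delete normally succeeds and the open stream still returns the old bytes. On Windows, as far as I know, java.io.FileInputStream opens files without delete sharing, so the delete is typically refused while the stream is open; treat the exact exception and message as platform-dependent.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class DeleteWhileOpen {

    /** What happened when we deleted a file that a stream still had open. */
    public static final class Result {
        public final boolean deleted;  // did Files.delete succeed?
        public final String readAfter; // what the open stream read afterwards

        Result(boolean deleted, String readAfter) {
            this.deleted = deleted;
            this.readAfter = readAfter;
        }
    }

    private DeleteWhileOpen() {}

    public static Result run() throws IOException {
        Path tmp = Files.createTempFile("open-then-delete-", ".txt");
        Files.write(tmp, "still here".getBytes(StandardCharsets.UTF_8));
        boolean deleted;
        String readAfter;
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            try {
                // Unix: removes the directory entry; Windows: usually refused while open
                Files.delete(tmp);
                deleted = true;
            } catch (IOException e) {
                deleted = false;
            }
            // The already-open stream still reads the original bytes.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            readAfter = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
        // If the delete was refused, the stream is closed now, so clean up here.
        Files.deleteIfExists(tmp);
        return new Result(deleted, readAfter);
    }

    public static void main(String[] args) throws IOException {
        Result r = run();
        System.out.println(System.getProperty("os.name")
                + ": deleted=" + r.deleted + ", readAfter=\"" + r.readAfter + "\"");
    }
}
```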

RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
Yup, works now.  Thanks

From: Tim Allison <ta...@apache.org>
Sent: Thursday, February 11, 2021 4:36 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick

Fixed.  I dropped the extra "convert" command after ImageMagick on Windows.  Give it a try and let me know what you find.

On Thu, Feb 11, 2021 at 4:26 PM Tim Allison <ta...@apache.org> wrote:
I can replicate this on my Windows laptop.

The weird thing is that the image file is actually there: if I pause the debugger at the point after ImageMagick has complained that the file isn't there, but before Tika does the cleanup, I can see the file is still there, and I can run ImageMagick on it from the command line.  I wonder if it is a permissions issue, or if Commons Exec is doing something with the file name that doesn't work on Windows?

On Thu, Feb 11, 2021 at 3:20 PM Peter Kronenberg <pe...@torch.ai> wrote:
Never had the problem before.  It just started about 2 days ago.

From: Tim Allison <ta...@apache.org>
Sent: Thursday, February 11, 2021 3:14 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick

I still haven't gotten around to replicating on my Windows laptop.

Is there a chance that ImageMagick doesn't like having the same file for the input and output?

On Thu, Feb 11, 2021 at 3:11 PM Peter Kronenberg <pe...@torch.ai> wrote:

Still having this issue with ImageMagick.  I used my phone to take a slow-motion video of the temp directory and was able to see 4 files get created and deleted very quickly.  One of them is the file that ImageMagick was looking for, but it seems that it's getting deleted too soon.
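Instead of filming the directory, one option is to log creations and deletions with java.nio.file.WatchService. A sketch, assuming the files of interest match the apache-tika-*.tmp names from the log above (adjust the filter for the Tesseract .txt output); TempDirWatcher is a hypothetical name, and very short-lived files may be reported late or coalesced depending on the platform.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public final class TempDirWatcher {

    private TempDirWatcher() {}

    /** True for names shaped like the Tika temp files in the ImageMagick command line. */
    public static boolean isTikaTemp(Path name) {
        String s = name.getFileName().toString();
        return s.startsWith("apache-tika-") && s.endsWith(".tmp");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE);
            System.out.println("Watching " + dir + " (Ctrl+C to stop)");
            while (true) {
                WatchKey key = ws.take(); // blocks until events arrive
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were dropped; no file name available
                    }
                    Path name = (Path) ev.context();
                    if (isTikaTemp(name)) {
                        System.out.println(System.currentTimeMillis() + " "
                                + ev.kind().name() + " " + name);
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

Run it in a separate JVM, then run the failing parse; the timestamps show whether a file is deleted before ImageMagick gets to it.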



-----Original Message-----
From: Tim Allison <ta...@apache.org>
Sent: Wednesday, February 10, 2021 5:44 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick

Works on Linux for me.  Let me break out my Windows laptop.

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
Fixed.  I dropped the extra "convert" command after ImageMagick on
Windows.  Give it a try and let me know what you find.
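On Windows, a bare `convert` may resolve to `C:\Windows\System32\convert.exe`, the built-in FAT-to-NTFS volume converter, rather than to ImageMagick, so it is worth confirming which binary a bare command name actually resolves to on a given machine. A minimal sketch in plain Java follows; `ExecutableLocator` is a hypothetical name, it is not Tika code, and it simplifies the real lookup rules (it only checks that a regular file exists, and it ignores the directories Windows searches before `PATH`).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper (not Tika code): shows which file a bare command name would most likely run.
final class ExecutableLocator {

    // Searches PATH entries in order, trying name + each extension ("" means no extension).
    // Simplification: only checks that a regular file exists, not that it is executable.
    static Optional<Path> find(String name, String pathValue, List<String> extensions) {
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String ext : extensions) {
                try {
                    Path candidate = Paths.get(dir, name + ext);
                    if (Files.isRegularFile(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // Skip malformed PATH entries (for example, entries wrapped in quotes on Windows).
                }
            }
        }
        return Optional.empty();
    }

    // Extensions to try: the PATHEXT list on Windows, no extension elsewhere.
    static List<String> defaultExtensions() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        if (!os.startsWith("windows")) {
            return Collections.singletonList("");
        }
        List<String> exts = new ArrayList<>();
        for (String e : System.getenv().getOrDefault("PATHEXT", ".EXE").split(";")) {
            if (!e.isEmpty()) {
                exts.add(e);
            }
        }
        return exts;
    }

    public static void main(String[] args) {
        String path = System.getenv().getOrDefault("PATH", "");
        for (String cmd : new String[] {"magick", "convert"}) {
            String hit = find(cmd, path, defaultExtensions()).map(Path::toString).orElse("(not found)");
            System.out.println(cmd + " -> " + hit);
        }
    }
}
```

Running `main` prints one line per command name, for example `magick -> (not found)` when ImageMagick is not on `PATH`.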

On Thu, Feb 11, 2021 at 4:26 PM Tim Allison <ta...@apache.org> wrote:

> I can replicate this on my windows laptop.
>
> The weird thing is that the image file is actually there and if I pause
> the debugger at the point after imagemagick has complained that the file
> isn't there but before Tika does the clean up,
> I can see the file is still there, and I can run imagemagick on it from
> the command line.  I wonder if it is a permissions issue or if commons exec
> is doing something with the file name that doesn't work on windows?

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
I can replicate this on my Windows laptop.

The weird thing is that the image file is actually there. If I pause the
debugger after ImageMagick has complained that the file isn't there, but
before Tika does the cleanup, I can see the file is still there, and I can
run ImageMagick on it from the command line.  I wonder if it is a
permissions issue, or if commons exec is doing something with the file name
that doesn't work on Windows?
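The two hypotheses above (a permissions problem, or the file name being altered on its way to the process) can be pulled apart by checking the file from the JVM immediately before launching the tool, then launching the same argument list without commons-exec. A minimal sketch, using plain `java.lang.ProcessBuilder` as a swapped-in launcher (Tika itself goes through commons-exec); `CommandProbe`, `describe`, and `run` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical diagnostic helper, not part of Tika.
final class CommandProbe {

    // Summarizes what the JVM can see at a path right before it is handed to an external tool.
    static String describe(Path p) {
        StringBuilder sb = new StringBuilder(p.toString());
        sb.append(" exists=").append(Files.exists(p));
        sb.append(" readable=").append(Files.isReadable(p));
        try {
            sb.append(" size=").append(Files.size(p));
            // On Windows this may also expand 8.3 short names such as PETERK~1 (assumption).
            sb.append(" real=").append(p.toRealPath());
        } catch (IOException e) {
            sb.append(" error=").append(e.getClass().getSimpleName());
        }
        return sb.toString();
    }

    // Result of one run: exit code plus combined stdout/stderr.
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    // Runs the argument list directly (no shell) and captures stderr together with stdout.
    static Result run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        int exit = process.waitFor();
        return new Result(exit, new String(buf.toByteArray(), Charset.defaultCharset()));
    }
}
```

One way to use it: copy the argument list from the WARN line, log `describe(input)` right before `run(...)`, and compare the exit code and ImageMagick's message with the commons-exec failure. If `describe` reports `exists=true readable=true` and the direct run succeeds, that points toward the launch path rather than early deletion; if the direct run fails the same way, the arguments themselves are the more likely suspect.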

On Thu, Feb 11, 2021 at 3:20 PM Peter Kronenberg <pe...@torch.ai>
wrote:

> Never had the problem before.  It just started about 2 days ago.

RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
Never had the problem before.  It just started about 2 days ago.

From: Tim Allison <ta...@apache.org>
Sent: Thursday, February 11, 2021 3:14 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick

I still haven't gotten around to replicating on my windows laptop.

Is there a chance that imagemagick doesn't like having the same file for the input and output?

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
I still haven't gotten around to replicating this on my Windows laptop.

Is there a chance that ImageMagick doesn't like having the same file for
the input and output?
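The command line in the original report lists every option before the file names and passes the same temporary file as both input and output. One reading of `no images found for operation -resize`, assumed here, is that ImageMagick 7's `magick` front end applies operators only to images it has already read. A minimal sketch of an alternative layout under that assumption: read settings, then the input, then the remaining options, then a separate output file. `MagickArgs.build` is hypothetical, the option values are copied from the log, and this is not the change that was committed for this thread.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Tika code): an ImageMagick 7 argument list that reads the input
// before any operator runs and writes to a separate output file.
final class MagickArgs {

    // Option values copied from the WARN line in the report; -density is a read setting,
    // so it stays in front of the input file.
    private static final List<String> READ_SETTINGS = Arrays.asList("-density", "300");
    private static final List<String> PROCESSING = Arrays.asList(
            "-depth", "4", "-colorspace", "gray", "-filter", "triangle", "-resize", "200%");

    static List<String> build(String magickExe, Path input, Path output) {
        if (input.toAbsolutePath().normalize().equals(output.toAbsolutePath().normalize())) {
            throw new IllegalArgumentException("input and output should be different files: " + input);
        }
        List<String> args = new ArrayList<>();
        args.add(magickExe);
        args.addAll(READ_SETTINGS);
        args.add(input.toString());   // the image is loaded here...
        args.addAll(PROCESSING);      // ...so -resize and the other options have something to act on
        args.add(output.toString());  // the result goes to its own file
        return args;
    }
}
```

The list can be handed to any launcher; with commons-exec, for example, each element after the first could be added to a `CommandLine` via `addArgument(arg, false)` so that commons-exec does not apply its own quoting.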

On Thu, Feb 11, 2021 at 3:11 PM Peter Kronenberg <pe...@torch.ai>
wrote:

> Still having this issue with ImageMagick.  I used my phone to take a
> slow-motion video of the temp directory and was able to see 4 files get
> created and deleted very quickly.  One of them is the file that ImageMagick
> was looking for.  But it seems that it’s getting deleted too soon.

RE: Error calling ImageMagick

Posted by Peter Kronenberg <pe...@torch.ai>.
Still having this issue with ImageMagick.  I used my phone to take a slow-motion video of the temp directory and was able to see 4 files get created and deleted very quickly.  One of them is the file that ImageMagick was looking for.  But it seems that it’s getting deleted too soon.
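Instead of filming the directory, the JVM can log create and delete events with timestamps. A minimal sketch using `java.nio.file.WatchService`; `TempDirWatcher` is a hypothetical name, the `apache-tika-*.tmp` filter is inferred from the file names in the log, and the default directory is `java.io.tmpdir`, which is assumed to be the `AppData\Local\Temp` folder shown in the log. Run it in a separate JVM while reproducing the failure.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Hypothetical diagnostic (not Tika code): logs when Tika-style temp files appear and disappear.
final class TempDirWatcher {

    // Matches names like apache-tika-3889844060604687745.tmp seen in the log.
    static boolean isTikaTemp(String fileName) {
        return fileName.startsWith("apache-tika-") && fileName.endsWith(".tmp");
    }

    static void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = ws.take(); // blocks until at least one event is queued
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        System.out.println("events were lost (OVERFLOW)");
                        continue;
                    }
                    Path name = (Path) event.context();
                    if (isTikaTemp(name.toString())) {
                        System.out.printf("%d %s %s%n", System.currentTimeMillis(), event.kind().name(), name);
                    }
                }
                if (!key.reset()) {
                    break; // the directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = args.length > 0 ? Paths.get(args[0]) : Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("watching " + dir);
        watch(dir);
    }
}
```

Event delivery can be delayed or batched (some platforms poll rather than receive native notifications), so read the timestamps as approximate ordering, not precise timing.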

-----Original Message-----
From: Tim Allison <ta...@apache.org>
Sent: Wednesday, February 10, 2021 5:44 PM
To: user@tika.apache.org
Subject: Re: Error calling ImageMagick



Works on linux for me.  Let me break out my windows laptop.

Re: Error calling ImageMagick

Posted by Tim Allison <ta...@apache.org>.
Works on Linux for me.  Let me break out my Windows laptop.

On Wed, Feb 10, 2021 at 12:34 PM Peter Kronenberg
<pe...@torch.ai> wrote:
>
> I think yesterday’s code introduced a bug.  The temporary file that is created for ImageMagick is not there.
>
> [main] INFO org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract is installed and is being invoked. This can add greatly to processing time.  If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
>
> magick: no images found for operation `-resize' at CLI arg 9 @ error/operation.c/CLIOption/5361.
>
> [main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - ImageMagick failed (commandline: [magick, -density, 300, -depth, 4, -colorspace, gray, -filter, triangle, -resize, 200%, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp])
>
> org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
>             at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
>             at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
>             at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
>             at org.apache.tika.parser.ocr.ImagePreprocessor.process(ImagePreprocessor.java:121)
>             at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:280)
>             at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:248)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>             at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>             at org.apache.tika.parser.image.AbstractImageParser.parse(AbstractImageParser.java:94)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>             at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>             at org.torchai.ImageMagick.parse(ImageMagick.java:43)
>             at org.torchai.ImageMagick.main(ImageMagick.java:56)
>
> Text: MARLEY was dead, to begin with. There is no doubt whatever about
> that. The register of his burial was signed by the clergyman, the clerk,
> the undertaker, and the chief mourner. Scrooge signed it. And
> Scrooge’s name was good upon ’Change, for anything he chose to put
> his hand to.
>
> Here’s the code:
>
> public static String parse(String file) throws TikaException, SAXException, IOException {
>     final AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
>     final ParseContext parseContext = new ParseContext();
>
>     final TesseractOCRConfig tessConfig = new TesseractOCRConfig();
>     parseContext.set(AutoDetectParser.class, parser);
>     parseContext.set(TesseractOCRConfig.class, tessConfig);
>     tessConfig.setEnableImageProcessing(true);
>
>     ContentHandler contentHandler = new BodyContentHandler();
>     Metadata metadata = new Metadata();
>
>     try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {
>         parser.parse(stream, contentHandler, metadata, parseContext);
>     }
>
>     return contentHandler.toString();
> }