Posted to dev@tika.apache.org by "Sasha Goodman (JIRA)" <ji...@apache.org> on 2018/03/09 08:34:00 UTC

[jira] [Updated] (TIKA-2604) Error with certain jar paths on OS X

     [ https://issues.apache.org/jira/browse/TIKA-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sasha Goodman updated TIKA-2604:
--------------------------------
    Description: 
I've been developing an R interface to the Tika batch processor for the past month (see [https://github.com/predict-r/rtika]), and this software is awesome. I call the batch processor from the command line, and my code has worked on Ubuntu, Windows 10 and OS X. Several people have been testing my code as well, and it has been working.

A few days ago I found an issue with the batch processor on OS X. 

When the batch processor is called with tika-app-1.17.jar on a path that contains spaces, Tika restarts continually.

Here is an example of calling the jar *when the path has spaces.* It *produces this error and the unexpected restarts*: 
{code:java}
java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -maxRestarts 1 -t -i '/' -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'

INFO about to start driver
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Must restart process (exitValue=1 numRestarts=0 receivedRestartMessage=false)
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Hit the maximum number of process restarts. Driver is shutting down now.
INFO Process driver has completed{code}
The error also occurs when the jar path is wrapped in double quotes instead of single quotes.

*In contrast,* calling the jar when the *path does not have spaces produces absolutely NO error*:
{code:java}
java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/tika-app.jar' -maxRestarts 1 -t -i '/' -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
INFO about to start driver
INFO BatchProcess: log4j:WARN No appenders could be found for logger (org.apache.tika.batch.fs.FSBatchProcessCLI).
INFO BatchProcess: log4j:WARN Please initialize the log4j system properly.
INFO BatchProcess: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: TIFFImageWriter not loaded. tiff files will not be processed
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: J2KImageReader not loaded. JPEG2000 files will not be processed.
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess:
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: org.xerial's sqlite-jdbc is not loaded.
INFO BatchProcess: Please provide the jar on your classpath to parse sqlite files.
INFO BatchProcess: See tika-parsers/pom.xml for the correct version.
INFO BatchProcess: randomCrawl attribute is ignored by FSListCrawler
BatchProcess:Main thread in TikaFSBatchCLI has finished processing.
BatchProcess:
BatchProcess:
BatchProcess:ParallelFileProcessingResult{considered=1, added=1, consumed=1, numberHandledExceptions=0, secondsElapsed=0.853, exitStatus=0, causeForTermination='COMPLETED_NORMALLY'}
INFO The child process has finished with an exit value of: 0
INFO Process driver has completed{code}
Further, what makes this a batch processor issue is that the same path with the space in it produces absolutely *NO error in the normal Tika CLI mode*: 
{code:java}
java -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -t /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rtika/extdata/jsonlite.pdf

{code}

The last two examples work, but the first does not. The only difference is that the first calls the batch processor, and the failure occurs regardless of which file is processed.
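
A possible workaround, which this report does not test or confirm, is to hand Java a path to the jar that contains no spaces. A minimal shell sketch follows. `space_free_jar` is a hypothetical helper (not part of Tika or rtika), and it assumes the jar path is absolute and that the directory created by `mktemp -d` has no spaces in its own name:

```shell
#!/bin/sh
# Workaround sketch: expose a jar through a path that contains no spaces.
# space_free_jar is a hypothetical helper, not part of Tika or rtika.
# Assumptions: the jar path is absolute, and mktemp -d returns a
# directory whose name contains no spaces.
space_free_jar() {
  jar=$1
  case $jar in
    *" "*)
      # The path has a space: link the jar into a fresh temporary directory.
      dir=$(mktemp -d) || return 1
      ln -s "$jar" "$dir/tika-app.jar" || return 1
      printf '%s\n' "$dir/tika-app.jar"
      ;;
    *)
      # No space in the path: use the jar where it is.
      printf '%s\n' "$jar"
      ;;
  esac
}
```

The path printed by the helper could then be passed to `java -jar` in place of the original one. Whether the batch processor's child process starts correctly from such a symlinked path would still need to be verified on OS X.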

 

> Error with certain jar paths on OS X
> ------------------------------------
>
>                 Key: TIKA-2604
>                 URL: https://issues.apache.org/jira/browse/TIKA-2604
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.17
>         Environment: tika-app-1.17.jar, OS X 10.13.3. 
>  
>            Reporter: Sasha Goodman
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)