Posted to user@tika.apache.org by Albretch Mueller <lb...@gmail.com> on 2011/09/09 03:37:52 UTC

org.apache.tika.exception.TikaException while trying to get images' metadata ...

 After reading this post:
~
 http://blog.jeroenreijn.com/2010/04/metadata-extraction-with-apache-tika.html
~
 getting metadata from files using Tika seemed easy. Right now I am
most interested in images, but all Tika gives me is:
~
org.apache.tika.exception.TikaException: Can't read JPEG metadata
 at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:92)
 at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:66)
~
org.apache.tika.exception.TikaException: image/png parse error
 at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:91)
~
org.apache.tika.exception.TikaException: image/gif parse error
 at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:91)
~
 I am using tika-app-0.9
~
 How do you read metadata from files? Do you have any working code examples?
~
 lbrtchx
 thank you

Re: org.apache.tika.exception.TikaException while trying to get images' metadata ...

Posted by Albretch Mueller <lb...@gmail.com>.
 Hmm! But why would I have so many corrupted files, then?
~
 $ ls -l
total 240
-rw-r--r-- 1 knoppix knoppix  23772 Sep  9 16:05 26-3.jpg
-rw-r--r-- 1 knoppix knoppix 172471 Sep  9 16:05 3997912989_5e666b3a4b.jpg
-rwxr-xr-x 1 knoppix knoppix   9620 Sep  9 16:05 6926.jpeg
-rwxr-xr-x 1 knoppix knoppix  16131 Sep  9 16:05 edeploy_os.jpg
-rwxr-xr-x 1 knoppix knoppix     76 Sep  9 16:06 free.gif
-rwxr--r-- 1 knoppix knoppix   7379 Sep  9 16:05 s-tree.png

$ md5sum *.*
6f9ae626882a00d518fbaf7059a051d5  26-3.jpg
912f608b84ae22513d617a1b85116f10  3997912989_5e666b3a4b.jpg
43f9e5f26a4d703f6ffe0e281735abaa  6926.jpeg
a377a0415f5e0d62b80d7822632f8ba7  edeploy_os.jpg
655c9615b9b32cfad4b004e0a9787239  free.gif
39c4c12814ec6a7f0848073829f90e6a  s-tree.png

 and here they are:

http://hsymbolicus.files.wordpress.com/2011/09/26-3.jpg
http://hsymbolicus.files.wordpress.com/2011/09/6926.jpeg
http://hsymbolicus.files.wordpress.com/2011/09/3997912989_5e666b3a4b.jpg
http://hsymbolicus.files.wordpress.com/2011/09/edeploy_os.jpg
http://hsymbolicus.files.wordpress.com/2011/09/free.gif
http://hsymbolicus.files.wordpress.com/2011/09/s-tree.png

 I had even grabbed some of these files from the Internet for my tests.
~
 Please let me know (your theory of) what is going on.
~
 Thank you
 lbrtchx

Re: org.apache.tika.exception.TikaException while trying to get images' metadata ...

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 8 Sep 2011, Albretch Mueller wrote:
> ~
> getting metadata from files using tika seemed easy. Right now what I
> am most interested in is images but all tika gives me is:
> ~
> org.apache.tika.exception.TikaException: Can't read JPEG metadata
> at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:92)
> at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:66)
> ~
> org.apache.tika.exception.TikaException: image/png parse error
> at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:91)

This sort of thing is normally caused by corrupt images. Are you able to 
share any of your problematic files?

Nick
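
One quick way to test the "corrupt images" theory before involving Tika at all is to look at the first few bytes of each file and see whether they match the well-known JPEG, PNG, or GIF signatures. The following is only a minimal sketch using plain JDK classes; the class and method names (ImageSignatureCheck, sniffBytes, sniffFile) are hypothetical and not part of Tika. A matching signature does not prove the rest of the file is intact. A mismatch does suggest the file is not what its extension claims (for example, an HTML error page saved as .jpg).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of Tika): checks a file's leading "magic" bytes. */
final class ImageSignatureCheck {

    // Well-known file signatures for the three formats discussed in this thread.
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] GIF87 = {'G', 'I', 'F', '8', '7', 'a'};
    private static final byte[] GIF89 = {'G', 'I', 'F', '8', '9', 'a'};

    /** Returns the MIME type implied by the leading bytes, or null if none matches. */
    static String sniffBytes(byte[] head) {
        if (startsWith(head, JPEG)) return "image/jpeg";
        if (startsWith(head, PNG)) return "image/png";
        if (startsWith(head, GIF87) || startsWith(head, GIF89)) return "image/gif";
        return null;
    }

    /** Reads up to the first 8 bytes of the file and sniffs them. */
    static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) break;
                filled += n;
            }
        }
        return sniffBytes(Arrays.copyOf(head, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ImageSignatureCheck 26-3.jpg free.gif s-tree.png ...
        for (String name : args) {
            String type = sniffFile(Paths.get(name));
            System.out.println(name + ": " + (type == null ? "unrecognized signature" : type));
        }
    }
}
```

If every file reports the expected type here but Tika still throws, the files are probably not grossly corrupt, and the full stack trace (including any "Caused by" section) would be the next thing to look at.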