Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/05/26 06:21:56 UTC

[Tika Wiki] Update of "EXIFToolParser" by ChrisMattmann

The "EXIFToolParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/EXIFToolParser

Comment:
- add EXIFTool instructions

New page:
Tika now supports EXIFTool through its ExternalParser. Read on to find out how to use it.

= Download and install EXIFTool =

[[http://www.sno.phy.queensu.ca/~phil/exiftool/|EXIFTool]] is a wonderful tool that reads video, image, audio and other media files and extracts EXIF metadata from them. If you're lucky, you can install EXIFTool with one of the following commands.

== On Mac ==

`brew install exiftool`

== On Linux (CentOS) ==

`sudo yum install perl-Image-ExifTool`

To verify that EXIFTool works correctly, run:

{{{
exiftool -ver
}}}

which should output something like: `9.72`
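
If you want to run the same check from a script, the following is a minimal sketch in Python. It assumes only that the executable is named `exiftool` and is on your `PATH`; the helper name `exiftool_version` is made up for this illustration and is not part of Tika or EXIFTool.

```python
import shutil
import subprocess


def exiftool_version(executable="exiftool"):
    """Return the string printed by `<executable> -ver`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    result = subprocess.run([path, "-ver"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


version = exiftool_version()
print(version if version is not None else "exiftool not found on PATH")
```

A `None` result means the install step above did not put EXIFTool on the `PATH` that this script sees.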

= Using EXIFTool with Tika =

To use EXIFTool you'll need a custom Tika config that will override Tika's default MP4 parser (if you are dealing with MP4 files). You can do so by creating a file such as the one below:

{{{
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>
}}}

Note that this config file initializes three parsers: the DefaultParser (a CompositeParser), the MP4Parser, and the CompositeExternalParser. For the MP4Parser it uses a new directive, `mime-exclude`, to exclude that parser from the `video/mp4` type; the `mime` directive then declares that the CompositeExternalParser supports `video/mp4`. Since EXIFTool is an ExternalParser, this configuration makes sure it gets called for MP4 files.
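
To double-check which parser claims or excludes which type before handing the file to Tika, you can inspect the XML yourself. The sketch below uses only Python's standard library; it reads the `mime` and `mime-exclude` elements and does not reproduce how Tika itself resolves them. The function name `mime_routing` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same content as the config file shown above.
CONFIG = """<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>"""


def mime_routing(config_xml):
    """Map each parser class to the <mime> and <mime-exclude> types listed for it."""
    root = ET.fromstring(config_xml)
    routing = {}
    for parser in root.iter("parser"):
        routing[parser.get("class")] = {
            "mime": [m.text for m in parser.findall("mime")],
            "mime-exclude": [m.text for m in parser.findall("mime-exclude")],
        }
    return routing


for name, rules in mime_routing(CONFIG).items():
    print(name, rules)
```

For the config above, the MP4Parser entry should list `video/mp4` under `mime-exclude` and the CompositeExternalParser entry should list it under `mime`.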

Once you have created the config file above, save it, e.g., as `exif-tika-config.xml` in the current directory. Then, to call Tika, you can use Tika-App and/or Tika Server.

== Using Tika-App ==

Use the following command on a file, e.g., `spaghetti-to-sushi.mp4`:

{{{
java -Dtika.config=exif-tika-config.xml -classpath tika-app/target/tika-app-1.9-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m spaghetti-to-sushi.mp4
}}}

This should output:

{{{
Audio Bits Per Sample: 16
Audio Channels: 2
Audio Format: mp4a
Audio Sample Rate: 22050
Average Bitrate: 0
Avg Bitrate: 1.26 Mbps
Balance: 0
Bit Depth: 24
Buffer Size: 0
Compatible Brands: mp41
Compressor ID: avc1
Compressor Name: h264
Content Create Date: created.with.SUPER(C).v2006.19
Content Create Date (ja): created.with.SUPER(C).v2006.19
Content-Length: 353985630
Content-Type: video/mp4
Create Date: 2006:12:17 18:50:47
Current Time: 0 s
Duration: 0:37:19
Elementary Stream Track: 201 101
ExifTool Version Number: 9.72
File Access Date/Time: 2015:05:25 21:18:08-07:00
File Inode Change Date/Time: 2014:09:26 20:32:27-07:00
File Modification Date/Time: 2011:07:28 13:01:54-07:00
File Name: spaghetti-to-sushi.mp4
File Permissions: rwxr-xr-x
File Size: 338 MB
File Type: MP4
Graphics Mode: srcCopy
Handler Description: GPAC MPEG-4 BIFS Handler
Handler Type: Metadata
Handler Vendor ID: Apple
Image Height: 480
Image Size: 640x480
Image Width: 640
MIME Type: video
Major Brand: MP4 v2 [ISO 14496-14]
Matrix Structure: 1 0 0 0 1 0 0 0 1
Max Bitrate: 0
Media Create Date: 2006:12:16 20:07:48
Media Duration: 1.00 s
Media Header Version: 0
Media Language Code: und
Media Modify Date: 2006:12:16 20:07:48
Media Time Scale: 90000
Minor Version: 0.0.1
Modify Date: 2006:12:17 18:50:47
Movie Data Offset: 473003
Movie Data Size: 353512586
Movie Header Version: 0
Next Track ID: 201
Op Color: 0 0 0
Other Format: mp4s
Poster Time: 0 s
Preferred Rate: 1
Preferred Volume: 100.00
Preview Duration: 0 s
Preview Time: 0 s
Rotation: 0
Selection Duration: 0 s
Selection Time: 0 s
Source Image Height: 480
Source Image Width: 720
Time Scale: 90000
Title: From Spaghetti to Sushi.mpeg
Title (ja): From Spaghetti to Sushi.mpeg
Track Create Date: 2006:12:17 18:50:47
Track Duration: 0:37:19
Track Header Version: 0
Track ID: 201
Track Layer: 0
Track Modify Date: 2006:12:16 20:07:48
Track Volume: 0.00
Vendor ID: FFmpeg
Video Frame Rate: 25
X Resolution: 72
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.external.CompositeExternalParser
X-Parsed-By: org.apache.tika.parser.external.ExternalParser
Y Resolution: 72
resourceName: spaghetti-to-sushi.mp4
}}}
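
If you need these values in a script, the `Name: value` lines are easy to split. The following is a rough sketch, assuming every line has the shape shown above and that no metadata name contains `": "`. Repeated names such as `X-Parsed-By` are collected into lists. The function name `parse_metadata` is made up for this example.

```python
def parse_metadata(text):
    """Parse 'Name: value' lines into a dict that maps each name to a list of values."""
    metadata = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values such as
        # '2015:05:25 21:18:08-07:00' stay intact.
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank lines or lines without a separator
        metadata.setdefault(key, []).append(value)
    return metadata


sample = """Content-Type: video/mp4
File Access Date/Time: 2015:05:25 21:18:08-07:00
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.external.CompositeExternalParser
X-Parsed-By: org.apache.tika.parser.external.ExternalParser
"""

parsed = parse_metadata(sample)
print(parsed["Content-Type"])
print(parsed["X-Parsed-By"])
```

Checking that `X-Parsed-By` includes `org.apache.tika.parser.external.ExternalParser` is a quick way to confirm that the custom config was picked up.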

== Using Tika Server ==

You can also use Tika-Server. First, start it up:

{{{
java -Dtika.config=exif-tika-config.xml -classpath tika-server/target/tika-server-1.9-SNAPSHOT.jar org.apache.tika.server.TikaServerCli
}}}

Now, PUT a file to it, e.g., `spaghetti-to-sushi.mp4`:

{{{
curl -T $HOME/Movies/spaghetti-to-sushi.mp4 -H "Content-Disposition: attachment;filename=spaghetti-to-sushi.mp4" http://localhost:9998/rmeta
}}}

This should return:

{{{
[
   {
      "Audio Bits Per Sample":"16",
      "Audio Channels":"2",
      "Audio Format":"mp4a",
      "Audio Sample Rate":"22050",
      "Average Bitrate":"0",
      "Avg Bitrate":"1.26 Mbps",
      "Balance":"0",
      "Bit Depth":"24",
      "Buffer Size":"0",
      "Compatible Brands":"mp41",
      "Compressor ID":"avc1",
      "Compressor Name":"h264",
      "Content Create Date":"created.with.SUPER(C).v2006.19",
      "Content Create Date (ja)":"created.with.SUPER(C).v2006.19",
      "Content-Type":"video/mp4",
      "Create Date":"2006:12:17 18:50:47",
      "Current Time":"0 s",
      "Duration":"0:37:19",
      "Elementary Stream Track":"201 101",
      "ExifTool Version Number":"9.72",
      "File Access Date/Time":"2015:05:25 21:20:47-07:00",
      "File Inode Change Date/Time":"2015:05:25 21:20:46-07:00",
      "File Modification Date/Time":"2015:05:25 21:20:46-07:00",
      "File Name":"apache-tika-3052147227532168299.tmp",
      "File Permissions":"rw-r--r--",
      "File Size":"338 MB",
      "File Type":"MP4",
      "Graphics Mode":"srcCopy",
      "Handler Description":"GPAC MPEG-4 BIFS Handler",
      "Handler Type":"Metadata",
      "Handler Vendor ID":"Apple",
      "Image Height":"480",
      "Image Size":"640x480",
      "Image Width":"640",
      "MIME Type":"video",
      "Major Brand":"MP4 v2 [ISO 14496-14]",
      "Matrix Structure":"1 0 0 0 1 0 0 0 1",
      "Max Bitrate":"0",
      "Media Create Date":"2006:12:16 20:07:48",
      "Media Duration":"1.00 s",
      "Media Header Version":"0",
      "Media Language Code":"und",
      "Media Modify Date":"2006:12:16 20:07:48",
      "Media Time Scale":"90000",
      "Minor Version":"0.0.1",
      "Modify Date":"2006:12:17 18:50:47",
      "Movie Data Offset":"473003",
      "Movie Data Size":"353512586",
      "Movie Header Version":"0",
      "Next Track ID":"201",
      "Op Color":"0 0 0",
      "Other Format":"mp4s",
      "Poster Time":"0 s",
      "Preferred Rate":"1",
      "Preferred Volume":"100.00",
      "Preview Duration":"0 s",
      "Preview Time":"0 s",
      "Rotation":"0",
      "Selection Duration":"0 s",
      "Selection Time":"0 s",
      "Source Image Height":"480",
      "Source Image Width":"720",
      "Time Scale":"90000",
      "Title":"From Spaghetti to Sushi.mpeg",
      "Title (ja)":"From Spaghetti to Sushi.mpeg",
      "Track Create Date":"2006:12:17 18:50:47",
      "Track Duration":"0:37:19",
      "Track Header Version":"0",
      "Track ID":"201",
      "Track Layer":"0",
      "Track Modify Date":"2006:12:16 20:07:48",
      "Track Volume":"0.00",
      "Vendor ID":"FFmpeg",
      "Video Frame Rate":"25",
      "X Resolution":"72",
      "X-Parsed-By":[
         "org.apache.tika.parser.CompositeParser",
         "org.apache.tika.parser.external.CompositeExternalParser",
         "org.apache.tika.parser.external.ExternalParser"
      ],
      "X-TIKA:parse_time_millis":"3638",
      "Y Resolution":"72",
      "resourceName":"spaghetti-to-sushi.mp4"
   }
]
}}}
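
The JSON response is a list of metadata records, so it can be checked programmatically. Below is a small sketch, assuming the response has the shape shown above. The helper names `parsed_by` and `used_external_parser` are hypothetical, and the sample string is a shortened copy of the response.

```python
import json

EXTERNAL_PARSER = "org.apache.tika.parser.external.ExternalParser"


def parsed_by(record):
    """Return the X-Parsed-By values of one record as a list (a lone string is wrapped)."""
    value = record.get("X-Parsed-By", [])
    return [value] if isinstance(value, str) else list(value)


def used_external_parser(records):
    """True if any record reports that the ExternalParser handled it."""
    return any(EXTERNAL_PARSER in parsed_by(record) for record in records)


# Shortened copy of the response above, for illustration only.
response_text = """[
   {
      "Content-Type":"video/mp4",
      "ExifTool Version Number":"9.72",
      "X-Parsed-By":[
         "org.apache.tika.parser.CompositeParser",
         "org.apache.tika.parser.external.CompositeExternalParser",
         "org.apache.tika.parser.external.ExternalParser"
      ],
      "resourceName":"spaghetti-to-sushi.mp4"
   }
]"""

records = json.loads(response_text)
print(records[0]["ExifTool Version Number"])
print(used_external_parser(records))
```

If `used_external_parser` returns `False` for a real response, the server was most likely started without the custom config.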