Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/05/26 06:21:56 UTC
[Tika Wiki] Update of "EXIFToolParser" by ChrisMattmann
The "EXIFToolParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/EXIFToolParser
Comment:
- add EXIFTool instructions
New page:
Tika now supports EXIFTool through the ExternalParser. Read on to find out how to use it.
= Download and install EXIFTool =
[[http://www.sno.phy.queensu.ca/~phil/exiftool/|EXIFTool]] is a wonderful tool that reads video, image, audio, and other media files and extracts EXIF metadata from them. If your platform has a package for it, you can install EXIFTool with one of the following commands.
== On Mac ==
`brew install exiftool`
== On Linux (CentOS) ==
`sudo yum install perl-Image-ExifTool`
To verify that EXIFTool works correctly, run:
{{{
exiftool -ver
}}}
which should output something like: `9.72`
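If you want to run the same check from a script, a minimal Python sketch is shown here. It assumes `exiftool` is expected on the `PATH` and that `exiftool -ver` prints a bare version number such as `9.72`, as above. The helper names `parse_exiftool_version` and `installed_exiftool_version` are hypothetical and are not part of Tika or EXIFTool.

```python
import re
import shutil
import subprocess


def parse_exiftool_version(text):
    """Validate the output of `exiftool -ver` and return it as a string."""
    version = text.strip()
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError("unexpected `exiftool -ver` output: %r" % text)
    return version


def installed_exiftool_version():
    """Return the installed version string, or None if exiftool is not on the PATH."""
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        ["exiftool", "-ver"], capture_output=True, text=True, check=True
    )
    return parse_exiftool_version(result.stdout)
```

Calling `installed_exiftool_version()` returns `None` when the binary is missing, which lets a setup script fail early with a clear message before Tika is started.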
= Using EXIFTool with Tika =
To use EXIFTool you'll need a custom Tika config that will override Tika's default MP4 parser (if you are dealing with MP4 files). You can do so by creating a file such as the one below:
{{{
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>
}}}
Note that this config file initializes three parsers: the DefaultParser (a CompositeParser), the CompositeExternalParser, and the MP4Parser. For the MP4Parser, it uses a new directive, `mime-exclude`, to exclude that parser from the `video/mp4` type. It then uses `mime` to declare that the CompositeExternalParser will support `video/mp4`. Since EXIFTool is run as an ExternalParser, this configuration makes sure it gets called.
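To double-check which parser declares or excludes `video/mp4` before handing the file to Tika, you could inspect it with a few lines of Python. This is only a sketch that reads the XML: it does not reproduce how Tika itself resolves parsers, and the function name `summarize_parsers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same content as the config file shown above.
CONFIG = """\
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>
"""


def summarize_parsers(config_xml):
    """Map each parser class to the MIME types it declares and excludes."""
    root = ET.fromstring(config_xml)
    summary = {}
    for parser in root.findall("./parsers/parser"):
        summary[parser.get("class")] = {
            "mime": [m.text.strip() for m in parser.findall("mime")],
            "mime-exclude": [m.text.strip() for m in parser.findall("mime-exclude")],
        }
    return summary


# Print one line per configured parser.
for name, rules in summarize_parsers(CONFIG).items():
    print(name, rules)
```

A typo in a class name or a misplaced `mime` element shows up immediately in the printed summary.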
Save the Tika config as a file, e.g., `exif-tika-config.xml`, in the current directory. Then, to call Tika, you can use Tika-App and/or Tika Server.
== Using Tika-App ==
Use the following command on a file, e.g., `spaghetti-to-sushi.mp4`:
{{{
java -Dtika.config=exif-tika-config.xml -classpath tika-app/target/tika-app-1.9-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m spaghetti-to-sushi.mp4
}}}
This should output:
{{{
Audio Bits Per Sample: 16
Audio Channels: 2
Audio Format: mp4a
Audio Sample Rate: 22050
Average Bitrate: 0
Avg Bitrate: 1.26 Mbps
Balance: 0
Bit Depth: 24
Buffer Size: 0
Compatible Brands: mp41
Compressor ID: avc1
Compressor Name: h264
Content Create Date: created.with.SUPER(C).v2006.19
Content Create Date (ja): created.with.SUPER(C).v2006.19
Content-Length: 353985630
Content-Type: video/mp4
Create Date: 2006:12:17 18:50:47
Current Time: 0 s
Duration: 0:37:19
Elementary Stream Track: 201 101
ExifTool Version Number: 9.72
File Access Date/Time: 2015:05:25 21:18:08-07:00
File Inode Change Date/Time: 2014:09:26 20:32:27-07:00
File Modification Date/Time: 2011:07:28 13:01:54-07:00
File Name: spaghetti-to-sushi.mp4
File Permissions: rwxr-xr-x
File Size: 338 MB
File Type: MP4
Graphics Mode: srcCopy
Handler Description: GPAC MPEG-4 BIFS Handler
Handler Type: Metadata
Handler Vendor ID: Apple
Image Height: 480
Image Size: 640x480
Image Width: 640
MIME Type: video
Major Brand: MP4 v2 [ISO 14496-14]
Matrix Structure: 1 0 0 0 1 0 0 0 1
Max Bitrate: 0
Media Create Date: 2006:12:16 20:07:48
Media Duration: 1.00 s
Media Header Version: 0
Media Language Code: und
Media Modify Date: 2006:12:16 20:07:48
Media Time Scale: 90000
Minor Version: 0.0.1
Modify Date: 2006:12:17 18:50:47
Movie Data Offset: 473003
Movie Data Size: 353512586
Movie Header Version: 0
Next Track ID: 201
Op Color: 0 0 0
Other Format: mp4s
Poster Time: 0 s
Preferred Rate: 1
Preferred Volume: 100.00
Preview Duration: 0 s
Preview Time: 0 s
Rotation: 0
Selection Duration: 0 s
Selection Time: 0 s
Source Image Height: 480
Source Image Width: 720
Time Scale: 90000
Title: From Spaghetti to Sushi.mpeg
Title (ja): From Spaghetti to Sushi.mpeg
Track Create Date: 2006:12:17 18:50:47
Track Duration: 0:37:19
Track Header Version: 0
Track ID: 201
Track Layer: 0
Track Modify Date: 2006:12:16 20:07:48
Track Volume: 0.00
Vendor ID: FFmpeg
Video Frame Rate: 25
X Resolution: 72
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.external.CompositeExternalParser
X-Parsed-By: org.apache.tika.parser.external.ExternalParser
Y Resolution: 72
resourceName: spaghetti-to-sushi.mp4
}}}
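If you capture this output in a script, it can be turned into a dictionary. The sketch below assumes the `-m` output has one `name: value` pair per line, as in the listing above, and keeps every value in a list because a name such as `X-Parsed-By` can repeat. The function name `parse_metadata_lines` is hypothetical.

```python
# A short excerpt of the Tika-App output shown above.
SAMPLE = """\
Content-Type: video/mp4
Duration: 0:37:19
File Access Date/Time: 2015:05:25 21:18:08-07:00
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.external.CompositeExternalParser
X-Parsed-By: org.apache.tika.parser.external.ExternalParser
resourceName: spaghetti-to-sushi.mp4
"""


def parse_metadata_lines(text):
    """Collect `name: value` lines into a dict of lists, keeping repeated names."""
    metadata = {}
    for line in text.splitlines():
        # Split on the first ": " only, since values may contain colons.
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        metadata.setdefault(key.strip(), []).append(value.strip())
    return metadata


metadata = parse_metadata_lines(SAMPLE)
print(metadata["Duration"])
print(metadata["X-Parsed-By"])
```

Seeing `org.apache.tika.parser.external.ExternalParser` among the `X-Parsed-By` values is a quick way to confirm that the custom config took effect.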
== Using Tika Server ==
You can also use Tika-Server. First, start it up:
{{{
java -Dtika.config=exif-tika-config.xml -classpath tika-server/target/tika-server-1.9-SNAPSHOT.jar org.apache.tika.server.TikaServerCli
}}}
Now, PUT a file to it, e.g., `spaghetti-to-sushi.mp4`:
{{{
curl -T $HOME/Movies/spaghetti-to-sushi.mp4 -H "Content-Disposition: attachment;filename=spaghetti-to-sushi.mp4" http://localhost:9998/rmeta
}}}
This should return:
{{{
[
{
"Audio Bits Per Sample":"16",
"Audio Channels":"2",
"Audio Format":"mp4a",
"Audio Sample Rate":"22050",
"Average Bitrate":"0",
"Avg Bitrate":"1.26 Mbps",
"Balance":"0",
"Bit Depth":"24",
"Buffer Size":"0",
"Compatible Brands":"mp41",
"Compressor ID":"avc1",
"Compressor Name":"h264",
"Content Create Date":"created.with.SUPER(C).v2006.19",
"Content Create Date (ja)":"created.with.SUPER(C).v2006.19",
"Content-Type":"video/mp4",
"Create Date":"2006:12:17 18:50:47",
"Current Time":"0 s",
"Duration":"0:37:19",
"Elementary Stream Track":"201 101",
"ExifTool Version Number":"9.72",
"File Access Date/Time":"2015:05:25 21:20:47-07:00",
"File Inode Change Date/Time":"2015:05:25 21:20:46-07:00",
"File Modification Date/Time":"2015:05:25 21:20:46-07:00",
"File Name":"apache-tika-3052147227532168299.tmp",
"File Permissions":"rw-r--r--",
"File Size":"338 MB",
"File Type":"MP4",
"Graphics Mode":"srcCopy",
"Handler Description":"GPAC MPEG-4 BIFS Handler",
"Handler Type":"Metadata",
"Handler Vendor ID":"Apple",
"Image Height":"480",
"Image Size":"640x480",
"Image Width":"640",
"MIME Type":"video",
"Major Brand":"MP4 v2 [ISO 14496-14]",
"Matrix Structure":"1 0 0 0 1 0 0 0 1",
"Max Bitrate":"0",
"Media Create Date":"2006:12:16 20:07:48",
"Media Duration":"1.00 s",
"Media Header Version":"0",
"Media Language Code":"und",
"Media Modify Date":"2006:12:16 20:07:48",
"Media Time Scale":"90000",
"Minor Version":"0.0.1",
"Modify Date":"2006:12:17 18:50:47",
"Movie Data Offset":"473003",
"Movie Data Size":"353512586",
"Movie Header Version":"0",
"Next Track ID":"201",
"Op Color":"0 0 0",
"Other Format":"mp4s",
"Poster Time":"0 s",
"Preferred Rate":"1",
"Preferred Volume":"100.00",
"Preview Duration":"0 s",
"Preview Time":"0 s",
"Rotation":"0",
"Selection Duration":"0 s",
"Selection Time":"0 s",
"Source Image Height":"480",
"Source Image Width":"720",
"Time Scale":"90000",
"Title":"From Spaghetti to Sushi.mpeg",
"Title (ja)":"From Spaghetti to Sushi.mpeg",
"Track Create Date":"2006:12:17 18:50:47",
"Track Duration":"0:37:19",
"Track Header Version":"0",
"Track ID":"201",
"Track Layer":"0",
"Track Modify Date":"2006:12:16 20:07:48",
"Track Volume":"0.00",
"Vendor ID":"FFmpeg",
"Video Frame Rate":"25",
"X Resolution":"72",
"X-Parsed-By":[
"org.apache.tika.parser.CompositeParser",
"org.apache.tika.parser.external.CompositeExternalParser",
"org.apache.tika.parser.external.ExternalParser"
],
"X-TIKA:parse_time_millis":"3638",
"Y Resolution":"72",
"resourceName":"spaghetti-to-sushi.mp4"
}
]
}}}
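The JSON returned by `/rmeta` is easy to post-process. The following Python sketch works on a trimmed copy of the response above. It assumes that every value arrives as a string or a list of strings, and that dates use the `YYYY:MM:DD HH:MM:SS` form seen in `Create Date`. The helper names `as_list`, `parsed_by_external` and `exif_datetime` are hypothetical.

```python
import json
from datetime import datetime

# Trimmed copy of the /rmeta response shown above.
RESPONSE = """
[
  {
    "Content-Type": "video/mp4",
    "Create Date": "2006:12:17 18:50:47",
    "Image Height": "480",
    "Image Width": "640",
    "X-Parsed-By": [
      "org.apache.tika.parser.CompositeParser",
      "org.apache.tika.parser.external.CompositeExternalParser",
      "org.apache.tika.parser.external.ExternalParser"
    ],
    "resourceName": "spaghetti-to-sushi.mp4"
  }
]
"""


def as_list(value):
    """Normalize a value that may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def parsed_by_external(record):
    """Report whether the ExternalParser appears in X-Parsed-By."""
    parsers = as_list(record.get("X-Parsed-By", []))
    return "org.apache.tika.parser.external.ExternalParser" in parsers


def exif_datetime(value):
    """Parse a timestamp without a time zone, such as "2006:12:17 18:50:47"."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


records = json.loads(RESPONSE)
first = records[0]
print(parsed_by_external(first))
print(exif_datetime(first["Create Date"]).isoformat())
print(int(first["Image Width"]), "x", int(first["Image Height"]))
```

Fields that carry a UTC offset, such as `File Access Date/Time`, would need a different format string than the one used in `exif_datetime`.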