Posted to user@tika.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2019/01/21 14:49:46 UTC
Extracting Subtitles from Video Files?
Hi all, thought I would toss out this inquiry! Has anyone used Tika to extract subtitles from typical video files? I’ve done some research, and it appears the common formats (.SRT, .SBV, .VTT, and even a plain text format) all look like slightly different versions of the below (taken from a .SRT file):
00:00:01,160 --> 00:00:04,729
Welcome to the presentation
on basic addition.
They have a time range, and then the corresponding text. It seems like a great use case for Tika would be to handle the various types of embedded closed captioning files, and emit them in a single standard structure.
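To make the "single standard structure" idea concrete, here is a rough sketch in Python (not Tika code, and not an existing Tika API). It assumes each cue is a timing line of the form "start --> end" followed by text lines up to a blank line, which covers the .SRT sample above and the "." millisecond separator that WebVTT uses. The names Cue and parse_cues are made up for illustration, and .SBV would likely need its own timing pattern.

```python
import re
from dataclasses import dataclass
from typing import List

# One timestamp: optional hours, then MM:SS and milliseconds.
# SRT separates milliseconds with ",", WebVTT with ".".
_TIME = r"(?:(\d+):)?(\d{2}):(\d{2})[,.](\d{3})"
_CUE_LINE = re.compile(r"^\s*" + _TIME + r"\s*-->\s*" + _TIME)


@dataclass
class Cue:
    """Hypothetical common structure: a time range plus its text."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    # Hours may be missing (None) in WebVTT timestamps.
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_cues(content: str) -> List[Cue]:
    """Collect cues from SRT-like or WebVTT-like text.

    Lines outside a cue (SRT index numbers, the WEBVTT header)
    are ignored; a blank line ends the current cue.
    """
    cues: List[Cue] = []
    current = None
    for line in content.splitlines():
        match = _CUE_LINE.match(line)
        if match:
            groups = match.groups()
            current = Cue(_to_ms(*groups[:4]), _to_ms(*groups[4:]), "")
            cues.append(current)
        elif current is not None:
            if line.strip():
                current.text = (current.text + "\n" + line.strip()).strip()
            else:
                current = None
    return cues
```

Feeding it the sample above gives one cue running from 1160 ms to 4729 ms with the two text lines joined by a newline.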
Before I get too far down the path, thought I would see if anyone else has done this in the open source space!
Eric
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: Extracting Subtitles from Video Files?
Posted by Chris Mattmann <ma...@apache.org>.
Hi Eric,
If it’s something that FFMPEG extracted, I suggest checking out:
http://wiki.apache.org/tika/FFMPEGParser
If it’s something where you want to classify what’s going on in the video
using Tensorflow, see:
https://wiki.apache.org/tika/TikaAndVisionVideo
Hope they help.
Cheers,
Chris