Posted to dev@tika.apache.org by Nick Burch <ni...@alfresco.com> on 2011/09/18 17:43:22 UTC

Media container formats?

Hi All

As part of my Ogg stuff, I'm wondering how best to handle media 
container formats such as Ogg and AVI.

For a file with only a single stream in it, e.g. an Ogg Vorbis audio
file, it seems sensible to treat it as a single (non-container) file.
For a file with multiple streams, such as a video with two soundtracks
and subtitles, what should we do? Try to identify the "main" stream
(often not actually marked), parse that as the file, and treat the other
streams (e.g. audio) as embedded resources?

The specific use case I have at the moment is Ogg Vorbis or Ogg FLAC
files, where only the outer container is detected. I'm thinking that the
general Ogg parser should check for a single stream and, if it finds one,
delegate to the Vorbis or FLAC parser as appropriate. However, if it
finds multiple streams, what should it do?
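
To make the single-stream check concrete, here is a rough sketch in
Python (for brevity; this is not Tika code or Tika's API). It walks the
Ogg pages, records every logical stream that has a beginning-of-stream
page, and identifies the codec from the magic bytes at the start of the
first packet. The names list_streams, choose_handling and make_page are
hypothetical, and the sketch assumes well-formed input: it does not
verify page CRCs, does not resync after corruption, and does not
distinguish multiplexed streams from chained ones.

```python
import struct

# Magic bytes that open the first packet of each codec's logical stream.
CODEC_MAGIC = {
    b"\x01vorbis": "vorbis",
    b"\x7fFLAC": "flac",
    b"\x80theora": "theora",
}


def list_streams(data):
    """Return {serial_number: codec_name} for each logical stream in data."""
    streams = {}
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            break  # not at a page boundary; this sketch does not resync
        header_type = data[pos + 5]
        serial = struct.unpack_from("<I", data, pos + 14)[0]
        n_segments = data[pos + 26]
        seg_table = data[pos + 27:pos + 27 + n_segments]
        payload_start = pos + 27 + n_segments
        payload_len = sum(seg_table)
        if header_type & 0x02:  # beginning-of-stream flag
            payload = data[payload_start:payload_start + payload_len]
            codec = "unknown"
            for magic, name in CODEC_MAGIC.items():
                if payload.startswith(magic):
                    codec = name
                    break
            streams[serial] = codec
        pos = payload_start + payload_len
    return streams


def choose_handling(streams):
    """Delegate when there is exactly one recognised stream, else treat as a container."""
    if len(streams) == 1:
        codec = next(iter(streams.values()))
        if codec != "unknown":
            return "delegate:" + codec
    return "container"


def make_page(serial, payload, header_type=0x02):
    """Build a minimal single-segment page for demonstration (CRC left as zero)."""
    assert len(payload) < 255
    header = b"OggS" + struct.pack("<BBqIII", 0, header_type, 0, serial, 0, 0)
    return header + bytes([1, len(payload)]) + payload


audio_only = make_page(1, b"\x01vorbis" + b"\x00" * 23)
video = make_page(1, b"\x80theora") + make_page(2, b"\x01vorbis")
print(choose_handling(list_streams(audio_only)))
print(choose_handling(list_streams(video)))
```

The multi-stream branch only reports "container" here; which stream
counts as the "main" one, and whether the others become embedded
resources, is exactly the open question.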

Nick