Posted to dev@tika.apache.org by Nick Burch <ni...@apache.org> on 2014/03/27 16:34:39 UTC

How should video files with audio be handled by parsers?

Hi All

Does anyone know if we have a recommended way, or a planned approach, to handle 
video files with possibly multiple audio streams?

Most of the multimedia container formats support video and zero or one 
audio streams, and a fair number support video and multiple audio streams. 
A few can actually hold multiple video and multiple audio. Some can tell 
you the relationship between the streams (eg video 0 = main video, video 1 
= subtitle overlay, audio 0 = english, audio 1 = french, audio 2 = english 
commentary). Some you just find what's there, and have to guess how it 
fits together.

So, now I'm looking at adding some basic video parsing support into the 
Ogg stuff, how should I be having the parser report the metadata and 
content for streams with both video and audio?

(Currently, MP4Parser looks to have "TODO Decide how to handle multiple 
tracks", and just dumps the audio information into the metadata along 
with the video)

Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
Yes.

Since this approach has the potential to set a precedent for representing structured data going forward, I wanted to see what others thought before committing directly.

Regards,

Ray


On July 21, 2014 at 9:45:31 PM, Mattmann, Chris A (3980) (chris.a.mattmann@jpl.nasa.gov) wrote:
> Are you able to contribute to Tika?
> 
> Sent from my iPhone
> 
> > On Jul 21, 2014, at 6:43 PM, "Ray Gauss" wrote:
> >
> > Hi all,
> >
> > This is a few months old but I've been looking at this recently and since we're unlikely 
> to move to a structured metadata store in the short term I've come up with what I think is 
> an interim solution [1] that essentially allows nesting through XPath-like syntax: 
> >
> > stream[0]/field1=someValue
> > stream[0]/field2=otherValue
> > stream[1]/field1=yetAnother
> > stream[1]/field2=andSoOn
> >
> > In this case the PBCore metadata standard was used so the terminology is 'essenceTracks' 
> rather than stream and the parser is an ExternalParser configured for FFmpeg rather 
> than pure Java.
> >
> > If that approach seems reasonable we could move things into the main code base at some 
> point.
> >
> > Regards,
> >
> > Ray
> >
> >
> > [1] https://github.com/AlfrescoLabs/tika-ffmpeg
> >
> >
> >> On March 28, 2014 at 7:00:31 AM, Nick Burch (apache@gagravarr.org) wrote:
> >>> On Fri, 28 Mar 2014, Konstantin Gribov wrote:
> >>> I think you should have three info blocks: video streams, audio streams
> >>> and subtitles (if container supports their embedding). Sort naturally or
> >>> by vid/aid/sid if present.
> >>
> >> That's not something Tika supports though. We have a metadata object we
> >> can populate with some things, or we can trigger for embedded objects.
> >> The Metadata object doesn't support nesting
> >>
> >> Nick
> >>
> 

Re: How should video files with audio be handled by parsers?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Are you able to contribute to Tika?

Sent from my iPhone

> On Jul 21, 2014, at 6:43 PM, "Ray Gauss" <ra...@alfresco.com> wrote:
> 
> Hi all,
> 
> This is a few months old but I've been looking at this recently and since we're unlikely to move to a structured metadata store in the short term I've come up with what I think is an interim solution [1] that essentially allows nesting through XPath-like syntax:
> 
>     stream[0]/field1=someValue
>     stream[0]/field2=otherValue
>     stream[1]/field1=yetAnother
>     stream[1]/field2=andSoOn
> 
> In this case the PBCore metadata standard was used so the terminology is 'essenceTracks' rather than stream and the parser is an ExternalParser configured for FFmpeg rather than pure Java.
> 
> If that approach seems reasonable we could move things into the main code base at some point.
> 
> Regards,
> 
> Ray
> 
> 
> [1] https://github.com/AlfrescoLabs/tika-ffmpeg
> 
> 
>> On March 28, 2014 at 7:00:31 AM, Nick Burch (apache@gagravarr.org) wrote:
>>> On Fri, 28 Mar 2014, Konstantin Gribov wrote:
>>> I think you should have three info blocks: video streams, audio streams
>>> and subtitles (if container supports their embedding). Sort naturally or
>>> by vid/aid/sid if present.
>> 
>> That's not something Tika supports though. We have a metadata object we
>> can populate with some things, or we can trigger for embedded objects.
>> The Metadata object doesn't support nesting
>> 
>> Nick
>> 

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 20 Aug 2014, Ray Gauss wrote:
> Are these the droids I'm looking for?
>
>   https://github.com/Gagravarr/VorbisJava/tree/master/tika/src/main/java/org/gagravarr/tika

Yup!

To find out about the relations between streams, you'll need to use the 
org.gagravarr.skeleton classes to decode the Skeleton (fisbone) packets, 
especially SkeletonFisbone. See https://wiki.xiph.org/Ogg_Skeleton_4 for 
more on Skeleton in general, and http://wiki.xiph.org/SkeletonHeaders for 
how the relationship information gets encoded into skeleton message 
headers.

There are two Skeleton-enabled test files in the source tree; I'll look to 
add a couple more this weekend to give better test coverage.

Thanks
Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
You could do something like that, or:

  Property phone = Contact.PHONE(1,2,2);
  System.out.println(phone.getName()); 
  // -> company[1]/contact[2]/phone[2]
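Spelled out, that multi-index factory would just build each indexed segment in turn. A minimal standalone sketch, with `ContactSketch` and `phoneName` invented for illustration (neither is a real Tika class):

```java
// Invented sketch of a multi-index property name, in the shape of
// the hypothetical Contact.PHONE(1, 2, 2) call above
class ContactSketch {
    // Builds company[i]/contact[j]/phone[k] from three indices
    static String phoneName(int company, int contact, int phone) {
        return "company[" + company + "]/contact[" + contact + "]"
                + "/phone[" + phone + "]";
    }
}
```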

Are these the droids I'm looking for?

  https://github.com/Gagravarr/VorbisJava/tree/master/tika/src/main/java/org/gagravarr/tika


Regards,

Ray


On August 20, 2014 at 6:25:10 AM, Nick Burch (apache@gagravarr.org) wrote:
> OK, almost all of it looks fine to me now!
>  
> Taking just one bit:
>  
> > so in your example, setting a value of Audio on two essence tracks would currently look  
> like:
> >
> > metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
> > metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
> >
> > That index related component could potentially live in the Metadata class itself but  
> if we choose to support multiple levels of structured properties, i.e.:
> >
> > company[1]/contact[0]/phoneNumber[2]=555-1234
> >
> > that might prove difficult to support.
>  
> Could you nest the property definitions to solve this?
>  
> eg
> Property contact = Contact.COMPANY_CONTACT(1);
> Property phone = Contact.PHONE(contact, 2);
> System.out.println(phone.getName());
> // -> company[1]/contact[2]/phone
>  
>  
> Otherwise, if you promise to help me update the vorbis parsers with
> support for this, I'll vote +1 on adding it in to tika core in this
> form... ;-)
>  
> Nick
>  

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
OK, almost all of it looks fine to me now!

Taking just one bit:

> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.

Could you nest the property definitions to solve this?

eg
Property contact = Contact.COMPANY_CONTACT(1);
Property phone = Contact.PHONE(contact, 2); 
System.out.println(phone.getName());
// -> company[1]/contact[2]/phone
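As a self-contained sketch, the nesting idea above could compose like this; `NestedProperty`, `companyContact`, and `phone` are invented names, not Tika's real `Property` API:

```java
// Invented sketch: property definitions that nest by carrying
// their parent's name as a prefix
class NestedProperty {
    private final String name;

    private NestedProperty(String name) {
        this.name = name;
    }

    // Top-level indexed property: company[i]/contact
    static NestedProperty companyContact(int companyIndex) {
        return new NestedProperty("company[" + companyIndex + "]/contact");
    }

    // Child property built from its parent: .../contact[j]/phone
    static NestedProperty phone(NestedProperty parent, int contactIndex) {
        return new NestedProperty(parent.name + "[" + contactIndex + "]/phone");
    }

    String getName() {
        return name;
    }
}
```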


Otherwise, if you promise to help me update the vorbis parsers with 
support for this, I'll vote +1 on adding it in to tika core in this 
form... ;-)

Nick

On Tue, 19 Aug 2014, Ray Gauss wrote:
> The PBCore metadata class [1] has the indexed essence track properties defined as:
>
>     public static Property ESSENCE_TRACK_TYPE(int index)
>     {
>         return getIndexedEssenceTrackProperty(index, "essenceTrackType");
>     }
>
> which resolve via:
>
>     protected static Property getIndexedEssenceTrackProperty(int index, String elementName)
>     {
>         return Property.internalText(
>                 MessageFormat.format(ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT, index) +
>                 PREFIX_PBCORE + Metadata.NAMESPACE_PREFIX_DELIMITER + elementName);
>     }
>
> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.
>
> Regards,
>
> Ray
>
>
> [1] https://github.com/AlfrescoLabs/tika-ffmpeg/blob/master/src/main/java/org/apache/tika/metadata/PBCore.java
>
>
>
> On August 7, 2014 at 6:21:37 AM, Nick Burch (apache@gagravarr.org) wrote:
>> On Wed, 6 Aug 2014, Ray Gauss wrote:
>>> I've updated tika-ffmpeg with a new file with 2 audio tracks and a
>>> subtitle track and added a test. The metadata looks as follows:
>>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97
>> fps
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
>>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
>>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000
>> Hz
>>
>> This actually looks better than I'd expected, so I have less resistance
>> now than before
>>
>>> A much more concise representation would be:
>>>
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
>>> ...
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
>>
>> This looks ok-ish to me too
>>
>>
>> One thing that I am wondering about though:
>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> stream[0]/pbcore:essenceTrackType=Video
>>> stream[0]/pbcore:essenceTrackFrameSize=480x270
>>> stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
>>> stream[1]/pbcore:essenceTrackType=Audio
>>> stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>>
>> I can see how we can fairly easily modify Metadata to accept an optional
>> stream number when setting key/values, which would automatically prefix
>> them with stream[number]/
>>
>> For a property like pbcore:essenceTrackType, and your alternate scheme,
>> what would the method on Metadata look like to set a
>> pbcore:essenceTrackType to a value of Audio on two different tracks?
>>
>> Nick
>

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
The PBCore metadata class [1] has the indexed essence track properties defined as:

    public static Property ESSENCE_TRACK_TYPE(int index)
    {
        return getIndexedEssenceTrackProperty(index, "essenceTrackType");
    }

which resolve via:

    protected static Property getIndexedEssenceTrackProperty(int index, String elementName)
    {
        return Property.internalText(
                MessageFormat.format(ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT, index) +
                PREFIX_PBCORE + Metadata.NAMESPACE_PREFIX_DELIMITER + elementName);
    }

so in your example, setting a value of Audio on two essence tracks would currently look like:

   metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
   metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");

That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:

   company[1]/contact[0]/phoneNumber[2]=555-1234

that might prove difficult to support.
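As a standalone illustration, the indexed-key resolution above boils down to something like the following; the value of the format constant is an assumption here (the real `ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT` lives in the PBCore class at [1]):

```java
import java.text.MessageFormat;

class IndexedKeySketch {
    // Assumed value of ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT;
    // {0} is replaced by the track index
    static final String TRACK_FORMAT = "pbcore:instantiationEssenceTrack[{0}]/";

    // Mirrors getIndexedEssenceTrackProperty, minus the Property wrapper
    static String essenceTrackKey(int index, String elementName) {
        return MessageFormat.format(TRACK_FORMAT, index)
                + "pbcore:" + elementName;
    }
}
```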

Regards,

Ray


[1] https://github.com/AlfrescoLabs/tika-ffmpeg/blob/master/src/main/java/org/apache/tika/metadata/PBCore.java



On August 7, 2014 at 6:21:37 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Wed, 6 Aug 2014, Ray Gauss wrote:
> > I've updated tika-ffmpeg with a new file with 2 audio tracks and a
> > subtitle track and added a test. The metadata looks as follows:
> >
> > pbcore:instantiationDataRate=3511 kb/s
> > pbcore:instantiationDuration=00:00:01.03
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270  
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97  
> fps
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s  
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000  
> Hz
>  
> This actually looks better than I'd expected, so I have less resistance
> now than before
>  
> > A much more concise representation would be:
> >
> > Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
> > Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
> > ...
> > Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
> > Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
>  
> This looks ok-ish to me too
>  
>  
> One thing that I am wondering about though:
>  
> > pbcore:instantiationDataRate=3511 kb/s
> > pbcore:instantiationDuration=00:00:01.03
> > stream[0]/pbcore:essenceTrackType=Video
> > stream[0]/pbcore:essenceTrackFrameSize=480x270
> > stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
> > stream[1]/pbcore:essenceTrackType=Audio
> > stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>  
> I can see how we can fairly easily modify Metadata to accept an optional
> stream number when setting key/values, which would automatically prefix
> them with stream[number]/
>  
> For a property like pbcore:essenceTrackType, and your alternate scheme,
> what would the method on Metadata look like to set a
> pbcore:essenceTrackType to a value of Audio on two different tracks?
>  
> Nick

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 6 Aug 2014, Ray Gauss wrote:
> I've updated tika-ffmpeg with a new file with 2 audio tracks and a 
> subtitle track and added a test.  The metadata looks as follows:
>
> pbcore:instantiationDataRate=3511 kb/s
> pbcore:instantiationDuration=00:00:01.03
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz

This actually looks better than I'd expected, so I have less resistance 
now than before

> A much more concise representation would be:
>
> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
> ...
> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName

This looks ok-ish to me too


One thing that I am wondering about though:

> pbcore:instantiationDataRate=3511 kb/s
> pbcore:instantiationDuration=00:00:01.03
> stream[0]/pbcore:essenceTrackType=Video
> stream[0]/pbcore:essenceTrackFrameSize=480x270
> stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
> stream[1]/pbcore:essenceTrackType=Audio
> stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz

I can see how we can fairly easily modify Metadata to accept an optional 
stream number when setting key/values, which would automatically prefix 
them with stream[number]/
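One hypothetical shape for that overload, sketched against a plain map rather than the real Metadata class (all names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-in for Tika's Metadata, sketching a set() overload
// that takes a stream index and prefixes the key with stream[index]/
class StreamAwareMetadata {
    private final Map<String, String> entries = new HashMap<>();

    // Plain set, as Metadata.set(String, String) behaves today
    void set(String name, String value) {
        entries.put(name, value);
    }

    // Sketched overload: the stream index becomes a key prefix
    void set(int streamIndex, String name, String value) {
        entries.put("stream[" + streamIndex + "]/" + name, value);
    }

    String get(String name) {
        return entries.get(name);
    }
}
```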

For a property like pbcore:essenceTrackType, and your alternate scheme, 
what would the method on Metadata look like to set a 
pbcore:essenceTrackType to a value of Audio on two different tracks?

Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
Sorry for the delay on this.

I've updated tika-ffmpeg with a new file with 2 audio tracks and a subtitle track and added a test.  The metadata looks as follows:

pbcore:instantiationDataRate=3511 kb/s
pbcore:instantiationDuration=00:00:01.03
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackDataRate=1536 kb/s
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackEncoding=pcm_s16le
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackSamplingRate=48000 Hz
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackDataRate=1536 kb/s
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackEncoding=pcm_s16le
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackType=Subtitle
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackEncoding=eia_608
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackLanguage=eng

and the alternative representation would look like:

pbcore:instantiationDataRate=3511 kb/s
pbcore:instantiationDuration=00:00:01.03
stream[0]/pbcore:essenceTrackType=Video
stream[0]/pbcore:essenceTrackFrameSize=480x270
stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
stream[0]/pbcore:essenceTrackDataRate=360 kb/s
stream[0]/pbcore:essenceTrackEncoding=h264
stream[0]/pbcore:essenceTrackLanguage=eng
stream[1]/pbcore:essenceTrackType=Audio
stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
stream[1]/pbcore:essenceTrackDataRate=1536 kb/s
stream[1]/pbcore:essenceTrackEncoding=pcm_s16le
stream[1]/pbcore:essenceTrackLanguage=eng
stream[2]/pbcore:essenceTrackType=Audio
stream[2]/pbcore:essenceTrackSamplingRate=48000 Hz
stream[2]/pbcore:essenceTrackDataRate=1536 kb/s
stream[2]/pbcore:essenceTrackEncoding=pcm_s16le
stream[2]/pbcore:essenceTrackLanguage=eng
stream[3]/pbcore:essenceTrackType=Subtitle
stream[3]/pbcore:essenceTrackEncoding=eia_608
stream[3]/pbcore:essenceTrackLanguage=eng


I really think that if we encounter another 'kind of thing' that might utilize some form of sub-streams, that 'other thing' will need to be namespaced as well, or we'll start to lose the value of using specifications as metadata keys in the first place.


Another example that could make use of this general concept of structured mapping is our IPTC metadata interface.  For instance, that specification uses a structured LocationDetails object for both a single-valued LocationCreated field and for a multi-valued LocationShown field.

That LocationDetails object contains fields like City and CountryName, so we currently have that mapped as:

Iptc4xmpExt:LocationCreatedCity (internalText)
Iptc4xmpExt:LocationCreatedCountryName (internalText)
...
Iptc4xmpExt:LocationShownCity (internalTextBag)
Iptc4xmpExt:LocationShownCountryName (internalTextBag)
...

which strays from the specification a bit to accommodate our metadata structure, i.e. LocationCreatedCity is not a field in the spec, and if one LocationShown entry only contains City and another only contains CountryName we have to rely on empty, 'padding' entries.

A much more concise representation would be:

Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
...
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
...


IMHO, a generic 'streams' prefix would seem out of place next to those fields.

Regards,

Ray


On July 24, 2014 at 9:52:47 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Wed, 23 Jul 2014, Ray Gauss wrote:
> > 2) There are several PBCore instantiation properties that apply to
> > the entire file like duration and tracks that we'd want prefixed with
> > pbcore so I think it would be odd to see:
> >
> > pbcore:instantiationDuration=00:00:05.20
> > stream[0]/pbcore:essenceTrackType=Video
>  
> This structure does have the advantage that any tool can easily see that
> the second metadata key relates to a sub-stream / sub-track etc, without
> having to know anything about PBCore. That will make it easier for tools
> to exclude or handle these differently in a general way.
>  
> (I can't think, off the top of my head, of another kind of thing that
> might need this structure, but I'm reluctant to nail it down to being only
> for PBCore if that'll cause us issues when we try to support something
> very similar in future)
>  
>  
> Any chance you could get / fake a nearly-full set of metadata keys and
> value for a media file with (say) 3 streams? We can then generate pbcore
> prefixed and general prefixed versions, which should hopefully make it
> easier for other community members to compare and offer their input!
>  
> Nick

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 23 Jul 2014, Ray Gauss wrote:
> 2) There are several PBCore instantiation properties that apply to 
> the entire file like duration and tracks that we'd want prefixed with 
> pbcore so I think it would be odd to see:
>
>   pbcore:instantiationDuration=00:00:05.20
>   stream[0]/pbcore:essenceTrackType=Video

This structure does have the advantage that any tool can easily see that 
the second metadata key relates to a sub-stream / sub-track etc, without 
having to know anything about PBCore. That will make it easier for tools 
to exclude or handle these differently in a general way.

(I can't think, off the top of my head, of another kind of thing that 
might need this structure, but I'm reluctant to nail it down to being only 
for PBCore if that'll cause us issues when we try to support something 
very similar in future)


Any chance you could get / fake a nearly-full set of metadata keys and 
value for a media file with (say) 3 streams? We can then generate pbcore 
prefixed and general prefixed versions, which should hopefully make it 
easier for other community members to compare and offer their input!

Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
They are a bit verbose, but:

1) I'd really like to stick to the specification as closely as possible.

2) There are several PBCore instantiation properties that apply to the entire file, like duration and tracks, that we'd want prefixed with pbcore, so I think it would be odd to see:

  pbcore:instantiationDuration=00:00:05.20
  stream[0]/pbcore:essenceTrackType=Video

3) PBCore allows for essence track types like text that might not necessarily be considered 'streams'.

That's great that the Ogg parsers will be able to do the informational side!

Regards,

Ray


On July 23, 2014 at 10:17:29 AM, Nick Burch (apache@gagravarr.org) wrote:

> > ...
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> > ...
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackLanguage=eng
>  
> I'm not quite so keen on these metadata keys though. Do we gain anything
> from this long form, vs
>  
> stream[0]/pbcore:essenceTrackType=Video
> stream[1]/pbcore:essenceTrackType=Audio
> stream[1]/pbcore:essenceTrackLanguage=eng
>  
> ?
>  
> > The current FFmpeg parser wouldn't be able to extract things like
> > annotations, but it was only targeting the intrinsic metadata.
>  
> The Ogg parsers should be able to output that fairly easily, the only
> reason they don't is that I didn't know what to output as!
>  
> Nick
>  

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 22 Jul 2014, Ray Gauss wrote:
> The info on what the streams are and how they relate can be conveyed via 
> PBCore, i.e.:
>
> pbcore:instantiationTracks=1 video track, English and Spanish audio, 
> Director's commentary audio

Ah, that's good. Looks like a sensible enough and easy-to-follow standard 
to crib from.

> ...
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> ...
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackLanguage=eng

I'm not quite so keen on these metadata keys though. Do we gain anything 
from this long form, vs

stream[0]/pbcore:essenceTrackType=Video
stream[1]/pbcore:essenceTrackType=Audio
stream[1]/pbcore:essenceTrackLanguage=eng

?

> The current FFmpeg parser wouldn't be able to extract things like 
> annotations, but it was only targeting the intrinsic metadata.

The Ogg parsers should be able to output that fairly easily, the only 
reason they don't is that I didn't know what to output as!

Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
The info on what the streams are and how they relate can be conveyed via PBCore, i.e.:

pbcore:instantiationTracks=1 video track, English and Spanish audio, Director's commentary audio
...
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
...
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackLanguage=eng
...
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackLanguage=esp
...
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackAnnotation=Director's commentary
...

The current FFmpeg parser wouldn't be able to extract things like annotations, but it was only targeting the intrinsic metadata.  The informational metadata will be extracted by things like XMP parsers.

Regards,

Ray


On July 22, 2014 at 7:39:12 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Tue, 22 Jul 2014, Ray Gauss wrote:
> > This is a few months old but I've been looking at this recently and
> > since we're unlikely to move to a structured metadata store in the short
> > term I've come up with what I think is an interim solution [1] that
> > essentially allows nesting through XPath-like syntax:
> >
> > stream[0]/field1=someValue
> > stream[0]/field2=otherValue
> > stream[1]/field1=yetAnother
> > stream[1]/field2=andSoOn
>  
> Doesn't that lose information (from some formats at least) on what those
> streams are, and how they relate to each other?
>  
> It does have a certain simplicity that I like, and should be fairly easy
> to write some simple wrappers to let you get access to per-stream stuff
> easily
>  
> Nick

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 22 Jul 2014, Ray Gauss wrote:
> This is a few months old but I've been looking at this recently and 
> since we're unlikely to move to a structured metadata store in the short 
> term I've come up with what I think is an interim solution [1] that 
> essentially allows nesting through XPath-like syntax:
>
>     stream[0]/field1=someValue
>     stream[0]/field2=otherValue
>     stream[1]/field1=yetAnother
>     stream[1]/field2=andSoOn

Doesn't that lose information (from some formats at least) on what those 
streams are, and how they relate to each other?

It does have a certain simplicity that I like, and should be fairly easy 
to write some simple wrappers to let you get access to per-stream stuff 
easily
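Such a wrapper might be as small as the following sketch, written over a plain map rather than Tika's Metadata class (the names are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Invented sketch: a per-stream read-only view over the flat
// stream[n]/field keys described above
class StreamViewSketch {
    private final Map<String, String> flat;
    private final String prefix;

    StreamViewSketch(Map<String, String> flat, int streamIndex) {
        this.flat = flat;
        this.prefix = "stream[" + streamIndex + "]/";
    }

    // Looks up stream[n]/field in the flat key space
    String get(String field) {
        return flat.get(prefix + field);
    }
}
```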

Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
Hi all,

This is a few months old, but I've been looking at this recently, and since we're unlikely to move to a structured metadata store in the short term, I've come up with what I think is an interim solution [1] that essentially allows nesting through XPath-like syntax:

    stream[0]/field1=someValue
    stream[0]/field2=otherValue
    stream[1]/field1=yetAnother
    stream[1]/field2=andSoOn

In this case the PBCore metadata standard was used, so the terminology is 'essenceTracks' rather than 'stream', and the parser is an ExternalParser configured for FFmpeg rather than pure Java.

If that approach seems reasonable we could move things into the main code base at some point.

Regards,

Ray


[1] https://github.com/AlfrescoLabs/tika-ffmpeg


On March 28, 2014 at 7:00:31 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Fri, 28 Mar 2014, Konstantin Gribov wrote:
> > I think you should have three info blocks: video streams, audio streams
> > and subtitles (if container supports their embedding). Sort naturally or
> > by vid/aid/sid if present.
>  
> That's not something Tika supports though. We have a metadata object we
> can populate with some things, or we can trigger for embedded objects.
> The Metadata object doesn't support nesting
>  
> Nick
>  

Re: How should video files with audio be handled by parsers?

Posted by Konstantin Gribov <gr...@gmail.com>.
I said that about output to the content handler, not to metadata. How to
handle metadata for containers with several video streams is another
problem. The Tika metadata model is something weird to me, so I try not
to look at it too often =)

-- 
Best regards,
Konstantin Gribov.


2014-03-28 14:59 GMT+04:00 Nick Burch <ap...@gagravarr.org>:

> On Fri, 28 Mar 2014, Konstantin Gribov wrote:
>
>> I think you should have three info blocks: video streams, audio streams
>> and subtitles (if container supports their embedding). Sort naturally or by
>> vid/aid/sid if present.
>>
>
> That's not something Tika supports though. We have a metadata object we
> can populate with some things, or we can trigger for embedded objects. The
> Metadata object doesn't support nesting
>
> Nick
>

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 28 Mar 2014, Konstantin Gribov wrote:
> I think you should have three info blocks: video streams, audio streams 
> and subtitles (if container supports their embedding). Sort naturally or 
> by vid/aid/sid if present.

That's not something Tika supports though. We have a metadata object we 
can populate with some things, or we can trigger for embedded objects. 
The Metadata object doesn't support nesting

Nick

Re: How should video files with audio be handled by parsers?

Posted by Konstantin Gribov <gr...@gmail.com>.
I think you should have three info blocks: video streams, audio streams and
subtitles (if container supports their embedding). Sort naturally or by
vid/aid/sid if present.

You shouldn't multiplex video and audio streams since any video stream can
be combined with any audio stream.

In terms of xml you can have container as root element, which embeds
streams grouped by type.

-- 
Best regards,
Konstantin Gribov.
On 28.03.2014 at 1:29, "Nick Burch" <ap...@gagravarr.org> wrote:

> On Thu, 27 Mar 2014, Konstantin Gribov wrote:
>
>> Some containers (like matroska/mkv) tag audio and subtitle streams with a
>> language tag and a comment. From mplayer console output:
>>
>>  [lavf] stream 0: video (h264), -vid 0
>>> [lavf] stream 1: audio (aac), -aid 0, -alang rus, Rus BaibaKo.tv
>>> [lavf] stream 2: audio (ac3), -aid 1, -alang eng, Eng
>>>
>>
> Ogg + CMML would give something similar
>
>  I don't know any established semantics for video streams but the first
>> usually is default for playback.
>>
>
> How should a Tika parser handle such a file though? Include the primary
> audio metadata with the video stream as the primary object, and report
> embedded items for the other audio streams? Report all as embedded items?
> Report the primary video stream as the main thing, and give all other video
> + audio as embedded items? Something else?
>
> Nick
>

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 27 Mar 2014, Konstantin Gribov wrote:
> Some containers (like matroska/mkv) tag audio and subtitle streams with a
> language tag and a comment. From mplayer console output:
>
>> [lavf] stream 0: video (h264), -vid 0
>> [lavf] stream 1: audio (aac), -aid 0, -alang rus, Rus BaibaKo.tv
>> [lavf] stream 2: audio (ac3), -aid 1, -alang eng, Eng

Ogg + CMML would give something similar

> I don't know any established semantics for video streams but the first
> usually is default for playback.

How should a Tika parser handle such a file though? Include the primary 
audio metadata with the video stream as the primary object, and report 
embedded items for the other audio streams? Report all as embedded items? 
Report the primary video stream as the main thing, and give all other 
video + audio as embedded items? Something else?

Nick

Re: How should video files with audio be handled by parsers?

Posted by Konstantin Gribov <gr...@gmail.com>.
Hello, Nick.

Some containers (like matroska/mkv) tag audio and subtitle streams with a
language tag and a comment. From mplayer console output:

> [lavf] stream 0: video (h264), -vid 0
> [lavf] stream 1: audio (aac), -aid 0, -alang rus, Rus BaibaKo.tv
> [lavf] stream 2: audio (ac3), -aid 1, -alang eng, Eng


Also, subtitle streams can be included in the container (at least in mkv).

I don't know any established semantics for video streams but the first
usually is default for playback.

-- 
Best regards,
Konstantin Gribov.


2014-03-27 19:34 GMT+04:00 Nick Burch <ni...@apache.org>:

> Hi All
>
> Does anyone know if we have a recommended way, or a planned approach, to handle
> video files with possibly multiple audio streams?
>
> Most of the multimedia container formats support video and zero or one
> audio streams, and a fair number support video and multiple audio streams.
> A few can actually hold multiple video and multiple audio. Some can tell
> you the relationship between the streams (eg video 0 = main video, video 1
> = subtitle overlay, audio 0 = english, audio 1 = french, audio 2 = english
> commentary). Some you just find what's there, and have to guess how it fits
> together.
>
> So, now I'm looking at adding some basic video parsing support into the
> Ogg stuff, how should I be having the parser report the metadata and
> content for streams with both video and audio?
>
> (Currently, MP4Parser looks to have "TODO Decide how to handle multiple
> tracks", and just dumps the audio information into the metadata along with
> the video)
>
> Nick
>