Posted to dev@tika.apache.org by Ray Gauss <ra...@alfresco.com> on 2014/08/06 22:11:48 UTC

Re: How should video files with audio be handled by parsers?

Sorry for the delay on this.

I've updated tika-ffmpeg with a new file with 2 audio tracks and a subtitle track and added a test.  The metadata looks as follows:

pbcore:instantiationDataRate=3511 kb/s
pbcore:instantiationDuration=00:00:01.03
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackDataRate=1536 kb/s
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackEncoding=pcm_s16le
pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackType=Audio
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackSamplingRate=48000 Hz
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackDataRate=1536 kb/s
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackEncoding=pcm_s16le
pbcore:instantiationEssenceTrack[2]/pbcore:essenceTrackLanguage=eng
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackType=Subtitle
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackEncoding=eia_608
pbcore:instantiationEssenceTrack[3]/pbcore:essenceTrackLanguage=eng

and the alternative representation would look like:

pbcore:instantiationDataRate=3511 kb/s
pbcore:instantiationDuration=00:00:01.03
stream[0]/pbcore:essenceTrackType=Video
stream[0]/pbcore:essenceTrackFrameSize=480x270
stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
stream[0]/pbcore:essenceTrackDataRate=360 kb/s
stream[0]/pbcore:essenceTrackEncoding=h264
stream[0]/pbcore:essenceTrackLanguage=eng
stream[1]/pbcore:essenceTrackType=Audio
stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
stream[1]/pbcore:essenceTrackDataRate=1536 kb/s
stream[1]/pbcore:essenceTrackEncoding=pcm_s16le
stream[1]/pbcore:essenceTrackLanguage=eng
stream[2]/pbcore:essenceTrackType=Audio
stream[2]/pbcore:essenceTrackSamplingRate=48000 Hz
stream[2]/pbcore:essenceTrackDataRate=1536 kb/s
stream[2]/pbcore:essenceTrackEncoding=pcm_s16le
stream[2]/pbcore:essenceTrackLanguage=eng
stream[3]/pbcore:essenceTrackType=Subtitle
stream[3]/pbcore:essenceTrackEncoding=eia_608
stream[3]/pbcore:essenceTrackLanguage=eng
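
To make the comparison easier to reproduce, here is a minimal Java sketch that renders the same per-track values under either container prefix. The class and method names (TrackKeySketch, flatten, sampleTracks) are made up for illustration and are not part of Tika or tika-ffmpeg; only the key strings come from the listings above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Tika API: renders per-track values as flat keys.
final class TrackKeySketch {

    // The two container prefixes being compared in this thread.
    static final String PBCORE_CONTAINER = "pbcore:instantiationEssenceTrack";
    static final String GENERIC_CONTAINER = "stream";

    // Produces one "container[i]/pbcore:element=value" line per track element.
    static List<String> flatten(String container, List<Map<String, String>> tracks) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tracks.size(); i++) {
            for (Map.Entry<String, String> e : tracks.get(i).entrySet()) {
                lines.add(container + "[" + i + "]/pbcore:" + e.getKey() + "=" + e.getValue());
            }
        }
        return lines;
    }

    // Two of the tracks from the test file, trimmed to a couple of elements each.
    static List<Map<String, String>> sampleTracks() {
        Map<String, String> video = new LinkedHashMap<>();
        video.put("essenceTrackType", "Video");
        video.put("essenceTrackEncoding", "h264");
        Map<String, String> audio = new LinkedHashMap<>();
        audio.put("essenceTrackType", "Audio");
        audio.put("essenceTrackEncoding", "pcm_s16le");
        List<Map<String, String>> tracks = new ArrayList<>();
        tracks.add(video);
        tracks.add(audio);
        return tracks;
    }

    public static void main(String[] args) {
        flatten(PBCORE_CONTAINER, sampleTracks()).forEach(System.out::println);
        flatten(GENERIC_CONTAINER, sampleTracks()).forEach(System.out::println);
    }
}
```

The only difference between the two outputs is the container segment, which is the part under discussion.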


I really think that if we encounter another 'kind of thing' that might utilize some form of sub-streams, that 'other thing' will need to be namespaced as well or we'll start to lose the value of using specifications as metadata keys in the first place.


Another example that could make use of this general concept of structured mapping is our IPTC metadata interface.  For instance, that specification uses a structured LocationDetails object for both a single-valued LocationCreated field and for a multi-valued LocationShown field.

That LocationDetails object contains fields like City and CountryName, so we currently have that mapped as:

Iptc4xmpExt:LocationCreatedCity (internalText)
Iptc4xmpExt:LocationCreatedCountryName (internalText)
...
Iptc4xmpExt:LocationShownCity (internalTextBag)
Iptc4xmpExt:LocationShownCountryName (internalTextBag)
...

This strays from the specification a bit to accommodate our flat metadata structure: LocationCreatedCity is not a field in the spec, and if one LocationShown entry contains only City and another contains only CountryName, we have to rely on empty 'padding' entries to keep the two bags aligned.

A much more concise representation would be:

Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
...
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
...
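
As a rough illustration of why the structured form avoids padding, the following Java sketch builds such keys with plain strings. StructuredKeySketch and its methods are hypothetical, not existing Tika API, and the city and country values are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Tika API: builds structured keys for IPTC-style fields.
final class StructuredKeySketch {

    static final String PREFIX = "Iptc4xmpExt:";

    // Single-valued structure, e.g. Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
    static String key(String structure, String field) {
        return PREFIX + structure + "/" + PREFIX + field;
    }

    // Entry of a multi-valued structure, e.g. Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
    static String key(String structure, int index, String field) {
        return PREFIX + structure + "[" + index + "]/" + PREFIX + field;
    }

    // Two LocationShown entries with different fields set; the values are made up.
    static Map<String, String> sparseExample() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(key("LocationShown", 0, "City"), "Paris");
        metadata.put(key("LocationShown", 1, "CountryName"), "Canada");
        return metadata;
    }

    public static void main(String[] args) {
        sparseExample().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Each entry carries only the fields it actually has, so the map holds two keys and no empty placeholders.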


IMHO, a generic 'streams' prefix would seem out of place next to those fields.

Regards,

Ray


On July 24, 2014 at 9:52:47 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Wed, 23 Jul 2014, Ray Gauss wrote:
> > 2) There are several PBCore instantiation properties that apply to
> > the entire file like duration and tracks that we'd want prefixed with
> > pbcore so I think it would be odd to see:
> >
> > pbcore:instantiationDuration=00:00:05.20
> > stream[0]/pbcore:essenceTrackType=Video
>  
> This structure does have the advantage that any tool can easily see that
> the second metadata key relates to a sub-stream / sub-track etc, without
> having to know anything about PBCore. That will make it easier for tools
> to exclude or handle these differently in a general way.
>  
> (I can't think, off the top of my head, of another kind of thing that
> might need this structure, but I'm reluctant to nail it down to being only
> for PBCore if that'll cause us issues when we try to support something
> very similar in future)
>  
>  
> Any chance you could get / fake a nearly-full set of metadata keys and
> value for a media file with (say) 3 streams? We can then generate pbcore
> prefixed and general prefixed versions, which should hopefully make it
> easier for other community members to compare and offer their input!
>  
> Nick

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 20 Aug 2014, Ray Gauss wrote:
> Are these the droids I'm looking for?
>
>   https://github.com/Gagravarr/VorbisJava/tree/master/tika/src/main/java/org/gagravarr/tika

Yup!

To find out about the relations between streams, you'll need to use the 
org.gagravarr.skeleton classes to decode the Skeleton (fisbone) packets, 
especially SkeletonFisbone. See https://wiki.xiph.org/Ogg_Skeleton_4 for 
more on Skeleton in general, and http://wiki.xiph.org/SkeletonHeaders for 
how the relationship information gets encoded into skeleton message 
headers.

There are two Skeleton-enabled test files in the source tree; I'll look to
add a couple more this weekend to give better test coverage.

Thanks
Nick

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
You could do something like that, or:

  Property phone = Contact.PHONE(1,2,2);
  System.out.println(phone.getName()); 
  // -> company[1]/contact[2]/phone[2]
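
A rough sketch of how such a factory could assemble the name, assuming the segment names company, contact and phone. ContactSketch is hypothetical and returns a plain String rather than a Tika Property so that it compiles on its own.

```java
// Hypothetical sketch, not Tika API: one factory call carrying every index.
final class ContactSketch {

    // Path segments from outermost to innermost.
    private static final String[] SEGMENTS = {"company", "contact", "phone"};

    // Joins each segment with its index, e.g. company[1]/contact[2]/phone[2].
    static String PHONE(int companyIndex, int contactIndex, int phoneIndex) {
        int[] indexes = {companyIndex, contactIndex, phoneIndex};
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < SEGMENTS.length; i++) {
            if (i > 0) {
                name.append('/');
            }
            name.append(SEGMENTS[i]).append('[').append(indexes[i]).append(']');
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(PHONE(1, 2, 2));
    }
}
```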

Are these the droids I'm looking for?

  https://github.com/Gagravarr/VorbisJava/tree/master/tika/src/main/java/org/gagravarr/tika


Regards,

Ray


On August 20, 2014 at 6:25:10 AM, Nick Burch (apache@gagravarr.org) wrote:
> OK, almost all of it looks fine to me now!
>  
> Taking just one bit:
>  
> > so in your example, setting a value of Audio on two essence tracks would currently look like:
> >
> > metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
> > metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
> >
> > That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
> >
> > company[1]/contact[0]/phoneNumber[2]=555-1234
> >
> > that might prove difficult to support.
>  
> Could you nest the property definitions to solve this?
>  
> eg
> Property contact = Contact.COMPANY_CONTACT(1);
> Property phone = Contact.PHONE(contact, 2);
> System.out.println(phone.getName());
> // -> company[1]/contact[2]/phone
>  
>  
> Otherwise, if you promise to help me update the vorbis parsers with
> support for this, I'll vote +1 on adding it in to tika core in this
> form... ;-)
>  
> Nick
>  

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
OK, almost all of it looks fine to me now!

Taking just one bit:

> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.

Could you nest the property definitions to solve this?

eg
Property contact = Contact.COMPANY_CONTACT(1);
Property phone = Contact.PHONE(contact, 2); 
System.out.println(phone.getName());
// -> company[1]/contact[2]/phone
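
One possible shape for that nesting, sketched with plain strings so it compiles without Tika. PathSegment and its methods are hypothetical names, and here each index is attached to the segment it is declared with, which is only one reading of the quick example above.

```java
// Hypothetical sketch, not Tika API: a property path built from nested segments.
final class PathSegment {

    private final PathSegment parent; // null for the outermost segment
    private final String name;
    private final int index;          // negative means "no index"

    private PathSegment(PathSegment parent, String name, int index) {
        this.parent = parent;
        this.name = name;
        this.index = index;
    }

    // Starts a new path with an indexed outermost segment.
    static PathSegment root(String name, int index) {
        return new PathSegment(null, name, index);
    }

    // Adds an indexed segment below this one.
    PathSegment child(String name, int index) {
        return new PathSegment(this, name, index);
    }

    // Adds a segment without an index below this one.
    PathSegment child(String name) {
        return new PathSegment(this, name, -1);
    }

    // Walks up the parents to render the full path.
    String getName() {
        String own = index < 0 ? name : name + "[" + index + "]";
        return parent == null ? own : parent.getName() + "/" + own;
    }

    public static void main(String[] args) {
        PathSegment contact = PathSegment.root("company", 1).child("contact", 2);
        System.out.println(contact.child("phone").getName());
    }
}
```

Because each segment only knows its parent, the same contact segment can be reused to derive several child properties.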


Otherwise, if you promise to help me update the vorbis parsers with 
support for this, I'll vote +1 on adding it in to tika core in this 
form... ;-)

Nick

On Tue, 19 Aug 2014, Ray Gauss wrote:
> The PBCore metadata class [1] has the indexed essence track properties defined as:
>
>     public static Property ESSENCE_TRACK_TYPE(int index)
>     {
>         return getIndexedEssenceTrackProperty(index, "essenceTrackType");
>     }
>
> which resolve via:
>
>     protected static Property getIndexedEssenceTrackProperty(int index, String elementName)
>     {
>         return Property.internalText(
>                 MessageFormat.format(ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT, index) +
>                 PREFIX_PBCORE + Metadata.NAMESPACE_PREFIX_DELIMITER + elementName);
>     }
>
> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.
>
> Regards,
>
> Ray
>
>
> [1] https://github.com/AlfrescoLabs/tika-ffmpeg/blob/master/src/main/java/org/apache/tika/metadata/PBCore.java
>
>
>
> On August 7, 2014 at 6:21:37 AM, Nick Burch (apache@gagravarr.org) wrote:
>> On Wed, 6 Aug 2014, Ray Gauss wrote:
>>> I've updated tika-ffmpeg with a new file with 2 audio tracks and a
>>> subtitle track and added a test. The metadata looks as follows:
>>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
>>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
>>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>>
>> This actually looks better than I'd expected, so I have fewer resistances
>> now than before
>>
>>> A much more concise representation would be:
>>>
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
>>> ...
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
>>
>> This looks ok-ish to me too
>>
>>
>> One thing that I am wondering about though:
>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> stream[0]/pbcore:essenceTrackType=Video
>>> stream[0]/pbcore:essenceTrackFrameSize=480x270
>>> stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
>>> stream[1]/pbcore:essenceTrackType=Audio
>>> stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>>
>> I can see how we can fairly easily modify Metadata to accept an optional
>> stream number when setting key/values, which would automatically prefix
>> them with stream[number]/
>>
>> For a property like pbcore:essenceTrackType, and your alternate scheme,
>> what would you expect the method on Metadata to look like for setting
>> pbcore:essenceTrackType to a value of Audio on two different tracks?
>>
>> Nick
>

Re: How should video files with audio be handled by parsers?

Posted by Ray Gauss <ra...@alfresco.com>.
The PBCore metadata class [1] has the indexed essence track properties defined as:

    public static Property ESSENCE_TRACK_TYPE(int index)
    {
        return getIndexedEssenceTrackProperty(index, "essenceTrackType");
    }

which resolve via:

    protected static Property getIndexedEssenceTrackProperty(int index, String elementName)
    {
        return Property.internalText(
                MessageFormat.format(ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT, index) +
                PREFIX_PBCORE + Metadata.NAMESPACE_PREFIX_DELIMITER + elementName);
    }

so in your example, setting a value of Audio on two essence tracks would currently look like:

   metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
   metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");

That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:

   company[1]/contact[0]/phoneNumber[2]=555-1234

that might prove difficult to support.
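
For anyone who wants to try the key building without pulling in Tika, here is a standalone sketch of the same idea. The pattern string and the ":" delimiter are assumptions on my part (the real constants live in PBCore.java [1] and in Metadata), and the method returns a String instead of a Property. One thing worth noting: if the pattern uses a plain {0} placeholder, MessageFormat applies locale number formatting, so an index such as 1000 may come out with a grouping separator. The sketch sidesteps that with {0,number,#} and a fixed locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Sketch only: mirrors the key-building helper without depending on Tika.
final class EssenceTrackKeySketch {

    // Assumed value; the real ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT is in PBCore.java.
    // "{0,number,#}" formats the index without grouping separators.
    static final String TRACK_FORMAT = "pbcore:instantiationEssenceTrack[{0,number,#}]/";

    static final String PREFIX_PBCORE = "pbcore";

    // Assumed to match Metadata.NAMESPACE_PREFIX_DELIMITER.
    static final String NAMESPACE_PREFIX_DELIMITER = ":";

    // Builds e.g. pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType
    static String indexedEssenceTrackKey(int index, String elementName) {
        // A fixed locale keeps digits and separators independent of the JVM default.
        MessageFormat format = new MessageFormat(TRACK_FORMAT, Locale.ROOT);
        return format.format(new Object[] {index})
                + PREFIX_PBCORE + NAMESPACE_PREFIX_DELIMITER + elementName;
    }

    public static void main(String[] args) {
        System.out.println(indexedEssenceTrackKey(0, "essenceTrackType"));
        System.out.println(indexedEssenceTrackKey(1, "essenceTrackType"));
    }
}
```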

Regards,

Ray


[1] https://github.com/AlfrescoLabs/tika-ffmpeg/blob/master/src/main/java/org/apache/tika/metadata/PBCore.java



On August 7, 2014 at 6:21:37 AM, Nick Burch (apache@gagravarr.org) wrote:
> On Wed, 6 Aug 2014, Ray Gauss wrote:
> > I've updated tika-ffmpeg with a new file with 2 audio tracks and a
> > subtitle track and added a test. The metadata looks as follows:
> >
> > pbcore:instantiationDataRate=3511 kb/s
> > pbcore:instantiationDuration=00:00:01.03
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270  
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s  
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
> > pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> > pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>  
> This actually looks better than I'd expected, so I have fewer resistances
> now than before
>  
> > A much more concise representation would be:
> >
> > Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
> > Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
> > ...
> > Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
> > Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
>  
> This looks ok-ish to me too
>  
>  
> One thing that I am wondering about though:
>  
> > pbcore:instantiationDataRate=3511 kb/s
> > pbcore:instantiationDuration=00:00:01.03
> > stream[0]/pbcore:essenceTrackType=Video
> > stream[0]/pbcore:essenceTrackFrameSize=480x270
> > stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
> > stream[1]/pbcore:essenceTrackType=Audio
> > stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>  
> I can see how we can fairly easily modify Metadata to accept an optional
> stream number when setting key/values, which would automatically prefix
> them with stream[number]/
>  
> For a property like pbcore:essenceTrackType, and your alternate scheme,
> what would you expect the method on Metadata to look like for setting
> pbcore:essenceTrackType to a value of Audio on two different tracks?
>  
> Nick

Re: How should video files with audio be handled by parsers?

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 6 Aug 2014, Ray Gauss wrote:
> I've updated tika-ffmpeg with a new file with 2 audio tracks and a 
> subtitle track and added a test.  The metadata looks as follows:
>
> pbcore:instantiationDataRate=3511 kb/s
> pbcore:instantiationDuration=00:00:01.03
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz

This actually looks better than I'd expected, so I have fewer resistances 
now than before

> A much more concise representation would be:
>
> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
> ...
> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName

This looks ok-ish to me too


One thing that I am wondering about though:

> pbcore:instantiationDataRate=3511 kb/s
> pbcore:instantiationDuration=00:00:01.03
> stream[0]/pbcore:essenceTrackType=Video
> stream[0]/pbcore:essenceTrackFrameSize=480x270
> stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
> stream[1]/pbcore:essenceTrackType=Audio
> stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz

I can see how we can fairly easily modify Metadata to accept an optional
stream number when setting key/values, which would automatically prefix 
them with stream[number]/
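
Roughly what I have in mind, as a sketch only. StreamAwareMetadataSketch is a hypothetical stand-in backed by a map, not the real Metadata class, and the overload taking a stream number is an assumed addition rather than anything that exists today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Metadata, not Tika API.
final class StreamAwareMetadataSketch {

    private final Map<String, String> values = new LinkedHashMap<>();

    // Whole-file value, stored under the key as given.
    void set(String key, String value) {
        values.put(key, value);
    }

    // Per-stream value, stored under stream[number]/key.
    void set(int stream, String key, String value) {
        values.put(prefixed(stream, key), value);
    }

    String get(String key) {
        return values.get(key);
    }

    String get(int stream, String key) {
        return values.get(prefixed(stream, key));
    }

    private static String prefixed(int stream, String key) {
        if (stream < 0) {
            throw new IllegalArgumentException("stream must be >= 0");
        }
        return "stream[" + stream + "]/" + key;
    }

    public static void main(String[] args) {
        StreamAwareMetadataSketch metadata = new StreamAwareMetadataSketch();
        metadata.set("pbcore:instantiationDuration", "00:00:01.03");
        metadata.set(0, "pbcore:essenceTrackType", "Video");
        System.out.println(metadata.get("stream[0]/pbcore:essenceTrackType"));
    }
}
```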

For a property like pbcore:essenceTrackType, and your alternate scheme,
what would you expect the method on Metadata to look like for setting
pbcore:essenceTrackType to a value of Audio on two different tracks?

Nick