Posted to dev@tika.apache.org by Nick Burch <ni...@alfresco.com> on 2012/01/30 15:40:29 UTC
Sharing metadata logic between parsers
Hi All
With the MP4 Parser, I've just found myself writing the same piece of code
for the third time: turning the number of channels, as an int, into the
XMPDM channel type strings (mono, stereo, 5.1, etc.).
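Here is a minimal sketch of that conversion, for illustration only. The class name AudioChannelTypes is hypothetical. The mapping assumes the XMPDM closed list is Mono, Stereo, 5.1, 7.1, 16 Channel and Other. It also assumes 5.1 and 7.1 mean 6 and 8 channels, with the ".1" being the low-frequency channel.

```java
// Hypothetical helper; the class name and the channel-count mapping are
// illustrative assumptions, not code committed to Tika.
final class AudioChannelTypes {
    private AudioChannelTypes() {
        // static helpers only
    }

    // Converts a channel count into one of the (assumed) XMPDM
    // audioChannelType values; unrecognised counts map to "Other".
    static String toAudioChannelType(int numberOfChannels) {
        switch (numberOfChannels) {
            case 1:
                return "Mono";
            case 2:
                return "Stereo";
            case 6:
                return "5.1"; // 5 full-range channels + 1 LFE channel
            case 8:
                return "7.1"; // 7 full-range channels + 1 LFE channel
            case 16:
                return "16 Channel";
            default:
                return "Other";
        }
    }
}
```

Each parser would then call AudioChannelTypes.toAudioChannelType(channels) instead of keeping its own copy of the logic.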
What I'm not sure about is where the best place is to put that code so it
can be re-used. Options that spring to mind include:
* AbstractMediaParser / AbstractAudioParser parent class
* Static method on XMPDM
* Extending the Property class to support a converter
* Static method elsewhere in the metadata space
* Static method on the MP3 Parser, called from the others
What do people think is the best way to handle this sort of thing?
Nick
Re: Sharing metadata logic between parsers
Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
>> If we're doing that sort of thing, then I'd rather we put the logic
>> onto the Property for that. The Property already has a type and the
>> closed list of allowed values (as Strings, from the XMPDM
>> specification). It would seem to me that the logic for going to/from
>> channel numbers would best live with the strings themselves, on the
>> property, rather than outside?
>
> I'm thinking of cases where such convenience methods could rely on
> more than just a single property. For example, a getNumberOfPages()
> convenience method could look something like this (with an extra
> getInt(String) helper method):
>
> int getNumberOfPages() {
>     Integer pages = metadata.getInt(PagedText.N_PAGES);
>     if (pages == null) {
>         pages = metadata.getInt(MSOffice.PAGE_COUNT);
>     }
I decided, after thinking on this a bit, that the best bet was to try to
implement the solution and see how it actually worked.
However, I've just found a snag with Jukka's proposed solution above. I
wanted to put the logic on XMPDM, as static methods, because I thought
that the logic should really live with the property definition.
Unfortunately, this doesn't work: XMPDM is an interface, so Java won't
let you add random static methods to it. I tried putting the static method
on Metadata itself, but I really didn't like the look of it.
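One possible workaround, sketched here as an assumption rather than what was committed, is a member class declared inside the interface. Such a class is implicitly public and static, so it can hold helper methods even on Java versions that don't allow static methods on the interface itself (that only arrived in Java 8). AudioProperties and ChannelTypes are hypothetical stand-ins for XMPDM and its helper. The key string is an assumed value.

```java
// Hypothetical, cut-down stand-in for an interface such as XMPDM.
interface AudioProperties {
    // Interface fields are implicitly public, static and final.
    String AUDIO_CHANNEL_TYPE = "xmpDM:audioChannelType";

    // A member class of an interface is implicitly public and static,
    // so callers can use AudioProperties.ChannelTypes.toAudioChannelType(n).
    final class ChannelTypes {
        private ChannelTypes() {
        }

        // Assumed mapping from channel count to XMPDM channel type.
        static String toAudioChannelType(int numberOfChannels) {
            switch (numberOfChannels) {
                case 1: return "Mono";
                case 2: return "Stereo";
                case 6: return "5.1";
                case 8: return "7.1";
                case 16: return "16 Channel";
                default: return "Other";
            }
        }
    }
}
```

This keeps the conversion next to the property definition without putting anything on Metadata or Property.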
Property itself is final, as we don't want people to go around adding
extra types that the specifications don't support, so I can't simply
subclass it to add a converter method.
In r1242018 I've committed a couple of different possible ways to do the
conversion, which do work, so people can have a look and a think. I could
see the static class being attached optionally to the Property, for either
explicit calling, or implicit as Ray described.
Based on how it looks now that I've actually coded something, I'm leaning
towards implicit conversion, with metadata.set(Property, thing) calling
methods on an optional converter attached to the property. (I guess Ray
would be in favour of this, as it's quite like his idea!)
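Below is a rough sketch of that implicit-conversion idea. It uses simplified stand-ins rather than Tika's real Property and Metadata classes. PropertyConverter, SimpleProperty and SimpleMetadata are hypothetical names, and in this version the converter only sees the value being set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical converter contract: turns a raw int into the stored string.
interface PropertyConverter {
    String convert(int value);
}

// Simplified stand-in for Tika's Property: a name plus an optional converter.
final class SimpleProperty {
    private final String name;
    private final PropertyConverter converter; // null when no conversion is needed

    SimpleProperty(String name, PropertyConverter converter) {
        this.name = name;
        this.converter = converter;
    }

    String getName() {
        return name;
    }

    PropertyConverter getConverter() {
        return converter;
    }
}

// Simplified stand-in for Tika's Metadata, backed by a plain map.
final class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    // Implicit conversion: callers just pass the int, and the property's
    // attached converter (if any) decides which string gets stored.
    void set(SimpleProperty property, int value) {
        PropertyConverter converter = property.getConverter();
        String stored = (converter != null)
                ? converter.convert(value)
                : Integer.toString(value);
        values.put(property.getName(), stored);
    }

    String get(SimpleProperty property) {
        return values.get(property.getName());
    }
}
```

A property without a converter still stores the plain integer string, so properties that don't need conversion would be unaffected.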
Thoughts? Should I go ahead and try to implement my plan, so we can see if
we like how it looks in real code and if there are any problems with doing
it? Can anyone think of a cleaner way to do it? Can anyone think of a way
to do what Jukka suggested, but without having to clutter up the main
concrete class?
Cheers
Nick
Re: Sharing metadata logic between parsers
Posted by Ray Gauss II <ra...@alfresco.com>.
I personally like Nick's 3rd idea: Extending the Property class to support a converter.
Even in the case described here, the Metadata property setters could be modified to something like:
public void set(Property property, int value) {
    // Only simple, integer-valued properties may be set from an int
    if (property.getPropertyType() != Property.PropertyType.SIMPLE) {
        throw new PropertyTypeException(Property.PropertyType.SIMPLE, property.getPropertyType());
    }
    if (property.getValueType() != Property.ValueType.INTEGER) {
        throw new PropertyTypeException(Property.ValueType.INTEGER, property.getValueType());
    }
    // Proposed: an optional converter attached to the Property, which is
    // passed this Metadata so that it can consult other properties
    if (property.getConverter() != null) {
        set(property.getName(), property.getConverter().convert(this, value));
    } else {
        set(property.getName(), Integer.toString(value));
    }
}
Then the specific converter implementation could still look at other properties, since it is passed the Metadata instance.
Obviously the order in which those set methods are called would be critical, since the properties a converter depends on need to be set first, but it seems like a pretty flexible implementation, where some powerful converters could be developed when needed.
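The sketch below illustrates the ordering concern. The converter receives the metadata object, as in the set() method above, and reads a property it depends on. The interface, class and property names (sampleRate, durationSeconds) are hypothetical. Falling back to the raw value when the dependency is missing is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical converter that is handed the metadata being populated,
// so it can read other properties that were set earlier.
interface MetadataAwareConverter {
    String convert(MiniMetadata metadata, int value);
}

// Minimal map-backed stand-in for Metadata.
final class MiniMetadata {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }

    // Mirrors the proposed set(Property, int): use the converter if present.
    void set(String name, MetadataAwareConverter converter, int value) {
        set(name, converter != null ? converter.convert(this, value) : Integer.toString(value));
    }
}

// Hypothetical example: turns a sample count into seconds, which only
// works if "sampleRate" has already been set.
final class DurationFromSamples implements MetadataAwareConverter {
    @Override
    public String convert(MiniMetadata metadata, int sampleCount) {
        String rate = metadata.get("sampleRate");
        if (rate == null) {
            // Dependency not set yet, so fall back to the raw count.
            return Integer.toString(sampleCount);
        }
        return Double.toString((double) sampleCount / Integer.parseInt(rate));
    }
}
```

If sampleRate is set first, 88200 samples become "2.0" seconds. If the calls are made in the other order, the raw count is stored instead.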
This is also somewhat similar to the concept of a mapper that I had to use for the tika-exiftool parser, converting from the properties provided by the command-line tool to proper Tika properties.
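A mapper of that kind can be as simple as a lookup table from the external tool's tag names to Tika property names. This is a hypothetical sketch, not the tika-exiftool code. The tag and property names in the table are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapper from an external tool's tag names to Tika-style
// property names; the entries are illustrative, not a real mapping table.
final class ExternalTagMapper {
    private final Map<String, String> tagToProperty = new HashMap<>();

    ExternalTagMapper() {
        tagToProperty.put("Model", "tiff:Model");
        tagToProperty.put("ImageWidth", "tiff:ImageWidth");
    }

    // Copies every recognised tag across under its mapped name and
    // silently drops tags that have no mapping.
    Map<String, String> map(Map<String, String> toolOutput) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : toolOutput.entrySet()) {
            String target = tagToProperty.get(entry.getKey());
            if (target != null) {
                result.put(target, entry.getValue());
            }
        }
        return result;
    }
}
```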
On Jan 30, 2012, at 10:52 AM, Jukka Zitting wrote:
> Hi,
>
> On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch <ni...@alfresco.com> wrote:
>> On Mon, 30 Jan 2012, Jukka Zitting wrote:
>>> What we might also consider as an extra convenience, are Metadata methods
>>> like: [...]
>>
>> If we're doing that sort of thing, then I'd rather we put the logic onto the
>> Property for that. The Property already has a type and the closed list of
>> allowed values (as Strings, from the XMPDM specification). It would seem to
>> me that the logic for going to/from channel numbers would best live with the
>> strings themselves, on the property, rather than outside?
>
> I'm thinking of cases where such convenience methods could rely on
> more than just a single property. For example, a getNumberOfPages()
> convenience method could look something like this (with an extra
> getInt(String) helper method):
>
> int getNumberOfPages() {
>     Integer pages = metadata.getInt(PagedText.N_PAGES);
>     if (pages == null) {
>         pages = metadata.getInt(MSOffice.PAGE_COUNT);
>     }
>     if (pages == null) {
>         pages = metadata.getInt(MSOffice.SLIDE_COUNT);
>     }
>     if (pages != null) {
>         return pages;
>     } else {
>         return 0;
>     }
> }
>
> BR,
>
> Jukka Zitting
Re: Sharing metadata logic between parsers
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Mon, 30 Jan 2012, Jukka Zitting wrote:
>> What we might also consider as an extra convenience, are Metadata methods
>> like: [...]
>
> If we're doing that sort of thing, then I'd rather we put the logic onto the
> Property for that. The Property already has a type and the closed list of
> allowed values (as Strings, from the XMPDM specification). It would seem to
> me that the logic for going to/from channel numbers would best live with the
> strings themselves, on the property, rather than outside?
I'm thinking of cases where such convenience methods could rely on
more than just a single property. For example, a getNumberOfPages()
convenience method could look something like this (with an extra
getInt(String) helper method):
int getNumberOfPages() {
    Integer pages = metadata.getInt(PagedText.N_PAGES);
    if (pages == null) {
        pages = metadata.getInt(MSOffice.PAGE_COUNT);
    }
    if (pages == null) {
        pages = metadata.getInt(MSOffice.SLIDE_COUNT);
    }
    if (pages != null) {
        return pages;
    } else {
        return 0;
    }
}
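This is a self-contained sketch of that fallback chain, including the extra getInt(String) helper. PageCountMetadata and its map-backed storage are hypothetical stand-ins for Metadata. The key strings are placeholders for PagedText.N_PAGES, MSOffice.PAGE_COUNT and MSOffice.SLIDE_COUNT, not their real values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, map-backed stand-in for Metadata.
final class PageCountMetadata {
    // Placeholder keys standing in for the real property constants.
    static final String N_PAGES = "nPages";
    static final String PAGE_COUNT = "pageCount";
    static final String SLIDE_COUNT = "slideCount";

    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    // The extra getInt(String) helper: null if the key is absent or
    // its value is not a valid integer.
    Integer getInt(String name) {
        String value = values.get(name);
        if (value == null) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Tries each candidate property in priority order, defaulting to 0.
    int getNumberOfPages() {
        for (String key : new String[] {N_PAGES, PAGE_COUNT, SLIDE_COUNT}) {
            Integer pages = getInt(key);
            if (pages != null) {
                return pages;
            }
        }
        return 0;
    }
}
```

Looping over the keys keeps the priority order in one place, so adding another fallback property is a one-line change.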
BR,
Jukka Zitting
Re: Sharing metadata logic between parsers
Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
> What we might also consider as an extra convenience, are Metadata
> methods like:
>
> int getNumberOfAudioChannels();
> void setNumberOfAudioChannels(int channels);
>
> and
>
> String getAudioChannelType();
> void setAudioChannelType(String type);
>
> based on code and constants in the XMPDM class (and with defaults like
> 0 and null for non-audio documents).
If we're doing that sort of thing, then I'd rather we put the logic onto
the Property for that. The Property already has a type and the closed list
of allowed values (as Strings, from the XMPDM specification). It would
seem to me that the logic for going to/from channel numbers would best
live with the strings themselves, on the property, rather than outside?
Nick
Re: Sharing metadata logic between parsers
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Jan 30, 2012 at 3:59 PM, Nick Burch <ni...@alfresco.com> wrote:
> OK. Any thoughts on how it should work? The two things that spring to mind
> are:
> String toAudioChannelType(int numberOfChannels)
> void setAudioChannelType(int numberOfChannels, Metadata metadata)
>
> Do you think we should be building the value, or building the value and
> setting the property?
It's probably cleanest to avoid a dependency from XMPDM to Metadata
(even though they're in the same package), so I'd go with the first
signature.
What we might also consider, as an extra convenience, are Metadata methods like:
int getNumberOfAudioChannels();
void setNumberOfAudioChannels(int channels);
and
String getAudioChannelType();
void setAudioChannelType(String type);
based on code and constants in the XMPDM class (and with defaults like
0 and null for non-audio documents).
Opening up the Metadata class to convenience methods like these can be a
Pandora's box, but it would also simplify quite a bit of code on both
the client and the parser side.
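Here is a minimal sketch of what such convenience methods might look like, using the defaults of 0 and null suggested above. AudioMetadata and the channel-count key are hypothetical. A real version would presumably use constants defined on XMPDM instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, map-backed stand-in for Metadata with audio conveniences.
final class AudioMetadata {
    // Assumed key names; placeholders for constants on XMPDM.
    static final String AUDIO_CHANNELS = "audioChannels";
    static final String AUDIO_CHANNEL_TYPE = "xmpDM:audioChannelType";

    private final Map<String, String> values = new HashMap<>();

    // Returns 0 when no channel count is present (e.g. non-audio documents).
    int getNumberOfAudioChannels() {
        String value = values.get(AUDIO_CHANNELS);
        return value == null ? 0 : Integer.parseInt(value);
    }

    void setNumberOfAudioChannels(int channels) {
        values.put(AUDIO_CHANNELS, Integer.toString(channels));
    }

    // Returns null when no channel type is present.
    String getAudioChannelType() {
        return values.get(AUDIO_CHANNEL_TYPE);
    }

    void setAudioChannelType(String type) {
        values.put(AUDIO_CHANNEL_TYPE, type);
    }
}
```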
BR,
Jukka Zitting
Re: Sharing metadata logic between parsers
Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
> On Mon, Jan 30, 2012 at 3:40 PM, Nick Burch <ni...@alfresco.com> wrote:
>> What do people think is the best way to handle this sort of thing?
>
> I'd go with XMPDM, as that's already a dependency of the described
> piece of code.
OK. Any thoughts on how it should work? The two things that spring to mind
are:
String toAudioChannelType(int numberOfChannels)
void setAudioChannelType(int numberOfChannels, Metadata metadata)
Do you think we should be building the value, or building the value and
setting the property?
Nick
Re: Sharing metadata logic between parsers
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Jan 30, 2012 at 3:40 PM, Nick Burch <ni...@alfresco.com> wrote:
> What do people think is the best way to handle this sort of thing?
I'd go with XMPDM, as that's already a dependency of the described
piece of code.
BR,
Jukka Zitting