You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by Nick Burch <ni...@alfresco.com> on 2012/01/30 15:40:29 UTC

Sharing metadata logic between parsers

Hi All

With the MP4 Parser, I've just found myself writing a piece of code for 
the third time, to turn the number of channels as an Int into the XMPDM 
channel type strings (mono, stereo, 5.1 etc).

What I'm not sure on is where the best place is to put that code so it can 
be re-used? Options that spring to mind include:
  * AbstractMediaParser / AbstractAudioParser parent class
  * Static method on XMPDM
  * Extending the Property class to support a converter
  * Static method elsewhere in the metadata space
  * Static method on the MP3 Parser, called from the others

What do people think is the best way to handle this sort of thing?

Nick

Re: Sharing metadata logic between parsers

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
>> If we're doing that sort of thing, then I'd rather we put the logic 
>> onto the Property for that. The Property already has a type and the 
>> closed list of allowed values (as Strings, from the XMPDM 
>> specification). It would seem to me that the logic for going to/from 
>> channel numbers would best live with the strings themselves, on the 
>> property, rather than outside?
>
> I'm thinking of cases where such a convenience methods could rely on
> more than just a single property. For example, a getNumberOfPages()
> convenience method could look something like this (with an extra
> getInt(String) helper method):
>
>    int getNumberOfPages() {
>        Integer pages = metadata.getInt(PagedText.N_PAGES);
>        if (pages == null) {
>            pages = metadata.getInt(MSOffice.PAGE_COUNT);
>        }

I decided, after thinking on this a bit, that the best bet was to try to 
implement the solution and see how it actually worked

I've however just found a snag with Jukka's proposed solution above. I 
wanted to put the logic on XMPDM, as static methods, because I thought 
that the logic should really live with the property definition. 
Unfortunately, this doesn't work, as XMPDM is an interface so Java won't 
let you add random static methods. I tried putting the static method on 
Metadata itself, but I really didn't like the look of it

Property itself is final, as we don't want people to go around adding 
extra types in that the specifications don't support, so I can't simply 
add a converter method onto that.

In r1242018 I've committed a couple of different possible ways to do the 
conversion, which do work, so people can have a look and a think. I could 
see the static class being attached optionally to the Property, for either 
explicit calling, or implicit as Ray described.

Based on how it looks now I've actually coded something, I'm leaning 
towards implicit conversion in metadata.set(Property, thing) calling 
methods on an optional attached converter on the property. (I guess Ray 
would be in favour of this, as it's quite like his idea!).

Thoughts? Should I go ahead and try to implement my plan, so we can see if 
we like how it looks in real code and if there are any problems with doing 
it? Can anyone think of a cleaner way to do it? Can anyone think of a way 
to do what Jukka suggested, but without having to clutter up the main 
concrete class?

Cheers
Nick

Re: Sharing metadata logic between parsers

Posted by Ray Gauss II <ra...@alfresco.com>.
I personally like Nick's 3rd idea: Extending the Property class to support a converter.

Even in the case described here the Metadata property setters could be modified to something like:

public void set(Property property, int value) {
        if(property.getPropertyType() != Property.PropertyType.SIMPLE) {
            throw new PropertyTypeException(Property.PropertyType.SIMPLE, property.getPropertyType());
        }
        if(property.getValueType() != Property.ValueType.INTEGER) {
            throw new PropertyTypeException(Property.ValueType.INTEGER, property.getValueType());
        }
        if (property.getConverter() != null) {
            set(property.getName(), property.getConverter().convert(this, value));
        } else {
            set(property.getName(), Integer.toString(value));
        }
    }

then the specific converter implementation could still look at other properties.

Obviously the ordering of calling those set methods would be critical since the dependency properties need to be set first, but it seems like a pretty flexible implementation where some powerful converters could be developed when needed.

This is also somewhat similar to a the concept of a mapper that I had to use for the tika-exiftool parser, converting from a properties provided by the command-line tool to proper tika properties.


On Jan 30, 2012, at 10:52 AM, Jukka Zitting wrote:

> Hi,
> 
> On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch <ni...@alfresco.com> wrote:
>> On Mon, 30 Jan 2012, Jukka Zitting wrote:
>>> What we might also consider as an extra convenience, are Metadata methods
>>> like: [...]
>> 
>> If we're doing that sort of thing, then I'd rather we put the logic onto the
>> Property for that. The Property already has a type and the closed list of
>> allowed values (as Strings, from the XMPDM specification). It would seem to
>> me that the logic for going to/from channel numbers would best live with the
>> strings themselves, on the property, rather than outside?
> 
> I'm thinking of cases where such a convenience methods could rely on
> more than just a single property. For example, a getNumberOfPages()
> convenience method could look something like this (with an extra
> getInt(String) helper method):
> 
>    int getNumberOfPages() {
>        Integer pages = metadata.getInt(PagedText.N_PAGES);
>        if (pages == null) {
>            pages = metadata.getInt(MSOffice.PAGE_COUNT);
>        }
>        if (pages == null) {
>            pages = metadata.getInt(MSOffice.SLIDE_COUNT);
>        }
>        if (pages != null) {
>            return pages;
>        } else {
>            return 0;
>        }
>    }
> 
> BR,
> 
> Jukka Zitting


Re: Sharing metadata logic between parsers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Mon, 30 Jan 2012, Jukka Zitting wrote:
>> What we might also consider as an extra convenience, are Metadata methods
>> like: [...]
>
> If we're doing that sort of thing, then I'd rather we put the logic onto the
> Property for that. The Property already has a type and the closed list of
> allowed values (as Strings, from the XMPDM specification). It would seem to
> me that the logic for going to/from channel numbers would best live with the
> strings themselves, on the property, rather than outside?

I'm thinking of cases where such a convenience methods could rely on
more than just a single property. For example, a getNumberOfPages()
convenience method could look something like this (with an extra
getInt(String) helper method):

    int getNumberOfPages() {
        Integer pages = metadata.getInt(PagedText.N_PAGES);
        if (pages == null) {
            pages = metadata.getInt(MSOffice.PAGE_COUNT);
        }
        if (pages == null) {
            pages = metadata.getInt(MSOffice.SLIDE_COUNT);
        }
        if (pages != null) {
            return pages;
        } else {
            return 0;
        }
    }

BR,

Jukka Zitting

Re: Sharing metadata logic between parsers

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
> What we might also consider as an extra convenience, are Metadata 
> methods like:
>
>    int getNumberOfAudioChannels();
>    void getNumberOfAudioChannels(int channels);
>
> and
>
>    String getAudioChannelType();
>    void setAudioChannelType(String type);
>
> based on code and constants in the XMPDM class (and with defaults like
> 0 and null for non-audio documents).

If we're doing that sort of thing, then I'd rather we put the logic onto 
the Property for that. The Property already has a type and the closed list 
of allowed values (as Strings, from the XMPDM specification). It would 
seem to me that the logic for going to/from channel numbers would best 
live with the strings themselves, on the property, rather than outside?

Nick

Re: Sharing metadata logic between parsers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Jan 30, 2012 at 3:59 PM, Nick Burch <ni...@alfresco.com> wrote:
> OK. Any thoughts on how it should work? The two things that spring to mind
> are:
>   String toAudioChannelType(int numberOfChannels)
>   void setAudioChannelType(int numberOfChannels, Metadata metadata)
>
> Do you think we should be building the value, or building the value and
> setting the property?

It's probably cleanest to avoid a dependency from XMPDM to Metadata
(even though they're in the same package) so I'd go with the first
signature.

What we might also consider as an extra convenience, are Metadata methods like:

    int getNumberOfAudioChannels();
    void getNumberOfAudioChannels(int channels);

and

    String getAudioChannelType();
    void setAudioChannelType(String type);

based on code and constants in the XMPDM class (and with defaults like
0 and null for non-audio documents).

Opening the Metadata class for convenience methods like these can be a
Pandora's box, but it would also simplify quite a bit of code both on
the client and the parser side.

BR,

Jukka Zitting

Re: Sharing metadata logic between parsers

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Jukka Zitting wrote:
> On Mon, Jan 30, 2012 at 3:40 PM, Nick Burch <ni...@alfresco.com> wrote:
>> What do people think is the best way to handle this sort of thing?
>
> I'd go with XMPDM, as that's already a dependency of the described
> piece of code.

OK. Any thoughts on how it should work? The two things that spring to mind 
are:
    String toAudioChannelType(int numberOfChannels)
    void setAudioChannelType(int numberOfChannels, Metadata metadata)

Do you think we should be building the value, or building the value and 
setting the property?

Nick

Re: Sharing metadata logic between parsers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Jan 30, 2012 at 3:40 PM, Nick Burch <ni...@alfresco.com> wrote:
> What do people think is the best way to handle this sort of thing?

I'd go with XMPDM, as that's already a dependency of the described
piece of code.

BR,

Jukka Zitting