Posted to user@commons.apache.org by Mark Fortner <ph...@yahoo.com> on 2006/02/14 21:54:42 UTC

[vfs] File Metadata

I've been thinking about implementing support for File metadata on a 
project that I'm working on.  I was looking at the documentation and I 
see references to the FileContentInfo and FileContentInfoFactory.  I was 
wondering if the intent of these interfaces is to provide a starting 
point for metadata implementations?

What I'm planning on doing is implementing FileContentInfoFactory classes 
that can create a map containing the metadata extracted from the file.  Does 
this sound like a reasonable approach that fits well with VFS?

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [vfs] File Metadata

Posted by Mark Fortner <ph...@yahoo.com>.
Mario Ivankovits wrote:
> Hi!
>   
>> I agree that we shouldn't have individual accessors/mutators for
>> everything that you might want to get
>> out of a file (i.e. getAuthor, getCreationDate).
>>
>> What if we had something like this:
>>
>> MetadataFactory
>> [org.apache.commons.vfs.metadata]
>>    + static getMetadata(FileObject file):Map
>>    + static getKeys(FileObject file):Set
>>     
> Does this mean we have to deal with untyped informations?
> I mean we have to do something like getMetadata(fo).get("TITLE") ?
>
> This is something I would avoid in any case.
>
>   
What do you mean by "untyped informations"?  I was thinking of a map of 
Strings (Map<String, String>). Were there other
types that you were thinking about supporting?  Dates and numbers can be 
easily converted from Strings to those types, and we can either provide 
some assistance there with helper methods, or let the user do it.
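
A minimal sketch of the helper-method idea, assuming the metadata arrives as a Map<String, String>. The class name MetadataValues and the keys used below are hypothetical and are not part of VFS; the sketch also uses java.util.Optional and java.time, which are newer than the Java versions discussed in this thread.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of VFS: typed views over a Map<String, String>.
final class MetadataValues {
    private MetadataValues() {}

    /** Returns the raw string value, if the key is present. */
    static Optional<String> getString(Map<String, String> metadata, String key) {
        return Optional.ofNullable(metadata.get(key));
    }

    /** Parses the value as an integer; empty if missing or not a number. */
    static Optional<Integer> getInt(Map<String, String> metadata, String key) {
        try {
            return getString(metadata, key).map(String::trim).map(Integer::valueOf);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Parses the value as an ISO-8601 date (yyyy-MM-dd); empty if missing or malformed. */
    static Optional<LocalDate> getDate(Map<String, String> metadata, String key) {
        try {
            return getString(metadata, key).map(String::trim).map(LocalDate::parse);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up keys and values, for illustration only.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("TITLE", "Quarterly report");
        metadata.put("PAGE_COUNT", "12");
        metadata.put("CREATED", "2006-02-14");

        System.out.println(getString(metadata, "TITLE").orElse("(none)"));
        System.out.println(getInt(metadata, "PAGE_COUNT").orElse(0));
        System.out.println(getDate(metadata, "CREATED").map(LocalDate::getYear).orElse(-1));
    }
}
```

The map itself stays a plain Map<String, String>; only the accessors add typing, so a caller who wants the raw strings can still use them directly.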

Is there some reason that you say you would "avoid in any case"?


>> I've found that if users can't find something within 5-10 minutes they
>> figure it's not there, and either give up on the API or write their
>> own.  Neither of which we would want them to do.
>>
>> The trick is getting them to "step into" the service API to begin with
>> -- it would require them to think of metadata as a service.  It's not
>> something that naturally occurs to people to do and so they would
>> probably never think to look in a services package for metadata code.
>>     
> This is why the name "service" is no longer on top of my list. Currently
> its "operation", but thinking of it would show me that "aspect" is more
> correct.
>   
I think aspect would be confusing since people would think that you were 
adding AOP functionality to VFS.

>   
>> It isn't a new concept; however, its implementation in JAF left a lot
>> to be desired, and was difficult for a lot of people to understand. 
>> This is the primary reason that it doesn't really get used a lot. 
>> It's still gives me headaches when I look at the doc on it. :-)
>>
>> An org.apache.commons.vfs.metadata package would be fairly obvious to
>> most people.
>>     
> Yes, this is something we would like to have anyway.
> In this package we will find something like
>
> org.apache.commons.vfs.aspects
>             .vcs.Update
>             .vcs.Commit
>             .vcs.Lock
>              .....
>             .info.FileInfo
>             .image.Thumbnail
>
> most of the above will only be a interface (or abstract class) with
> concrete implementations for SVN, CVS, OO, MS Office, JPG, and so on
> (maybe outside of this
>
> e.g
> org.apache.commons.vfs.aspects.impl.svn.SvnUpdate
> org.apache.commons.vfs.aspects.impl.svn.SvnCommit
>   

It might be better to keep all of the provider code in the same provider 
package (e.g. org.apache.commons.vfs.provider.svn.commands and
org.apache.commons.vfs.provider.sftp.commands).

Your .image.Thumbnail reminded me that it might be better to have a more 
generic way of saying this.  Perhaps ".info.Preview".  The reason being 
that movies have a preview image, sound files can have a preview 
clip and images can have thumbnails.

>  
>   
>> All of which are good. But most people only check them after they
>> haven't been able to find it in the Javadocs under some intuitive
>> package name. :-)
>>     
> You are right, I am one of them ;-)
>  
>   
>> Is the DocumentInfo some other interface you're thinking of?  If so,
>> what's the difference between FileContentInfo and DocumentInfo?
>>     
> Yes, maybe no difference, only that the service API will provide a more
> generalized and cleaner way to deal with the various requirements.
>
>   
>> Are you anticipating that you'll have some sort of "service discovery
>> " mechanism that will automatically register all services found in the
>> classpath and make them available?  If so, then this too would require
>> some work to make it easy for users to use.  There would need to be
>> some mechanism for the user to install supporting JARs needed for
>> specific metadata service providers.
>>     
> VFS already uses a plugin mechanism by scanning the classpath and
> processing all META-INF/vfs-providers.xml - so the user has nothing more
> to do then to drop in the JAR.
>
>   

Cool.  But if everything is implemented that way then it becomes a 
nightmare to build because every plugin will have its own build target, 
or separate build file and jar file -- won't it?
>> I believe that most of what I've outlined though, is so standard and
>> generic that it should be part of the standard VFS distribution rather
>> than available through additional downloads.
>>     
> Users already rant about the size of the current VFS jar and told me not
> to pack all in one jar. This is why I created the plugin discovery.
>
>   
>> I think usually people want and expect everything in a single
>> download, rather than having to make choices about which service
>> providers they want.
>>     
> I also like the single download approach, but as I said, others dont.
> And ....
>   

I guess one way to satisfy everybody would be to provide a "core" 
distribution and a "complete" distribution.
>   
>> The existing file system service providers are a good example of
>> this.  Right now you have to explicitly download and install
>> additional jars to get the some of  functionality that you want.
>>     
> .... is only a matter of time this will change. ie. I cant add the SVN
> services to VFS core as the used library (JavaSVN) uses a non ASF
> compatible license.
> This is true for a requested filesystem implementation (novell) too.
>   

In fact, that's probably a good way to split things up.  Anything that 
is fairly common (metadata for images, movies, sound, office files) and can 
be implemented under the current Apache license should be part of the 
core distribution.  Anything that requires a library with a license that 
is not ASF-compatible can be downloaded the same way it is done with the 
current "get-dep*" targets in the build file.  This should keep the file 
size rather small, and would still give people the flexibility that they 
want.

Mark



Re: [vfs] File Metadata

Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi!
> I agree that we shouldn't have individual accessors/mutators for
> everything that you might want to get
> out of a file (i.e. getAuthor, getCreationDate).
>
> What if we had something like this:
>
> MetadataFactory
> [org.apache.commons.vfs.metadata]
>    + static getMetadata(FileObject file):Map
>    + static getKeys(FileObject file):Set
Does this mean we have to deal with untyped informations?
I mean we have to do something like getMetadata(fo).get("TITLE") ?

This is something I would avoid in any case.

> I've found that if users can't find something within 5-10 minutes they
> figure it's not there, and either give up on the API or write their
> own.  Neither of which we would want them to do.
>
> The trick is getting them to "step into" the service API to begin with
> -- it would require them to think of metadata as a service.  It's not
> something that naturally occurs to people to do and so they would
> probably never think to look in a services package for metadata code.
This is why the name "service" is no longer at the top of my list. Currently
it's "operation", but thinking about it, "aspect" seems more
correct.

> It isn't a new concept; however, its implementation in JAF left a lot
> to be desired, and was difficult for a lot of people to understand. 
> This is the primary reason that it doesn't really get used a lot. 
> It's still gives me headaches when I look at the doc on it. :-)
>
> An org.apache.commons.vfs.metadata package would be fairly obvious to
> most people.
Yes, this is something we would like to have anyway.
In this package we will find something like

org.apache.commons.vfs.aspects
            .vcs.Update
            .vcs.Commit
            .vcs.Lock
             .....
            .info.FileInfo
            .image.Thumbnail

most of the above will only be an interface (or abstract class) with
concrete implementations for SVN, CVS, OO, MS Office, JPG, and so on
(maybe outside of this

e.g
org.apache.commons.vfs.aspects.impl.svn.SvnUpdate
org.apache.commons.vfs.aspects.impl.svn.SvnCommit
 
> All of which are good. But most people only check them after they
> haven't been able to find it in the Javadocs under some intuitive
> package name. :-)
You are right, I am one of them ;-)
 
> Is the DocumentInfo some other interface you're thinking of?  If so,
> what's the difference between FileContentInfo and DocumentInfo?
Yes, maybe no difference, only that the service API will provide a more
generalized and cleaner way to deal with the various requirements.

> Are you anticipating that you'll have some sort of "service discovery
> " mechanism that will automatically register all services found in the
> classpath and make them available?  If so, then this too would require
> some work to make it easy for users to use.  There would need to be
> some mechanism for the user to install supporting JARs needed for
> specific metadata service providers.
VFS already uses a plugin mechanism by scanning the classpath and
processing all META-INF/vfs-providers.xml - so the user has nothing more
to do than to drop in the JAR.
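
The classpath scan described above can be sketched with the standard ClassLoader.getResources call. This is only an illustration of the technique and not the actual VFS code; the class name PluginDescriptorScan is hypothetical, and what VFS does with each descriptor after finding it is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Sketch of classpath-based plugin discovery; this is not the VFS implementation.
final class PluginDescriptorScan {
    private PluginDescriptorScan() {}

    /** Returns one URL for every classpath location (JAR or directory) that contains the resource. */
    static List<URL> findDescriptors(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = PluginDescriptorScan.class.getClassLoader();
        // Each JAR that ships its own descriptor shows up as a separate URL.
        for (URL url : findDescriptors(loader, "META-INF/vfs-providers.xml")) {
            System.out.println("Found provider descriptor: " + url);
        }
    }
}
```

Because every JAR contributes its own copy of the resource, adding a JAR to the classpath is enough for it to be found on the next scan.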

> I believe that most of what I've outlined though, is so standard and
> generic that it should be part of the standard VFS distribution rather
> than available through additional downloads.
Users already rant about the size of the current VFS jar and told me not
to pack all in one jar. This is why I created the plugin discovery.

> I think usually people want and expect everything in a single
> download, rather than having to make choices about which service
> providers they want.
I also like the single download approach, but as I said, others don't.
And ....

> The existing file system service providers are a good example of
> this.  Right now you have to explicitly download and install
> additional jars to get the some of  functionality that you want.
.... it is only a matter of time until this changes. E.g. I can't add the SVN
services to VFS core as the library used (JavaSVN) has a license that is
not ASF-compatible.
This is true for a requested filesystem implementation (Novell) too.

> It would be easier, if everything you needed to get started were
> available in a single download, or with a single Ant "install" target.
I know what you mean - unfortunately, sometimes life isn't that easy ;-)


Ciao,
Mario




Re: [vfs] File Metadata

Posted by Mark Fortner <ph...@yahoo.com>.
Mario Ivankovits wrote:
> Hi!
>   
>> The one problem I see with the service API is that if I'm trying to
>> find metadata for a FileObject, looking in the service API isn't an
>> obvious thing to do.
>>     
> But it is the most powerful solution.
>
> I dont want to change/extend the interface for every single thing we can
> imagine in the future.
> And one should be able to add commands by simply dropping in a jar.
>
>   
I agree that we shouldn't have individual accessors/mutators for 
everything that you might want to get
out of a file (i.e. getAuthor, getCreationDate).

What if we had something like this:

MetadataFactory
[org.apache.commons.vfs.metadata]
    + static getMetadata(FileObject file):Map
    + static getKeys(FileObject file):Set
|
| <<uses>>
V

MetadataReaderFactory
[org.apache.commons.vfs.metadata]
    + static getInstanceByMimeType(String mimetype):MetadataReader
    + static getInstanceByExtension(String ext):MetadataReader
    + static getInstance(FileObject obj):MetadataReader

|
| <<creates>>
V

<<MetadataReader>>
[org.apache.commons.vfs.metadata]
    + getMetadata(): Map<String, String>
    + getMetadataKeys():Set<String> -- allows you to see what metadata is available
    + getMimetypes():List<String>

^
|  <<implements>>
|


ImageMetadataReader [org.apache.commons.vfs.metadata.image]
SoundMetadataReader [org.apache.commons.vfs.metadata.sound]
OpenOfficeMetadataReader [org.apache.commons.vfs.metadata.openoffice]
MicrosoftOfficeMetadataReader [org.apache.commons.vfs.metadata.poi]
...

Presumably one could also add writers for these metadata types using a 
similar set of classes and interfaces.
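
As a rough Java rendering of the diagram above: the MetadataReader interface and the getInstanceByMimeType lookup follow the diagram, while register() and FixedMetadataReader are hypothetical additions so that the sketch runs on its own. FileObject and the extension-based lookup are left out to keep it free of VFS dependencies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class MetadataSketch {
    private MetadataSketch() {}

    /** The reader interface from the diagram. */
    interface MetadataReader {
        Map<String, String> getMetadata();
        Set<String> getMetadataKeys();
        List<String> getMimetypes();
    }

    /** Stand-in reader with canned values; a real reader would parse a file. */
    static final class FixedMetadataReader implements MetadataReader {
        private final List<String> mimetypes;
        private final Map<String, String> metadata;

        FixedMetadataReader(List<String> mimetypes, Map<String, String> metadata) {
            this.mimetypes = mimetypes;
            this.metadata = metadata;
        }

        @Override public Map<String, String> getMetadata() { return metadata; }
        @Override public Set<String> getMetadataKeys() { return metadata.keySet(); }
        @Override public List<String> getMimetypes() { return mimetypes; }
    }

    /** Hands out a reader for a MIME type, as in the diagram. */
    static final class MetadataReaderFactory {
        private static final Map<String, MetadataReader> BY_MIMETYPE = new HashMap<>();

        private MetadataReaderFactory() {}

        /** Registers a reader under every MIME type it declares (hypothetical, not in the diagram). */
        static void register(MetadataReader reader) {
            for (String mimetype : reader.getMimetypes()) {
                BY_MIMETYPE.put(mimetype.toLowerCase(Locale.ROOT), reader);
            }
        }

        /** Returns the reader registered for the MIME type, or null if none is known. */
        static MetadataReader getInstanceByMimeType(String mimetype) {
            return BY_MIMETYPE.get(mimetype.toLowerCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("TITLE", "Holiday photo");
        values.put("WIDTH", "1024");
        MetadataReaderFactory.register(
                new FixedMetadataReader(Arrays.asList("image/jpeg", "image/png"), values));

        MetadataReader reader = MetadataReaderFactory.getInstanceByMimeType("image/jpeg");
        System.out.println(reader.getMetadataKeys());
    }
}
```

A caller only needs the factory and the interface; which concrete reader answers for a given MIME type stays an implementation detail.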

These classes could invoke services underneath the hood, but I think the 
metadata API should be high enough in the package structure, and have 
obvious enough names that people don't have to go hunting.  I've found 
that if users can't find something within 5-10 minutes they figure it's 
not there, and either give up on the API or write their own.  Neither of 
which we would want them to do.

>> Most people when they're starting to learn VFS are going to look for
>> some method in the FileObject (or if they're clever in the
>> FileContentInfo).  Either of these places are logical places to look
>> for metadata methods.
>>     
> But once they stepped into the service API it should be easily
> understandable, no?
> And as you say, it isnt that a new concept.
>
>   

The trick is getting them to "step into" the service API to begin with 
-- it would require them to think of metadata as a service.  It's not 
something that naturally occurs to people to do and so they would 
probably never think to look in a services package for metadata code.  
It isn't a new concept; however, its implementation in JAF left a lot to 
be desired, and was difficult for a lot of people to understand.  This 
is the primary reason that it doesn't really get used a lot.  It still 
gives me headaches when I look at the docs on it. :-)

An org.apache.commons.vfs.metadata package would be fairly obvious to 
most people.

>> Any ideas about how we could make it easier for them?
>>     
> Docs, Wiki, Mailinglist (in this order, I hope ;-) )
>
>   
All of which are good. But most people only check them after they 
haven't been able to find it in the Javadocs under some intuitive 
package name. :-)

> Think about how powerful it could be, given the following three things
> share the same base class
>   
>> Open Office metadata
>> Microsoft Office metadata
>> MP3/AAC/Ogg metadata
>>     
> e.g. DocumentInfo which provides something like (title, author, ...)
>
> one can simply lookup  DocumentInfo.class and get these informations. If
> one drop in a jar to extract these data from e.g. java files the code
> will use it in the second.
>
> I wont say it isnt possible to do this by extending the API, but I think
> it will bloat it.
>   
Is the DocumentInfo some other interface you're thinking of?  If so, 
what's the difference between FileContentInfo and DocumentInfo?

I think most of the "code bloat" would be fairly small. Basically a 
single new package, and a single method in an interface that returns 
metadata for specific mimetypes. The actual implementations are simply 
adapters that implement the interface by making calls to existing APIs 
capable of reading file metadata.  In the case of Open Office, that's a 
fairly simple matter of looking at the meta.xml file inside the Open 
Office zip file.  For images, there are a couple of different ways of 
getting at this data: either through Drew Noakes' metadata-extractor API 
(http://www.drewnoakes.com/code/exif/) or through JAI 
(http://www.picturegrid.com/community/samples/imageio/).  Finally, POI 
can extract Microsoft Office document metadata.
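
For the Open Office case, a minimal sketch using only the JDK's zip and DOM classes could look like the following. It assumes the document is a zip archive with a meta.xml entry at its root, as described above, and simply collects every leaf element that has text. The class name OpenDocumentMeta and the sample content are made up for illustration; a real reader would probably map element names to friendlier keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class OpenDocumentMeta {
    private OpenDocumentMeta() {}

    /** Reads meta.xml from a zipped document and returns its leaf elements as name/text pairs. */
    static Map<String, String> read(InputStream zipped) {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if ("meta.xml".equals(entry.getName())) {
                    return parse(copy(zip));
                }
            }
            return new LinkedHashMap<>(); // no meta.xml in this archive
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read document metadata", e);
        }
    }

    /** Copies the current zip entry into memory so the XML parser never touches the zip stream. */
    private static byte[] copy(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    private static Map<String, String> parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        NodeList elements = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getElementsByTagName("*");
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            String text = element.getTextContent().trim();
            if (!hasChildElements(element) && !text.isEmpty()) {
                metadata.put(element.getNodeName(), text);
            }
        }
        return metadata;
    }

    private static boolean hasChildElements(Node node) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    /** Builds a tiny in-memory archive containing only a made-up meta.xml, for demonstration. */
    static byte[] samplePackage() {
        String xml = "<office:document-meta"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<office:meta><dc:title>Budget</dc:title><dc:creator>Mark</dc:creator></office:meta>"
                + "</office:document-meta>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("meta.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(read(new ByteArrayInputStream(samplePackage())));
    }
}
```

The result is again a plain map of strings, so an adapter like this would fit behind a single metadata-returning method without adding anything else to the API.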

Are you anticipating that you'll have some sort of "service discovery " 
mechanism that will automatically register all services found in the 
classpath and make them available?  If so, then this too would require 
some work to make it easy for users to use.  There would need to be some 
mechanism for the user to install supporting JARs needed for specific 
metadata service providers.

I believe that most of what I've outlined though, is so standard and 
generic that it should be part of the standard VFS distribution rather 
than available through additional downloads.
 
I think usually people want and expect everything in a single download, 
rather than having to make choices about which service providers they 
want.  The existing file system service providers are a good example of 
this.  Right now you have to explicitly download and install additional 
jars to get some of the functionality that you want.  It would be 
easier, if everything you needed to get started were available in a 
single download, or with a single Ant "install" target.

Hope this clarifies things a bit. Sorry for the ASCII UML diagram. :-)

Mark



Re: [vfs] File Metadata

Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi!
> The one problem I see with the service API is that if I'm trying to
> find metadata for a FileObject, looking in the service API isn't an
> obvious thing to do.
But it is the most powerful solution.

I don't want to change/extend the interface for every single thing we can
imagine in the future.
And one should be able to add commands by simply dropping in a jar.

> Most people when they're starting to learn VFS are going to look for
> some method in the FileObject (or if they're clever in the
> FileContentInfo).  Either of these places are logical places to look
> for metadata methods.
But once they have stepped into the service API it should be easily
understandable, no?
And as you say, it isn't that new a concept.

> Any ideas about how we could make it easier for them?
Docs, wiki, mailing list (in this order, I hope ;-) )


Think about how powerful it could be, given the following three things
share the same base class
> Open Office metadata
> Microsoft Office metadata
> MP3/AAC/Ogg metadata
e.g. DocumentInfo which provides something like (title, author, ...)

One can simply look up DocumentInfo.class and get this information. If
one drops in a jar that extracts this data from e.g. Java files, the code
will use it right away.
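
The lookup by class can be sketched as a small type-keyed registry. Everything here is hypothetical: DocumentInfo is reduced to the two properties named above, and OperationRegistry stands in for whatever the operation API would provide; neither exists in VFS at the time of this thread.

```java
import java.util.HashMap;
import java.util.Map;

final class OperationLookupSketch {
    private OperationLookupSketch() {}

    /** Hypothetical typed view of document metadata, following the DocumentInfo idea. */
    interface DocumentInfo {
        String getTitle();
        String getAuthor();
    }

    /** Maps an operation interface to the implementation registered for it. */
    static final class OperationRegistry {
        private final Map<Class<?>, Object> operations = new HashMap<>();

        /** The generic signature ensures the implementation matches the interface it is filed under. */
        <T> void register(Class<T> type, T implementation) {
            operations.put(type, implementation);
        }

        /** Returns the registered implementation, or null when nothing provides one. */
        <T> T lookup(Class<T> type) {
            return type.cast(operations.get(type));
        }
    }

    public static void main(String[] args) {
        OperationRegistry registry = new OperationRegistry();
        // A plugin would do this registration; the values are made up.
        registry.register(DocumentInfo.class, new DocumentInfo() {
            @Override public String getTitle() { return "Design notes"; }
            @Override public String getAuthor() { return "Mario"; }
        });

        DocumentInfo info = registry.lookup(DocumentInfo.class);
        System.out.println(info.getTitle() + " by " + info.getAuthor());
    }
}
```

The caller gets typed accessors without the core interface having to declare them; the cost is that the caller must know which class to ask for.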

I won't say it isn't possible to do this by extending the API, but I think
it will bloat it.

---
Mario




Re: [vfs] File Metadata

Posted by Mark Fortner <ph...@yahoo.com>.
Mario,
The one problem I see with the service API is that if I'm trying to find 
metadata for a FileObject, looking in the service API isn't an obvious 
thing to do.  Most people when they're starting to learn VFS are going 
to look for some method in the FileObject (or if they're clever in the 
FileContentInfo).  Either of these is a logical place to look for 
metadata methods.

Any ideas about how we could make it easier for them?

Mark

Mario Ivankovits wrote:
> Hi Mark!
>   
>> My current metadata list includes:
>>
>> EXIF/IPTC for JPEGs, and PNGs
>> Open Office metadata
>> Microsoft Office metadata
>> MP3/AAC/Ogg metadata
>>     
> Ok, so this perfectly match the operation API (aka service API)
> I'll try to find some time this week to apply this patch so you can
> start using it.
>
>   
>> The services API looks like a rewrite of the commands in the JavaBeans
>> Activation Framework.
>>     
> Yes and no. For sure we can write an adapter then to expose our api
> throug JAF too. But as far as I understand JAF works on mimeTypes only.
> Our operation API should provide operations SVN like update/commit too.
> This cant be expressed through the mimeType of the file.
> The possible operations for a given fileObject is a merge of
> 1) for sure its mimeType
> 2) directory capabilites (SVN, CVS, ...) and
> 3) global fileSystem capabilities (copy, move, rename, ....)
>
>
> ---
> Mario


Re: [vfs] File Metadata

Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi Mark!
> My current metadata list includes:
>
> EXIF/IPTC for JPEGs, and PNGs
> Open Office metadata
> Microsoft Office metadata
> MP3/AAC/Ogg metadata
OK, so this perfectly matches the operation API (aka service API).
I'll try to find some time this week to apply this patch so you can
start using it.

> The services API looks like a rewrite of the commands in the JavaBeans
> Activation Framework.
Yes and no. For sure we can write an adapter to expose our API
through JAF too. But as far as I understand, JAF works on MIME types only.
Our operation API should provide SVN-like operations such as update/commit too.
These can't be expressed through the mimeType of the file.
The possible operations for a given fileObject are a merge of
1) for sure its mimeType
2) directory capabilities (SVN, CVS, ...) and
3) global fileSystem capabilities (copy, move, rename, ....)
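
The merge of the three sources listed above can be sketched as a plain set union. The operation names and the class AvailableOperations are hypothetical; how each source is actually determined for a file is outside this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class AvailableOperations {
    private AvailableOperations() {}

    /** Unions the three sources, dropping duplicates and keeping first-seen order. */
    static Set<String> merge(Set<String> byMimeType, Set<String> byDirectory, Set<String> byFileSystem) {
        Set<String> merged = new LinkedHashSet<>(byMimeType);
        merged.addAll(byDirectory);
        merged.addAll(byFileSystem);
        return merged;
    }

    public static void main(String[] args) {
        // Made-up operation names for a JPEG inside an SVN working copy.
        Set<String> byMimeType = new LinkedHashSet<>(Arrays.asList("info", "thumbnail"));
        Set<String> byDirectory = new LinkedHashSet<>(Arrays.asList("update", "commit"));
        Set<String> byFileSystem = new LinkedHashSet<>(Arrays.asList("copy", "move", "rename"));

        System.out.println(merge(byMimeType, byDirectory, byFileSystem));
    }
}
```

Only the first source depends on the MIME type, which is the point made above about JAF: the directory and filesystem sources have no MIME type to key on.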


---
Mario




Re: [vfs] File Metadata

Posted by Mark Fortner <ph...@yahoo.com>.
Mario,
My current metadata list includes:

EXIF/IPTC for JPEGs, and PNGs
Open Office metadata
Microsoft Office metadata
MP3/AAC/Ogg metadata

The services API looks like a rewrite of the commands in the JavaBeans 
Activation Framework.

Mark

Mario Ivankovits wrote:
> Mark Fortner schrieb:
>   
>> I've been thinking about implementing support for File metadata on a
>> project that I'm working on.
>>     
> Could you please tell me what kind of metadata you would like to provide?
> Maybe it fits better to the (still waiting) services/aspect approach.
>
> See http://wiki.apache.org/jakarta-commons/VfsNext #2
>
> ---
> Mario


Re: [vfs] File Metadata

Posted by Mario Ivankovits <ma...@ops.co.at>.
Mark Fortner schrieb:
> I've been thinking about implementing support for File metadata on a
> project that I'm working on.
Could you please tell me what kind of metadata you would like to provide?
Maybe it fits better with the (still pending) services/aspect approach.

See http://wiki.apache.org/jakarta-commons/VfsNext #2

---
Mario

