Posted to user@commons.apache.org by Mark Fortner <ph...@yahoo.com> on 2006/02/14 21:54:42 UTC
[vfs] File Metadata
I've been thinking about implementing support for File metadata on a
project that I'm working on. I was looking at the documentation and I
see references to the FileContentInfo and FileContentInfoFactory. I was
wondering if the intent of these interfaces is to provide a starting
point for metadata implementations?
What I'm planning to do is implement FileContentInfoFactory classes that
create a map containing the metadata extracted from the file. Does
this sound like a reasonable approach that fits well with VFS?
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org
Re: [vfs] File Metadata
Posted by Mark Fortner <ph...@yahoo.com>.
Mario Ivankovits wrote:
> Hi!
>
>> I agree that we shouldn't have individual accessors/mutators for
>> everything that you might want to get out of a file (e.g. getAuthor,
>> getCreationDate).
>>
>> What if we had something like this:
>>
>> MetadataFactory
>> [org.apache.commons.vfs.metadata]
>> + static getMetadata(FileObject file):Map
>> + static getKeys(FileObject file):Set
>>
> Does this mean we have to deal with untyped information?
> I mean, would we have to do something like getMetadata(fo).get("TITLE")?
>
> This is something I would avoid in any case.
>
>
What do you mean by "untyped information"? I was thinking of a map of
Strings (Map<String, String>). Were there other types that you were
thinking about supporting? Dates and numbers can easily be converted
from Strings, and we can either provide some assistance there with
helper methods or let the user do it.
Is there some reason you say you would avoid this "in any case"?
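
The helper-method option could look roughly like the following Java sketch. It assumes metadata arrives as a Map<String, String> and that dates use the ISO-8601 yyyy-MM-dd form; the class name MetadataValues and its methods are hypothetical (not part of Commons VFS), and it uses java.time and Optional from a current JDK rather than the Java 5 of 2006.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of Commons VFS): typed views over a
// String-valued metadata map such as the one proposed above.
final class MetadataValues {
    private MetadataValues() {}

    // Raw string value, if the key is present.
    static Optional<String> getString(Map<String, String> metadata, String key) {
        return Optional.ofNullable(metadata.get(key));
    }

    // Assumes ISO-8601 dates (yyyy-MM-dd); empty if missing or unparseable.
    static Optional<LocalDate> getDate(Map<String, String> metadata, String key) {
        String raw = metadata.get(key);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(raw.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Whole numbers such as page counts; empty if missing or unparseable.
    static Optional<Long> getLong(Map<String, String> metadata, String key) {
        String raw = metadata.get(key);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

Callers still have to know the key names, which is the trade-off Mario is objecting to; the helpers only remove the parsing boilerplate.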
>> I've found that if users can't find something within 5-10 minutes they
>> figure it's not there, and either give up on the API or write their
>> own. Neither of which we would want them to do.
>>
>> The trick is getting them to "step into" the service API to begin with
>> -- it would require them to think of metadata as a service. It's not
>> something that naturally occurs to people to do and so they would
>> probably never think to look in a services package for metadata code.
>>
> This is why the name "service" is no longer at the top of my list. Currently
> it's "operation", but thinking about it, "aspect" seems more
> correct.
>
I think aspect would be confusing since people would think that you were
adding AOP functionality to VFS.
>
>> It isn't a new concept; however, its implementation in JAF left a lot
>> to be desired, and was difficult for a lot of people to understand.
>> This is the primary reason that it doesn't really get used a lot.
>> It still gives me headaches when I look at the doc on it. :-)
>>
>> An org.apache.commons.vfs.metadata package would be fairly obvious to
>> most people.
>>
> Yes, this is something we would like to have anyway.
> In this package we will find something like
>
> org.apache.commons.vfs.aspects
> .vcs.Update
> .vcs.Commit
> .vcs.Lock
> .....
> .info.FileInfo
> .image.Thumbnail
>
> most of the above will only be an interface (or abstract class) with
> concrete implementations for SVN, CVS, OO, MS Office, JPG, and so on
> (maybe outside of this
>
> e.g.
> org.apache.commons.vfs.aspects.impl.svn.SvnUpdate
> org.apache.commons.vfs.aspects.impl.svn.SvnCommit
>
It might be better to keep all of the provider code in the same provider
package (e.g. org.apache.commons.vfs.provider.svn.commands and
org.apache.commons.vfs.provider.sftp.commands).
Your .image.Thumbnail reminded me that it might be better to have a more
generic name for this, perhaps ".info.Preview". The reason is that
movies have a preview image, sound files can have a preview clip, and
images can have thumbnails.
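
A generic ".info.Preview" aspect might be sketched as below. Every name here (PreviewKind, Preview, BytePreview, Previews.kindFor) is hypothetical, and choosing the preview kind from the file's MIME type prefix is an assumption made for illustration.

```java
import java.util.Optional;

// Hypothetical ".info.Preview" aspect; none of these types exist in Commons VFS.
enum PreviewKind {
    THUMBNAIL,    // images: a scaled-down copy
    STILL_FRAME,  // movies: a representative preview image
    AUDIO_CLIP    // sound files: a short excerpt
}

interface Preview {
    PreviewKind kind();
    String mimeType();   // MIME type of the preview itself, e.g. "image/png"
    byte[] content();
}

final class BytePreview implements Preview {
    private final PreviewKind kind;
    private final String mimeType;
    private final byte[] content;

    BytePreview(PreviewKind kind, String mimeType, byte[] content) {
        this.kind = kind;
        this.mimeType = mimeType;
        this.content = content.clone(); // defensive copy keeps the preview immutable
    }

    @Override public PreviewKind kind() { return kind; }
    @Override public String mimeType() { return mimeType; }
    @Override public byte[] content() { return content.clone(); }
}

final class Previews {
    private Previews() {}

    // Picks the kind of preview that suits a file, based on the file's own MIME type.
    static Optional<PreviewKind> kindFor(String fileMimeType) {
        if (fileMimeType == null) return Optional.empty();
        if (fileMimeType.startsWith("image/")) return Optional.of(PreviewKind.THUMBNAIL);
        if (fileMimeType.startsWith("video/")) return Optional.of(PreviewKind.STILL_FRAME);
        if (fileMimeType.startsWith("audio/")) return Optional.of(PreviewKind.AUDIO_CLIP);
        return Optional.empty();
    }
}
```

One interface covers all three media types, so client code can ask for "a preview" without caring whether it gets a thumbnail or a clip.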
>
>
>> All of which are good. But most people only check them after they
>> haven't been able to find it in the Javadocs under some intuitive
>> package name. :-)
>>
> You are right, I am one of them ;-)
>
>
>> Is the DocumentInfo some other interface you're thinking of? If so,
>> what's the difference between FileContentInfo and DocumentInfo?
>>
> Yes. Maybe there's no difference, except that the service API will provide a more
> generalized and cleaner way to deal with the various requirements.
>
>
>> Are you anticipating that you'll have some sort of "service discovery"
>> mechanism that will automatically register all services found in the
>> classpath and make them available? If so, then this too would require
>> some work to make it easy for users to use. There would need to be
>> some mechanism for the user to install supporting JARs needed for
>> specific metadata service providers.
>>
> VFS already uses a plugin mechanism: it scans the classpath and
> processes every META-INF/vfs-providers.xml, so the user has nothing more
> to do than drop in the JAR.
>
>
Cool. But if everything is implemented that way, doesn't it become a
nightmare to build, because every plugin will have its own build target
or a separate build file and jar file?
>> I believe that most of what I've outlined though, is so standard and
>> generic that it should be part of the standard VFS distribution rather
>> than available through additional downloads.
>>
> Users already rant about the size of the current VFS jar and told me not
> to pack everything into one jar. This is why I created the plugin discovery.
>
>
>> I think usually people want and expect everything in a single
>> download, rather than having to make choices about which service
>> providers they want.
>>
> I also like the single download approach, but as I said, others don't.
> And ....
>
I guess one way to satisfy everybody would be to provide a "core"
distribution and a "complete" distribution.
>
>> The existing file system service providers are a good example of
>> this. Right now you have to explicitly download and install
>> additional jars to get some of the functionality that you want.
>>
> .... it's only a matter of time before this changes, i.e. I can't add the SVN
> services to VFS core because the library they use (JavaSVN) has a
> non-ASF-compatible license.
> The same is true for a requested filesystem implementation (Novell).
>
In fact, that's probably a good way to split things up. Anything that
is fairly common (metadata for images, movies, sound, office files) and
can be implemented under the current Apache license should be part of
the core distribution. Anything that requires a library whose license
isn't ASF-compatible can be downloaded the same way that you do with the
current "get-dep*" targets in the build file. This should keep the core
download fairly small, and would still give people the flexibility that
they want.
Mark
Re: [vfs] File Metadata
Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi!
> I agree that we shouldn't have individual accessors/mutators for
> everything that you might want to get out of a file (e.g. getAuthor,
> getCreationDate).
>
> What if we had something like this:
>
> MetadataFactory
> [org.apache.commons.vfs.metadata]
> + static getMetadata(FileObject file):Map
> + static getKeys(FileObject file):Set
Does this mean we have to deal with untyped information?
I mean, would we have to do something like getMetadata(fo).get("TITLE")?
This is something I would avoid in any case.
> I've found that if users can't find something within 5-10 minutes they
> figure it's not there, and either give up on the API or write their
> own. Neither of which we would want them to do.
>
> The trick is getting them to "step into" the service API to begin with
> -- it would require them to think of metadata as a service. It's not
> something that naturally occurs to people to do and so they would
> probably never think to look in a services package for metadata code.
This is why the name "service" is no longer at the top of my list. Currently
it's "operation", but thinking about it, "aspect" seems more
correct.
> It isn't a new concept; however, its implementation in JAF left a lot
> to be desired, and was difficult for a lot of people to understand.
> This is the primary reason that it doesn't really get used a lot.
> It still gives me headaches when I look at the doc on it. :-)
>
> An org.apache.commons.vfs.metadata package would be fairly obvious to
> most people.
Yes, this is something we would like to have anyway.
In this package we will find something like
org.apache.commons.vfs.aspects
.vcs.Update
.vcs.Commit
.vcs.Lock
.....
.info.FileInfo
.image.Thumbnail
most of the above will only be an interface (or abstract class) with
concrete implementations for SVN, CVS, OO, MS Office, JPG, and so on
(maybe outside of this
e.g.
org.apache.commons.vfs.aspects.impl.svn.SvnUpdate
org.apache.commons.vfs.aspects.impl.svn.SvnCommit
> All of which are good. But most people only check them after they
> haven't been able to find it in the Javadocs under some intuitive
> package name. :-)
You are right, I am one of them ;-)
> Is the DocumentInfo some other interface you're thinking of? If so,
> what's the difference between FileContentInfo and DocumentInfo?
Yes. Maybe there's no difference, except that the service API will provide a more
generalized and cleaner way to deal with the various requirements.
> Are you anticipating that you'll have some sort of "service discovery"
> mechanism that will automatically register all services found in the
> classpath and make them available? If so, then this too would require
> some work to make it easy for users to use. There would need to be
> some mechanism for the user to install supporting JARs needed for
> specific metadata service providers.
VFS already uses a plugin mechanism: it scans the classpath and
processes every META-INF/vfs-providers.xml, so the user has nothing more
to do than drop in the JAR.
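
The drop-in-a-JAR discovery rests on a standard JDK facility: ClassLoader.getResources returns every copy of a named resource across the classpath. The sketch below only locates the descriptors; parsing them and instantiating providers is left out, and PluginDescriptorScanner is a hypothetical name, not the actual Commons VFS code.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of classpath-based plugin discovery (not the VFS implementation).
final class PluginDescriptorScanner {
    static final String DESCRIPTOR = "META-INF/vfs-providers.xml";

    private PluginDescriptorScanner() {}

    // getResources yields one URL per classpath entry (JAR or directory) that
    // contains the resource, so another JAR with its own descriptor shows up
    // here without any code changes.
    static List<URL> findDescriptors(ClassLoader loader, String resourceName) throws IOException {
        return Collections.list(loader.getResources(resourceName));
    }

    // Convenience overload: use the context class loader, falling back to this class's loader.
    static List<URL> findDescriptors() throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = PluginDescriptorScanner.class.getClassLoader();
        }
        return findDescriptors(loader, DESCRIPTOR);
    }
}
```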
> I believe that most of what I've outlined though, is so standard and
> generic that it should be part of the standard VFS distribution rather
> than available through additional downloads.
Users already rant about the size of the current VFS jar and told me not
to pack everything into one jar. This is why I created the plugin discovery.
> I think usually people want and expect everything in a single
> download, rather than having to make choices about which service
> providers they want.
I also like the single download approach, but as I said, others don't.
And ....
> The existing file system service providers are a good example of
> this. Right now you have to explicitly download and install
> additional jars to get some of the functionality that you want.
.... it's only a matter of time before this changes, i.e. I can't add the SVN
services to VFS core because the library they use (JavaSVN) has a
non-ASF-compatible license.
The same is true for a requested filesystem implementation (Novell).
> It would be easier, if everything you needed to get started were
> available in a single download, or with a single Ant "install" target.
I know what you mean; unfortunately, sometimes life isn't that easy ;-)
Ciao,
Mario
Re: [vfs] File Metadata
Posted by Mark Fortner <ph...@yahoo.com>.
Mario Ivankovits wrote:
> Hi!
>
>> The one problem I see with the service API is that if I'm trying to
>> find metadata for a FileObject, looking in the service API isn't an
>> obvious thing to do.
>>
> But it is the most powerful solution.
>
> I don't want to change/extend the interface for every single thing we can
> imagine in the future.
> And one should be able to add commands by simply dropping in a jar.
>
>
I agree that we shouldn't have individual accessors/mutators for
everything that you might want to get out of a file (e.g. getAuthor,
getCreationDate).
What if we had something like this:
MetadataFactory
[org.apache.commons.vfs.metadata]
+ static getMetadata(FileObject file):Map
+ static getKeys(FileObject file):Set
|
| <<uses>>
V
MetadataReaderFactory
[org.apache.commons.vfs.metadata]
+ static getInstanceByMimeType(String mimetype):MetadataReader
+ static getInstanceByExtension(String ext):MetadataReader
+ static getInstance(FileObject obj):MetadataReader
|
| <<creates>>
V
<<MetadataReader>>
[org.apache.commons.vfs.metadata]
+ getMetadata(): Map<String, String>
+ getMetadataKeys():Set<String> -- allows you to see what metadata is available
+ getMimetypes():List<String>
^
| <<implements>>
|
ImageMetadataReader [org.apache.commons.vfs.metadata.image]
SoundMetadataReader [org.apache.commons.vfs.metadata.sound]
OpenOfficeMetadataReader [org.apache.commons.vfs.metadata.openoffice]
MicrosoftOfficeMetadataReader [org.apache.commons.vfs.metadata.poi]
...
Presumably one could also add writers for these metadata types using a
similar set of classes and interfaces.
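
The reader hierarchy in the ASCII diagram translates fairly directly into Java. In this sketch SimpleFile is a hypothetical stand-in for VFS's FileObject, readers are added through a hypothetical register method, and the diagram's separate getInstanceByMimeType/getInstanceByExtension lookups are folded into one getInstance that tries the MIME type first and the extension second. None of this is existing Commons VFS API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for org.apache.commons.vfs.FileObject; only the two
// properties used for reader lookup are modelled.
final class SimpleFile {
    private final String baseName;
    private final String mimeType; // may be null when unknown

    SimpleFile(String baseName, String mimeType) {
        this.baseName = baseName;
        this.mimeType = mimeType;
    }

    String getBaseName() { return baseName; }
    String getMimeType() { return mimeType; }
}

// <<MetadataReader>> from the diagram; one instance is created per file.
interface MetadataReader {
    Map<String, String> getMetadata();

    // Lets callers see which metadata is available.
    default Set<String> getMetadataKeys() { return getMetadata().keySet(); }

    List<String> getMimetypes();
}

// MetadataReaderFactory: looks a reader up by MIME type, falling back to the extension.
final class MetadataReaderFactory {
    private static final Map<String, Function<SimpleFile, MetadataReader>> BY_MIME = new HashMap<>();
    private static final Map<String, Function<SimpleFile, MetadataReader>> BY_EXT = new HashMap<>();

    private MetadataReaderFactory() {}

    static void register(Function<SimpleFile, MetadataReader> creator,
                         List<String> mimeTypes, List<String> extensions) {
        for (String m : mimeTypes) BY_MIME.put(m.toLowerCase(Locale.ROOT), creator);
        for (String e : extensions) BY_EXT.put(e.toLowerCase(Locale.ROOT), creator);
    }

    // Returns null when no reader is registered for the file.
    static MetadataReader getInstance(SimpleFile file) {
        Function<SimpleFile, MetadataReader> creator = null;
        if (file.getMimeType() != null) {
            creator = BY_MIME.get(file.getMimeType().toLowerCase(Locale.ROOT));
        }
        if (creator == null) {
            creator = BY_EXT.get(extensionOf(file.getBaseName()));
        }
        return creator == null ? null : creator.apply(file);
    }

    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}

// MetadataFactory: the static entry point users would find first.
final class MetadataFactory {
    private MetadataFactory() {}

    static Map<String, String> getMetadata(SimpleFile file) {
        MetadataReader reader = MetadataReaderFactory.getInstance(file);
        return reader == null ? Collections.<String, String>emptyMap() : reader.getMetadata();
    }

    static Set<String> getKeys(SimpleFile file) {
        return getMetadata(file).keySet();
    }
}
```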
These classes could invoke services under the hood, but I think the
metadata API should be high enough in the package structure, and have
obvious enough names, that people don't have to go hunting. I've found
that if users can't find something within 5-10 minutes they figure it's
not there, and either give up on the API or write their own. Neither of
which we would want them to do.
>> Most people when they're starting to learn VFS are going to look for
>> some method in the FileObject (or if they're clever in the
>> FileContentInfo). Either of these is a logical place to look
>> for metadata methods.
>>
> But once they've stepped into the service API, it should be easy to
> understand, no?
> And as you say, it isn't a new concept.
>
>
The trick is getting them to "step into" the service API to begin with
-- it would require them to think of metadata as a service. It's not
something that naturally occurs to people to do and so they would
probably never think to look in a services package for metadata code.
It isn't a new concept; however, its implementation in JAF left a lot to
be desired, and was difficult for a lot of people to understand. This
is the primary reason that it doesn't really get used a lot. It still
gives me headaches when I look at the doc on it. :-)
An org.apache.commons.vfs.metadata package would be fairly obvious to
most people.
>> Any ideas about how we could make it easier for them?
>>
> Docs, Wiki, Mailing list (in this order, I hope ;-) )
>
>
All of which are good. But most people only check them after they
haven't been able to find it in the Javadocs under some intuitive
package name. :-)
> Think about how powerful it could be if the following three things
> shared the same base class
>
>> Open Office metadata
>> Microsoft Office metadata
>> MP3/AAC/Ogg metadata
>>
> e.g. DocumentInfo, which provides something like (title, author, ...)
>
> one can simply look up DocumentInfo.class and get this information. If
> one drops in a jar that extracts this data from, e.g., Java files, the code
> will use it immediately.
>
> I won't say it isn't possible to do this by extending the API, but I think
> it will bloat it.
>
Is the DocumentInfo some other interface you're thinking of? If so,
what's the difference between FileContentInfo and DocumentInfo?
I think most of the "code bloat" would be fairly small: basically a
single new package, and a single method in an interface that returns
metadata for specific mimetypes. The actual implementations are simply
adapters that implement the interface by making calls to existing APIs
capable of reading file metadata. In the case of Open Office, that's a
fairly simple matter of reading the meta.xml file inside the Open
Office zip file. For images, there are a couple of different ways of
getting at this data: either through Drew Noakes' metadata-extractor API
(http://www.drewnoakes.com/code/exif/) or through JAI
(http://www.picturegrid.com/community/samples/imageio/). Finally, POI
can extract Microsoft Office document metadata.
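
The Open Office case can be sketched with nothing but the JDK: open the document as a zip, find the meta.xml entry, and read its Dublin Core elements (dc:title, dc:creator and so on). The class name and the "dc:<name>" key format are assumptions, and other meta.xml fields (such as meta:keyword or the document statistics) are ignored here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader: extracts Dublin Core fields from an Open Office zip's meta.xml.
final class OpenOfficeMetaReader {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    private OpenOfficeMetaReader() {}

    // Returns the dc:* fields keyed as "dc:<localName>", or an empty map if
    // the archive has no meta.xml entry.
    static Map<String, String> read(InputStream zipStream) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("meta.xml".equals(entry.getName())) {
                    // Buffer the entry first: DocumentBuilder.parse closes its
                    // stream, which would otherwise close the whole zip.
                    return parseMeta(readEntry(zip));
                }
            }
        }
        return new LinkedHashMap<>();
    }

    // Reads until the end of the current zip entry.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    private static Map<String, String> parseMeta(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Do not try to fetch an external DTD (older OpenOffice.org files declare one).
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList dcElements = doc.getElementsByTagNameNS(DC_NS, "*");
        for (int i = 0; i < dcElements.getLength(); i++) {
            Element e = (Element) dcElements.item(i);
            result.put("dc:" + e.getLocalName(), e.getTextContent().trim());
        }
        return result;
    }
}
```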
Are you anticipating that you'll have some sort of "service discovery"
mechanism that will automatically register all services found in the
classpath and make them available? If so, then this too would require
some work to make it easy for users to use. There would need to be some
mechanism for the user to install supporting JARs needed for specific
metadata service providers.
I believe that most of what I've outlined though, is so standard and
generic that it should be part of the standard VFS distribution rather
than available through additional downloads.
I think usually people want and expect everything in a single download,
rather than having to make choices about which service providers they
want. The existing file system service providers are a good example of
this. Right now you have to explicitly download and install additional
jars to get some of the functionality that you want. It would be
easier, if everything you needed to get started were available in a
single download, or with a single Ant "install" target.
Hope this clarifies things a bit. Sorry for the ASCII UML diagram. :-)
Mark
Re: [vfs] File Metadata
Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi!
> The one problem I see with the service API is that if I'm trying to
> find metadata for a FileObject, looking in the service API isn't an
> obvious thing to do.
But it is the most powerful solution.
I don't want to change/extend the interface for every single thing we can
imagine in the future.
And one should be able to add commands by simply dropping in a jar.
> Most people when they're starting to learn VFS are going to look for
> some method in the FileObject (or if they're clever in the
> FileContentInfo). Either of these is a logical place to look
> for metadata methods.
But once they've stepped into the service API, it should be easy to
understand, no?
And as you say, it isn't a new concept.
> Any ideas about how we could make it easier for them?
Docs, Wiki, Mailing list (in this order, I hope ;-) )
Think about how powerful it could be if the following three things
shared the same base class
> Open Office metadata
> Microsoft Office metadata
> MP3/AAC/Ogg metadata
e.g. DocumentInfo, which provides something like (title, author, ...)
one can simply look up DocumentInfo.class and get this information. If
one drops in a jar that extracts this data from, e.g., Java files, the code
will use it immediately.
I won't say it isn't possible to do this by extending the API, but I think
it will bloat it.
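
The typed lookup by DocumentInfo.class can be illustrated with a small class-keyed registry: callers ask for an interface and get back a typed object rather than a string map. AspectRegistry, registration per MIME type, and passing a file name in place of a real FileObject are all simplifications assumed for this sketch; only the DocumentInfo name comes from the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical shared aspect for Open Office, MS Office and audio-tag metadata.
interface DocumentInfo {
    String getTitle();
    String getAuthor();
}

// Hypothetical registry: implementations are registered per aspect interface
// and per MIME type, and looked up in a type-safe way.
final class AspectRegistry {
    private final Map<Class<?>, Map<String, Function<String, ?>>> factories = new HashMap<>();

    // A dropped-in plugin JAR would call this to add support for a file type.
    <T> void register(Class<T> aspect, String mimeType, Function<String, ? extends T> factory) {
        factories.computeIfAbsent(aspect, k -> new HashMap<>()).put(mimeType, factory);
    }

    // Returns the aspect for a file of the given MIME type, if any plugin provides it.
    <T> Optional<T> lookup(Class<T> aspect, String mimeType, String fileName) {
        Map<String, Function<String, ?>> byMime = factories.get(aspect);
        if (byMime == null) {
            return Optional.empty();
        }
        Function<String, ?> factory = byMime.get(mimeType);
        if (factory == null) {
            return Optional.empty();
        }
        // Class.cast keeps the lookup type-safe without an unchecked cast.
        return Optional.ofNullable(aspect.cast(factory.apply(fileName)));
    }
}
```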
---
Mario
Re: [vfs] File Metadata
Posted by Mark Fortner <ph...@yahoo.com>.
Mario,
The one problem I see with the service API is that if I'm trying to find
metadata for a FileObject, looking in the service API isn't an obvious
thing to do. Most people when they're starting to learn VFS are going
to look for some method in the FileObject (or if they're clever in the
FileContentInfo). Either of these is a logical place to look for
metadata methods.
Any ideas about how we could make it easier for them?
Mark
Mario Ivankovits wrote:
> Hi Mark!
>
>> My current metadata list includes:
>>
>> EXIF/IPTC for JPEGs and PNGs
>> Open Office metadata
>> Microsoft Office metadata
>> MP3/AAC/Ogg metadata
>>
> OK, so this matches the operation API (aka service API) perfectly.
> I'll try to find some time this week to apply this patch so you can
> start using it.
>
>
>> The services API looks like a rewrite of the commands in the JavaBeans
>> Activation Framework.
>>
> Yes and no. Sure, we can then write an adapter to expose our API
> through JAF too. But as far as I understand, JAF works on mimeTypes only.
> Our operation API should also provide SVN-like operations such as update/commit.
> These can't be expressed through the mimeType of the file.
> The possible operations for a given fileObject are a merge of
> 1) of course, its mimeType
> 2) directory capabilities (SVN, CVS, ...) and
> 3) global fileSystem capabilities (copy, move, rename, ....)
>
>
> ---
> Mario
Re: [vfs] File Metadata
Posted by Mario Ivankovits <ma...@ops.co.at>.
Hi Mark!
> My current metadata list includes:
>
> EXIF/IPTC for JPEGs and PNGs
> Open Office metadata
> Microsoft Office metadata
> MP3/AAC/Ogg metadata
OK, so this matches the operation API (aka service API) perfectly.
I'll try to find some time this week to apply this patch so you can
start using it.
> The services API looks like a rewrite of the commands in the JavaBeans
> Activation Framework.
Yes and no. Sure, we can then write an adapter to expose our API
through JAF too. But as far as I understand, JAF works on mimeTypes only.
Our operation API should also provide SVN-like operations such as update/commit.
These can't be expressed through the mimeType of the file.
The possible operations for a given fileObject are a merge of
1) of course, its mimeType
2) directory capabilities (SVN, CVS, ...) and
3) global fileSystem capabilities (copy, move, rename, ....)
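
The three sources listed above can be read as a set union. The sketch below models each source as a lookup table and merges them for one file; treating a directory capability as inherited by everything beneath that directory (as an SVN working copy would be) is an assumption, and all class, method and operation names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver: none of these names exist in Commons VFS.
final class OperationResolver {
    private final Map<String, Set<String>> byMimeType = new HashMap<>();  // 1) content type
    private final Map<String, Set<String>> byDirectory = new HashMap<>(); // 2) directory capabilities
    private final Set<String> global = new LinkedHashSet<>();             // 3) file-system-wide operations

    void addMimeTypeOperation(String mimeType, String operation) {
        byMimeType.computeIfAbsent(mimeType, k -> new LinkedHashSet<>()).add(operation);
    }

    void addDirectoryOperation(String directory, String operation) {
        byDirectory.computeIfAbsent(directory, k -> new LinkedHashSet<>()).add(operation);
    }

    void addGlobalOperation(String operation) {
        global.add(operation);
    }

    // Union of the three sources for one file; a directory capability also
    // applies to every file beneath that directory.
    Set<String> operationsFor(String path, String mimeType) {
        Set<String> operations = new LinkedHashSet<>(
                byMimeType.getOrDefault(mimeType, Collections.<String>emptySet()));
        for (String dir = parentOf(path); dir != null; dir = parentOf(dir)) {
            operations.addAll(byDirectory.getOrDefault(dir, Collections.<String>emptySet()));
        }
        operations.addAll(global);
        return operations;
    }

    // Parent of a '/'-separated absolute path, or null at the root.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0 || path.equals("/")) {
            return null;
        }
        return slash == 0 ? "/" : path.substring(0, slash);
    }
}
```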
---
Mario
Re: [vfs] File Metadata
Posted by Mark Fortner <ph...@yahoo.com>.
Mario,
My current metadata list includes:
EXIF/IPTC for JPEGs and PNGs
Open Office metadata
Microsoft Office metadata
MP3/AAC/Ogg metadata
The services API looks like a rewrite of the commands in the JavaBeans
Activation Framework.
Mark
Mario Ivankovits wrote:
> Mark Fortner wrote:
>
>> I've been thinking about implementing support for File metadata on a
>> project that I'm working on.
>>
> Could you please tell me what kind of metadata you would like to provide?
> Maybe it fits better with the (still pending) services/aspect approach.
>
> See http://wiki.apache.org/jakarta-commons/VfsNext #2
>
> ---
> Mario
Re: [vfs] File Metadata
Posted by Mario Ivankovits <ma...@ops.co.at>.
Mark Fortner wrote:
> I've been thinking about implementing support for File metadata on a
> project that I'm working on.
Could you please tell me what kind of metadata you would like to provide?
Maybe it fits better with the (still pending) services/aspect approach.
See http://wiki.apache.org/jakarta-commons/VfsNext #2
---
Mario