Posted to repository@apache.org by Stephen McConnell <mc...@apache.org> on 2003/11/11 12:12:25 UTC

Re: repository@ awareness?


Leo Simons wrote:

> Justin Erenkrantz wrote:
>
>> Do any 'core' infrastructure people need to get involved to help 
>> guide with what's practical or not?
>
>
> yep. But I doubt you really need to get 'deeply' involved. A half-page
> explanation of what resources are and are not available should be
> enough, don't you think?


I'm probably in a minority - so don't count anything I say as an 
indicator of public opinion.

First off - the board wants human readable safe downloading. Personally 
I think this objective is of minor relevance/impact to ASF in the medium 
term. Since early 1998, the notion of repository-aware applications has 
been growing. Here in Apache it's in its infancy - but clearly prevalent 
in the Java community. Maven is an early example (hit a repository for 
jar downloading to resolve n build dependencies) - Avalon is another 
example - (hit the repository and get back a class loader hierarchy).

File system - a convenient and simple solution - but should a file 
system driven approach be the basis for the next generation? My 
conclusion - no. A solution must be implementation independent - I 
should be able to map a protocol to an RDBMS, LDAP, simplistic HTTP over 
file layout, even an XMI repo over IIOP if deemed appropriate.
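
A minimal sketch of what "implementation independent" could look like 
from the client side. The Repository contract, its fetch method, the two 
backends and the flat file layout are all assumptions made for 
illustration; nothing here is an agreed repository@ design.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class Repository(ABC):
    """Hypothetical client-facing contract; name and method are assumptions."""

    @abstractmethod
    def fetch(self, group: str, name: str, version: str, kind: str) -> bytes:
        """Return the bytes of one artefact, however the backend stores it."""


class FileRepository(Repository):
    """Backend that keeps artefacts in a directory tree (layout is made up)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def fetch(self, group, name, version, kind):
        return (self.root / group / f"{name}-{version}.{kind}").read_bytes()


class TableRepository(Repository):
    """Backend standing in for a database: rows keyed by coordinates."""

    def __init__(self, rows: dict) -> None:
        self.rows = rows

    def fetch(self, group, name, version, kind):
        return self.rows[(group, name, version, kind)]


def load(repo: Repository) -> bytes:
    # The caller never learns which backend answered.
    return repo.fetch("foo", "foo", "1.0", "jar")


# Build a tiny file-backed repository in a temporary directory.
_tmp = tempfile.TemporaryDirectory()
_root = Path(_tmp.name)
(_root / "foo").mkdir()
(_root / "foo" / "foo-1.0.jar").write_bytes(b"jar-bytes")

file_repo = FileRepository(_root)
table_repo = TableRepository({("foo", "foo", "1.0", "jar"): b"jar-bytes"})
```

Calling load(file_repo) and load(table_repo) yields the same bytes, which 
is the property being argued for: the backend can be swapped without the 
client changing.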

So why a preoccupation with meta-less file system structures as opposed 
to a preoccupation with an extensible repository protocol?

Here is an example of a modern repository aware application.

$ merlin http://dpml.net/avalon-http/block.xml

The above command has executed the following:

(a) bootstrapping of a repository client
(b) resolution of repository adapter implementation
(c) downloading and installation of repository adapter and dependencies 
(meta data)
(d) bootstrapping of the repository adapter into action (meta data)
(e) downloading of block.xml using the repository adapter (i.e. protocol 
independent)
(f) validation of the downloaded artifact (meta data)
(g) construction of information about block dependencies by the local 
app (meta data)
(h) recursive downloading of artefact dependencies (meta data)
(i) local creation of a class loader hierarchy based on class loader 
assignments (meta data)
(j) creation of a container holding a set of composite components
(k) orderly deployment of supporting components
(l) startup of a web server, a set of business components, and a servlet

A first-time user will trigger on the order of 30-40 downloads. The 
local system will cache information and monitor the repository for 
changes.
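
Steps (g) and (h), plus the local cache, can be sketched roughly as 
follows. The METADATA table and the artefact names in it are invented for 
illustration, and appending to a list stands in for a real network 
download; this is not how merlin itself is implemented.

```python
# Hypothetical metadata: artefact name -> names of the artefacts it depends on.
METADATA = {
    "block.xml": ["web-server.jar", "servlet.jar"],
    "web-server.jar": ["logging.jar"],
    "servlet.jar": ["logging.jar"],
    "logging.jar": [],
}


def resolve(name, metadata, cache, downloads):
    """Fetch `name` and, recursively, everything its metadata lists."""
    if name in cache:
        return  # already held locally, so no second download
    downloads.append(name)  # stands in for a real network fetch
    cache.add(name)  # recorded before recursing, so cycles terminate
    for dependency in metadata[name]:
        resolve(dependency, metadata, cache, downloads)


cache, downloads = set(), []
resolve("block.xml", METADATA, cache, downloads)
first_run = list(downloads)

# A second run against the warm cache downloads nothing new.
resolve("block.xml", METADATA, cache, downloads)
```

The shared dependency is fetched once, and the second run adds nothing to 
the download list, which is the behaviour expected of a local cache.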

Step 2 - user launches a command to manage the running servlet

(a) jmx management libraries are auto-downloaded (meta data)
(b) along with a dozen commons jar files (meta data)
(c) management app invokes request on management agent download
(d) agent is deployed in a target JVM (local deployment)
(e) jnlp client completes downloading of three jar files signed using 
X509 certificates into a third JVM
(f) applet appears in user's browser
(g) user updates parameters
(h) updated deployment profile is sent to remote repository (meta data)
(i) local client synchronizes local cache relative to remote repo (meta 
data)

All of the above from one command and a few clicks of a mouse. Ok, I 
confess - we don't have all of the above in place today - but do have the 
majority. This benefits significantly from a rigorous protocol 
supporting artefact location, feature assessment (meta data), 
authentication, replication and validation. An argument that appears 
popular on repository@ is that the basic file system does not need to 
be meta-aware - i.e. no distinction between artefact and 
info-about-an-artefact. IMO it is basically a misadventure to focus so 
closely on subjects such as file system structure (the lowest common 
denominator solution). Instead - should we not be defining a protocol 
that is transport and implementation independent? A protocol that will 
enable the functional requirements of artefact authentication, artefact 
navigation, artefact retrieval and artefact registration.

Popular arguments are that agreement on meta information associated with 
artefacts is not achievable - and yet the simple notion of named value 
pairs is a widespread abstraction. This simple notion of "the artefact" 
+ "information about an artefact" is IMO a fundamental requirement. 
After all - isn't this 2003 - we have the technology! Surely our 
repository spec should enable an implementation based on a file 
system, but equally, should not restrict the potential for transparent 
replacement with alternative more advanced and efficient solutions.

Also of relevance are the economic and social impacts. A repository not 
capable of supporting or evolving towards forward looking 
repository-enabled requirements as outlined in the above scenario is 
destined to be redundant within a matter of a few years. Redundant 
because it will not be relevant to the predominant programmatic scenarios 
and redundant because it will not meet basic functional requirements.

So what are the basic requirements?

* structure - the basic notions (groups, artefacts, versions, types)
* information - properties attributable to structural items (a.k.a. 
meta-data)
* function - operations that can be performed on structural items 
relative to available information
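
One possible reading of these three requirements, as a sketch only. The 
class names, the property names ("license", "signed") and the find 
function are assumptions for illustration, not a proposed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArtefactId:
    """Structure: where an artefact sits in the repository."""
    group: str
    name: str
    version: str
    type: str


@dataclass
class Entry:
    """Information: named value pairs attached to one structural item."""
    id: ArtefactId
    properties: dict = field(default_factory=dict)


def find(entries, **criteria):
    """Function: select entries whose properties match every criterion."""
    return [
        e for e in entries
        if all(e.properties.get(k) == v for k, v in criteria.items())
    ]


# Two entries for the same artefact version; property values are made up.
entries = [
    Entry(ArtefactId("foo", "foo", "1.0", "jar"), {"license": "ASL", "signed": "true"}),
    Entry(ArtefactId("foo", "foo", "1.0", "pom"), {"license": "ASL"}),
]
signed = find(entries, signed="true")
```

Here find(entries, signed="true") selects only the jar entry, while 
find(entries, license="ASL") selects both.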

Ok - shoot me down in a ball of flames!

:-)

Cheers, Steve.

-- 

Stephen J. McConnell
mailto:mcconnell@apache.org




Re: repository@ awareness?

Posted by Michal Maczka <mm...@interia.pl>.
Stephen McConnell wrote:

[..]

>
> All of the above from one command and a few clicks of a mouse. Ok, I 
> confess - we don't have all of the above in place today - but do have the 
> majority. This benefits significantly from a rigorous protocol 
> supporting artefact location, feature assessment (meta data), 
> authentication, replication and validation. An argument that appears 
> popular on repository@ is that the basic file system does not need to 
> be meta-aware - i.e. no distinction between artefact and 
> info-about-an-artefact. 

Stephen:
Please understand that an artifact's meta data is simply another 
artifact. Every file which lives in the repository is an artifact, 
and we don't really need any extra level of abstraction.
The notion of "the artifact" + "information about an artifact" is already 
covered once we clarify the notion of artifact and define the 
repository layout for artifacts.
You can have as many levels of metadata as you would like (meta data, 
metametametameta data and whatever else anybody will need).

In the Maven world we have

foo/jars/foo-1.0.jar
     /poms/foo-1.0.pom

The jar is an artifact.
The pom is also an artifact, which provides some meta information (of 
course not all) about the jar.
You can add as many other files to the repository as you wish.

There is a clear distinction between artifact and info-about-an-artifact 
as they will be different files in the repository (both artifacts).
Possibly "info-about-an-artifact" could be located in a few files and 
accessed selectively by different tools.
Metadata about the repository itself can also be kept in the repository.
Even directory listings in a few different flavours for different tools 
can be in the repository.
Can you explain what exactly is not covered by such an approach?
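
The layout above can be sketched as a pair of path functions, reading 
the second line of the example as foo/poms/foo-1.0.pom. The function 
names are hypothetical; only the two paths come from the example.

```python
def artifact_path(group: str, name: str, version: str, kind: str) -> str:
    """Build a repository-relative path in the style foo/jars/foo-1.0.jar."""
    return f"{group}/{kind}s/{name}-{version}.{kind}"


def metadata_path(group: str, name: str, version: str) -> str:
    """The POM describing an artifact is addressed like any other artifact."""
    return artifact_path(group, name, version, "pom")


jar = artifact_path("foo", "foo", "1.0", "jar")
pom = metadata_path("foo", "foo", "1.0")
```

The point of the sketch is that metadata_path is nothing more than 
artifact_path with a different type, so no second addressing scheme is 
needed.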


[..]

>
> Also of relevance are the economic and social impacts. A repository 
> not capable of supporting or evolving towards forward looking 
> repository-enabled requirements as outlined in the above scenario is 
> destined to be redundant within a matter of a few years. Redundant 
> because it will not be relevant to the predominant programmatic 
> scenarios and redundant because it will not meet basic functional 
> requirements.
>
So you want us to predict what will happen in a few years :)?

Again I don't understand you:
You can build any abstractions you like on top of the repository 
with the features that were discussed.
Aren't you doing it even now when you use the Maven repository for storing 
information about your Avalon services?


Michal



Re: repository@ awareness?

Posted by Joerg Pietschmann <pi...@apache.org>.
Stephen McConnell wrote:
> 
> Noel:
> 
> Thanks for the W3C style reference.  One of the subjects it deals with 
> is content negotiation http://www.w3.org/Provider/Style/URI.html#remove 
> - and this got me thinking about how metadata, as opposed to the resource 
> that the metadata is describing, can be resolved.  I'm going to try to dig 
> up some more on the content negotiation subject as this may be a factor in 
> resolving some of the requirements I have.

The XML-DEV and various RDF related mailing lists hold
discussions about this topic regularly.

BTW this raises the question whether an RDF derivative or
a completely self-designed XML vocabulary will be used
for the repository metadata.

J.Pietschmann


RE: repository@ awareness?

Posted by "Noel J. Bergman" <no...@devtech.com>.
> You're saying that those interested in enabling a repo with
> metadata and searches based on this metadata could wrap the
> repository with a servlet.

Could?  Yes.  But that is just one way of many.  I maintain that httpd could
serve the content of most repositories, meta-data and all, without dynamic
content generation.

> The URI could be used by the servlet to give a different view
> of the repository based on [criteria embedded in the request]

IMO, the request URI should encode the complete request.  There should not be
any other implied context.

> the servlet manages the interaction behind the scenes with
> some sort of metadata database to conduct the query and
> return the results as if they were regular files on the
> server's repo file system.

It depends upon the repository implementation.  It could work as you
describe, or there could just be pre-built metadata stored in files.

Consider that eventually web sites will likely use Subversion with WebDAV as
their authoring mechanism.  Authorized people will post directly to a
Subversion repository.  Although httpd can load directly from Subversion,
that will not be as efficient as serving directly from the file system.  The
reason for that is that sendfile() does not work directly out of a BDB
database (as far as I know).  Therefore, when a file is posted to
Subversion, it could be mirrored by a hook to a directory representing the
current content, which is what would then be served by httpd.  We used a
similar technique at GEIS years ago with SourceSafe, so that when a check-in
occurred, a copy went into a shadow directory, and a build test was
initiated.  Likewise, a tool could be invoked to build meta-data, and store
it in the file system.
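
A rough sketch of such a tool, assuming a made-up convention in which 
each artefact gets a "<file>.meta" sidecar of name=value lines. The 
sidecar suffix and the two properties (size, sha256) are assumptions; a 
post-commit hook could run something like this over the mirrored 
directory so that httpd serves the result as static files.

```python
import hashlib
import tempfile
from pathlib import Path


def build_metadata(root: Path) -> list:
    """Write a `<file>.meta` sidecar of name=value lines for each artefact."""
    written = []
    # sorted() materialises the listing before any sidecar is created.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix == ".meta":
            continue  # skip directories and sidecars from earlier runs
        data = path.read_bytes()
        lines = [
            f"size={len(data)}",
            f"sha256={hashlib.sha256(data).hexdigest()}",
        ]
        sidecar = path.with_name(path.name + ".meta")
        sidecar.write_text("\n".join(lines) + "\n")
        written.append(sidecar)
    return written


# Demonstration on a throwaway directory standing in for the mirrored content.
_tmp = tempfile.TemporaryDirectory()
root = Path(_tmp.name)
(root / "foo-1.0.jar").write_bytes(b"example")
sidecars = build_metadata(root)
```

Running the tool again rewrites the same sidecars rather than describing 
them in turn, so it is safe to invoke on every commit.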

So there are ways and ways and more ways.  The goal is the same, as should
be the externally viewable behavior.

	--- Noel


RE: repository@ awareness?

Posted by Alex Karasulu <ao...@bellsouth.net>.
Noel,

So let me understand.  You're saying that those interested in enabling a
repo with metadata and searches based on this metadata could wrap the
repository with a servlet.  The URI could be used by the servlet to give a
different view of the repository based on parameters, search filters
embedded in it, et cetera.  Then the servlet manages the interaction behind
the scenes with some sort of metadata database to conduct the query and
return the results as if they were regular files on the server's repo file
system.

That sounds like it has a lot of potential.  

Alex

> Noel J. Bergman wrote:
> 
> >Stephen McConnell asked:
> >
> >>File system - a convenient and simple solution - but should a file
> >>system driven approach be the basis for the next generation?
> >
> >The basis is a URI space.  Whether a URI is efficiently served by a static
> >file, or by some servlet, CGI or Grandma Moses typing very VERY fast really
> >should not be visible to the user-agent.
> >
> >>A solution must be implementation independent
> >
> >See: http://www.w3.org/Provider/Style/URI.html
> >
> >The URI is a request for content.  It should not change, regardless of the
> >means by which the content is generated.
> >
> >>So why a preoccupation with meta-less file system structures as opposed
> >>to a preoccupation with an extensible repository protocol?
> >
> >The "extensible repository protocol" is HTTP.  Nothing else needs to be
> >visible.  The only thing that the infrastructure team needs to deal with is
> >the implementation of the URI space (allowing that the content addressed by
> >a URI can vary based upon the user-agent).
> >
> >	--- Noel
> 
> --
> 
> Stephen J. McConnell
> mailto:mcconnell@apache.org




Re: repository@ awareness?

Posted by Stephen McConnell <mc...@apache.org>.
Noel:

Thanks for the W3C style reference.  One of the subjects it deals with 
is content negotiation http://www.w3.org/Provider/Style/URI.html#remove 
- and this got me thinking about how metadata, as opposed to the resource 
that the metadata is describing, can be resolved.  I'm going to try to dig 
up some more on the content negotiation subject as this may be a factor in 
resolving some of the requirements I have.
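
As a rough illustration of how content negotiation might separate a 
resource from its metadata: one URI, with the Accept header deciding 
which representation comes back. The FOO table, the choice of media 
types for the two representations and the function names are 
assumptions, and wildcards such as */* are deliberately not handled.

```python
def parse_accept(header: str) -> list:
    """Return media types from an Accept header, most preferred first."""
    ranked = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        quality = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        ranked.append((quality, fields[0]))
    # sorted() is stable, so equal-quality types keep header order.
    return [media for _, media in sorted(ranked, key=lambda r: -r[0])]


def negotiate(representations: dict, accept: str):
    """Pick the first representation whose media type the client accepts."""
    for media in parse_accept(accept):
        if media in representations:
            return media, representations[media]
    return None  # a real server would answer 406 Not Acceptable


# One URI, two hypothetical representations: the artefact and its description.
FOO = {
    "application/java-archive": b"<jar bytes>",
    "application/xml": b"<project><id>foo</id></project>",
}
```

A client asking for application/xml would receive the description, while 
one preferring application/java-archive would receive the artefact, 
without the URI changing.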

Stephen.


Noel J. Bergman wrote:

>Stephen McConnell asked:
>
>  
>
>>File system - a convenient and simple solution - but should a file
>>system driven approach be the basis for the next generation?
>>    
>>
>
>The basis is a URI space.  Whether a URI is efficiently served by a static
>file, or by some servlet, CGI or Grandma Moses typing very VERY fast really
>should not be visible to the user-agent.
>
>  
>
>>A solution must be implementation independent
>>    
>>
>
>See: http://www.w3.org/Provider/Style/URI.html
>
>The URI is a request for content.  It should not change, regardless of the
>means by which the content is generated.
>
>  
>
>>So why a preoccupation with meta-less file system structures as opposed
>>to a preoccupation with an extensible repository protocol?
>>    
>>
>
>The "extensible repository protocol" is HTTP.  Nothing else needs to be
>visible.  The only thing that the infrastructure team needs to deal with is
>the implementation of the URI space (allowing that the content addressed by
>a URI can vary based upon the user-agent).
>
>	--- Noel
>
>
>  
>

-- 

Stephen J. McConnell
mailto:mcconnell@apache.org




RE: repository@ awareness?

Posted by "Noel J. Bergman" <no...@devtech.com>.
Stephen McConnell asked:

> File system - a convenient and simple solution - but should a file
> system driven approach be the basis for the next generation?

The basis is a URI space.  Whether a URI is efficiently served by a static
file, or by some servlet, CGI or Grandma Moses typing very VERY fast really
should not be visible to the user-agent.

> A solution must be implementation independent

See: http://www.w3.org/Provider/Style/URI.html

The URI is a request for content.  It should not change, regardless of the
means by which the content is generated.

> So why a preoccupation with meta-less file system structures as opposed
> to a preoccupation with an extensible repository protocol?

The "extensible repository protocol" is HTTP.  Nothing else needs to be
visible.  The only thing that the infrastructure team needs to deal with is
the implementation of the URI space (allowing that the content addressed by
a URI can vary based upon the user-agent).

	--- Noel