Posted to dev@lenya.apache.org by Jörn Nettingsmeier <po...@uni-duisburg.de> on 2006/06/08 00:37:32 UTC

page envelope namespace issues...

Apache Wiki wrote:
> Dear Wiki user,
> 
> You have subscribed to a wiki page or wiki category on "Lenya Wiki" for change notification.
> 
> The following page has been changed by JörnNettingsmeier:
> http://wiki.apache.org/lenya/LenyaSpecificXMLNamespaces
> 
> ------------------------------------------------------------------------------
>   == lenya:* http://apache.org/cocoon/lenya/page-envelope/1.0 ==

oh my god. hope i'm not stepping on anyone's toes here, but the element 
definitions for that namespace in PageEnvelope.java (and all over the 
place) are the most frightful pile of ad-hoc crap i've ever seen. no 
regard for orthogonality, no hierarchy, just a dumpster for years of 
"neat feature of the day" ideas...

is there any interest in cleaning this up and deprecating some of the 
more glaringly redundant fields? or should we better not touch what works?

imnsho there needs to be a watertight textual definition of what's legal 
in any namespace, not a loose "add whatever you need" policy distributed 
across a number of sourcefiles in totally unrelated packages.

<dream target="lenya 2.0">i don't know if it's worth it or whether it 
will be efficient, but wouldn't it be a lot nicer to have a clean, 
authoritative grammar defined in either rng or xsd, and java beans to 
mimic this grammar 1:1 internally, including hierarchical data?</dream>

for the time being, there are a few unclear issues:

* which of the fields defined in PageEnvelope.java (and in the page 
envelope input module) are actually needed (and still being used) for
core lenya mechanisms (as opposed to "i could use this for my current 
project, let's hack it in")? the wiki page is already becoming unwieldy 
with just that one namespace.

* the LenyaMetaDataGenerator introduces a somewhat nicer structure by 
using wrappers, but these need to be defined somehow. is there general 
interest to have hierarchical metadata, or do you prefer flat for 
internal use? if it turns out that the needs of the java coders are 
vastly different than those of pipeline jugglers, then the metadata 
generator should map the internal data on a different namespace with 
well-defined semantics and not hijack the page envelope ns.

* another hairy issue is that magic "lenya:custom" tag, which basically 
states that everything you care to dump here magically becomes part of 
the page envelope namespace. here be dragons! either let's change it to 
something like <lenya:customtag name="key">value</lenya:customtag>, or 
require that people put those custom tags in another namespace, as is 
the case with the dc: tags.


regards,

jörn





-- 
"Open source takes the bullshit out of software."
	- Charles Ferguson on TechnologyReview.com

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: pol-admin@uni-due.de, Telefon: 0203/379-2736

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lenya.apache.org
For additional commands, e-mail: dev-help@lenya.apache.org


Re: page envelope namespace issues...

Posted by Andreas Hartmann <an...@apache.org>.
Joern Nettingsmeier wrote:

[...]

>> 1) Backwards compatibility
>>
>>    Since meta data are stored with the content, they must either
>>    be stable, or there must be a migration tool.
> 
> that's what i was aiming at: if we introduce modular, arbitrary
> metadata, it should still be clear which is "core", meaning: for these
> fields the maintainers will take care of backwards compatibility or
> provide a migration tool.

Where should this information be available?
Isn't it sufficient that these meta data are declared by the core?


>> 2) Protection
>>
>>    Maybe some meta data should be read-only, or only accessible from
>>    certain components.
> 
> important point. i hadn't thought of that...
> that implies we should have a central component that handles all
> metadata operations - i'm not familiar with the code, perhaps that is
> already the case?

At the moment, there are separate classes for different meta data.
IMO this polymorphism should be replaced by configuration, and all
meta data should be handled by a single component.


>> 3) Validation
>>
>>    It should be possible to define additional meta data sets which are
>>    used by custom components. The meta data should be validated, i.e.
>>    the set of keys is fixed.
> 
> hmmm. i don't understand. either the set of keys is fixed, or it's
> possible to define additional data sets... can you explain?

From my point of view, meta data can be organized in sets.
A set is identified by a namespace.

Lenya should allow sets to be declared, each of which has a
well-defined set of elements. New core components and custom
components wouldn't have to extend the existing sets, but could
introduce new sets. The key sets of the existing sets wouldn't
have to be changed.

For each set, the common meta data component could check if
a key exists, if a key supports multiple values etc.


Some Lenya core sets are:

   DublinCoreElements  dc:creator, dc:title, ...
   DublinCoreTerms     dcterms:contributor, dcterms:coverage, ...
   DocumentMetaData    resourceType, sourceExtension, ...
   WorkflowMetaData    version

Custom sets could be

   MuseumMetaData      painter, year, technique, ...
   BicycleMetaData     manufacturer, no. of gears, ...
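
To make the idea a bit more concrete, here is a minimal Java sketch of
such a common meta data component. All class and method names are made
up for illustration and are not existing Lenya APIs. The namespace URI
is borrowed from the museum example in this thread, and treating
"technique" as multi-valued is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Lenya.
final class MetaDataSetsSketch {

    /** A meta data set: identified by a namespace URI, with a fixed set of keys. */
    static final class ElementSet {
        final String namespaceUri;
        private final Set<String> keys = new HashSet<>();
        private final Set<String> multiValued = new HashSet<>();

        ElementSet(String namespaceUri) {
            this.namespaceUri = namespaceUri;
        }

        /** Declares a key; 'multiple' says whether it may carry several values. */
        ElementSet declare(String key, boolean multiple) {
            keys.add(key);
            if (multiple) {
                multiValued.add(key);
            }
            return this;
        }

        boolean hasKey(String key) {
            return keys.contains(key);
        }

        boolean supportsMultipleValues(String key) {
            return multiValued.contains(key);
        }
    }

    /** Stand-in for the single component that handles all meta data. */
    static final class Registry {
        private final Map<String, ElementSet> sets = new HashMap<>();

        /** New sets are added next to existing ones; existing key sets stay untouched. */
        void register(ElementSet set) {
            if (sets.containsKey(set.namespaceUri)) {
                throw new IllegalArgumentException("Set already declared: " + set.namespaceUri);
            }
            sets.put(set.namespaceUri, set);
        }

        /** Returns null if the values are acceptable, otherwise a short reason. */
        String check(String namespaceUri, String key, List<String> values) {
            ElementSet set = sets.get(namespaceUri);
            if (set == null) {
                return "unknown set " + namespaceUri;
            }
            if (!set.hasKey(key)) {
                return "unknown key " + key;
            }
            if (values.size() > 1 && !set.supportsMultipleValues(key)) {
                return "key " + key + " takes a single value";
            }
            return null;
        }
    }

    public static void main(String[] args) {
        String museum = "http://mymuseum.org/metadata";
        Registry registry = new Registry();
        registry.register(new ElementSet(museum)
                .declare("painter", false)
                .declare("year", false)
                .declare("technique", true));
        System.out.println(registry.check(museum, "painter", Arrays.asList("Paul Klee")));
        System.out.println(registry.check(museum, "frame", Arrays.asList("oak")));
    }
}
```

The point of the sketch is only that a custom component registers its
own set instead of adding keys to an existing one, so the key set of
each namespace stays fixed and can be checked.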


I hope this explains what I mean,

-- Andreas


-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org




Re: page envelope namespace issues...

Posted by Joern Nettingsmeier <po...@uni-due.de>.
Andreas Hartmann wrote:
> Jörn Nettingsmeier wrote:
> 
> [...]
> 
>>>> * the LenyaMetaDataGenerator introduces a somewhat nicer structure
>>>> by using wrappers, but these need to be defined somehow. is there
>>>> general interest to have hierarchical metadata, or do you prefer
>>>> flat for internal use?
>>>
>>> IMO a flat structure is sufficient.
>>
>> ok, the new schema is flat. i have even flattened the "var:is_live"
>> thing into just a <isLive/> element. it's probably easier and more
>> concise to extend the schema when new var:foo fields are needed than
>> to build extensibility into the schema. this way, the schema can be
>> used as documentation.
> 
> Hmmm - the is_live workflow variable is not generic, it is just a
> variable which is used by the default publication's workflow. I'm
> not yet familiar with your code - does it still support arbitrary
> workflow variables?

currently it does, but i'd rather get rid of it, and mandate that all
fields are defined in a grammar that is used for validation.

>>> We should abandon the concept of "custom" meta data and introduce
>>> modularized meta data instead. IMO this is not too complex and should
>>> be implemented for 1.4, since it affects the content repository.
>>
>> as michi remarked, it is important to be able to differentiate between
>> core metadata that is handled by lenya and must be backwards
>> compatible, and custom stuff that is just passed through as-is.
> 
> I totally agree, but this is a different issue. We have several concern
> areas which are not necessarily related:
> 
> 1) Backwards compatibility
> 
>    Since meta data are stored with the content, they must either
>    be stable, or there must be a migration tool.

that's what i was aiming at: if we introduce modular, arbitrary
metadata, it should still be clear which is "core", meaning: for these
fields the maintainers will take care of backwards compatibility or
provide a migration tool.

> 2) Protection
> 
>    Maybe some meta data should be read-only, or only accessible from
>    certain components.

important point. i hadn't thought of that...
that implies we should have a central component that handles all
metadata operations - i'm not familiar with the code, perhaps that is
already the case?

> 3) Validation
> 
>    It should be possible to define additional meta data sets which are
>    used by custom components. The meta data should be validated, i.e.
>    the set of keys is fixed.

hmmm. i don't understand. either the set of keys is fixed, or it's
possible to define additional data sets... can you explain?

> I don't understand why (3) would violate (1) ...

it doesn't.


-- 
"Án nýrra verka, án nútimans, hættir fortíðin að vekja áhuga."
"Without new works, without the present the past will cease to be of
interest."
        - Ásmundur Sveinsson (1893-1982)

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: pol-admin@uni-due.de, Telefon: 0203/379-2736




Re: page envelope namespace issues...

Posted by Andreas Hartmann <an...@apache.org>.
Jörn Nettingsmeier wrote:

[...]

>>> * the LenyaMetaDataGenerator introduces a somewhat nicer structure by 
>>> using wrappers, but these need to be defined somehow. is there 
>>> general interest to have hierarchical metadata, or do you prefer flat 
>>> for internal use?
>>
>> IMO a flat structure is sufficient.
> 
> ok, the new schema is flat. i have even flattened the "var:is_live" 
> thing into just a <isLive/> element. it's probably easier and more 
> concise to extend the schema when new var:foo fields are needed than to 
> build extensibility into the schema. this way, the schema can be used as 
> documentation.

Hmmm - the is_live workflow variable is not generic, it is just a
variable which is used by the default publication's workflow. I'm
not yet familiar with your code - does it still support arbitrary
workflow variables?

[...]

>> We should abandon the concept of "custom" meta data and introduce
>> modularized meta data instead. IMO this is not too complex and should
>> be implemented for 1.4, since it affects the content repository.
> 
> as michi remarked, it is important to be able to differentiate between 
> core metadata that is handled by lenya and must be backwards compatible, 
> and custom stuff that is just passed through as-is.

I totally agree, but this is a different issue. We have several concern
areas which are not necessarily related:

1) Backwards compatibility

    Since meta data are stored with the content, they must either
    be stable, or there must be a migration tool.

2) Protection

    Maybe some meta data should be read-only, or only accessible from
    certain components.

3) Validation

    It should be possible to define additional meta data sets which are
    used by custom components. The meta data should be validated, i.e.
    the set of keys is fixed.


I don't understand why (3) would violate (1) ...

Maybe you (Jörn + Michi) can explain your concerns?
Thanks!

-- Andreas

-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org




Re: page envelope namespace issues...

Posted by Jörn Nettingsmeier <po...@uni-duisburg.de>.
Andreas Hartmann wrote:
> Jörn Nettingsmeier wrote:
>
>> is there any interest in cleaning this up and deprecating some of the 
>> more glaringly redundant fields?
> 
> +1
> 
>> or should we better not touch what works?
> 
> -1, the code must be kept alive

ok, i have started with a relax ng schema proposal for a new meta data 
generator that takes your comments into account. see my other post for 
the actual schema.

>> * which of the fields defined in PageEnvelope.java (and in the page 
>> envelope input module) are actually needed (and still being used) for
>> core lenya mechanisms (as opposed to "i could use this for my current 
>> project, let's hack it in")? the wiki page is already becoming 
>> unwieldy with just that one namespace.
> 
> IMO the page envelope should be removed entirely and replaced by
> single-purpose modules. This process has already been started but
> has never been finished.

ok, i won't be able to contribute much to that, it's too much code... 
i'd rather concentrate on fixing things i already know a little about.

>> * the LenyaMetaDataGenerator introduces a somewhat nicer structure by 
>> using wrappers, but these need to be defined somehow. is there general 
>> interest to have hierarchical metadata, or do you prefer flat for 
>> internal use?
> 
> IMO a flat structure is sufficient.

ok, the new schema is flat. i have even flattened the "var:is_live" 
thing into just a <isLive/> element. it's probably easier and more 
concise to extend the schema when new var:foo fields are needed than to 
build extensibility into the schema. this way, the schema can be used as 
documentation.

> What we need is modularized meta data, i.e. components can define
> their own meta data sets.

the current schema allows for a <lenya-meta:custom/> element that can 
contain arbitrary xml data, provided that the elements within are *not* 
in the lenya:meta namespace.
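
as a rough illustration of that rule (not the actual generator code or
schema), a java sketch of the check could look like the one below. the
namespace uri, the "meta" root element and the class name are
placeholders i made up for the example; only the "custom" and "isLive"
element names come from this mail.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the actual generator or schema.
final class CustomContentCheckSketch {

    // Placeholder URI: the real namespace of the new schema is not given here.
    static final String META_NS = "http://example.org/lenya-meta/1.0";

    /** Parses a string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    /** True if no descendant of any custom element lives in META_NS. */
    static boolean customContentIsForeign(Document doc) {
        NodeList customs = doc.getElementsByTagNameNS(META_NS, "custom");
        for (int i = 0; i < customs.getLength(); i++) {
            Element custom = (Element) customs.item(i);
            // "*" matches any local name within the given namespace
            if (custom.getElementsByTagNameNS(META_NS, "*").getLength() > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ok = "<m:meta xmlns:m='" + META_NS + "'><m:isLive/>"
                + "<m:custom><x:painter xmlns:x='http://mymuseum.org/metadata'>"
                + "Klee</x:painter></m:custom></m:meta>";
        String bad = "<m:meta xmlns:m='" + META_NS + "'>"
                + "<m:custom><m:isLive/></m:custom></m:meta>";
        System.out.println(customContentIsForeign(parse(ok)));
        System.out.println(customContentIsForeign(parse(bad)));
    }
}
```

in relax ng itself, a rule like this is usually written with a name
class that excludes the schema's own namespace, so the java check is
only needed where no validator runs.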

>> if it turns out that the needs of the java coders are vastly different 
>> than those of pipeline jugglers, then the metadata generator should 
>> map the internal data on a different namespace with well-defined 
>> semantics and not hijack the page envelope ns.
> 
> +1

done.

>> * another hairy issue is that magic "lenya:custom" tag, which 
>> basically states that everything you care to dump here magically 
>> becomes part of the page envelope namespace. here be dragons! either 
>> let's change it to something like <lenya:customtag 
>> name="key">value</lenya:customtag>, or require that people put those 
>> custom tags in another namespace, as is the case with the dc: tags.

done, see above.

> We should abandon the concept of "custom" meta data and introduce
> modularized meta data instead. IMO this is not too complex and should
> be implemented for 1.4, since it affects the content repository.

as michi remarked, it is important to be able to differentiate between 
core metadata that is handled by lenya and must be backwards compatible, 
and custom stuff that is just passed through as-is.


-- 
"Open source takes the bullshit out of software."
	- Charles Ferguson on TechnologyReview.com

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: pol-admin@uni-due.de, Telefon: 0203/379-2736



Re: page envelope namespace issues...

Posted by Andreas Hartmann <an...@apache.org>.
Michael Wechner wrote:

[...]

> well, I think we should cleanly separate Lenya stuff from custom stuff, 
> because of migration, Schema and legacy content reasons

Yes, I totally agree. But this is a different issue which applies
to many parts of the code base. I already started the separation
for Java classes. Do you have any ideas how to achieve it for
other resources as well? I guess the issue is worth another thread.

[...]

>> Resource types and other components could declare additional meta data
>> sets for specific purposes, e.g.
>>
>> - access permissions in certain environments
>> - content descriptions
>> - information about search indexing (how, when, ...)
>>
>>
>> A meta data element set consists of a set of attribute keys
>> and is identified by a namespace.
> 
> can you give an even more specific example ;-) ?

Imagine you add a module for a "picture" resource type which
is used by a museum to store picture previews. You could introduce
a meta data set (see [1]):

<meta-data namespace-uri="http://mymuseum.org/metadata">
   <element name="painter"/>
   <element name="showroom"/>
   <element name="year" optional="true"/>
   <element name="techniques" multiple="true"/>
</meta-data>

This way, you can clearly separate your specific meta data from
any other meta data without mis-using dublin core elements etc.

And it would be guaranteed that the repository validates
that all meta data of these resources are entered correctly.
This is not possible with an arbitrary set of custom meta data.
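
As a rough sketch of how the repository could use such a declaration,
the Java below reads the <element/> entries and checks a record against
them. The class and method names are hypothetical, and the semantics
are assumptions on my part: a missing "optional" or "multiple"
attribute is taken to mean "false", and a non-optional element needs at
least one value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not Lenya code.
final class DeclaredSetSketch {

    /** Flags read from one element declaration. */
    static final class Decl {
        final boolean optional;
        final boolean multiple;

        Decl(boolean optional, boolean multiple) {
            this.optional = optional;
            this.multiple = multiple;
        }
    }

    /** Reads the element declarations; an absent attribute counts as "false" (assumption). */
    static Map<String, Decl> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList nodes = root.getElementsByTagName("element");
            Map<String, Decl> decls = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                decls.put(e.getAttribute("name"),
                        new Decl("true".equals(e.getAttribute("optional")),
                                "true".equals(e.getAttribute("multiple"))));
            }
            return decls;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot read declaration", ex);
        }
    }

    /** Returns one message per violation; an empty list means the record is valid. */
    static List<String> validate(Map<String, Decl> decls, Map<String, List<String>> values) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : values.entrySet()) {
            Decl decl = decls.get(entry.getKey());
            if (decl == null) {
                errors.add("unknown key: " + entry.getKey());
            } else if (entry.getValue().size() > 1 && !decl.multiple) {
                errors.add("single-valued key: " + entry.getKey());
            }
        }
        for (Map.Entry<String, Decl> entry : decls.entrySet()) {
            List<String> given = values.get(entry.getKey());
            if (!entry.getValue().optional && (given == null || given.isEmpty())) {
                errors.add("missing key: " + entry.getKey());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String xml = "<meta-data namespace-uri=\"http://mymuseum.org/metadata\">"
                + "<element name=\"painter\"/><element name=\"showroom\"/>"
                + "<element name=\"year\" optional=\"true\"/>"
                + "<element name=\"techniques\" multiple=\"true\"/></meta-data>";
        Map<String, List<String>> picture = new LinkedHashMap<>();
        picture.put("painter", Arrays.asList("Paul Klee"));
        picture.put("frame", Arrays.asList("oak")); // not declared in the set
        System.out.println(validate(parse(xml), picture));
    }
}
```

With a free-form bag of custom meta data there is nothing to compare
the record against, which is why the declaration matters.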


Here are some threads about this issue:

[1] http://www.nabble.com/-Proposal--Configurable-meta-data-t310980.html#a869474
[2] http://www.nabble.com/-RT--Generic-meta-data-t6866.html#a19579

-- Andreas


-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org




Re: page envelope namespace issues...

Posted by Michael Wechner <mi...@wyona.com>.
Andreas Hartmann wrote:
> Michael Wechner wrote:
>> Andreas Hartmann wrote:
>>>
>>> We should abandon the concept of "custom" meta data and introduce
>>> modularized meta data instead. 
>>
>> I guess you mean separating custom and Lenya metadata cleanly, right?
>>
>> Or otherwise can you explain a bit.
>
> I think we should not separate between "internal" ("Lenya") and
> custom meta data,

well, I think we should cleanly separate Lenya stuff from custom stuff,
because of migration, Schema and legacy content reasons

> but we should allow to use meta data sets which
> are identified by namespaces.

I think this is the least we can do ;-)
>
> Some examples used by the core are:
>
> - DublinCore elements
> - DublinCore terms
> - workflow-related meta data
> - access-control-related meta data
> - content item meta data (e.g. resource type, source extension, mime 
> type)

this seems to me Lenya core stuff

> - image-specific meta data (e.g. width, height)

I would consider this also resource specific
>
> Resource types and other components could declare additional meta data
> sets for specific purposes, e.g.
>
> - access permissions in certain environments
> - content descriptions
> - information about search indexing (how, when, ...)
>
>
> A meta data element set consists of a set of attribute keys
> and is identified by a namespace.

can you give an even more specific example ;-) ?

Thanks

Michi
>
> -- Andreas
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61




Re: page envelope namespace issues...

Posted by Andreas Hartmann <an...@apache.org>.
Michael Wechner wrote:
> Andreas Hartmann wrote:
>>
>> We should abandon the concept of "custom" meta data and introduce
>> modularized meta data instead. 
> 
> I guess you mean separating custom and Lenya metadata cleanly, right?
> 
> Or otherwise can you explain a bit.

I think we should not distinguish between "internal" ("Lenya") and
custom meta data, but we should allow the use of meta data sets which
are identified by namespaces.

Some examples used by the core are:

- DublinCore elements
- DublinCore terms
- workflow-related meta data
- access-control-related meta data
- content item meta data (e.g. resource type, source extension, mime type)
- image-specific meta data (e.g. width, height)

Resource types and other components could declare additional meta data
sets for specific purposes, e.g.

- access permissions in certain environments
- content descriptions
- information about search indexing (how, when, ...)


A meta data element set consists of a set of attribute keys
and is identified by a namespace.

-- Andreas

-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org




Re: page envelope namespace issues...

Posted by Michael Wechner <mi...@wyona.com>.
Andreas Hartmann wrote:
>
> We should abandon the concept of "custom" meta data and introduce
> modularized meta data instead. 

I guess you mean separating custom and Lenya metadata cleanly, right?

Or otherwise can you explain a bit.

Thanks

Michi
> IMO this is not too complex and should
> be implemented for 1.4, since it affects the content repository.
>
> -- Andreas
>
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61




Re: page envelope namespace issues...

Posted by Andreas Hartmann <an...@apache.org>.
Jörn Nettingsmeier wrote:
> Apache Wiki wrote:
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Lenya Wiki" 
>> for change notification.
>>
>> The following page has been changed by JörnNettingsmeier:
>> http://wiki.apache.org/lenya/LenyaSpecificXMLNamespaces
>>
>> ------------------------------------------------------------------------------ 
>>
>>   == lenya:* http://apache.org/cocoon/lenya/page-envelope/1.0 ==
> 
> oh my god. hope i'm not stepping on anyone's toes here, but the element 
> definitions for that namespace in PageEnvelope.java (and all over the 
> place) are the most frightful pile of ad-hoc crap i've ever seen. no 
> regard for orthogonality, no hierarchy, just a dumpster for years of 
> "neat feature of the day" ideas...
> 
> is there any interest in cleaning this up and deprecating some of the 
> more glaringly redundant fields?

+1

> or should we better not touch what works?

-1, the code must be kept alive


> imnsho there needs to be a watertight textual definition of what's legal 
> in any namespace, not a loose "add whatever you need" policy distributed 
> across a number of sourcefiles in totally unrelated packages.
> 
> <dream target="lenya 2.0">i don't know if it's worth it or whether it 
> will be efficient, but wouldn't it be a lot nicer to have a clean, 
> authoritative grammar defined in either rng or xsd, and java beans to 
> mimic this grammar 1:1 internally, including hierarchical data?</dream>
> 
> for the time being, there are a few unclear issues:
> 
> * which of the fields defined in PageEnvelope.java (and in the page 
> envelope input module) are actually needed (and still being used) for
> core lenya mechanisms (as opposed to "i could use this for my current 
> project, let's hack it in")? the wiki page is already becoming unwieldy 
> with just that one namespace.

IMO the page envelope should be removed entirely and replaced by
single-purpose modules. This process has already been started but
has never been finished.


> * the LenyaMetaDataGenerator introduces a somewhat nicer structure by 
> using wrappers, but these need to be defined somehow. is there general 
> interest to have hierarchical metadata, or do you prefer flat for 
> internal use?

IMO a flat structure is sufficient.
What we need is modularized meta data, i.e. components can define
their own meta data sets.


> if it turns out that the needs of the java coders are 
> vastly different than those of pipeline jugglers, then the metadata 
> generator should map the internal data on a different namespace with 
> well-defined semantics and not hijack the page envelope ns.

+1

> * another hairy issue is that magic "lenya:custom" tag, which basically 
> states that everything you care to dump here magically becomes part of 
> the page envelope namespace. here be dragons! either let's change it to 
> something like <lenya:customtag name="key">value</lenya:customtag>, or 
> require that people put those custom tags in another namespace, as is 
> the case with the dc: tags.

We should abandon the concept of "custom" meta data and introduce
modularized meta data instead. IMO this is not too complex and should
be implemented for 1.4, since it affects the content repository.

-- Andreas


-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org

