Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2000/05/15 19:25:37 UTC

[RT] Can Cocoon help enforcing the "semantic web"?

Here we are for another episode of the "random thoughts" series brought
to you by Stefano's fried synapses.

In a recent message (also picked up by xmlhack.com) I expressed my
concerns about the harm that a tool like Cocoon could do to the ideas
that XML and friends propose. Many of you responded with strong
arguments about the need for XML processing on the server side, and I
totally agree with them (of course). It was also pointed out that such
needs would not be removed by the existence of widespread XML-capable
clients: true, they would be reshaped, but never removed.

I've thought very much about this and came to the conclusion that while
Cocoon is _NOT_ harmful to the XML model in general, it leaves a very
important part of the job of enforcing the "semantic web" to the user.

During the last few days, I went over all the "web design issues" that
W3C director Tim Berners-Lee outlines (http://www.w3.org/DesignIssues/),
did my homework on RDF, RDF Schema and all related materials, read some
whitepapers about metadata activities and started to think on how Cocoon
could help.

I've come across many interesting ideas and powerful dreams of a "web of
knowledge", where a layer of machine-understandable information is
processed to create a layer of human-understandable information, one
that is easier for humans to digest because it has already been filtered
by metadata processing.

I know many of you don't know about RDF and many others believe it's
just the XML equivalent of the HTML <meta> tag. In general, RDF is
believed to be a useless waste of time. I used to think this myself, but
I think it's time to look forward... and outline the problems that RDF
and friends have.

There are problems: the baby is hard to understand and use. RDF is
generally verbose, it has this (please, allow me) "useless"
element/attribute equivalence (which breaks validation in all possible
ways), it's utterly abstract, and it provides no example of use that
would pay off in the short term.
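To make the element/attribute equivalence concrete: in RDF/XML the very
same statement can be serialized with the property as a child element or
as an attribute (the "s:" namespace and the property name below are
hypothetical, just for illustration), which is exactly what makes
DTD-based validation hopeless:

```xml
<!-- property as a child element -->
<rdf:Description about="http://example.org/doc"
                 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:s="http://example.org/schema/">
  <s:author>Jane Doe</s:author>
</rdf:Description>

<!-- the very same statement, property as an attribute -->
<rdf:Description about="http://example.org/doc"
                 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:s="http://example.org/schema/"
                 s:author="Jane Doe"/>
```

A DTD must commit to one of the two forms, so a generic RDF document can
never be "valid" in the DTD sense.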

RDF is more than a year old and almost nobody (except the Mozilla
project) has been using it. Why?

I don't have a general answer but I have my own: why should I care about
embedding RDF markup in my documents, if nobody is able to use it?

But the same thing could be said for RDF-based applications (the
infamous chicken & egg problem): why should I write an RDF-capable
engine if there is no content available which contains RDF?

Sure, there are RFCs that teach you how to embed RDF into your HTML
(yeah, right... you wish), and RFCs that tell you which metadata
elements to use (the Dublin Core); David Megginson even wrote an RDF
wrapper for SAX; everybody in this world knows that this might be
big....

.... but the energy gap to reach that usability plateau is _HUGE_,
and it seems that nobody is able to write the "killer app" that gets
this ball spinning.

Can Cocoon be this "killer app"?

I strongly believe so. Let me explain why:

Cocoon (starting from its version 2.0) is based on the sitemap. The
sitemap is the location of all the processing information required to
generate a resource. This is metadata, this is "data about data". If we
clean it up a little, RDF-ize it, then it would be very easy for Cocoon
to expose its sitemap to semantic crawlers.
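Just to give the flavor of what an "RDF-ized" sitemap entry might look
like (everything below is a hypothetical sketch -- the map: namespace,
element names and URIs are invented, not the actual Cocoon 2 sitemap
format):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:map="http://xml.apache.org/sitemap/">
  <!-- metadata about how one resource is generated -->
  <rdf:Description about="http://myhost/docs/guide">
    <map:source>file:///site/docs/guide.xml</map:source>
    <map:stylesheet>file:///site/style/doc2html.xsl</map:stylesheet>
    <map:mediaType>text/html</map:mediaType>
  </rdf:Description>
</rdf:RDF>
```

A semantic crawler fetching such a description would learn not only that
the resource exists, but how it is produced and from what sources.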

Also, through the use of content negotiation, it would be possible for
the crawler to obtain the "RDF" information (which could be the original
one, or one created on purpose), which along with XLink/XBase/XPointer
would allow the crawler to traverse the site in a friendly manner.
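The content negotiation step could be as simple as honoring the HTTP
Accept header. Here is a minimal sketch in Python (the "text/rdf" media
type is an assumption, the quality-value parsing is simplified, and
wildcards are ignored):

```python
def negotiate(accept_header, available):
    """Return the first available representation acceptable to the client.

    accept_header: the value of the HTTP Accept header,
                   e.g. "text/rdf, text/html;q=0.8"
    available:     media types the server can produce.
    """
    # Parse "type;q=0.8" entries into (type, quality) pairs.
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip()
        q = 1.0
        for f in fields[1:]:
            if f.strip().startswith("q="):
                q = float(f.strip()[2:])
        prefs.append((mtype, q))
    # Walk the client's preferences in descending quality order.
    prefs.sort(key=lambda p: -p[1])
    for mtype, q in prefs:
        if q > 0 and mtype in available:
            return mtype
    return None

# A semantic crawler asking for the RDF view; a browser would ask for HTML.
print(negotiate("text/rdf;q=0.9, text/html;q=0.5",
                ["text/html", "text/rdf"]))   # -> text/rdf
```

The same URL would therefore serve HTML to a browser and an RDF view to
a crawler, with no change on the client side.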

Ok, you say, I get that, but what would be different from today?

The thing is that we are going to write that crawler and connect it
directly to Cocoon so we would gain:

1) The sitemap is the single set of instructions for both the resource
generation processor and the semantic information crawler. Single point
of management, which would also allow people to see an instant payoff
from their metadata effort.

2) Each Cocoon would have its own semantically driven search engine.

3) Each Cocoon would connect to other semantic search engines which
make RDF views of their information available (the Mozilla directory,
for example) to increase its reach.

4) Each Cocoon would be contacted by other agents (other Cocoons, or
agents behaving equivalently) and provide RDF views of its information,
possibly already semantically processed, to save those agents from
recrawling the site.

If you think about it, such "cellular" semantically-based indexing would
work much like Napster/Gnutella networks where there would be no central
point of failure.

Imagine a web where each site controls not only its information, its
schemas, its stylesheets and web applications, but also its own search
engine, and every one of them is the entry point for a distributed (but
locally manageable) semantically based searching environment.

It would work much like routing tables work for TCP/IP networks,
propagating information as soon as it is available, or delegating
search and retrieval to other networks.
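A toy sketch of that "routing table" idea (all names below are
hypothetical): each Cocoon keeps a table mapping topics to the sites
known to cover them, and occasionally merges the tables of its peers,
so knowledge propagates through the network without any central index:

```python
class SemanticNode:
    """One site in the decentralized semantic-indexing network."""

    def __init__(self, name, topics):
        self.name = name
        # topic -> set of site names known to cover that topic
        self.table = {t: {name} for t in topics}

    def merge_from(self, peer):
        """Learn everything a peer knows (one gossip exchange)."""
        for topic, sites in peer.table.items():
            self.table.setdefault(topic, set()).update(sites)

    def lookup(self, topic):
        """To which sites should a query about this topic be delegated?"""
        return sorted(self.table.get(topic, set()))

a = SemanticNode("xml.apache.org", ["xml", "cocoon"])
b = SemanticNode("myhost.example.org", ["design"])
b.merge_from(a)            # b now knows where "cocoon" questions go
print(b.lookup("cocoon"))  # -> ['xml.apache.org']
```

As in Napster/Gnutella, no single node holds the whole index, so there
is no central point of failure.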

I don't know if this is feasible or not, but the idea seems to me
*very* exciting, to say the least.

                 ----------- o -------------

But how would a "semantically based search engine" work?

I still don't have a clear view of this, but I have a few ideas to
share: first of all, the RDFSchema WD adds a great deal of functionality
to the RDF idea and makes it very appealing.

[Careful here: RDFSchema is not to be confused with XMLSchema, which is
a totally different thing. RDFSchema is -NOT- the XMLSchema for RDF, not
least because RDF cannot be validated]

RDFSchema mostly provides object-oriented capabilities on top of the
RDF model, allowing you, for example, to say

 <rdf:Description
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://metadata.org/people/jobs"
  about="http://www.apache.org/~stefano/rt/latest">
  <dreamer>Stefano Mazzocchi</dreamer>
 </rdf:Description>

where the namespace "http://metadata.org/people/jobs" indicates (with an
RDFSchema) that

  dreamer --(extends)--> dc:creator

where dc:creator indicates the creator element of the Dublin Core
standard metadata set, which identifies the author of the described
resource.

Then, on the local site, since users are generally made aware of the
specific metadata tags used, "dreamer" might have other meanings; but
other sites that are unaware of these site-specific meanings can fall
back on the standard "creator" tag, since its semantics have been
inherited.
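A toy sketch of that fallback-by-inheritance idea (the property names
are hypothetical, and I'm assuming the Dublin Core creator element as
the standard property): an RDFSchema declares "dreamer" a subproperty
of the standard element, so a remote engine that has never seen
"dreamer" can still answer a "who wrote this?" query:

```python
# Declared by the site's RDFSchema: subproperty -> superproperty.
SUBPROPERTY_OF = {
    "jobs:dreamer": "dc:creator",
}

def matches(property_used, property_queried):
    """True if the property used in the metadata is the one queried,
    or inherits its semantics from it (transitively)."""
    p = property_used
    while p is not None:
        if p == property_queried:
            return True
        p = SUBPROPERTY_OF.get(p)   # walk up the inheritance chain
    return False

print(matches("jobs:dreamer", "dc:creator"))  # -> True
print(matches("jobs:dreamer", "dc:title"))    # -> False
```

The local engine can give "dreamer" its richer site-specific meaning,
while everybody else degrades gracefully to the standard semantics.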

Think about something like this, where you are able to define whatever
metadata markup your needs require, but you provide semantic hooks so
that outside searches still match.

Much like a standard API provides functionality that you extend as you
please, while still allowing your program to run on any compatible
platform, such a semantic web would be based on a standard set of
metadata tags; then, if you don't want to use those tags, or want to
provide special searching capabilities, all you need to do is extend
them and make your RDFSchemas accessible in a known place.

Would this solve all of our problems? No way: as with XMLSchemas, the
problem of web "balkanization" and fragmentation exists, but as many
have pointed out, the stable points of a dynamic system sit at the
bottom of a bell-shaped surface.

Today, we have a stable dynamic system, since its potential energy is
at a local minimum.

The W3C is giving us ideas about an ideal new state of the web, one
that lies in another local minimum of its potential energy, far above
the current one.

We need to act as catalysts, lowering the energy required to move from
this minimum (the current web) to the other one high above (the
semantic web); otherwise these ideas will simply remain in those W3C
specs and will never change our favorite information infrastructure as
they fully deserve to do.

Cocoon was born to allow the adoption of XML technologies to solve
real-life problems, and it acted as a catalyst.

Now I want to provide the same thing to complete the job.

I don't know when I'll have time for this, but I invite you to follow
me on this quest if you like the idea... and if you think you have a
better one, that's even better.

I know I'm crazy. I know... :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------



Re: [RT] Can Cocoon help enforcing the "semantic web"?

Posted by Paul Russell <pa...@luminas.co.uk>.
On Mon, May 15, 2000 at 07:25:37PM +0200, Stefano Mazzocchi wrote:
> Here we are for another episode of the "random thoughts" series brought
> to you by Stefano's fried synapses.

And one with potential, I think. Can I borrow some fried synapses?
Do they go well with red wine? ;)

> [...]
> 4) Each Cocoon would be contacted by other agents (other Cocoon or
> equivalently behaving) and provide RDF views of its information,
> possibly already semantically processed to avoid the need of site
> recrawling of that agent.

5) Each Cocoon has the ability to release agents into the 'cocoon'
   network and learn about other cocoons, preferably *not* by
   actually visiting them all (that would, after all, be a worm -
   Not A Good Idea ;) but by sharing information with cocoons that
   it feels are 'close' in outlook.

For example, say I had a Cocoon about Cocoon. I am this Cocoon's
author, and I give the cocoon agent xml.apache.org to play with
as a resource. It hops off and talks to apache.org's cocoon,
learns about it, and gives xml.apache.org some information about
the 'local' cocoon. From this stems the idea that Cocoons will
eventually know about the other cocoons in their subject area,
and will occasionally ask them: what's happening? has anything
changed? do you know anyone new? That kind of thing.

One thing bothers me, though. Is this really Cocoon's job? Cocoon is a
nice package at the moment, and I'd hate to see it get 'diluted',
regardless of the potential of the idea. Is there some way we
can make this thing generic?

If I've repeated what you meant anyway Stefano, then apologies,
my synapses too are somewhat the worse for wear at the moment ;)

Just a few thoughts...

-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: [RT] Can Cocoon help enforcing the "semantic web"?

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On 15/5/00 at 7:25 pm, stefano@apache.org (Stefano Mazzocchi) wrote:

>Here we are for another episode of the "random thoughts" series brought
>to you by Stefano's fried synapses.

I am sorry that there have not been many replies to this :(

I think these are great ideas.

Look to see (if you have not already :) what Userland
<http://www.userland.com> is doing in this direction; they have a
similar plan. I'm not sure if what they are doing is based on RDF, but
they do lots of stuff with XML-RPC, like code updates, a search engine,
aggregation, distributed servers ....

Closer to home ... maybe RDF could be useful for my "linkMap" problem.

regards Jeremy

      ____________________________________________________________________

      Jeremy Quinn                                             media.demon
                                                           webSpace Design
     <ma...@media.demon.co.uk>       <http://www.media.demon.co.uk>
      <phone:+44.[0].207.737.6831>          <pa...@sms.genie.co.uk>