Posted to general@gump.apache.org by "Adam R. B. Jack" <aj...@trysybase.com> on 2004/02/28 01:05:02 UTC

Gump Database

Just to start afresh on this topic:

Right now Gump uses a Python DBM database (on platforms other than M$, where
that currently isn't supported) to store a set of states/dates/counters,
i.e. what is the current & last state, how 'long' in this state, when first
built, how many successes, how many failures, etc. Basically, the stuff that
goes into here:

    http://lsd.student.utwente.nl/gump/gump_stats/index.html

and into projects:

    http://lsd.student.utwente.nl/gump/ant/ant.html#Statistics

Basically, I don't wish to use DBM for anything more than this, and I would
like to start storing some historical/trend/change information.
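To make the current usage concrete, here is a minimal sketch of the kind of per-project record a DBM file can hold (the field names and the `record_run` helper are invented for illustration, not Gump's actual code):

```python
import dbm
import json


def record_run(path, project, state):
    """Update the DBM-backed stats record for one project after a run.

    The record is a small dict of states/dates/counters, serialized as
    JSON (field names here are hypothetical)."""
    with dbm.open(path, "c") as db:
        raw = db.get(project)
        stats = json.loads(raw) if raw else {
            "current_state": None, "previous_state": None,
            "sequence_in_state": 0, "successes": 0, "failures": 0,
        }
        stats["previous_state"] = stats["current_state"]
        if state == stats["current_state"]:
            stats["sequence_in_state"] += 1   # how 'long' in this state
        else:
            stats["sequence_in_state"] = 1
        stats["current_state"] = state
        if state == "success":
            stats["successes"] += 1
        elif state == "failed":
            stats["failures"] += 1
        db[project] = json.dumps(stats)
        return stats
```

This is fine for current-state counters, but it shows the limitation too: each write overwrites the previous record, so there is no trend history to query.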

Clearly an RDBMS is a likely candidate: stuff we all know and are generally
comfortable with. We could generate a schema to hold the runtime data and
(with help from Berin's pointers) it seems Python can query/update this.
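For the relational route, a sketch of what a run-history schema might look like, queried from Python (I'm using sqlite3 purely for illustration; the table and column names are my own invention, not a proposal):

```python
import sqlite3

# Illustrative schema: one row per run, one row per project result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (
    run_id  INTEGER PRIMARY KEY,
    started TEXT NOT NULL                -- ISO date of the Gump run
);
CREATE TABLE project_result (
    run_id  INTEGER REFERENCES run(run_id),
    project TEXT NOT NULL,
    state   TEXT NOT NULL                -- e.g. 'success', 'failed'
);
""")
conn.execute("INSERT INTO run VALUES (1, '2004-02-28')")
conn.execute("INSERT INTO project_result VALUES (1, 'ant', 'success')")
conn.execute("INSERT INTO run VALUES (2, '2004-02-29')")
conn.execute("INSERT INTO project_result VALUES (2, 'ant', 'failed')")

# Trend query: success count and total runs per project, over all runs.
rows = conn.execute("""
    SELECT project,
           SUM(state = 'success') AS successes,
           COUNT(*) AS runs
    FROM project_result
    GROUP BY project
""").fetchall()
print(rows)  # [('ant', 1, 2)]
```

Unlike the DBM approach, nothing is overwritten here, so the historical/trend questions become plain SQL.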

Since Gump has so much XML (in workspaces, profiles, modules, projects etc.)
perhaps an XML store is a good candidate. I could see (for descriptors and
such) having the ability to detect changes in XML (from last instance) could
be useful. I could also see us wanting to query over the workspace XML and
the results.
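One way to detect descriptor changes from the last instance without a full XML store is just to diff what two versions of a descriptor declare. A sketch with minidom (the `<depend project="..."/>` shape follows the Gump descriptor style, but treat the exact element/attribute names here as assumptions):

```python
from xml.dom.minidom import parseString


def declared_depends(xml_text):
    """Return the set of project names a descriptor declares as dependencies."""
    doc = parseString(xml_text)
    return {d.getAttribute("project")
            for d in doc.getElementsByTagName("depend")}


old = ('<project name="foo">'
       '<depend project="ant"/><depend project="xerces"/>'
       '</project>')
new = '<project name="foo"><depend project="ant"/></project>'

# Set difference gives the change between the two instances.
dropped = declared_depends(old) - declared_depends(new)
added = declared_depends(new) - declared_depends(old)
print(dropped, added)  # {'xerces'} set()
```

Recording those deltas per run would already give a crude "relationships over time" feed, whatever store they end up in.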

I recently exercised XINDICE, and tried to step outside of my biases and see
if an XML store was a good thing. I never got past thinking of it as random
buckets-o-XML, and didn't find what I hoped to.


http://nagoya.apache.org/eyebrowse/ReadMsg?listName=xindice-users@xml.apache.org&msgNo=3237

That all said, I kinda like what Stefano wrote & worry a relational model
could impede some of the more exciting and serendipitous results we'd like
to see.

    http://www.betaversion.org/~stefano/linotype/news/46/

... although he's being a bit of a tease and not doing his thinking out loud
for us. ;-)

I know Nick wants us to store the HTML output historically, so I wonder if
storing the XML outputs the same way might not be so bad. [Not good for
cross date queries, but...]

In short, I don't know the right approach & am open to all ideas. Also, this
smells like something we ought to discuss/design here and on the wiki.

regards,

Adam

P.S. I know folks think an XML result could be converted to graphical charts
for Forrest documentation, and since I am eager to work on generating a
result.xml (am currently using xml.dom.minidom, my new buddy, to write that)
I could easily generate other result XML files. So, specify/request away....
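Since the result.xml format is still open to specification, here is only a sketch of writing one with xml.dom.minidom (the element and attribute names are placeholders, not a format proposal):

```python
from xml.dom.minidom import Document


def write_results(date, results):
    """Serialize (project, state) pairs to an XML string.

    Element names ('run', 'project') are placeholders for whatever
    result.xml format gets specified."""
    doc = Document()
    root = doc.createElement("run")
    root.setAttribute("date", date)
    doc.appendChild(root)
    for project, state in results:
        node = doc.createElement("project")
        node.setAttribute("name", project)
        node.setAttribute("state", state)
        root.appendChild(node)
    return doc.toprettyxml(indent="  ")


xml_text = write_results("2004-02-28",
                         [("ant", "success"), ("xerces", "failed")])
print(xml_text)
```

The same loop could emit any other result XML shape folks request, hence the "specify/request away".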

Working on Step III:


http://nagoya.apache.org/eyebrowse/ReadMsg?listName=gump@jakarta.apache.org&msgNo=3772

--
Experience the Unwired Enterprise:
http://www.sybase.com/unwiredenterprise
Try Sybase: http://www.try.sybase.com


Re: Gump Database

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
This can wait a while, and certainly until after your demo.

I'll try to move it to the Wiki so folks can read/contribute as their time
allows, and not struggle to decipher (Adam English)/comprehend/respond
whilst surviving a day's inbox overload...

regards

Adam
----- Original Message ----- 
From: "Stefano Mazzocchi" <st...@apache.org>
To: "Gump code and data" <ge...@gump.apache.org>
Sent: Monday, March 01, 2004 7:47 PM
Subject: Re: Gump Database


> Adam R. B. Jack wrote:
>
> > Stefano wrote:
> >
> >
> >>While the question that you should be asking yourself is: what does the
> >>data look like? how structured is it?
> >>
> >> From where I stand, the gump metadata is highly structured and can be
> >>perfectly mapped into a relational structure with reasonable effort.
> >>Also, given its structure, can be indexed precisely and thus queried
> >>very efficiently.
> >
> >
> > So the fact that I can visualize this as a huge rat's nest in my head
> > (especially when wired into objects) doesn't help me make a case for it
> > being unstructured? ;-) ;-)
> >
> > Ok, I hear you -- and stepping back, looking at the main (named)
> > entities as entries in tables, I can see a relational schema with
> > relationships as names with or without RI. What I can't see (though) is
> > what helps me with the time aspect --- i.e. when a dependency is
> > dropped, what do I compare against?
>
> nonono wait a second. What are you talking about? I was talking about
> putting the historical data into the system, not the gump metadata.
>
> > I guess the data I'm interested in right now is (somehow) relationships
> > over time. One project's relationship to its repository, to its peers,
> > to communities (of users). How that looks, I'm not sure, but I'll try
> > to answer that in my head before I continue.
>
> I have an email in my draft folder about how we can do perfect nagging
> with gump... but I need to understand the graph complexity before I can
> go on and I don't have much time ATM since we are delivering the first
> demo of our project next week.
>
> >>At that point, once you have the data in the database, you can start
> >>thinking about what to do with it. Dependency graph visualization,
> >>history of dependencies, FoG estimation, all of these are problems that
> >>will result in particular queries and particular use of the result set.
> >
> >
> > I like XML as the human (community) editable interface, and converting
> > it to relational for each run really doesn't appeal to me.
>
> Of course!! Nonono, I don't want to move from XML descriptors to
> relational data, that would be stupid without a GUI or a webapp to
> guide people, but I wouldn't use it anyway.
>
> > Even if I do, comparing as I load, and detecting changes -- also
> > sounds like work. It also sounds similar to the XML to Object work that
> > Gumpy is doing, and I was hoping something could help out here w/o me
> > doing it myself in pedestrian steps.
>
> I am *NOT* proposing to change the way gump loads metadata but the way
> gump stores history information.
>
> > I need to do more thinking, but thanks for the direct feedback, I
> > appreciate that. Another person's clarity helps.
> >
> > BTW: So say we want MySQL [for results and maybe more], how do we set
> > that up? Do we install, or leverage an existing MySQL install at Apache?
>
> good question, but too early now, let's focus on what we want to do and
> how, the infrastructural details will come after that.
>
> -- 
> Stefano.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org


Re: Gump Database

Posted by Stefano Mazzocchi <st...@apache.org>.
Adam R. B. Jack wrote:

> Stefano wrote:
> 
> 
>>While the question that you should be asking yourself is: what does the
>>data look like? how structured is it?
>>
>> From where I stand, the gump metadata is highly structured and can be
>>perfectly mapped into a relational structure with reasonable effort.
>>Also, given its structure, can be indexed precisely and thus queried
>>very efficiently.
> 
> 
> So the fact that I can visualize this as a huge rat's nest in my head
> (especially when wired into objects) doesn't help me make a case for it
> being unstructured? ;-) ;-)
> 
> Ok, I hear you -- and stepping back, looking at the main (named) entities as
> entries in tables, I can see a relational schema with relationships as names
> with or without RI. What I can't see (though) is what helps me with the time
> aspect --- i.e. when a dependency is dropped, what do I compare against?

nonono wait a second. What are you talking about? I was talking about 
putting the historical data into the system, not the gump metadata.

> I guess the data I'm interested in right now is (somehow) relationships over
> time. One project's relationship to its repository, to its peers, to
> communities (of users). How that looks, I'm not sure, but I'll try to answer
> that in my head before I continue.

I have an email in my draft folder about how we can do perfect nagging 
with gump... but I need to understand the graph complexity before I can 
go on and I don't have much time ATM since we are delivering the first 
demo of our project next week.

>>At that point, once you have the data in the database, you can start
>>thinking about what to do with it. Dependency graph visualization,
>>history of dependencies, FoG estimation, all of these are problems that
>>will result in particular queries and particular use of the result set.
> 
> 
> I like XML as the human (community) editable interface, and converting it to
> relational for each run really doesn't appeal to me. 

Of course!! Nonono, I don't want to move from XML descriptors to 
> relational data, that would be stupid without a GUI or a webapp to
guide people, but I wouldn't use it anyway.

> Even if I do, comparing
> as I load, and detecting changes -- also sounds like work. It also sounds
> similar to the XML to Object work that Gumpy is doing, and I was hoping
> something could help out here w/o me doing it myself in pedestrian steps.

I am *NOT* proposing to change the way gump loads metadata but the way 
> gump stores history information.

> I need to do more thinking, but thanks for the direct feedback, I appreciate
> that. Another person's clarity helps.
> 
> BTW: So say we want MySQL [for results and maybe more], how do we set that
> up? Do we install, or leverage an existing MySQL install at Apache?

good question, but too early now, let's focus on what we want to do and 
how, the infrastructural details will come after that.

-- 
Stefano.


Re: Gump Database

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
Stefano wrote:

> While the question that you should be asking yourself is: what does the
> data look like? how structured is it?
>
>  From where I stand, the gump metadata is highly structured and can be
> perfectly mapped into a relational structure with reasonable effort.
> Also, given its structure, can be indexed precisely and thus queried
> very efficiently.

So the fact that I can visualize this as a huge rat's nest in my head
(especially when wired into objects) doesn't help me make a case for it
being unstructured? ;-) ;-)

Ok, I hear you -- and stepping back, looking at the main (named) entities as
entries in tables, I can see a relational schema with relationships as names
with or without RI. What I can't see (though) is what helps me with the time
aspect --- i.e. when a dependency is dropped, what do I compare against?

I guess the data I'm interested in right now is (somehow) relationships over
time. One project's relationship to its repository, to its peers, to
communities (of users). How that looks, I'm not sure, but I'll try to answer
that in my head before I continue.

> At that point, once you have the data in the database, you can start
> thinking about what to do with it. Dependency graph visualization,
> history of dependencies, FoG estimation, all of these are problems that
> will result in particular queries and particular use of the result set.

I like XML as the human (community) editable interface, and converting it to
relational for each run really doesn't appeal to me. Even if I do, comparing
as I load, and detecting changes -- also sounds like work. It also sounds
similar to the XML to Object work that Gumpy is doing, and I was hoping
something could help out here w/o me doing it myself in pedestrian steps.

I need to do more thinking, but thanks for the direct feedback, I appreciate
that. Another person's clarity helps.

BTW: So say we want MySQL [for results and maybe more], how do we set that
up? Do we install, or leverage an existing MySQL install at Apache?

regards,

Adam




Re: Gump Database

Posted by Stefano Mazzocchi <st...@apache.org>.
Adam R. B. Jack wrote:

>>Stefano Mazzocchi wrote:
>>
>>>For this specific problem, I would go relational.
>>
>>+1. Brain-dead and ugly solution like mysql ;)
> 
> 
> When I think about the answers I've received [and thanks for them] I wonder
> if folks are thinking more about the ugly/brain dead type problem of results
> tracking. I agree, that is fine to do relationally. [That is where this
> thread originated, so perhaps I unintentionally led the response.]
> 
> BTW: First, let me state for the record, I'm no fan of XML for XML's sake &
> don't wish to try to push it where it isn't useful. Also, everybody ought
> use relational databases (specifically those wonderful ones from Sybase,
> which also happens to have some awesome XML/XQL features, and allows joining
> XML and relational data.) Ok, my corporate pitch ;-) ;-) ;-) over...
> 
> I realize I was more interested in the XML metadata, and its changes over
> time. The results are why folks invest in granting us/maintaining their
> project metadata. The metadata is a map (a graph) of the community, and some
> interactions [some, needs to be extended for more], and I was hoping some
> form of XML repository could somehow give us a time factor on that.
> 
> I might be looking for more than is there, I might be looking for too low
> level an assist, but I'd like to know if folks who used to depend upon X no
> longer do [change in one XML], and when most migrated away [multiple]. I'd
> like to know if a product was a 'huge hit' [multiple], and who was migrating
> to it [multiple]. I'd like to see which communities/groups (lamely, for this
> mail, based off repository) were using something. I'd like to know details
> like this over time.
> 
> I could push this graph into a relational schema but I feel that would be
> very restrictive, and would pre-determine the benefits. I guess I can take
> deltas, or store historical copies of the whole metadata, but I feel I need
> some tool that is into analysing XML over time. Maybe I do need XINDICE, or
> something like it.
> 
> Again, feel free to correct me -- and/or express your gut feelings against,
> but my gut tells me I have a storage problem but I don't know the right
> tools for the store, or the query mechanisms.

Adam,

I think you are looking at the problem from the wrong angle: the use of 
the data.

While the question that you should be asking yourself is: what does the
data look like? how structured is it?

 From where I stand, the gump metadata is highly structured and can be 
perfectly mapped into a relational structure with reasonable effort.
Also, given its structure, can be indexed precisely and thus queried 
very efficiently.

At that point, once you have the data in the database, you can start 
thinking about what to do with it. Dependency graph visualization, 
history of dependencies, FoG estimation, all of these are problems that 
will result in particular queries and particular use of the result set.

Gump data does not exhibit any of the properties that appear in the
semi-structured or highly connected pseudo-digraph problem space.

For this reason, I see absolutely no reason to avoid using relational 
solutions. Also because they are, by far, much more solid and easy to 
interoperate with than any other DBMS model.

-- 
Stefano.


Re: Gump Database

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> Stefano Mazzocchi wrote:
> > For this specific problem, I would go relational.
>
> +1. Brain-dead and ugly solution like mysql ;)

When I think about the answers I've received [and thanks for them] I wonder
if folks are thinking more about the ugly/brain dead type problem of results
tracking. I agree, that is fine to do relationally. [That is where this
thread originated, so perhaps I unintentionally led the response.]

BTW: First, let me state for the record, I'm no fan of XML for XML's sake &
don't wish to try to push it where it isn't useful. Also, everybody ought
use relational databases (specifically those wonderful ones from Sybase,
which also happens to have some awesome XML/XQL features, and allows joining
XML and relational data.) Ok, my corporate pitch ;-) ;-) ;-) over...

I realize I was more interested in the XML metadata, and its changes over
time. The results are why folks invest in granting us/maintaining their
project metadata. The metadata is a map (a graph) of the community, and some
interactions [some, needs to be extended for more], and I was hoping some
form of XML repository could somehow give us a time factor on that.

I might be looking for more than is there, I might be looking for too low
level an assist, but I'd like to know if folks who used to depend upon X no
longer do [change in one XML], and when most migrated away [multiple]. I'd
like to know if a product was a 'huge hit' [multiple], and who was migrating
to it [multiple]. I'd like to see which communities/groups (lamely, for this
mail, based off repository) were using something. I'd like to know details
like this over time.
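Questions like these do reduce to queries once each run's declared dependencies are snapshotted into a table. A sketch (sqlite3, with invented table and column names) of asking "who used to depend on xerces but no longer does, as of the latest run":

```python
import sqlite3

# Invented table: one row per (run date, project, dependency) observed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE depends (run_date TEXT, project TEXT, dependency TEXT)")
conn.executemany("INSERT INTO depends VALUES (?, ?, ?)", [
    ("2004-01-01", "foo", "xerces"),
    ("2004-01-01", "bar", "xerces"),
    ("2004-02-01", "foo", "xerces"),
    ("2004-02-01", "bar", "dom4j"),   # bar migrated away from xerces
])

# Projects that depended on 'xerces' at some run, but not at the latest run.
migrated = conn.execute("""
    SELECT DISTINCT project FROM depends
    WHERE dependency = 'xerces'
      AND project NOT IN (
          SELECT project FROM depends
          WHERE dependency = 'xerces'
            AND run_date = (SELECT MAX(run_date) FROM depends))
""").fetchall()
print(migrated)  # [('bar',)]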

I could push this graph into a relational schema but I feel that would be
very restrictive, and would pre-determine the benefits. I guess I can take
deltas, or store historical copies of the whole metadata, but I feel I need
some tool that is into analysing XML over time. Maybe I do need XINDICE, or
something like it.

Again, feel free to correct me -- and/or express your gut feelings against,
but my gut tells me I have a storage problem but I don't know the right
tools for the store, or the query mechanisms.

regards,

Adam




Re: Gump Database

Posted by Leo Simons <le...@apache.org>.
Stefano Mazzocchi wrote:
> For this specific problem, I would go relational.

+1. Brain-dead and ugly solution like mysql ;)

-- 
cheers,

- Leo Simons

-----------------------------------------------------------------------
Weblog              -- http://leosimons.com/
IoC Component Glue  -- http://jicarilla.org/
Articles & Opinions -- http://articles.leosimons.com/
-----------------------------------------------------------------------
"We started off trying to set up a small anarchist community, but
  people wouldn't obey the rules."
                                                         -- Alan Bennett





Re: Gump Database

Posted by Nick Chalko <ni...@chalko.com>.
Stefano Mazzocchi wrote:

>
> For this specific problem, I would go relational.
>
+1

> Your data is very structured, the schema not so complex, the queries 
> don't need cross-dataset semantics and XML can be generated from a 
> relational query very easily (forrest can do that too using the cocoon 
> SQL modules)
>
Agree

> Storing it with XML is, IMO, a golden hammer.
>
:-*



Re: Gump Database

Posted by Stefano Mazzocchi <st...@apache.org>.
Adam R. B. Jack wrote:

> Just to start afresh on this topic:
> 
> Right now Gump uses a Python DBM database (on platforms other than M$, where
> that currently isn't supported) to store a set of states/dates/counters,
> i.e. what is the current & last state, how 'long' in this state, when first
> built, how many successes, how many failures, etc. Basically, the stuff that
> goes into here:
> 
>     http://lsd.student.utwente.nl/gump/gump_stats/index.html
> 
> and into projects:
> 
>     http://lsd.student.utwente.nl/gump/ant/ant.html#Statistics
> 
> Basically, I don't wish to use DBM for anything more than this, and I would
> like to start storing some historical/trend/change information.
> 
> Clearly an RDBMS is a likely candidate: stuff we all know and are generally
> comfortable with. We could generate a schema to hold the runtime data and
> (with help from Berin's pointers) it seems Python can query/update this.
> 
> Since Gump has so much XML (in workspaces, profiles, modules, projects etc.)
> perhaps an XML store is a good candidate. I could see (for descriptors and
> such) having the ability to detect changes in XML (from last instance) could
> be useful. I could also see us wanting to query over the workspace XML and
> the results.
> 
> I recently exercised XINDICE, and tried to step outside of my biases and see
> if an XML store was a good thing. I never got past thinking of it as random
> buckets-o-XML, and didn't find what I hoped to.
> 
> 
> http://nagoya.apache.org/eyebrowse/ReadMsg?listName=xindice-users@xml.apache.org&msgNo=3237
> 
> That all said, I kinda like what Stefano wrote & worry a relational model
> could impede some of the more exciting and serendipitous results we'd like
> to see.
> 
>     http://www.betaversion.org/~stefano/linotype/news/46/
> 
> ... although he's being a bit of a tease and not doing his thinking out loud
> for us. ;-)
> 
> I know Nick wants us to store the HTML output historically, so I wonder if
> storing the XML outputs the same way might not be so bad. [Not good for
> cross date queries, but...]
> 
> In short, I don't know the right approach & am open to all ideas. Also, this
> smells like something we ought to discuss/design here and on the wiki.
> 
> regards,
> 
> Adam
> 
> P.S. I know folks think an XML result could be converted to graphical charts
> for Forrest documentation, and since I am eager to work on generating a
> result.xml (am currently using xml.dom.minidom, my new buddy, to write that)
> I could easily generate other result XML files. So, specify/request away....

For this specific problem, I would go relational.

Your data is very structured, the schema not so complex, the queries 
don't need cross-dataset semantics and XML can be generated from a 
relational query very easily (forrest can do that too using the cocoon 
SQL modules)
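To illustrate that last point (Forrest would use the Cocoon SQL modules; this is only a standalone Python sketch with invented table and element names):

```python
import sqlite3
from xml.dom.minidom import Document

# Invented relational stats table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (project TEXT, successes INTEGER, failures INTEGER)")
conn.executemany("INSERT INTO stats VALUES (?, ?, ?)",
                 [("ant", 40, 2), ("xerces", 38, 4)])

# Turn the relational result set into an XML document for rendering.
doc = Document()
root = doc.createElement("stats")
doc.appendChild(root)
for project, ok, bad in conn.execute("SELECT * FROM stats ORDER BY project"):
    row = doc.createElement("project")
    row.setAttribute("name", project)
    row.setAttribute("successes", str(ok))
    row.setAttribute("failures", str(bad))
    root.appendChild(row)
xml_text = doc.toxml()
print(xml_text)
```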

Storing it with XML is, IMO, a golden hammer.

-- 
Stefano.