Posted to jdo-user@db.apache.org by Hei Virk <hv...@gmail.com> on 2016/06/05 18:11:07 UTC

Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?

Hi everybody,

-- Sorry for the delay in my answer.

First, I made slides (a PDF) about the embed method and how its algorithm
works:
https://drive.google.com/file/d/0B8eKvcE21R9RMkNPYlZiOTEwZnM/view

I don't know whether the Smalltalk community or other communities have
this kind of method (embed), but I believe not, because otherwise we
would probably know about it!

It is true that gc is not a new concept in the db world, but perhaps it
has previously been used quite separately in the db.

What is new in the embed method? I think that at least the high-level
method/abstraction, where the result of the embed operation is defined
clearly between two memory systems (i.e. “the network level
understanding”), is new. I also think that, at the next lower level,
binding the update part and the gc part together (“embed = update + gc”)
is a new idea. What is important in this binding is that the update part
is able to collect information for the gc part, and the gc part has full
network/graph-level understanding of the db even though it is done
locally and therefore efficiently.
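
To make “embed = update + gc” concrete, here is a minimal sketch of the
two phases in Java. All names in it (EmbedDb, Node, writeBack,
collectLocally) are hypothetical, invented for this example; they are not
the demo implementation's API.

    import java.util.Set;

    interface Node {}

    interface EmbedDb {
        // Update part: writes the modified run-time structure back into
        // the db and, as a by-product, returns the "gray" nodes -- db
        // nodes whose incoming references changed during the update.
        Set<Node> writeBack(Node root);

        // Gc part: starting only from the gray nodes, decides locally
        // which nodes are no longer reachable from any persistent root
        // and removes them.
        void collectLocally(Set<Node> grayNodes);
    }

    final class EmbedSketch {
        static void embed(EmbedDb db, Node root) {
            Set<Node> gray = db.writeBack(root); // update feeds the gc
            db.collectLocally(gray);             // gc stays local
        }
    }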

Reference cycles (circular substructures) are not a problem for the
algorithm, as is described in the demo implementation. On the other hand,
gc can be done more easily when the garbage (or the db) contains no
circular substructures, and both algorithms can be applied in real
systems, the simpler one first.
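
For readers wondering how cycles can be collected at all when counts are
used, below is a hedged sketch of the generic “trial deletion” idea, with
invented names and a simplified node model; it is not necessarily the
demo's exact algorithm. A suspect node is garbage exactly when it is not
pinned and every reference it still receives comes from inside the
suspect subgraph.

    import java.util.*;

    final class CycleGcSketch {
        // Hypothetical node model: dbRefCount = incoming references
        // stored in the db, outerRefCount = pins from persistent roots.
        static final class N {
            int dbRefCount;
            int outerRefCount;
            final List<N> children = new ArrayList<>();
        }

        static Set<N> garbage(Set<N> suspects) {
            // 1. Candidate subgraph: everything reachable from suspects.
            Set<N> cand = new HashSet<>();
            Deque<N> work = new ArrayDeque<>(suspects);
            while (!work.isEmpty()) {
                N n = work.pop();
                if (cand.add(n)) work.addAll(n.children);
            }
            // 2. Count the references each candidate receives from
            //    inside the candidate set itself.
            Map<N, Integer> inside = new HashMap<>();
            for (N n : cand)
                for (N c : n.children)
                    if (cand.contains(c)) inside.merge(c, 1, Integer::sum);
            // 3. Mark live: pinned nodes, and nodes with referrers
            //    outside the set; liveness propagates along references.
            Set<N> live = new HashSet<>();
            for (N n : cand)
                if (n.outerRefCount > 0
                        || n.dbRefCount > inside.getOrDefault(n, 0))
                    mark(n, live, cand);
            // 4. Unmarked candidates are garbage -- cycles included,
            //    because a dead cycle only references itself.
            cand.removeAll(live);
            return cand;
        }

        private static void mark(N n, Set<N> live, Set<N> cand) {
            if (!live.add(n)) return;
            for (N c : n.children) if (cand.contains(c)) mark(c, live, cand);
        }
    }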

It is true that storing reference counts takes some extra bytes per node
in the db. That does not sound like a big problem, because objects often
contain quite a lot of user data, and memories are quite big today, even
in small devices.

Updating reference counts is not a big performance problem in my
opinion. If reference counts must be updated, it often means that you are
making a big update operation in which updating the reference counts is
only a small part. In addition, you can often update a reference count
with the same SQL UPDATE statement (in O/R systems) that updates the
fields containing user data. (However, when an object merely loses a
reference, you are not updating its user data, so there is nothing to
piggyback on.) Note also that it is possible to keep the reference count
information (the sum of increments and decrements) in run-time memory
during the embed method. Therefore, if, for example, a node loses one
reference and gains a new one in an update operation, no update of its
reference count is needed at all. I demonstrate this in the demo
implementation at the level of a single (parent) node (the method
handleReferencesFromGrayNodesInDB()). Perhaps in practice most update
operations do not change the references of objects; in these cases the
embed method is, of course, a pure store/update operation.
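
As a sketch of that in-memory bookkeeping (the class, the node ids and
the node/ref_count schema are invented for this example; the demo does
the per-parent-node version of this in
handleReferencesFromGrayNodesInDB()):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    // Keeps the net reference-count change per node in run-time memory
    // during an embed. A node that loses one reference and gains another
    // nets to zero and causes no write at all.
    final class RefCountDeltas {
        private final Map<Long, Integer> delta = new HashMap<>();

        void referenceAdded(long nodeId)   { delta.merge(nodeId, +1, Integer::sum); }
        void referenceRemoved(long nodeId) { delta.merge(nodeId, -1, Integer::sum); }

        // Flush only the nonzero net changes at the end of the operation;
        // in an O/R system this can often be folded into the same UPDATE
        // that writes the user data of the node.
        void flush(Connection con) throws SQLException {
            String sql = "UPDATE node SET ref_count = ref_count + ? WHERE id = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (Map.Entry<Long, Integer> e : delta.entrySet()) {
                    if (e.getValue() == 0) continue; // lost one, gained one
                    ps.setInt(1, e.getValue());
                    ps.setLong(2, e.getKey());
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }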

Personally, I think that the db model where persistence is a concept in
the db, defined by reachability from persistent root nodes, is natural
for programmers and designers, and very object-db-like. Defining the
persistence of a node or some nodes temporarily, during a single update
operation, does not sound theoretically or practically clear to me. Note
also that when persistence is defined by reachability, you can at any
time make a selected node in the db a persistent root node, to be sure
that it, and the nodes reachable from it, will not disappear in future
operations. This simple method is implemented in the demo implementation:
it is incrORC(), and its opposite is decrORC() (which is a general delete
operation).
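
A small usage sketch of that idea (incrORC()/decrORC() are the demo's
names; the RootDb interface and the lookup call are invented here):

    interface RootDb {
        Object lookup(long id);     // hypothetical way to find a node
        void incrORC(Object node);  // pin: node becomes a persistent root
        void decrORC(Object node);  // unpin: doubles as a general delete
    }

    final class RootPinSketch {
        static void keepAlive(RootDb db, long nodeId) {
            Object node = db.lookup(nodeId);
            // From now on this node, and everything reachable from it,
            // survives any future embed/gc operation -- until decrORC().
            db.incrORC(node);
        }
    }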

Just as Tilmann says, I also think that the embed method has a solid
theoretical foundation(!). Therefore, in my opinion, the embed method is
not just one more update method. Thinking in terms of protocols, the
embed method defines a new, natural, higher-level layer between a client
program and the db (the remote memory). This layer is clean and makes
updating a natural and easy operation for a programmer; updating becomes
nearly as easy as modifying object structures in run-time memory. And
even though the embed method fits complex update operations best, it is
also useful in quite simple situations, for example when a single
reference is replaced with a new one and perhaps some scalar fields of
some objects are updated at the same time.
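
As a usage illustration of that layer (everything here except the
embed(...) call itself is invented scaffolding for the example):

    import java.util.ArrayList;
    import java.util.List;

    final class EmbedUsageSketch {
        interface GraphDb {
            <T> T fetch(long id, Class<T> type); // hypothetical load
            void embed(Object root);             // the discussed method
        }

        static final class OrderItem {
            final String sku;
            OrderItem(String sku) { this.sku = sku; }
        }

        static final class Order {
            String shippingAddress;
            final List<OrderItem> items = new ArrayList<>();
        }

        static void update(GraphDb db, long orderId) {
            Order order = db.fetch(orderId, Order.class);
            order.shippingAddress = "New address";   // scalar update
            if (!order.items.isEmpty())
                order.items.remove(0);               // may create garbage
            order.items.add(new OrderItem("X-42"));  // a brand-new node
            db.embed(order); // one call: stores new nodes, updates
                             // changed ones, and gc's the unreachable
        }
    }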

There are, of course, some worst-case, or tedious, situations for the
plain embed method. However, it is possible to derive additional
“decorated” methods from the plain embed method. These methods are still
quite general and easy for a user to understand as network-level
operations. I describe examples of these methods (”handlings”) in the
Notes paragraph of my blog.

http://hvirkkun.blogspot.fi/2016/04/a-new-communication-theory-on-complex.html

Gc can be made even more efficient, and it can easily be helped if an
application has extra knowledge of non-garbage nodes, as I explain in the
demo implementation. And, of course, embed can be called for any
substructure, which increases the efficiency of the operation. These
features tackle many inefficiencies and make embed more versatile.
However, simple update commands provided separately can still be useful,
for example if you only want to update some (scalar) fields of a single
object. As you noticed above, the delete operation is simply an operation
that decreases the outer reference count of an object and then calls gc
for that object. In the Notes paragraph I also introduce a ”forceDelete”
command; it should be a very effective delete/killing command.
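
Spelled out as code, the basic delete (decrement plus gc) is only this;
the types and fields are again hypothetical:

    final class DeleteSketch {
        static final class N { int outerRefCount; }

        interface Gc { void collectFrom(N node); }

        // Drop one pin, then let gc decide whether the node (and its
        // now-unreachable descendants) disappear.
        static void delete(N node, Gc gc) {
            node.outerRefCount--;
            gc.collectFrom(node);
        }
    }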

Something about the background of the embed method:
I invented/implemented it for a real purpose, not from a theoretical
perspective. Beforehand, I did not have a clear understanding of whether
the modified structure could always be transferred back into the database
in a well-defined way or not. And as I said, I believe that other people
did not have that understanding either, which has probably inhibited
thinking about or formulating the whole problem; therefore the solution
cannot be in any standard, as Tilmann also seems to suspect. I soon
understood that the problem is between two graphs and that updating the
scalar data in nodes is only a side effect, though an important one for
the users.

As Andy says, if JDO supports many kinds of underlying db systems (which,
for example, do not contain the concept of persistence inside the db),
then it is difficult to make modifications to the standard. Perhaps a
branch or an optional feature could be possible (??), I don't know.

I hope that some free software groups will implement this method (plus
the related other stuff), and maybe some companies too. I believe that
some day this will be implemented in many systems, because in my opinion
the embed operation is very ”object like” and “object database like”, if
you allow the terms, and because the size of the database does not limit
the applicability of the method.

If there are representatives of companies or other groups reading this
mailing list who are willing to apply these techniques, I could take part
in the development process and implementation if you would like to hire
me. I still have quite many ideas, as I explain in the Notes paragraph of
my blog, and some other ideas as well. Some of them are on the search/data
mining side, which is traditionally the easier side to approach. And of
course I have good insight into the embed method and network/graph-level
things generally. Please find my contact information below.

Feel free to contact me if you are interested, or want to discuss the
details more!

Best Regards,
Heikki Virkkunen
Email: hvirkkun@gmail.com
Phone: +358 40 706 3912

On Fri, May 27, 2016 at 8:32 PM, Tilmann <za...@gmx.de> wrote:

> Hi Heikki,
>
> I found something in the old ODMG 3.0 standard:
>
> 6.3.1.2 Object Deletion
> In the Smalltalk binding, as in Smalltalk, there is no notion of explicit
> deletion of objects. An object is removed from the database during garbage
> collection if that object is not referenced by any other persistent
> object. The delete() operation from interface 'Object' is not supported.
>
>
> -> So it seems that some kind of Garbage Collection was envisaged by the
> standard, if only for Smalltalk. However, as I said before, I'm not aware
> that this was ever implemented outside the Smalltalk community, and I don't
> know Smalltalk well enough to tell how much its GC might differ from Java GC.
>
> So while the term 'GC in databases' doesn't seem to be entirely new, your
> approach may bring new ideas to the table, including a solid theoretical
> foundation. Maybe it would even explain why it hasn't been implemented in
> any major DBMS.
>
> For example, POET Navajo supports ODMG 3 but apparently chose _not_ to
> support GC (first page, last paragraph):
> http://www.odbms.org/wp-content/uploads/2013/11/038.01-Jordan-An-Object-Database-for-Embedded-Environments-April-2000.pdf
>
> Regards,
> Tilmann
>
>
> On 20.05.2016 19:55, Hei Virk wrote:
>
> Hi everybody:
>
> I will comment more later.
>
> -I am just about to give a presentation about the topic at my university
> here in Finland, for theorists, and I am making those slides at the moment.
>
> One social observation:
> In my understanding, the situation has been that people (me too) have not
> understood that a modified object structure can always be returned back
> into the database in a deterministic and meaningful way. - I mean that a
> well-defined result always exists for an operation. This is because the
> modifications in run-time memory can be so arbitrary.
>
> For this reason, no one has been able to think about the problem? And
> therefore the method cannot have been in any db standard either.
>
>
> Regards,
> Heikki
>
>
> On Fri, May 20, 2016 at 7:49 PM, Tilmann Zäschke <za...@gmx.de> wrote:
>
>
> Hello Hei,
> sorry, I posted my answer first only to the dev list, I'm not sure you are
> on it.
> So here it is again, with some updates:
>
>
> Hi,
>
> I haven't read the whole post in detail, but I understand the main idea is
> garbage collection in databases. This reminds me of similar concepts in
> some object databases, such as GemStone/Smalltalk (https://downloads.gemtalksystems.com/docs/GemStone64/3.3.x/GS64-ReleaseNotes-3.3/GS64-ReleaseNotes-3.3.htm).
> However, I'm not aware of any major Java-ODBMS that supports it.
>
> Some comments:
>
> - As Andy describes, it could be supported via a property of the PM.
> However, there may be scenarios where 'embed' and 'makePersistent' are
> regularly used in the same transaction, so if the feature were widely
> used, it may make sense to have a dedicated 'embed' method.
>
> - As far as I am aware, the JVM has moved away from reference counting.
> The situation in a DBMS may be different from an in-memory JVM, but if an
> ODBMS implements GC, it may be worth checking out other garbage collection
> techniques. One problem is reference cycles; I haven't read the full
> blog post to see how this is solved.
>
> - It seems that enabling garbage collection via reference counting will
> have quite some impact on performance, because objects get bigger (little
> impact) and because it requires updating additional objects in the
> database to maintain the reference counters (the counters also need updating
> when _adding_ references). I would therefore argue that support for garbage
> collection should be determined at database creation time, or via a flag
> when defining the database schema. This would allow avoiding the GC overhead
> when GC is not required.
>
> - We would need to be able to identify root objects which should not be
> garbage collected. This could also be done via a flag on the schema, i.e.
> every object of a class that has no reference counters is automatically a
> root object.
>
> Cheers,
> Tilmann
>
>
>
> On 13.05.2016 15:49, Andy Jefferson wrote:
>
> Hi,
>
> -I have invented a new declarative method to update object databases and
> I would like to introduce that method to you.
>
> My question is:
> Should the functionality of this method be included in some future
> versions of JDO (and perhaps some other definitions/systems under Apache)?
> Also, I am interested in your comments/critique of the method.
>
> The name of the method is "embed" and it resembles the command
>    pm.makePersistent(s);
> used in JDO-enabled applications. The embed method is called in an
> equivalent way,
>    db.embed(s);
> but it has larger functionality than the current makePersistent method.
>
>
> My opinion/comments, others may have other thoughts :
>
> For something to be part of JDO, it needs to be widely applicable across
> other datastores, not just for one "type" of datastore. If it is only
> implementable on one type of datastore, then standardising it would be
> "premature"; that isn't to say that it isn't applicable to others.
>
> From what I read of your method, it is basically a makePersistent
> (attach) but with "deleteOrphans" enabled; i.e. it addresses the object
> graph defined by the input object, and additionally deletes orphaned
> objects. Firstly, it is entirely reasonable to be able to do that right now
> in terms of API (**assuming the JDO implementation provided the
> internals**), since the user can call pm.makePersistent and can have set
> properties on the PersistenceManager just before that call
> (pm.setProperty(...)), so can set some "deleteOrphans" flag. From that I
> can conclude that I don't think there would be any change to the API needed
> to provide your mode of operation.
>
> Deleting orphans may vary depending on the precise relation, which is why
> JDO allows metadata on each field, so that some orphans can be deleted and
> others not.
>
> Clearly a JDO implementation would need to provide a "deleteOrphans"
> mode internally to support this.
>
> Detail of how your method works in an object database, whilst
> interesting, is not of relevance to the JDO spec question, since the spec
> doesn't define HOW an implementation implements features.
>
>
> Regards
>
>
>
>

Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?

Posted by Tilmann <za...@gmx.de>.
Hi Heikki,

Cost of GC: I'm not sure I completely agree with your analysis.
Databases also contain small objects; for these, GC can be a
considerable overhead, effectively doubling the storage size. It is true
that memory/disk sizes have grown, but so have the amounts of data, and
I/O is still often critical. There is especially the problem that objects
that are referenced need to be updated when a reference is added, even if
the object otherwise did not change. Depending on the scenario, this can
have a small or a big impact on I/O.
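
To make that concrete with a hypothetical two-column schema: linking an
existing node B under node A writes B's row as well, only to bump the
counter. A sketch (all names invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Illustration of the extra I/O: adding a reference from A to an
    // otherwise unchanged B still dirties B's row for the counter.
    final class ExtraWriteSketch {
        static void link(Connection con, long a, long b) throws SQLException {
            try (PreparedStatement setRef = con.prepareStatement(
                     "UPDATE node SET child_id = ? WHERE id = ?");
                 PreparedStatement bump = con.prepareStatement(
                     "UPDATE node SET ref_count = ref_count + 1 WHERE id = ?")) {
                setRef.setLong(1, b);
                setRef.setLong(2, a);
                setRef.executeUpdate(); // the intended change, on A
                bump.setLong(1, b);
                bump.executeUpdate();   // the overhead write, on B
            }
        }
    }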

Another performance problem may occur when the database is used
concurrently: when modifying additionally referenced objects, these need
to be 'locked' in some way, which may interfere with other concurrent
transactions that add/remove references to the same object (or simply
want to read it).

There is also a technical problem where I couldn't immediately see how
you solve it. Assume we have three objects in the database, A, B, and C,
where A references B and B references C. A is a root; B and C are embedded.
Now a transaction T1 loads A and prepares to delete it.
At the same time, T2 performs a query that returns B.
Then T1 commits.
What happens now? Are B and C deleted, invalidating transaction T2? Or
are B and C retained, because T2 may create a reference to B from
another root object?
Maybe some information is kept in memory that B has been loaded
somewhere else, so it is only deleted once it is known that no TX
references it? But this could take a long time (the transaction T2 may
last very long, or T2 may crash and never abort, or another transaction
T3 may perform a query that also returns B (or C)).
And with all these things in memory, what happens if the database server
crashes? How would the database file recover from such a scenario?

I think a big difference between in-memory GC and DB GC is that in
memory, once something is not referenced anymore, it can clearly be
deleted, because there is no way to reach it anymore (at least in Java).
However, in a DBMS, data is always reachable via queries. Your GC
algorithm only checks whether data in the database is referenced, but
cannot check whether someone may want to query it in the future. Obviously,
the intent to query data can be specified by making it a root object, but
would that mean I should declare everything as root if I want to query
it at some point?

Btw, I ran across an old paper on O2, which discusses concurrent GC:
http://www.vldb.org/conf/1997/P356.PDF
Incidentally, it also talks about white and grey objects :-)
There is quite some research if you google for "garbage collection o2
database".

Generally, I see the point of using GC in databases (and so did the
GemStone people and the ODMG people). I'm also sure there are cases
where the ease of development is worth the additional I/O overhead.
There are probably also scenarios where using it is actually faster,
because it offloads the GC to the server (assuming transaction patterns
that cause few additional writes for updating the counters).
I'm also sure there are solutions to the concurrency and crash problems.

From personal experience (only two larger projects in industry), I have
however seen that people can be very reluctant to have anything removed
from their databases, let alone by some automatic removal algorithm. The
main point is to keep a history of data and, depending on the company,
to provide legal traceability of processes and updates. This is one
reason why many DBMS vendors support some kind of versioning for data:
instead of overwriting or deleting old data, they create new versions.
So, at least in my personal experience, I haven't seen a project where a
garbage collection feature would have been useful. Obviously my
experience is limited, but if a DB vendor had asked me, I would have
given other features a higher priority than garbage collection.

I wonder whether these factors (performance of I/O & concurrency; the
requirement to keep data / the risk of losing data) may be the reasons why
garbage collection is not widely implemented by DBMS vendors?

Maybe you can convince a popular open-source DBMS to implement it (or
a mapping layer such as Hibernate). If it becomes widely used, then it
would make more sense to put it into a standard such as JDO ...

This is my personal opinion of course, feel free to object :-)

Best regards,
Tilmann


