Posted to dev@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2007/09/13 16:33:32 UTC

[jira] Resolved: (JCR-926) Global data store for binaries

     [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller resolved JCR-926.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>             Fix For: 1.4
>
>         Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, dataStore3.patch, dataStore4.zip, dataStore5-garbageCollector.patch, internalValue.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a global "data store" concept in the repository. A data store is an append-only set of binary values that uses short identifiers to identify and access the stored binary values. The data store would trivially fit the requirements of transient space and transaction handling due to the append-only nature. An explicit mark-and-sweep garbage collection process could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
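>
> For illustration, a minimal sketch of what such a data store interface could look like (the names and signatures below are illustrative only, not the final Jackrabbit API):

import java.io.IOException;
import java.io.InputStream;

/**
 * Illustrative sketch of the proposed append-only data store
 * (hypothetical names, not the final Jackrabbit API).
 */
public interface DataStore {

    /** A short identifier plus access to the stored binary. */
    interface DataRecord {
        String getIdentifier();
        long getLength() throws IOException;
        InputStream getStream() throws IOException;
    }

    /** Store the stream; may return an existing record if the same content was stored before. */
    DataRecord addRecord(InputStream stream) throws IOException;

    /** Look up a previously stored record by its identifier. */
    DataRecord getRecord(String identifier) throws IOException;

    /** Sweep step of the mark-and-sweep garbage collection: delete records not touched since 'min'. */
    int deleteAllOlderThan(long min) throws IOException;
}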

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

I have created an issue: http://issues.apache.org/jira/browse/JCR-1154

Please attach your patch there ('Attach file'). You will be asked to
'Grant license to ASF for inclusion in ASF works'.

> other changes to the rest of the core (there aren't too many).

That would be great, because somebody else might like to backport it as well.

Thanks a lot!
Thomas

Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi Pablo,

> It seems that the GC needs more information from the Data Store to do
> its job. One approach may be to have "transient" and "persisted" binary
> content so that the GC doesn't delete "transient" binary content.

Yes, I agree, that would be the best solution. Or: never delete data
for which a reference is still in memory (whether or not it is transient).

What about storing all DataRecord objects in a WeakHashMap? Then there
are two solutions:

Plan A) the garbage collection just checks the hash map and doesn't
delete those.
Plan B) there is a background daemon thread that updates the modified
date from time to time.

Plan A sounds simpler; Plan B would solve some distributed GC problems.
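
A rough sketch of Plan A, using made-up class names rather than the actual Jackrabbit classes, could look like this:

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of Plan A (illustrative names, not the actual Jackrabbit classes):
// identifiers handed out to in-memory values are tracked in a WeakHashMap,
// and the garbage collector skips identifiers that are still referenced.
class InUseTracker {

    // The key must be the same DataIdentifier instance that in-memory values
    // hold on to; the entry then disappears automatically once the last
    // in-memory reference to that identifier has been garbage collected.
    private final Map<DataIdentifier, Object> inUse =
            Collections.synchronizedMap(new WeakHashMap<DataIdentifier, Object>());

    void usedBy(DataIdentifier identifier) {
        inUse.put(identifier, Boolean.TRUE);
    }

    // Called by the data store GC before deleting a candidate record.
    boolean isInUse(DataIdentifier identifier) {
        return inUse.containsKey(identifier);
    }
}

// Minimal identifier type, only to keep the sketch self-contained.
class DataIdentifier {
    private final String id;
    DataIdentifier(String id) { this.id = id; }
    public int hashCode() { return id.hashCode(); }
    public boolean equals(Object o) {
        return o instanceof DataIdentifier && id.equals(((DataIdentifier) o).id);
    }
}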

> The GC can use observation to be notified
> of property creation, as currently does to detect when nodes are moved
> during GC scan. We can use a mark file for the binary content state in
> the FileDataStore implementation and an additional column in the binary
> content table for the DatabaseDataStore.

I think that would work as well, but the observation solution would be
more complex in my view.

> What is the overhead of a PROPERTY_ADDED + PROPERTY_CHANGED listener in
> Jackrabbit ?

I don't know. If it must observe all properties all the time, probably
it should be avoided. If there is a way to only observe binary
properties, or only while the GC is running, then it should be OK.

Thomas

RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Pablo Rios <pr...@bea.com>.
Hi Thomas,

How does the GC handle the scenario below, described by Esteban?

[time]     [user session]                 [GC session]
t0         node.setProperty(binary)
t1                                        gc.start
t2                                        gc.stop
t3         node.save

In this scenario the GC will delete the binary content created at t0 (t0
< t1) and the repository will have a binary property persisted at t3
with its value deleted.

It seems that the GC needs more information from the Data Store to do
its job. One approach may be to have "transient" and "persisted" binary
content so that the GC doesn't delete "transient" binary content. In the
above scenario the binary content would be created with the "transient"
state (at t0) and when the property is persisted it would be changed to
the "persisted" state (at t3). The GC can use observation to be notified
of property creation, as currently does to detect when nodes are moved
during GC scan. We can use a mark file for the binary content state in
the FileDataStore implementation and an additional column in the binary
content table for the DatabaseDataStore.

Does that make sense?
What is the overhead of a PROPERTY_ADDED + PROPERTY_CHANGED listener in
Jackrabbit?
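
For reference, registering such a listener through the standard JCR observation API would look roughly like the sketch below; the listener class itself is made up, and its actual cost is exactly the open question here:

import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.observation.Event;
import javax.jcr.observation.EventIterator;
import javax.jcr.observation.EventListener;
import javax.jcr.observation.ObservationManager;

// Sketch only: listen for property additions/changes so the GC could be
// told about newly persisted binary properties.
class BinaryPropertyListener implements EventListener {

    public void onEvent(EventIterator events) {
        while (events.hasNext()) {
            Event event = events.nextEvent();
            // e.g. mark the corresponding data record as "persisted"
        }
    }

    static void register(Session session) throws RepositoryException {
        ObservationManager om = session.getWorkspace().getObservationManager();
        om.addEventListener(new BinaryPropertyListener(),
                Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED,
                "/",   // whole workspace
                true,  // isDeep: include the whole subtree
                null,  // all UUIDs
                null,  // all node types
                false  // also receive events from this session
        );
    }
}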

Regards,
Pablo


-----Original Message-----
From: Esteban Franqueiro [mailto:efranque@bea.com] 
Sent: Tuesday, October 02, 2007 4:12 PM
To: dev@jackrabbit.apache.org
Subject: RE: [jira] Resolved: (JCR-926) Global data store for binaries

> Hi,
> 
> > > What about RepositoryException?
> > Yes, that would work too. But we wanted to be able to identify the 
> > specific exception thrown from the DS. In a few places we
> wrapped DSE
> > inside a RE.

Yes, I changed that.

> What about if DataStoreException extends RepositoryException? 
> Like that you don't need to catch the exception and re-throw 
> a different one; but identifying is possible.
> 
> > > What about updating the time when getLength() is called?
> >
> > Sorry, I don't understand this.
> 
> Currently the modified date/time is updated when 
> modifiedDateOnRead is set and getStream() is called. You said 
> you want to avoid calling getStream().close(). So I suggest 
> to update the modified date/time when modifiedDateOnRead is 
> set and getLength() is called.
> 
> > > There is already DataStore.updateModifiedDateOnRead(long
> > > before); a separate method is not required in my view.
> > This didn't work in our testing.
> 
> Sorry, what does not work, and why?
> 
> > The FileDataRecord always queries the file object for its 
> properties.
> 
> There is only getLength and getStream. Caching the length is possible.
> The length could be part of the identifier. For example, you 
> could do that for the DatabaseDataStore (where accessing this 
> information would be slow). For the FileDataStore, I think it 
> is not required as new
> File(..).length() is quite fast; but I may be wrong. So 
> basically the DataIdentifier for DatabaseDataStore could be 
> the digest plus the length.

For the database data store we're not using digests as IDs. We're using
UUIDs. The thing about digests is that you have to get in the way
between the sender and the server. Initially we used something like the
FileDS to store the binary, get the digest and then upload it to the DB.
Then we decided to store directly in the DB, so we couldn't use digests
anymore.
Certainly, we could compute the digests in between the client and the
server, but we would need a two-step upload process, since after
computing the digest we would have to update the row to insert it there.
Using UUIDs is much easier since they don't depend on the content. Yes,
I know you lose the non-repetition property, but it was something we
could live with at the time. Anyway, if we can come up with another way,
that would be fine for us.
Also, storing the binary in the filesystem for digest computation,
instead of storing directly in the DB, has some impact on performance.

> > Well, we found that it is necessary, since the GC runs on a 
> different 
> > session than the users (we're using system sessions for this).
> 
> For the data store, it doesn't matter what session created or 
> accessed the data record. Also it doesn't matter when save() 
> was called - the data record is stored before that. As long 

Yes, the data record is stored, but the nodes referencing that record
are not.

> as the GC has access all nodes it will touch all 'not 
> recently used' data records. New data records (that were 
> created after the GC was started) will have a more recent 
> modified date / time and therefore the GC will not delete them.
> There is probably a misunderstanding somewhere. It would help 
> if you could post your current source code, maybe there is a 
> bug. Did you update the modified date / time in addRecord() 
> when the object already exists, as done in the file data store?

Yes, perhaps there's a misunderstanding. I will send you my code, but in
the meantime, let me explain what I mean.
The naive-GC reads all the nodes in the workspace to mark them (ie,
update access time). So if the user is uploading binaries in one session
and the GC is iterating the nodes on another, then until the user
save()s, the GC session won't see the changes in persistent storage. The
time of the binaries may be before or after the cut point of the GC,
depending on timing issues.
If it is before, then when the GC does a deleteAllOlderThan(), those
unmarked binaries will be deleted (they're unmarked because the nodes
that contain them could not be read because they don't exist yet).
I don't know if using the new PM interface will change any of this.

> > So, the user adds a binary property in one session, after 
> the file is 
> > uploaded but before the user save()s the session, the GC on 
> the system 
> > session starts reading all nodes from the workspace. Since 
> the changes 
> > are not yet written to persistent storage, the file is 
> assumed to be a 
> > deleted property, and is in fact deleted.
> 
> Is this using the database data store or the file data store? 
> If this is the case, it would be a bug.

Sorry, but I don't see the difference.

> > We needed to make RepositoryImpl.getWorkspaceNames() public 
> for this 
> > to work.
> 
> Yes that would be good. Did you have a look at 
> PersistenceManagerIteratorTest.testGetAllNodeIds()? 
> getWorkspaceNames is called using reflection (also.
> RepositoryImpl.class.getDeclaredMethod("getWorkspaceNames", 
> new Class[0]).setAccessible(true)).

Yes, I've seen it. But it would be easier to just make them public. Or
at least export some of those methods through a utils class.

> > Will this change have any effect on the issue I just mentioned?
> 
> I don't know - first I need to understand the issue...
> 
> > Ok then, our current implementation is against a patched-up 1.3.1. 
> > What do you think is the best way to isolate the code?

Maybe we should discuss further after you get the code.
Since the copy of Jackrabbit I'm working on has a lot of patches, it would
be pretty complicated to isolate those changes in a patch. Do you mind
if I send you a zip file with the implementations of the interfaces, the
tests, the configurations and parsers, and the initialization routines?

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com



Re: Realtime datastore garbage collector

Posted by Esteban Franqueiro <es...@bea.com>.
Hi Thomas.

> The main reference is PropertyState, which is referenced by
> PropertyImpl. PropertyState is kept in the transient space and / or
> cache for a certain amount of time. I think it is not that easy to find
> out when all hard, weak and soft references are removed, because this
> depends on the virtual machine. That's why I have written the test
> using the clearInUse method. To write a more realistic test case (that
> works in the same way in all virtual machines) would be quite complex
> I think. Do you think this is required?

I don't think that such a complicated test is required right now.
The scenario I described in my last email happened while running a test that calls clearInUse() at 
runtime (and not while stepping through the code). I think this issue is realistic, so I'm trying to 
find a way to notify the datastore about the removals from the weak hash map (so it can update the 
entry's timestamp). Ideally Session.save() could do this, so that there's no timeframe between the 
removal of those entries and the call to deleteUnused() in the GC.

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com 



Re: Realtime datastore garbage collector

Posted by Thomas Mueller <th...@gmail.com>.
Hi Esteban,

The 'chain' I gave earlier was incomplete; here is a fuller one:

o.a.j.core.state.PropertyState (properties),
o.a.j.core.cluster.PropertyOperation (cluster operations),
o.a.j.core.nodetype.PropDefImpl (for default values) ->

InternalValue -> BLOBInDataStore -> DataIdentifier

The main reference is PropertyState, which is referenced by
PropertyImpl. PropertyState is kept in the transient space and / or
cache for a certain amount of time. I think it is not that easy to find
out when all hard, weak and soft references are removed, because this
depends on the virtual machine. That's why I have written the test
using the clearInUse method. To write a more realistic test case (that
works in the same way in all virtual machines) would be quite complex
I think. Do you think this is required?

Thanks,
Thomas

Re: Realtime datastore garbage collector

Posted by Stefan Guggisberg <st...@gmail.com>.
On 10/30/07, Esteban Franqueiro <es...@bea.com> wrote:
> Hi Thomas.
> Regarding this:
>
> >> who has the hard reference to the data identifier
> >
> > The chain is:
> > PropertyImpl -> InternalValue -> BLOBInDataStore -> DataIdentifier
>
> Who holds on to the PropertyImpl? Is this reference removed before or after the save?
> I'm asking because while testing the code I came up with the following scenario:
>
> JR                  |  GC
> --------------------|---------------------
> add properties      |
>                     | set cut-point as part of the next scan / mark cycle
>                     | (won't mark recently added properties, because they're unsaved)
> save                |
> (evict)             |
>                     | deleteUnused()
>
> (evict): identifiers evicted from the weak hash map
>
> Is it possible that the identifiers get removed from the weak hash map immediately after the save
> operation? Or can the identifiers not be removed because a hard reference to them (i.e., to the
> PropertyImpl object) is kept somewhere in the Jackrabbit core? When is this reference removed?

There are no hard references to ItemImpl instances kept within the Jackrabbit core.
ItemImpl instances are automatically collected as soon as the client code releases
the last reference. See o.a.j.c.ItemManager#itemCache for details.

cheers
stefan

> In testing, this is possible to do by calling the clearInUse() method you provided. But what about a
> real scenario?
> I hope you can understand me. If not, I'll try to explain better.
> Regards,
>
> Esteban Franqueiro
> esteban.franqueiro@bea.com
>
>
>

Re: Realtime datastore garbage collector

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> dataStore.removeTransientIdentifiers(addedProps);

There is a problem with this approach: an identifier can be added to
multiple properties. Also, it may be used at other places. So you
would need to keep a reference count as well. Also, you would need to
be sure the reference counts are updated correctly ('transactional').

It would be a good idea to implement this; however, I think that with the
current architecture of Jackrabbit (having multiple change logs,
multiple caches, and multiple places where values are used), it is
beyond my ability to verify that the implementation is correct. I just
don't know enough about the Jackrabbit core, and there are not enough
test cases in the Jackrabbit core that would allow automatic
verification.

A simpler mechanism would be to store back-references: each data
record / identifier would know who references it. The garbage
collection could then follow the back-references and check if they are
still valid (and if not, remove them). Items without valid back-references
could be deleted. This would allow deleting very large objects
quickly (if they are not used, of course).
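
A very rough sketch of that back-reference idea, with entirely hypothetical names (nothing like this exists in the code yet):

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Sketch of the back-reference idea (hypothetical, not implemented):
// each stored record remembers which items reference it, and the GC
// deletes a record once none of its back-references is valid any more.
class BackReferencedRecord {

    // Identifiers (e.g. property ids) of the items that reference this record.
    private final Set<String> backReferences = new HashSet<String>();

    void addBackReference(String itemId) {
        backReferences.add(itemId);
    }

    // 'exists' would be answered by the repository; here it is just a callback.
    interface ItemLookup {
        boolean exists(String itemId);
    }

    // Returns true if the record can be deleted: all recorded back-references
    // point to items that no longer exist.
    boolean isGarbage(ItemLookup lookup) {
        for (Iterator<String> it = backReferences.iterator(); it.hasNext();) {
            if (!lookup.exists(it.next())) {
                it.remove();
            }
        }
        return backReferences.isEmpty();
    }
}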

When we change the architecture of Jackrabbit (see also NGP) we should
think about the data store.

But at this time, I would argue it is safer to keep the data store
mechanism as is, without trying to add more features (adding more data
store implementations is not a problem of course), unless we really
fix a bug. I think it makes more sense to spend the time improving the
architecture of Jackrabbit before trying to add more complex
algorithms to the data store (which are not required afterwards).

Thomas

Re: Realtime datastore garbage collector

Posted by Esteban Franqueiro <es...@bea.com>.
Hi.

> Could you please describe it a bit more in detail? How do you avoid
> deleting a record that is not yet saved?

Yes, sure.
I have implemented the following, and it solves the scenario I originally mentioned.
The idea is to do something like this in SharedItemStateManager.Update.end()

persistMgr.store(shared);
List addedProps = buildListOfAddedProps(shared);
dataStore.removeTransientIdentifiers(addedProps);
...

Where addedProps contains the DataIdentifiers of the properties in the ChangeLog.addedStates() 
collection.

And in DatabaseDataStore.removeTransientIdentifiers() the last modified time of each of those 
properties is updated, and its entry in the inUse weak hash map is removed (I know it's weak, but I 
prefer to delete it myself).
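
As a sketch, the data-store side of that method could look roughly like this; the table and column names, the inUse map, and the use of plain JDBC are assumptions based on the description above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;

// Sketch only: on save, the added identifiers get a fresh modified time
// and are dropped from the transient/in-use map. Names are assumptions.
class DatabaseDataStoreSketch {

    private final Connection connection;      // obtained from the pool
    private final Map<String, Object> inUse;  // the weak "in use" map

    DatabaseDataStoreSketch(Connection connection, Map<String, Object> inUse) {
        this.connection = connection;
        this.inUse = inUse;
    }

    void removeTransientIdentifiers(List<String> identifiers) throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(
                "UPDATE DATASTORE SET LAST_MODIFIED = ? WHERE ID = ?");
        try {
            long now = System.currentTimeMillis();
            for (String id : identifiers) {
                stmt.setLong(1, now);
                stmt.setString(2, id);
                stmt.executeUpdate();
                inUse.remove(id);   // no longer transient after the save
            }
        } finally {
            stmt.close();
        }
    }
}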

For this to be prettier, I think some changes should be done. For example, the way the persistence 
manager gets a reference to the data store (I added a parameter so far). Another thing is that obtaining 
the DataIdentifier from a PropertyState/PropertyImpl is quite involved. There should be an internal 
interface or something like that for this job.

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com 



Re: Realtime datastore garbage collector

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

Sorry for the delay...

> I think the issue I mentioned in http://www.mail-archive.com/dev@jackrabbit.apache.org/msg08520.html
> could be solved by:

> integrating the data store with the current save mechanism

This could be a solution (actually my plan was to use reference
counting: increment the count on a save, decrement on a delete -
however there are some problems with versioning / transactions, so the
count could be wrong).

> so when you do a save, you also get an atomic update
> on the timestamps of the transient items in the data store
> cache. And by synchronizing this update and the deletion of unsaved items,
> I think we could avoid that scenario.

Could you please describe it a bit more in detail? How do you avoid
deleting a record that is not yet saved?

Thanks,
Thomas

Re: Realtime datastore garbage collector

Posted by Esteban Franqueiro <es...@bea.com>.
Hi.
I think the issue I mentioned in http://www.mail-archive.com/dev@jackrabbit.apache.org/msg08520.html 
could be solved by integrating the data store with the current save mechanism, so when you do a 
save, you also get an atomic update on the timestamps of the transient items in the data store 
cache. And by synchronizing this update and the deletion of unsaved items, I think we could avoid 
that scenario.
What do you think?

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com 



Realtime datastore garbage collector

Posted by Esteban Franqueiro <es...@bea.com>.
Hi Thomas.
Regarding this:

>> who has the hard reference to the data identifier
>
> The chain is:
> PropertyImpl -> InternalValue -> BLOBInDataStore -> DataIdentifier

Who holds on to the PropertyImpl? Is this reference removed before or after the save?
I'm asking because while testing the code I came up with the following scenario:

JR                  |  GC
--------------------|---------------------
add properties      |
                    | set cut-point as part of the next scan / mark cycle
                    | (won't mark recently added properties, because they're unsaved)
save                |
(evict)             |
                    | deleteUnused()

(evict): identifiers evicted from the weak hash map

Is it possible that the identifiers get removed from the weak hash map immediately after the save 
operation? Or can the identifiers not be removed because a hard reference to them (i.e., to the 
PropertyImpl object) is kept somewhere in the Jackrabbit core? When is this reference removed?
In testing, this is possible to do by calling the clearInUse() method you provided. But what about a 
real scenario?
I hope you can understand me. If not, I'll try to explain better.
Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com 



Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> It's a long procedure. So there are many possible points of failure.

Yes, it would be great if this can be simplified. It is the same
algorithm used in the FileDataStore: 1) create a temp file, 2) insert
the data, 3) try to rename it. It may be a good idea to wrap this in a
transaction, otherwise it could theoretically leak a temporary file if
there is a problem (connection lost after 1 and before 3). Also,
instead of the 'random number' at step 1 it may be better to use a
database side sequence (autoincrement) if possible, but that would
make it more database dependent.
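
For comparison, the file-based variant of that three-step algorithm is roughly the following simplified sketch (not the actual FileDataStore code; error handling is reduced to a minimum):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Simplified sketch of the temp-file / rename algorithm described above.
class FileStoreSketch {

    private final File directory;

    FileStoreSketch(File directory) {
        this.directory = directory;
    }

    String addRecord(InputStream input) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        // 1) write to a temporary file, computing the digest on the fly
        File temp = File.createTempFile("upload", ".tmp", directory);
        InputStream in = new DigestInputStream(input, digest);
        OutputStream out = new FileOutputStream(temp);
        try {
            byte[] buffer = new byte[8192];
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
        }
        // 2) the digest is the identifier; 3) rename the temp file to it
        String id = toHex(digest.digest());
        File target = new File(directory, id);
        if (!target.exists() && !temp.renameTo(target)) {
            throw new IOException("Rename failed: " + temp + " -> " + target);
        }
        temp.delete();   // no-op if the rename succeeded
        return id;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}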

> the uploaded code has a working version of a GC that collects on repository startup.

Yes, I saw that. I think it feels a bit strange having to change the
repository.xml to run the garbage collection (and then back).
Jackrabbit does not yet have a 'management API'; maybe supporting JMX
would be a better option (not only for data store)...

> mark the nodes of all the workspaces, and then do a single delete
> operation for the entire data store
> I don't know if this was always your idea

Yes, that was the idea. Theoretically, multiple repositories could use
the same data store. Therefore, there must be a way to run the
mark and the delete steps separately.

> who has the hard reference to the data identifier

The chain is:
PropertyImpl -> InternalValue -> BLOBInDataStore -> DataIdentifier

Thomas

Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <es...@bea.com>.
Hi.
We integrated the changes made by Thomas to the FileDataStore regarding the inUse weak hash map into 
our database-based implementation. Taking care of clearing it in the test, it appears to be working.
One question I have is who holds the hard reference to the data identifier, and when/where it gets 
nulled so that the weak reference can be collected.
Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com 



RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <ef...@bea.com>.
Hi.
Welcome back.

> I think there is still a way to get the digest. If you wrap 
> the InputStream like this:
> 
> public DataRecord addRecord(InputStream input) throws IOException {
>   MessageDigest digest = MessageDigest.getInstance(DIGEST);
>   InputStream in = new DigestInputStream(input, digest);
>   ...
> }

> (...)
> I hope you get the idea. The second UPDATE statement will 
> return 0 update count if a record with the same digest 
> exists. I didn't set the LENGTH everywhere.

Yes, this could be a way, but I'm not yet convinced by the approach.
It's a long procedure, so there are many possible points of failure.
Still, we do have a working impl of this idea. I'll upload it later.

> > [time]     [user session]                       [GC session]
> > t0         node.setProperty(binary)
> > t1                                        gc.start
> > t2                                        gc.stop
> > t3         node.save
> 
> Is this the problem? I didn't think about that so far... In 
> my view it is rare because the garbage collection usually 
> will take some time, and the time between node.setProperty 
> and node.save is (hopefully) short.

Well, it may or may not be short, depending on the application.

> But it needs to be 
> solved. I will write a test case.

There are some test cases in the attachment I uploaded to JCR-1154 (see
for example SimpleGCTest.testSaveRevert()). Of course they are synthetic
and use a synchronous GC, but I think they prove the point.

> A simple solution is to 
> only delete records when the repository is stopped (or 
> started). Obviously this is not a solution for long running 
> repositories. Another idea is to keep transient large 

I briefly mentioned this in a previous email, and the uploaded code has
a working version of a GC that collects on repository startup. To try
it, you have to set the garbageCollectorMode to "startup" in
repository.xml.

> binaries in a WeakReferenceHashMap, and before deleting check 
> that the record is not in there.
> 
> > > > We needed to make RepositoryImpl.getWorkspaceNames() public
> > it would be easier to just make them public.
> > Or at least export some of those methods through a utils class.
> 
> I will make it public.

Great!

> > Do you mind
> > if I send you a zip file with the implementations of the 
> interfaces, 
> > the tests, the configurations and parsers, and the 
> initialization routines?
> 
> There is no hurry, but please don't send it via email. The 
> preferred way is to attach the code to the bug:
> http://issues.apache.org/jira/browse/JCR-1154 (you will be 
> asked to 'Grant license to ASF for inclusion in ASF works').

It's done.
Once again I apologize for having to send the code as I did instead of
in a patch.

On a different topic, I want to bring up another issue again. When you
are marking (i.e., updating node times) on one workspace, and then you
delete records older than the current cut-point, you risk deleting
records belonging to other workspaces. The solution I propose is simply
to mark the nodes of all the workspaces, and then do a single delete
operation for the entire data store (I don't know if this was always
your idea, but when we began working on this, we assumed otherwise :( )

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com


Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

I'm sorry about the delay. I was away last week.

> What about if DataStoreException extends RepositoryException?
> What about updating the time when getLength() is called?

Are you OK with that? If yes I will change it.

> For the database data store we're not using digests as IDs. We're using
> UUIDs.

That's technically OK, but I think it can be avoided using
DigestInputStream (see below).

> The thing about digests is that you have to get in the way
> between the sender and the server.

Yes, that's true.

> Initially we used something like the
> FileDS to store the binary, get the digest and then upload it to the DB.

That's of course a problem, it must be avoided.

> Then we decided to store directly in the DB,

Sure, that's much faster.

> so we couldn't use digests anymore.

I think there is still a way to get the digest. If you wrap the
InputStream like this:

public DataRecord addRecord(InputStream input) throws IOException {
  MessageDigest digest = MessageDigest.getInstance(DIGEST);
  InputStream in = new DigestInputStream(input, digest);
  ...
}

> Certainly, we could compute the digests in between the client and the
> server, but we would need a two step upload process since after
> computing the digest we would have to update the row to insert it there.

A two step process must be avoided of course, I hope the
DigestInputStream approach can solve that.

> Using UUIDs is much easier since they don't depend on the content.

Sure. What about (pseudo code):

Example schema:
CREATE TABLE DATASTORE(
  KEY BINARY(..) PRIMARY KEY,
  TEMP BIT,
  LENGTH BIGINT,
  DATA BLOB
);

while (true) {
  randomId = secureRandom.nextBytes(..);
  try {
    INSERT INTO DATASTORE(KEY, TEMP) VALUES(randomId, TRUE);
    break;
  } catch(UniqueKeyException) {
    // duplicate random id: very, very rare, so just retry
  }
}
UPDATE DATASTORE
  SET DATA=DigestInputStream(..)
  WHERE KEY=randomId
digest = ...
try {
  updateCount = UPDATE DATASTORE
    SET KEY=digest, TEMP=FALSE
    WHERE KEY=randomId AND TEMP=TRUE
    AND NOT EXISTS(
      SELECT KEY FROM DATASTORE
      WHERE KEY=digest
      AND TEMP=FALSE AND LENGTH=...
    )
} catch(duplicate key) {
  throw exception("duplicate key with different length");
}
if (updateCount == 0) {
  DELETE FROM DATASTORE WHERE KEY=randomId AND TEMP=TRUE
}
return digest;

I hope you get the idea. The second UPDATE statement will return an
update count of 0 if a record with the same digest already exists. I didn't
set the LENGTH everywhere.

> [time]     [user session]                       [GC session]
> t0         node.setProperty(binary)
> t1                                        gc.start
> t2                                        gc.stop
> t3         node.save

Is this the problem? I didn't think about that so far... In my view it
is rare because the garbage collection usually will take some time,
and the time between node.setProperty and node.save is (hopefully)
short. But it needs to be solved. I will write a test case. A simple
solution is to only delete records when the repository is stopped (or
started). Obviously this is not a solution for long running
repositories. Another idea is to keep transient large binaries in a
WeakReferenceHashMap, and before deleting check that the record is not
in there.

> > > We needed to make RepositoryImpl.getWorkspaceNames() public
> it would be easier to just make them public.
> Or at least export some of those methods through a utils class.

I will make it public.

> Do you mind
> if I send you a zip file with the implementations of the interfaces, the
> tests, the configurations and parsers, and the initialization routines?

There is no hurry, but please don't send it via email. The preferred
way is to attach the code to the bug:
http://issues.apache.org/jira/browse/JCR-1154 (you will be asked to
'Grant license to ASF for inclusion in ASF works').

Thanks,
Thomas

RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <ef...@bea.com>.
> Hi,
> 
> > > What about RepositoryException?
> > Yes, that would work too. But we wanted to be able to identify the 
> > specific exception thrown from the DS. In a few places we 
> wrapped DSE 
> > inside a RE.

Yes, I changed that.

> What about if DataStoreException extends RepositoryException? 
> Like that you don't need to catch the exception and re-throw 
> a different one; but identifying is possible.
> 
> > > What about updating the time when getLength() is called?
> >
> > Sorry, I don't understand this.
> 
> Currently the modified date/time is updated when 
> modifiedDateOnRead is set and getStream() is called. You said 
> you want to avoid calling getStream().close(). So I suggest 
> to update the modified date/time when modifiedDateOnRead is 
> set and getLength() is called.
> 
> > > There is already DataStore.updateModifiedDateOnRead(long
> > > before); a separate method is not required in my view.
> > This didn't work in our testing.
> 
> Sorry, what does not work, and why?
> 
> > The FileDataRecord always queries the file object for its 
> properties.
> 
> There is only getLength and getStream. Caching the length is possible.
> The length could be part of the identifier. For example, you 
> could do that for the DatabaseDataStore (where accessing this 
> information would be slow). For the FileDataStore, I think it 
> is not required as new
> File(..).length() is quite fast; but I may be wrong. So 
> basically the DataIdentifier for DatabaseDataStore could be 
> the digest plus the length.

For the database data store we're not using digests as IDs. We're using
UUIDs. The thing about digests is that you have to get in the way
between the sender and the server. Initially we used something like the
FileDS to store the binary, get the digest and then upload it to the DB.
Then we decided to store directly in the DB, so we couldn't use digests
anymore.
Certainly, we could compute the digests in between the client and the
server, but we would need a two-step upload process, since after
computing the digest we would have to update the row to insert it there.
Using UUIDs is much easier since they don't depend on the content. Yes,
I know you lose the non-repetition property, but it was something we
could live with at the time. Anyway, if we can come up with another way,
that would be fine for us.
Also, storing the binary in the filesystem for digest computation,
instead of storing directly in the DB, has some impact on performance.

> > Well, we found that it is necessary, since the GC runs on a 
> different 
> > session than the users (we're using system sessions for this).
> 
> For the data store, it doesn't matter what session created or 
> accessed the data record. Also it doesn't matter when save() 
> was called - the data record is stored before that. As long 

Yes, the data record is stored, but the nodes referencing that record
are not.

> as the GC has access all nodes it will touch all 'not 
> recently used' data records. New data records (that were 
> created after the GC was started) will have a more recent 
> modified date / time and therefore the GC will not delete them.
> There is probably a misunderstanding somewhere. It would help 
> if you could post your current source code, maybe there is a 
> bug. Did you update the modified date / time in addRecord() 
> when the object already exists, as done in the file data store?

Yes, perhaps there's a misunderstanding. I will send you my code, but in
the meantime, let me explain what I mean.
The naive-GC reads all the nodes in the workspace to mark them (ie,
update access time). So if the user is uploading binaries in one session
and the GC is iterating the nodes on another, then until the user
save()s, the GC session won't see the changes in persistent storage. The
time of the binaries may be before or after the cut point of the GC,
depending on timing issues.
If it is before, then when the GC does a deleteAllOlderThan(), those
unmarked binaries will be deleted (they're unmarked because the nodes
that contain them could not be read because they don't exist yet).
I don't know if using the new PM interface will change any of this.
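
To make the race easier to follow, the naive mark-and-sweep cycle being described is roughly the sketch below; the helper names are made up and this is not the actual GarbageCollector code:

import java.io.IOException;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.PropertyType;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Sketch of the naive mark-and-sweep cycle described above.
class NaiveGarbageCollectorSketch {

    void scanAndSweep(Session session) throws RepositoryException, IOException {
        long cutPoint = System.currentTimeMillis();
        mark(session.getRootNode());
        // Sweep: anything not touched since the cut point is assumed unused.
        // This is exactly where unsaved (transient) binaries are at risk:
        // their nodes are not visible to this session yet, so they were
        // never marked, but their data records already exist in the store.
        deleteAllOlderThan(cutPoint);
    }

    private void mark(Node node) throws RepositoryException, IOException {
        for (PropertyIterator props = node.getProperties(); props.hasNext();) {
            Property p = props.nextProperty();
            if (p.getType() == PropertyType.BINARY) {
                touchRecord(p);   // update the record's last-modified time
            }
        }
        for (NodeIterator nodes = node.getNodes(); nodes.hasNext();) {
            mark(nodes.nextNode());
        }
    }

    // Reading the stream updates the record's modified time, as described above.
    private void touchRecord(Property binaryProperty) throws RepositoryException, IOException {
        binaryProperty.getStream().close();
    }

    // Hypothetical stand-in for the data store's sweep operation.
    private void deleteAllOlderThan(long timestamp) throws IOException {
        // delegate to the data store
    }
}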

> > So, the user adds a binary property in one session, after 
> the file is 
> > uploaded but before the user save()s the session, the GC on 
> the system 
> > session starts reading all nodes from the workspace. Since 
> the changes 
> > are not yet written to persistent storage, the file is 
> assumed to be a 
> > deleted property, and is in fact deleted.
> 
> Is this using the database data store or the file data store? 
> If this is the case, it would be a bug.

Sorry, but I don't see the difference.

> > We needed to make RepositoryImpl.getWorkspaceNames() public 
> for this 
> > to work.
> 
> Yes that would be good. Did you have a look at 
> PersistenceManagerIteratorTest.testGetAllNodeIds()? 
> getWorkspaceNames is called using reflection (also.
> RepositoryImpl.class.getDeclaredMethod("getWorkspaceNames", 
> new Class[0]).setAccessible(true)).

Yes, I've seen it. But it would be easier to just make them public. Or
at least export some of those methods through a utils class.

> > Will this change have any effect on the issue I just mentioned?
> 
> I don't know - first I need to understand the issue...
> 
> > Ok then, our current implementation is against a patched-up 1.3.1. 
> > What do you think is the best way to isolate the code?

Maybe we should discuss further after you get the code.
Since the copy of Jackrabbit I'm working on has a lot of patches, it would
be pretty complicated to isolate those changes in a patch. Do you mind
if I send you a zip file with the implementations of the interfaces, the
tests, the configurations and parsers, and the initialization routines?

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com


RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <ef...@bea.com>.
> What we need is the implementations of the 
> org.apache.jackrabbit.core.data interfaces.

Is it okay if I just zip them and attach them in an email to the list? Or
to you only?
I should also send any other changes to the rest of the core (there
aren't too many).

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com


Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> > What about RepositoryException?
> Yes, that would work too. But we wanted to be able to identify the
> specific exception thrown from the DS. In a few places we wrapped DSE
> inside a RE.

What about if DataStoreException extends RepositoryException? Like
that you don't need to catch the exception and re-throw a different
one; but identifying is possible.

> > What about updating the time when getLength() is called?
>
> Sorry, I don't understand this.

Currently the modified date/time is updated when modifiedDateOnRead is
set and getStream() is called. You said you want to avoid calling
getStream().close(). So I suggest to update the modified date/time
when modifiedDateOnRead is set and getLength() is called.

> > There is already DataStore.updateModifiedDateOnRead(long
> > before); a separate method is not required in my view.
> This didn't work in our testing.

Sorry, what does not work, and why?

> The FileDataRecord always queries the file object for its properties.

There is only getLength and getStream. Caching the length is possible.
The length could be part of the identifier. For example, you could do
that for the DatabaseDataStore (where accessing this information would
be slow). For the FileDataStore, I think it is not required as new
File(..).length() is quite fast; but I may be wrong. So basically the
DataIdentifier for DatabaseDataStore could be the digest plus the
length.

> Well, we found that it is necessary, since the GC runs on a different
> session than the users (we're using system sessions for this).

For the data store, it doesn't matter what session created or accessed
the data record. Also it doesn't matter when save() was called - the
data record is stored before that. As long as the GC has access to all
nodes it will touch all 'not recently used' data records. New data
records (that were created after the GC was started) will have a more
recent modified date / time and therefore the GC will not delete them.
There is probably a misunderstanding somewhere. It would help if you
could post your current source code, maybe there is a bug. Did you
update the modified date / time in addRecord() when the object already
exists, as done in the file data store?

> So, the user adds a binary property in one session, after the file is
> uploaded but before the user save()s the session, the GC on the system
> session starts reading all nodes from the workspace. Since the changes
> are not yet written to persistent storage, the file is assumed to be a
> deleted property, and is in fact deleted.

Is this using the database data store or the file data store? If this
is the case, it would be a bug.

> We needed to make RepositoryImpl.getWorkspaceNames() public for this to
> work.

Yes that would be good. Did you have a look at
PersistenceManagerIteratorTest.testGetAllNodeIds()? getWorkspaceNames
is called using reflection (also:
RepositoryImpl.class.getDeclaredMethod("getWorkspaceNames", new
Class[0]).setAccessible(true)).

> Will this change have any effect on the issue I just mentioned?

I don't know - first I need to understand the issue...

> Ok then, our current implementation is against a patched-up 1.3.1. What
> do you think is the best way to isolate the code?

What we need is the implementations of the
org.apache.jackrabbit.core.data interfaces.

Thanks,
Thomas

RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <ef...@bea.com>.
Hi Thomas.

> A database-backed data store would be great!

Yes, indeed :)

> What about RepositoryException?

Yes, that would work too. But we wanted to be able to identify the
specific exception thrown from the DS. In a few places we wrapped DSE
inside a RE.

> What about updating the time when getLength() is called?

Sorry, I don't understand this.

> There is already DataStore.updateModifiedDateOnRead(long 
> before); a separate method is not required in my view.

This didn't work in our testing.

> > The issue of the delay when calling getStream()/getRecord() 
> means that 
> > the information provided by the record has to be stored in 
> the record, 
> > instead of relying on the backing store (like it's done in the 
> > FileDataRecord class).
> 
> Sorry I don't understand this part.

The FileDataRecord always queries the file object for its properties. A
DatabaseRecord should store all the info (except the stream itself)
instead of going to the DB on each getter call.
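
A sketch of such a record, with illustrative names (not an actual Jackrabbit class), could be:

import java.io.IOException;
import java.io.InputStream;

// Sketch of a database-backed record that caches its metadata up front
// instead of querying the database on every getter call.
class DatabaseRecordSketch {

    private final String identifier;
    private final long length;
    private final long lastModified;
    private final StreamFactory streams;

    DatabaseRecordSketch(String identifier, long length, long lastModified,
                         StreamFactory streams) {
        this.identifier = identifier;
        this.length = length;
        this.lastModified = lastModified;
        this.streams = streams;
    }

    String getIdentifier()  { return identifier; }
    long getLength()        { return length; }        // no DB round trip
    long getLastModified()  { return lastModified; }  // no DB round trip

    // The stream is only opened when explicitly asked for.
    InputStream getStream() throws IOException {
        return streams.open(identifier);
    }

    // Callback that opens the BLOB stream lazily.
    interface StreamFactory {
        InputStream open(String identifier) throws IOException;
    }
}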

> > a stream that closes its DB resources when it's closed is needed.
> 
> That should be simple to implement; I suggest to do this as 
> part of the database data store (I can help if required).

We have an implementation of this already, but it is a little messy. I'll
be cleaning it up shortly.

> > getRecord() should not retrieve the stream and keep resources open 
> > unless explicitly asked
> 
> Sure. This is done already for the FileDataRecord. Is there a 
> problem to delay opening the stream for the database data store?

No, I'm just mentioning the issues we run into.

> > We have code to run the GC only once on repository startup, 
> and in a 
> > background thread
> 
> Both should work. I prefer the background thread

Me too, but we needed an interim implementation while we continued to
develop and test the other method.

> > run-once option needs global lock during it's run.
> > no session can be started while a collection is in progress.
> > The background thread instead needs some way to keep track 
> of changes 
> > in the binary properties.
> 
> I don't think this is required. Let's say a large object is 
> deleted while the garbage collection runs. In this case it 
> will not be collected, which is OK in my view (it will be 
> collected in the next GC run). If a new object is inserted, 
> it will not be collected because the last modified date is newer.

Well, we found that it is necessary, since the GC runs on a different
session than the users (we're using system sessions for this). Also note
that we're currently using the simple GC that's in the svn.
So, the user adds a binary property in one session; after the file is
uploaded but before the user save()s the session, the GC on the system
session starts reading all nodes from the workspace. Since the changes
are not yet written to persistent storage, the file is assumed to be a
deleted property, and is in fact deleted.
Maybe running the GC in the system session is the wrong approach, but we
haven't dug deeper into it yet.
We needed to make RepositoryImpl.getWorkspaceNames() public for this to
work.

> My plan is to scan in the persistence manager, using the new 
> method getAllNodeIds(), if a bundle persistence manager is 
> used. This should speed up the GC scan.

Will this change have any effect on the issue I just mentioned?

> > We can contribute our code,
> 
> That would be great of course!

Ok then, our current implementation is against a patched-up 1.3.1. What
do you think is the best way to isolate the code?

> > but it's gonna take same time to extract it our code it's more of a 
> > POC than production-ready for now.
> 
> No problem. We can work together on fixing the problems.

Sure, that would be great!

Regards,

Esteban Franqueiro
esteban.franqueiro@bea.com


Re: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

A database-backed data store would be great!

> methods throw IOException
> DataStoreException

What about RepositoryException?

> GC does a p.getStream().close() to update the last modified time of the record

You are right, this will be changed.

> store.updateRecordLastModifiedTime(id, System.currentTimeMillis());

What about updating the time when getLength() is called?
There is already DataStore.updateModifiedDateOnRead(long before);
a separate method is not required in my view.

> The issue of the delay when calling getStream()/getRecord() means that
> the information provided by the record has to be stored in the record,
> instead of relying on the backing store (like it's done in the
> FileDataRecord class).

Sorry I don't understand this part.

> a stream that closes its DB resources when it's closed is needed.

That should be simple to implement; I suggest to do this as part of the
database data store (I can help if required).

> getRecord() should not retrieve the stream
> and keep resources open unless explicitly asked

Sure. This is done already for the FileDataRecord. Is there a problem
to delay opening the stream for the database data store?

> We have code to run the GC only once on repository
> startup, and in a background thread

Both should work. I prefer the background thread.

> run-once option needs global lock during it's run.
> no session can be started while a collection is in
> progress.
> The background thread instead needs some way to keep track of
> changes in the binary properties.

I don't think this is required. Let's say a large object is deleted
while the garbage collection runs. In this case it will not be
collected, which is OK in my view (it will be collected in the next GC
run). If a new object is inserted, it will not be collected because
the last modified date is newer.

My plan is to scan in the persistence manager, using the new method
getAllNodeIds(), if a bundle persistence manager is used. This should
speed up the GC scan.

> We can contribute our code,

That would be great of course!

> but it's gonna take same time to extract it
> our code it's more of a POC than production-ready for now.

No problem. We can work together on fixing the problems.

Thanks for your help!
Thomas

RE: [jira] Resolved: (JCR-926) Global data store for binaries

Posted by Esteban Franqueiro <ef...@bea.com>.
Hi Thomas.
A few comments regarding the data store.
We're very interested in a database-backed data store, and we've been
working on it for a few weeks now. A couple of issues came up that made
us modify the original interface.
The first thing that came up was that the methods in the DataStore
interface throw IOException. This is not correct, since a database will
throw an SQLException. What we did was to create a new
DataStoreException, and then wrap the implementation-dependent
exceptions in it (since wrapping an SQLException in an IOException isn't
very pretty). We also needed to change a few places in the code where an
IOException was expected.
Another problem we found is that the GC does a p.getStream().close() to
update the last modified time of the record. When the record is in a DB,
this causes a very long delay (we think it retrieves the data from the
store). So what we did was to hack some code to get the id of the record
and call back to the data store to update the record, as in:

    Object blob = ((PropertyImpl) p).internalGetValue().getBLOBValue();
    if (blob instanceof BLOBInDataStore) {
        DataIdentifier id = ((BLOBInDataStore) blob).getIdentifier();
        store.updateRecordLastModifiedTime(id, System.currentTimeMillis());
    }
This works for the time being, but since InternalValue.internalValue()
is deprecated, a better way is needed :P
The issue of the delay when calling getStream()/getRecord() means that
the information provided by the record has to be stored in the record,
instead of relying on the backing store (like it's done in the
FileDataRecord class).
The last issue I want to mention is that when you access a record and
read its stream you can't close the connection, result set, and
statement used to access it, so a stream that closes its DB resources
when it's closed is needed. On the same note, a getRecord() call should
not retrieve the stream and keep the resources open unless explicitly
asked (i.e., by a getStream() call).
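
Such a stream wrapper could be sketched as follows, assuming plain JDBC; the class name and fields are illustrative only:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a stream that releases its JDBC resources when it is closed,
// so a caller can simply read the BLOB and call close().
class DatabaseBlobInputStream extends FilterInputStream {

    private final ResultSet resultSet;
    private final Statement statement;
    private final Connection connection;

    DatabaseBlobInputStream(InputStream blobStream, ResultSet resultSet,
                            Statement statement, Connection connection) {
        super(blobStream);
        this.resultSet = resultSet;
        this.statement = statement;
        this.connection = connection;
    }

    public void close() throws IOException {
        try {
            super.close();
        } finally {
            try {
                resultSet.close();
                statement.close();
                connection.close();   // or return it to the pool
            } catch (SQLException e) {
                throw new IOException("Could not release JDBC resources: " + e);
            }
        }
    }
}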

On a different but related topic, we're investigating how to connect the
GC with the DS. We have code to run the GC only once on repository
startup, and in a background thread. Both methods have their pros and
cons, of course. The run-once option is easier, but needs to grab a
global lock during its run (we're using the RepositoryImpl.shutdownLock
for this) so that no session can be started while a collection is in
progress. The background thread instead needs some way to keep track of
changes in the binary properties. We didn't give this too much thought
yet because other things came up, but we'll get back to it soon.

We can contribute our code, but it's going to take some time to extract it
(we moved it over a few JR versions). Also note that our code is more
of a POC than production-ready for now.

Regards,

PS: sorry for the lengthy mail

Esteban Franqueiro
esteban.franqueiro@bea.com
