Posted to dev@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2008/02/05 22:11:10 UTC

[jira] Created: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Lock test assumes that changes in one session are immediately visible in different session
------------------------------------------------------------------------------------------

                 Key: JCR-1361
                 URL: https://issues.apache.org/jira/browse/JCR-1361
             Project: Jackrabbit
          Issue Type: Bug
          Components: jackrabbit-jcr-tests
            Reporter: Julian Reschke
            Assignee: Julian Reschke
            Priority: Minor


LockTest.testLogout() assumes that a change in one session (logging out, removing a session-scoped lock) is immediately visible in another session.

Proposal: insert a 

 n1.getSession().refresh(true);

call before checking

 assertFalse("node must not be locked", n1.isLocked());
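The effect can be illustrated without the JCR API at all. The following is only a sketch: SharedStore and CachingSession are hypothetical stand-ins for a repository and for a session that caches lock state (as a JCR2SPI-style client might), not Jackrabbit classes.

```java
import java.util.HashSet;
import java.util.Set;

final class StaleLockDemo {
    /** Hypothetical stand-in for the repository's shared lock table. */
    static final class SharedStore {
        final Set<String> lockedPaths = new HashSet<>();
    }

    /** Hypothetical session that keeps a private copy of the lock state. */
    static final class CachingSession {
        private final SharedStore store;
        private Set<String> cachedLocks;

        CachingSession(SharedStore store) {
            this.store = store;
            this.cachedLocks = new HashSet<>(store.lockedPaths);
        }

        /** Answers from the cached copy only; never consults the store. */
        boolean isLocked(String path) {
            return cachedLocks.contains(path);
        }

        /** Re-reads the shared state, similar in spirit to Session.refresh(). */
        void refresh() {
            cachedLocks = new HashSet<>(store.lockedPaths);
        }
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        store.lockedPaths.add("/n1");                 // lock held by another session
        CachingSession observer = new CachingSession(store);

        store.lockedPaths.remove("/n1");              // other session logs out
        System.out.println(observer.isLocked("/n1")); // true: stale view
        observer.refresh();
        System.out.println(observer.isLocked("/n1")); // false: view refreshed
    }
}
```

Without the refresh() call the observer keeps reporting the lock, which is the situation the proposed change to the test guards against.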

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566073#action_12566073 ] 

Marcel Reutegger commented on JCR-1361:
---------------------------------------

+1

> Lock test assumes that changes in one session are immediately visible in different session
> ------------------------------------------------------------------------------------------
>
>                 Key: JCR-1361
>                 URL: https://issues.apache.org/jira/browse/JCR-1361
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: jackrabbit-jcr-tests
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>            Priority: Minor
>
> LockTest.testLogout() assumes that a change in one session (logging out, removing a session-scoped lock) is immediately visible in another session.
> Proposal: insert a 
>  n1.getSession().refresh(true);
> call before checking
>  assertFalse("node must not be locked", n1.isLocked());



[jira] Updated: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-1361:
-------------------------------

    Fix Version/s: 1.5

> Lock test assumes that changes in one session are immediately visible in different session
> ------------------------------------------------------------------------------------------
>
>                 Key: JCR-1361
>                 URL: https://issues.apache.org/jira/browse/JCR-1361
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: jackrabbit-jcr-tests
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>            Priority: Minor
>             Fix For: 1.5
>
>
> LockTest.testLogout() assumes that a change in one session (logging out, removing a session-scoped lock) is immediately visible in another session.
> Proposal: insert a 
>  n1.getSession().refresh(true);
> call before checking
>  assertFalse("node must not be locked", n1.isLocked());



Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Julian Reschke wrote:
> OK,
> 
> I finally obtained measurements for jackrabbit native and JCR2SPI (see 
> proposed test in JCR-1437):
> 
> jackrabbit-core: 3.35ms per iteration
> jcr2spi: 756ms per iteration
> 
> Hopefully once the tests are checked-in, and the results are 
> reproducible everywhere, we agree that there's a problem :-)
> 
> BR, Julian

For the record, these tests can now be reproduced everywhere, by running

   mvn -Dtest=JCRBenchmark test

in both jackrabbit-core and jackrabbit-jcr-benchmark, and grepping for 
BigCollectionTest in the resulting log file.
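For readers who want to see how a "ms per iteration" figure of this kind is typically obtained, here is a minimal sketch. It follows the loop shape used in the test code quoted later in this thread (run for a minimum wall-clock time and for at least a minimum number of iterations); the class and method names are made up for illustration and are not part of the benchmark module.

```java
final class PerIterationTimer {
    /**
     * Runs the workload until at least minMillis have elapsed and at least
     * minIterations have completed, then returns the mean time per iteration
     * in milliseconds.
     */
    static double msPerIteration(Runnable work, long minMillis, int minIterations) {
        long start = System.nanoTime();
        long count = 0;
        while ((System.nanoTime() - start) / 1_000_000 < minMillis || count < minIterations) {
            work.run();
            count++;
        }
        double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
        return elapsedMs / count;
    }

    public static void main(String[] args) {
        // Trivial workload, used only to exercise the timer.
        double ms = msPerIteration(() -> Math.sqrt(42.0), 50, 5);
        System.out.println(String.format("%.4fms per iteration", ms));
    }
}
```

The minimum iteration count keeps a single slow run from being reported as the average, and the minimum duration smooths out noise for fast workloads.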

BR, Julian


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
OK,

I finally obtained measurements for jackrabbit native and JCR2SPI (see 
proposed test in JCR-1437):

jackrabbit-core: 3.35ms per iteration
jcr2spi: 756ms per iteration

Hopefully once the tests are checked-in, and the results are 
reproducible everywhere, we agree that there's a problem :-)

BR, Julian

RE: Repository factory, was: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <Da...@SYNCHRONICA.COM>.
 
Another area with potential for performance improvement is JCR2SPI. Sometimes fetching a node causes getItemInfos to be called, which can return all the properties of a node at once, which is great. But at other times the calling JCR client code might access 8 properties, and that results in 8 separate getPropertyInfo calls instead of a single getItemInfos call. That is one area where I am trying to put adaptive code into my SPIs, to intercept getPropertyInfo at some point and invoke getItemInfos instead. I don't know how much of this, if any, could end up in JCR2SPI as a generalized solution, but I'll keep it in mind.
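A rough sketch of the interception idea described above, under stated assumptions: the Backend interface below is a deliberately simplified, hypothetical stand-in (string ids and string values), not the real SPI RepositoryService, and the caching policy is the simplest one possible.

```java
import java.util.HashMap;
import java.util.Map;

final class AdaptivePropertyReader {
    /** Hypothetical, simplified backend; each call stands for one round trip. */
    interface Backend {
        String getPropertyInfo(String nodeId, String name);
        Map<String, String> getItemInfos(String nodeId);
    }

    private final Backend backend;
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    AdaptivePropertyReader(Backend backend) {
        this.backend = backend;
    }

    /** First access to a node fetches all of its properties in one batch. */
    String getProperty(String nodeId, String name) {
        // Assumes getItemInfos returns a non-null map.
        Map<String, String> props = cache.computeIfAbsent(nodeId, backend::getItemInfos);
        String value = props.get(name);
        // Fall back to a single fetch if the batch did not include the property.
        return value != null ? value : backend.getPropertyInfo(nodeId, name);
    }

    /** Demo backend that counts round trips. */
    static final class CountingBackend implements Backend {
        int roundTrips = 0;
        private final Map<String, String> props = new HashMap<>();

        CountingBackend() {
            for (int i = 0; i < 8; i++) {
                props.put("p" + i, "v" + i);
            }
        }

        public String getPropertyInfo(String nodeId, String name) {
            roundTrips++;
            return props.get(name);
        }

        public Map<String, String> getItemInfos(String nodeId) {
            roundTrips++;
            return new HashMap<>(props);
        }
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        AdaptivePropertyReader reader = new AdaptivePropertyReader(backend);
        for (int i = 0; i < 8; i++) {
            reader.getProperty("node-1", "p" + i);
        }
        System.out.println(backend.roundTrips); // 1
    }
}
```

The trade-off is the usual one for caching: the cached batch can go stale when another session changes the node, so a real implementation would need an invalidation or refresh policy.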

David
-----Original Message-----
From: David Rauschenbach
Sent: Tue 2/12/2008 3:50 PM
To: dev@jackrabbit.apache.org
Subject: Re: Repository factory, was: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session
 
Yes I use a custom RepositoryConfig, and implement bean methods there
for custom configuration, as it applies to whatever the repository is.
But, there's no way to remote a serialized RepositoryConfig over SPI, to
do the factory work at the remote end, if you know what I mean. Again,
just because the spec doesn't address how to do the configuration and
factory work doesn't mean it doesn't have to happen. That's where some
flexibility in SPI is needed, so that it can allow someone to write a
proxy, or gateway, or middleware of whatever sort.

Sorry I was not clear about the repository descriptors. Yes you're right
you can ask for descriptors without credentials, but that's the same
reason you could never make that call to an SPI web service that used
container-managed security, to get those descriptors, because during
such a call the credentials are not yet known. This could be fudged, if
it were not for the fact that JCR2SPI asks for the descriptors before
attempting a login, even though I seem to recall examining the code and
seeing it had no pressing need for those descriptors until after login,
when nodes were being dealt with.

David

 
Visit Synchronica at GSMA Mobile World Congress, Barcelona, 11-14 Feb, Hall 2, Booth #2J25
 
 

Re: Repository factory, was: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
 
Yes I use a custom RepositoryConfig, and implement bean methods there
for custom configuration, as it applies to whatever the repository is.
But, there's no way to remote a serialized RepositoryConfig over SPI, to
do the factory work at the remote end, if you know what I mean. Again,
just because the spec doesn't address how to do the configuration and
factory work doesn't mean it doesn't have to happen. That's where some
flexibility in SPI is needed, so that it can allow someone to write a
proxy, or gateway, or middleware of whatever sort.

Sorry I was not clear about the repository descriptors. Yes you're right
you can ask for descriptors without credentials, but that's the same
reason you could never make that call to an SPI web service that used
container-managed security, to get those descriptors, because during
such a call the credentials are not yet known. This could be fudged, if
it were not for the fact that JCR2SPI asks for the descriptors before
attempting a login, even though I seem to recall examining the code and
seeing it had no pressing need for those descriptors until after login,
when nodes were being dealt with.

David



On Tue, 2008-02-12 at 14:10 +0100, Julian Reschke wrote:
> David Rauschenbach wrote:
> >  
> > I should have elaborated on the Repository factory problem.
> > 
> > Let's say someone implements the
> > org.apache.jackrabbit.commons.repository.RepositoryFactory interface.
> > They might have some constructor arguments, or some bean setters, for
> > setting properties like target host and protocol in a corba or web
> > service, or whatever.
> > 
> > Instantiating such a factory via Spring might look like this:
> > 
> >   <bean id="repoFactory" class="MyFactory">
> >     <property name="uri" value="iiop://server1/svc"/>
> >     <property name="debug" value="true"/>
> >   </bean>
> > 
> >   <bean id="repo"
> >     factory-bean="repoFactory"
> >     factory-method="getRepository"/>
> > 
> > So, the proprietary setter methods of the factory (setDebug(boolean) and
> > setUri(String)) have to pass over the SPI, in order to instantiate the
> > target Repository that is to be used by SPI2JCR.
> 
> Hm. Right now you can implement a RepositoryConfigFactory with the same 
> pattern, and then pass a RepositoryConfig with the right settings to 
> JCR2SPI for creation of a Repository. Am I missing something?
> 
> > And all that has to be able to happen when or before getDescriptors gets
> > called, which JCR2SPI calls before login.
> > 
> > In other words:
> > 
> >   1. You call getDescriptors on the repository, not the session. SPI
> > needs to reflect that.
> 
> In SPI, the method is on RepositoryService and is not related to 
> SessionInfo, so it is already that way, isn't it?
> 
> >   2. Repository factories and configuration are not addressed by JSR170, but SPI has to allow it to occur anyway.
> 
> Nice to have? Yes. Necessary? Not sure.
> 
> Could you please explain what you can't do today? You have full control 
> over your implementation of RepositoryConfig, doesn't this give you the 
> necessary flexibility?
> 
> BR, Julian
> 
> 

 
 
 

Repository factory, was: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
David Rauschenbach wrote:
>  
> I should have elaborated on the Repository factory problem.
> 
> Let's say someone implements the
> org.apache.jackrabbit.commons.repository.RepositoryFactory interface.
> They might have some constructor arguments, or some bean setters, for
> setting properties like target host and protocol in a corba or web
> service, or whatever.
> 
> Instantiating such a factory via Spring might look like this:
> 
>   <bean id="repoFactory" class="MyFactory">
>     <property name="uri" value="iiop://server1/svc"/>
>     <property name="debug" value="true"/>
>   </bean>
> 
>   <bean id="repo"
>     factory-bean="repoFactory"
>     factory-method="getRepository"/>
> 
> So, the proprietary setter methods of the factory (setDebug(boolean) and
> setUri(String)) have to pass over the SPI, in order to instantiate the
> target Repository that is to be used by SPI2JCR.

Hm. Right now you can implement a RepositoryConfigFactory with the same 
pattern, and then pass a RepositoryConfig with the right settings to 
JCR2SPI for creation of a Repository. Am I missing something?

> And all that has to be able to happen when or before getDescriptors gets
> called, which JCR2SPI calls before login.
> 
> In other words:
> 
>   1. You call getDescriptors on the repository, not the session. SPI
> needs to reflect that.

In SPI, the method is on RepositoryService and is not related to 
SessionInfo, so it is already that way, isn't it?

>   2. Repository factories and configuration are not addressed by JSR170, but SPI has to allow it to occur anyway.

Nice to have? Yes. Necessary? Not sure.

Could you please explain what you can't do today? You have full control 
over your implementation of RepositoryConfig, doesn't this give you the 
necessary flexibility?

BR, Julian



RE: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <Da...@SYNCHRONICA.COM>.
 
Thanks for mentioning that JSR283 mechanism. Suddenly the TCK makes a little more sense to me. My colleague says we will pay a price for using JNDI, since everything coming out of the Repository will have to get serialized and then deserialized back again? I suppose I'll just pay the price (5% ??), and leave the debate to the Spring versus J2EE crowd. It will be nice, though, to deploy repository impl's as stand-alone WAR files.

David
-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Wed 2/13/2008 11:10 AM
To: dev@jackrabbit.apache.org
Subject: Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session
 
David Rauschenbach wrote:
> In other words:
> 
>   1. You call getDescriptors on the repository, not the session. SPI
> needs to reflect that.

that's why you can call RepositoryService.getDescriptors() without a SessionInfo.

>   2. Repository factories and configuration are not addressed by JSR170,
> but SPI has to allow it to occur anyway.

so far we did not deal with initialization of an SPI implementation, but feel 
free to suggest a factory mechanism.

please note that JSR 283 introduces a standard way to obtain a repository 
instance through a factory mechanism.

regards
  marcel


 
 
 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> In other words:
> 
>   1. You call getDescriptors on the repository, not the session. SPI
> needs to reflect that.

that's why you can call RepositoryService.getDescriptors() without a SessionInfo.

>   2. Repository factories and configuration are not addressed by JSR170,
> but SPI has to allow it to occur anyway.

so far we did not deal with initialization of an SPI implementation, but feel 
free to suggest a factory mechanism.

please note that JSR 283 introduces a standard way to obtain a repository 
instance through a factory mechanism.

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
 
I should have elaborated on the Repository factory problem.

Let's say someone implements the
org.apache.jackrabbit.commons.repository.RepositoryFactory interface.
They might have some constructor arguments, or some bean setters, for
setting properties like target host and protocol in a corba or web
service, or whatever.

Instantiating such a factory via Spring might look like this:

  <bean id="repoFactory" class="MyFactory">
    <property name="uri" value="iiop://server1/svc"/>
    <property name="debug" value="true"/>
  </bean>

  <bean id="repo"
    factory-bean="repoFactory"
    factory-method="getRepository"/>

So, the proprietary setter methods of the factory (setDebug(boolean) and
setUri(String)) have to pass over the SPI, in order to instantiate the
target Repository that is to be used by SPI2JCR.
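One way to picture "passing the setters over the SPI" is to flatten the factory's bean properties into a plain string map on the client and rebuild an equivalent factory at the remote end. This is only a sketch of that idea: MyFactory mirrors the two properties from the Spring snippet, while toWire and fromWire are hypothetical helpers, and nothing here is an existing SPI or Jackrabbit mechanism.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FactorySettingsDemo {
    /** Mirrors the two bean properties from the Spring configuration. */
    static final class MyFactory {
        private String uri;
        private boolean debug;

        public void setUri(String uri) { this.uri = uri; }
        public void setDebug(boolean debug) { this.debug = debug; }
        public String getUri() { return uri; }
        public boolean isDebug() { return debug; }
    }

    /** Client side: flatten the settings into a map that is easy to serialize. */
    static Map<String, String> toWire(MyFactory factory) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("uri", factory.getUri());
        settings.put("debug", Boolean.toString(factory.isDebug()));
        return settings;
    }

    /** Remote side: rebuild an equivalent factory before any repository call. */
    static MyFactory fromWire(Map<String, String> settings) {
        MyFactory factory = new MyFactory();
        factory.setUri(settings.get("uri"));
        factory.setDebug(Boolean.parseBoolean(settings.get("debug")));
        return factory;
    }

    public static void main(String[] args) {
        MyFactory local = new MyFactory();
        local.setUri("iiop://server1/svc");
        local.setDebug(true);

        MyFactory remote = fromWire(toWire(local));
        System.out.println(remote.getUri() + " debug=" + remote.isDebug());
        // prints "iiop://server1/svc debug=true"
    }
}
```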

And all that has to be able to happen when or before getDescriptors gets
called, which JCR2SPI calls before login.

In other words:

  1. You call getDescriptors on the repository, not the session. SPI
needs to reflect that.

  2. Repository factories and configuration are not addressed by JSR170, but SPI has to allow it to occur anyway.

David



On Tue, 2008-02-12 at 12:51 +0100, David Rauschenbach wrote:
> Some other SPI deficiencies off the top of my head, while we're on the
> subject:
> 
>   1. Default workspace name. It should not be specified in JCR2SPI.
> When you log into Exchange or an IMAP server, or a SQL Server, you get a
> default namespace (or database) based on what the server thinks is the
> right default name for you. The name should not be known at the client
> (that's why it's a default!). Once you're logged in, you can ask for
> other namespace names, which also fits in nicely with JCR
> (Workspace.getAccessibleWorkspaceNames). But specifying a default
> namespace name in JCR2SPI is wrong, because a null should be passed when
> the server should choose.
> 
>   2. Repository descriptors over SPI. This is a deficiency. I have a web
> service that remotes SPI, and I can't ask the target web service for the
> repository descriptors. It doesn't help that JCR2SPI asks for the
> descriptors before a login. That makes it almost mandatory to mock up
> fake values in the SPI, before a dynamic proxy can initialize itself, if
> initialization is dependent on logging in, as is the case in 2 out of
> the 5 SPIs I have in front of me, or in cases where the target web
> service is deployed under the umbrella of container-managed security.
> 
>   3. Out-of-band data. Since JCR doesn't address configuration, or a
> RepositoryFactory pattern (even though Jackrabbit provides one), it is
> up to each implementation to get configuration done. SPI could use a
> place to pass this data. SimpleCredential attributes are not enough.
> IIOP for CORBA is a good example, where you can stuff profiles with your
> extra data, like timezone of the client, alternate server names,
> whatever. For SPI, the out-of-band data could be a place where the
> implementation-specific BatchReadConfig for a specific getItemInfos
> operation could be placed. I'm not sure this is a good idea for SPI, but
> I'm thinking about it.
> 
> David
> 
> 
> 
> On Mon, 2008-02-11 at 18:49 +0100, Julian Reschke wrote:
> > Julian Reschke wrote:
> > > At the end of the day, what we should do is *measure* the performance of 
> > > JCR2SPI compared to native implementations. I'll try to submit a few 
> > > tests soon.
> > > ...
> > 
> > OK, I've got tests (not polished) and numbers.
> > 
> > Scenario:
> > 
> > A collection /a/b with 500 members, each 1024 bytes in size, content type 
> > application/octet-stream.
> > 
> > Test code that obtains all members, checking content type, size, and 
> > total number.
> > 
> > My store can do that in ~80ms.
> > 
> > When doing it through SPI (with limited batch read support), it will take 
> > ~1500ms.
> > 
> > Wrapping that with JCR2SPI, it takes around ~2700ms.
> > 
> > So it seems we need to drastically remove the overhead introduced by the 
> > SPI API.
> > 
> > Test code below:
> > 
> > 
> >    private String createTestCollJcr(String p_parentpath, int p_members, 
> > int p_size) throws Exception {
> >      Repository l_repository = getRepository();
> >      Session l_session = null;
> > 
> >      try {
> >        l_session = l_repository.login(getCredentials());
> > 
> >        Node l_folder = null;
> >        try {
> >          l_folder = (Node)l_session.getItem(p_parentpath + "/bigcoll");
> >        }
> >        catch (RepositoryException ex) {
> >          // nothing to do
> >        }
> > 
> >        // delete when needed
> >        if (l_folder != null) {
> >          l_folder.remove();
> >          l_session.save();
> >        }
> > 
> >        Node l_parent = (Node)l_session.getItem(p_parentpath);
> >        l_folder = l_parent.addNode("bigcoll", "nt:folder");
> >        assertNotNull(l_folder);
> > 
> >        long l_cnt = 0;
> > 
> >        while (l_cnt < p_members) {
> >          InputStream l_is = new BufferedInputStream(new 
> > ContentGenerator(p_size), p_size);
> >          Node l_new = l_folder.addNode("tst" + l_cnt, "nt:file");
> >          Node l_cnew = l_new.addNode("jcr:content", "nt:resource");
> >          l_cnew.setProperty("jcr:data", l_is);
> >          l_cnew.setProperty("jcr:mimeType", "application/octet-stream");
> >          l_session.save();
> >          l_cnt += 1;
> >        }
> >      }
> >      finally {
> >        if (l_session != null) {
> >          l_session.logout();
> >        }
> >      }
> > 
> >      return p_parentpath + "/bigcoll";
> >    }
> > 
> > 
> >    private static int BIGCOLLMEMBERS = 500;
> >    private static int BIGCOLLMEMBERSIZE = 1024;
> >    private static String BIGCOLLMIMETYPE = "application/octet-stream";
> > 
> >    public void testGetMembersSpi() throws Exception {
> > 
> >      String l_path = createTestColl(this.m_path, BIGCOLLMEMBERS, 
> > BIGCOLLMEMBERSIZE);
> > 
> >      RepositoryService l_rs = getRepositoryService();
> >      SessionInfo l_si = null;
> > 
> >      try {
> >        l_si = l_rs.obtain(getCredentials(), null);
> > 
> >        long l_start = System.currentTimeMillis();
> >        long l_cnt = 0;
> > 
> >        while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
> >          NodeId l_nid = TestPerf.computeNodeId(l_rs, l_si, l_path);
> >          int l_members = 0;
> >          for (Iterator<ChildInfo> l_it = 
> > (Iterator<ChildInfo>)l_rs.getChildInfos(l_si, l_nid); l_it.hasNext(); ) {
> >            ChildInfo l_c = l_it.next();
> >            assertNotNull(l_c);
> >            NodeId l_cnid = 
> > l_rs.getIdFactory().createNodeId(l_c.getUniqueID());
> >            NodeInfo l_node = null;
> >            NodeInfo l_contentnode = null;
> >            PropertyInfo l_mimetype = null;
> >            PropertyInfo l_data = null;
> >            Iterator l_iteminfos = l_rs.getItemInfos(l_si, l_cnid);
> >            l_node = (NodeInfo)l_iteminfos.next();
> >            assertNotNull(l_node);
> > 
> >            while (l_iteminfos.hasNext()) {
> >              ItemInfo l_i = (ItemInfo)l_iteminfos.next();
> >              if (l_i.getParentId().equals(l_node.getId()) && 
> > NameConstants.JCR_CONTENT.equals(l_i.getName())) {
> >                l_contentnode = (NodeInfo)l_i;
> >              }
> >              if (l_contentnode != null && 
> > l_i.getParentId().equals(l_contentnode.getId()) && 
> > NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
> >                l_mimetype = (PropertyInfo)l_i;
> >              }
> >              if (l_contentnode != null && 
> > l_i.getParentId().equals(l_contentnode.getId()) && 
> > NameConstants.JCR_DATA.equals(l_i.getName())) {
> >                l_data = (PropertyInfo)l_i;
> >              }
> >            }
> > 
> >            if (l_contentnode == null) {
> >              // explicitly fetch the content node, it wasn't returned 
> > with the parent
> >              NodeId l_contentnodeid = 
> > l_rs.getIdFactory().createNodeId(l_c.getUniqueID(), 
> > l_rs.getPathFactory().create(NameConstants.JCR_CONTENT));
> >              Iterator l_iteminfos2 = l_rs.getItemInfos(l_si, 
> > l_contentnodeid);
> >              l_contentnode = (NodeInfo)l_iteminfos2.next();
> >              while (l_iteminfos2.hasNext()) {
> >                ItemInfo l_i = (ItemInfo)l_iteminfos2.next();
> >                if (l_i.getParentId().equals(l_contentnode.getId()) && 
> > NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
> >                  l_mimetype = (PropertyInfo)l_i;
> >                }
> >                if (l_i.getParentId().equals(l_contentnode.getId()) && 
> > NameConstants.JCR_DATA.equals(l_i.getName())) {
> >                  l_data = (PropertyInfo)l_i;
> >                }
> >              }
> >            }
> > 
> >            assertNotNull(l_contentnode);
> > 
> >            if (l_mimetype == null) {
> >              // explicitly fetch the mime type property, it wasn't 
> > returned with the parent
> >              PropertyId l_mimetypeid = 
> > l_rs.getIdFactory().createPropertyId(l_contentnode.getId(), 
> > NameConstants.JCR_MIMETYPE);
> >              l_mimetype = l_rs.getPropertyInfo(l_si, l_mimetypeid);
> >            }
> > 
> >            assertNotNull(l_mimetype);
> >            assertEquals(BIGCOLLMIMETYPE, 
> > l_mimetype.getValues()[0].getString());
> > 
> >            if (l_data == null) {
> > >              // explicitly fetch the data property, it wasn't 
> > > returned with the parent
> >              PropertyId l_dataid = 
> > l_rs.getIdFactory().createPropertyId(l_contentnode.getId(), 
> > NameConstants.JCR_DATA);
> >              l_data = l_rs.getPropertyInfo(l_si, l_dataid);
> >            }
> > 
> >            assertNotNull(l_data);
> >            assertEquals(BIGCOLLMEMBERSIZE, 
> > l_data.getValues()[0].getLength());
> > 
> >            l_members += 1;
> >          }
> >          assertEquals(BIGCOLLMEMBERS, l_members);
> >          l_cnt += 1;
> >        }
> > 
> >        long l_elapsed = System.currentTimeMillis() - l_start;
> > 
> >        LOG.info(String.format("GetMembers - SPI: %.4fms per call (%d 
> > iterations)", (double)l_elapsed / l_cnt, l_cnt));
> > 
> >      }
> >      finally {
> >        if (l_si != null) {
> >          l_rs.dispose(l_si);
> >        }
> >      }
> >    }
> > 
> >    public void testGetMembersJcr() throws Exception {
> > 
> >      String l_path = createTestCollJcr(this.m_path, BIGCOLLMEMBERS, 
> > BIGCOLLMEMBERSIZE);
> > 
> >      Repository l_repository = getRepository();
> >      Session l_session = null;
> > 
> >      try {
> >        l_session = l_repository.login(getCredentials(), null);
> > 
> >        long l_start = System.currentTimeMillis();
> >        long l_cnt = 0;
> > 
> >        while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
> >          Node l_dir = (Node)l_session.getItem(l_path);
> >          assertNotNull(l_dir);
> >          int l_members = 0;
> >          for (NodeIterator l_it = l_dir.getNodes(); l_it.hasNext(); ) {
> >            Node l_c = l_it.nextNode();
> >            Node l_e = l_c.getNode("jcr:content");
> >            String l_type = l_e.getProperty("jcr:mimeType").getString();
> >            long l_length = l_e.getProperty("jcr:data").getLength();
> >            assertTrue(l_c.isNode());
> >            assertEquals(BIGCOLLMIMETYPE, l_type);
> >            assertEquals(BIGCOLLMEMBERSIZE, l_length);
> >            l_members += 1;
> >          }
> >          assertEquals(BIGCOLLMEMBERS, l_members);
> >          l_session.refresh(false);
> >          l_cnt += 1;
> >        }
> > 
> >        long l_elapsed = System.currentTimeMillis() - l_start;
> > 
> >        LOG.info(String.format("GetMembers - JCR: %.4fms per call (%d 
> > iterations)", (double)l_elapsed / l_cnt, l_cnt));
> > 
> >      }
> >      finally {
> >        if (l_session != null) {
> >          l_session.logout();
> >        }
> >      }
> >    }
> > 
> >    private class ContentGenerator extends InputStream {
> > 
> >      private long m_length;
> >      private long m_position;
> > 
> >      public ContentGenerator(long p_length) {
> >        this.m_length = p_length;
> >        this.m_position = 0;
> >      }
> > 
> >      public int read() {
> > 
> >        if (this.m_position++ < this.m_length) {
> >          return 0;
> >        }
> >        else {
> >          return -1;
> >        }
> >      }
> >    }
> > 
> 
>  
>  
> 

 
 
 

Other SPI design questions, was: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
David Rauschenbach wrote:
>  
> Some other SPI deficiencies off the top of my head, while we're on the
> subject:
> 
>   1. Default workspace name. It should not be specified in JCR2SPI.
> When you log into Exchange or an IMAP server, or a SQL Server, you get a
> default namespace (or database) based on what the server thinks is the
> right default name for you. The name should not be known at the client
> (that's why it's a default!). Once you're logged in, you can ask for
> other namespace names, which also fits in nicely with JCR
> (Workspace.getAccessibleWorkspaceNames). But specifying a default
> namespace name in JCR2SPI is wrong, because a null should be passed when
> the server should choose.

We're talking about RepositoryConfig, right?

My implementation just returns "", which my RepositoryService takes to 
mean that the default workspace should be returned. But it's not the 
default workspace name.

Anyway, I agree this seems to be useless. Let's get rid of this, and let 
it work just as in JCR.

>   2. Repository descriptors over SPI. This is a deficiency. I have a web
> service that remotes SPI, and I can't ask the target web service for the
> repository descriptors. It doesn't help that JCR2SPI asks for the
> descriptors before a login. That makes it almost mandatory to mock up
> fake values in the SPI, before a dynamic proxy can initialize itself, if
> initialization is dependent on logging in, as is the case in 2 out of
> the 5 SPIs I have in front of me, or in cases where the target web
> service is deployed under the umbrella of container-managed security.

Repository descriptors are a design problem in JCR itself, as they 
always apply to a whole repository.

Thus, a server that aggregates multiple different kinds of store as a 
"virtual" repository always will have to lie.

>   3. Out-of-band data. Since JCR doesn't address configuration, or a
> RepositoryFactory pattern (even though Jackrabbit provides one), it is

...proposed for JSR-283...

> up to each implementation to get configuration done. SPI could use a
> place to pass this data. SimpleCredential attributes are not enough.

We currently support RepositoryConfig configuration using JNDI. There's 
also the plan to replicate whatever we get in JSR-283 into SPI, but I 
don't think we've done that yet.

> IIOP for CORBA is a good example, where you can stuff profiles with your
> extra data, like timezone of the client, alternate server names,
> whatever. For SPI, the out-of-band data could be a place where the
> implementation-specific BatchReadConfig for a specific getItemInfos
> operation could be placed. I'm not sure this is a good idea for SPI, but
> I'm thinking about it.
> 
> David

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
 
Some other SPI deficiencies off the top of my head, while we're on the
subject:

  1. Default workspace name. It should not be specified in JCR2SPI.
When you log into Exchange or an IMAP server, or a SQL Server, you get a
default namespace (or database) based on what the server thinks is the
right default name for you. The name should not be known at the client
(that's why it's a default!). Once you're logged in, you can ask for
other namespace names, which also fits in nicely with JCR
(Workspace.getAccessibleWorkspaceNames). But specifying a default
namespace name in JCR2SPI is wrong, because a null should be passed when
the server should choose.

  2. Repository descriptors over SPI. This is a deficiency. I have a web
service that remotes SPI, and I can't ask the target web service for the
repository descriptors. It doesn't help that JCR2SPI asks for the
descriptors before a login. That makes it almost mandatory to mock up
fake values in the SPI, before a dynamic proxy can initialize itself, if
initialization is dependent on logging in, as is the case in 2 out of
the 5 SPIs I have in front of me, or in cases where the target web
service is deployed under the umbrella of container-managed security.

  3. Out-of-band data. Since JCR doesn't address configuration, or a
RepositoryFactory pattern (even though Jackrabbit provides one), it is
up to each implementation to get configuration done. SPI could use a
place to pass this data. SimpleCredential attributes are not enough.
IIOP for CORBA is a good example, where you can stuff profiles with your
extra data, like timezone of the client, alternate server names,
whatever. For SPI, the out-of-band data could be a place where the
implementation-specific BatchReadConfig for a specific getItemInfos
operation could be placed. I'm not sure this is a good idea for SPI, but
I'm thinking about it.
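
A minimal sketch of point 1, not JCR2SPI code: the client passes null and the server decides which workspace is the default, in the same spirit as JCR's Repository.login(Credentials, String), which accepts a null workspace name. WorkspaceResolver, its per-user default map, and the fallback name are hypothetical and only illustrate the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a server that knows each user's default workspace.
class WorkspaceResolver {
    private final Map<String, String> defaultsByUser = new HashMap<>();
    private final String fallback;

    WorkspaceResolver(String fallback) {
        this.fallback = fallback;
    }

    void setDefault(String userId, String workspaceName) {
        defaultsByUser.put(userId, workspaceName);
    }

    // A null workspace name means "server, you choose"; any other value is used as given.
    String resolve(String userId, String requestedWorkspace) {
        if (requestedWorkspace != null) {
            return requestedWorkspace;
        }
        return defaultsByUser.getOrDefault(userId, fallback);
    }
}
```

With this shape the client never needs to know the default name; it only names a workspace when it wants a specific one.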

David



On Mon, 2008-02-11 at 18:49 +0100, Julian Reschke wrote:
> Julian Reschke wrote:
> > At the end of the day, what we should do is *measure* the performance of 
> > JCR2SPI compared to native implementations. I'll try to submit a few 
> > tests soon.
> > ...
> 
> OK, I've got tests (not polished) and numbers.
> 
> Scenario:
> 
> A collection /a/b with 500 members, each 1024 in size, content type 
> application/octet-stream.
> 
> Test code that obtains all members, checking content type, size, and 
> total number.
> 
> My store can do that in ~80ms.
> 
> When doing it through SPI (with limited batch read support), it will take 
> ~1500ms.
> 
> Wrapping that with JCR2SPI, it takes around ~2700ms.
> 
> So it seems we need to drastically remove the overhead introduced by the 
> SPI API.
> 
> Test code below:
> 
> 
>    private String createTestCollJcr(String p_parentpath, int p_members, 
> int p_size) throws Exception {
>      Repository l_repository = getRepository();
>      Session l_session = null;
> 
>      try {
>        l_session = l_repository.login(getCredentials());
> 
>        Node l_folder = null;
>        try {
>          l_folder = (Node)l_session.getItem(p_parentpath + "/bigcoll");
>        }
>        catch (RepositoryException ex) {
>          // nothing to do
>        }
> 
>        // delete when needed
>        if (l_folder != null) {
>          l_folder.remove();
>          l_session.save();
>        }
> 
>        Node l_parent = (Node)l_session.getItem(p_parentpath);
>        l_folder = l_parent.addNode("bigcoll", "nt:folder");
>        assertNotNull(l_folder);
> 
>        long l_cnt = 0;
> 
>        while (l_cnt < p_members) {
>          InputStream l_is = new BufferedInputStream(new 
> ContentGenerator(p_size), p_size);
>          Node l_new = l_folder.addNode("tst" + l_cnt, "nt:file");
>          Node l_cnew = l_new.addNode("jcr:content", "nt:resource");
>          l_cnew.setProperty("jcr:data", l_is);
>          l_cnew.setProperty("jcr:mimeType", "application/octet-stream");
>          l_session.save();
>          l_cnt += 1;
>        }
>      }
>      finally {
>        if (l_session != null) {
>          l_session.logout();
>        }
>      }
> 
>      return p_parentpath + "/bigcoll";
>    }
> 
> 
>    private static int BIGCOLLMEMBERS = 500;
>    private static int BIGCOLLMEMBERSIZE = 1024;
>    private static String BIGCOLLMIMETYPE = "application/octet-stream";
> 
>    public void testGetMembersSpi() throws Exception {
> 
>      String l_path = createTestColl(this.m_path, BIGCOLLMEMBERS, 
> BIGCOLLMEMBERSIZE);
> 
>      RepositoryService l_rs = getRepositoryService();
>      SessionInfo l_si = null;
> 
>      try {
>        l_si = l_rs.obtain(getCredentials(), null);
> 
>        long l_start = System.currentTimeMillis();
>        long l_cnt = 0;
> 
>        while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
>          NodeId l_nid = TestPerf.computeNodeId(l_rs, l_si, l_path);
>          int l_members = 0;
>          for (Iterator<ChildInfo> l_it = 
> (Iterator<ChildInfo>)l_rs.getChildInfos(l_si, l_nid); l_it.hasNext(); ) {
>            ChildInfo l_c = l_it.next();
>            assertNotNull(l_c);
>            NodeId l_cnid = 
> l_rs.getIdFactory().createNodeId(l_c.getUniqueID());
>            NodeInfo l_node = null;
>            NodeInfo l_contentnode = null;
>            PropertyInfo l_mimetype = null;
>            PropertyInfo l_data = null;
>            Iterator l_iteminfos = l_rs.getItemInfos(l_si, l_cnid);
>            l_node = (NodeInfo)l_iteminfos.next();
>            assertNotNull(l_node);
> 
>            while (l_iteminfos.hasNext()) {
>              ItemInfo l_i = (ItemInfo)l_iteminfos.next();
>              if (l_i.getParentId().equals(l_node.getId()) && 
> NameConstants.JCR_CONTENT.equals(l_i.getName())) {
>                l_contentnode = (NodeInfo)l_i;
>              }
>              if (l_contentnode != null && 
> l_i.getParentId().equals(l_contentnode.getId()) && 
> NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
>                l_mimetype = (PropertyInfo)l_i;
>              }
>              if (l_contentnode != null && 
> l_i.getParentId().equals(l_contentnode.getId()) && 
> NameConstants.JCR_DATA.equals(l_i.getName())) {
>                l_data = (PropertyInfo)l_i;
>              }
>            }
> 
>            if (l_contentnode == null) {
>              // explicitly fetch the content node, it wasn't returned 
> with the parent
>              NodeId l_contentnodeid = 
> l_rs.getIdFactory().createNodeId(l_c.getUniqueID(), 
> l_rs.getPathFactory().create(NameConstants.JCR_CONTENT));
>              Iterator l_iteminfos2 = l_rs.getItemInfos(l_si, 
> l_contentnodeid);
>              l_contentnode = (NodeInfo)l_iteminfos2.next();
>              while (l_iteminfos2.hasNext()) {
>                ItemInfo l_i = (ItemInfo)l_iteminfos2.next();
>                if (l_i.getParentId().equals(l_contentnode.getId()) && 
> NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
>                  l_mimetype = (PropertyInfo)l_i;
>                }
>                if (l_i.getParentId().equals(l_contentnode.getId()) && 
> NameConstants.JCR_DATA.equals(l_i.getName())) {
>                  l_data = (PropertyInfo)l_i;
>                }
>              }
>            }
> 
>            assertNotNull(l_contentnode);
> 
>            if (l_mimetype == null) {
>              // explicitly fetch the mime type property, it wasn't 
> returned with the parent
>              PropertyId l_mimetypeid = 
> l_rs.getIdFactory().createPropertyId(l_contentnode.getId(), 
> NameConstants.JCR_MIMETYPE);
>              l_mimetype = l_rs.getPropertyInfo(l_si, l_mimetypeid);
>            }
> 
>            assertNotNull(l_mimetype);
>            assertEquals(BIGCOLLMIMETYPE, 
> l_mimetype.getValues()[0].getString());
> 
>            if (l_data == null) {
>              // explicitly fetch the data property, it wasn't 
> returned with the parent
>              PropertyId l_dataid = 
> l_rs.getIdFactory().createPropertyId(l_contentnode.getId(), 
> NameConstants.JCR_DATA);
>              l_data = l_rs.getPropertyInfo(l_si, l_dataid);
>            }
> 
>            assertNotNull(l_data);
>            assertEquals(BIGCOLLMEMBERSIZE, 
> l_data.getValues()[0].getLength());
> 
>            l_members += 1;
>          }
>          assertEquals(BIGCOLLMEMBERS, l_members);
>          l_cnt += 1;
>        }
> 
>        long l_elapsed = System.currentTimeMillis() - l_start;
> 
>        LOG.info(String.format("GetMembers - SPI: %.4fms per call (%d 
> iterations)", (double)l_elapsed / l_cnt, l_cnt));
> 
>      }
>      finally {
>        if (l_si != null) {
>          l_rs.dispose(l_si);
>        }
>      }
>    }
> 
>    public void testGetMembersJcr() throws Exception {
> 
>      String l_path = createTestCollJcr(this.m_path, BIGCOLLMEMBERS, 
> BIGCOLLMEMBERSIZE);
> 
>      Repository l_repository = getRepository();
>      Session l_session = null;
> 
>      try {
>        l_session = l_repository.login(getCredentials(), null);
> 
>        long l_start = System.currentTimeMillis();
>        long l_cnt = 0;
> 
>        while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
>          Node l_dir = (Node)l_session.getItem(l_path);
>          assertNotNull(l_dir);
>          int l_members = 0;
>          for (NodeIterator l_it = l_dir.getNodes(); l_it.hasNext(); ) {
>            Node l_c = l_it.nextNode();
>            Node l_e = l_c.getNode("jcr:content");
>            String l_type = l_e.getProperty("jcr:mimeType").getString();
>            long l_length = l_e.getProperty("jcr:data").getLength();
>            assertTrue(l_c.isNode());
>            assertEquals(BIGCOLLMIMETYPE, l_type);
>            assertEquals(BIGCOLLMEMBERSIZE, l_length);
>            l_members += 1;
>          }
>          assertEquals(BIGCOLLMEMBERS, l_members);
>          l_session.refresh(false);
>          l_cnt += 1;
>        }
> 
>        long l_elapsed = System.currentTimeMillis() - l_start;
> 
>        LOG.info(String.format("GetMembers - JCR: %.4fms per call (%d 
> iterations)", (double)l_elapsed / l_cnt, l_cnt));
> 
>      }
>      finally {
>        if (l_session != null) {
>          l_session.logout();
>        }
>      }
>    }
> 
>    private class ContentGenerator extends InputStream {
> 
>      private long m_length;
>      private long m_position;
> 
>      public ContentGenerator(long p_length) {
>        this.m_length = p_length;
>        this.m_position = 0;
>      }
> 
>      public int read() {
> 
>        if (this.m_position++ < this.m_length) {
>          return 0;
>        }
>        else {
>          return -1;
>        }
>      }
>    }
> 

 
Visit Synchronica at GSMA Mobile World Congress, Barcelona, 11-14 Feb, Hall 2, Booth #2J25
 
 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Angela Schreiber wrote:
> Julian Reschke wrote:
> 
>> - change the return value of getChildInfos so that size information is 
>> there (Iterator -> RangeIterator or Collection)
> 
> didn't we have that discussion before? (JCR-1239)

We didn't come to a conclusion (it's an open issue). I'm looking for a 
solution to this problem; and if we are looking at extending/changing 
child entry handling, we IMHO should look at this issue as well.

> since jcr2spi will still have to process the Iterator in order
> to populate the hierarchy in order to properly respond to
> 'hasNode' or 'hasNodes' or 'getNodes', i don't see the
> benefit of the Iterator over Array.

It has the benefit that it can be lazily built.

> And Collection is wrong from my point of view.

Don't see why exactly, but RangeIterator would work as well. Collection 
has the (somewhat future) benefit to be Iterable.
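
A sketch of what "lazily built, but with size information" could look like: an iterator that reports its size up front and only creates each element when next() is called. SizedLazyIterator and its constructor arguments are hypothetical; javax.jcr.RangeIterator offers a comparable getSize() contract.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical iterator: size is known immediately, elements are built on demand.
class SizedLazyIterator<T> implements Iterator<T> {
    private final int size;
    private final IntFunction<T> loader; // builds the element at a given index
    private int position = 0;

    SizedLazyIterator(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    // Cheap: does not touch the loader.
    int getSize() {
        return size;
    }

    @Override
    public boolean hasNext() {
        return position < size;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return loader.apply(position++);
    }
}
```

A caller that only needs the member count can stop after getSize(); a caller that walks the children pays for each element as it goes.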

> ..
>> or by just allowing both types in the Iterator).
> 
> oh no! i would oppose to that.

Agreed, I would also prefer inheritance here.

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
Julian Reschke wrote:

> - change the return value of getChildInfos so that size information is 
> there (Iterator -> RangeIterator or Collection)

didn't we have that discussion before? (JCR-1239)

since jcr2spi will still have to process the Iterator in order
to populate the hierarchy in order to properly respond to
'hasNode' or 'hasNodes' or 'getNodes', i don't see the
benefit of the Iterator over Array.
And Collection is wrong from my point of view.

> - change the values so that we can return ItemInfos instead of or in 
> addition to ChildInfos (either by refactoring the class hierarchy so 
> ItemInfo extends ChildInfo,

see answer to the corresponding suggestion-mail.

> or by just allowing both types in the Iterator).

oh no! i would oppose to that.

regards
angela


> BR, Julian
> 
> 
> Marcel Reutegger wrote:
>> Julian Reschke wrote:
>>> Why don't we change things so that ItemInfo *extends* ChildInfo, so 
>>> that SPI implementations can return NodeInfos as well? JCR2SPI could 
>>> then use that to update its internal cache, avoiding to refetch the 
>>> NodeInfos.
>>
>> I'd rather keep it the way it is right now, but introduce accessors to 
>> PropertyInfos and ChildInfos on the NodeInfo. As mentioned in a 
>> previous mail, an implementation would then provide the complete 
>> Property/ChildInfos or null (if it does not want to batch the 
>> information or it thinks it is too expensive to do so). I think this 
>> also nicely aligns with our experience with the bundle persistence 
>> manager.  the overhead of loading property states is usually quickly 
>> amortized because some of the properties of a node are read in most 
>> cases.
>>
>> regards
>>  marcel
>>
>>
>> Index: src/main/java/org/apache/jackrabbit/spi/NodeInfo.java
>> ===================================================================
>> --- src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (revision 
>> 617369)
>> +++ src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (working 
>> copy)
>> @@ -72,4 +72,31 @@
>>       * @see PropertyInfo#getId()
>>       */
>>      public Iterator getPropertyIds();
>> +
>> +    /**
>> +     * Returns the property infos of this node info or 
>> <code>null</code> if none
>> +     * are provided. If a non-null value is returned an 
>> implementation must
>> +     * return the complete list of property infos. If an 
>> implementation returns
>> +     * <code>null</code> a client must use {@link #getPropertyIds()} in
>> +     * conjunction with
>> +     * {@link RepositoryService#getPropertyInfo(SessionInfo, 
>> PropertyId)} to
>> +     * retrieve property infos.
>> +     *
>> +     * @return Returns the property infos of this node info or 
>> <code>null</code>
>> +     *         if none are provided.
>> +     */
>> +    public PropertyInfo[] getPropertyInfos();
>> +
>> +    /**
>> +     * Returns the child infos of this node info or <code>null</code> 
>> if none
>> +     * are provided. If a non-null value is returned an 
>> implementation must
>> +     * return the complete list of child infos. If an implementation 
>> returns
>> +     * <code>null</code> a client must use
>> +     * {@link RepositoryService#getChildInfos(SessionInfo, NodeId)} 
>> to retrieve
>> +     * the child infos for this node info.
>> +     *
>> +     * @return the child infos of this node info or <code>null</code> 
>> if none
>> +     *         are provided.
>> +     */
>> +    public ChildInfo[] getChildInfos();
>>  }
>>
> 
> 


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> 2. Per Julian's suggestion, I also support the notion that it's a shame
> getQNodeTypeDefinitions supports a filter, whereas a filter is not
> supported where it's needed most.

can you please elaborate on this? I understand we changed the node type 
definition handling at some point to lazily load them when needed and only 
retrieve them all if a client wants to iterate through them all using the 
respective JCR API method.

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
Right. I liked what I've been reading lately about returning null versus
returning an empty iterator.

David



On Fri, 2008-02-22 at 09:21 +0100, Marcel Reutegger wrote:
> David Rauschenbach wrote:
> > 3. Per Julian's other suggestion, I also support the notion of returning
> > MORE data, like number of children, or an IMAP-like \hasChildren flag
> > for Nodes, in case an SPI wanted to declare that a node definitely has
> > no children (so don't ask), or that there are children, but finding out
> > how many is going to cost you.
> 
> I think this can be achieved with the NodeInfo.getChildInfos(). If an 
> implementation was to set a \hasChildren flag to false it simply returns an 
> empty array.
> 
> regards
>   marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> 3. Per Julian's other suggestion, I also support the notion of returning
> MORE data, like number of children, or an IMAP-like \hasChildren flag
> for Nodes, in case an SPI wanted to declare that a node definitely has
> no children (so don't ask), or that there are children, but finding out
> how many is going to cost you.

I think this can be achieved with the NodeInfo.getChildInfos(). If an 
implementation was to set a \hasChildren flag to false it simply returns an 
empty array.
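
Read together with the proposed javadoc (null means "not provided"), the three cases could be told apart as sketched here. ChildHint and its State enum are hypothetical names, and Object[] stands in for ChildInfo[] to keep the sketch self-contained.

```java
// Hypothetical helper interpreting the proposed NodeInfo.getChildInfos() result.
class ChildHint {
    enum State { NO_CHILDREN, CHILDREN_KNOWN, ASK_SERVICE }

    // null: not provided; empty array: definitely no children; otherwise the complete list.
    static State interpret(Object[] childInfos) {
        if (childInfos == null) {
            return State.ASK_SERVICE;
        }
        return childInfos.length == 0 ? State.NO_CHILDREN : State.CHILDREN_KNOWN;
    }
}
```

Only the ASK_SERVICE case would require a separate getChildInfos call on the RepositoryService.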

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
 
I'm going to weigh in somewhere in the middle of Julian and Marcel.

1. Per Marcel's mention of using the NodeInfo to announce whether an SPI
is returning ALL or NONE properties (nice grammar, eh), declaring that
through the node works for me. In essence, an SPI gets to state whether
it would return a level-0, level-1, or level-n response.

2. Per Julian's suggestion, I also support the notion that it's a shame
getQNodeTypeDefinitions supports a filter, whereas a filter is not
supported where it's needed most.

3. Per Julian's other suggestion, I also support the notion of returning
MORE data, like number of children, or an IMAP-like \hasChildren flag
for Nodes, in case an SPI wanted to declare that a node definitely has
no children (so don't ask), or that there are children, but finding out
how many is going to cost you.

David



On Tue, 2008-02-19 at 15:06 +0100, Julian Reschke wrote:
> I'm not particularly attached to a specific solution, as long as we make 
> progress.
> 
> That being said, here are a few comments on the proposal below:
> 
> - what we don't have is an efficient way to query a set of properties, 
> with the ability for the SPI to return more; similarly to what we do 
> with getQNodeTypeDefinitions(),
> 
> - we still don't have a cheap way just to retrieve the number of children,
> 
> - we introduce a second way to do something, instead of fixing the one 
> we already have; so I'd prefer to change getChildInfos so that it can do 
> what we need (return ItemInfos in addition to ChildInfos, provide the 
> size without having to build the full result),
> 
> - need a way to throw exceptions.
> 
> Proposal:
> 
> - change the return value of getChildInfos so that size information is 
> there (Iterator -> RangeIterator or Collection)
> 
> - change the values so that we can return ItemInfos instead of or in 
> addition to ChildInfos (either by refactoring the class hierarchy so 
> ItemInfo extends ChildInfo, or by just allowing both types in the Iterator).
> 
> BR, Julian
> 
> 
> Marcel Reutegger wrote:
> > Julian Reschke wrote:
> >> Why don't we change things so that ItemInfo *extends* ChildInfo, so 
> >> that SPI implementations can return NodeInfos as well? JCR2SPI could 
> >> then use that to update its internal cache, avoiding to refetch the 
> >> NodeInfos.
> > 
> > I'd rather keep it the way it is right now, but introduce accessors to 
> > PropertyInfos and ChildInfos on the NodeInfo. As mentioned in a previous 
> > mail, an implementation would then provide the complete 
> > Property/ChildInfos or null (if it does not want to batch the 
> > information or it thinks it is too expensive to do so). I think this 
> > also nicely aligns with our experience with the bundle persistence 
> > manager.  the overhead of loading property states is usually quickly 
> > amortized because some of the properties of a node are read in most cases.
> > 
> > regards
> >  marcel
> > 
> > 
> > Index: src/main/java/org/apache/jackrabbit/spi/NodeInfo.java
> > ===================================================================
> > --- src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (revision 
> > 617369)
> > +++ src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (working copy)
> > @@ -72,4 +72,31 @@
> >       * @see PropertyInfo#getId()
> >       */
> >      public Iterator getPropertyIds();
> > +
> > +    /**
> > +     * Returns the property infos of this node info or 
> > <code>null</code> if none
> > +     * are provided. If a non-null value is returned an implementation 
> > must
> > +     * return the complete list of property infos. If an implementation 
> > returns
> > +     * <code>null</code> a client must use {@link #getPropertyIds()} in
> > +     * conjunction with
> > +     * {@link RepositoryService#getPropertyInfo(SessionInfo, 
> > PropertyId)} to
> > +     * retrieve property infos.
> > +     *
> > +     * @return Returns the property infos of this node info or 
> > <code>null</code>
> > +     *         if none are provided.
> > +     */
> > +    public PropertyInfo[] getPropertyInfos();
> > +
> > +    /**
> > +     * Returns the child infos of this node info or <code>null</code> 
> > if none
> > +     * are provided. If a non-null value is returned an implementation 
> > must
> > +     * return the complete list of child infos. If an implementation 
> > returns
> > +     * <code>null</code> a client must use
> > +     * {@link RepositoryService#getChildInfos(SessionInfo, NodeId)} to 
> > retrieve
> > +     * the child infos for this node info.
> > +     *
> > +     * @return the child infos of this node info or <code>null</code> 
> > if none
> > +     *         are provided.
> > +     */
> > +    public ChildInfo[] getChildInfos();
> >  }
> > 
> 

 
 
 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
I'm not particularly attached to a specific solution, as long as we make 
progress.

That being said, here are a few comments on the proposal below:

- what we don't have is an efficient way to query a set of properties, 
with the ability for the SPI to return more; similarly to what we do 
with getQNodeTypeDefinitions(),

- we still don't have a cheap way just to retrieve the number of children,

- we introduce a second way to do something, instead of fixing the one 
we already have; so I'd prefer to change getChildInfos so that it can do 
what we need (return ItemInfos in addition to ChildInfos, provide the 
size without having to build the full result),

- need a way to throw exceptions.

Proposal:

- change the return value of getChildInfos so that size information is 
there (Iterator -> RangeIterator or Collection)

- change the values so that we can return ItemInfos instead of or in 
addition to ChildInfos (either by refactoring the class hierarchy so 
ItemInfo extends ChildInfo, or by just allowing both types in the Iterator).

BR, Julian


Marcel Reutegger wrote:
> Julian Reschke wrote:
>> Why don't we change things so that ItemInfo *extends* ChildInfo, so 
>> that SPI implementations can return NodeInfos as well? JCR2SPI could 
>> then use that to update its internal cache, avoiding to refetch the 
>> NodeInfos.
> 
> I'd rather keep it the way it is right now, but introduce accessors to 
> PropertyInfos and ChildInfos on the NodeInfo. As mentioned in a previous 
> mail, an implementation would then provide the complete 
> Property/ChildInfos or null (if it does not want to batch the 
> information or it thinks it is too expensive to do so). I think this 
> also nicely aligns with our experience with the bundle persistence 
> manager.  the overhead of loading property states is usually quickly 
> amortized because some of the properties of a node are read in most cases.
> 
> regards
>  marcel
> 
> 
> Index: src/main/java/org/apache/jackrabbit/spi/NodeInfo.java
> ===================================================================
> --- src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (revision 
> 617369)
> +++ src/main/java/org/apache/jackrabbit/spi/NodeInfo.java    (working copy)
> @@ -72,4 +72,31 @@
>       * @see PropertyInfo#getId()
>       */
>      public Iterator getPropertyIds();
> +
> +    /**
> +     * Returns the property infos of this node info or 
> <code>null</code> if none
> +     * are provided. If a non-null value is returned an implementation 
> must
> +     * return the complete list of property infos. If an implementation 
> returns
> +     * <code>null</code> a client must use {@link #getPropertyIds()} in
> +     * conjunction with
> +     * {@link RepositoryService#getPropertyInfo(SessionInfo, 
> PropertyId)} to
> +     * retrieve property infos.
> +     *
> +     * @return Returns the property infos of this node info or 
> <code>null</code>
> +     *         if none are provided.
> +     */
> +    public PropertyInfo[] getPropertyInfos();
> +
> +    /**
> +     * Returns the child infos of this node info or <code>null</code> 
> if none
> +     * are provided. If a non-null value is returned an implementation 
> must
> +     * return the complete list of child infos. If an implementation 
> returns
> +     * <code>null</code> a client must use
> +     * {@link RepositoryService#getChildInfos(SessionInfo, NodeId)} to 
> retrieve
> +     * the child infos for this node info.
> +     *
> +     * @return the child infos of this node info or <code>null</code> 
> if none
> +     *         are provided.
> +     */
> +    public ChildInfo[] getChildInfos();
>  }
> 


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> Why don't we change things so that ItemInfo *extends* ChildInfo, so that 
> SPI implementations can return NodeInfos as well? JCR2SPI could then use 
> that to update its internal cache, avoiding to refetch the NodeInfos.

I'd rather keep it the way it is right now, but introduce accessors to 
PropertyInfos and ChildInfos on the NodeInfo. As mentioned in a previous mail, 
an implementation would then provide the complete Property/ChildInfos or null 
(if it does not want to batch the information or it thinks it is too expensive 
to do so). I think this also nicely aligns with our experience with the bundle 
persistence manager. The overhead of loading property states is usually quickly 
amortized because some of the properties of a node are read in most cases.

regards
  marcel


Index: src/main/java/org/apache/jackrabbit/spi/NodeInfo.java
===================================================================
--- src/main/java/org/apache/jackrabbit/spi/NodeInfo.java	(revision 617369)
+++ src/main/java/org/apache/jackrabbit/spi/NodeInfo.java	(working copy)
@@ -72,4 +72,31 @@
       * @see PropertyInfo#getId()
       */
      public Iterator getPropertyIds();
+
+    /**
+     * Returns the property infos of this node info or <code>null</code> if none
+     * are provided. If a non-null value is returned an implementation must
+     * return the complete list of property infos. If an implementation returns
+     * <code>null</code> a client must use {@link #getPropertyIds()} in
+     * conjunction with
+     * {@link RepositoryService#getPropertyInfo(SessionInfo, PropertyId)} to
+     * retrieve property infos.
+     *
+     * @return Returns the property infos of this node info or <code>null</code>
+     *         if none are provided.
+     */
+    public PropertyInfo[] getPropertyInfos();
+
+    /**
+     * Returns the child infos of this node info or <code>null</code> if none
+     * are provided. If a non-null value is returned an implementation must
+     * return the complete list of child infos. If an implementation returns
+     * <code>null</code> a client must use
+     * {@link RepositoryService#getChildInfos(SessionInfo, NodeId)} to retrieve
+     * the child infos for this node info.
+     *
+     * @return the child infos of this node info or <code>null</code> if none
+     *         are provided.
+     */
+    public ChildInfo[] getChildInfos();
  }
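
On the client side, the contract in the patch above might be consumed roughly as follows: use the batched property infos when they are present, otherwise fall back to one call per property id. PropInfo, NodeInfoSketch, ServiceSketch, and PropertyLoader are simplified, hypothetical stand-ins, not the real SPI types, and plain strings replace PropertyId and SessionInfo.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical stand-ins for the SPI types; only what the sketch needs.
interface PropInfo {
    String name();
}

interface NodeInfoSketch {
    Iterator<String> getPropertyIds();

    PropInfo[] getPropertyInfos(); // null means "not batched"
}

interface ServiceSketch {
    PropInfo getPropertyInfo(String propertyId);
}

class PropertyLoader {
    // Uses the batched infos when present, else falls back to one call per id.
    static List<PropInfo> load(NodeInfoSketch node, ServiceSketch service) {
        List<PropInfo> result = new ArrayList<>();
        PropInfo[] batched = node.getPropertyInfos();
        if (batched != null) {
            for (PropInfo p : batched) {
                result.add(p);
            }
            return result;
        }
        for (Iterator<String> ids = node.getPropertyIds(); ids.hasNext(); ) {
            result.add(service.getPropertyInfo(ids.next()));
        }
        return result;
    }
}
```

Because a non-null array must be complete, the caller never has to merge batched and individually fetched properties.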

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Angela Schreiber wrote:
> Julian Reschke wrote:
>> Why don't we change things so that ItemInfo *extends* ChildInfo, 
> 
> that doesn't make sense from my point of view.
> 
> the usecase for the ChildInfo was to be able to build
> the HierarchyEntry for a Node where neither ids of
> the child-entries nor the references are required.
> 
> but the extra methods exposed by ChildInfo (and basically
> shared with NodeInfo as marcel pointed out) don't make
> sense for Properties.

OK, in that case let NodeInfo extend ChildInfo.

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
Julian Reschke wrote:
> Why don't we change things so that ItemInfo *extends* ChildInfo, 

that doesn't make sense from my point of view.

the usecase for the ChildInfo was to be able to build
the HierarchyEntry for a Node where neither ids of
the child-entries nor the references are required.

but the extra methods exposed by ChildInfo (and basically
shared with NodeInfo as marcel pointed out) don't make
sense for Properties.

angela

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Hi,

I've been playing around with SPI-side caching of the result of 
getChildInfos() (keeping the associated NodeInfos in memory), and I can 
gain something like a factor of 8 for SPI access.
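
The kind of cache meant here might look roughly like this sketch (not the code I measured): node infos gathered while listing children are remembered, so a later lookup by id is served from memory. NodeInfoCache and its methods are hypothetical, the key and value types are generic placeholders, and invalidation is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: node infos gathered while listing children are kept,
// so a later lookup by id does not go back to the store.
class NodeInfoCache<K, V> {
    private final Map<K, V> infos = new HashMap<>();
    private int storeHits = 0;

    // Called while building the getChildInfos() result.
    void remember(K nodeId, V nodeInfo) {
        infos.put(nodeId, nodeInfo);
    }

    // Returns the cached info, or loads and caches it on a miss.
    V get(K nodeId, Function<K, V> store) {
        V cached = infos.get(nodeId);
        if (cached == null) {
            storeHits++;
            cached = store.apply(nodeId);
            infos.put(nodeId, cached);
        }
        return cached;
    }

    // Number of lookups that had to go to the store.
    int storeHits() {
        return storeHits;
    }
}
```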

So getting back to Marcel's ideas:

Why don't we change things so that ItemInfo *extends* ChildInfo, so that 
SPI implementations can return NodeInfos as well? JCR2SPI could then use 
that to update its internal cache and avoid refetching the NodeInfos.

WDYT?

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Julian Reschke wrote:
> At the end of the day, what we should do is *measure* the performance of 
> JCR2SPI compared to native implementations. I'll try to submit a few 
> tests soon.
> ...

OK, I've got tests (not polished) and numbers.

Scenario:

A collection /a/b with 500 members, each 1024 in size, content type 
application/octet-stream.

Test code that obtains all members, checking content type, size, and 
total number.

My store can do that in ~80ms.

When doing it through SPI (with limited batch read support), it will take 
~1500ms.

Wrapping that with JCR2SPI, it takes around ~2700ms.

So it seems we need to drastically remove the overhead introduced by the 
SPI API.

Test code below:


   private String createTestCollJcr(String p_parentpath, int p_members, 
int p_size) throws Exception {
     Repository l_repository = getRepository();
     Session l_session = null;

     try {
       l_session = l_repository.login(getCredentials());

       Node l_folder = null;
       try {
         l_folder = (Node)l_session.getItem(p_parentpath + "/bigcoll");
       }
       catch (RepositoryException ex) {
         // nothing to do
       }

       // delete when needed
       if (l_folder != null) {
         l_folder.remove();
         l_session.save();
       }

       Node l_parent = (Node)l_session.getItem(p_parentpath);
       l_folder = l_parent.addNode("bigcoll", "nt:folder");
       assertNotNull(l_folder);

       long l_cnt = 0;

       while (l_cnt < p_members) {
         InputStream l_is = new BufferedInputStream(
             new ContentGenerator(p_size), p_size);
         Node l_new = l_folder.addNode("tst" + l_cnt, "nt:file");
         Node l_cnew = l_new.addNode("jcr:content", "nt:resource");
         l_cnew.setProperty("jcr:data", l_is);
         l_cnew.setProperty("jcr:mimeType", "application/octet-stream");
         l_session.save();
         l_cnt += 1;
       }
     }
     finally {
       if (l_session != null) {
         l_session.logout();
       }
     }

     return p_parentpath + "/bigcoll";
   }


   private static int BIGCOLLMEMBERS = 500;
   private static int BIGCOLLMEMBERSIZE = 1024;
   private static String BIGCOLLMIMETYPE = "application/octet-stream";

   public void testGetMembersSpi() throws Exception {

     String l_path = createTestColl(this.m_path, BIGCOLLMEMBERS,
         BIGCOLLMEMBERSIZE);

     RepositoryService l_rs = getRepositoryService();
     SessionInfo l_si = null;

     try {
       l_si = l_rs.obtain(getCredentials(), null);

       long l_start = System.currentTimeMillis();
       long l_cnt = 0;

       while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
         NodeId l_nid = TestPerf.computeNodeId(l_rs, l_si, l_path);
         int l_members = 0;
         for (Iterator<ChildInfo> l_it =
                  (Iterator<ChildInfo>)l_rs.getChildInfos(l_si, l_nid);
              l_it.hasNext(); ) {
           ChildInfo l_c = l_it.next();
           assertNotNull(l_c);
           NodeId l_cnid =
               l_rs.getIdFactory().createNodeId(l_c.getUniqueID());
           NodeInfo l_node = null;
           NodeInfo l_contentnode = null;
           PropertyInfo l_mimetype = null;
           PropertyInfo l_data = null;
           Iterator l_iteminfos = l_rs.getItemInfos(l_si, l_cnid);
           l_node = (NodeInfo)l_iteminfos.next();
           assertNotNull(l_node);

           while (l_iteminfos.hasNext()) {
             ItemInfo l_i = (ItemInfo)l_iteminfos.next();
             if (l_i.getParentId().equals(l_node.getId())
                 && NameConstants.JCR_CONTENT.equals(l_i.getName())) {
               l_contentnode = (NodeInfo)l_i;
             }
             if (l_contentnode != null
                 && l_i.getParentId().equals(l_contentnode.getId())
                 && NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
               l_mimetype = (PropertyInfo)l_i;
             }
             if (l_contentnode != null
                 && l_i.getParentId().equals(l_contentnode.getId())
                 && NameConstants.JCR_DATA.equals(l_i.getName())) {
               l_data = (PropertyInfo)l_i;
             }
           }

           if (l_contentnode == null) {
             // explicitly fetch the content node, it wasn't returned
             // with the parent
             NodeId l_contentnodeid = l_rs.getIdFactory().createNodeId(
                 l_c.getUniqueID(),
                 l_rs.getPathFactory().create(NameConstants.JCR_CONTENT));
             Iterator l_iteminfos2 = l_rs.getItemInfos(l_si,
                 l_contentnodeid);
             l_contentnode = (NodeInfo)l_iteminfos2.next();
             while (l_iteminfos2.hasNext()) {
               ItemInfo l_i = (ItemInfo)l_iteminfos2.next();
               if (l_i.getParentId().equals(l_contentnode.getId())
                   && NameConstants.JCR_MIMETYPE.equals(l_i.getName())) {
                 l_mimetype = (PropertyInfo)l_i;
               }
               if (l_i.getParentId().equals(l_contentnode.getId())
                   && NameConstants.JCR_DATA.equals(l_i.getName())) {
                 l_data = (PropertyInfo)l_i;
               }
             }
           }

           assertNotNull(l_contentnode);

           if (l_mimetype == null) {
             // explicitly fetch the mime type property, it wasn't
             // returned with the parent
             PropertyId l_mimetypeid = l_rs.getIdFactory().createPropertyId(
                 l_contentnode.getId(), NameConstants.JCR_MIMETYPE);
             l_mimetype = l_rs.getPropertyInfo(l_si, l_mimetypeid);
           }

           assertNotNull(l_mimetype);
           assertEquals(BIGCOLLMIMETYPE,
               l_mimetype.getValues()[0].getString());

           if (l_data == null) {
             // explicitly fetch the data property, it wasn't
             // returned with the parent
             PropertyId l_dataid = l_rs.getIdFactory().createPropertyId(
                 l_contentnode.getId(), NameConstants.JCR_DATA);
             l_data = l_rs.getPropertyInfo(l_si, l_dataid);
           }

           assertNotNull(l_data);
           assertEquals(BIGCOLLMEMBERSIZE,
               l_data.getValues()[0].getLength());

           l_members += 1;
         }
         assertEquals(BIGCOLLMEMBERS, l_members);
         l_cnt += 1;
       }

       long l_elapsed = System.currentTimeMillis() - l_start;

       LOG.info(String.format(
           "GetMembers - SPI: %.4fms per call (%d iterations)",
           (double)l_elapsed / l_cnt, l_cnt));

     }
     finally {
       if (l_si != null) {
         l_rs.dispose(l_si);
       }
     }
   }

   public void testGetMembersJcr() throws Exception {

     String l_path = createTestCollJcr(this.m_path, BIGCOLLMEMBERS,
         BIGCOLLMEMBERSIZE);

     Repository l_repository = getRepository();
     Session l_session = null;

     try {
       l_session = l_repository.login(getCredentials(), null);

       long l_start = System.currentTimeMillis();
       long l_cnt = 0;

       while (System.currentTimeMillis() - l_start < MS || l_cnt < 5) {
         Node l_dir = (Node)l_session.getItem(l_path);
         assertNotNull(l_dir);
         int l_members = 0;
         for (NodeIterator l_it = l_dir.getNodes(); l_it.hasNext(); ) {
           Node l_c = l_it.nextNode();
           Node l_e = l_c.getNode("jcr:content");
           String l_type = l_e.getProperty("jcr:mimeType").getString();
           long l_length = l_e.getProperty("jcr:data").getLength();
           assertTrue(l_c.isNode());
           assertEquals(BIGCOLLMIMETYPE, l_type);
           assertEquals(BIGCOLLMEMBERSIZE, l_length);
           l_members += 1;
         }
         assertEquals(BIGCOLLMEMBERS, l_members);
         l_session.refresh(false);
         l_cnt += 1;
       }

       long l_elapsed = System.currentTimeMillis() - l_start;

       LOG.info(String.format(
           "GetMembers - JCR: %.4fms per call (%d iterations)",
           (double)l_elapsed / l_cnt, l_cnt));

     }
     finally {
       if (l_session != null) {
         l_session.logout();
       }
     }
   }

   private class ContentGenerator extends InputStream {

     private long m_length;
     private long m_position;

     public ContentGenerator(long p_length) {
       this.m_length = p_length;
       this.m_position = 0;
     }

     public int read() {

       if (this.m_position++ < this.m_length) {
         return 0;
       }
       else {
         return -1;
       }
     }
   }
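
Both tests above share the same measurement pattern: repeat the work until a minimum wall-clock time has passed and at least 5 iterations have run, then report the mean time per call. A minimal sketch of that pattern as a reusable helper follows. The class name TimedLoop and its parameters are hypothetical (the constant MS used above is not shown in this message, so the sketch takes the minimum duration as an argument), and it uses System.nanoTime() rather than System.currentTimeMillis().

```java
/** Hypothetical helper that factors out the timing loop used by the tests. */
final class TimedLoop {

    private TimedLoop() {
    }

    /**
     * Runs the task until at least minMillis have elapsed AND at least
     * minIterations calls were made, then returns the mean
     * milliseconds per call.
     */
    static double meanMillis(Runnable task, long minMillis, int minIterations) {
        long start = System.nanoTime();
        long count = 0;

        // Same condition as the tests: keep going while either bound is unmet.
        while ((System.nanoTime() - start) / 1_000_000 < minMillis
                || count < minIterations) {
            task.run();
            count++;
        }

        double elapsedMillis = (System.nanoTime() - start) / 1e6;
        return elapsedMillis / count;
    }
}
```

With such a helper, the SPI and JCR variants would differ only in the Runnable they pass in, which makes it easier to check that both measure the same amount of work.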


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
hi marcel

Marcel Reutegger wrote:
> Here's another idea:
> 
> introduce a method ChildInfo[] NodeInfo.getChildInfos(). The method 
> either returns:
> 
> - all child infos, which also gives the correct number of child nodes. 
> this may also mean that an empty array is returned to indicate there are 
> no child nodes.
> - null, to indicate that there are *lots* of child nodes and the method 
> RepositoryService.getChildInfos() with the iterator should be used.

i'd say that should work.
at least it would open the following possibility:

- create NodeEntry that knows the
   - IDs of its properties (existing: NodeInfo.getPropertyId)
   - IDs and the order of its child-nodes (new)

- avoid an extra call to getChildInfos in the first place
   if only the existence of child-nodes needs to be checked
   without taking a look at the child-nodes themselves.

- if NodeInfo.getChildInfos() was null, the same behaviour
   would apply as today.

Just for clarification: we can't use something like 
NodeInfo.getChildIDs(), because the id may neither reveal the
name nor the index.

marcel, can you open an issue for that?
angela
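
A minimal sketch of how a client could consume the proposed contract (a complete array, possibly empty, or null meaning "use the iterator-based call"). The types ChildRef, NodeSnapshot, ChildSource and ChildResolver are hypothetical stand-ins written for this illustration; they are not the actual SPI interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a ChildInfo: name plus same-name-sibling index. */
final class ChildRef {
    final String name;
    final int index;

    ChildRef(String name, int index) {
        this.name = name;
        this.index = index;
    }
}

/** Hypothetical stand-in for a NodeInfo carrying the proposed method. */
interface NodeSnapshot {
    /** Complete, ordered child list; empty if none; null if not inlined. */
    ChildRef[] getChildInfos();
}

/** Hypothetical stand-in for the iterator-based getChildInfos() call. */
interface ChildSource {
    Iterator<ChildRef> fetchChildren();
}

final class ChildResolver {

    private ChildResolver() {
    }

    /** Calls the service only when the snapshot did not inline the children. */
    static List<ChildRef> resolve(NodeSnapshot node, ChildSource service) {
        ChildRef[] inline = node.getChildInfos();
        if (inline != null) {
            // Complete list (possibly empty): no extra round trip needed.
            return Arrays.asList(inline);
        }
        // null: fall back to the separate, iterator-based call.
        List<ChildRef> result = new ArrayList<>();
        for (Iterator<ChildRef> it = service.fetchChildren(); it.hasNext(); ) {
            result.add(it.next());
        }
        return result;
    }
}
```

The point of the sketch is that an empty array and null carry different meanings, so a caller checking only for the existence of child nodes never has to touch the service when the array is present.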

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Marcel Reutegger wrote:
>>> Hmmm, does that mean a batch read should also be allowed to return 
>>> ChildInfo, with the restriction that it must be complete, when sent?
>>
>> That would be less expensive than returning ItemInfos for the 
>> children. But would it be useful?
> 
> Maybe the more interesting question is, how useful is it to have the 
> distinction between NodeInfo and ChildInfo?
> 
> ChildInfo is basically a stripped down NodeInfo. With little effort it 
> would even be possible to have NodeInfo extends ChildInfo. Not sure how 
> useful that is, but since we don't have that inheritance in code and at 
> the same time nearly a 100% overlap it makes me suspicious.

Yep.

> Here's another idea:
> 
> introduce a method ChildInfo[] NodeInfo.getChildInfos(). The method 
> either returns:
> 
> - all child infos, which also gives the correct number of child nodes. 
> this may also mean that an empty array is returned to indicate there are 
> no child nodes.
> - null, to indicate that there are *lots* of child nodes and the method 
> RepositoryService.getChildInfos() with the iterator should be used.

Having the method on NodeInfo would help keeping state; but my 
impression was that this design pattern was something we don't do. For 
instance, why wouldn't we also use it for retrieving properties (which 
has similar problems)?

I am also not sure why we wouldn't just want getChildInfos to return 
something that can provide both the members and the count, and be 
evaluated lazily when needed.

>>>> And how should the SPI implementation know that somebody *wants* to 
>>>> retrieve all children?
>>>
>>> I'm not sure I understand your question, because there is 
>>> RepositoryService.getChildInfos(). Do you mean something else?
>>
>> I was thinking in terms of PROPFIND Depth 0/1, where 1 includes 0. 
>> That is, it's possible to return information about the node and all 
>> its children, saving yet another round trip. Which may not be 
>> sufficient justification.
>>
>> But returning ChildInfos will not be sufficient here, because there is 
>> no Batch aspect built in; thus, JCR2SPI still needs to fetch the 
>> ItemInfos for each child node in a separate call.
> 
> example content:
> 
> /a
>   /b
>   /c
> 
> Considering the above mentioned method an SPI implementation could return:
> NodeInfo(a, [ChildInfo(b), ChildInfo(c)]), NodeInfo(b, []), NodeInfo(c, [])
> 
> plus whatever properties are considered useful.
> 
> would that work?

Yes, in particular if we make NodeInfo extend ChildInfo, so no 
duplication is needed here.

BR, Julian
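
A minimal sketch of the "NodeInfo extends ChildInfo" idea applied to the /a, /b, /c example quoted above. All names here (ChildDesc, NodeDesc, SimpleNode, BatchCache) are hypothetical and do not mirror the real SPI types; the sketch only illustrates that one object can serve both as an entry in the parent's child list and as the full description of the node.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal child description. */
interface ChildDesc {
    String getName();
    int getIndex();
}

/** Hypothetical full node description; reuses ChildDesc instead of duplicating it. */
interface NodeDesc extends ChildDesc {
    String getPath();

    /** Complete ordered children, or null if they must be fetched separately. */
    List<ChildDesc> getChildren();
}

final class SimpleNode implements NodeDesc {
    private final String path;
    private final List<ChildDesc> children;

    SimpleNode(String path, List<ChildDesc> children) {
        this.path = path;
        this.children = children;
    }

    public String getName() {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public int getIndex() {
        return 1; // the sketch ignores same-name siblings
    }

    public String getPath() {
        return path;
    }

    public List<ChildDesc> getChildren() {
        return children;
    }
}

/** Hypothetical client-side cache filled from one batch response. */
final class BatchCache {
    private final Map<String, NodeDesc> byPath = new LinkedHashMap<>();

    void addAll(List<NodeDesc> batch) {
        for (NodeDesc n : batch) {
            byPath.put(n.getPath(), n);
        }
    }

    /** True if the children of the node at path are known without another call. */
    boolean hasCompleteChildren(String path) {
        NodeDesc n = byPath.get(path);
        return n != null && n.getChildren() != null;
    }
}
```

Because b and c are NodeDesc instances, the same objects can be placed in the child list of a and in the batch itself, which is the "no duplication" point made above.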


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
David Rauschenbach wrote:

> Back to the discussion: also worth mentioning is why a requested-depth 
> argument is missing from getItemInfos.

first: spi is not webdav.
second: at the last spi f2f (public invitation,
attendees: julian, marcel, jukka, myself)
we discussed the batch-read.

we decided:
- it's implementation specific how and if the impl allows
   configuring the batch-read.
- we don't want the depth param, because the client simply
   doesn't know the nodetype of the node it is requesting
   and therefore cannot decide beforehand about the depth.
- the spi2jcr impl will be an example how the depth-to-nodetype
   configuration will be passed to the spi.
- we want this configuration to be part of neither
   the SPI nor JCR2SPI.

sorry for the short answer... just running away.
angela
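
A minimal sketch of what a "depth-to-nodetype configuration" kept on the implementation side could look like. The class DepthByNodeType and its methods are hypothetical and written only for illustration; this is not the configuration class used by spi2jcr, and the depth values are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical implementation-side configuration: batch-read depth per node type. */
final class DepthByNodeType {
    private final Map<String, Integer> depths = new HashMap<>();
    private final int defaultDepth;

    DepthByNodeType(int defaultDepth) {
        this.defaultDepth = defaultDepth;
    }

    /** Registers the depth to read for nodes of the given primary type. */
    DepthByNodeType set(String nodeTypeName, int depth) {
        depths.put(nodeTypeName, depth);
        return this;
    }

    /** Depth to use for a node of this type; falls back to the default. */
    int depthFor(String nodeTypeName) {
        Integer depth = depths.get(nodeTypeName);
        return depth != null ? depth : defaultDepth;
    }
}
```

The implementation would look up the depth once it has loaded the requested node and knows its type, which is exactly the information the client lacks when it issues the request.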


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> Yeah, I think I hear what you're saying, and I understand that tight
> spot. I have to confess that I also use JCR2SPI as the primary client
> for SPI. But I have performance problems to solve, so that <n> JCR calls
> don't explode into <n>*6 SPI invocations, so I have to play all kinds of
> tricks now, to divine extra information from the client and/or server.

we should definitely improve that situation. so far we didn't invest too much 
time in performance analysis but first wanted to have an SPI stack that works 
correctly. I also think that it is now time to carefully analyze the message 
complexity for each JCR call and if needed change the SPI interfaces.

> I like SPI because of its simplicity. But performance is problematic, and
> outside of my control right now, [...]

please let us know what issues you have with the SPI stack. feedback is always 
welcome and gives us an additional view on the SPI that we probably overlooked 
in the past.

you are also welcome to gain control ;) if you have ideas how to improve the SPI 
stack or have patches, please let us know and we will be happy to consider them.

> [...] and I have caches and NodeTypeManagers in my SPIs, even though I am not
>  supposed to.

at some point node type definitions were requested extensively. JCR-1030 should 
have improved that situation.

> I also have my own PathElement comparators, to get SPI to work,
> so that 0 (unspecified) and 1 (default) indexes are considered equivalent,
> but that is another story...

Can you please describe in more detail why you had to do this?

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <da...@synchronica.com>.
 
Yeah, I think I hear what you're saying, and I understand that tight
spot. I have to confess that I also use JCR2SPI as the primary client
for SPI. But I have performance problems to solve, so that <n> JCR calls
don't explode into <n>*6 SPI invocations, so I have to play all kinds of
tricks now, to divine extra information from the client and/or server.
BatchReadConfig is further evidence of the need for this kind of thing.

Anyway, I am also very shocked to see XPath and SQL being "distanced" in
JSR283. Because the point I was going to make, before that revelation,
was that an XPath or SQL query processor would be just the kind of
source of that extra intent that would allow the middleware harbor to
dispatch the correct number of ships across the channel.

It's interesting that JSR283 extols the virtues of multiple access
paradigms (XPath / SQL and hierarchical traversal), but at the same time
makes it seem like it's headed towards hierarchical traversal only.

It reminds me of what drives me crazy about the database market. Back in
the ISAM days of xBase, Paradox, Raima, etc, we could all traverse tens
of thousands of records per-second. Then SQL caught on, and you could
then do both traversal and sets, which was like the best of both worlds.
Then a funny thing happened -- traversal disappeared, and was lost for
15 years, and data access became slow, since everything had to be
shoe-horned into SQL, or some ODBC/JDBC batch mode, which might require
turning off indexes or ACID protection! Whatever. Now mixed-mode
ISAM/SQL engines are slowly coming back, even though the database
engines are few and far between that go out of their way to support
both.

It's hard to imagine getting much done with an Exchange server via
WebDAV, without query support. Talk about needing a shoe-horn, doing
everything via iteration! I'm guessing content repository vendors are
steering the ship, when what we have here is a very good API that also
works for content middleware.

Conceptually, I think the best way to think of SPI is to still pretend
there's WebDAV in the front and back, with SPI in the middle. If there's
a client doing JCR hierarchical traversal via JCR2SPI, then you end up
with small high-frequency SPI requests. If you do XPath or SQL over SPI,
then you end up with fewer, fatter SPI requests, like a PROPFIND. Or if
you're proxying JCR content without a JCR client per-se at the front,
then you have an API that can relay the *content* of JSR170, without
needing to care too much about whether the front-end is JCR, WebDAV,
IMAP, RSS, or some other protocol endpoint.

From a middleware point of view, which I would call SPI's point of view,
you only need to have some idea of what you're dealing with, which is
nodes & properties, depths and namespaces, collections & filters,
queries and observation, and a session. It shouldn't matter whether JCR
is at the front end, or something else like a WebDAV proxy shaping more
specific requests.

That's just my 2 cents. I like SPI because of its simplicity. But
performance is problematic, and outside of my control right now, and I
have caches and NodeTypeManagers in my SPIs, even though I am not
supposed to. I also have my own PathElement comparators, to get SPI to
work, so that 0 (unspecified) and 1 (default) indexes are considered
equivalent, but that is another story...

David
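
A minimal sketch of the index normalization described above, where a path element with index 0 (unspecified) compares equal to one with index 1 (default). The class Step and its constant names are hypothetical and written for this illustration; this is not the comparator actually used.

```java
import java.util.Objects;

/** Hypothetical path element: a name plus a same-name-sibling index. */
final class Step {
    static final int INDEX_UNDEFINED = 0; // no index given, e.g. "a"
    static final int INDEX_DEFAULT = 1;   // explicit first sibling, e.g. "a[1]"

    private final String name;
    private final int index;

    Step(String name, int index) {
        this.name = name;
        this.index = index;
    }

    /** Maps the unspecified index onto the default one. */
    int normalizedIndex() {
        return index == INDEX_UNDEFINED ? INDEX_DEFAULT : index;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Step)) {
            return false;
        }
        Step other = (Step) o;
        return name.equals(other.name)
                && normalizedIndex() == other.normalizedIndex();
    }

    @Override
    public int hashCode() {
        // Must use the normalized index so equal steps hash alike.
        return Objects.hash(name, normalizedIndex());
    }
}
```

Normalizing inside both equals() and hashCode() keeps the two consistent, so such elements can be used as keys in hash-based caches.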


On Fri, 2008-02-08 at 18:02 +0100, Marcel Reutegger wrote:
> David Rauschenbach wrote:
> > also worth mentioning is why a requested-depth argument is missing from
> > getItemInfos. It's just a little strange for the server to choose what to do,
> > or to have a pre-configured nodetype-specific batch strategy configured
> > there, when the client is where it's at, where it's known what's to be
> > requested.
> 
> our primary SPI client that we have in mind is jcr2spi. here we are in the same 
> tight spot. jcr2spi does not know in advance what properties a client will 
> request after it got a node. even if we had the ability in the SPI to pass a 
> hint, jcr2spi cannot make use of it in a reasonable way.
> 
> for jcr2spi there are only two patterns it can distinguish. a JCR client gets a 
> named item (getNode/Property()) or an iterator over items 
> (getNodes/Properties()). At least the latter should not result in individual 
> calls for each item.
> 
> regards
>   marcel

 
Visit Synchronica at GSMA Mobile World Congress, Barcelona, 11-14 Feb, Hall 2, Booth #2J25
 
 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> also worth mentioning is why a requested-depth argument is missing from
> getItemInfos. It's just a little strange for the server to choose what to do,
> or to have a pre-configured nodetype-specific batch strategy configured
> there, when the client is where it's at, where it's known what's to be
> requested.

our primary SPI client that we have in mind is jcr2spi. here we are in the same 
tight spot. jcr2spi does not know in advance what properties a client will 
request after it got a node. even if we had the ability in the SPI to pass a 
hint, jcr2spi cannot make use of it in a reasonable way.

for jcr2spi there are only two patterns it can distinguish. a JCR client gets a 
named item (getNode/Property()) or an iterator over items 
(getNodes/Properties()). At least the latter should not result in individual 
calls for each item.

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
David Rauschenbach wrote:
>  
> Distraction:
> 
> WebDAV notation is better for examples (nodes/collections end with slashes, items don't):
> 
> a/
>   b/
>   c/
>   p1
>   p2
> d/
> 
> Back to the discussion: also worth mentioning is why a requested-depth argument is missing from getItemInfos. It's just a little strange for the server to choose what to do, or to have a pre-configured nodetype-specific batch strategy configured there, when the client is where it's at, where it's known what's to be requested. Particularly if your front end is a WebDAV query, which you're going to run through SPI to some WebDAV or other back-end. No matter what the back-end is, you want to know the kinds of things you'd know if you were a WebDAV servlet, including not only ItemId, but also depth, and named property lists, or allprops.

Maybe there is something to learn from PROPFIND, after all -- not only 
the distinction between depth 0 and 1, but also the ability to specify 
in advance what properties (== child items) to return.

BR, Julian

RE: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <Da...@SYNCHRONICA.COM>.
 
Distraction:

WebDAV notation is better for examples (nodes/collections end with slashes, items don't):

a/
  b/
  c/
  p1
  p2
d/

Back to the discussion: also worth mentioning is why a requested-depth argument is missing from getItemInfos. It's just a little strange for the server to choose what to do, or to have a pre-configured nodetype-specific batch strategy configured there, when the client is where it's at, where it's known what's to be requested. Particularly if your front end is a WebDAV query, which you're going to run through SPI to some WebDAV or other back-end. No matter what the back-end is, you want to know the kinds of things you'd know if you were a WebDAV servlet, including not only ItemId, but also depth, and named property lists, or allprops.

David
-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Fri 2/8/2008 5:21 PM
To: dev@jackrabbit.apache.org
Subject: Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session
 
Julian Reschke wrote:
> Marcel Reutegger wrote:
>> I think your question reveals a design flaw in the batch read method 
>> RepositoryService.getItemInfos(). And it's not just about knowing all 
>> children, it's also about the order of nodes. Even if we are sure we 
>> got all children we still have to call getChildInfos() in case the 
>> parent supports orderable child nodes. Angela, please correct me, if 
>> I'm way off here...
> 
> Couldn't we just require that for orderable child nodes, their ItemInfos 
> are returned in order?

we could, but the issue whether child node infos are complete remains...

>> Hmmm, does that mean a batch read should also be allowed to return 
>> ChildInfo, with the restriction that it must be complete, when sent?
> 
> That would be less expensive than returning ItemInfos for the children. 
> But would it be useful?

Maybe the more interesting question is, how useful is it to have the distinction 
between NodeInfo and ChildInfo?

ChildInfo is basically a stripped down NodeInfo. With little effort it would 
even be possible to have NodeInfo extends ChildInfo. Not sure how useful that 
is, but since we don't have that inheritance in code and at the same time nearly 
a 100% overlap it makes me suspicious.

Here's another idea:

introduce a method ChildInfo[] NodeInfo.getChildInfos(). The method either returns:

- all child infos, which also gives the correct number of child nodes. this may 
also mean that an empty array is returned to indicate there are no child nodes.
- null, to indicate that there are *lots* of child nodes and the method 
RepositoryService.getChildInfos() with the iterator should be used.

>>> And how should the SPI implementation know that somebody *wants* to 
>>> retrieve all children?
>>
>> I'm not sure I understand your question, because there is 
>> RepositoryService.getChildInfos(). Do you mean something else?
> 
> I was thinking in terms of PROPFIND Depth 0/1, where 1 includes 0. That 
> is, it's possible to return information about the node and all its 
> children, saving yet another round trip. Which may not be sufficient 
> justification.
> 
> But returning ChildInfos will not be sufficient here, because there is 
> no Batch aspect built in; thus, JCR2SPI still needs to fetch the 
> ItemInfos for each child node in a separate call.

example content:

/a
   /b
   /c

Considering the above mentioned method an SPI implementation could return:
NodeInfo(a, [ChildInfo(b), ChildInfo(c)]), NodeInfo(b, []), NodeInfo(c, [])

plus whatever properties are considered useful.

would that work?

regards
  marcel


 
 
 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> Marcel Reutegger wrote:
>> I think your question reveals a design flaw in the batch read method 
>> RepositoryService.getItemInfos(). And it's not just about knowing all 
>> children, it's also about the order of nodes. Even if we are sure we 
>> got all children we still have to call getChildInfos() in case the 
>> parent supports orderable child nodes. Angela, please correct me, if 
>> I'm way off here...
> 
> Couldn't we just require that for orderable child nodes, their ItemInfos 
> are returned in order?

we could, but the issue whether child node infos are complete remains...

>> Hmmm, does that mean a batch read should also be allowed to return 
>> ChildInfo, with the restriction that it must be complete, when sent?
> 
> That would be less expensive than returning ItemInfos for the children. 
> But would it be useful?

Maybe the more interesting question is, how useful is it to have the distinction 
between NodeInfo and ChildInfo?

ChildInfo is basically a stripped down NodeInfo. With little effort it would 
even be possible to have NodeInfo extends ChildInfo. Not sure how useful that 
is, but since we don't have that inheritance in code and at the same time nearly 
a 100% overlap it makes me suspicious.

Here's another idea:

introduce a method ChildInfo[] NodeInfo.getChildInfos(). The method either returns:

- all child infos, which also gives the correct number of child nodes. this may 
also mean that an empty array is returned to indicate there are no child nodes.
- null, to indicate that there are *lots* of child nodes and the method 
RepositoryService.getChildInfos() with the iterator should be used.

>>> And how should the SPI implementation know that somebody *wants* to 
>>> retrieve all children?
>>
>> I'm not sure I understand your question, because there is 
>> RepositoryService.getChildInfos(). Do you mean something else?
> 
> I was thinking in terms of PROPFIND Depth 0/1, where 1 includes 0. That 
> is, it's possible to return information about the node and all its 
> children, saving yet another round trip. Which may not be sufficient 
> justification.
> 
> But returning ChildInfos will not be sufficient here, because there is 
> no Batch aspect built in; thus, JCR2SPI still needs to fetch the 
> ItemInfos for each child node in a separate call.

example content:

/a
   /b
   /c

Considering the above mentioned method an SPI implementation could return:
NodeInfo(a, [ChildInfo(b), ChildInfo(c)]), NodeInfo(b, []), NodeInfo(c, [])

plus whatever properties are considered useful.

would that work?

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Marcel Reutegger wrote:
>> How would JCR2SPI know that *all* children have been returned?
> 
> good question! I don't know. I'm not that familiar anymore with the 
> jcr2spi code. I remember a discussion with angela about a flag, which 
> indicates that child node entries are complete. But I guess we never 
> implemented this.

Yup.

But it's good that we finally agree that there's an unresolved issue 
here :-)

> I think your question reveals a design flaw in the batch read method 
> RepositoryService.getItemInfos(). And it's not just about knowing all 
> children, it's also about the order of nodes. Even if we are sure we got 
> all children we still have to call getChildInfos() in case the parent 
> supports orderable child nodes. Angela, please correct me, if I'm way 
> off here...

Couldn't we just require that for orderable child nodes, their ItemInfos 
are returned in order?

> Hmmm, does that mean a batch read should also be allowed to return 
> ChildInfo, with the restriction that it must be complete, when sent?

That would be less expensive than returning ItemInfos for the children. 
But would it be useful?

>> And how should the SPI implementation know that somebody *wants* to 
>> retrieve all children?
> 
> I'm not sure I understand your question, because there is 
> RepositoryService.getChildInfos(). Do you mean something else?

I was thinking in terms of PROPFIND Depth 0/1, where 1 includes 0. That 
is, it's possible to return information about the node and all its 
children, saving yet another round trip. Which may not be sufficient 
justification.

But returning ChildInfos will not be sufficient here, because there is 
no Batch aspect built in; thus, JCR2SPI still needs to fetch the 
ItemInfos for each child node in a separate call.

BR, Julian


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> Marcel Reutegger wrote:
>> David Rauschenbach wrote:
>>> My main problem with SPI is that if I return depth=1 results from
>>> getItemInfos (a NodeInfo for each subfolder), JCR2SPI ends up 
>>> subsequently
>>> calling getChildInfos anyway, to find out what ALL the children are,
>>> regardless of the fact that I just returned what all the children are 
>>> in my
>>> GetItemInfos response.
>>
>> this is because jackrabbit-jcr2spi did not have a cache. angela 
>> recently committed the changes discussed in JCR-1011. this introduces 
>> a cache and should avoid the calls for the children if they were 
>> delivered in a previous call.
> 
> How would JCR2SPI know that *all* children have been returned?

good question! I don't know. I'm not that familiar anymore with the jcr2spi 
code. I remember a discussion with angela about a flag, which indicates that 
child node entries are complete. But I guess we never implemented this.

I think your question reveals a design flaw in the batch read method 
RepositoryService.getItemInfos(). And it's not just about knowing all children, 
it's also about the order of nodes. Even if we are sure we got all children we 
still have to call getChildInfos() in case the parent supports orderable child 
nodes. Angela, please correct me, if I'm way off here...

Hmmm, does that mean a batch read should also be allowed to return ChildInfo, 
with the restriction that it must be complete, when sent?

> And how should 
> the SPI implementation know that somebody *wants* to retrieve all children?

I'm not sure I understand your question, because there is 
RepositoryService.getChildInfos(). Do you mean something else?

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Marcel Reutegger wrote:
> David Rauschenbach wrote:
>> My main problem with SPI is that if I return depth=1 results from
>> getItemInfos (a NodeInfo for each subfolder), JCR2SPI ends up 
>> subsequently
>> calling getChildInfos anyway, to find out what ALL the children are,
>> regardless of the fact that I just returned what all the children are 
>> in my
>> GetItemInfos response.
> 
> this is because jackrabbit-jcr2spi did not have a cache. angela recently 
> committed the changes discussed in JCR-1011. this introduces a cache and 
> should avoid the calls for the children if they were delivered in a 
> previous call.

How would JCR2SPI know that *all* children have been returned? And how should 
the SPI implementation know that somebody *wants* to retrieve all children?

> ...

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
David Rauschenbach wrote:
> My main problem with SPI is that if I return depth=1 results from
> getItemInfos (a NodeInfo for each subfolder), JCR2SPI ends up subsequently
> calling getChildInfos anyway, to find out what ALL the children are,
> regardless of the fact that I just returned what all the children are in my
> GetItemInfos response.

this is because jackrabbit-jcr2spi did not have a cache. angela recently 
committed the changes discussed in JCR-1011. this introduces a cache and should 
avoid the calls for the children if they were delivered in a previous call.

> It would also not hurt for depth=0 results to be able to return the
> equivalent of IMAP's \HasChildren flag. Because in that case, getItemInfos
> could return depth=0 results, and then in the case where there are no
> children, JCR2SPI could avoid the unnecessary getChildInfos when there are no
> results.

we already considered this, see: JCR-1239

cheers
  marcel

RE: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by David Rauschenbach <Da...@SYNCHRONICA.COM>.
 
My main problem with SPI is that if I return depth=1 results from getItemInfos (a NodeInfo for each subfolder), JCR2SPI ends up subsequently calling getChildInfos anyway, to find out what ALL the children are, regardless of the fact that I just returned what all the children are in my GetItemInfos response.

From JCR2SPI's point of view, the problem seems to be that it had no way of knowing whether I returned SOME, or ALL, of the subfolders. In that sense, it's the Iterator return value types that I have a problem with. I would prefer returning a container, that could encapsulate the Iterator, but also allow for declaring formalized hints, and maybe even out-of-band data that JCR2SPI chooses to look at as hints.

And the hint JCR2SPI needs from getItemInfos is "response includes all children" (or "this response contains depth=1 results").

It would also not hurt for depth=0 results to be able to return the equivalent of IMAP's \HasChildren flag. Because in that case, getItemInfos could return depth=0 results, and then in the case where there are no children, JCR2SPI could avoid the unnecessary getChildInfos when there are no results.
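
The container idea above could be sketched roughly as follows. This is only an illustration under assumptions: ChildBatch and its methods are hypothetical names and are not part of the SPI; the two flags stand for the "response includes all children" hint and a \HasChildren-style hint.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical container (not an SPI type): wraps the returned items
// together with hints the caller may or may not choose to use.
final class ChildBatch<T> implements Iterable<T> {
    private final List<T> items;
    private final boolean complete;    // "response includes all children"
    private final boolean hasChildren; // depth=0 hint, similar to IMAP's HasChildren flag

    ChildBatch(List<T> items, boolean complete, boolean hasChildren) {
        this.items = Collections.unmodifiableList(new ArrayList<T>(items));
        this.complete = complete;
        this.hasChildren = hasChildren;
    }

    @Override
    public Iterator<T> iterator() {
        return items.iterator();
    }

    boolean isComplete() {
        return complete;
    }

    boolean hasChildren() {
        return hasChildren;
    }

    // A follow-up child lookup is only needed when children exist
    // and the batch did not already deliver all of them.
    boolean needsChildLookup() {
        return hasChildren && !complete;
    }
}
```

A caller would check needsChildLookup() before issuing the extra call. The sketch says nothing about the order of the children.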

David
-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Thu 2/7/2008 3:49 PM
To: dev@jackrabbit.apache.org
Subject: Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session
 
Julian Reschke wrote:
> I think I understand batch read, and how JCR2SPI would use that. What I 
> don't see is how it helps in this case.
> 
> An SPI implementation *could* return ItemInfos for all children when the 
> NodeInfo for a collection is fetched, but how would it know that anybody 
> wants to see the members?

Angela and I discussed this some time ago and we decided that for now we leave 
it up to the implementation, basically for simplicity. See also the javadoc of 
RepositoryService.getItemInfos().

 

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> I think I understand batch read, and how JCR2SPI would use that. What I 
> don't see is how it helps in this case.
> 
> An SPI implementation *could* return ItemInfos for all children when the 
> NodeInfo for a collection is fetched, but how would it know that anybody 
> wants to see the members?

Angela and I discussed this some time ago and we decided that for now we leave 
it up to the implementation, basically for simplicity. See also the javadoc of 
RepositoryService.getItemInfos().

>>> I have the feeling that we're optimizing for the wrong use case here.
>>>
>>> If we can't make *read* access efficient enough, we're in trouble. 
>>> And I really don't want to require every SPI implementation to 
>>> subscribe to events from the underlying store, in particular if it's 
>>> remote (think HTTP).
>>
>> that's why I don't even want to get into this business. but if an 
>> implementation wants to cache something it is responsible for 
>> maintaining it.
> 
> That's a broad statement.
> 
> JCR includes "refresh" for good reasons. Are you arguing that it's not 
> needed, and a JCR implementation is responsible for that as well?

It may be needed, but only at the upper level of a JCR implementation. 
Session.refresh() only has an effect on the current session and does not change 
the persistent state, nor does it affect other sessions. Translating this into 
the SPI design, where everything above the SPI is session local 
(transient changes, namespace mappings, etc.), the refresh IMO belongs only in 
this layer and not in the SPI implementation, where we rather deal with the 
persistent storage of items.

> I think that would be a fundamentally bad idea, because whether cache 
> information needs to be fresh depends on what the client does. There's 
> no way the JCR or the SPI implementation would know.

I'm open to discuss this issue, but to me this is rather about a more 
intelligent batch read.

> If a client does a collection listing, asking for a limited set of 
> properties of the members (name, timestamps, mime type, length), it 
> really doesn't care much. However, the SPI implementation has no 
> knowledge about the context in which the information in the NodeInfo is 
> needed, and thus has no way to optimize the operation.

I agree, but this shouldn't be solved individually in each SPI implementation 
using a cache. To me it seems the batch read should be more intelligent and pass 
additional information about what is actually needed. We might want to introduce 
something like BatchReadConfig into the SPI [1].

>>> JCR clients today can not rely on fresh session information unless 
>>> they do a refresh(), and it's unclear to me why we would require that 
>>> from an SPI implementation.
>>
>> it is a fundamental requirement that the SPI implementation provides 
>> the most up-to-date item that is available. the refresh semantic is 
>> only relevant in the context of jcr2spi but not the SPI itself.
> 
> Where does this requirement come from? Is it stated somewhere?

It's not stated explicitly, but the RepositoryService says:

"The RepositoryService interface defines methods used to retrieve information 
from the persistent layer of the repository as well as the methods that modify 
its persistent state."

And RepositoryService.getItemInfos() says:

"Method used to 'batch-read' from the persistent storage."

Note that both say 'persistent storage', which is why I understand there 
shouldn't be a cache in between that is stale.

> Did you 
> ever try to compare performance between native Jackrabbit, and an SPI 
> based solution for operations like the one mentioned above?

Yes I did, but the numbers very much depend on the setup. If there is remoting 
in between, the SPI-based repository is significantly slower because there are 
lots of round-trips. If everything is in one process the difference is much 
smaller. The SPI calls however can be reduced significantly when the batch-read 
is configured properly and JCR-1011 is in use.

>> Again any call using a SessionInfo should return the most up-to-date 
>> item(s) that are requested.
> 
> Requiring this sounds nice in theory, but I'm *very* skeptical that it 
> works in practice.

That's why I wrote 'should' ;)

I think it does no harm if an SPI implementation provides an item that is 
slightly out of date, because the moment an item is delivered it may already be 
modified again by another session. An SPI client must be able to handle that 
situation. The InvalidItemStateException is used in that situation.
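
How a client might cope with such a slightly stale item can be sketched as below. This is a minimal illustration under assumptions: StaleItemException is a local, unchecked stand-in for javax.jcr.InvalidItemStateException (which is a checked exception in the real API), and VersionedStore and RetryingClient are hypothetical names, not Jackrabbit classes.

```java
// Local stand-in for javax.jcr.InvalidItemStateException, so that the
// sketch compiles without the JCR API on the classpath.
class StaleItemException extends RuntimeException {
    StaleItemException(String message) {
        super(message);
    }
}

// Hypothetical store that rejects writes based on an outdated version.
final class VersionedStore {
    private int version = 0;
    private String value = "";

    synchronized int currentVersion() {
        return version;
    }

    synchronized String value() {
        return value;
    }

    synchronized void write(int expectedVersion, String newValue) {
        if (expectedVersion != version) {
            throw new StaleItemException("item was modified by another session");
        }
        value = newValue;
        version++;
    }
}

final class RetryingClient {
    // Tries the write with the version the client knows; on a stale
    // state it re-reads the version (the "refresh" step) and retries.
    // Returns the number of attempts that were needed.
    static int saveWithRetry(VersionedStore store, int knownVersion,
                             String newValue, int maxAttempts) {
        int version = knownVersion;
        for (int attempt = 1; ; attempt++) {
            try {
                store.write(version, newValue);
                return attempt;
            } catch (StaleItemException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                version = store.currentVersion();
            }
        }
    }
}
```

Whether a blind retry is appropriate depends on the client; some callers would rather surface the conflict to the user.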

>>  > If the JCR client does call "refresh()", we really should pass that
>>  > information to SPI, either by a new method (which could be more
>>  > elaborate than just refresh() as mentioned by Angela), or [...]
>>
>> That's IMO a more relevant use case that we should consider rather 
>> than caching.
> 
> I'm not sure how this is a different use case, but I really don't care 
> for the motivation.
> 
> At the end of the day, what we should do is *measure* the performance of 
> JCR2SPI compared to native implementations. I'll try to submit a few 
> tests soon.

We already have some tests. Just build jackrabbit and compare the run times of 
jackrabbit-core and jackrabbit-jcr2spi. On my machine jackrabbit-core 
runs the api tests in 33 seconds while jackrabbit-jcr2spi runs them in 48 
seconds. That means the additional spi layers add 45% overhead.
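
For reference, the 45% figure follows from (48 - 33) / 33. A tiny helper to redo the calculation with other timings might look like this (the class and method names are made up for illustration):

```java
final class Overhead {
    // Relative overhead of a measured run time over a baseline, in percent.
    static double overheadPercent(double baselineSeconds, double measuredSeconds) {
        return (measuredSeconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        // 33 s for jackrabbit-core, 48 s for jackrabbit-jcr2spi
        System.out.println(Math.round(Overhead.overheadPercent(33, 48)) + "%");
    }
}
```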

regards
  marcel

[1] 
http://svn.apache.org/repos/asf/jackrabbit/tags/1.4/jackrabbit-spi2jcr/src/main/java/org/apache/jackrabbit/spi2jcr/BatchReadConfig.java

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Marcel Reutegger wrote:
>> Example - obtaining a directory listing: SPI2JCR currently gets the 
>> NodeInfo for the collection, then gets the ChildInfo iterator, then 
>> for each NodeId of a child fetches that child's NodeInfo.
>>
>> For a collection of N members, this translates to N additional 
>> roundtrips to the store (with WebDAV, PROPFINDs on each child 
>> resource, although a single PROPFIND with Depth 1 would have been 
>> sufficient).
>>
>> It's not clear to me how it would be able to avoid this with the 
>> current SPI interfaces while disallowing SPI to cache.
> 
> see JCR-1011. we just have to commit the patch.

I think I understand batch read, and how JCR2SPI would use that. What I 
don't see is how it helps in this case.

An SPI implementation *could* return ItemInfos for all children when the 
NodeInfo for a collection is fetched, but how would it know that anybody 
wants to see the members?

>> I have the feeling that we're optimizing for the wrong use case here.
>>
>> If we can't make *read* access efficient enough, we're in trouble. And 
>> I really don't want to require every SPI implementation to subscribe 
>> to events from the underlying store, in particular if it's remote 
>> (think HTTP).
> 
> that's why I don't even want to get into this business. but if an 
> implementation wants to cache something it is responsible for 
> maintaining it.

That's a broad statement.

JCR includes "refresh" for good reasons. Are you arguing that it's not 
needed, and a JCR implementation is responsible for that as well?

I think that would be a fundamentally bad idea, because whether cache 
information needs to be fresh depends on what the client does. There's 
no way the JCR or the SPI implementation would know.

If a client does a collection listing, asking for a limited set of 
properties of the members (name, timestamps, mime type, length), it 
really doesn't care much. However, the SPI implementation has no 
knowledge about the context in which the information in the NodeInfo is 
needed, and thus has no way to optimize the operation.

>> JCR clients today can not rely on fresh session information unless 
>> they do a refresh(), and it's unclear to me why we would require that 
>> from an SPI implementation.
> 
> it is a fundamental requirement that the SPI implementation provides the 
> most up-to-date item that is available. the refresh semantic is only 
> relevant in the context of jcr2spi but not the SPI itself.

Where does this requirement come from? Is it stated somewhere? Did you 
ever try to compare performance between native Jackrabbit, and an SPI 
based solution for operations like the one mentioned above?

>> [...] or just discard the SessionInfo and get a fresh one.
> 
> that's contrary to how the SessionInfo is designed. It is meant to be 
> the result of a successful authentication. If it holds state information 
> that is relevant to the server (e.g. a cache, a JCR session, JDBC 
> connection, ...) it is the responsibility of the implementation to 
> maintain it. An SPI client does not need nor use that information directly.

I didn't claim it does.

> Again any call using a SessionInfo should return the most up-to-date 
> item(s) that are requested.

Requiring this sounds nice in theory, but I'm *very* skeptical that it 
works in practice.

>  > If the JCR client does call "refresh()", we really should pass that
>  > information to SPI, either by a new method (which could be more
>  > elaborate than just refresh() as mentioned by Angela), or [...]
> 
> That's IMO a more relevant use case that we should consider rather than 
> caching.

I'm not sure how this is a different use case, but I really don't care 
for the motivation.

At the end of the day, what we should do is *measure* the performance of 
JCR2SPI compared to native implementations. I'll try to submit a few 
tests soon.

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> Marcel Reutegger wrote:
>> IMO an SPI implementation was never meant to cache anything. it is 
>> meant to be as stateless as possible and translate SPI calls into 
>> calls on the back-end. if an implementation reads more than it was 
>> asked for it may pass it to the client of the SPI and hope it will be 
>> cached there.
> 
> That would be good, but right now SPI doesn't support it everywhere 
> where it would be useful (and, afaik, JCR2SPI doesn't take advantage of 
> it).
 >
> Example - obtaining a directory listing: SPI2JCR currently gets the 
> NodeInfo for the collection, then gets the ChildInfo iterator, then for 
> each NodeId of a child fetches that child's NodeInfo.
> 
> For a collection of N members, this translates to N additional 
> roundtrips to the store (with WebDAV, PROPFINDs on each child resource, 
> although a single PROPFIND with Depth 1 would have been sufficient).
> 
> It's not clear to me how it would be able to avoid this with the current 
> SPI interfaces while disallowing SPI to cache.

see JCR-1011. we just have to commit the patch.

>>> Or do you expect SPI implementations to keep cached information 
>>> up-to-date by some kind of observation mechanism?
>>
>> yes, if there is a cache present the implementation should maintain 
>> the cache on its own without additional information from an SPI client.
> 
> I have the feeling that we're optimizing for the wrong use case here.
> 
> If we can't make *read* access efficient enough, we're in trouble. And I 
> really don't want to require every SPI implementation to subscribe to 
> events from the underlying store, in particular if it's remote (think 
> HTTP).

that's why I don't even want to get into this business. but if an implementation 
wants to cache something it is responsible for maintaining it.

> JCR clients today can not rely on fresh session information unless they 
> do a refresh(), and it's unclear to me why we would require that from an 
> SPI implementation.

it is a fundamental requirement that the SPI implementation provides the most 
up-to-date item that is available. the refresh semantic is only relevant in the 
context of jcr2spi but not the SPI itself.

> [...] or just discard the SessionInfo and get a fresh one.

that's contrary to how the SessionInfo is designed. It is meant to be the result 
of a successful authentication. If it holds state information that is relevant 
to the server (e.g. a cache, a JCR session, JDBC connection, ...) it is the 
responsibility of the implementation to maintain it. An SPI client does not need 
nor use that information directly.

Again any call using a SessionInfo should return the most up-to-date item(s) 
that are requested.

 > If the JCR client does call "refresh()", we really should pass that
 > information to SPI, either by a new method (which could be more
 > elaborate than just refresh() as mentioned by Angela), or [...]

That's IMO a more relevant use case that we should consider rather than caching.

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
Julian Reschke wrote:

> If the JCR client does call "refresh()", we really should pass that 
> information to SPI, either by a new method (which could be more 
> elaborate than just refresh() as mentioned by Angela), or just discard 
> the SessionInfo and get a fresh one.

i wouldn't state, that we pass the refresh to the SPI.

but it would definitely make sense to me, to have the
ability to let the client find out, whether there are
changes at all. given the assumption that most of
the time nothing changes (that's what i always get
told) that would save the client quite some unnecessary
invalidation and communication with the spi.

if a SPI impl isn't able to provide that info, we have
the same situation as we have now. legal as well.

i wouldn't expect an SPI impl to hold caches either.
but if you want to use that "anychanges?" as hint for
whatever stuff inside the SPI... i wouldn't mind...
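
A rough sketch of how such an "anychanges?" hint could be used on the client side is shown below. Everything in it is an assumption made for illustration: ChangeSource, changeToken() and ClientCache are hypothetical names, and the SPI has no such call today. A negative token stands for "the implementation cannot tell".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hint interface; not part of the SPI.
interface ChangeSource {
    // Returns a token that changes whenever the store changes,
    // or a negative value if the implementation cannot tell.
    long changeToken();
}

final class ClientCache {
    private final Map<String, String> entries = new HashMap<String, String>();
    private long lastToken;

    ClientCache(long initialToken) {
        this.lastToken = initialToken;
    }

    void put(String id, String value) {
        entries.put(id, value);
    }

    int size() {
        return entries.size();
    }

    // Invalidates the cache unless the source reports "nothing changed".
    // Returns true if the cache was invalidated.
    boolean refresh(ChangeSource source) {
        long token = source.changeToken();
        if (token >= 0 && token == lastToken) {
            return false; // nothing changed, keep the cached entries
        }
        entries.clear();  // changed or unknown: same situation as today
        if (token >= 0) {
            lastToken = token;
        }
        return true;
    }
}
```

When the source cannot provide the information, the cache is simply invalidated on every refresh, which matches the current behaviour.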

am i missing anything fundamental?
angela



Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Marcel Reutegger wrote:
> Julian Reschke wrote:
>> The store the SPI implementation talks to internally may be on a 
>> separate machine, so verifying that something is up-to-date (for read 
>> access) would actually defy the caching in the first place, wouldn't it?
> 
> IMO an SPI implementation was never meant to cache anything. it is meant 
> to be as stateless as possible and translate SPI calls into calls on the 
> back-end. if an implementation reads more than it was asked for it may 
> pass it to the client of the SPI and hope it will be cached there.

That would be good, but right now SPI doesn't support it everywhere 
where it would be useful (and, afaik, JCR2SPI doesn't take advantage of it).

Example - obtaining a directory listing: SPI2JCR currently gets the 
NodeInfo for the collection, then gets the ChildInfo iterator, then for 
each NodeId of a child fetches that child's NodeInfo.

For a collection of N members, this translates to N additional 
roundtrips to the store (with WebDAV, PROPFINDs on each child resource, 
although a single PROPFIND with Depth 1 would have been sufficient).

It's not clear to me how it would be able to avoid this with the current 
SPI interfaces while disallowing SPI to cache.
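
To make the round-trip count concrete, here is a small simulation of the two access patterns. It is only a sketch under assumptions: CountingStore is a made-up stand-in for a remote store that counts calls, and its method names do not correspond to real SPI or WebDAV APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote store; each method call counts as one round trip.
final class CountingStore {
    private final List<String> children;
    int roundTrips = 0;

    CountingStore(List<String> children) {
        this.children = new ArrayList<String>(children);
    }

    String fetchNode(String id) {
        roundTrips++;
        return id;
    }

    List<String> fetchChildIds(String parentId) {
        roundTrips++;
        return new ArrayList<String>(children);
    }

    // Comparable to a single PROPFIND with Depth 1: parent plus members.
    List<String> fetchDepthOne(String parentId) {
        roundTrips++;
        List<String> result = new ArrayList<String>();
        result.add(parentId);
        result.addAll(children);
        return result;
    }
}

final class ListingDemo {
    // Parent info, child id list, then one fetch per child: N + 2 calls.
    static int perChildListing(CountingStore store) {
        store.fetchNode("/dir");
        for (String childId : store.fetchChildIds("/dir")) {
            store.fetchNode(childId);
        }
        return store.roundTrips;
    }

    // One batch call regardless of the number of members.
    static int batchListing(CountingStore store) {
        store.fetchDepthOne("/dir");
        return store.roundTrips;
    }
}
```

For a collection with three members the first pattern needs five calls and the second needs one; the gap grows linearly with N.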

>> Or do you expect SPI implementations to keep cached information 
>> up-to-date by some kind of observation mechanism?
> 
> yes, if there is a cache present the implementation should maintain the 
> cache on its own without additional information from an SPI client.

I have the feeling that we're optimizing for the wrong use case here.

If we can't make *read* access efficient enough, we're in trouble. And I 
really don't want to require every SPI implementation to subscribe to 
events from the underlying store, in particular if it's remote (think HTTP).

JCR clients today can not rely on fresh session information unless they 
do a refresh(), and it's unclear to me why we would require that from an 
SPI implementation.

If the JCR client does call "refresh()", we really should pass that 
information to SPI, either by a new method (which could be more 
elaborate than just refresh() as mentioned by Angela), or just discard 
the SessionInfo and get a fresh one.

BR, Julian


Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Marcel Reutegger <ma...@gmx.net>.
Julian Reschke wrote:
> The store the SPI implementation talks to internally may be on a 
> separate machine, so verifying that something is up-to-date (for read 
> access) would actually defy the caching in the first place, wouldn't it?

IMO an SPI implementation was never meant to cache anything. it is meant to be 
as stateless as possible and translate SPI calls into calls on the back-end. if 
an implementation reads more than it was asked for it may pass it to the client 
of the SPI and hope it will be cached there.

> Or do you expect SPI implementations to keep cached information 
> up-to-date by some kind of observation mechanism?

yes, if there is a cache present the implementation should maintain the cache on 
its own without additional information from an SPI client.

regards
  marcel

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Angela Schreiber wrote:
> hi julian
> 
>> this issue made me think that we may be missing something in the SPI...
>>
>> So, is an SPI implementation allowed to internally cache things (for 
>> instance, with the SessionInfo implementation)? I would assume so (but 
>> maybe I'm wrong).
> 
>> If it is allowed to do that, shouldn't a Session.refresh() call be 
>> reflected in some SPI call, so that the SPI implementation can 
>> invalidate caches as well?
> 
> that sounds strange to me. if the SPI does some kind of caching
> it should rather communicate with the implementation below in
> order to update/refresh the cache.

The store the SPI implementation talks to internally may be on a 
separate machine, so verifying that something is up-to-date (for read 
access) would actually defy the caching in the first place, wouldn't it?

Or do you expect SPI implementations to keep cached information 
up-to-date by some kind of observation mechanism?

> ...

BR, Julian

Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Angela Schreiber <an...@day.com>.
hi julian

> this issue made me think that we may be missing something in the SPI...
> 
> So, is an SPI implementation allowed to internally cache things (for 
> instance, with the SessionInfo implementation)? I would assume so (but 
> maybe I'm wrong).

> If it is allowed to do that, shouldn't a Session.refresh() call be 
> reflected in some SPI call, so that the SPI implementation can 
> invalidate caches as well?

that sounds strange to me. if the SPI does some kind of caching
it should rather communicate with the implementation below in
order to update/refresh the cache.

see however JCR-1012 for potential improvement of
Session.refresh()... that could result in some extension
of the SPI. but at first glance propagating the 'refresh'
to the SPI however looks wrong to me.

am i missing something?
angela


> BR, Julian
> 


SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by Julian Reschke <ju...@gmx.de>.
Hi,

this issue made me think that we may be missing something in the SPI...

So, is an SPI implementation allowed to internally cache things (for 
instance, with the SessionInfo implementation)? I would assume so (but 
maybe I'm wrong).

If it is allowed to do that, shouldn't a Session.refresh() call be 
reflected in some SPI call, so that the SPI implementation can 
invalidate caches as well?

BR, Julian

[jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke resolved JCR-1361.
---------------------------------

    Resolution: Fixed

Fixed with revision 618965.


> Lock test assumes that changes in one session are immediately visible in different session
> ------------------------------------------------------------------------------------------
>
>                 Key: JCR-1361
>                 URL: https://issues.apache.org/jira/browse/JCR-1361
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: jackrabbit-jcr-tests
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>            Priority: Minor
>
> LockTest.testLogout() assumes that a change in one session (logging out, removing a session-scoped lock) is immediately visible in another session.
> Proposal: insert a 
>  n1.getSession().refresh(true);
> call before checking
>  assertFalse("node must not be locked", n1.isLocked());
