Posted to oak-dev@jackrabbit.apache.org by Domenic DiTano <do...@ansys.com> on 2016/04/06 16:20:06 UTC

RE: Jackrabbit 2.10 vs Oak 1.2.7

Hi Marcel,

I uploaded all the source to github along with a summary spreadsheet.  I
would appreciate any time you have to review.

https://github.com/Domenic-Ansys/Jackrabbit2-Oak-Tests

As you stated, the move is a non-goal, but in comparison to Jackrabbit 2 I
am also finding in my tests that create, update, and copy are all faster
in Jackrabbit 2 (10k nodes).  Any input would be appreciated...

Also, will MySQL not be listed as "Experimental" at some point?

Thanks,
Domenic


-----Original Message-----
From: Marcel Reutegger [mailto:mreutegg@adobe.com]
Sent: Thursday, March 31, 2016 6:14 AM
To: oak-dev@jackrabbit.apache.org
Subject: Re: Jackrabbit 2.10 vs Oak 1.2.7

Hi Domenic,

On 30/03/16 14:34, "Domenic DiTano" wrote:
>"In contrast to Jackrabbit 2, a move of a large subtree is an expensive
>operation in Oak"
>So should I avoid doing a move of a large number of items using Oak?
>If we are using Oak then should we avoid operations with a large number
>of items in general?

In general it is fine to have a large change set with Oak. With Oak you
can even have change sets that do not fit into the heap.
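
For example, creating many nodes under a single save() forms one large
change set (a sketch against the plain JCR API; "session" is assumed to be
an already-open JCR Session):

    // One change set: 10,000 new nodes persisted by a single save().
    Node parent = session.getRootNode().addNode("bulk", "nt:unstructured");
    for (int i = 0; i < 10000; i++) {
        parent.addNode("node-" + i, "nt:unstructured").setProperty("index", i);
    }
    session.save();   // the whole change set is committed at once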

>  As an FYI - there are other benefits for us to move to Oak, but our
>application executes JCR operations with a large number of items
>quite often.  I am worried about the performance.
>
>The move method is pretty simple - should I be doing it differently?
>
>public static long moveNodes(Session session, Node node, String newNodeName)
>        throws Exception {
>    long start = System.currentTimeMillis();
>    session.move(node.getPath(), "/" + newNodeName);
>    session.save();
>    long end = System.currentTimeMillis();
>    return end - start;
>}

No, this is fine. As mentioned earlier, with Oak a move operation is not
cheap and is basically implemented as copy to new location and delete at
the old location.

A cheap move operation was considered a non-goal when Oak was designed:
https://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203


Regards
 Marcel

Re: Jackrabbit 2.10 vs Oak 1.2.7

Posted by Michael Marth <mm...@adobe.com>.
Hi Domenic,

My point was that *very* roughly speaking Oak is expected to outperform JR for mixed read-write test cases, especially (but not only) in clustered deployments.

My 2nd point was: if you need to optimise pure write throughput then TarMK in Oak is expected to get best results.

Not knowing your application, I cannot judge if your test cases make sense.
Just wanted to comment on what can be expected.

Re
“FYI, the 1,000 and 100,000 node creation tests are realistic use cases, as our
application generates very large datasets (it is common to see 500 GB/1,000
files or more get added to a repo in one user session)."

Interesting. In my experience, when you deploy DocumentMK (Mongo or RDBMK) and need to optimise file upload throughput, it is beneficial to use the file system data store (FSDS) rather than the data stores within Mongo/RDB.
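
A sketch of that wiring (class and package names as in the Oak 1.2 era; treat them as assumptions to verify against your Oak version, not a definitive recipe):

    import org.apache.jackrabbit.core.data.FileDataStore;
    import org.apache.jackrabbit.oak.Oak;
    import org.apache.jackrabbit.oak.jcr.Jcr;
    import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
    import com.mongodb.DB;
    import com.mongodb.MongoClient;
    import javax.jcr.Repository;

    // Node data goes to MongoDB; binaries go to the file system data store.
    FileDataStore fds = new FileDataStore();
    fds.setMinRecordLength(4096);      // inline only very small binaries
    fds.init("/path/to/datastore");    // directory holding the blob files

    DB db = new MongoClient("localhost", 27017).getDB("oak");
    DocumentNodeStore store = new DocumentMK.Builder()
            .setMongoDB(db)
            .setBlobStore(new DataStoreBlobStore(fds))
            .getNodeStore();

    Repository repository = new Jcr(new Oak(store)).createRepository();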
Btw: I had a quick look at your test case [1]. It uploads the same file again and again. Binaries are stored internally content-addressed, so the test case does not quite reflect what would happen in real life in your app. But the data store in JR was also content-addressed, so I do not expect a big impact in terms of comparing JR and Oak.
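
To illustrate the point, here is a toy model of a content-addressed store (my own illustration, not Oak's actual DataStore code): blobs are keyed by the digest of their bytes, so re-uploading the same file never writes a new record, while unique content per upload does.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Toy content-addressed blob store: identical content shares one record. */
public class ContentAddressedStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores the bytes and returns their content hash (the "blob id"). */
    public String put(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            String id = sb.toString();
            blobs.put(id, content);   // same content -> same key -> no new record
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public int size() {
        return blobs.size();
    }

    public static void main(String[] args) {
        ContentAddressedStore store = new ContentAddressedStore();
        byte[] sameFile = new byte[1024];       // the "same file" case
        for (int i = 0; i < 100; i++) {
            store.put(sameFile);                // 100 uploads, 1 stored blob
        }
        System.out.println(store.size());       // 1

        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {
            byte[] unique = new byte[1024];     // unique content per upload
            rnd.nextBytes(unique);
            store.put(unique);
        }
        System.out.println(store.size());       // now one record per unique file
    }
}
```

A test that generates a fresh random payload per iteration, as in the second loop, would exercise the blob write path on every upload.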

Michael


[1] https://github.com/Domenic-Ansys/Jackrabbit2-Oak-Tests/blob/master/Oak-boot/src/main/java/com/test/oak/JCRTests.java





Re: Jackrabbit 2.10 vs Oak 1.2.7

Posted by Domenic DiTano <do...@ansys.com>.
Hi Michael,

First thank you for your response.

My POV:
"You are essentially testing how fast Oak or JR can put nodes into
MySQL/Postgres/Mongo. IMO Oak’s design does not suggest that there should
be fundamental differences between JR and Oak for this isolated case. (*)"

Are you saying there should not be a difference for this test case between
Oak/JR?  I understand your point that I am testing how fast Oak/JR puts
things into a database, but from my perspective I am doing simple JCR
operations like creating/updating/moving a reasonable number of nodes, and
JR seems to be performing significantly better.  I also ran the tests at
100 nodes, and Jackrabbit 2's performance, in particular around copy,
update, and move operations, is generally better (I understand why for
moves).  Is this expected?

FYI, the 1,000 and 100,000 node creation tests are realistic use cases, as our
application generates very large datasets (it is common to see 500 GB/1,000
files or more get added to a repo in one user session).

"To explain:
Re 1: in reality you would usually have many reading threads for each
writing thread. Oak’s MVCC design caters for performance for such test
cases."

Can you point me to any test cases where I can see the configuration for
something like this?

"Re 2: If you have many cluster nodes the MVCC advantage becomes even more
pronounced (not only different threads but different processes).
Also, if you have observation listeners and many cluster nodes then I
expect to see substantial differences between Oak and JR."
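
For reference, the many-readers-per-writer pattern Michael describes could be sketched against the plain JCR API roughly like this (a hypothetical microbenchmark of my own, not an official Oak test; repository setup and credentials are assumed):

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class MixedLoadSketch {

        // One writer thread plus many reader threads, each with its own
        // Session (JCR Sessions are not thread-safe; never share one).
        static void run(Repository repo, int readers, int seconds) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(readers + 1);
            long deadline = System.currentTimeMillis() + seconds * 1000L;

            pool.submit(() -> {
                Session s = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));
                Node parent = s.getRootNode().addNode("bench");
                s.save();
                int i = 0;
                while (System.currentTimeMillis() < deadline) {
                    parent.addNode("n" + (i++));
                    s.save();                       // small, frequent commits
                }
                s.logout();
                return null;
            });

            for (int r = 0; r < readers; r++) {
                pool.submit(() -> {
                    Session s = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));
                    while (System.currentTimeMillis() < deadline) {
                        s.refresh(false);           // pick up a fresh revision
                        s.getRootNode().getNodes(); // read-only traversal
                    }
                    s.logout();
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(seconds + 30, TimeUnit.SECONDS);
        }
    }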

Are there any performance metrics out there for Oak using
DocumentNodeStore/FileDataStore that someone could share?  If I am
understanding correctly, I need to add nodes/scale horizontally for Oak's
performance to improve.  My overall goal here is to determine whether it
benefits us to upgrade from JR, but is it fair to compare the two?  FYI, our
application can be deployed as one or multiple nodes, on premise or in a cloud.

thanks,
Domenic




-- 
Domenic DiTano
ANSYS, Inc.
Tel: 1.724.514.3624
domenic.ditano@ansys.com
www.ansys.com

Re: Jackrabbit 2.10 vs Oak 1.2.7

Posted by Michael Marth <mm...@adobe.com>.
Hi Domenic,

My POV:
You are essentially testing how fast Oak or JR can put nodes into MySQL/Postgres/Mongo. IMO Oak’s design does not suggest that there should be fundamental differences between JR and Oak for this isolated case. (*)

However, where Oak is expected to outperform JR is when
1) the test case reflects realistic usage patterns and
2) horizontal scalability becomes a topic.

To explain:
Re 1: in reality you would usually have many reading threads for each writing thread. Oak’s MVCC design caters for performance for such test cases.
Re 2: If you have many cluster nodes the MVCC advantage becomes even more pronounced (not only different threads but different processes). Also, if you have observation listeners and many cluster nodes then I expect to see substantial differences between Oak and JR.

Cheers
Michael

(*) with the notable exception of TarMK which I expect to outperform anything on any test case ;)




Re: Jackrabbit 2.10 vs Oak 1.2.7

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi Domenic

I apologize for the late reply, but I have now finally had time
to look at your test.

The reason why Oak on MongoDB is so slow with your test is the
write concern that your test specifies when it constructs
the DocumentNodeStore. The test sets it to FSYNCED. This is
an appropriate write concern when you only have a single MongoDB
node but comes with a very high latency. In general MongoDB is
designed to run in production as a replica set and the recommended
write concern with this deployment would be MAJORITY.

More details on why Oak on MongoDB performs badly with your test
are available in OAK-3554 [0].

So, you should either reduce the journalCommitInterval in MongoDB
or test with a replica set and MAJORITY write concern. Both
should give you a significant speedup compared to your current
test setup.
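
A sketch of the replica-set variant (Oak 1.2 era API with the 2.x MongoDB Java driver; the host names and driver calls are assumptions to verify against your versions):

    import com.mongodb.DB;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.WriteConcern;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

    // Connect to the replica set, not a single node, and acknowledge writes
    // once a majority of members has them (instead of FSYNCED on one node).
    MongoClient client = new MongoClient(
            new MongoClientURI("mongodb://rs1:27017,rs2:27017,rs3:27017"));
    DB db = client.getDB("oak");
    db.setWriteConcern(WriteConcern.MAJORITY);  // instead of WriteConcern.FSYNCED

    DocumentNodeStore store = new DocumentMK.Builder()
            .setMongoDB(db)
            .getNodeStore();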

Regards
 Marcel

[0] https://issues.apache.org/jira/browse/OAK-3554?focusedCommentId=14991306&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14991306

