Posted to oak-dev@jackrabbit.apache.org by Vivek Kamble <vi...@gmail.com> on 2019/11/27 14:28:14 UTC

Oak Repository Startup time issue

Hi Jackrabbit/Oak Team,



We are facing an issue with repository startup time because the Root node
has a huge number of direct child nodes, close to a million.

Below is the repository startup code:



    final RDBOptions options =
        new RDBOptions()
            .tablePrefix(dbDetails.get(DB_TABLE_PREFIX))
            .dropTablesOnClose(false);

    final DataSource ds =
        RDBDataSourceFactory.forJdbcUrl(
            dbDetails.get("dbURL"),
            dbDetails.get("dbUser"),
            dbDetails.get("dbPassword"));

    final Properties properties = buildS3Properties(dbDetails);
    final S3DataStore s3DataStore = buildS3DataStore(properties);

    final DataStoreBlobStore dataStoreBlobStore = new DataStoreBlobStore(s3DataStore);

    final Whiteboard wb = new DefaultWhiteboard();
    wb.register(
        BlobAccessProvider.class,
        (BlobAccessProvider) dataStoreBlobStore,
        properties);

    documentNodeStore =
        new RDBDocumentNodeStoreBuilder()
            .setBlobStore(dataStoreBlobStore)
            .setRDBConnection(ds, options)
            .build();

    final long start = System.currentTimeMillis();
    repository = new Jcr(new Oak(documentNodeStore)).with(wb).createRepository();
    final long end = System.currentTimeMillis();

    log.info("Total Time : " + (end - start));

    return repository;





The behaviour observed is that the startup time increases as the repository
grows.

Most of the time is spent in calls to the following constructor:

    ChildNodeEntryIterator() {
        fetchMore();
    }

which is called from MemoryNodeState.wrap(NodeState state).
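
To illustrate why a batched child iterator could dominate when every child is visited, here is a minimal simulation. It is not Oak's code: the class PagedChildren, the batch size of 100, and the generated child names are all assumptions made up for this sketch. It only counts how many batch fetches a full traversal needs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedChildren {

    /** Hypothetical stand-in for a store that returns child names in batches. */
    static class PagedIterator implements Iterator<String> {
        private final int total;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();
        private int fetched = 0;   // children loaded from the simulated store so far
        private int pos = 0;       // read position inside the current buffer
        int fetchCalls = 0;        // number of simulated round trips

        PagedIterator(int total, int batchSize) {
            this.total = total;
            this.batchSize = batchSize;
            fetchMore();           // first batch is loaded eagerly in the constructor
        }

        private void fetchMore() {
            buffer.clear();
            pos = 0;
            fetchCalls++;
            int end = Math.min(total, fetched + batchSize);
            for (int i = fetched; i < end; i++) {
                buffer.add("child-" + i);
            }
            fetched = end;
        }

        @Override
        public boolean hasNext() {
            if (pos < buffer.size()) return true;
            if (fetched >= total) return false;
            fetchMore();
            return pos < buffer.size();
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.get(pos++);
        }
    }

    /** Visits every child once and reports how many batch fetches that took. */
    static int fetchesToVisitAll(int total, int batchSize) {
        PagedIterator it = new PagedIterator(total, batchSize);
        while (it.hasNext()) {
            it.next();
        }
        return it.fetchCalls;
    }

    public static void main(String[] args) {
        // One million children at 100 per batch needs 10000 fetches.
        System.out.println(fetchesToVisitAll(1_000_000, 100));
    }
}
```

In this model the number of fetches grows in proportion to the number of direct children, so anything that walks the whole child list of the root pays that cost on every startup.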

So we have some questions:

1) Are there any guidelines/best practices for storing child nodes under root?

2) How can we minimize the repository startup time?

3) Will bundling of nodes help with repository startup?


Thanks N Regards,
Vivek

Re: Oak Repository Startup time issue

Posted by Julian Reschke <ju...@gmx.de>.
On 27.11.2019 18:30, Vivek Kamble wrote:
> 1) Why do you do that?
>
>    Below is the tree structure of the repository.
>
>    0:/Root Node
>    |
>    1:/Custom Child Folder
>    |
>    2:/nt:folder
>    |
>    3:/Custom nt:file
>
>    As the repository grows day by day, the number of Custom Child
>    Folders has reached close to a million.
>
>    The subtree below each Custom Child Folder is even larger.
>
>    If this is not the proper/recommended way, could you share some
>    guidelines/best practices?

It doesn't make sense at all to put millions of child nodes into the
root of a hierarchical content tree.

I'd move this down, and also would try to add more intermediate nodes.
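
As a rough illustration of what adding intermediate nodes could look like, the sketch below derives two hash-based bucket levels from a folder name. The base path /content, the two-level layout, the SHA-256 choice, and the class name BucketedPath are all assumptions made for this example; nothing here is prescribed by Oak or taken from the application discussed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BucketedPath {

    /**
     * Maps a folder name to a path with two hash-derived intermediate levels,
     * in the form base/xx/yy/name where xx and yy are hex bytes of the hash.
     * With 256 buckets per level, one million folders average about 15
     * children per leaf bucket instead of one million under a single parent.
     */
    static String bucketedPath(String base, String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] h = md.digest(name.getBytes(StandardCharsets.UTF_8));
            return String.format("%s/%02x/%02x/%s", base, h[0] & 0xff, h[1] & 0xff, name);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "customer-000123" is a hypothetical folder name.
        System.out.println(bucketedPath("/content", "customer-000123"));
    }
}
```

Because the buckets are computed from the name alone, the same folder always resolves to the same path, so no lookup table is needed. A date-based or business-key-based hierarchy would work as well if the application has a natural grouping.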

> 2) How does it scale exactly?
>
>    Could you please define what exactly you mean by scaling?

Does the startup time grow linearly with the number of nodes? Is it
better? Is it worse?
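
One way to answer this from measurements is to fit a straight line to log(startup time) against log(child count); the slope approximates the growth exponent. The sketch below is a generic least-squares fit. The class name ScalingExponent is made up for this example and the sample numbers are placeholders, not measurements from this repository.

```java
public class ScalingExponent {

    /**
     * Least-squares slope of log(time) against log(count).
     * A result near 1.0 suggests linear growth, below 1.0 sublinear,
     * above 1.0 superlinear. Needs at least two distinct counts.
     */
    static double exponent(long[] counts, double[] millis) {
        int n = counts.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(counts[i]);
            double y = Math.log(millis[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        long[] counts = {1_000, 10_000, 100_000};
        double[] millis = {20.0, 200.0, 2_000.0};
        System.out.printf("estimated exponent: %.2f%n", exponent(counts, millis));
    }
}
```

Feeding it the "Total Time" values logged by the startup code at several repository sizes would show whether the growth is linear or worse.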

> 3) Did you try whether putting them into a child folder changes things?
>
>    Yes. As the number of child nodes of Root increases, so does the
>    startup time.

...and the result for the repo startup is?

Best regards, Julian

Re: Oak Repository Startup time issue

Posted by Vivek Kamble <vi...@gmail.com>.
1) Why do you do that?

   Below is the tree structure of the repository.

   0:/Root Node
   |
   1:/Custom Child Folder
   |
   2:/nt:folder
   |
   3:/Custom nt:file

   As the repository grows day by day, the number of Custom Child
   Folders has reached close to a million.

   The subtree below each Custom Child Folder is even larger.

   If this is not the proper/recommended way, could you share some
   guidelines/best practices?



2) How does it scale exactly?

   Could you please define what exactly you mean by scaling?



3) Did you try whether putting them into a child folder changes things?
   Yes. As the number of child nodes of Root increases, so does the startup time.

On Wed, 27 Nov, 2019, 9:41 PM Julian Reschke, <ju...@gmx.de> wrote:

> On 27.11.2019 15:28, Vivek Kamble wrote:
> > Hi Jackrabbit/Oak Team,
> >
> >
> >
> > We are facing an issue with repository startup time because the Root node
> > has a huge number of direct child nodes, close to a million.
> > ...
>
> 1) Why do you do that?
>
> 2) How does it scale exactly?
>
> 3) Did you try whether putting them into a child folder changes things?
>
> Best regards, Julian
>

Re: Oak Repository Startup time issue

Posted by Julian Reschke <ju...@gmx.de>.
On 27.11.2019 15:28, Vivek Kamble wrote:
> Hi Jackrabbit/Oak Team,
>
>
>
> We are facing an issue with repository startup time because the Root node
> has a huge number of direct child nodes, close to a million.
> ...

1) Why do you do that?

2) How does it scale exactly?

3) Did you try whether putting them into a child folder changes things?

Best regards, Julian

Re: Oak Repository Startup time issue

Posted by Vivek Kamble <vi...@gmail.com>.
Hi Manfred,

Could you please guide me, or share documentation links or examples?

On Wed, 27 Nov, 2019, 8:37 PM Manfred Baedke, <ma...@gmail.com>
wrote:

> Hi Vivek,
>
> > 1) Are there any guidelines/best practices for storing child nodes under root?
> >
> > 2) How can we minimize the repository startup time?
>
> Re 1): Yes. Don't do it. The startup time will increase when creating
> the initial root state.
> Re 2): It's all because of 1)
>
> Best regards,
> Manfred
>