Posted to users@jackrabbit.apache.org by Andrew Khoury <ak...@adobe.com> on 2014/08/07 21:45:22 UTC

How to get total nodes in Oak repo

Hi,
What is the quickest and most efficient way to get the total number of nodes in an Oak repository?  Is there a built-in way, or do I need to do a full traversal or query?
Thanks,
Andrew Khoury

Re: How to get total nodes in Oak repo

Posted by Andrew Khoury <ak...@adobe.com>.
Thanks, I will use this improved script instead :)
-Andrew

On 8/13/14, 8:18 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>Hi Andrew,
>
>Your script would work.
>
>----
>total_count = 1
>countNodes = { n ->
>    n.getChildNodeNames()?.each {
>        def child = n.getChildNode(it);
>        total_count += 1;
>        //println it
>        countNodes(child)
>    }
>
>}
>
>countNodes(session.workingNode)
>println "Total nodes in tree ${session.workingPath}: ${total_count}"
>----
>
>However, as noted in the Javadoc for getChildNodeEntries, it is more
>performant to iterate getChildNodeEntries than to call getChildNodeNames
>and then getChildNode for each name, i.e. O(n) vs. O(n log n). So the
>following script [1], although a bit more complex, might perform better:
>
>----
>import com.google.common.base.Function
>import com.google.common.collect.TreeTraverser
>import org.apache.jackrabbit.oak.spi.state.NodeState
>
>import static com.google.common.collect.Iterables.transform
>
>def getChildCount(NodeState ns){
>    def traversor = {ns2 -> transform(ns2.childNodeEntries, {cne -> cne.nodeState} as Function)} as TreeTraverser
>    return traversor.preOrderTraversal(ns).size()
>}
>----
>
>From within the oak-run console, run the following:
>
>----
>Apache Jackrabbit Oak 1.1-SNAPSHOT
>Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.1-SNAPSHOT, JVM: 1.7.0_55)
>Type ':help' or ':h' for help.
>--------------------------------------------------------------------------
>--------------------------------------------------------------------------
>---------------------------------------------------------
>/> :load 
>https://gist.githubusercontent.com/chetanmeh/2138c188a1bcc135eeb3/raw/getC
>hildCount.groovy
>/> getChildCount(session.workingNode)
>----
>
>Chetan Mehrotra
>[1] https://gist.github.com/chetanmeh/2138c188a1bcc135eeb3
>


Re: How to get total nodes in Oak repo

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Andrew,

Your script would work.

----
total_count = 1
countNodes = { n ->
    n.getChildNodeNames()?.each {
        def child = n.getChildNode(it);
        total_count += 1;
        //println it
        countNodes(child)
    }

}

countNodes(session.workingNode)
println "Total nodes in tree ${session.workingPath}: ${total_count}"
----

However, as noted in the Javadoc for getChildNodeEntries, it is more
performant to iterate getChildNodeEntries than to call getChildNodeNames
and then getChildNode for each name, i.e. O(n) vs. O(n log n). So the
following script [1], although a bit more complex, might perform better:

----
import com.google.common.base.Function
import com.google.common.collect.TreeTraverser
import org.apache.jackrabbit.oak.spi.state.NodeState

import static com.google.common.collect.Iterables.transform

def getChildCount(NodeState ns){
    def traversor = {ns2 -> transform(ns2.childNodeEntries, {cne -> cne.nodeState} as Function)} as TreeTraverser
    return traversor.preOrderTraversal(ns).size()
}
----

From within the oak-run console, run the following:

----
Apache Jackrabbit Oak 1.1-SNAPSHOT
Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.1-SNAPSHOT, JVM: 1.7.0_55)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/> :load https://gist.githubusercontent.com/chetanmeh/2138c188a1bcc135eeb3/raw/getChildCount.groovy
/> getChildCount(session.workingNode)
----

Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/2138c188a1bcc135eeb3
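
The single-pass idea behind the second script (visit each node once and
iterate its children directly, rather than fetching each child by name) can
also be sketched without Guava. The following is only a minimal illustration
in plain Java, not Oak code: the Node class is a hypothetical stand-in for
NodeState, and its children list plays the role of getChildNodeEntries(). An
explicit stack replaces the recursion used in the first script.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NodeCountSketch {

    /** Hypothetical stand-in for a repository node; this is NOT Oak's NodeState. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Appends a child and returns this node so that trees can be built fluently. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Counts the given root plus all of its descendants using an explicit stack. */
    static long countNodes(Node root) {
        long count = 0;
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            count++;
            // One pass over the children, analogous to iterating child entries
            // instead of looking up every child by its name.
            for (Node child : current.children) {
                pending.push(child);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Small illustrative tree: root -> (content -> page), oak:index
        Node root = new Node("root")
                .add(new Node("content").add(new Node("page")))
                .add(new Node("oak:index"));
        System.out.println("Total nodes: " + countNodes(root)); // prints: Total nodes: 4
    }
}
```

As with total_count = 1 in the first script, the root itself is included in
the result. Against a real repository the loop body would presumably iterate
the child entries of a NodeState instead of the stand-in list.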


On Wed, Aug 13, 2014 at 8:06 PM, Andrew Khoury <ak...@adobe.com> wrote:
> Thanks Chetan, this really helped.
>
> This was for a tar-based deployment.  I want to count the total nodes
> including hidden ones under /oak:index branch and all.  So I wrote an
> oak-run groovy console script that counts all nodes under the current
> working node:
> https://gist.github.com/andrewmkhoury/c5588a6a4b57e7e0e593
>
>
> Please let me know if you see any issues with this.
> -Andrew

Re: How to get total nodes in Oak repo

Posted by Andrew Khoury <ak...@adobe.com>.
Thanks Chetan, this really helped.

This was for a tar-based deployment.  I want to count the total nodes
including hidden ones under /oak:index branch and all.  So I wrote an
oak-run groovy console script that counts all nodes under the current
working node:
https://gist.github.com/andrewmkhoury/c5588a6a4b57e7e0e593


Please let me know if you see any issues with this.
-Andrew

On 8/8/14, 4:39 PM, "Andrew Khoury" <ak...@adobe.com> wrote:

>Hi Chetan,
>How about for TarMK?  What is the quickest way to calculate total nodes?
>Thanks,
>Andrew


Re: How to get total nodes in Oak repo

Posted by Andrew Khoury <ak...@adobe.com>.
Hi Chetan,
How about for TarMK?  What is the quickest way to calculate total nodes?
Thanks,
Andrew

On 8/7/14, 10:37 PM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>At the JCR level, traversal is the only option. For a Mongo-based deployment
>you can get a rough estimate via the db.nodes.stats() command.
>
>- count - This property provides an estimate of the number of nodes
>- It also includes the nodes which store the index data. Note that
>these indexes are Oak indexes and are different from Mongo indexes
>- It also includes nodes which are marked deleted but not yet garbage
>collected
>
>$ mongo <server>:<port>/<db>
>$ db.nodes.stats()
>$ {
>        "ns" : "aem-author.nodes",
>        "count" : 593688,
>        "size" : 453287536,
>        "avgObjSize" : 763,
>        "storageSize" : 629633024,
>        "numExtents" : 16,
>        "nindexes" : 5,
>        "lastExtentSize" : 168742912,
>        "paddingFactor" : 1,
>        "systemFlags" : 0,
>        "userFlags" : 1,
>        "totalIndexSize" : 102437104,
>        "indexSizes" : {
>                "_id_" : 86902704,
>                "_modified_-1" : 15027488,
>                "_bin_1" : 449680,
>                "_deletedOnce_1" : 24528,
>                "_sdType_1" : 32704
>        },
>        "ok" : 1
>}
>Chetan Mehrotra


Re: How to get total nodes in Oak repo

Posted by Chetan Mehrotra <ch...@gmail.com>.
At the JCR level, traversal is the only option. For a Mongo-based deployment
you can get a rough estimate via the db.nodes.stats() command.

- count - This property provides an estimate of the number of nodes
- It also includes the nodes which store the index data. Note that
these indexes are Oak indexes and are different from Mongo indexes
- It also includes nodes which are marked deleted but not yet garbage collected

$ mongo <server>:<port>/<db>
> db.nodes.stats()
{
        "ns" : "aem-author.nodes",
        "count" : 593688,
        "size" : 453287536,
        "avgObjSize" : 763,
        "storageSize" : 629633024,
        "numExtents" : 16,
        "nindexes" : 5,
        "lastExtentSize" : 168742912,
        "paddingFactor" : 1,
        "systemFlags" : 0,
        "userFlags" : 1,
        "totalIndexSize" : 102437104,
        "indexSizes" : {
                "_id_" : 86902704,
                "_modified_-1" : 15027488,
                "_bin_1" : 449680,
                "_deletedOnce_1" : 24528,
                "_sdType_1" : 32704
        },
        "ok" : 1
}
Chetan Mehrotra


On Fri, Aug 8, 2014 at 1:15 AM, Andrew Khoury <ak...@adobe.com> wrote:
> Hi,
> What is the quickest and most efficient way to get the total number of nodes in an Oak repository?  Is there a built-in way, or do I need to do a full traversal or query?
> Thanks,
> Andrew Khoury