Posted to users@jackrabbit.apache.org by John Chilton <jc...@gmail.com> on 2017/06/23 21:03:37 UTC

Using Jackrabbit in orchestration environment

We are running in an orchestration environment — either Mesos/Chronos/Marathon or Kubernetes.

Each docker container needs to join the Jackrabbit cluster for the lifetime of that container and then leave the Jackrabbit cluster when its work is complete.
When each container joins the Jackrabbit cluster it is assigned a unique cluster node id (repository.xml). We also have no upper bound on the number of our containers that may join the cluster at any given time. 
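
As a rough illustration of assigning the id at container start: Jackrabbit takes the cluster node id either from the id attribute of the Cluster element in repository.xml or from the Java system property org.apache.jackrabbit.core.cluster.node_id. The sketch below builds a per-start id and formats it as a JVM option. The helper names (make_node_id, jvm_flag), the HOSTNAME environment variable, and the random suffix are assumptions made for this example only.

```python
import os
import uuid
from typing import Optional

# JVM system property Jackrabbit consults for the cluster node id
# (the alternative is the id attribute of <Cluster> in repository.xml).
NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id"


def make_node_id(hostname: str, suffix: Optional[str] = None) -> str:
    """Return an id that differs on every container start (hypothetical scheme)."""
    if suffix is None:
        # Random part; assumed to be enough to keep ids from colliding.
        suffix = uuid.uuid4().hex[:8]
    return f"{hostname}-{suffix}"


def jvm_flag(node_id: str) -> str:
    """Format the -D option that hands the id to the Jackrabbit JVM."""
    return f"-D{NODE_ID_PROPERTY}={node_id}"


if __name__ == "__main__":
    # HOSTNAME is an assumption; container runtimes usually set it.
    host = os.environ.get("HOSTNAME", "localhost")
    print(jvm_flag(make_node_id(host)))
```

An entrypoint script could append the printed option to the Java command line, so the same repository.xml can be baked into every image.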

Will this “dynamic” clustering work or will we encounter issues? Is this ill-advised, or are there things we need to do beyond uniquely identifying each cluster node?
I am trying to get ahead of issues that may arise when exercising this. Any thoughts at all would be appreciated.

Thanks, 

-John

Re: Using Jackrabbit in orchestration environment

Posted by John Chilton <jc...@gmail.com>.

Thanks!

On Mon, Jun 26, 2017 at 3:55 AM, Galo Gimenez <ga...@gmail.com> wrote:

> John,
>
> A large set is one with 100M+ JCR nodes; blob size is not a factor unless
> you want full-text indexing. You can control which metadata fields get
> indexed (https://wiki.apache.org/jackrabbit/IndexingConfiguration), so
> you should trim this to the minimum your application needs to work. I
> know very little about Mesos, so I can't comment.
>
>
> -- Galo

Re: Using Jackrabbit in orchestration environment

Posted by Galo Gimenez <ga...@gmail.com>.

John,

A large set is one with 100M+ JCR nodes; blob size is not a factor unless
you want full-text indexing. You can control which metadata fields get
indexed (https://wiki.apache.org/jackrabbit/IndexingConfiguration), so
you should trim this to the minimum your application needs to work. I
know very little about Mesos, so I can't comment.


-- Galo

On Sat, Jun 24, 2017 at 6:59 PM, Clay Ferguson <wc...@gmail.com> wrote:

> Related to the updating of indexes. I'm working on a P2P capability which
> will make a JCR Repo behave essentially like a distributed blockchain
> database (i.e. "ledger"), where every node has a full copy of the DB/repo.
> One capability required for that, which I've already completed, is the
> implementation of a Merkle-Tree-like capability where I can tell if the
> full content under any given subgraph is identical to that located on some
> separate "peer" (network node), simply by comparing a SHA256 hash at both
> nodes (each node being on totally independent repositories).
>
> The method for maintaining 'identical' copies of the repos (technically a
> subgraph in each) will be to use the Merkle-tree to perform a "sync" doing
> the "least effort" data transfers from peer to peer to perform the updates
> (syncing). I may end up using an open source BitTorrent library to perform
> the transmission of data between clients efficiently. So John, that kind of
> technique (BitTorrent protocol) could theoretically help you distribute
> index files across nodes rather than regenerating index files manually
> every time you spin one up.
>
> I admit I haven't even researched "Clusters" (in jackrabbit), and I don't
> know if those are sharded/federated, or whether they use a full "copy" on
> each node. Interestingly, if you're a fan of blockchain, I will also be
> using a public-key encryption system on this app to be able to authenticate
> who added what content, by having each 'edit' (node property modification)
> get hashed and then encrypted with the user's private key, and storing that
> encrypted hash on the tree. So the entire app I am implementing will BE a
> true blockchain, implemented as a layer built on top of the JCR.
>
> I think of what I'm doing as a "reference implementation" of what could
> eventually become a blockchain specification for the JCR which will be an
> extension to the JCR API specifically adding a blockchain protocol/layer on
> top of JCR, and hopefully will become an Apache Project of its own, and a
> formal spec for how to use JCR to build out Blockchains. What I am doing is
> along the lines of Ethereum, by making blockchain be a more generic,
> accessible, reusable technology, but afaik Ethereum is not built on JCR,
> and I believe in building on top of JCR. Anyone who understands Merkle
> Trees AND the JCR and also is fully cognizant of blockchain would come to
> this same conclusion, I believe.
>
> So I hope at least a couple of the guys who are well-connected in Adobe
> will pass the word up the chain of command regarding this concept. In 10yrs
> nobody will want to use a content repository that doesn't have the level of
> 'trust' that can only come from a blockchain. I think in 10 to 20yrs even
> RDBs will have 'blockchain verifiable' transactions as built-in functions
> as well. But for now, a protocol layer on top of and separate from the
> JCR that specifically does blockchain functionality seems like the next
> step for blockchain technology and also for JCR. Who knows, maybe the world
> is ready for Adobe to start a cryptocurrency of their own!? Perhaps that
> would be the financial incentive to get them interested in this? I have
> $10K for that ICO ready and waiting!!
>
> I've probably violated the terms and conditions of this mailing list and I
> apologize if so. I went slightly beyond a reply to John.
>
> Best regards,
> Clay Ferguson
> https://github.com/Clay-Ferguson/meta64
> wclayf@gmail.com

Re: Using Jackrabbit in orchestration environment

Posted by Clay Ferguson <wc...@gmail.com>.

Related to the updating of indexes. I'm working on a P2P capability which
will make a JCR Repo behave essentially like a distributed blockchain
database (i.e. "ledger"), where every node has a full copy of the DB/repo.
One capability required for that, which I've already completed, is the
implementation of a Merkle-Tree-like capability where I can tell if the
full content under any given subgraph is identical to that located on some
separate "peer" (network node), simply by comparing a SHA256 hash at both
nodes (each node being on totally independent repositories).
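
To make the idea concrete, here is a minimal sketch of a Merkle-style subtree hash, not the actual implementation. It assumes a node is modeled as a plain dict with "properties" and "children" keys, and that children can be hashed in name order; both are simplifications for illustration, and subtree_hash is a hypothetical name.

```python
import hashlib


def subtree_hash(node: dict) -> str:
    """Return a SHA-256 hex digest covering a node and everything below it."""
    h = hashlib.sha256()
    # Hash this node's own properties in a stable (sorted) order.
    for key, value in sorted(node.get("properties", {}).items()):
        h.update(repr(("prop", key, value)).encode("utf-8"))
    # Fold in each child's name and digest, so any change below bubbles up.
    for name, child in sorted(node.get("children", {}).items()):
        h.update(repr(("child", name, subtree_hash(child))).encode("utf-8"))
    return h.hexdigest()


if __name__ == "__main__":
    local = {"properties": {"title": "home"},
             "children": {"a": {"properties": {"x": 1}}}}
    peer = {"properties": {"title": "home"},
            "children": {"a": {"properties": {"x": 1}}}}
    # Equal digests mean the two subgraphs hold the same content.
    print(subtree_hash(local) == subtree_hash(peer))
```

Two peers then only need to exchange one digest per subgraph to decide whether anything underneath differs.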

The method for maintaining 'identical' copies of the repos (technically a
subgraph in each) will be to use the Merkle-tree to perform a "sync" doing
the "least effort" data transfers from peer to peer to perform the updates
(syncing). I may end up using an open source BitTorrent library to perform
the transmission of data between clients efficiently. So John, that kind of
technique (BitTorrent protocol) could theoretically help you distribute
index files across nodes rather than regenerating index files manually
every time you spin one up.
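
A hedged sketch of what "least effort" could look like: compare digests top-down and descend only into subtrees whose digests differ, so identical branches are never walked or transferred. The dict-based node model and the names digest and paths_to_sync are assumptions for this example. A real peer would also fetch the remote digests over the network and store digests rather than recompute them at every level.

```python
import hashlib


def digest(node: dict) -> str:
    """SHA-256 over a node's properties and its children's digests."""
    h = hashlib.sha256()
    for key, value in sorted(node.get("properties", {}).items()):
        h.update(repr(("prop", key, value)).encode("utf-8"))
    for name, child in sorted(node.get("children", {}).items()):
        h.update(repr(("child", name, digest(child))).encode("utf-8"))
    return h.hexdigest()


def paths_to_sync(local: dict, remote: dict, path: str = "/") -> list:
    """List the paths whose content differs between the two trees."""
    if digest(local) == digest(remote):
        return []  # identical subtrees: nothing to transfer
    out = []
    if local.get("properties", {}) != remote.get("properties", {}):
        out.append(path)  # this node's own properties changed
    local_kids = local.get("children", {})
    remote_kids = remote.get("children", {})
    for name in sorted(set(local_kids) | set(remote_kids)):
        child_path = path.rstrip("/") + "/" + name
        if name not in local_kids or name not in remote_kids:
            out.append(child_path)  # subtree added or removed as a whole
        else:
            out.extend(paths_to_sync(local_kids[name], remote_kids[name], child_path))
    return out
```

Only the returned paths would need to be sent between peers, whatever transport ends up carrying them.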

I admit I haven't even researched "Clusters" (in jackrabbit), and I don't
know if those are sharded/federated, or whether they use a full "copy" on
each node. Interestingly, if you're a fan of blockchain, I will also be
using a public-key encryption system on this app to be able to authenticate
who added what content, by having each 'edit' (node property modification)
get hashed and then encrypted with the user's private key, and storing that
encrypted hash on the tree. So the entire app I am implementing will BE a
true blockchain, implemented as a layer built on top of the JCR.

I think of what I'm doing as a "reference implementation" of what could
eventually become a blockchain specification for the JCR which will be an
extension to the JCR API specifically adding a blockchain protocol/layer on
top of JCR, and hopefully will become an Apache Project of its own, and a
formal spec for how to use JCR to build out Blockchains. What I am doing is
along the lines of Ethereum, by making blockchain be a more generic,
accessible, reusable technology, but afaik Ethereum is not built on JCR,
and I believe in building on top of JCR. Anyone who understands Merkle
Trees AND the JCR and also is fully cognizant of blockchain would come to
this same conclusion, I believe.

So I hope at least a couple of the guys who are well-connected in Adobe
will pass the word up the chain of command regarding this concept. In 10yrs
nobody will want to use a content repository that doesn't have the level of
'trust' that can only come from a blockchain. I think in 10 to 20yrs even
RDBs will have 'blockchain verifiable' transactions as built-in functions
as well. But for now, a protocol layer on top of and separate from the
JCR that specifically does blockchain functionality seems like the next
step for blockchain technology and also for JCR. Who knows, maybe the world
is ready for Adobe to start a cryptocurrency of their own!? Perhaps that
would be the financial incentive to get them interested in this? I have
$10K for that ICO ready and waiting!!

I've probably violated the terms and conditions of this mailing list and I
apologize if so. I went slightly beyond a reply to John.

Best regards,
Clay Ferguson
https://github.com/Clay-Ferguson/meta64
wclayf@gmail.com

On Sat, Jun 24, 2017 at 6:52 AM, John Chilton <jc...@gmail.com> wrote:

> Thanks Galo, this is useful information.
>
> When you say “large” working sets, how large is large? I'm just looking for
> an order of magnitude (Gig, Tera, Peta...).
>
> Also, are you aware of any Mesos frameworks that offer capabilities
> similar to K8s stateful sets?
>
> Thanks again,
>
> -John

Re: Using Jackrabbit in orchestration environment

Posted by John Chilton <jc...@gmail.com>.

Thanks Galo, this is useful information. 

When you say “large” working sets, how large is large? I'm just looking for an order of magnitude (Gig, Tera, Peta...).

Also, are you aware of any Mesos frameworks that offer capabilities similar to K8s stateful sets?

Thanks again,

-John

> On Jun 23, 2017, at 6:37 PM, Galo Gimenez <ga...@gmail.com> wrote:
> 
> One issue you will find with Jackrabbit is indexing: local storage is ephemeral, so new nodes need to re-index, and on large working sets this can take hours.
>
> Kubernetes introduced StatefulSets, which give you stable naming and storage inside the cluster and a consistent ordering when nodes are started: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
> 
> — Galo

Re: Using Jackrabbit in orchestration environment

Posted by Galo Gimenez <ga...@gmail.com>.

One issue you will find with Jackrabbit is indexing: local storage is ephemeral, so new nodes need to re-index, and on large working sets this can take hours.

Kubernetes introduced StatefulSets, which give you stable naming and storage inside the cluster and a consistent ordering when nodes are started: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
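
As a small sketch of how the stable naming could be used: StatefulSet pods are named <statefulset-name>-<ordinal>, so a startup script can turn the pod name into a cluster node id that stays the same across restarts. That lets the index kept on the pod's persistent storage be reused rather than rebuilt. The function names and the "-node-" id format here are hypothetical, chosen only for illustration.

```python
import re
from typing import Tuple

# StatefulSet pods are named <statefulset-name>-<ordinal>, e.g. "jackrabbit-0".
_POD_NAME = re.compile(r"^(?P<set_name>.+)-(?P<ordinal>\d+)$")


def parse_pod_name(pod_name: str) -> Tuple[str, int]:
    """Split a StatefulSet pod name into (set name, ordinal)."""
    match = _POD_NAME.match(pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return match.group("set_name"), int(match.group("ordinal"))


def cluster_node_id(pod_name: str) -> str:
    """Map a pod name to a cluster node id that survives restarts (assumed format)."""
    set_name, ordinal = parse_pod_name(pod_name)
    return f"{set_name}-node-{ordinal}"
```

For example, a pod named "jackrabbit-3" would always come back as the same cluster node, with the same volume attached.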

— Galo

> On Jun 23, 2017, at 11:03 PM, John Chilton <jc...@gmail.com> wrote:
> 
> We are running in an orchestration environment — either Mesos/Chronos/Marathon or Kubernetes.
> 
> Each docker container needs to join the Jackrabbit cluster for the lifetime of that container and then leave the Jackrabbit cluster when its work is complete.
> When each container joins the Jackrabbit cluster it is assigned a unique cluster node id (repository.xml). We also have no upper bound on the number of our containers that may join the cluster at any given time. 
> 
> Will this “dynamic” clustering work or will we encounter issues? Is this ill-advised, or are there things we need to do beyond uniquely identifying each cluster node?
> I am trying to get ahead of issues that may arise when exercising this. Any thoughts at all would be appreciated.
> 
> Thanks, 
> 
> -John
>