Posted to dev@stanbol.apache.org by Adrian Gschwend <ml...@netlabs.org> on 2012/07/19 12:13:32 UTC

Stanbol architecture questions

Hi group,

As Stéphane mentioned in a previous post, we will most probably use
Stanbol as the platform base for our FP7 project "FusePool". Check out
http://www.fusepool.eu/ for more information (BTW, we are also looking for
one full-time employee; see the job offering on the page :)

Right now we are taking a closer look at Stanbol to make sure we can do
what we want with it in the end. Currently we have the following
questions/remarks:

Multi Tenancy:
As Stephane mentioned we need to be able to support multiple clients on
the same platform. Wikipedia has a pretty good explanation:

"With a multitenant architecture, a software application is designed to
virtually partition its data and configuration, and each client
organization works with a customized virtual application instance."

https://en.wikipedia.org/wiki/Multitenancy

The alternative is a multi-instance architecture, which makes it much
harder to "blend" two partitions; in our case that is a very likely
scenario.

We definitely need this for FusePool, so we would appreciate it if we
could work together on this point to make Stanbol more valuable for
cloud-based environments and large-scale applications. If I understand
Rupert correctly, this could already be done for some components but
probably not for all.

So our proposal is to have something like a Tenancy module which:
- lets us define partitions
- allows us to enable/disable components in the partitions
- allows us to have individual configurations per component in the
partition (like individual chains in the enhancer or individual rules)
- will take care that the components store the data in their own partition
- ultimately we need to be able to plug some form of ACLs on top of that,
but I don't think this part will be a show stopper once the rest is
working
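To make the proposal a bit more concrete, here is a minimal Python sketch
of what such a Tenancy module could keep track of. It is purely
illustrative: `TenancyRegistry`, `Partition` and the `tenant/component/key`
storage-key scheme are hypothetical names, not existing Stanbol APIs, and
ACLs are left out. It covers defining partitions, enabling/disabling
components per partition, per-partition configuration that overrides the
global defaults, and partition-scoped storage keys.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One tenant partition (hypothetical model, not a Stanbol API)."""
    name: str
    enabled: set = field(default_factory=set)   # components switched on
    config: dict = field(default_factory=dict)  # component -> tenant settings


class TenancyRegistry:
    """Hypothetical Tenancy module: partitions, per-partition component
    switches and configuration, and partition-scoped storage keys."""

    def __init__(self, defaults):
        self.defaults = defaults  # component -> global default configuration
        self.partitions = {}

    def define(self, name):
        self.partitions[name] = Partition(name)
        return self.partitions[name]

    def enable(self, tenant, component):
        if component not in self.defaults:
            raise KeyError(f"unknown component: {component}")
        self.partitions[tenant].enabled.add(component)

    def disable(self, tenant, component):
        self.partitions[tenant].enabled.discard(component)

    def configure(self, tenant, component, **settings):
        self.partitions[tenant].config.setdefault(component, {}).update(settings)

    def effective_config(self, tenant, component):
        partition = self.partitions[tenant]
        if component not in partition.enabled:
            raise PermissionError(f"{component} is disabled for {tenant}")
        # per-partition settings override the global defaults
        return {**self.defaults[component], **partition.config.get(component, {})}

    def storage_key(self, tenant, component, key):
        # prefixing every key keeps a component's data inside its partition
        return f"{tenant}/{component}/{key}"
```

For example, two clients can share the enhancer while only one of them
runs its own chain: `configure("acme", "enhancer", chain="acme-chain")`
changes what `effective_config` returns for "acme" and nothing else.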

Transaction Management:
- is there currently some form of transaction management in Stanbol?
We could not find much documentation about this.

Application Server:
- We will most probably want to run Stanbol in JBoss; this is more of an
FYI right now. Has anyone done that already? Should we expect problems? :-D

Jena Endpoint:
- It seems that Jena provides TDB or SDB for persistent storage. Does
anyone know if there are any SDB adaptors for Infinispan in the works?
Probably a question I should rather ask on the Jena list, but I'll give
it a try :)

That's it for now; it would be nice if we could start a discussion on
these remarks.

cu

Adrian



Re: Stanbol architecture questions

Posted by Stéphane Gamard <st...@searchbox.com>.
Good start from Adrian. I'd like to add a small comment about
transactions. While I am pretty certain that atomic components within
the enhancers are bound by some form of Tx, I am wondering about Tx at
the level of the entire chain.

For example, what happens if one of the enhancers within a chain fails?
Should all enhancers "roll back"? Is that possible? Or could we think
of a staging step between enhancement and persistence?
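One way to picture the staging idea: every enhancer in the chain writes
into a staged copy of the metadata, and only a chain that completes
without error is persisted, in a single step, so a failed chain leaves
nothing to roll back. A minimal Python sketch under that assumption;
`run_chain`, `EnhancementFailed` and the dict-based store are hypothetical
stand-ins, not Stanbol's enhancement job API.

```python
import copy


class EnhancementFailed(Exception):
    """Raised when any enhancer in the chain fails; nothing is persisted."""


def run_chain(chain, content, store, doc_id):
    """Run all enhancers on a staged copy and persist only on full success.

    chain: list of callables f(content, metadata) that add to `metadata`
    store: dict standing in for the persistent store (hypothetical)
    """
    # work on a deep copy so partial results never reach the store
    staged = copy.deepcopy(store.get(doc_id, {}))
    for enhancer in chain:
        try:
            enhancer(content, staged)
        except Exception as exc:
            name = getattr(enhancer, "__name__", repr(enhancer))
            raise EnhancementFailed(f"{name} failed; chain discarded") from exc
    # single commit step: replace the stored metadata in one assignment
    store[doc_id] = staged
    return staged
```

The trade-off is memory for simplicity: enhancers never need compensating
"undo" logic, because the store is only touched after the last enhancer
has returned.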

My 2 cents :p

_Stephane


Re: Stanbol architecture questions

Posted by Adrian Gschwend <ml...@netlabs.org>.
On 19.07.12 18:58, Rupert Westenthaler wrote:

Hi Rupert,

> I am on vacation this week, so just a short reply to this very
> interesting discussion.

Thanks for the great feedback! We are currently thinking about our
design so there is some more work needed before we start to address
Stanbol. But your remarks are very helpful and we will work on this :)

More later

cu

Adrian

Re: Stanbol architecture questions

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

I am on vacation this week, so just a short reply to this very
interesting discussion.

## RESTful API

To make the RESTful API multi-tenant, it needs to learn a new trick:
publishing OSGi services such as Enhancement Chains, Referenced Sites,
... on different RESTful endpoints. These could be different HTTP
services (e.g. different ports, HTTPS ...) or just different root paths.

Configurations of services that are bound to the RESTful API (e.g.
EnhancementChains) could then provide additional metadata that tells
the RESTful service implementation on which RESTful endpoints it should
be published.

Such a design would allow a single DBpedia instance to be used for all
clients, while client-specific configurations would only be published
on the client-specific RESTful endpoint.
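A minimal Python sketch of this endpoint binding, assuming each service
registration carries an extra `endpoints` property (a hypothetical name)
and that `*` marks services shared by all clients, such as the single
DBpedia site. None of these names exist in Stanbol; they only model the
lookup a RESTful service implementation would have to perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRegistration:
    """A published service plus the endpoints it should be exposed on."""
    name: str
    kind: str             # e.g. "chain" or "site"
    endpoints: frozenset  # hypothetical binding metadata; "*" = all clients


def endpoint_for(path):
    """Map a request path like '/acme/enhancer' to its endpoint root '/acme'."""
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] if parts else "/"


def visible_services(registrations, endpoint):
    """Names of the services a given RESTful endpoint should expose."""
    return sorted(
        r.name for r in registrations
        if "*" in r.endpoints or endpoint in r.endpoints
    )
```

With a shared "dbpedia" site bound to `*` and an "acme-chain" bound to
`/acme`, the `/acme` endpoint exposes both, while `/globex` only sees the
shared site and its own chains.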

## Configuration

Adding those features would make the Stanbol configuration even more
complex, so we would definitely need to look into a way to make this easy
to configure. With the Linked Media Framework / Stanbol integration we
used the HTTP endpoint of the Felix Web Console to directly create
EnhancementEngine and EnhancementChain instances. This allowed the
configuration UI for Stanbol to be integrated within the LMF.

A similar approach could also be used to provide a configuration
interface for Apache Stanbol. However, for this we would need to
decide whether to base it on

* Configuration Admin Service
* Sling Installer [1]

So one possibility would be that each client gets its own
configuration interface. All configuration changes made through it would
also carry the metadata needed to bind those configurations to the
client's RESTful endpoint. Global and multi-client configurations/services
would only be possible via an admin-level configuration (e.g. direct
access to the Felix Web Console, or by copying configurations to the
directory of the Sling File Installer).
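The per-client configuration interface could be sketched like this,
assuming configurations are stored under a (factory, name) key and that
the binding metadata is a hypothetical `endpoints` property. The
client-facing interface always stamps the client's own endpoint and
refuses to set the binding itself; only the admin-level interface can
create global or multi-client configurations. This models the idea only;
it is not the Configuration Admin or Sling Installer API.

```python
class ConfigStore:
    """Stand-in for the backing store: configs keyed by (factory, name)."""

    def __init__(self):
        self.configs = {}


class ClientConfigInterface:
    """Per-client front end: every configuration it creates is bound to
    the client's own RESTful endpoint (hypothetical 'endpoints' property)."""

    def __init__(self, store, endpoint):
        self.store = store
        self.endpoint = endpoint

    def create(self, factory, name, **props):
        if "endpoints" in props:
            raise PermissionError("endpoint binding is set by the platform")
        props["endpoints"] = [self.endpoint]
        # prefix the instance name so clients cannot overwrite each other
        self.store.configs[(factory, f"{self.endpoint}:{name}")] = props
        return props


class AdminConfigInterface:
    """Admin-level access: may bind a configuration to any endpoints."""

    def __init__(self, store):
        self.store = store

    def create(self, factory, name, endpoints, **props):
        props["endpoints"] = list(endpoints)
        self.store.configs[(factory, name)] = props
        return props
```

Keeping the binding out of the client's hands is what turns the global
and multi-client case into an admin-only operation.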

## Components

I am sure that there are also multi-tenancy related issues in the
design and implementation of specific components. E.g. some
assumptions of the Entityhub are not optimal for such an environment.
But AFAIK none of them is a blocker.

## Storage

The Stanbol Entityhub usually uses embedded Apache SolrCores for
storage. This is done by the ManagedSolrServer component. Multiple
ManagedSolrServer instances are supported. If you do not want to use the
default ManagedSolrServer instance, you will need to explicitly specify
the server name in the configuration ({server-name}:{core-name}). In
addition, it is also possible to use a ReferencedSolrServer: an
externally managed SolrServer that does not run in the same JVM as
Apache Stanbol.
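Resolving such a {server-name}:{core-name} reference could look like the
following Python sketch. The function and the name "default" for the
default ManagedSolrServer are assumptions for illustration. In a
multi-tenant setup one could imagine giving each client its own
ManagedSolrServer and prefixing that client's core references with the
server name.

```python
DEFAULT_SERVER = "default"  # hypothetical name of the default ManagedSolrServer


def parse_core_reference(ref):
    """Split a '{server-name}:{core-name}' reference into (server, core).

    A reference without a colon is resolved against the default server.
    """
    server, sep, core = ref.partition(":")
    if not sep:
        # no server prefix: 'server' actually holds the core name
        return DEFAULT_SERVER, server
    if not server or not core:
        raise ValueError(f"malformed core reference: {ref!r}")
    return server, core
```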

For triple storage, Apache Clerezza is used. The default Stanbol
Launcher includes Apache Jena, so yes, you will end up storing your RDF
data in an Apache Jena TDB store. However, this can be changed by using
different bundles in the Stanbol Launchers. There are other people on
this mailing list who know much more than I do about how to
change/control the storage used by Apache Clerezza.


## Transactions

AFAIK there is no need for, or use of, transactions in Apache Stanbol.
However, there is a RESTful API for asynchronous jobs, currently only
used for Reasoning. As this is a RESTful service, this could hopefully
be solved as suggested for the RESTful API.
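If asynchronous jobs were bound to RESTful endpoints in the same way, a
job would only be visible from the endpoint that created it. A minimal
Python sketch of that scoping; `JobRegistry` and its methods are
hypothetical and do not correspond to the existing jobs API.

```python
import itertools


class JobRegistry:
    """Asynchronous job bookkeeping scoped per RESTful endpoint (tenant)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}  # job id -> (owning endpoint, status)

    def submit(self, endpoint):
        job_id = str(next(self._ids))
        self._jobs[job_id] = (endpoint, "running")
        return job_id

    def finish(self, job_id):
        endpoint, _ = self._jobs[job_id]
        self._jobs[job_id] = (endpoint, "done")

    def status(self, endpoint, job_id):
        owner, state = self._jobs.get(job_id, (None, None))
        if owner != endpoint:
            # other tenants see the job as if it did not exist
            raise KeyError(job_id)
        return state
```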

best
Rupert




[1] http://sling.apache.org/site/osgi-installer.html

The addition of  Multi Tenancy could be a good

With the recent LMF version we have implemented







-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol architecture questions

Posted by Fabian Christ <ch...@googlemail.com>.
Hi,

2012/7/19 Adrian Gschwend <ml...@netlabs.org>:
> What would be the way to go for extending the current
> components to introduce our requirements? Should we fork it first or do
> you prefer other ways of working on it?

All discussions regarding fundamental code changes should happen on
this dev list. At Apache we follow the credo that if something was not
discussed on the list, then it did not happen ;) So start the
discussion threads that you need on this list.

Then you should open separate JIRA issues for the things discussed on
the list. Once you have a JIRA issue, you can attach patch files with
your code changes to it. Do not send the patches to this list, as
large attachments are not supported. A committer will review the patch
and apply it to the code base. This is a rather time-consuming
process. So after several patches, and when the community believes
that you contribute good patches, you may be invited to become a
committer. That is not as hard as it may sound. This is just a small
barrier to see if people are really interested in contributing to the
project. Committers are always individual persons and never
organisations or groups of people.

Best,
 - Fabian

-- 
Fabian
http://twitter.com/fctwitt

Re: Stanbol architecture questions

Posted by Adrian Gschwend <ml...@netlabs.org>.
On 19.07.12 13:06, Fabian Christ wrote:

Hi Fabian,

> Stanbol in its current stage is not multitenant. It is much more like
> a multi-instance architecture.

ok that's the impression we had.

> Maybe you should have a look at the components that you may need for
> your project and then we have to think about ways to make them
> multitenant. The service architecture is nice for cloud scenarios but
> for the multitenant aspect we have to think about the storage
> solutions used by the different components. As I said, at the moment each
> component solves this on its own. For example, we do not have a
> central storage service or layer.

Yep, so we would need some form of generic interface that the components
could implement to abstract that part.
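Such a generic interface might look roughly like the following Python
sketch, assuming components read and write through a (partition, key)
API instead of talking to their storage directly. `PartitionedStore`,
`InMemoryStore` and `blended_view` are hypothetical names; a real backend
could map partitions to separate Solr cores or RDF graphs. `blended_view`
illustrates the "blend" case from the original mail: reading across
several partitions without copying data.

```python
from abc import ABC, abstractmethod


class PartitionedStore(ABC):
    """Storage interface a component would implement so that a tenancy
    layer decides which partition its data lands in (hypothetical)."""

    @abstractmethod
    def put(self, partition, key, value):
        ...

    @abstractmethod
    def get(self, partition, key):
        ...

    @abstractmethod
    def keys(self, partition):
        ...


class InMemoryStore(PartitionedStore):
    """Trivial backend keeping one dict per partition."""

    def __init__(self):
        self._data = {}

    def put(self, partition, key, value):
        self._data.setdefault(partition, {})[key] = value

    def get(self, partition, key):
        return self._data[partition][key]

    def keys(self, partition):
        return sorted(self._data.get(partition, {}))


def blended_view(store, partitions):
    """List (partition, key) pairs across several partitions without copying."""
    return [(p, k) for p in partitions for k in store.keys(p)]
```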

> This is an open-source project and if you would like to become part of
> the community, everything is possible for Stanbol. Making Stanbol
> ready for cloud-based scenarios is definitely something that many
> people would be interested in.

Yeah, we do have some resources, so the idea is that we would
contribute code ourselves, but this would clearly be easier if you guys
are motivated to support us and work on it as well, as time permits :)

For sure it would make sense to discuss the design implications with the
team first so we all agree on how it should be done. Also, I'm not the
one who would do this at the coding level; that would be done by
someone else on the team.

> Sounds like a plan, but is not supported that way by Stanbol, yet.

Yeah in the end we would have to define some requirements and this is
what I can think of right now.

>> Transaction Management:
>> - is there currently some form of transaction management in Stanbol?
>> Could not find much documentation about this.
> 
> Not that I know.

ok we will think about that as well.

> We have a WAR packaging for Stanbol that is in use by some people.

ok good

> For the ongoing discussion, we should always keep in mind that Stanbol
> is not one single system. It is a composition of components and the
> user can select which components she would like to use. As a
> consequence of this, there is not much of an overall architecture in
> Stanbol. Each top level component, like Enhancer, Entityhub,
> Contenthub, etc. offers a RESTful API. That's what all have in common.
> So for your questions we have to look at each component separately.

ok, so identifying which components we would start with is definitely
important. What would be the way to go for extending the current
components to introduce our requirements? Should we fork it first, or do
you prefer other ways of working on it?

thanks

Adrian

Re: Stanbol architecture questions

Posted by Fabian Christ <ch...@googlemail.com>.
Hi,

2012/7/19 Adrian Gschwend <ml...@netlabs.org>:
> Hi group,
>
> As Stephan mentioned in a previous post we will most probably use
> Stanbol as a platform base for our FP7 project "FusePool". Check out
> http://www.fusepool.eu/ for more information (BTW we are looking for one
> full time employee as well, see the job offering on the page :)

Nice to hear ;)

> Right now we have a closer look at Stanbol to make sure we can do what
> we want in the end, currently we have the following questions/remarks:
>
> Multi Tenancy:
> As Stephane mentioned we need to be able to support multiple clients on
> the same platform. Wikipedia has a pretty good explanation:
>
> "With a multitenant architecture, a software application is designed to
> virtually partition its data and configuration, and each client
> organization works with a customized virtual application instance."
>
> https://en.wikipedia.org/wiki/Multitenancy
>
> The alternative is a multi-instance architecture which makes it much
> harder to "blend" between two partitions, which is in our case a very
> likely scenario.

Stanbol in its current stage is not multitenant. It is much more like
a multi-instance architecture.

Stanbol components are loosely coupled, which means that there is no
coherent architecture that connects them. At the moment Stanbol
components just offer services but do not interact much, if at all,
with each other.

Maybe you should have a look at the components that you may need for
your project, and then we have to think about ways to make them
multitenant. The service architecture is nice for cloud scenarios, but
for the multitenant aspect we have to think about the storage
solutions used by the different components. As I said, at the moment each
component solves this on its own. For example, we do not have a
central storage service or layer.

> We definitely need this for FusePool so we would appreciate if we could
> work together on this point to make Stanbol more valuable for cloud
> based environments and large scale applications. If I get Rupert
> correctly this could already be done for some components but probably
> not for all.

This is an open-source project and if you would like to become part of
the community, everything is possible for Stanbol. Making Stanbol
ready for cloud-based scenarios is definitely something that many
people would be interested in.

> So our proposal is to have something like a Tenancy module which:
> - lets us define partitions
> - allows us to enable/disable components in the partitions
> - allows us to have individual configurations per component in the
> partition (like individual chains in the enhancer or individual rules)
> - will take care that the components store the data in their own partition
> - ultimately we need to be able to plug some form of ACLs on top of that
> but I don't think this part will be the show stopper once the rest is
> working

Sounds like a plan, but is not supported that way by Stanbol, yet.

> Transaction Management:
> - is there currently some form of transaction management in Stanbol?
> Could not find much documentation about this.

Not that I know.

> Application Server:
> - We will most probably want to run Stanbol in JBoss, this is more a FYI
> right now. Did anyone do that already? Should we expect problems? :-D

We have a WAR packaging for Stanbol that is in use by some people.

> Jena Endpoint:
> - It seems that Jena provides TDB or SDB for persistent storage. Anyone
> knows if there are any SDB adaptors for Infinispan in the works?
> Probably a question I should rather ask on the Jena list but I give it a
> try :)

Maybe Rupert knows about this?

> So far, would be nice if we could start a discussion on our remarks.

Discussion started ;)

For the ongoing discussion, we should always keep in mind that Stanbol
is not one single system. It is a composition of components and the
user can select which components she would like to use. As a
consequence of this, there is not much of an overall architecture in
Stanbol. Each top level component, like Enhancer, Entityhub,
Contenthub, etc. offers a RESTful API. That's what all have in common.
So for your questions we have to look at each component separately.

Best,
 - Fabian