Posted to dev@manifoldcf.apache.org by Dileepa Jayakody <dj...@zaizi.com> on 2015/11/13 11:20:53 UTC

Re: ManifoldCF transformation connector for Apache Stanbol

Hi guys,

I have developed a Stanbol connector for MCF. You can check it out from our
GitHub repo here:
https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector

It requires the SolrWrapper output connector, which indexes enhanced
documents, entities and entityTypes in separate Solr cores. In other words,
it needs three separate Solr cores, configured with a specific Solr schema,
for primary documents, entities and entityTypes respectively. This was done
for our specific use case.

The SolrWrapper code is here:
https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector

Perhaps we can discuss how to remove the Stanbol connector's dependency on
SolrWrapper so that it works with any output connector.
Please note that the Stanbol connector currently has a bug in the UI
(editSpecification), which I'm working on at the moment. I will post an
update here once it is fixed, and I will also provide documentation for
configuring the connector.

Thanks,
Dileepa

On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
adperezmorales@gmail.com> wrote:

> Hi Joshua
>
> This is not the right list for that, but Marmotta is already integrated
> into Apache Stanbol. You can take a look at this issue:
> https://issues.apache.org/jira/browse/STANBOL-1165
>
> Anyway, as I said, this is not the list for that, so let's use the proper
> list for these things.
>
> Regards
>
>
>
> 2015-07-09 15:29 GMT+02:00 Joshua Dunham <jo...@gmail.com>:
>
> > Hey Dileepa,
> >
> > In case you are interested, I pinged the list a few days ago asking
> > for integration tips for Apache Marmotta.
> >
> > I got some great tips on how to do this which could help you. Since
> > Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be
> > easier for you to go this way.
> >
> > I'm not a Java programmer, but I'm bringing this problem to the
> > development staff at my company for assistance. If you like the Marmotta
> > approach, we may gain more traction solving the same integration.
> >
> > I'm also integrating Marmotta with Stanbol, so the effect would be the
> > same, except that data import would go through Marmotta rather than the
> > Stanbol API.
> >
> > Best,
> >
> > -J
> >
> > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <dj...@zaizi.com> wrote:
> > >
> > > Hi all,
> > >
> > > Thank you for the feedback and for offering your help with this.
> > > Let me get back to you on where to start the code base.
> > > As a first step, I would like to create an architecture diagram
> > > for the connector.
> > > I will send the diagram for your review soon.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > > --
> > >
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received
> > > this email in error please notify the sender and destroy it immediately.
> > > Statements of intent shall only become binding when confirmed in hard
> > > copy by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration
> > > number 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > > Road, London W6 7AN.
> >
>


Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

Stanbol performs semantic lifting of documents: it recognizes entities from an ontology-based dataset and returns them as the result of analyzing the document. The semantic information retrieved for each entity (the entity's properties in the dataset) is configurable, and it can even include data from other entities reachable from the extracted ones by traversing the ontology with a language called LDPath. So the idea is to define LDPath expressions that configure which entity metadata is retrieved and finally indexed as document metadata.
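
As a rough illustration of that configuration idea, here is a minimal Python sketch that assembles an LDPath program from configured fields. It assumes the usual LDPath field form `name = path :: type ;` and `/` for path traversal; the `build_ldpath_program` helper, the field names and the chosen properties are hypothetical and not taken from the connector. How the program is handed to the Stanbol enhancer is left out.

```python
def build_ldpath_program(prefixes, fields):
    """Assemble an LDPath program from configured prefixes and fields.

    prefixes: dict mapping a namespace prefix to its URI.
    fields:   dict mapping a metadata field name to a (path, type) pair.
    """
    lines = []
    for prefix, uri in prefixes.items():
        lines.append(f"@prefix {prefix} : <{uri}> ;")
    for name, (path, xsd_type) in fields.items():
        lines.append(f"{name} = {path} :: {xsd_type} ;")
    return "\n".join(lines)


# Hypothetical connector configuration; names are for illustration only.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbo": "http://dbpedia.org/ontology/",
}
FIELDS = {
    # A property of the extracted entity itself.
    "label": ("rdfs:label", "xsd:string"),
    # A property of another entity reached by traversing the ontology.
    "birthplace_label": ("dbo:birthPlace / rdfs:label", "xsd:string"),
}

program = build_ldpath_program(PREFIXES, FIELDS)
print(program)
```

Each configured field name would later become a metadata key on the document, so the names should be chosen with the target index schema in mind.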




Cheers,

Rafa

On Mon, Dec 7, 2015 at 12:34 PM, Karl Wright <da...@gmail.com> wrote:

> It makes sense to me, anyway. :-)
> It sounds like Stanbol just has hierarchical attributes, rather than actual
> documents.
> Karl
> On Mon, Dec 7, 2015 at 6:16 AM, Rafa Haro <rh...@apache.org> wrote:
>> Hi Dileepa,
>>
>> As I explained to you before, with Solr (and this is probably also true
>> of Elasticsearch, although it does allow you to index nested fields) you
>> can't have nested objects or fields. Besides that, within ManifoldCF
>> the metadata is expressed as key/value pairs, where a value can be a list
>> of objects but nothing beyond that. So it is not possible to work with
>> complex structures as metadata; you must flatten them first.
>>
>> In a nutshell, it is not possible to maintain the relationships between
>> entities and their metadata. That doesn't mean it isn't worthwhile to
>> index the semantic metadata, even if you can't relate it to a concrete
>> entity. Indexing that information would enable a bunch of use cases. So
>> the proposal is to define LDPath fields by configuration in the
>> transformation connector. From all the LDPath expressions you would build
>> an LDPath program and pass it in the Stanbol enhancer request. When you
>> parse the response, you just need to go entity by entity, taking the
>> returned LDPath field values and adding them as metadata, using the field
>> name as the key and the returned value as the value.
>>
>> Does that make sense?
>>
>> Cheers,
>>
>> Rafa
>>
>> On Mon, Dec 7, 2015 at 11:17 AM, Dileepa Jayakody <dj...@zaizi.com>
>> wrote:
>>
>> > Hi All,
>> > Thank you all for your input on the Stanbol connector requirements. I
>> > would like to continue modifying the Stanbol connector so that it is
>> > compatible with any output connector. If you can give some guidance on
>> > how the entity metadata should be added to the repository document, I
>> > can modify the Stanbol connector accordingly.
>> > From Rafa's comments, I gathered that we can add the entity metadata to
>> > the repository document as key/value pairs.
>> > However, this idea is not yet clear to me. There could be N entities in
>> > a document, and each of them will have some common attributes such as
>> > name, id and type, plus attributes specific to its entity type.
>> > I'm not clear on how to maintain that structure of N entities with
>> > their attributes in a repository document as key/value pairs and make
>> > them LDPath-compatible for retrieval in an output connector.
>> > @Rafa
>> > If you could elaborate on your suggestion, it would be very helpful
>> > to me.
>> > All other suggestions are also welcome.
>> > Thanks,
>> > Dileepa
>> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com> wrote:
>> >> I, too, agree.  Somebody will need to turn this connector into one that
>> >> plays by the rules.  It may be possible for someone on the team here to
>> >> do that, but it won't be me; I'm seriously overextended at the moment.
>> >> It would be best if someone who knew the connector well could do the
>> >> necessary work.
>> >>
>> >> Karl
>> >>
>> >>
>> >> On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rh...@gmail.com> wrote:
>> >>
>> >> > I must agree with Antonio. When I started to work on this, I was
>> >> > expecting the connector to work by just extracting the entities and
>> >> > their metadata and putting them as plain metadata on the documents,
>> >> > probably following a configuration of LDPath queries.
>> >> >
>> >> > This is probably OK for Sensefy, but I don’t think it is suitable
>> >> > for inclusion in the project. But this is only my opinion. Of course,
>> >> > a version of the connector that fully respects the ManifoldCF
>> >> > architecture would be more than welcome.
>> >> >
>> >> > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
>> >> > <ad...@gmail.com> wrote:
>> >> >
>> >> > > Hi
>> >> > > Removing the SolrWrapper is a must. It was a requirement of an
>> >> > > internal project and has nothing to do with the normal operation of
>> >> > > Manifold, and forcing users to use Solr does not fit the Manifold
>> >> > > philosophy.
>> >> > > In my opinion, at this moment, a Stanbol connector with such a big
>> >> > > dependency, which will fit almost no use case, is not very useful.
>> >> > > You should think of a way to turn the Stanbol connector into a
>> >> > > normal transformation connector that does not assume a specific
>> >> > > output connector will be used.
>> >> > > Regards

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
It makes sense to me, anyway. :-)
It sounds like Stanbol just has hierarchical attributes, rather than actual
documents.

Karl


Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Dileepa,

As I explained to you before, with Solr (and this is probably also true of Elasticsearch, although it does allow you to index nested fields) you can't have nested objects or fields. Besides that, within ManifoldCF the metadata is expressed as key/value pairs, where a value can be a list of objects but nothing beyond that. So it is not possible to work with complex structures as metadata; you must flatten them first.
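
To make the flattening step concrete, here is a minimal Python sketch that turns a nested structure into flat keys with lists of string values. The `flatten` helper and the dotted key naming are assumptions for illustration; they are not part of ManifoldCF or Stanbol.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {dotted_key: [string values]}."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            for k, v in flatten(child, child_prefix).items():
                flat.setdefault(k, []).extend(v)
    elif isinstance(value, (list, tuple)):
        # List items share the parent's key, so they become one multi-valued field.
        for child in value:
            for k, v in flatten(child, prefix).items():
                flat.setdefault(k, []).extend(v)
    else:
        flat.setdefault(prefix, []).append(str(value))
    return flat


# Hypothetical nested entity data, for illustration only.
nested = {
    "entities": [
        {"name": "Entity A", "type": "Place"},
        {"name": "Entity B", "type": "Person"},
    ]
}

flat = flatten(nested)
print(flat)
```

After flattening, each key holds a plain list of values. Nothing in the result records which name belongs to which type, which is the loss of structure described above.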




In a nutshell, it is not possible to maintain the relationships between entities and their metadata. That doesn't mean it isn't worthwhile to index the semantic metadata, even if you can't relate it to a concrete entity. Indexing that information would enable a bunch of use cases. So the proposal is to define LDPath fields by configuration in the transformation connector. From all the LDPath expressions you would build an LDPath program and pass it in the Stanbol enhancer request. When you parse the response, you just need to go entity by entity, taking the returned LDPath field values and adding them as metadata, using the field name as the key and the returned value as the value.
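
The last step, going entity by entity and collecting field values, could be sketched as follows in Python. The shape of the parsed response (a list of entities, each with a `fields` mapping from LDPath field name to values) is hypothetical and is not Stanbol's actual response format; `collect_metadata` is likewise an illustrative name.

```python
def collect_metadata(entities, field_names):
    """Merge per-entity LDPath field values into document-level metadata.

    Returns {field_name: [values]}, keeping first-seen order and
    skipping values that were already collected for that field.
    """
    metadata = {}
    for entity in entities:
        for name in field_names:
            for value in entity.get("fields", {}).get(name, []):
                values = metadata.setdefault(name, [])
                if value not in values:
                    values.append(value)
    return metadata


# Hypothetical, already-parsed enhancer response.
entities = [
    {
        "uri": "http://example.org/entity/a",
        "fields": {"label": ["Entity A"], "birthplace_label": ["Somewhere"]},
    },
    {
        "uri": "http://example.org/entity/b",
        "fields": {"label": ["Entity B", "Entity A"]},
    },
]

metadata = collect_metadata(entities, ["label", "birthplace_label"])
print(metadata)
```

Each key in the result would then be added to the document as one multi-valued metadata field, so any output connector can index it without knowing about Stanbol.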




Does that make sense?




Cheers,

Rafa

On Mon, Dec 7, 2015 at 11:17 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi All,
> While thanking you all for your input on Stanbol connector requirement, I
> would like to continue with modifying the Stanbol connector to be
> compatible with any output connector. If you guys can give some guidance on
> how the entity metadata should be added to the repository document I can
> modify the stanbol connector accordingly.
> From Rafa's comments, I gathered we can add the entity metadata to the
> repo.doc as key value pairs.
> However this idea is not yet clear to me. There could be 'N' number of
> entities in a document and each of them will have some common attributes
> such as name, id, type and specific attributes for particular entity type.
> I'm not clear on how to maintain that structure of N number of entities
> with their attributes in a repo.document as key value pairs and make them
> LDPath compatible for retrieval in an output connector.
> @Rafa
> If you can please elaborate on your suggestion it would be greatly helpful
> to me.
> All other suggestions are also welcome.
> Thanks,
> Dileepa
> On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com> wrote:
>> I, too, agree.  Somebody will need to turn this connector into one that
>> plays by the rules.  It may be possible for someone on the team here to do
>> that, but it won't be me; I'm seriously overextended at the moment.  It
>> would be best if someone who knew the connector well could do the necessary
>> work.
>>
>> Karl
>>
>>
>> On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rh...@gmail.com> wrote:
>>
>> > I must agree with Antonio. When I started to work on this I was expecting
>> > the connector to work by just extracting the entities and entities
>> metadata
>> > and put them as plain metadata of the documents, probably following
>> LDPATH
>> > queries configuration
>> >
>> >
>> >
>> >
>> > This is probably ok for Sensefy but I don’t think this could be suitable
>> > to be included in the project. But this is only my opinion. Of course, a
>> > version of the connector that fully respect the ManifoldCF architecture
>> > would be more than welcome in my opinion
>> >
>> > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
>> > <ad...@gmail.com> wrote:
>> >
>> > > Hi
>> > > The removal of the SolrWrapper is a must. It was a requirement for an
>> > > internal project which has nothing to do here with a normal operation
>> of
>> > > Manifold, so forcing the users to use Solr does not fit the Manifold
>> > > philosophy.
>> > > In my opinion, at this moment, a Stanbol connector with such a big
>> > > dependency which will not fit almost any use case is not very useful.
>> > > You should think a way to convert Stanbol connector into a normal
>> > > Transformation connector without assuming that a specific output
>> > connector
>> > > will be used.
>> > > Regards
>> > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <dj...@zaizi.com>:
>> > >> Hi guys,
>> > >>
>> > >> I have developed a Stanbol connector for MCF. You can check it out
>> from
>> > our
>> > >> github repo here:
>> > >>
>> > >>
>> >
>> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>> > >>
>> > >> It requires the SolrWrapper output connector which indexes enhanced
>> > >> documents, entities and entityTypes in separate Solr cores. Basically
>> it
>> > >> requires 3 separate solr cores configured with a specific Solr schema
>> > for
>> > >> primary documents, entities and entityTypes separately. This was done
>> > for
>> > >> our specific use-case.
>> > >>
>> > >> The SolrWrapper code is here :
>> > >>
>> > >>
>> >
>> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>> > >>
>> > >> Perhaps we can discuss and remove the Stanbol connector's dependency
>> > with
>> > >> SolrWrapper and have it working with any output connector.
>> > >> Please note that the Stanbol connector currently has a bug in the UI
>> > >> (editSpecification) which I'm working on at the moment. After fixing
>> > that I
>> > >> will update here. And also I will provide documentations for
>> configuring
>> > >> the connector.
>> > >>
>> > >> Thanks,
>> > >> Dileepa
>> > >>
>> > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
>> > >> adperezmorales@gmail.com> wrote:
>> > >>
>> > >> > Hi Joshua
>> > >> >
>> > >> > It is not the list for that, but Marmotta is already integrated in
>> > Apache
>> > >> > Stanbol. You can take a look at this issue
>> > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
>> > >> >
>> > >> > Anyway, as I said this is not the list for that, so let's use the
>> > proper
>> > >> > list for these things.
>> > >> >
>> > >> > Regards
>> > >> >
>> > >> >
>> > >> >
>> > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <jo...@gmail.com>:
>> > >> >
>> > >> > > Hey Dileepa,
>> > >> > >
>> > >> > >       In case you were interested, I pinged the list a few days
>> ago
>> > >> > asking
>> > >> > > for integration tips for Apache Marmotta.
>> > >> > >
>> > >> > > I got some great tips on how to do this which could help you.
>> Since
>> > >> > > Marmotta is a drop in replacement for Clarezza on Stanbol it may
>> be
>> > >> > easier
>> > >> > > for you to take this way.
>> > >> > >
>> > >> > > I'm not a Java programmer but I'm bringing this problem to the
>> > >> > development
>> > >> > > staff at my company for assistance. If you like the Marmotta
>> > approach
>> > >> we
>> > >> > > may gain more traction solving the same integration.
>> > >> > >
>> > >> > > I'm also integrating Marmotta with Stanbol so the effect would be
>> > the
>> > >> > same
>> > >> > > except not using the Stanbol API for data import in favor of
>> > Marmotta.
>> > >> > >
>> > >> > > Best,
>> > >> > >
>> > >> > > -J
>> > >> > >
>> > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
>> djayakody@zaizi.com
>> > >
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > Hi all,
>> > >> > > >
>> > >> > > > Thank you for the feedback and for offering your help with this.
>> > >> > > > Let me get back to you on where to start the code base.
>> > >> > > > As the first step, I would like to start by creating an architecture
>> > >> > > > diagram for the connector.
>> > >> > > > I will send the diagram for your review soon.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Dileepa
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> >
>>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi Rafa, Karl,

Thanks again for all the pointers. Apparently I missed several replies
before sending my reply to Karl's earlier email :)

Regards,
Dileepa

On Mon, Dec 7, 2015 at 5:29 PM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi Karl,
>
> Thanks a lot for the pointer.
>
> Stanbol doesn't update an existing document; it generates a new response
> containing the requested enhancement details for each content enhancement
> request. For example, for a request such as "Paris is a city in France",
> Stanbol returns the following RDF response [1].
>
> In the Stanbol connector, enhancement artifacts such as TextAnnotations
> and EntityAnnotations are extracted from the RDF response to generate
> entity abstractions, which are added to the MCF repository document.
> Currently the connector adds these entity abstractions as JSON strings to
> a multi-valued 'entities' field in the repository document, and the
> SolrWrapper output connector parses that JSON to index into separate Solr
> cores (primary documents, linked entities, and entity types with their
> attributes).
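
A rough sketch of the approach described above, written in Python for brevity (the connector itself is Java). The dictionary keys, the helper names and the plain dict standing in for a repository document are assumptions made for illustration, not the connector's actual schema. Each entity abstraction is serialized to one JSON string, the strings become the values of a multi-valued 'entities' field, and the output side parses them back.

```python
import json


def entities_to_field_values(entities):
    """Serialize each entity abstraction (a dict) into one JSON string."""
    return [json.dumps(entity, sort_keys=True) for entity in entities]


def field_values_to_entities(values):
    """Parse the JSON strings back into entity dicts on the output side."""
    return [json.loads(value) for value in values]


# Hypothetical entity abstractions; URIs and types are copied from the
# sample Stanbol response [1], the key names are made up for this sketch.
entities = [
    {"uri": "http://dbpedia.org/resource/Paris", "label": "Paris",
     "types": ["dbp-ont:Place", "dbp-ont:Settlement"]},
    {"uri": "http://dbpedia.org/resource/France", "label": "France",
     "types": ["dbp-ont:Country", "dbp-ont:Place"]},
]

# Stand-in for a repository document: field name -> list of string values.
document_fields = {"entities": entities_to_field_values(entities)}

# What an output connector would do with the field values.
restored = field_values_to_entities(document_fields["entities"])
```

The drawback raised in this thread is visible here: only an output connector that knows the JSON layout can make use of the 'entities' field.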
>
> Can we have a primary repository document and create sub-documents for
> the extracted entities? Is it possible to generate sub-documents for a
> repository document in a transformation connector?
>
> Thanks.
> Dileepa
>
> [1] Sample Stanbol response
>
> {
>   "@context": {
>     "dbp-ont": "http://dbpedia.org/ontology/",
>     "dc": "http://purl.org/dc/terms/",
>     "dc:created": {
>       "@type": "xsd:dateTime"
>     },
>     "enhancer": "http://fise.iks-project.eu/ontology/",
>     "enhancer:confidence": {
>       "@type": "xsd:double"
>     },
>     "enhancer:end": {
>       "@type": "xsd:int"
>     },
>     "enhancer:entity-reference": {
>       "@type": "@id"
>     },
>     "enhancer:entity-type": {
>       "@type": "@id"
>     },
>     "enhancer:extracted-from": {
>       "@type": "@id"
>     },
>     "enhancer:start": {
>       "@type": "xsd:int"
>     },
>     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
>     "foaf": "http://xmlns.com/foaf/0.1/",
>     "foaf:depiction": {
>       "@type": "@id"
>     },
>     "owl": "http://www.w3.org/2002/07/owl#",
>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
>     "schema": "http://schema.org/",
>     "xsd": "http://www.w3.org/2001/XMLSchema#"
>   },
>   "@graph": [
>     {
>       "@id": "http://dbpedia.org/resource/France",
>       "@type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Country",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
>       ],
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
>       },
>       "rdfs:label": [
>         {
>           "@language": "en",
>           "@value": "France"
>         },
>         {
>           "@language": "fr",
>           "@value": "France"
>         }
>       ]
>     },
>
>     {
>       "@id": "http://dbpedia.org/resource/Paris",
>       "@type": [
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "dbp-ont:Settlement",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
>       ],
>       "geo:lat": 48.8567,
>       "geo:long": 2.3508,
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
>       },
>       "rdfs:label": [
>
>         {
>           "@language": "en",
>           "@value": "Paris"
>         },
>         {
>           "@language": "fr",
>           "@value": "Paris"
>         }
>       ]
>     },
>     {
>       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.6017613,
>       "enhancer:end": 5,
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "Paris"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 0
>     },
>     {
>       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 1.0,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 0.25715446,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Vichy France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "enhancer:confidence": 0.1493264,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Paris Commune"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.99354976,
>       "enhancer:end": 18,
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 12
>     }
>   ]
> }
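
To illustrate how the annotations in the sample response relate to each other, here is a small Python sketch (the connector itself is Java, and the function names are hypothetical). It assumes the JSON-LD has already been parsed into a list of `@graph` nodes, links each EntityAnnotation to its TextAnnotation through `dc:relation`, and keeps the highest-confidence entity per mention. The embedded graph is a trimmed copy of the values posted above.

```python
def as_list(value):
    """JSON-LD allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def best_entity_per_mention(graph, min_confidence=0.0):
    """Map each selected text to its highest-confidence entity suggestion."""
    texts = {node["@id"]: node for node in graph
             if "enhancer:TextAnnotation" in as_list(node.get("@type"))}
    best = {}
    for node in graph:
        if "enhancer:EntityAnnotation" not in as_list(node.get("@type")):
            continue
        confidence = node.get("enhancer:confidence", 0.0)
        if confidence < min_confidence:
            continue
        # dc:relation points from the entity suggestion to the text mention.
        for relation in as_list(node.get("dc:relation")):
            mention = texts.get(relation)
            if mention is None:
                continue
            text = mention["enhancer:selected-text"]["@value"]
            if text not in best or confidence > best[text]["confidence"]:
                best[text] = {"uri": node["enhancer:entity-reference"],
                              "confidence": confidence}
    return best


# Trimmed copy of the sample response: only the keys used above are kept.
GRAPH = [
    {"@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
     "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
     "enhancer:selected-text": {"@language": "en", "@value": "Paris"},
     "enhancer:start": 0, "enhancer:end": 5},
    {"@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
     "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
     "enhancer:selected-text": {"@language": "en", "@value": "France"},
     "enhancer:start": 12, "enhancer:end": 18},
    {"@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
     "enhancer:confidence": 1.0,
     "enhancer:entity-reference": "http://dbpedia.org/resource/France"},
    {"@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
     "enhancer:confidence": 0.25715446,
     "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France"},
    {"@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
     "enhancer:confidence": 0.1493264,
     "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune"},
]

confident = best_entity_per_mention(GRAPH, min_confidence=0.5)
everything = best_entity_per_mention(GRAPH)
```

Because the posted sample is trimmed, the only suggestion left for the "Paris" mention is the low-confidence Paris Commune entry, which is why a confidence threshold is included as an optional parameter.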
>
>
>
>
>
>
> On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com> wrote:
>
>> Hi Dileepa,
>>
>> Repository connectors have an abstraction that allows them to generate
>> compound documents (where a document has a primary identifier, and there
>> are subdocuments that share that primary identifier and have a secondary
>> identifier).  This sounds a bit like what you are describing.  Does
>> Stanbol
>> work by decorating an existing document, or does it work by generating all
>> content for a document?
>>
>> Karl
>>
>>
>> On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <dj...@zaizi.com>
>> wrote:
>>
>> > Hi All,
>> >
>> >
>> > While thanking you all for your input on the Stanbol connector requirement,
>> > I would like to continue modifying the Stanbol connector to be compatible
>> > with any output connector. If you can give some guidance on how the entity
>> > metadata should be added to the repository document, I can modify the
>> > Stanbol connector accordingly.
>> >
>> > From Rafa's comments, I gathered we can add the entity metadata to the
>> > repository document as key-value pairs. However, this idea is not yet clear
>> > to me. There could be N entities in a document, and each of them will have
>> > some common attributes such as name, id and type, plus attributes specific
>> > to its entity type. I'm not clear on how to maintain that structure of N
>> > entities with their attributes in a repository document as key-value pairs
>> > and make them LDPath-compatible for retrieval in an output connector.
>> >
>> > @Rafa
>> > If you could please elaborate on your suggestion, it would be very helpful
>> > to me.
>> > All other suggestions are also welcome.
>> >
>> > Thanks,
>> > Dileepa
>> >
>> >
>> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com>
>> wrote:
>> >
>> > > I, too, agree.  Somebody will need to turn this connector into one that
>> > > plays by the rules.  It may be possible for someone on the team here to do
>> > > that, but it won't be me; I'm seriously overextended at the moment.  It
>> > > would be best if someone who knew the connector well could do the necessary
>> > > work.
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rh...@gmail.com>
>> > wrote:
>> > >
>> > > > I must agree with Antonio. When I started to work on this I was expecting
>> > > > the connector to work by just extracting the entities and their metadata
>> > > > and putting them as plain metadata on the documents, probably following an
>> > > > LDPath query configuration.
>> > > >
>> > > > This is probably OK for Sensefy, but I don't think it is suitable for
>> > > > inclusion in the project. But this is only my opinion. Of course, a
>> > > > version of the connector that fully respects the ManifoldCF architecture
>> > > > would be more than welcome in my opinion.
>> > > >
>> > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
>> > > > <ad...@gmail.com> wrote:
>> > > >
>> > > > > Hi
>> > > > > The removal of the SolrWrapper is a must. It was a requirement for an
>> > > > > internal project that has nothing to do with the normal operation of
>> > > > > Manifold, so forcing users to use Solr does not fit the Manifold
>> > > > > philosophy.
>> > > > > In my opinion, at this moment, a Stanbol connector with such a big
>> > > > > dependency, which will fit almost no use case, is not very useful.
>> > > > > You should think of a way to convert the Stanbol connector into a normal
>> > > > > Transformation connector without assuming that a specific output
>> > > > > connector will be used.
>> > > > > Regards
>> > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com
>> >:
>> > > > >> Hi guys,
>> > > > >>
>> > > > >> I have developed a Stanbol connector for MCF. You can check it
>> out
>> > > from
>> > > > our
>> > > > >> github repo here:
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>> > > > >>
>> > > > >> It requires the SolrWrapper output connector which indexes
>> enhanced
>> > > > >> documents, entities and entityTypes in separate Solr cores.
>> > Basically
>> > > it
>> > > > >> requires 3 separate solr cores configured with a specific Solr
>> > schema
>> > > > for
>> > > > >> primary documents, entities and entityTypes separately. This was
>> > done
>> > > > for
>> > > > >> our specific use-case.
>> > > > >>
>> > > > >> The SolrWrapper code is here :
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>> > > > >>
>> > > > >> Perhaps we can discuss and remove the Stanbol connector's
>> dependency
>> > > > with
>> > > > >> SolrWrapper and have it working with any output connector.
>> > > > >> Please note that the Stanbol connector currently has a bug in
>> the UI
>> > > > >> (editSpecification) which I'm working on at the moment. After
>> fixing
>> > > > that I
>> > > > >> will update here. And also I will provide documentations for
>> > > configuring
>> > > > >> the connector.
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Dileepa
>> > > > >>
>> > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
>> > > > >> adperezmorales@gmail.com> wrote:
>> > > > >>
>> > > > >> > Hi Joshua
>> > > > >> >
>> > > > >> > It is not the list for that, but Marmotta is already
>> integrated in
>> > > > Apache
>> > > > >> > Stanbol. You can take a look at this issue
>> > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
>> > > > >> >
>> > > > >> > Anyway, as I said this is not the list for that, so let's use
>> the
>> > > > proper
>> > > > >> > list for these things.
>> > > > >> >
>> > > > >> > Regards
>> > > > >> >
>> > > > >> >
>> > > > >> >
>> > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
>> joshua.dunham@gmail.com
>> > >:
>> > > > >> >
>> > > > >> > > Hey Dileepa,
>> > > > >> > >
>> > > > >> > >       In case you were interested, I pinged the list a few
>> days
>> > > ago
>> > > > >> > asking
>> > > > >> > > for integration tips for Apache Marmotta.
>> > > > >> > >
>> > > > >> > > I got some great tips on how to do this which could help you. Since
>> > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be
>> > > > >> > > easier for you to go this way.
>> > > > >> > >
>> > > > >> > > I'm not a Java programmer but I'm bringing this problem to
>> the
>> > > > >> > development
>> > > > >> > > staff at my company for assistance. If you like the Marmotta
>> > > > approach
>> > > > >> we
>> > > > >> > > may gain more traction solving the same integration.
>> > > > >> > >
>> > > > >> > > I'm also integrating Marmotta with Stanbol so the effect
>> would
>> > be
>> > > > the
>> > > > >> > same
>> > > > >> > > except not using the Stanbol API for data import in favor of
>> > > > Marmotta.
>> > > > >> > >
>> > > > >> > > Best,
>> > > > >> > >
>> > > > >> > > -J
>> > > > >> > >
>> > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
>> > > djayakody@zaizi.com
>> > > > >
>> > > > >> > > wrote:
>> > > > >> > > >
>> > > > >> > > > Hi all,
>> > > > >> > > >
>> > > > >> > > > Thank you for the feedback and for offering your help with this.
>> > > > >> > > > Let me get back to you on where to start the code base.
>> > > > >> > > > As the first step, I would like to start by creating an architecture
>> > > > >> > > > diagram for the connector.
>> > > > >> > > > I will send the diagram for your review soon.
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > > Dileepa
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> >
>>
>
>

-- 

------------------------------
This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy 
by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
London W6 7AN. 

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
Thanks!!  Rafa and I will have a look at this over the weekend.

Karl


On Fri, Dec 11, 2015 at 7:05 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi All,
>
> As per our discussion I have modified the Stanbol Connector so that it adds
> all extracted entity URIs and entity attributes to the repository document
> as fields.
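
One possible way to flatten N entities into plain multi-valued fields, sketched in Python for brevity. The field names, the `entity_` prefix and the `uri|value` encoding are assumptions chosen for illustration; this is not necessarily how the code on the branch names or structures its fields.

```python
def flatten_entities(entities, prefix="entity_"):
    """Turn a list of entity dicts into field name -> list of string values.

    Every attribute value is stored as "<entity uri>|<value>" so that the
    association between an attribute and its entity survives flattening.
    """
    fields = {}
    for entity in entities:
        uri = entity["uri"]
        fields.setdefault(prefix + "uri", []).append(uri)
        for name, value in entity.get("attributes", {}).items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                fields.setdefault(prefix + name, []).append(f"{uri}|{item}")
    return fields


# Hypothetical entities built from values in the sample Stanbol response.
entities = [
    {"uri": "http://dbpedia.org/resource/Paris",
     "attributes": {"label": "Paris",
                    "type": ["dbp-ont:Place", "dbp-ont:Settlement"]}},
    {"uri": "http://dbpedia.org/resource/France",
     "attributes": {"label": "France", "type": ["dbp-ont:Country"]}},
]

fields = flatten_entities(entities)

# An output connector can recover the pairing by splitting on the first "|".
uri, label = fields["entity_label"][1].split("|", 1)
```

Any output connector can index these fields as ordinary metadata, and one that understands the encoding can still regroup the attributes per entity.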
>
> On a separate branch I have committed this code to our github project
> sensefy-connectors.
> You can find the source code here:
>
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> Let me know your feedback.
>
> I will write a blog post on how to add it to a connection and get
> enhancement results, and share it with you.
>
> Thanks,
> Dileepa
>
>
>
> On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com> wrote:
>
> > Hi Dileepa,
> >
> > You cannot create sub-documents in a transformation connector.  And adding
> > that capability to the framework is not possible; we would be missing key
> > bookkeeping logic if that was allowed.
> >
> > Karl
> >
> >
> > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Hi Karl,
> > >
> > > Thanks a lot for the pointer.
> > >
> > > Stanbol doesn't update an existing document; it generates a new response
> > > containing the requested enhancement details for each content enhancement
> > > request. For example, for a request such as "Paris is a city in France",
> > > Stanbol returns the following RDF response [1].
> > >
> > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > and EntityAnnotations are extracted from the RDF response to generate
> > > entity abstractions, which are added to the MCF repository document.
> > > Currently the connector adds these entity abstractions as JSON strings to
> > > a multi-valued 'entities' field in the repository document, and the
> > > SolrWrapper output connector parses that JSON to index into separate Solr
> > > cores (primary documents, linked entities, and entity types with their
> > > attributes).
> > >
> > > Can we have a primary repository document and create sub-documents for
> > > the extracted entities? Is it possible to generate sub-documents for a
> > > repository document in a transformation connector?
> > >
> > > Thanks.
> > > Dileepa
> > >
> > > [1] Sample Stanbol response
> > >
> > > {
> > >   "@context": {
> > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > >     "dc": "http://purl.org/dc/terms/",
> > >     "dc:created": {
> > >       "@type": "xsd:dateTime"
> > >     },
> > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > >     "enhancer:confidence": {
> > >       "@type": "xsd:double"
> > >     },
> > >     "enhancer:end": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "enhancer:entity-reference": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:entity-type": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:extracted-from": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:start": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > >     "foaf:depiction": {
> > >       "@type": "@id"
> > >     },
> > >     "owl": "http://www.w3.org/2002/07/owl#",
> > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > >     "schema": "http://schema.org/",
> > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > >   },
> > >   "@graph": [
> > >     {
> > >       "@id": "http://dbpedia.org/resource/France",
> > >       "@type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Country",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > >       ],
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > >       },
> > >       "rdfs:label": [
> > >         {
> > >           "@language": "en",
> > >           "@value": "France"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "France"
> > >         }
> > >       ]
> > >     },
> > >
> > >     {
> > >       "@id": "http://dbpedia.org/resource/Paris",
> > >       "@type": [
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "dbp-ont:Settlement",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > >       ],
> > >       "geo:lat": 48.8567,
> > >       "geo:long": 2.3508,
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > >       },
> > >       "rdfs:label": [
> > >
> > >         {
> > >           "@language": "en",
> > >           "@value": "Paris"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "Paris"
> > >         }
> > >       ]
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.6017613,
> > >       "enhancer:end": 5,
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "Paris"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 0
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 1.0,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation":
> > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 0.25715446,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Vichy France"
> > >       },
> > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation":
> > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "enhancer:confidence": 0.1493264,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Paris Commune"
> > >       },
> > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.99354976,
> > >       "enhancer:end": 18,
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 12
> > >     }
> > >   ]
> > > }
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com>
> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > Repository connectors have an abstraction that allows them to
> generate
> > > > compound documents (where a document has a primary identifier, and
> > there
> > > > are subdocuments that share that primary identifier and have a
> > secondary
> > > > identifier).  This sounds a bit like what you are describing.  Does
> > > Stanbol
> > > > work by decorating an existing document, or does it work by
> generating
> > > all
> > > > content for a document?
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > >
> > > > > > While thanking you all for your input on the Stanbol connector
> > > > > > requirement, I would like to continue modifying the Stanbol
> > > > > > connector to be compatible with any output connector. If you can
> > > > > > give some guidance on how the entity metadata should be added to the
> > > > > > repository document, I can modify the Stanbol connector accordingly.
> > > > > >
> > > > > > From Rafa's comments, I gathered we can add the entity metadata to
> > > > > > the repo.doc as key-value pairs.
> > > > > > However, this idea is not yet clear to me. There could be N entities
> > > > > > in a document, and each of them will have some common attributes
> > > > > > such as name, id and type, plus specific attributes for its
> > > > > > particular entity type.
> > > > > > I'm not clear on how to maintain that structure of N entities with
> > > > > > their attributes in a repo.document as key-value pairs and make them
> > > > > > LDPath compatible for retrieval in an output connector.
> > > > > >
> > > > > > @Rafa
> > > > > > If you could elaborate on your suggestion, it would be very helpful
> > > > > > to me.
> > > > > > All other suggestions are also welcome.
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > >
> > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > > > I, too, agree.  Somebody will need to turn this connector into one
> > > > > > > that plays by the rules.  It may be possible for someone on the team
> > > > > > > here to do that, but it won't be me; I'm seriously overextended at
> > > > > > > the moment.  It would be best if someone who knew the connector well
> > > > > > > could do the necessary work.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> rharoapache@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > > I must agree with Antonio. When I started to work on this I was
> > > > > > > > expecting the connector to work by just extracting the entities and
> > > > > > > > their metadata and putting them as plain metadata on the documents,
> > > > > > > > probably following an LDPath query configuration.
> > > > > > > >
> > > > > > > > This is probably OK for Sensefy, but I don’t think it would be
> > > > > > > > suitable for inclusion in the project. But this is only my opinion.
> > > > > > > > Of course, a version of the connector that fully respects the
> > > > > > > > ManifoldCF architecture would be more than welcome, in my opinion.
> > > > > > >
> > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > <ad...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for
> > > > > > > > > an internal project which has nothing to do with the normal
> > > > > > > > > operation of Manifold, and forcing users to use Solr does not fit
> > > > > > > > > the Manifold philosophy.
> > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > dependency, which will not fit almost any use case, is not very
> > > > > > > > > useful.
> > > > > > > > > You should think of a way to convert the Stanbol connector into a
> > > > > > > > > normal transformation connector without assuming that a specific
> > > > > > > > > output connector will be used.
> > > > > > > > Regards
> > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > djayakody@zaizi.com
> > > >:
> > > > > > > >> Hi guys,
> > > > > > > >>
> > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > > >> github repo here:
> > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > >>
> > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > >> our specific use-case.
> > > > > > > > >>
> > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > >>
> > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency with
> > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > >> will update here. And also I will provide documentations for configuring
> > > > > > > > >> the connector.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Dileepa
> > > > > > > >>
> > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> <
> > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Joshua
> > > > > > > >> >
> > > > > > > >> > It is not the list for that, but Marmotta is already
> > > integrated
> > > > in
> > > > > > > Apache
> > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > >> >
> > > > > > > >> > Anyway, as I said this is not the list for that, so let's
> > use
> > > > the
> > > > > > > proper
> > > > > > > >> > list for these things.
> > > > > > > >> >
> > > > > > > >> > Regards
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > joshua.dunham@gmail.com
> > > > > >:
> > > > > > > >> >
> > > > > > > >> > > Hey Dileepa,
> > > > > > > >> > >
> > > > > > > >> > >       In case you were interested, I pinged the list a
> few
> > > > days
> > > > > > ago
> > > > > > > >> > asking
> > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > >> > >
> > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be
> > > > > > > > >> > > easier for you to take this way.
> > > > > > > >> > >
> > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem
> to
> > > the
> > > > > > > >> > development
> > > > > > > >> > > staff at my company for assistance. If you like the
> > Marmotta
> > > > > > > approach
> > > > > > > >> we
> > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > >> > >
> > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect
> > > would
> > > > > be
> > > > > > > the
> > > > > > > >> > same
> > > > > > > >> > > except not using the Stanbol API for data import in
> favor
> > of
> > > > > > > Marmotta.
> > > > > > > >> > >
> > > > > > > >> > > Best,
> > > > > > > >> > >
> > > > > > > >> > > -J
> > > > > > > >> > >
> > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > djayakody@zaizi.com
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > Hi all,
> > > > > > > >> > > >
> > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > >> > > > As the first step, I would like to start by creating an
> > > > > > > > >> > > > architecture diagram for the connector.
> > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Dileepa
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > >
> > > > > > > >> > > > ------------------------------
> > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > >> > > > London W6 7AN.
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >>
> > > > > > > >> ------------------------------
> > > > > > > >> This message should be regarded as confidential. If you have
> > > > > received
> > > > > > > this
> > > > > > > >> email in error please notify the sender and destroy it
> > > > immediately.
> > > > > > > >> Statements of intent shall only become binding when
> confirmed
> > in
> > > > > hard
> > > > > > > copy
> > > > > > > >> by an authorised signatory.
> > > > > > > >>
> > > > > > > >> Zaizi Ltd is registered in England and Wales with the
> > > registration
> > > > > > > number
> > > > > > > >> 6440931. The Registered Office is Brook House, 229 Shepherds
> > > Bush
> > > > > > Road,
> > > > > > > >> London W6 7AN.
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > ------------------------------
> > > > > This message should be regarded as confidential. If you have
> received
> > > > this
> > > > > email in error please notify the sender and destroy it immediately.
> > > > > Statements of intent shall only become binding when confirmed in
> hard
> > > > copy
> > > > > by an authorised signatory.
> > > > >
> > > > > Zaizi Ltd is registered in England and Wales with the registration
> > > number
> > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > Road,
> > > > > London W6 7AN.
> > > > >
> > > >
> > >
> > > --
> > >
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received
> > this
> > > email in error please notify the sender and destroy it immediately.
> > > Statements of intent shall only become binding when confirmed in hard
> > copy
> > > by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration
> number
> > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > London W6 7AN.
> > >
> >
>
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

I have just talked with Dileepa and he will be doing the Pull Request, so I
will wait for it.

Thanks Dileepa

On Tue, Jan 26, 2016 at 1:45 PM Karl Wright <da...@gmail.com> wrote:

> The ticket for this is CONNECTORS-1181.  If you are willing, please name
> the branch "branches/CONNECTORS-1181".  Thanks!
>
> Karl
>
>
> On Tue, Jan 26, 2016 at 7:39 AM, Karl Wright <da...@gmail.com> wrote:
>
> > If you set up a pull request, and download a .diff, it should be easy to
> > confirm that this will "svn patch" onto a workarea.
> >
> > Karl
> >
> >
> > On Tue, Jan 26, 2016 at 7:35 AM, Rafa Haro <rh...@apache.org> wrote:
> >
> >> Hi Karl,
> >>
> >> I will proceed in the same way as with the OpenNLP connector, so, as I
> >> asked in the other email, should we do this using a pull request instead
> >> of manually importing the master branch of Dileepa's repo?
> >>
> >> Cheers,
> >> Rafa
> >>
> >> On Tue, Jan 26, 2016 at 12:07 PM Karl Wright <da...@gmail.com>
> wrote:
> >>
> >> > Hi Rafa,
> >> >
> >> > Any time you are ready, please import this into a branch.  I'll need to
> >> > look over licensing and build before committing to trunk.
> >> >
> >> > Thanks!
> >> > Karl
> >> >
> >> >
> >> > On Tue, Jan 26, 2016 at 3:20 AM, Dileepa Jayakody <
> djayakody@zaizi.com>
> >> > wrote:
> >> >
> >> > > Hi All,
> >> > >
> >> > > I have made the discussed modifications to the Stanbol connector. Users
> >> > > can now either define dereference fields or define an LDPath program to
> >> > > extract entity properties from Stanbol entities and add them to the
> >> > > document as fields.
> >> > >
> >> > > The latest code is available here for your review:
> >> > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> >> > >
> >> > > I have also written a blog post on how to configure the connector:
> >> > > http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html
> >> > >
> >> > > Looking forward to your comments.
> >> > >
> >> > > Thanks,
> >> > > Dileepa
> >> > >
> >> > >
> >> > > On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org>
> wrote:
> >> > >
> >> > > > Hi Karl,
> >> > > >
> >> > > > I will import this one, don't worry.
> >> > > >
> >> > > > Cheers,
> >> > > > Rafa
> >> > > > On Sat, Dec 12, 2015 at 8:36 PM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> >> > > >
> >> > > > > Hi Karl,
> >> > > > >
> >> > > > > Yes, I will improve the code with Rafa's reviews and then we can
> >> > > > > import it into the MCF code base.
> >> > > > >
> >> > > > > Thanks
> >> > > > > Dileepa
> >> > > > >
> >> > > > > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <
> daddywri@gmail.com>
> >> > > wrote:
> >> > > > >
> >> > > > > > Ok, it seems premature for me to try to import this from Github
> >> > > > > > today, so I'll wait until the dust settles a bit further first.
> >> > > > > >
> >> > > > > > Karl
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <
> >> > > djayakody@zaizi.com
> >> > > > >
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Thanks a lot Rafa for pointing that out. Big miss, as I didn't
> >> > > > > > > test the LDPath configuration part yet. More improvements to be done.
> >> > > > > > > I will do the required improvements as pointed out.
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Dileepa
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <
> rharo@apache.org>
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi Dileepa,
> >> > > > > > > >
> >> > > > > > > > The problem is not in that part of the code; it is rather in this part:
> >> > > > > > > >
> >> > > > > > > > if (entity != null) {
> >> > > > > > > >     Collection<String> properties = entity.getProperties();
> >> > > > > > > >     for (String property : properties) {
> >> > > > > > > >         String targetFieldName = derefFields.get(property);
> >> > > > > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> >> > > > > > > >         if (propValues == null) {
> >> > > > > > > >             propValues = new HashSet<String>();
> >> > > > > > > >         }
> >> > > > > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> >> > > > > > > >         propValues.addAll(entityPropValues);
> >> > > > > > > >         entityPropertyMap.put(targetFieldName, propValues);
> >> > > > > > > >     }
> >> > > > > > > > }
> >> > > > > > > >
> >> > > > > > > > You are collecting only the configured dereferenced fields from the
> >> > > > > > > > EnhancementStructure response, and the LDPath fields are ignored. Also,
> >> > > > > > > > there is a potential bug in that code if there is no dereferencing field
> >> > > > > > > > configured for a certain entity property here:
> >> > > > > > > >
> >> > > > > > > > String targetFieldName = derefFields.get(property);
> >> > > > > > > >
> >> > > > > > > > targetFieldName would be null then. Instead of trying to index
> >> > > > > > > > every property, you should collect only the ones configured by the
> >> > > > > > > > user (or at least, if the user wants all of them, provide a
> >> > > > > > > > configuration option for that).
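
The suggestion above can be sketched by iterating over the configured mappings instead of over every entity property, so a missing mapping can never produce a null field name. This is only an illustration under stated assumptions: the entity is modelled as a plain map from property URI to values rather than the Stanbol client's entity class, and DerefFieldCollector and collect are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class DerefFieldCollector {

    /**
     * Copies entity property values into target fields, but only for the
     * properties that have a configured mapping in derefFields
     * (property name -> target field name).
     */
    static Map<String, Set<String>> collect(
            Map<String, Collection<String>> entityProperties,
            Map<String, String> derefFields) {
        Map<String, Set<String>> fields = new HashMap<>();
        for (Map.Entry<String, String> mapping : derefFields.entrySet()) {
            Collection<String> values = entityProperties.get(mapping.getKey());
            if (values == null || values.isEmpty()) {
                continue; // the entity does not carry this configured property
            }
            // several properties may map to the same target field, so accumulate
            fields.computeIfAbsent(mapping.getValue(), k -> new LinkedHashSet<>())
                  .addAll(values);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> entity = new HashMap<>();
        entity.put("rdfs:label", Arrays.asList("France"));
        entity.put("dbp-ont:populationTotal", Arrays.asList("66000000"));

        Map<String, String> derefFields = new HashMap<>();
        derefFields.put("rdfs:label", "entity_label"); // only this one is configured

        System.out.println(collect(entity, derefFields));
    }
}
```

Properties without a configured mapping (the population value here) are simply skipped. An "index everything" mode, as mentioned above, would need its own explicit configuration option.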
> >> > > > > > > >
> >> > > > > > > > Anyway, going back to LDPath issue, please take into
> account
> >> > that
> >> > > > > when
> >> > > > > > > you
> >> > > > > > > > define a field you must use a custom Namespace and Prefix
> >> for
> >> > > later
> >> > > > > > being
> >> > > > > > > > able to retrieve that property from the entity. If you
> >> don't do
> >> > > > that,
> >> > > > > > > > Stanbol will provide a random namespace for that property.
> >> > Check
> >> > > > this
> >> > > > > > > > example from RedLink SDK:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> >> > > > > > > >
> >> > > > > > > > Hope that helps
> >> > > > > > > >
> >> > > > > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <
> >> > daddywri@gmail.com>
> >> > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > The next step would be to pull this code into an svn branch.
> >> > > > > > > > > This is something I can tackle after the 2.3 release candidate
> >> > > > > > > > > is put together.
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > > Karl
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <
> >> > > > > > djayakody@zaizi.com
> >> > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Rafa,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for reviewing my code and for your feedback. Please see
> >> > > > > > > > > > my comments inline below.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <
> >> > rharo@apache.org
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi Dileepa,
> >> > > > > > > > > > >
> >> > > > > > > > > > > This seems to be clearly going in the right direction now, in
> >> > > > > > > > > > > my opinion. Quick comments after a first review:
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >    - Rejecting a document because it can't be enhanced is kind
> >> > > > > > > > > > >    of tough. You are preventing a document from being indexed
> >> > > > > > > > > > >    because the enhancement didn't perform correctly; it is
> >> > > > > > > > > > >    probably better to let it continue through the workflow
> >> > > > > > > > > > >    within the system
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Got your point. Will remove that part from the code
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >    - As far as I can deduce from the code, you are correctly
> >> > > > > > > > > > >    extracting the configured dereferenced fields, but you are
> >> > > > > > > > > > >    not processing the LDPath results at all
> >> > > > > > > > > > >
> >> > > > > > > > > > I'm passing the LDPath program as an enhancer parameter to
> >> > > > > > > > > > Stanbol to retrieve the enhancement result according to the
> >> > > > > > > > > > LDPath program (which is given as a text string in the
> >> > > > > > > > > > connector UI).
> >> > > > > > > > > > If the user has not defined an LDPath program and has added
> >> > > > > > > > > > dereference fields in the UI instead, then the enhancement
> >> > > > > > > > > > request will be built using the dereference fields as enhancer
> >> > > > > > > > > > parameters.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > If neither an LDPath program nor dereference fields are given
> >> > > > > > > > > > in the transformation UI, then I just call the given
> >> > > > > > > > > > enhancement chain without any other enhancer parameters.
> >> > > > > > > > > >
> >> > > > > > > > > > Please refer to the code segment below where I do this, and let
> >> > > > > > > > > > me know if it needs more improvements.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >             // ldpath program is given priority if it's set
> >> > > > > > > > > >             if (ldPath != null)
> >> > > > > > > > > >             {
> >> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).setLDpathProgram(ldPath).build();
> >> > > > > > > > > >             }
> >> > > > > > > > > >             else if (!derefFields.isEmpty())
> >> > > > > > > > > >             {
> >> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).setDereferencingFields(
> >> > > > > > > > > >                         derefFields.keySet()).build();
> >> > > > > > > > > >             }
> >> > > > > > > > > >             else
> >> > > > > > > > > >             {
> >> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> >> > > > > > > > > >             }
> >> > > > > > > > > >             eRes = enhancerClient.enhance(parameters);
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > Dileepa
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Cheers,
> >> > > > > > > > > > > Rafa
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
> >> > > > > > > > djayakody@zaizi.com>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi All,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > As per our discussion I have modified the Stanbol Connector
> >> > > > > > > > > > > > so that it adds all extracted entity URIs and entity
> >> > > > > > > > > > > > attributes to the repository document as fields.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I have committed this code on a separate branch of our
> >> > > > > > > > > > > > github project sensefy-connectors.
> >> > > > > > > > > > > > You can find the source code here:
> >> > > > > > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> >> > > > > > > > > > > > Let me know your feedback.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I will write a blog post on how to add it in a connection
> >> > > > > > > > > > > > and get enhancement results and share it with you.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > Dileepa
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <
> >> > > > > > daddywri@gmail.com>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi Dileepa,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > You cannot create sub-documents in a transformation
> >> > > > > > > > > > > > > connector.  And adding that capability to the framework is
> >> > > > > > > > > > > > > not possible; we would be missing key bookkeeping logic if
> >> > > > > > > > > > > > > that was allowed.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Karl
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa
> Jayakody <
> >> > > > > > > > > > djayakody@zaizi.com>
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Karl,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks a lot for the pointer.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Stanbol doesn't update an existing document,
> it
> >> > > > > generates a
> >> > > > > > > new
> >> > > > > > > > > > > > response
> >> > > > > > > > > > > > > > with requested enhancement details for the
> >> content
> >> > > > > > enhansment
> >> > > > > > > > > > > request.
> >> > > > > > > > > > > > > > For example for a request like : "Paris is a
> >> city
> >> > in
> >> > > > > > France"
> >> > > > > > > > > > > following
> >> > > > > > > > > > > > > RDF
> >> > > > > > > > > > > > > > response [1] is given by Stanbol.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> >> > > > > > > > > > > > > > and EntityAnnotations are extracted from the RDF response to generate
> >> > > > > > > > > > > > > > the entity abstractions, which are added to the MCF repository document.
> >> > > > > > > > > > > > > > Currently the Stanbol connector adds these entity abstractions as JSON
> >> > > > > > > > > > > > > > strings to a multi-valued 'entities' field in the repository document,
> >> > > > > > > > > > > > > > and we parse that JSON in the SolrWrapper output connector to index it
> >> > > > > > > > > > > > > > in separate Solr cores (primary documents, linked entities and entity
> >> > > > > > > > > > > > > > types with their attributes).
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Can we have a primary repository document and create sub-documents for
> >> > > > > > > > > > > > > > the extracted entities? Is it possible to generate sub-documents for a
> >> > > > > > > > > > > > > > repo-document in a transformation connector?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks.
> >> > > > > > > > > > > > > > Dileepa
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > [1] Sample Stanbol response
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > {
> >> > > > > > > > > > > > > >   "@context": {
> >> > > > > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> >> > > > > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> >> > > > > > > > > > > > > >     "dc:created": {
> >> > > > > > > > > > > > > >       "@type": "xsd:dateTime"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> >> > > > > > > > > > > > > >     "enhancer:confidence": {
> >> > > > > > > > > > > > > >       "@type": "xsd:double"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer:end": {
> >> > > > > > > > > > > > > >       "@type": "xsd:int"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer:entity-reference": {
> >> > > > > > > > > > > > > >       "@type": "@id"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer:entity-type": {
> >> > > > > > > > > > > > > >       "@type": "@id"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer:extracted-from": {
> >> > > > > > > > > > > > > >       "@type": "@id"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "enhancer:start": {
> >> > > > > > > > > > > > > >       "@type": "xsd:int"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> >> > > > > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> >> > > > > > > > > > > > > >     "foaf:depiction": {
> >> > > > > > > > > > > > > >       "@type": "@id"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> >> > > > > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> >> > > > > > > > > > > > > >     "schema": "http://schema.org/",
> >> > > > > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> >> > > > > > > > > > > > > >   },
> >> > > > > > > > > > > > > >   "@graph": [
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "dbp-ont:Country",
> >> > > > > > > > > > > > > >         "dbp-ont:Place",
> >> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> >> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> >> > > > > > > > > > > > > >         "owl:Thing",
> >> > > > > > > > > > > > > >         "schema:Country",
> >> > > > > > > > > > > > > >         "schema:Place"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "foaf:depiction": [
> >> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> >> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "rdfs:comment": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "rdfs:label": [
> >> > > > > > > > > > > > > >         {
> >> > > > > > > > > > > > > >           "@language": "en",
> >> > > > > > > > > > > > > >           "@value": "France"
> >> > > > > > > > > > > > > >         },
> >> > > > > > > > > > > > > >         {
> >> > > > > > > > > > > > > >           "@language": "fr",
> >> > > > > > > > > > > > > >           "@value": "France"
> >> > > > > > > > > > > > > >         }
> >> > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "dbp-ont:Place",
> >> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> >> > > > > > > > > > > > > >         "dbp-ont:Settlement",
> >> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> >> > > > > > > > > > > > > >         "owl:Thing",
> >> > > > > > > > > > > > > >         "schema:Place"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "foaf:depiction": [
> >> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> >> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "geo:lat": 48.8567,
> >> > > > > > > > > > > > > >       "geo:long": 2.3508,
> >> > > > > > > > > > > > > >       "rdfs:comment": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "rdfs:label": [
> >> > > > > > > > > > > > > >         {
> >> > > > > > > > > > > > > >           "@language": "en",
> >> > > > > > > > > > > > > >           "@value": "Paris"
> >> > > > > > > > > > > > > >         },
> >> > > > > > > > > > > > > >         {
> >> > > > > > > > > > > > > >           "@language": "fr",
> >> > > > > > > > > > > > > >           "@value": "Paris"
> >> > > > > > > > > > > > > >         }
> >> > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "enhancer:Enhancement",
> >> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> >> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> >> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> >> > > > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
> >> > > > > > > > > > > > > >       "enhancer:end": 5,
> >> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> >> > > > > > > > > > > > > >       "enhancer:selected-text": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Paris"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:selection-context": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Paris is in France"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:start": 0
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "enhancer:Enhancement",
> >> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> >> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> >> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> >> > > > > > > > > > > > > >       "enhancer:confidence": 1.0,
> >> > > > > > > > > > > > > >       "enhancer:entity-label": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "France"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> >> > > > > > > > > > > > > >       "enhancer:entity-type": [
> >> > > > > > > > > > > > > >         "dbp-ont:Country",
> >> > > > > > > > > > > > > >         "dbp-ont:Place",
> >> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> >> > > > > > > > > > > > > >         "schema:Country",
> >> > > > > > > > > > > > > >         "schema:Place",
> >> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> >> > > > > > > > > > > > > >         "owl:Thing"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> >> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "enhancer:Enhancement",
> >> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> >> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> >> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> >> > > > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
> >> > > > > > > > > > > > > >       "enhancer:entity-label": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Vichy France"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> >> > > > > > > > > > > > > >       "enhancer:entity-type": [
> >> > > > > > > > > > > > > >         "dbp-ont:Country",
> >> > > > > > > > > > > > > >         "dbp-ont:Place",
> >> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> >> > > > > > > > > > > > > >         "schema:Country",
> >> > > > > > > > > > > > > >         "schema:Place",
> >> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> >> > > > > > > > > > > > > >         "owl:Thing"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> >> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "enhancer:Enhancement",
> >> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> >> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> >> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> >> > > > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
> >> > > > > > > > > > > > > >       "enhancer:entity-label": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Paris Commune"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> >> > > > > > > > > > > > > >       "enhancer:entity-type": [
> >> > > > > > > > > > > > > >         "dbp-ont:Country",
> >> > > > > > > > > > > > > >         "dbp-ont:Place",
> >> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> >> > > > > > > > > > > > > >         "schema:Country",
> >> > > > > > > > > > > > > >         "schema:Place",
> >> > > > > > > > > > > > > >         "owl:Thing"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> >> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> >> > > > > > > > > > > > > >     },
> >> > > > > > > > > > > > > >     {
> >> > > > > > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> >> > > > > > > > > > > > > >       "@type": [
> >> > > > > > > > > > > > > >         "enhancer:Enhancement",
> >> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
> >> > > > > > > > > > > > > >       ],
> >> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> >> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> >> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> >> > > > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
> >> > > > > > > > > > > > > >       "enhancer:end": 18,
> >> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> >> > > > > > > > > > > > > >       "enhancer:selected-text": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "France"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:selection-context": {
> >> > > > > > > > > > > > > >         "@language": "en",
> >> > > > > > > > > > > > > >         "@value": "Paris is in France"
> >> > > > > > > > > > > > > >       },
> >> > > > > > > > > > > > > >       "enhancer:start": 12
> >> > > > > > > > > > > > > >     }
> >> > > > > > > > > > > > > >   ]
> >> > > > > > > > > > > > > > }
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <
> >> > > > > > > > daddywri@gmail.com>
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hi Dileepa,
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> >> > > > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> >> > > > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> >> > > > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does
> >> > > > > > > > > > > > > > > Stanbol work by decorating an existing document, or does it work by
> >> > > > > > > > > > > > > > > generating all content for a document?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Karl
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa
> >> Jayakody
> >> > <
> >> > > > > > > > > > > > djayakody@zaizi.com>
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi All,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thank you all for your input on the Stanbol connector requirement. I
> >> > > > > > > > > > > > > > > > would like to continue modifying the Stanbol connector to be compatible
> >> > > > > > > > > > > > > > > > with any output connector. If you guys can give some guidance on how the
> >> > > > > > > > > > > > > > > > entity metadata should be added to the repository document, I can modify
> >> > > > > > > > > > > > > > > > the Stanbol connector accordingly.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> >> > > > > > > > > > > > > > > > repo.doc as key-value pairs. However, this idea is not yet clear to me.
> >> > > > > > > > > > > > > > > > There could be N entities in a document, and each of them will have some
> >> > > > > > > > > > > > > > > > common attributes such as name, id and type, plus attributes specific to
> >> > > > > > > > > > > > > > > > its entity type. I'm not clear on how to maintain that structure of N
> >> > > > > > > > > > > > > > > > entities with their attributes in a repo.document as key-value pairs and
> >> > > > > > > > > > > > > > > > make them LDPath compatible for retrieval in an output connector.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > @Rafa
> >> > > > > > > > > > > > > > > > If you could elaborate on your suggestion it would be a great help to me.
> >> > > > > > > > > > > > > > > > All other suggestions are also welcome.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > > > > Dileepa
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl
> >> Wright <
> >> > > > > > > > > > daddywri@gmail.com
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> >> > > > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to
> >> > > > > > > > > > > > > > > > > do that, but it won't be me; I'm seriously overextended at the moment.
> >> > > > > > > > > > > > > > > > > It would be best if someone who knew the connector well could do the
> >> > > > > > > > > > > > > > > > > necessary work.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Karl
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa
> >> Haro <
> >> > > > > > > > > > > > rharoapache@gmail.com>
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was
> >> > > > > > > > > > > > > > > > > > expecting the connector to work by just extracting the entities and
> >> > > > > > > > > > > > > > > > > > entity metadata and putting them as plain metadata of the documents,
> >> > > > > > > > > > > > > > > > > > probably following an LDPath queries configuration.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > This is probably ok for Sensefy but I don’t think this could be suitable
> >> > > > > > > > > > > > > > > > > > to be included in the project. But this is only my opinion. Of course, a
> >> > > > > > > > > > > > > > > > > > version of the connector that fully respects the ManifoldCF architecture
> >> > > > > > > > > > > > > > > > > > would be more than welcome in my opinion.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM,
> >> Antonio
> >> > > David
> >> > > > > > Pérez
> >> > > > > > > > > > Morales
> >> > > > > > > > > > > > > > > > > > <ad...@gmail.com> wrote:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Hi
> >> > > > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> >> > > > > > > > > > > > > > > > > > > internal project which has nothing to do with the normal operation of
> >> > > > > > > > > > > > > > > > > > > Manifold, so forcing the users to use Solr does not fit the Manifold
> >> > > > > > > > > > > > > > > > > > > philosophy.
> >> > > > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> >> > > > > > > > > > > > > > > > > > > dependency, which will fit almost no use case, is not very useful.
> >> > > > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> >> > > > > > > > > > > > > > > > > > > Transformation connector without assuming that a specific output
> >> > > > > > > > > > > > > > > > > > > connector will be used.
> >> > > > > > > > > > > > > > > > > > > Regards
> >> > > > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa
> >> > > Jayakody <
> >> > > > > > > > > > > > > djayakody@zaizi.com
> >> > > > > > > > > > > > > > >:
> >> > > > > > > > > > > > > > > > > > >> Hi guys,
> >> > > > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > > > >> I have developed a Stanbol
> connector
> >> for
> >> > > > MCF.
> >> > > > > > You
> >> > > >
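
The question quoted above, how to keep N entities and their attributes as plain key-value pairs on a repository document, can be illustrated with a minimal sketch. It assumes a Stanbol JSON-LD response shaped like the sample quoted in this thread, and it is written in Python for brevity rather than as ManifoldCF connector code. The field names entity_uri, entity_label and entity_types, the entity_fields function and the min_confidence threshold are all hypothetical; this is not how the Stanbol connector itself stores entities.

```python
import json
from collections import defaultdict

ENTITY_ANNOTATION = "enhancer:EntityAnnotation"


def entity_fields(response, min_confidence=0.5):
    """Flatten Stanbol EntityAnnotations into parallel multi-valued fields.

    Hypothetical field names: entity_uri, entity_label, entity_types.
    Every field receives exactly one value per entity, so index i in each
    list refers to the same entity.
    """
    fields = defaultdict(list)
    for node in response.get("@graph", []):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if ENTITY_ANNOTATION not in types:
            continue  # skip entity descriptions and TextAnnotations
        if node.get("enhancer:confidence", 0.0) < min_confidence:
            continue  # drop low-confidence suggestions
        label = node.get("enhancer:entity-label", {})
        if isinstance(label, list):  # several labels: keep the first one
            label = label[0] if label else {}
        entity_types = node.get("enhancer:entity-type", [])
        if isinstance(entity_types, str):
            entity_types = [entity_types]
        fields["entity_uri"].append(node["enhancer:entity-reference"].strip())
        fields["entity_label"].append(label.get("@value", ""))
        # Join the types so this field also stays at one value per entity.
        fields["entity_types"].append(",".join(entity_types))
    return dict(fields)


# Trimmed-down version of the sample response quoted in this thread.
SAMPLE = json.loads("""
{
  "@graph": [
    {
      "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
      "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
      "enhancer:confidence": 1.0,
      "enhancer:entity-label": {"@language": "en", "@value": "France"},
      "enhancer:entity-reference": "http://dbpedia.org/resource/France",
      "enhancer:entity-type": ["dbp-ont:Country", "dbp-ont:Place"]
    },
    {
      "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
      "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
      "enhancer:confidence": 0.25715446,
      "enhancer:entity-label": {"@language": "en", "@value": "Vichy France"},
      "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
      "enhancer:entity-type": ["dbp-ont:Country", "dbp-ont:Place"]
    },
    {
      "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
      "enhancer:confidence": 0.99354976
    }
  ]
}
""")

if __name__ == "__main__":
    print(entity_fields(SAMPLE))
```

Keeping one value per entity in every field means any output connector that supports multi-valued fields could index the result without knowing about Stanbol. The trade-off is that attributes specific to one entity type would need either extra parallel fields or a separate document per entity.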

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
The ticket for this is CONNECTORS-1181.  If you are willing, please name
the branch "branches/CONNECTORS-1181".  Thanks!

Karl


On Tue, Jan 26, 2016 at 7:39 AM, Karl Wright <da...@gmail.com> wrote:

> If you set up a pull request, and download a .diff, it should be easy to
> confirm that this will "svn patch" onto a workarea.
>
> Karl
>
>
> On Tue, Jan 26, 2016 at 7:35 AM, Rafa Haro <rh...@apache.org> wrote:
>
>> Hi Karl,
>>
>> I will proceed in the same way as with the OpenNLP connector. As I asked in
>> the other email, should we do this using a pull request instead of manually
>> importing the master branch of Dileepa's repo?
>>
>> Cheers,
>> Rafa
>>
>> On Tue, Jan 26, 2016 at 12:07 PM Karl Wright <da...@gmail.com> wrote:
>>
>> > Hi Rafa,
>> >
>> > Any time you are ready, please import this into a branch.  I'll need to
>> > look over licensing and build before committing to trunk.
>> >
>> > Thanks!
>> > Karl
>> >
>> >
>> > On Tue, Jan 26, 2016 at 3:20 AM, Dileepa Jayakody <dj...@zaizi.com>
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > I have done the discussed modifications to the Stanbol connector. Now
>> > > users can either define dereference fields or define an LDPath program to
>> > > extract entity properties from Stanbol entities and add them to the
>> > > document as fields.
>> > >
>> > > The latest code is available here for your review:
>> > >
>> > >
>> >
>> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
>> > >
>> > > I have also written a blog post on how to configure the connector:
>> > >
>> > >
>> >
>> http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html
>> > >
>> > > Looking forward to your comments.
>> > >
>> > > Thanks,
>> > > Dileepa
>> > >
>> > >
>> > > On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org> wrote:
>> > >
>> > > > Hi Karl,
>> > > >
>> > > > I will import this one, don't worry.
>> > > >
>> > > > Cheers,
>> > > > Rafa
>> > > > On Sat, Dec 12, 2015 at 20:36, Dileepa Jayakody <djayakody@zaizi.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Karl,
>> > > > >
>> > > > > Yes, I will improve the code based on Rafa's review and then we can
>> > > > > import it into the MCF code base.
>> > > > >
>> > > > > Thanks
>> > > > > Dileepa
>> > > > >
>> > > > > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Ok, it seems premature for me to try to import this from Github
>> > > > > > today, so I'll wait until the dust settles a bit further first.
>> > > > > >
>> > > > > > Karl
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <
>> > > djayakody@zaizi.com
>> > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Thanks a lot, Rafa, for pointing that out. That was a big miss, as
>> > > > > > > I haven't tested the LDPath configuration part yet. More
>> > > > > > > improvements are needed, and I will make the ones you pointed out.
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Dileepa
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org>
>> > > wrote:
>> > > > > > >
>> > > > > > > > Hi Dileepa,
>> > > > > > > >
>> > > > > > > > The problem is not in that part of the code, but rather in this
>> > > > > > > > part:
>> > > > > > > >
>> > > > > > > > if (entity != null) {
>> > > > > > > >     Collection<String> properties = entity.getProperties();
>> > > > > > > >     for (String property : properties) {
>> > > > > > > >         String targetFieldName = derefFields.get(property);
>> > > > > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
>> > > > > > > >         if (propValues == null) {
>> > > > > > > >             propValues = new HashSet<String>();
>> > > > > > > >         }
>> > > > > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
>> > > > > > > >         propValues.addAll(entityPropValues);
>> > > > > > > >         entityPropertyMap.put(targetFieldName, propValues);
>> > > > > > > >     }
>> > > > > > > > }
>> > > > > > > >
>> > > > > > > > From the EnhancementStructure response you are collecting only
>> > > > > > > > the configured dereference fields; the LDPath fields are ignored.
>> > > > > > > > There is also a potential bug in that code when no dereference
>> > > > > > > > field is configured for a given entity property, here:
>> > > > > > > >
>> > > > > > > > String targetFieldName = derefFields.get(property);
>> > > > > > > >
>> > > > > > > > targetFieldName would then be null. Instead of trying to index
>> > > > > > > > every property, you should collect only the ones configured by
>> > > > > > > > the user (or, if the user wants all of them, at least provide a
>> > > > > > > > configuration option for that).
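
A minimal sketch of the suggestion above, assuming the names from the quoted snippet (derefFields, entityPropertyMap). EntityView is a hypothetical stand-in for the client's entity type, not the real Stanbol client API; only the two methods used in the quoted code are modelled. Properties without a configured target field are skipped, so a null key never reaches the map.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the entity type used in the quoted snippet. */
interface EntityView {
    Collection<String> getProperties();
    Collection<String> getPropertyValues(String property);
}

final class DerefFieldCollector {
    private DerefFieldCollector() {}

    /**
     * Copies values of the configured entity properties into entityPropertyMap,
     * keyed by the target field name. Unconfigured properties are ignored.
     */
    static void collect(EntityView entity,
                        Map<String, String> derefFields,
                        Map<String, Set<String>> entityPropertyMap) {
        if (entity == null) {
            return;
        }
        for (String property : entity.getProperties()) {
            String targetFieldName = derefFields.get(property);
            if (targetFieldName == null) {
                // No dereference field configured for this property: skip it.
                continue;
            }
            Set<String> propValues = entityPropertyMap.get(targetFieldName);
            if (propValues == null) {
                propValues = new LinkedHashSet<String>();
                entityPropertyMap.put(targetFieldName, propValues);
            }
            propValues.addAll(entity.getPropertyValues(property));
        }
    }
}
```

Passing the accumulator map in, rather than returning a new one, keeps the behaviour of the original loop, where values from several entities end up in the same field.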
>> > > > > > > >
>> > > > > > > > Anyway, going back to the LDPath issue: when you define a field,
>> > > > > > > > you must use a custom namespace and prefix so that you can later
>> > > > > > > > retrieve that property from the entity. If you don't, Stanbol
>> > > > > > > > will assign a random namespace to that property. Check this
>> > > > > > > > example from the RedLink SDK:
>> > > > > > > >
>> > > > > > > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
>> > > > > > > >
>> > > > > > > > Hope that helps
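
To illustrate the namespace point, here is a rough sketch that builds an LDPath program whose fields all live under one custom prefix, so each result can later be looked up by a known URI. The prefix "ex", the namespace URI and the class name are assumptions made for this example, and it assumes the rdfs and xsd prefixes are already known to the LDPath parser; it is not taken from the connector or from the RedLink test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LdPathFields {
    // Assumed custom prefix and namespace; choose values that suit your index.
    static final String NS_PREFIX = "ex";
    static final String NS_URI = "http://example.org/mcf/fields/";

    private LdPathFields() {}

    /** Builds an LDPath program declaring every field under the custom prefix. */
    static String buildProgram(Map<String, String> fieldToPath) {
        StringBuilder sb = new StringBuilder();
        sb.append("@prefix ").append(NS_PREFIX)
          .append(" : <").append(NS_URI).append("> ;\n");
        for (Map.Entry<String, String> e : fieldToPath.entrySet()) {
            sb.append(NS_PREFIX).append(':').append(e.getKey())
              .append(" = ").append(e.getValue()).append(" ;\n");
        }
        return sb.toString();
    }

    /** Full property URI under which a field of the program is expected. */
    static String fieldUri(String localName) {
        return NS_URI + localName;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("label", "rdfs:label[@en] :: xsd:string");
        System.out.print(buildProgram(fields));
        System.out.println(fieldUri("label"));
    }
}
```

With fields declared this way, the connector could map fieldUri("label") to a document field name instead of guessing a generated namespace.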
>> > > > > > > >
>> > > > > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <
>> > daddywri@gmail.com>
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > The next step would be to pull this code into an svn branch.
>> > > > > > > > > This is something I can tackle after the 2.3 release candidate
>> > > > > > > > > is put together.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Karl
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <
>> > > > > > djayakody@zaizi.com
>> > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Rafa,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for reviewing my code and for your feedback. Please see
>> > > > > > > > > > my comments inline below.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <
>> > rharo@apache.org
>> > > >
>> > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Dileepa,
>> > > > > > > > > > >
>> > > > > > > > > > > In my opinion this is now clearly going in the right direction.
>> > > > > > > > > > > Quick comments after a first review:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >    - Rejecting a document because it can't be enhanced is kind
>> > > > > > > > > > >    of harsh. You are preventing a document from being indexed
>> > > > > > > > > > >    because the enhancement didn't succeed; it is probably
>> > > > > > > > > > >    better to let it continue through the workflow.
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Got your point. Will remove that part from the code
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >    - As far as I can tell from the code, you are correctly
>> > > > > > > > > > >    extracting the configured dereference fields, but you are
>> > > > > > > > > > >    not processing the LDPath results at all
>> > > > > > > > > > >
>> > > > > > > > > > I'm passing the LDPath program (entered as a text string in the
>> > > > > > > > > > connector UI) to Stanbol as an enhancer parameter, so that the
>> > > > > > > > > > enhancement result is retrieved according to that program.
>> > > > > > > > > > If the user has not defined an LDPath program and has added
>> > > > > > > > > > dereference fields in the UI instead, the enhancement request is
>> > > > > > > > > > built using the dereference fields as enhancer parameters.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > If neither an LDPath program nor dereference fields are given in
>> > > > > > > > > > the transformation UI, I just call the given enhancement chain
>> > > > > > > > > > without any other enhancer parameters.
>> > > > > > > > > >
>> > > > > > > > > > Please see the code segment below where I do this, and let me
>> > > > > > > > > > know if it needs more improvement.
>> > > > > > > > > >
>> > > > > > > > > >             // ldpath program is given priority if it's set
>> > > > > > > > > >             if (ldPath != null)
>> > > > > > > > > >             {
>> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
>> > > > > > > > > >                         .setLDpathProgram(ldPath).build();
>> > > > > > > > > >             }
>> > > > > > > > > >             else if (!derefFields.isEmpty())
>> > > > > > > > > >             {
>> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
>> > > > > > > > > >                         .setDereferencingFields(derefFields.keySet()).build();
>> > > > > > > > > >             }
>> > > > > > > > > >             else
>> > > > > > > > > >             {
>> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
>> > > > > > > > > >             }
>> > > > > > > > > >             eRes = enhancerClient.enhance(parameters);
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Dileepa
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Cheers,
>> > > > > > > > > > > Rafa
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
>> > > > > > > > djayakody@zaizi.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi All,
>> > > > > > > > > > > >
>> > > > > > > > > > > > As per our discussion, I have modified the Stanbol connector so
>> > > > > > > > > > > > that it adds all extracted entity URIs and entity attributes to
>> > > > > > > > > > > > the repository document as fields.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I have committed this code on a separate branch of our GitHub
>> > > > > > > > > > > > project, sensefy-connectors.
>> > > > > > > > > > > > You can find the source code here:
>> > > > > > > > > > > >
>> > > > > > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
>> > > > > > > > > > > >
>> > > > > > > > > > > > Let me know your feedback.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I will write a blog post on how to add it to a connection and
>> > > > > > > > > > > > get enhancement results, and share it with you.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > Dileepa
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <
>> > > > > > daddywri@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi Dileepa,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > You cannot create sub-documents in a transformation connector,
>> > > > > > > > > > > > > and adding that capability to the framework is not possible;
>> > > > > > > > > > > > > we would be missing key bookkeeping logic if that were allowed.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Karl
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
>> > > > > > > > > > djayakody@zaizi.com>
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Karl,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks a lot for the pointer.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Stanbol doesn't update an existing document; it generates a new
>> > > > > > > > > > > > > > response with the requested enhancement details for each content
>> > > > > > > > > > > > > > enhancement request. For example, for a request like "Paris is a
>> > > > > > > > > > > > > > city in France", Stanbol returns the RDF response [1] below.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as
>> > > > > > > > > > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
>> > > > > > > > > > > > > > response to generate the entity abstractions and add them to the
>> > > > > > > > > > > > > > MCF repository document. Currently the Stanbol connector adds
>> > > > > > > > > > > > > > these entity abstractions as JSON strings to a multi-valued
>> > > > > > > > > > > > > > 'entities' field in the repository document, and we parse that
>> > > > > > > > > > > > > > JSON in the SolrWrapper output connector to index into separate
>> > > > > > > > > > > > > > Solr cores (primary documents, linked entities, and entity types
>> > > > > > > > > > > > > > with their attributes).
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Can we have a primary repository document and create
>> > > > > > > > > > > > > > sub-documents for the extracted entities? Is it possible to
>> > > > > > > > > > > > > > generate sub-documents for a repo-document in a transformation
>> > > > > > > > > > > > > > connector?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks.
>> > > > > > > > > > > > > > Dileepa
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [1] Sample Stanbol response
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > {
>> > > > > > > > > > > > > >   "@context": {
>> > > > > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
>> > > > > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
>> > > > > > > > > > > > > >     "dc:created": {
>> > > > > > > > > > > > > >       "@type": "xsd:dateTime"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
>> > > > > > > > > > > > > >     "enhancer:confidence": {
>> > > > > > > > > > > > > >       "@type": "xsd:double"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer:end": {
>> > > > > > > > > > > > > >       "@type": "xsd:int"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer:entity-reference": {
>> > > > > > > > > > > > > >       "@type": "@id"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer:entity-type": {
>> > > > > > > > > > > > > >       "@type": "@id"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer:extracted-from": {
>> > > > > > > > > > > > > >       "@type": "@id"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "enhancer:start": {
>> > > > > > > > > > > > > >       "@type": "xsd:int"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
>> > > > > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
>> > > > > > > > > > > > > >     "foaf:depiction": {
>> > > > > > > > > > > > > >       "@type": "@id"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
>> > > > > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
>> > > > > > > > > > > > > >     "schema": "http://schema.org/",
>> > > > > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
>> > > > > > > > > > > > > >   },
>> > > > > > > > > > > > > >   "@graph": [
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "dbp-ont:Country",
>> > > > > > > > > > > > > >         "dbp-ont:Place",
>> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
>> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
>> > > > > > > > > > > > > >         "owl:Thing",
>> > > > > > > > > > > > > >         "schema:Country",
>> > > > > > > > > > > > > >         "schema:Place"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "foaf:depiction": [
>> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
>> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "rdfs:comment": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "rdfs:label": [
>> > > > > > > > > > > > > >         {
>> > > > > > > > > > > > > >           "@language": "en",
>> > > > > > > > > > > > > >           "@value": "France"
>> > > > > > > > > > > > > >         },
>> > > > > > > > > > > > > >         {
>> > > > > > > > > > > > > >           "@language": "fr",
>> > > > > > > > > > > > > >           "@value": "France"
>> > > > > > > > > > > > > >         }
>> > > > > > > > > > > > > >       ]
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "dbp-ont:Place",
>> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
>> > > > > > > > > > > > > >         "dbp-ont:Settlement",
>> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
>> > > > > > > > > > > > > >         "owl:Thing",
>> > > > > > > > > > > > > >         "schema:Place"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "foaf:depiction": [
>> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
>> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "geo:lat": 48.8567,
>> > > > > > > > > > > > > >       "geo:long": 2.3508,
>> > > > > > > > > > > > > >       "rdfs:comment": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "rdfs:label": [
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >         {
>> > > > > > > > > > > > > >           "@language": "en",
>> > > > > > > > > > > > > >           "@value": "Paris"
>> > > > > > > > > > > > > >         },
>> > > > > > > > > > > > > >         {
>> > > > > > > > > > > > > >           "@language": "fr",
>> > > > > > > > > > > > > >           "@value": "Paris"
>> > > > > > > > > > > > > >         }
>> > > > > > > > > > > > > >       ]
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "enhancer:Enhancement",
>> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
>> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
>> > > > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
>> > > > > > > > > > > > > >       "enhancer:end": 5,
>> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>> > > > > > > > > > > > > >       "enhancer:selected-text": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Paris"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:selection-context": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Paris is in France"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:start": 0
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "enhancer:Enhancement",
>> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
>> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>> > > > > > > > > > > > > >       "enhancer:confidence": 1.0,
>> > > > > > > > > > > > > >       "enhancer:entity-label": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "France"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
>> > > > > > > > > > > > > >       "enhancer:entity-type": [
>> > > > > > > > > > > > > >         "dbp-ont:Country",
>> > > > > > > > > > > > > >         "dbp-ont:Place",
>> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
>> > > > > > > > > > > > > >         "schema:Country",
>> > > > > > > > > > > > > >         "schema:Place",
>> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
>> > > > > > > > > > > > > >         "owl:Thing"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "enhancer:Enhancement",
>> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
>> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>> > > > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
>> > > > > > > > > > > > > >       "enhancer:entity-label": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Vichy France"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
>> > > > > > > > > > > > > >       "enhancer:entity-type": [
>> > > > > > > > > > > > > >         "dbp-ont:Country",
>> > > > > > > > > > > > > >         "dbp-ont:Place",
>> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
>> > > > > > > > > > > > > >         "schema:Country",
>> > > > > > > > > > > > > >         "schema:Place",
>> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
>> > > > > > > > > > > > > >         "owl:Thing"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "enhancer:Enhancement",
>> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
>> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>> > > > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
>> > > > > > > > > > > > > >       "enhancer:entity-label": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Paris Commune"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
>> > > > > > > > > > > > > >       "enhancer:entity-type": [
>> > > > > > > > > > > > > >         "dbp-ont:Country",
>> > > > > > > > > > > > > >         "dbp-ont:Place",
>> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
>> > > > > > > > > > > > > >         "schema:Country",
>> > > > > > > > > > > > > >         "schema:Place",
>> > > > > > > > > > > > > >         "owl:Thing"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
>> > > > > > > > > > > > > >     },
>> > > > > > > > > > > > > >     {
>> > > > > > > > > > > > > >       "@id":
>> > > > > > > > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>> > > > > > > > > > > > > >       "@type": [
>> > > > > > > > > > > > > >         "enhancer:Enhancement",
>> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
>> > > > > > > > > > > > > >       ],
>> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
>> > > > > > > > > > > > > >       "dc:creator":
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
>> > > > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
>> > > > > > > > > > > > > >       "enhancer:end": 18,
>> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>> > > > > > > > > > > > > >       "enhancer:selected-text": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "France"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:selection-context": {
>> > > > > > > > > > > > > >         "@language": "en",
>> > > > > > > > > > > > > >         "@value": "Paris is in France"
>> > > > > > > > > > > > > >       },
>> > > > > > > > > > > > > >       "enhancer:start": 12
>> > > > > > > > > > > > > >     }
>> > > > > > > > > > > > > >   ]
>> > > > > > > > > > > > > > }
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi Dileepa,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
>> > > > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
>> > > > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
>> > > > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does Stanbol
>> > > > > > > > > > > > > > > work by decorating an existing document, or does it work by generating all
>> > > > > > > > > > > > > > > content for a document?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Karl
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi All,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thank you all for your input on the Stanbol connector requirements. I
>> > > > > > > > > > > > > > > > would like to continue modifying the Stanbol connector to be
>> > > > > > > > > > > > > > > > compatible with any output connector. If you can give some guidance on
>> > > > > > > > > > > > > > > > how the entity metadata should be added to the repository document, I can
>> > > > > > > > > > > > > > > > modify the Stanbol connector accordingly.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
>> > > > > > > > > > > > > > > > repository document as key-value pairs.
>> > > > > > > > > > > > > > > > However, this idea is not yet clear to me. There could be N entities
>> > > > > > > > > > > > > > > > in a document, and each of them will have some common attributes
>> > > > > > > > > > > > > > > > such as name, id and type, plus attributes specific to the particular
>> > > > > > > > > > > > > > > > entity type.
>> > > > > > > > > > > > > > > > I'm not clear on how to maintain that structure of N entities with
>> > > > > > > > > > > > > > > > their attributes in a repository document as key-value pairs and make
>> > > > > > > > > > > > > > > > them LDPath compatible for retrieval in an output connector.
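
One way to picture the key-value idea discussed above is the following minimal Java sketch. It is only an illustration under stated assumptions, not the connector's actual code: the class name EntityFieldFlattener, the "entity_" field prefix, and the use of plain maps in place of Stanbol entities are all hypothetical. Each attribute becomes one multi-valued field, and entities are kept apart only by their position in each value list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens N entities into multi-valued key-value fields.
final class EntityFieldFlattener {

    // Assumed field-name prefix, chosen for illustration only.
    static final String PREFIX = "entity_";

    // Each entity is modelled as attribute name -> value (a stand-in for a Stanbol entity).
    static Map<String, List<String>> flatten(List<Map<String, String>> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map<String, String> entity : entities) {
            for (Map.Entry<String, String> attribute : entity.entrySet()) {
                // One multi-valued field per attribute; values are appended in entity order.
                fields.computeIfAbsent(PREFIX + attribute.getKey(), k -> new ArrayList<>())
                      .add(attribute.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample values.
        Map<String, String> paris = new LinkedHashMap<>();
        paris.put("uri", "http://dbpedia.org/resource/Paris");
        paris.put("type", "Place");
        Map<String, String> france = new LinkedHashMap<>();
        france.put("uri", "http://dbpedia.org/resource/France");
        france.put("type", "Country");

        List<Map<String, String>> entities = new ArrayList<>();
        entities.add(paris);
        entities.add(france);
        System.out.println(flatten(entities));
    }
}
```

Note that this flattening loses the link between an entity and its attributes whenever some entity lacks an attribute, because the value lists then no longer line up by position. That is the structural problem raised in the message above.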
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > @Rafa
>> > > > > > > > > > > > > > > > If you could elaborate on your suggestion, it would be a great help
>> > > > > > > > > > > > > > > > to me.
>> > > > > > > > > > > > > > > > All other suggestions are also welcome.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > Dileepa
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
>> > > > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to
>> > > > > > > > > > > > > > > > > do that, but it won't be me; I'm seriously overextended at the moment.
>> > > > > > > > > > > > > > > > > It would be best if someone who knew the connector well could do the
>> > > > > > > > > > > > > > > > > necessary work.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Karl
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this, I was
>> > > > > > > > > > > > > > > > > > expecting the connector to work by just extracting the entities and
>> > > > > > > > > > > > > > > > > > their metadata and putting them as plain metadata on the documents,
>> > > > > > > > > > > > > > > > > > probably following an LDPath queries configuration.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > This is probably OK for Sensefy, but I don't think it would be
>> > > > > > > > > > > > > > > > > > suitable for inclusion in the project. But this is only my opinion.
>> > > > > > > > > > > > > > > > > > Of course, a version of the connector that fully respects the
>> > > > > > > > > > > > > > > > > > ManifoldCF architecture would be more than welcome, in my opinion.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
>> > > > > > > > > > > > > > > > > > <ad...@gmail.com> wrote:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Hi
>> > > > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
>> > > > > > > > > > > > > > > > > > > internal project and has nothing to do with the normal operation of
>> > > > > > > > > > > > > > > > > > > Manifold, so forcing users to use Solr does not fit the Manifold
>> > > > > > > > > > > > > > > > > > > philosophy.
>> > > > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
>> > > > > > > > > > > > > > > > > > > dependency, which will fit almost no use case, is not very useful.
>> > > > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a
>> > > > > > > > > > > > > > > > > > > normal Transformation connector without assuming that a specific
>> > > > > > > > > > > > > > > > > > > output connector will be used.
>> > > > > > > > > > > > > > > > > > > Regards
>> > > > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
>> > > > > > > > > > > > > > > > > > >> Hi guys,
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from
>> > > > > > > > > > > > > > > > > > >> our github repo here:
>> > > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
>> > > > > > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
>> > > > > > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema
>> > > > > > > > > > > > > > > > > > >> for primary documents, entities and entityTypes separately. This was
>> > > > > > > > > > > > > > > > > > >> done for our specific use-case.
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> The SolrWrapper code is here:
>> > > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency on
>> > > > > > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
>> > > > > > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
>> > > > > > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing
>> > > > > > > > > > > > > > > > > > >> that I will update here. I will also provide documentation for
>> > > > > > > > > > > > > > > > > > >> configuring the connector.
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> Thanks,
>> > > > > > > > > > > > > > > > > > >> Dileepa
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
>> > > > > > > > > > > > > > > > > > >> <adperezmorales@gmail.com> wrote:
>> > > > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > > > >> > Hi Joshua
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in
>> > > > > > > > > > > > > > > > > > >> > Apache Stanbol. You can take a look at this issue
>> > > > > > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the
>> > > > > > > > > > > > > > > > > > >> > proper list for these things.
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> > Regards
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
>> > > > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > > > >> > > Hey Dileepa,
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > >       In case you were interested, I pinged the list a few days ago
>> > > > > > > > > > > > > > > > > > >> > > asking for integration tips for Apache Marmotta.
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
>> > > > > > > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be
>> > > > > > > > > > > > > > > > > > >> > > easier for you to go this way.
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the
>> > > > > > > > > > > > > > > > > > >> > > development staff at my company for assistance. If you like the
>> > > > > > > > > > > > > > > > > > >> > > Marmotta approach we may gain more traction solving the same
>> > > > > > > > > > > > > > > > > > >> > > integration.
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the
>> > > > > > > > > > > > > > > > > > >> > > same except not using the Stanbol API for data import in favor of
>> > > > > > > > > > > > > > > > > > >> > > Marmotta.
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > Best,
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > -J
>> > > > > > > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com>
>> > > > > > > > > > > > > > > > > > >> > > > wrote:
>> > > > > > > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > > > > > > >> > > > Hi all,
>> > > > > > > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
>> > > > > > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
>> > > > > > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an
>> > > > > > > > > > > > > > > > > > >> > > > architecture diagram for the connector.
>> > > > > > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
>> > > > > > > > > > > > > > > > > > >> > > >

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
If you set up a pull request, and download a .diff, it should be easy to
confirm that this will "svn patch" onto a workarea.

Karl


On Tue, Jan 26, 2016 at 7:35 AM, Rafa Haro <rh...@apache.org> wrote:

> Hi Karl,
>
> I will proceed in the same way as with the OpenNLP connector, so, as I asked
> in the other email, should we do this using a pull request instead of manually
> importing the master branch of Dileepa's repo?
>
> Cheers,
> Rafa
>
> On Tue, Jan 26, 2016 at 12:07 PM Karl Wright <da...@gmail.com> wrote:
>
> > Hi Rafa,
> >
> > Any time you are ready, please import this into a branch.  I'll need to
> > look over licensing and build before committing to trunk.
> >
> > Thanks!
> > Karl
> >
> >
> > On Tue, Jan 26, 2016 at 3:20 AM, Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have made the discussed modifications to the Stanbol connector. Users
> > > can now either define dereference fields or define an LDPath program to
> > > extract entity properties from Stanbol entities and add them to the
> > > document as fields.
> > >
> > > The latest code is available here for your review:
> > >
> > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > >
> > > I have also written a blog post on how to configure the connector:
> > >
> > > http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html
> > >
> > > Looking forward to your comments.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > >
> > > On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org> wrote:
> > >
> > > > Hi Karl,
> > > >
> > > > I will import this one, don't worry.
> > > >
> > > > Cheers,
> > > > Rafa
> > > > On Sat, Dec 12, 2015 at 20:36, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > >
> > > > > Hi Karl,
> > > > >
> > > > > Yes, I will improve the code based on Rafa's review, and then we can
> > > > > import it into the MCF code base.
> > > > >
> > > > > Thanks
> > > > > Dileepa
> > > > >
> > > > > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Ok, it seems premature for me to try to import this from Github today,
> > > > > > so I'll wait until the dust settles a bit further first.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > Thanks a lot, Rafa, for pointing that out. It was a big miss, as I
> > > > > > > haven't tested the LDPath configuration part yet. More improvements
> > > > > > > to be done. I will make the required improvements as pointed out.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Dileepa
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Dileepa,
> > > > > > > >
> > > > > > > > The problem is not in that part of the code, but rather in this part:
> > > > > > > >
> > > > > > > > if (entity != null) {
> > > > > > > >     Collection<String> properties = entity.getProperties();
> > > > > > > >     for (String property : properties) {
> > > > > > > >         String targetFieldName = derefFields.get(property);
> > > > > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > > > > > > >         if (propValues == null) {
> > > > > > > >             propValues = new HashSet<String>();
> > > > > > > >         }
> > > > > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > > > > > > >         propValues.addAll(entityPropValues);
> > > > > > > >         entityPropertyMap.put(targetFieldName, propValues);
> > > > > > > >     }
> > > > > > > > }
> > > > > > > > You are collecting only the configured dereferenced fields from the
> > > > > > > > EnhancementStructure response; LDPath fields are ignored. Also, there
> > > > > > > > is a potential bug in that code if no dereferencing field is
> > > > > > > > configured for a certain entity property, here:
> > > > > > > >
> > > > > > > > String targetFieldName = derefFields.get(property);
> > > > > > > >
> > > > > > > > targetFieldName would be null then. Instead of trying to index every
> > > > > > > > property, you should collect only the ones configured by the user (or
> > > > > > > > at least, if the user wants all of them, provide a configuration
> > > > > > > > option for that).
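
A minimal Java sketch of the suggestion above, iterating over the configured dereference fields rather than over every entity property, so that a null target field name can never be used as a map key. This is an illustration under assumptions, not the connector's code: the class DerefFieldCollector is hypothetical, and the entity is modelled as a plain map from property to values instead of the Stanbol client's entity type.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper showing null-safe collection of configured properties only.
final class DerefFieldCollector {

    // entityProperties: property -> values (stand-in for the Stanbol entity).
    // derefFields: property -> target field name, as configured by the user.
    static Map<String, Set<String>> collect(Map<String, Collection<String>> entityProperties,
                                            Map<String, String> derefFields) {
        Map<String, Set<String>> result = new HashMap<>();
        // Drive the loop from the configuration, so unconfigured properties are skipped.
        for (Map.Entry<String, String> configured : derefFields.entrySet()) {
            Collection<String> values = entityProperties.get(configured.getKey());
            if (values == null || values.isEmpty()) {
                continue; // the entity does not carry this property
            }
            result.computeIfAbsent(configured.getValue(), k -> new HashSet<>()).addAll(values);
        }
        return result;
    }
}
```

With this shape, an "index all properties" mode would be a separate, explicit configuration option rather than the default behaviour.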
> > > > > > > >
> > > > > > > > Anyway, going back to the LDPath issue, please take into account that
> > > > > > > > when you define a field you must use a custom namespace and prefix so
> > > > > > > > that you can later retrieve that property from the entity. If you
> > > > > > > > don't do that, Stanbol will provide a random namespace for that
> > > > > > > > property. Check this example from the RedLink SDK:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > > > > > > >
> > > > > > > > Hope that helps
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > The next step would be to pull this code into an svn branch.  This
> > > > > > > > > is something I can tackle after the 2.3 release candidate is put
> > > > > > > > > together.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Rafa,
> > > > > > > > > >
> > > > > > > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > > > > > > comments inline below.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rharo@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > >
> > > > > > > > > > > This now clearly seems to be going in the right direction, in my
> > > > > > > > > > > opinion. Quick comments after a first review:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >    - Rejecting a document because it can't be enhanced is kind of
> > > > > > > > > > >    tough. You are preventing a document from finally being indexed
> > > > > > > > > > >    because the enhancement didn't perform correctly; it is probably
> > > > > > > > > > >    better just to let it continue the workflow within the system
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Got your point. Will remove that part from the code
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >    - As far as I can deduce from the code, you are correctly
> > > > > > > > > > >    extracting the configured dereferenced fields, but you are not
> > > > > > > > > > >    processing the LDPath results at all
> > > > > > > > > > >
> > > > > > > > > > I'm passing the LDPath program as an enhancer parameter to Stanbol
> > > > > > > > > > to retrieve the enhancement result according to the LDPath program
> > > > > > > > > > (which is given as a text string in the connector UI).
> > > > > > > > > > If the user has not defined an LDPath program and has added
> > > > > > > > > > dereference fields in the UI instead, then the enhancement request
> > > > > > > > > > will be built using the dereference fields as enhancer parameters.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > > > > > > transformation UI, then I just call the given enhancement chain
> > > > > > > > > > without any other enhancer parameters.
> > > > > > > > > >
> > > > > > > > > > Please refer to the code segment below, where I do this, and let me
> > > > > > > > > > know if it needs more improvements.
> > > > > > > > > >
> > > > > > > > > >             // ldpath program is given priority if it's set
> > > > > > > > > >             if (ldPath != null)
> > > > > > > > > >             {
> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > > > > > > >                         .setContent(content).setLDpathProgram(ldPath).build();
> > > > > > > > > >             }
> > > > > > > > > >             else if (!derefFields.isEmpty())
> > > > > > > > > >             {
> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > > > > > > >                         .setContent(content).setDereferencingFields(
> > > > > > > > > >                         derefFields.keySet()).build();
> > > > > > > > > >             }
> > > > > > > > > >             else
> > > > > > > > > >             {
> > > > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > > > > > > >                         .setContent(content).build();
> > > > > > > > > >             }
> > > > > > > > > >             eRes = enhancerClient.enhance(parameters);
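
The three-way priority described in this message (LDPath program first, then dereference fields, then the bare chain) can be isolated as a small decision function. The following Java sketch is an assumption-laden illustration, not the connector's code: the class EnhancementRequestMode and its Mode enum are hypothetical, and unlike the quoted snippet it also treats a blank LDPath string as "not set", which is a guess about what an empty UI text field should mean.

```java
import java.util.Map;

// Hypothetical helper that decides which kind of enhancement request to build.
final class EnhancementRequestMode {

    enum Mode { LDPATH, DEREFERENCE_FIELDS, CHAIN_ONLY }

    // ldPath: LDPath program text from the UI (may be null or blank).
    // derefFields: property -> target field name (may be null or empty).
    static Mode select(String ldPath, Map<String, String> derefFields) {
        if (ldPath != null && !ldPath.trim().isEmpty()) {
            return Mode.LDPATH;               // LDPath program takes priority
        }
        if (derefFields != null && !derefFields.isEmpty()) {
            return Mode.DEREFERENCE_FIELDS;   // fall back to dereference fields
        }
        return Mode.CHAIN_ONLY;               // plain enhancement chain call
    }
}
```

Keeping the decision separate from the request building makes the precedence easy to test without a running Stanbol server.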
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dileepa
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Rafa
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > As per our discussion I have modified the Stanbol Connector so
> > > > > > > > > > > > that it adds all extracted entity URIs and entity attributes to
> > > > > > > > > > > > the repository document as fields.
> > > > > > > > > > > >
> > > > > > > > > > > > On a separate branch I have committed this code to our github
> > > > > > > > > > > > project sensefy-connectors.
> > > > > > > > > > > > You can find the source code here:
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > > > > > > Let me know your feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > I will write a blog post on how to add it to a connection and
> > > > > > > > > > > > get enhancement results, and share it with you.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Dileepa
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > > >
> > > > > > > > > > > > > You cannot create sub-documents in a transformation connector.  And
> > > > > > > > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > > > > > > > missing key bookkeeping logic if that was allowed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Karl
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Karl,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Stanbol doesn't update an existing document; it generates a new response
> > > > > > > > > > > > > > with the requested enhancement details for each content enhancement
> > > > > > > > > > > > > > request. For example, for a request like "Paris is in France", the
> > > > > > > > > > > > > > following RDF response [1] is given by Stanbol.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > > > > > > > > > > > > and EntityAnnotations are extracted from the RDF response, to generate
> > > > > > > > > > > > > > the entity abstractions and add them to the mcf repository document.
> > > > > > > > > > > > > > Currently in the Stanbol connector we have added these entity
> > > > > > > > > > > > > > abstractions as JSON strings to a multi-valued 'entities' field in the
> > > > > > > > > > > > > > repository document and we parse that JSON in the SolrWrapper output
> > > > > > > > > > > > > > connector to index in separate Solr cores (primary documents, linked
> > > > > > > > > > > > > > entities and entity types with their attributes).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can we have a primary repository document and create sub-documents for
> > > > > > > > > > > > > > the extracted entities? Is it possible to generate sub-documents for a
> > > > > > > > > > > > > > repo-document in a transformation connector?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] Sample Stanbol response
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > >   "@context": {
> > > > > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > > > > > > >     "dc:created": {
> > > > > > > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > > > > > > > >     "enhancer:confidence": {
> > > > > > > > > > > > > >       "@type": "xsd:double"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer:end": {
> > > > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "enhancer:start": {
> > > > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > > > > > > >     "foaf:depiction": {
> > > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > > > > > > >   },
> > > > > > > > > > > > > >   "@graph": [
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > > > >         },
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >       ]
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > > > >         },
> > > > > > > > > > > > > >         {
> > > > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >       ]
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > > > > > > >       "enhancer:end": 5,
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Paris"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:start": 0
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > > >     },
> > > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:start": 12
> > > > > > > > > > > > > >     }
> > > > > > > > > > > > > >   ]
> > > > > > > > > > > > > > }
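
In the sample above, each EntityAnnotation points at a TextAnnotation through dc:relation and carries an enhancer:confidence score. One way a connector could reduce these candidates to a single entity per text mention is sketched below. It is an assumption for illustration only, not the Stanbol connector's actual logic; the EntityAnnotation holder class and the "highest confidence wins" rule are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep the most confident entity suggestion per text mention. */
final class BestEntityPicker {

    /** Stand-in for the parts of an enhancer:EntityAnnotation used here. */
    static final class EntityAnnotation {
        final String relation;        // dc:relation, the TextAnnotation this refers to
        final String entityReference; // enhancer:entity-reference
        final double confidence;      // enhancer:confidence

        EntityAnnotation(String relation, String entityReference, double confidence) {
            this.relation = relation;
            this.entityReference = entityReference;
            this.confidence = confidence;
        }
    }

    /** Maps each TextAnnotation id to its highest-confidence EntityAnnotation. */
    static Map<String, EntityAnnotation> bestPerMention(List<EntityAnnotation> annotations) {
        Map<String, EntityAnnotation> best = new LinkedHashMap<>();
        for (EntityAnnotation a : annotations) {
            // Keep the existing entry unless the new one is strictly more confident.
            best.merge(a.relation, a, (old, cur) -> cur.confidence > old.confidence ? cur : old);
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the three EntityAnnotations in the sample response.
        List<EntityAnnotation> sample = List.of(
            new EntityAnnotation("urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
                "http://dbpedia.org/resource/France", 1.0),
            new EntityAnnotation("urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
                "http://dbpedia.org/resource/Vichy_France", 0.25715446),
            new EntityAnnotation("urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
                "http://dbpedia.org/resource/Paris_Commune", 0.1493264));
        bestPerMention(sample).forEach((mention, a) ->
            System.out.println(mention + " -> " + a.entityReference));
    }
}
```

For the sample, this keeps France rather than Vichy France for the "France" mention. The only candidate listed for "Paris" is Paris Commune at about 0.15, so a real connector would probably also want a minimum-confidence threshold.
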
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> > > > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> > > > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> > > > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does
> > > > > > > > > > > > > > > Stanbol work by decorating an existing document, or does it work by
> > > > > > > > > > > > > > > generating all content for a document?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you all for your input on the Stanbol connector requirement. I
> > > > > > > > > > > > > > > > would like to continue modifying the Stanbol connector to be compatible
> > > > > > > > > > > > > > > > with any output connector. If you can give some guidance on how the
> > > > > > > > > > > > > > > > entity metadata should be added to the repository document, I can modify
> > > > > > > > > > > > > > > > the Stanbol connector accordingly.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > > > > > > > > > > > > repository document as key-value pairs. However, this idea is not yet
> > > > > > > > > > > > > > > > clear to me. There could be N entities in a document, and each of them
> > > > > > > > > > > > > > > > will have some common attributes such as name, id and type, plus
> > > > > > > > > > > > > > > > attributes specific to its entity type. I'm not clear on how to maintain
> > > > > > > > > > > > > > > > that structure of N entities with their attributes in a repository
> > > > > > > > > > > > > > > > document as key-value pairs and make them LDPath compatible for retrieval
> > > > > > > > > > > > > > > > in an output connector.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Rafa
> > > > > > > > > > > > > > > > If you could elaborate on your suggestion, it would be a great help to
> > > > > > > > > > > > > > > > me. All other suggestions are also welcome.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> > > > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to
> > > > > > > > > > > > > > > > > do that, but it won't be me; I'm seriously overextended at the moment.
> > > > > > > > > > > > > > > > > It would be best if someone who knew the connector well could do the
> > > > > > > > > > > > > > > > > necessary work.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was expecting
> > > > > > > > > > > > > > > > > > the connector to work by just extracting the entities and their metadata
> > > > > > > > > > > > > > > > > > and putting them as plain metadata of the documents, probably following
> > > > > > > > > > > > > > > > > > an LDPath queries configuration.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This is probably ok for Sensefy but I don’t think this could be suitable
> > > > > > > > > > > > > > > > > > to be included in the project. But this is only my opinion. Of course, a
> > > > > > > > > > > > > > > > > > version of the connector that fully respects the ManifoldCF architecture
> > > > > > > > > > > > > > > > > > would be more than welcome.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales <ad...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > > > > > > > > > > > > internal project which has nothing to do with the normal operation of
> > > > > > > > > > > > > > > > > > > Manifold, so forcing users to use Solr does not fit the Manifold
> > > > > > > > > > > > > > > > > > > philosophy.
> > > > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > > > > > > dependency, which will not fit almost any use case, is not very useful.
> > > > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> > > > > > > > > > > > > > > > > > > Transformation connector without assuming that a specific output
> > > > > > > > > > > > > > > > > > > connector will be used.
> > > > > > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from
> > > > > > > > > > > > > > > > > > >> our github repo here:
> > > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > > > > > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the
> > > > Stanbol
> > > > > > > > > > connector's
> > > > > > > > > > > > > > > dependency
> > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > >> SolrWrapper and have it working with
> any
> > > > > output
> > > > > > > > > > connector.
> > > > > > > > > > > > > > > > > > >> Please note that the Stanbol connector
> > > > > currently
> > > > > > > > has a
> > > > > > > > > > bug
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > UI
> > > > > > > > > > > > > > > > > > >> (editSpecification) which I'm working
> on
> > > at
> > > > > the
> > > > > > > > > moment.
> > > > > > > > > > > > After
> > > > > > > > > > > > > > > fixing
> > > > > > > > > > > > > > > > > > that I
> > > > > > > > > > > > > > > > > > >> will update here. And also I will
> > provide
> > > > > > > > > documentations
> > > > > > > > > > > for
> > > > > > > > > > > > > > > > > configuring
> > > > > > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM,
> Antonio
> > > > David
> > > > > > > Pérez
> > > > > > > > > > > Morales
> > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > It is not the list for that, but
> > > Marmotta
> > > > is
> > > > > > > > already
> > > > > > > > > > > > > > integrated
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > Apache
> > > > > > > > > > > > > > > > > > >> > Stanbol. You can take a look at this
> > > issue
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > https://issues.apache.org/jira/browse/STANBOL-1165
> > > > > > > > > .
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Anyway, as I said this is not the
> list
> > > for
> > > > > > that,
> > > > > > > > so
> > > > > > > > > > > let's
> > > > > > > > > > > > > use
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > proper
> > > > > > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua
> > > Dunham <
> > > > > > > > > > > > > > > joshua.dunham@gmail.com
> > > > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > >       In case you were
> interested, I
> > > > > pinged
> > > > > > > the
> > > > > > > > > > list a
> > > > > > > > > > > > few
> > > > > > > > > > > > > > > days
> > > > > > > > > > > > > > > > > ago
> > > > > > > > > > > > > > > > > > >> > asking
> > > > > > > > > > > > > > > > > > >> > > for integration tips for Apache
> > > > Marmotta.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I got some great tips on how to do
> > > this
> > > > > > which
> > > > > > > > > could
> > > > > > > > > > > help
> > > > > > > > > > > > > > you.
> > > > > > > > > > > > > > > > > Since
> > > > > > > > > > > > > > > > > > >> > > Marmotta is a drop in replacement
> > for
> > > > > > Clarezza
> > > > > > > > on
> > > > > > > > > > > > Stanbol
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > >> > easier
> > > > > > > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm
> > > > bringing
> > > > > > > this
> > > > > > > > > > > problem
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > >> > development
> > > > > > > > > > > > > > > > > > >> > > staff at my company for
> assistance.
> > If
> > > > you
> > > > > > > like
> > > > > > > > > the
> > > > > > > > > > > > > Marmotta
> > > > > > > > > > > > > > > > > > approach
> > > > > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > > > > >> > > may gain more traction solving the
> > > same
> > > > > > > > > integration.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with
> > > > Stanbol
> > > > > > so
> > > > > > > > the
> > > > > > > > > > > effect
> > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > >> > same
> > > > > > > > > > > > > > > > > > >> > > except not using the Stanbol API
> for
> > > > data
> > > > > > > import
> > > > > > > > > in
> > > > > > > > > > > > favor
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > Marmotta.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM,
> > Dileepa
> > > > > > > Jayakody <
> > > > > > > > > > > > > > > > > djayakody@zaizi.com
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > Thanks you for the feedback and
> > > > offering
> > > > > > > your
> > > > > > > > > help
> > > > > > > > > > > in
> > > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > > > >> > > > Let me get back to you on where
> to
> > > > start
> > > > > > the
> > > > > > > > > code
> > > > > > > > > > > > base.
> > > > > > > > > > > > > > > > > > >> > > > As the first step, I would like
> to
> > > > start
> > > > > > by
> > > > > > > > > > > creating a
> > > > > > > > > > > > > > > > > > architecture
> > > > > > > > > > > > > > > > > > >> > > diagram
> > > > > > > > > > > > > > > > > > >> > > > for the connector.
> > > > > > > > > > > > > > > > > > >> > > > I will send the diagram for your
> > > > review
> > > > > > > soon.
> > > > > > > > > > > > > > > > > > >> > > >
> >
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

I will proceed in the same way as with the OpenNLP connector. As I asked in
the other email, should we do this using a pull request instead of manually
importing the master branch of Dileepa's repo?

Cheers,
Rafa

On Tue, Jan 26, 2016 at 12:07 PM Karl Wright <da...@gmail.com> wrote:

> Hi Rafa,
>
> Any time you are ready, please import this into a branch.  I'll need to
> look over licensing and build before committing to trunk.
>
> Thanks!
> Karl
>
>
> On Tue, Jan 26, 2016 at 3:20 AM, Dileepa Jayakody <dj...@zaizi.com>
> wrote:
>
> > Hi All,
> >
> > I have made the discussed modifications to the Stanbol connector. Users
> > can now either define dereference fields or define an LDPath program to
> > extract entity properties from Stanbol entities and add them to the
> > document as fields.
> >
> > The latest code is available here for your review:
> >
> >
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> >
> > I have also written a blog post on how to configure the connector:
> >
> >
> http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html
> >
> > Looking forward to your comments.
> >
> > Thanks,
> > Dileepa
> >
> >
> > On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org> wrote:
> >
> > > Hi Karl,
> > >
> > > I will import this one, don't worry.
> > >
> > > Cheers,
> > > Rafa
> > > > On Sat, Dec 12, 2015 at 8:36 PM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > wrote:
> > >
> > > > Hi Karl,
> > > >
> > > > > Yes, I will improve the code based on Rafa's review and then we can
> > > > > import it into the MCF code base.
> > > >
> > > > Thanks
> > > > Dileepa
> > > >
> > > > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com>
> > wrote:
> > > >
> > > > > > Ok, it seems premature for me to try to import this from Github
> > > > > > today, so I'll wait until the dust settles a bit further first.
> > > > >
> > > > > Karl
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <
> > djayakody@zaizi.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > > Thanks a lot, Rafa, for pointing that out. It was a big miss, as I
> > > > > > > haven't tested the LDPath configuration part yet. More improvements
> > > > > > > are still to be done.
> > > > > > > I will make the required improvements as pointed out.
> > > > > >
> > > > > > Regards,
> > > > > > Dileepa
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org>
> > wrote:
> > > > > >
> > > > > > > Hi Dileepa,
> > > > > > >
> > > > > > > > The problem is not in that part of the code; it is rather in this
> > > > > > > > part:
> > > > > > > >
> > > > > > > > if (entity != null) {
> > > > > > > >     Collection<String> properties = entity.getProperties();
> > > > > > > >     for (String property : properties) {
> > > > > > > >         String targetFieldName = derefFields.get(property);
> > > > > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > > > > > > >         if (propValues == null) {
> > > > > > > >             propValues = new HashSet<String>();
> > > > > > > >         }
> > > > > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > > > > > > >         propValues.addAll(entityPropValues);
> > > > > > > >         entityPropertyMap.put(targetFieldName, propValues);
> > > > > > > >     }
> > > > > > > > }
> > > > > > > >
> > > > > > > > You are collecting only the configured dereferenced fields from the
> > > > > > > > EnhancementStructure response, and the LDPath fields are ignored.
> > > > > > > > Also, there is a potential bug in that code if there is no
> > > > > > > > dereferencing field configured for a certain entity property here:
> > > > > > >
> > > > > > > String targetFieldName = derefFields.get(property);
> > > > > > >
> > > > > > > > targetFieldName would be null then. Instead of trying to index
> > > > > > > > every property, you should collect only the ones configured by the
> > > > > > > > user (or at least, if the user wants all of them, provide a
> > > > > > > > configuration option for that).
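
A minimal sketch of the suggestion above: iterate over the configured dereference fields instead of over every entity property, so that an unmapped property can never produce a null target field name. The class name DerefFieldCollector and the plain Map standing in for the Stanbol entity are assumptions made for illustration only; they are not part of the connector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DerefFieldCollector {

    /**
     * entityProperties: property URI -> values (hypothetical stand-in for the entity).
     * derefFields: property URI -> target document field name, as configured by the user.
     * Returns target field name -> collected values, for configured properties only.
     */
    static Map<String, Set<String>> collect(Map<String, List<String>> entityProperties,
                                            Map<String, String> derefFields) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, String> mapping : derefFields.entrySet()) {
            String targetFieldName = mapping.getValue();
            List<String> values = entityProperties.get(mapping.getKey());
            if (targetFieldName == null || values == null || values.isEmpty()) {
                continue; // nothing configured or nothing to copy for this property
            }
            result.computeIfAbsent(targetFieldName, k -> new HashSet<>()).addAll(values);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> entity = new HashMap<>();
        entity.put("http://www.w3.org/2000/01/rdf-schema#label", Arrays.asList("Paris"));
        entity.put("http://xmlns.com/foaf/0.1/depiction", Arrays.asList("paris.jpg"));

        Map<String, String> derefFields = new HashMap<>();
        derefFields.put("http://www.w3.org/2000/01/rdf-schema#label", "entity_label");

        // Only the configured rdfs:label property ends up in the result.
        System.out.println(collect(entity, derefFields));
    }
}
```

A configuration option for "all properties" could be layered on top by filling derefFields from the entity's own property names before calling collect.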
> > > > > > >
> > > > > > > > Anyway, going back to the LDPath issue, please take into account
> > > > > > > > that when you define a field you must use a custom namespace and
> > > > > > > > prefix so that you can later retrieve that property from the
> > > > > > > > entity. If you don't do that, Stanbol will assign a random
> > > > > > > > namespace to that property. Check this example from the RedLink
> > > > > > > > SDK:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > > > > > >
> > > > > > > Hope that helps
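
To illustrate the namespace point, the sketch below builds an LDPath program whose field names live in a caller-chosen namespace, so the URI under which each value should appear on the entity is known in advance. LdPathProgramBuilder, the "ex" prefix and the http://example.org/mcf/ namespace are hypothetical, and the program text assumes the usual LDPath "@prefix" and "field = path ;" syntax; it has not been checked against the connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LdPathProgramBuilder {

    /** Emits one "@prefix" line, then one "prefix:field = path ;" line per entry. */
    static String build(String prefix, String namespace, Map<String, String> fieldPaths) {
        StringBuilder program = new StringBuilder();
        program.append("@prefix ").append(prefix)
               .append(" : <").append(namespace).append("> ;\n");
        for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
            program.append(prefix).append(':').append(field.getKey())
                   .append(" = ").append(field.getValue()).append(" ;\n");
        }
        return program.toString();
    }

    /** Full property URI to look up on the returned entity for a given field. */
    static String fieldUri(String namespace, String fieldName) {
        return namespace + fieldName;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // The path is written as a full URI so no extra prefix has to be declared.
        fields.put("label", "<http://www.w3.org/2000/01/rdf-schema#label>");
        System.out.print(build("ex", "http://example.org/mcf/", fields));
        System.out.println(fieldUri("http://example.org/mcf/", "label"));
    }
}
```

Keeping the namespace in one place means the same constant can be used both when building the program and when reading the property back from the enhancement result.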
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <
> daddywri@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > > The next step would be to pull this code into an svn branch.
> > > > > > > > > This is something I can tackle after the 2.3 release candidate
> > > > > > > > > is put together.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <
> > > > > djayakody@zaizi.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Rafa,
> > > > > > > > >
> > > > > > > > > > Thanks for reviewing my code and for your feedback. Please see
> > > > > > > > > > my comments inline below.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <
> rharo@apache.org
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dileepa,
> > > > > > > > > >
> > > > > > > > > > > This seems to be clearly going in the right direction now, in my
> > > > > > > > > > > opinion. Quick comments after a first review:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >    - Rejecting a document because it can't be enhanced is kind of
> > > > > > > > > > >    tough. You are preventing a document from being indexed
> > > > > > > > > > >    because the enhancement didn't perform correctly. It is
> > > > > > > > > > >    probably better to just let it continue through the workflow
> > > > > > > > > > >    within the system.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Got your point. Will remove that part from the code
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >    - As I can deduce for the code, you are correctly
> > > extracting
> > > > > the
> > > > > > > > > >    configured dereferenced fields, but you are not
> > processing
> > > > at
> > > > > > all
> > > > > > > > the
> > > > > > > > > >    LDPath results
> > > > > > > > > >
> > > > > > > > > > I'm passing the LDPath program as an enhancer parameter to
> > > > > > > > > > Stanbol to retrieve the enhancement result according to the
> > > > > > > > > > LDPath program (which is given as a text string in the connector
> > > > > > > > > > UI).
> > > > > > > > > > If the user has not defined an LDPath program and has added
> > > > > > > > > > dereference fields in the UI instead, then the enhancement
> > > > > > > > > > request is built using the dereference fields as enhancer
> > > > > > > > > > parameters.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If neither an LDPath program nor dereference fields are given in
> > > > > > > > > > the transformation UI, then I just call the given enhancement
> > > > > > > > > > chain without any other enhancer parameters.
> > > > > > > > > >
> > > > > > > > > > Please refer to the code segment below, where I do this, and let
> > > > > > > > > > me know if it needs more improvements.
> > > > > > > > >
> > > > > > > > >             // ldpath program is given priority if it's set
> > > > > > > > >             if (ldPath != null)
> > > > > > > > >             {
> > > > > > > > >                 parameters =
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> EnhancerParameters.builder().setChain(chain).setContent(content).setLDpathProgram(ldPath).build();
> > > > > > > > >             }
> > > > > > > > >             else if (!derefFields.isEmpty())
> > > > > > > > >             {
> > > > > > > > >                 parameters =
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> EnhancerParameters.builder().setChain(chain).setContent(content).setDereferencingFields(
> > > > > > > > >                         derefFields.keySet()).build();
> > > > > > > > >             }
> > > > > > > > >             else
> > > > > > > > >             {
> > > > > > > > >                 parameters =
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > > > > > > >             }
> > > > > > > > >             eRes = enhancerClient.enhance(parameters);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dileepa
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Rafa
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
> > > > > > > djayakody@zaizi.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > > As per our discussion, I have modified the Stanbol connector so
> > > > > > > > > > > > that it adds all extracted entity URIs and entity attributes to
> > > > > > > > > > > > the repository document as fields.
> > > > > > > > > > > >
> > > > > > > > > > > > I have committed this code on a separate branch of our GitHub
> > > > > > > > > > > > project, sensefy-connectors.
> > > > > > > > > > > > You can find the source code here:
> > > > > > > > > > > You can find the source code here:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > > > > > Let me know your feedback.
> > > > > > > > > > >
> > > > > > > > > > > > I will write a blog post on how to add it to a connection and
> > > > > > > > > > > > get enhancement results, and share it with you.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dileepa
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <
> > > > > daddywri@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > >
> > > > > > > > > > > > > You cannot create sub-documents in a transformation connector.
> > > > > > > > > > > > > And adding that capability to the framework is not possible; we
> > > > > > > > > > > > > would be missing key bookkeeping logic if that was allowed.
> > > > > > > > > > > >
> > > > > > > > > > > > Karl
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> > > > > > > > > djayakody@zaizi.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Karl,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Stanbol doesn't update an existing document; it generates a
> > > > > > > > > > > > > > new response with the requested enhancement details for the
> > > > > > > > > > > > > > content enhancement request.
> > > > > > > > > > > > > > For example, for a request like "Paris is a city in France",
> > > > > > > > > > > > > > Stanbol returns the following RDF response [1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > > > > > > > > TextAnnotations and EntityAnnotations are extracted from the
> > > > > > > > > > > > > > RDF response to generate the entity abstractions and add them
> > > > > > > > > > > > > > to the MCF repository document. Currently, in the Stanbol
> > > > > > > > > > > > > > connector, we add these entity abstractions as JSON strings to
> > > > > > > > > > > > > > a multi-valued 'entities' field in the repository document, and
> > > > > > > > > > > > > > we parse that JSON in the SolrWrapper output connector to index
> > > > > > > > > > > > > > them in separate Solr cores (primary documents, linked entities
> > > > > > > > > > > > > > and entity types with their attributes).
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Can we have a primary repository document and create
> > > > > > > > > > > > > > sub-documents for the extracted entities? Is it possible to
> > > > > > > > > > > > > > generate sub-documents for a repository document in a
> > > > > > > > > > > > > > transformation connector?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] Sample Stanbol response
> > > > > > > > > > > > >
> > > > > > > > > > > > > {
> > > > > > > > > > > > >   "@context": {
> > > > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > > > > > >     "dc:created": {
> > > > > > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer": "
> > http://fise.iks-project.eu/ontology/
> > > ",
> > > > > > > > > > > > >     "enhancer:confidence": {
> > > > > > > > > > > > >       "@type": "xsd:double"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer:end": {
> > > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "enhancer:start": {
> > > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "entityhub": "
> > > > > > > > > > > >
> > http://stanbol.apache.org/ontology/entityhub/entityhub#
> > > > > > > > > > > > > ",
> > > > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > > > > > >     "foaf:depiction": {
> > > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#
> ",
> > > > > > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > > > > > >   },
> > > > > > > > > > > > >   "@graph": [
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > > >         "
> > > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > >
> http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg
> > > > > > > > > > > ",
> > > > > > > > > > > > >         "
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png
> > > > > > > > > > > > > "
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "France, officially the French
> > > > Republic,
> > > > > > is a
> > > > > > > > > > > > > unitary semi-presidential republic in Western
> Europe
> > > with
> > > > > > > several
> > > > > > > > > > > > > overseas territories and islands located on other
> > > > > continents
> > > > > > > and
> > > > > > > > in
> > > > > > > > > > > > > the Indian, Pacific, and Atlantic oceans.
> > Metropolitan
> > > > > France
> > > > > > > > > extends
> > > > > > > > > > > > > from the Mediterranean Sea to the English Channel
> and
> > > the
> > > > > > North
> > > > > > > > > Sea,
> > > > > > > > > > > > > and from the Rhine to the Atlantic Ocean. It is
> often
> > > > > > referred
> > > > > > > to
> > > > > > > > > as
> > > > > > > > > > > > > l’Hexagone because of the geometric shape of its
> > > > > territory."
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > > >         {
> > > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > > >         },
> > > > > > > > > > > > >         {
> > > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > > >         },
> > > > > > > > > > > > >       ]
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > > >         "
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > > > > > > ",
> > > > > > > > > > > > >         "
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > > > > > > "
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Paris is the capital and largest
> > > city
> > > > of
> > > > > > > > France.
> > > > > > > > > > It
> > > > > > > > > > > > > is situated on the river Seine, in northern France,
> > at
> > > > the
> > > > > > > heart
> > > > > > > > of
> > > > > > > > > > > > > the Île-de-France region (or Paris Region, French:
> > > Région
> > > > > > > > > > parisienne).
> > > > > > > > > > > > > As of January 2008 the city of Paris, within its
> > > > > > administrative
> > > > > > > > > > limits
> > > > > > > > > > > > > largely unchanged since 1860, has an estimated
> > > population
> > > > > of
> > > > > > > > > > 2,211,297
> > > > > > > > > > > > > and a metropolitan population of 12,089,098, and is
> > one
> > > > of
> > > > > > the
> > > > > > > > most
> > > > > > > > > > > > > populated metropolitan areas in Europe."
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > > >
> > > > > > > > > > > > >         {
> > > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > > >         },
> > > > > > > > > > > > >         {
> > > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > > >         },
> > > > > > > > > > > > >       ]
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >    }
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >       "@id":
> > > > > > > > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > > > > > >       "enhancer:end": 5,
> > > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > > >
> > > > > > >
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Paris"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "enhancer:start": 0
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >       "@id":
> > > > > > > > > "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > >       "dc:relation":
> > > > > > > > > > > > >
> > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > > >     },
> > > > > > > > > > > > >     {
> > > > > > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > > >       "@type": [
> > > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > > >       ],
> > > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > > >       },
> > > > > > > > > > > > >       "enhancer:start": 12
> > > > > > > > > > > > >     }
> > > > > > > > > > > > >   ]
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
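One way the EntityAnnotation entries in the JSON above could be carried on a repository document is as multi-valued key/value fields. The following is only a sketch under stated assumptions: the class EntityFlattener, its EntityAnnotation holder and the field names entity_uri, entity_label and entity_type are hypothetical, chosen for illustration, and are not part of ManifoldCF, Stanbol or the connector under discussion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not ManifoldCF or Stanbol API.
final class EntityFlattener {

    /** Minimal stand-in for one EntityAnnotation of the enhancement response. */
    static final class EntityAnnotation {
        final String reference;   // value of enhancer:entity-reference
        final String label;       // value of enhancer:entity-label
        final List<String> types; // values of enhancer:entity-type

        EntityAnnotation(String reference, String label, List<String> types) {
            this.reference = reference;
            this.label = label;
            this.types = types;
        }
    }

    /** Flattens N annotations into multi-valued fields keyed by (assumed) field name. */
    static Map<String, List<String>> flatten(List<EntityAnnotation> annotations) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (EntityAnnotation a : annotations) {
            // URIs and labels are appended in the same order, so index i of
            // entity_uri and entity_label refer to the same entity.
            fields.computeIfAbsent("entity_uri", k -> new ArrayList<>()).add(a.reference);
            fields.computeIfAbsent("entity_label", k -> new ArrayList<>()).add(a.label);
            // Types are de-duplicated across entities; the per-entity grouping is lost.
            List<String> types = fields.computeIfAbsent("entity_type", k -> new ArrayList<>());
            for (String type : a.types) {
                if (!types.contains(type)) {
                    types.add(type);
                }
            }
        }
        return fields;
    }
}
```

The trade-off in this sketch is that entity_uri and entity_label stay index-aligned, while entity_type no longer records which entity contributed which type. An output connector that needs that link would have to use per-entity field names or a different layout.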
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> > > > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> > > > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> > > > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does
> > > > > > > > > > > > > > > Stanbol work by decorating an existing document, or does it work by
> > > > > > > > > > > > > > > generating all content for a document?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > While thanking you all for your input on the Stanbol connector
> > > > > > > > > > > > > > > > requirement, I would like to continue with modifying the Stanbol
> > > > > > > > > > > > > > > > connector to be compatible with any output connector. If you guys can
> > > > > > > > > > > > > > > > give some guidance on how the entity metadata should be added to the
> > > > > > > > > > > > > > > > repository document, I can modify the Stanbol connector accordingly.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > > > > > > > > > > > > repo.doc as key value pairs.
> > > > > > > > > > > > > > > > However this idea is not yet clear to me. There could be 'N' number of
> > > > > > > > > > > > > > > > entities in a document and each of them will have some common attributes
> > > > > > > > > > > > > > > > such as name, id, type and specific attributes for particular entity type.
> > > > > > > > > > > > > > > > I'm not clear on how to maintain that structure of N number of entities
> > > > > > > > > > > > > > > > with their attributes in a repo.document as key value pairs and make them
> > > > > > > > > > > > > > > > LDPath compatible for retrieval in an output connector.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Rafa
> > > > > > > > > > > > > > > > If you can please elaborate on your suggestion it would be greatly
> > > > > > > > > > > > > > > > helpful to me.
> > > > > > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> > > > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to
> > > > > > > > > > > > > > > > > do that, but it won't be me; I'm seriously overextended at the moment.
> > > > > > > > > > > > > > > > > It would be best if someone who knew the connector well could do the
> > > > > > > > > > > > > > > > > necessary work.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was
> > > > > > > > > > > > > > > > > > expecting the connector to work by just extracting the entities and
> > > > > > > > > > > > > > > > > > their metadata and putting them as plain metadata on the documents,
> > > > > > > > > > > > > > > > > > probably following an LDPath query configuration.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This is probably ok for Sensefy but I don’t think this could be
> > > > > > > > > > > > > > > > > > suitable to be included in the project. But this is only my opinion.
> > > > > > > > > > > > > > > > > > Of course, a version of the connector that fully respects the
> > > > > > > > > > > > > > > > > > ManifoldCF architecture would be more than welcome in my opinion.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales <ad...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > > > > > > > > > > > > internal project which has nothing to do with the normal operation of
> > > > > > > > > > > > > > > > > > > Manifold, so forcing the users to use Solr does not fit the Manifold
> > > > > > > > > > > > > > > > > > > philosophy.
> > > > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > > > > > > dependency, which will not fit almost any use case, is not very useful.
> > > > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> > > > > > > > > > > > > > > > > > > Transformation connector without assuming that a specific output
> > > > > > > > > > > > > > > > > > > connector will be used.
> > > > > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > > > > > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency with
> > > > > > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > > > > > > > > > > > >> will update here. And also I will provide documentations for configuring
> > > > > > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> > > > > > > > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > >       In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > > > > > >> > > Marmotta is a drop in replacement for Clerezza on Stanbol it may be easier
> > > > > > > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help in this.
> > > > > > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture
> > > > > > > > > > > > > > > > > > >> > > > diagram for the connector.
> > > > > > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > > > > > >> > > >
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
Hi Rafa,

Any time you are ready, please import this into a branch.  I'll need to
look over licensing and build before committing to trunk.

Thanks!
Karl


On Tue, Jan 26, 2016 at 3:20 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi All,
>
> I have made the modifications we discussed to the Stanbol connector. Users
> can now either define dereference fields or define an LDPath program to
> extract entity properties from Stanbol entities and add them to the
> document as fields.
>
> The latest code is available here for your review:
>
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
>
> I have also written a blog post on how to configure the connector:
>
> http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html
>
> Looking forward to your comments.
>
> Thanks,
> Dileepa
>
>
> On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org> wrote:
>
> > Hi Karl,
> >
> > I will import this one, don't worry.
> >
> > Cheers,
> > Rafa
> > On Sat, Dec 12, 2015 at 20:36, Dileepa Jayakody <djayakody@zaizi.com>
> > wrote:
> >
> > > Hi Karl,
> > >
> > > Yes, I will improve the code with Rafa's reviews and then we can import
> > > it into the MCF code base.
> > >
> > > Thanks
> > > Dileepa
> > >
> > > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com> wrote:
> > >
> > > > Ok, it seems premature for me to try to import this from Github today,
> > > > so I'll wait until the dust settles a bit further first.
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Thanks a lot Rafa for pointing that out. Big miss, as I didn't test the
> > > > > LDPath configuration part yet. More improvements to be done.
> > > > > I will do the required improvements as pointed out.
> > > > >
> > > > > Regards,
> > > > > Dileepa
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:
> > > > >
> > > > > > Hi Dileepa,
> > > > > >
> > > > > > The problem is not in that part of the code; it is rather in this
> > > > > > part:
> > > > > >
> > > > > > if (entity != null) {
> > > > > >     Collection<String> properties = entity.getProperties();
> > > > > >     for (String property : properties) {
> > > > > >         String targetFieldName = derefFields.get(property);
> > > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > > > > >         if (propValues == null) {
> > > > > >             propValues = new HashSet<String>();
> > > > > >         }
> > > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > > > > >         propValues.addAll(entityPropValues);
> > > > > >         entityPropertyMap.put(targetFieldName, propValues);
> > > > > >     }
> > > > > > }
> > > > > > From the EnhancementStructure response you are collecting only the
> > > > > > configured dereferenced fields, and the LDPath fields are ignored. Also,
> > > > > > there is a potential bug in that code if there is no dereferencing field
> > > > > > configured for a certain entity property here:
> > > > > >
> > > > > > String targetFieldName = derefFields.get(property);
> > > > > >
> > > > > > targetFieldName would be null then. Instead of trying to index every
> > > > > > property, you should just collect the ones configured by the user (or
> > > > > > at least, if the user wants all of them, provide a configuration option
> > > > > > for that).
> > > > > >
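The null-key problem described above can be avoided by skipping any property that has no configured target field. The following is a minimal sketch, assuming a hypothetical Entity interface that exposes only the two methods used in the quoted loop (getProperties and getPropertyValues); the class name EntityFieldCollector is likewise made up for illustration and is not the connector's actual code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not the connector's actual code.
final class EntityFieldCollector {

    /** Assumed stand-in for the entity type used in the quoted loop. */
    interface Entity {
        Collection<String> getProperties();
        Collection<String> getPropertyValues(String property);
    }

    /**
     * Collects values only for entity properties that the user mapped to a
     * target field; unmapped properties are skipped, so no null key is stored.
     */
    static Map<String, Set<String>> collect(Entity entity, Map<String, String> derefFields) {
        Map<String, Set<String>> entityPropertyMap = new HashMap<>();
        if (entity == null) {
            return entityPropertyMap;
        }
        for (String property : entity.getProperties()) {
            String targetFieldName = derefFields.get(property);
            if (targetFieldName == null) {
                continue; // property was not configured by the user
            }
            entityPropertyMap
                    .computeIfAbsent(targetFieldName, k -> new HashSet<>())
                    .addAll(entity.getPropertyValues(property));
        }
        return entityPropertyMap;
    }
}
```

An "index all properties" option, as suggested in the message, could be sketched on top of this by falling back to the property name itself when no mapping exists and the option is enabled.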
> > > > > > Anyway, going back to the LDPath issue, please take into account that
> > > > > > when you define a field you must use a custom namespace and prefix to be
> > > > > > able to retrieve that property from the entity later. If you don't do
> > > > > > that, Stanbol will provide a random namespace for that property. Check
> > > > > > this example from the RedLink SDK:
> > > > > >
> > > > > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > > > > >
> > > > > > Hope that helps
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com> wrote:
> > > > > >
> > > > > > > The next step would be to pull this code into an svn branch.  This is
> > > > > > > something I can tackle after the 2.3 release candidate is put together.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Rafa,
> > > > > > > >
> > > > > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > > > > comments inline below.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rharo@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Dileepa,
> > > > > > > > >
> > > > > > > > > This seems to be going in the right direction clearly now in my
> > > > > > > > > opinion. Quick comments after a first review:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >    - Rejecting a document because it can't be enhanced is kind of
> > > > > > > > >    tough. You are preventing a document from finally being indexed
> > > > > > > > >    because the enhancement didn't perform correctly. It is probably
> > > > > > > > >    better just to let it continue through the workflow within the
> > > > > > > > >    system.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Got your point. Will remove that part from the code
> > > > > > > >
> > > > > > > >
> > > > > > > > >    - As far as I can deduce from the code, you are correctly
> > > > > > > > >    extracting the configured dereferenced fields, but you are not
> > > > > > > > >    processing the LDPath results at all.
> > > > > > > > >
> > > > > > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > > > > > retrieve the enhancement result according to the LDPath program (which
> > > > > > > > is given as a text string in the connector UI).
> > > > > > > > If the user has not defined an LDPath program and has added dereference
> > > > > > > > fields in the UI instead, then the enhancement request will be built
> > > > > > > > using the dereference fields as enhancer parameters.
> > > > > > > >
> > > > > > > >
> > > > > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > > > > transformation UI, then I just call the given enhancement chain without
> > > > > > > > any other enhancer parameters.
> > > > > > > >
> > > > > > > > Please refer to the code segment below where I do this and let me know
> > > > > > > > if it needs more improvements.
> > > > > > > >
> > > > > > > >             // ldpath program is given priority if it's set
> > > > > > > >             if (ldPath != null)
> > > > > > > >             {
> > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > > > > > >                         .setLDpathProgram(ldPath).build();
> > > > > > > >             }
> > > > > > > >             else if (!derefFields.isEmpty())
> > > > > > > >             {
> > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > > > > > >                         .setDereferencingFields(derefFields.keySet()).build();
> > > > > > > >             }
> > > > > > > >             else
> > > > > > > >             {
> > > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > > > > > >             }
> > > > > > > >             eRes = enhancerClient.enhance(parameters);
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dileepa
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Rafa
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > > As per our discussion I have modified the Stanbol connector so that it
> > > > > > > > > > > adds all extracted entity URIs and entity attributes to the repository
> > > > > > > > > > > document as fields.
> > > > > > > > > >
> > > > > > > > > > > I have committed this code on a separate branch of our github project
> > > > > > > > > > > sensefy-connectors.
> > > > > > > > > > > You can find the source code here:
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > > > > > Let me know your feedback.
> > > > > > > > > > >
> > > > > > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > > > > > enhancement results, and share it with you.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dileepa
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > >
> > > > > > > > > > > > You cannot create sub-documents in a transformation connector. And
> > > > > > > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > > > > > > missing key bookkeeping logic if that was allowed.
> > > > > > > > > > >
> > > > > > > > > > > Karl
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Karl,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > > > > >
> > > > > > > > > > > > > Stanbol doesn't update an existing document; it generates a new response
> > > > > > > > > > > > > with the requested enhancement details for each content enhancement request.
> > > > > > > > > > > > > For example, for a request like "Paris is in France", the following RDF
> > > > > > > > > > > > > response [1] is given by Stanbol.
> > > > > > > > > > > >
> > > > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > > > > > > > > > > > and EntityAnnotations are extracted from the RDF response to generate the
> > > > > > > > > > > > > entity abstractions and add them to the MCF repository document. Currently
> > > > > > > > > > > > > in the Stanbol connector we add these entity abstractions as JSON strings
> > > > > > > > > > > > > to a multi-valued 'entities' field in the repository document, and we parse
> > > > > > > > > > > > > that JSON in the SolrWrapper output connector to index into separate Solr
> > > > > > > > > > > > > cores (primary documents, linked entities and entity types with their
> > > > > > > > > > > > > attributes).
> > > > > > > > > > > >
> > > > > > > > > > > > > Can we have a primary repository document and create sub-documents for
> > > > > > > > > > > > > the extracted entities? Is it possible to generate sub-documents for a
> > > > > > > > > > > > > repository document in a transformation connector?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > > > Dileepa
> > > > > > > > > > > >
> > > > > > > > > > > > [1] Sample Stanbol response
> > > > > > > > > > > >
> > > > > > > > > > > > {
> > > > > > > > > > > >   "@context": {
> > > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > > > > >     "dc:created": {
> > > > > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer": "
> http://fise.iks-project.eu/ontology/
> > ",
> > > > > > > > > > > >     "enhancer:confidence": {
> > > > > > > > > > > >       "@type": "xsd:double"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer:end": {
> > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "enhancer:start": {
> > > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "entityhub": "
> > > > > > > > > > >
> http://stanbol.apache.org/ontology/entityhub/entityhub#
> > > > > > > > > > > > ",
> > > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > > > > >     "foaf:depiction": {
> > > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > > > > >   },
> > > > > > > > > > > >   "@graph": [
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > >         "
> > > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg
> > > > > > > > > > ",
> > > > > > > > > > > >         "
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png
> > > > > > > > > > > > "
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "France, officially the French
> > > Republic,
> > > > > is a
> > > > > > > > > > > > unitary semi-presidential republic in Western Europe
> > with
> > > > > > several
> > > > > > > > > > > > overseas territories and islands located on other
> > > > continents
> > > > > > and
> > > > > > > in
> > > > > > > > > > > > the Indian, Pacific, and Atlantic oceans.
> Metropolitan
> > > > France
> > > > > > > > extends
> > > > > > > > > > > > from the Mediterranean Sea to the English Channel and
> > the
> > > > > North
> > > > > > > > Sea,
> > > > > > > > > > > > and from the Rhine to the Atlantic Ocean. It is often
> > > > > referred
> > > > > > to
> > > > > > > > as
> > > > > > > > > > > > l’Hexagone because of the geometric shape of its
> > > > territory."
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > >         },
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > >           "@value": "France"
> > > > > > > > > > > >         },
> > > > > > > > > > > >       ]
> > > > > > > > > > > >     },
> > > > > > > > > > > >
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > > >         "schema:Place"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > > >         "
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > > > > > ",
> > > > > > > > > > > >         "
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > > > > > "
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris is the capital and largest
> > city
> > > of
> > > > > > > France.
> > > > > > > > > It
> > > > > > > > > > > > is situated on the river Seine, in northern France,
> at
> > > the
> > > > > > heart
> > > > > > > of
> > > > > > > > > > > > the Île-de-France region (or Paris Region, French:
> > Région
> > > > > > > > > parisienne).
> > > > > > > > > > > > As of January 2008 the city of Paris, within its
> > > > > administrative
> > > > > > > > > limits
> > > > > > > > > > > > largely unchanged since 1860, has an estimated
> > population
> > > > of
> > > > > > > > > 2,211,297
> > > > > > > > > > > > and a metropolitan population of 12,089,098, and is
> one
> > > of
> > > > > the
> > > > > > > most
> > > > > > > > > > > > populated metropolitan areas in Europe."
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > >
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > >         },
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > >         },
> > > > > > > > > > > >       ]
> > > > > > > > > > > >     },
> > > > > > > > > > > >    }
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id":
> > > > > > > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > > > > >       "enhancer:end": 5,
> > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > >
> > > > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:start": 0
> > > > > > > > > > > >     },
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id":
> > > > > > > > "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation":
> > > > > > > > > > > >
> "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "
> > > > > > > > > http://dbpedia.org/resource/France
> > > > > > > > > > ",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > >
> > > > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id":
> > > > > > > > "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation":
> > > > > > > > > > > >
> "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "
> > > > > > > > > > > > http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > >
> > > > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id":
> > > > > > > > "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation":
> > > > > > > > > > > >
> "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "
> > > > > > > > > > > > http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > >
> > > > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > > >     },
> > > > > > > > > > > >     {
> > > > > > > > > > > >       "@id":
> > > > > > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > >       "dc:creator":
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > > > > >
> > > > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:start": 12
> > > > > > > > > > > >     }
> > > > > > > > > > > >   ]
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> > > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> > > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> > > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does Stanbol
> > > > > > > > > > > > > > work by decorating an existing document, or does it work by generating all
> > > > > > > > > > > > > > content for a document?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Karl
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > While thanking you all for your input on the Stanbol connector requirement,
> > > > > > > > > > > > > > > I would like to continue with modifying the Stanbol connector to be
> > > > > > > > > > > > > > > compatible with any output connector. If you guys can give some guidance on
> > > > > > > > > > > > > > > how the entity metadata should be added to the repository document, I can
> > > > > > > > > > > > > > > modify the Stanbol connector accordingly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > > > > > > > > > > > repository document as key-value pairs.
> > > > > > > > > > > > > > > However, this idea is not yet clear to me. There could be N entities in a
> > > > > > > > > > > > > > > document, and each of them will have some common attributes such as name,
> > > > > > > > > > > > > > > id and type, plus specific attributes for the particular entity type.
> > > > > > > > > > > > > > > I'm not clear on how to maintain that structure of N entities with their
> > > > > > > > > > > > > > > attributes in a repository document as key-value pairs and make them
> > > > > > > > > > > > > > > LDPath compatible for retrieval in an output connector.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Rafa
> > > > > > > > > > > > > > > If you can please elaborate on your suggestion, it would be greatly helpful
> > > > > > > > > > > > > > > to me.
> > > > > > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> > > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to do
> > > > > > > > > > > > > > > > that, but it won't be me; I'm seriously overextended at the moment.  It
> > > > > > > > > > > > > > > > would be best if someone who knew the connector well could do the necessary
> > > > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was expecting
> > > > > > > > > > > > > > > > > the connector to work by just extracting the entities and entity metadata
> > > > > > > > > > > > > > > > > and putting them as plain metadata on the documents, probably following an
> > > > > > > > > > > > > > > > > LDPath query configuration
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This is probably ok for Sensefy, but I don’t think it would be suitable
> > > > > > > > > > > > > > > > > for inclusion in the project. But this is only my opinion. Of course, a
> > > > > > > > > > > > > > > > > version of the connector that fully respects the ManifoldCF architecture
> > > > > > > > > > > > > > > > > would be more than welcome in my opinion
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales <ad...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > > > > > > > > > > > internal project which has nothing to do with the normal operation of
> > > > > > > > > > > > > > > > > > Manifold, so forcing the users to use Solr does not fit the Manifold
> > > > > > > > > > > > > > > > > > philosophy.
> > > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > > > > > dependency, which will not fit almost any use case, is not very useful.
> > > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> > > > > > > > > > > > > > > > > > transformation connector without assuming that a specific output connector
> > > > > > > > > > > > > > > > > > will be used.
> > > > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> I have developed a Stanbol connector for
> > MCF.
> > > > You
> > > > > > can
> > > > > > > > > check
> > > > > > > > > > it
> > > > > > > > > > > > out
> > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> It requires the SolrWrapper output
> connector
> > > > which
> > > > > > > > indexes
> > > > > > > > > > > > > enhanced
> > > > > > > > > > > > > > > > >> documents, entities and entityTypes in
> > > separate
> > > > > Solr
> > > > > > > > > cores.
> > > > > > > > > > > > > > Basically
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > >> requires 3 separate solr cores configured
> > > with a
> > > > > > > > specific
> > > > > > > > > > Solr
> > > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > >> primary documents, entities and
> entityTypes
> > > > > > > separately.
> > > > > > > > > This
> > > > > > > > > > > was
> > > > > > > > > > > > > > done
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency on
> > > > > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > > > > > > > > > > >> will update here. I will also provide documentation for configuring
> > > > > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> > > > > > > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > >       In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be easier
> > > > > > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture diagram
> > > > > > > > > > > > > > > > > >> > > > for the connector.
> > > > > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > > > > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi All,

I have made the modifications we discussed to the Stanbol connector. Users
can now either define dereference fields or define an LDPath program to
extract entity properties from Stanbol entities and add them to the
document as fields.

The latest code is available here for your review:
https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector

I have also written a blog post on how to configure the connector:
http://dileepajayakody.blogspot.com/2016/01/enhancing-documents-in-apache.html

Looking forward to your comments.

Thanks,
Dileepa


On Mon, Dec 14, 2015 at 1:18 PM, Rafa Haro <rh...@apache.org> wrote:

> Hi Karl,
>
> I will import this one, don't worry.
>
> Cheers,
> Rafa
> On Sat, Dec 12, 2015 at 20:36, Dileepa Jayakody <dj...@zaizi.com>
> wrote:
>
> > Hi Karl,
> >
> > Yes, I will improve the code based on Rafa's review and then we can import
> > it into the MCF code base.
> >
> > Thanks
> > Dileepa
> >
> > On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com> wrote:
> >
> > > Ok, it seems premature for me to try to import this from Github today, so
> > > I'll wait until the dust settles a bit further first.
> > >
> > > Karl
> > >
> > >
> > > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <djayakody@zaizi.com>
> > > wrote:
> > >
> > > > Thanks a lot, Rafa, for pointing that out. It was a big miss, as I haven't
> > > > tested the LDPath configuration part yet. More improvements are needed.
> > > > I will make the required improvements as pointed out.
> > > >
> > > > Regards,
> > > > Dileepa
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:
> > > >
> > > > > Hi Dileepa,
> > > > >
> > > > > The problem is not in that part of the code; it is rather in this part:
> > > > >
> > > > > if (entity != null) {
> > > > >     Collection<String> properties = entity.getProperties();
> > > > >     for (String property : properties) {
> > > > >         String targetFieldName = derefFields.get(property);
> > > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > > > >         if (propValues == null) {
> > > > >             propValues = new HashSet<String>();
> > > > >         }
> > > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > > > >         propValues.addAll(entityPropValues);
> > > > >         entityPropertyMap.put(targetFieldName, propValues);
> > > > >     }
> > > > > }
> > > > > You are collecting only the configured dereferenced fields from the
> > > > > EnhancementStructure response, and the LDPath fields are ignored. Also,
> > > > > there is a potential bug in that code if no dereferencing field is
> > > > > configured for a certain entity property, here:
> > > > >
> > > > > String targetFieldName = derefFields.get(property);
> > > > >
> > > > > targetFieldName would be null then. Instead of trying to index every
> > > > > property, you should collect only the ones configured by the user (or at
> > > > > least, if the user wants all of them, provide a configuration option for
> > > > > that).
> > > > >
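The null-key problem described above can be sketched as follows. This is only an illustration under stated assumptions: the entity is modelled as a plain map from property URI to values, and `DerefFieldCollector` and `collect` are hypothetical names, not part of the connector or of the Stanbol client API. The idea is simply to skip any property that has no configured target field, so that no values end up under a null field name.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: keeps only the entity properties the user configured. */
final class DerefFieldCollector {
    private DerefFieldCollector() {}

    /**
     * @param entityProperties property URI -> values (stand-in for the real entity object)
     * @param derefFields      property URI -> target field name, as configured by the user
     * @return target field name -> collected values; unconfigured properties are skipped
     */
    static Map<String, Set<String>> collect(Map<String, List<String>> entityProperties,
                                            Map<String, String> derefFields) {
        Map<String, Set<String>> fieldValues = new HashMap<>();
        for (Map.Entry<String, List<String>> property : entityProperties.entrySet()) {
            String targetFieldName = derefFields.get(property.getKey());
            if (targetFieldName == null) {
                // No field configured for this property: skip it instead of
                // storing its values under a null key.
                continue;
            }
            fieldValues.computeIfAbsent(targetFieldName, k -> new LinkedHashSet<>())
                       .addAll(property.getValue());
        }
        return fieldValues;
    }
}
```

Iterating over the configured fields instead of over the entity's properties would work equally well; the version above just stays close to the loop quoted in the message.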
> > > > > Anyway, going back to the LDPath issue, please take into account that when
> > > > > you define a field you must use a custom namespace and prefix so that you
> > > > > can later retrieve that property from the entity. If you don't do that,
> > > > > Stanbol will assign a random namespace to that property. Check this
> > > > > example from the RedLink SDK:
> > > > >
> > > > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > > > >
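As a rough illustration of the namespace advice above, the sketch below builds an LDPath program in which every field is declared under one custom prefix, and computes the full URI under which such a field would then be looked up on the entity. The class and method names, the `ex` prefix and the `http://example.org/mcf/` namespace are all made up for this example, and the `@prefix ... ;` / `field = path ;` syntax follows the LDPath conventions as I understand them, so it should be checked against the LDPath documentation and the RedLink test linked above before being relied on.

```java
import java.util.Map;

/** Hypothetical helper for declaring LDPath fields under a custom namespace. */
final class LdPathProgramSketch {
    private LdPathProgramSketch() {}

    /** Builds a program whose fields are all named with the given prefix. */
    static String buildProgram(String prefix, String namespace, Map<String, String> fieldPaths) {
        StringBuilder program = new StringBuilder();
        // Declare the custom prefix once at the top of the program.
        program.append("@prefix ").append(prefix)
               .append(" : <").append(namespace).append("> ;\n");
        // One line per field: prefix:name = path ;
        for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
            program.append(prefix).append(':').append(field.getKey())
                   .append(" = ").append(field.getValue()).append(" ;\n");
        }
        return program.toString();
    }

    /** Full property URI expected for a field declared with the custom namespace. */
    static String fieldUri(String namespace, String fieldName) {
        return namespace + fieldName;
    }
}
```

With a field `label` mapped to the path `<http://www.w3.org/2000/01/rdf-schema#label>`, the connector could then look for `http://example.org/mcf/label` among the entity's properties instead of guessing a generated namespace.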
> > > > > Hope that helps
> > > > >
> > > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The next step would be to pull this code into an svn branch. This is
> > > > > > something I can tackle after the 2.3 release candidate is put together.
> > > > > >
> > > > > > Thanks,
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Rafa,
> > > > > > >
> > > > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > > > comments inline below.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Dileepa,
> > > > > > > >
> > > > > > > > In my opinion this is now clearly going in the right direction.
> > > > > > > > Quick comments after a first review:
> > > > > > > >
> > > > > > > >
> > > > > > > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > > > > > > >    You are preventing a document from finally being indexed because the
> > > > > > > >    enhancement didn't perform correctly. It is probably better just to let
> > > > > > > >    it continue through the workflow within the system
> > > > > > > >
> > > > > > >
> > > > > > > Got your point. Will remove that part from the code
> > > > > > >
> > > > > > >
> > > > > > > >    - As far as I can deduce from the code, you are correctly extracting the
> > > > > > > >    configured dereferenced fields, but you are not processing the LDPath
> > > > > > > >    results at all
> > > > > > > >
> > > > > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > > > > retrieve the enhancement result according to the LDPath program (which is
> > > > > > > given as a text string in the connector UI).
> > > > > > > If the user has not defined an LDPath program and has added dereference
> > > > > > > fields in the UI instead, then the enhancement request will be built using
> > > > > > > the dereference fields as enhancer parameters.
> > > > > > >
> > > > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > > > transformation UI, then I just call the given enhancement chain without
> > > > > > > any other enhancer parameters.
> > > > > > >
> > > > > > > Please refer to the code segment below where I do this, and let me know
> > > > > > > if it needs more improvements.
> > > > > > >
> > > > > > >             // ldpath program is given priority if it's set
> > > > > > >             if (ldPath != null)
> > > > > > >             {
> > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > > > > >                         .setLDpathProgram(ldPath).build();
> > > > > > >             }
> > > > > > >             else if (!derefFields.isEmpty())
> > > > > > >             {
> > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > > > > >                         .setDereferencingFields(derefFields.keySet()).build();
> > > > > > >             }
> > > > > > >             else
> > > > > > >             {
> > > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > > > > >             }
> > > > > > >             eRes = enhancerClient.enhance(parameters);
> > > > > > >
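The priority order in the snippet above (LDPath program first, then dereference fields, then the bare chain) can also be expressed as a small, separately testable decision function. The sketch below is an assumption-laden illustration rather than the connector's code: `RequestModeSelector` and `Mode` are hypothetical names, and treating a blank LDPath string as "not set" is an extra guard that the quoted snippet does not have, added here on the guess that an empty text box in the UI could yield an empty string rather than null.

```java
import java.util.Set;

/** Hypothetical helper that decides how the enhancement request should be built. */
final class RequestModeSelector {
    /** The three request shapes described in the message above. */
    enum Mode { LDPATH, DEREFERENCE, CHAIN_ONLY }

    private RequestModeSelector() {}

    static Mode select(String ldPath, Set<String> derefFieldNames) {
        // An LDPath program takes priority, but only if it has real content.
        if (ldPath != null && !ldPath.trim().isEmpty()) {
            return Mode.LDPATH;
        }
        // Otherwise fall back to the configured dereference fields, if any.
        if (derefFieldNames != null && !derefFieldNames.isEmpty()) {
            return Mode.DEREFERENCE;
        }
        // Nothing configured: call the enhancement chain with no extra parameters.
        return Mode.CHAIN_ONLY;
    }
}
```

The caller would then switch on the returned mode and build the corresponding parameters, which keeps the selection rule in one place if more options are added later.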
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dileepa
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Rafa
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > As per our discussion I have modified the Stanbol Connector so that it
> > > > > > > > > adds all extracted entity URIs and entity attributes to the repository
> > > > > > > > > document as fields.
> > > > > > > > >
> > > > > > > > > On a separate branch I have committed this code to our github project
> > > > > > > > > sensefy-connectors.
> > > > > > > > > You can find the source code here:
> > > > > > > > >
> > > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > > > Let me know your feedback.
> > > > > > > > >
> > > > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > > > enhancement results, and share it with you.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dileepa
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dileepa,
> > > > > > > > > >
> > > > > > > > > > You cannot create sub-documents in a transformation connector. And
> > > > > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > > > > missing key bookkeeping logic if that was allowed.
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Karl,
> > > > > > > > > > >
> > > > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > > > >
> > > > > > > > > > > Stanbol doesn't update an existing document; it generates a new response
> > > > > > > > > > > with the requested enhancement details for the content enhancement request.
> > > > > > > > > > > For example, for a request like "Paris is a city in France", the following
> > > > > > > > > > > RDF response [1] is given by Stanbol.
> > > > > > > > > > >
> > > > > > > > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > > > > > > > > > and EntityAnnotations are extracted from the RDF response to generate the
> > > > > > > > > > > entity abstractions and add them to the MCF repository document. Currently
> > > > > > > > > > > in the Stanbol connector we add these entity abstractions as JSON
> > > > > > > > > > > strings to a multi-valued 'entities' field in the repository document, and
> > > > > > > > > > > we parse that JSON in the SolrWrapper output connector to index them in
> > > > > > > > > > > separate Solr cores (primary documents, linked entities and entity types
> > > > > > > > > > > with their attributes).
> > > > > > > > > > >
> > > > > > > > > > > Can we have a primary repository document and create sub-documents for
> > > > > > > > > > > the extracted entities? Is it possible to generate sub-documents for a
> > > > > > > > > > > repo-document in a transformation connector?
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > > Dileepa
> > > > > > > > > > >
> > > > > > > > > > > [1] Sample Stanbol response
> > > > > > > > > > >
> > > > > > > > > > > {
> > > > > > > > > > >   "@context": {
> > > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > > > >     "dc:created": {
> > > > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > > > > >     "enhancer:confidence": {
> > > > > > > > > > >       "@type": "xsd:double"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer:end": {
> > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > >     },
> > > > > > > > > > >     "enhancer:start": {
> > > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > > >     },
> > > > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > > > >     "foaf:depiction": {
> > > > > > > > > > >       "@type": "@id"
> > > > > > > > > > >     },
> > > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > > > >   },
> > > > > > > > > > >   "@graph": [
> > > > > > > > > > >     {
> > > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > > > >       "@type": [
> > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > >         "schema:Country",
> > > > > > > > > > >         "schema:Place"
> > > > > > > > > > >       ],
> > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > > > > >       ],
> > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > >         "@language": "en",
> > > > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > > > > >       },
> > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > >         {
> > > > > > > > > > >           "@language": "en",
> > > > > > > > > > >           "@value": "France"
> > > > > > > > > > >         },
> > > > > > > > > > >         {
> > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > >           "@value": "France"
> > > > > > > > > > >         },
> > > > > > > > > > >       ]
> > > > > > > > > > >     },
> > > > > > > > > > >
> > > > > > > > > > >     {
> > > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > > > >       "@type": [
> > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > >         "owl:Thing",
> > > > > > > > > > >         "schema:Place"
> > > > > > > > > > >       ],
> > > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > > > > >       ],
> > > > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris is the capital and largest city of France. It
> > > > > > > > > > > is situated on the river Seine, in northern France, at the heart of
> > > > > > > > > > > the Île-de-France region (or Paris Region, French: Région parisienne).
> > > > > > > > > > > As of January 2008 the city of Paris, within its administrative limits
> > > > > > > > > > > largely unchanged since 1860, has an estimated population of 2,211,297
> > > > > > > > > > > and a metropolitan population of 12,089,098, and is one of the most
> > > > > > > > > > > populated metropolitan areas in Europe."
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "rdfs:label": [
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "en",
> > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > >         },
> > > > > > > > > > > >         {
> > > > > > > > > > > >           "@language": "fr",
> > > > > > > > > > > >           "@value": "Paris"
> > > > > > > > > > > >         }
> > > > > > > > > > > >       ]
> > > > > > > > > > >     },
> > > > > > > > > > >    }
> > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > > > > >       "enhancer:end": 5,
> > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > >         "@language": "en",
> > > > > > > > > > >         "@value": "Paris"
> > > > > > > > > > >       },
> > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > >         "@language": "en",
> > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > >       },
> > > > > > > > > > >       "enhancer:start": 0
> > > > > > > > > > >     },
> > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > >     },
> > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > >     },
> > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > > > >         "@language": "en",
> > > > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > > > >       },
> > > > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > > > >         "schema:Country",
> > > > > > > > > > > >         "schema:Place",
> > > > > > > > > > > >         "owl:Thing"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > > >     },
> > > > > > > > > > >     {
> > > > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > > > >       "@type": [
> > > > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > > > >       ],
> > > > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > > >         "@language": "en",
> > > > > > > > > > >         "@value": "France"
> > > > > > > > > > >       },
> > > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > > >         "@language": "en",
> > > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > > >       },
> > > > > > > > > > >       "enhancer:start": 12
> > > > > > > > > > >     }
> > > > > > > > > > >   ]
> > > > > > > > > > > }
> > > > > > > > > > >
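A minimal sketch, not part of the original exchange, of how the enhancement nodes in the response above relate to each other: each `enhancer:EntityAnnotation` points at its `enhancer:TextAnnotation` through `dc:relation`, and `enhancer:confidence` ranks the candidates. The property names come from the quoted JSON; the function name `best_entities`, the shortened ids in the sample, and the assumption that the nodes arrive as a plain list of dicts are illustrative only.

```python
def best_entities(nodes):
    """Map each mention text to its highest-confidence entity suggestion."""
    # Index the mentions (TextAnnotations) by their @id.
    mentions = {
        n["@id"]: n["enhancer:selected-text"]["@value"]
        for n in nodes
        if "enhancer:TextAnnotation" in n.get("@type", [])
    }
    best = {}
    for n in nodes:
        if "enhancer:EntityAnnotation" not in n.get("@type", []):
            continue
        # Assumption: dc:relation may be a single id or a list of ids.
        relations = n.get("dc:relation", [])
        if isinstance(relations, str):
            relations = [relations]
        for rel in relations:
            text = mentions.get(rel)
            if text is None:
                continue
            # strip() guards against whitespace introduced by mail wrapping.
            reference = n["enhancer:entity-reference"].strip()
            confidence = n["enhancer:confidence"]
            if text not in best or confidence > best[text][1]:
                best[text] = (reference, confidence)
    return best


# Trimmed sample: the "France" mention and two of its candidates,
# with ids shortened for readability (hypothetical values).
sample = [
    {
        "@id": "urn:enhancement-e9c9c187",
        "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
        "enhancer:selected-text": {"@language": "en", "@value": "France"},
    },
    {
        "@id": "urn:enhancement-b2855552",
        "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
        "dc:relation": "urn:enhancement-e9c9c187",
        "enhancer:confidence": 1.0,
        "enhancer:entity-reference": "http://dbpedia.org/resource/France",
    },
    {
        "@id": "urn:enhancement-c50474e4",
        "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
        "dc:relation": "urn:enhancement-e9c9c187",
        "enhancer:confidence": 0.25715446,
        "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
    },
]

print(best_entities(sample))
```

Keeping only the top candidate per mention is one possible policy; a connector could equally keep every suggestion above a confidence threshold.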
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> > > > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> > > > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> > > > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does Stanbol
> > > > > > > > > > > > > work by decorating an existing document, or does it work by generating all
> > > > > > > > > > > > > content for a document?
> > > > > > > > > > > >
> > > > > > > > > > > > Karl
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you all for your input on the Stanbol connector requirements. I
> > > > > > > > > > > > > > would like to continue modifying the Stanbol connector so that it is
> > > > > > > > > > > > > > compatible with any output connector. If you can give some guidance on
> > > > > > > > > > > > > > how the entity metadata should be added to the repository document, I can
> > > > > > > > > > > > > > modify the Stanbol connector accordingly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > > > > > > > > > > repo.doc as key-value pairs.
> > > > > > > > > > > > > > However, this idea is not yet clear to me. There could be N entities in a
> > > > > > > > > > > > > > document, and each of them will have some common attributes such as name,
> > > > > > > > > > > > > > id and type, plus specific attributes for its particular entity type.
> > > > > > > > > > > > > > I'm not clear on how to maintain that structure of N entities with their
> > > > > > > > > > > > > > attributes in a repo.document as key-value pairs and make them LDPath
> > > > > > > > > > > > > > compatible for retrieval in an output connector.
> > > > > > > > > > > > >
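One way to picture the key-value idea discussed above, as a hedged sketch only: number the entities and encode each attribute under an indexed key, so that N entities with differing attributes still fit into flat, multi-valued string fields. The key scheme (`entity.<index>.<attribute>`), the function name `flatten_entities`, and the sample attributes are hypothetical; nothing here is ManifoldCF or Stanbol API, and the sketch does not address LDPath compatibility.

```python
def flatten_entities(entities, prefix="entity"):
    """Flatten a list of entity dicts into {key: [string values]} pairs."""
    fields = {}
    for index, entity in enumerate(entities):
        for attribute, value in entity.items():
            # Normalise single values and lists to a list of strings.
            values = value if isinstance(value, list) else [value]
            fields[f"{prefix}.{index}.{attribute}"] = [str(v) for v in values]
    # Record the count so a consumer knows how many entities to read back.
    fields[f"{prefix}.count"] = [str(len(entities))]
    return fields


# Two entities with shared attributes (id, name, type) and one
# type-specific attribute (geo:lat) that only the second one carries.
entities = [
    {
        "id": "http://dbpedia.org/resource/France",
        "name": "France",
        "type": ["dbp-ont:Country", "dbp-ont:Place"],
    },
    {
        "id": "http://dbpedia.org/resource/Paris",
        "name": "Paris",
        "type": ["dbp-ont:Settlement"],
        "geo:lat": 48.8567,
    },
]

for key, values in sorted(flatten_entities(entities).items()):
    print(key, values)
```

The indexed keys keep each entity's attributes grouped, at the cost of field names that vary per document; parallel multi-valued fields (one per attribute) would be the alternative trade-off.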
> > > > > > > > > > > > > > @Rafa
> > > > > > > > > > > > > > If you can please elaborate on your suggestion, it would be greatly
> > > > > > > > > > > > > > helpful to me.
> > > > > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Dileepa
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> > > > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to do
> > > > > > > > > > > > > > > that, but it won't be me; I'm seriously overextended at the moment.  It
> > > > > > > > > > > > > > > would be best if someone who knew the connector well could do the
> > > > > > > > > > > > > > > necessary work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was
> > > > > > > > > > > > > > > > expecting the connector to work by just extracting the entities and
> > > > > > > > > > > > > > > > their metadata and putting them as plain metadata of the documents,
> > > > > > > > > > > > > > > > probably following an LDPath query configuration.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is probably ok for Sensefy, but I don’t think this could be
> > > > > > > > > > > > > > > > suitable to be included in the project. But this is only my opinion.
> > > > > > > > > > > > > > > > Of course, a version of the connector that fully respects the
> > > > > > > > > > > > > > > > ManifoldCF architecture would be more than welcome in my opinion.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > > > > > > > > > > internal project which has nothing to do with the normal operation of
> > > > > > > > > > > > > > > > > Manifold, so forcing users to use Solr does not fit the Manifold
> > > > > > > > > > > > > > > > > philosophy.
> > > > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > > > > dependency, which will fit almost no use case, is not very useful.
> > > > > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> > > > > > > > > > > > > > > > > Transformation connector without assuming that a specific output
> > > > > > > > > > > > > > > > > connector will be used.
> > > > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > > > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency with
> > > > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > > > > > > > > > >> will update here. And also I will provide documentations for configuring
> > > > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> > > > > > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be
> > > > > > > > > > > > > > > > >> > > easier for you to go this way.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > I'm not a Java programmer, but I'm bringing this problem to the
> > > > > > > > > > > > > > > > >> > > development staff at my company for assistance. If you like the Marmotta
> > > > > > > > > > > > > > > > >> > > approach, we may gain more traction solving the same integration.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol, so the effect would be the
> > > > > > > > > > > > > > > > >> > > same except not using the Stanbol API for data import in favor of
> > > > > > > > > > > > > > > > >> > > Marmotta.
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture
> > > > > > > > > > > > > > > > >> > > > diagram for the connector.
> > > > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > > > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> > --
> >
> > ------------------------------
> > This message should be regarded as confidential. If you have received
> this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard
> copy
> > by an authorised signatory.
> >
> > Zaizi Ltd is registered in England and Wales with the registration number
> > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > London W6 7AN.
> >
>

-- 

------------------------------
This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy 
by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
London W6 7AN. 

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

I will import this one, don't worry.

Cheers,
Rafa
On Sat, Dec 12, 2015 at 20:36, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi Karl,
>
> Yes, I will improve the code with Rafa's reviews and then we can import it
> to mcf code base.
>
> Thanks
> Dileepa
>
> On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com> wrote:
>
> > Ok, it seems premature for me to try to import this from Github today, so
> > I'll wait until the dust settles a bit further first.
> >
> > Karl
> >
> >
> > On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Thanks a lot, Rafa, for pointing that out. Big miss, as I didn't test the
> > > LDPath configuration part yet. More improvements to be done.
> > > I will make the required improvements as pointed out.
> > >
> > > Regards,
> > > Dileepa
> > >
> > >
> > > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > The problem is not in that part of the code; it is rather in this part:
> > > >
> > > > if (entity != null) {
> > > >     Collection<String> properties = entity.getProperties();
> > > >     for (String property : properties) {
> > > >         String targetFieldName = derefFields.get(property);
> > > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > > >         if (propValues == null) {
> > > >             propValues = new HashSet<String>();
> > > >         }
> > > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > > >         propValues.addAll(entityPropValues);
> > > >         entityPropertyMap.put(targetFieldName, propValues);
> > > >     }
> > > > }
> > > >
> > > > You are collecting only the configured dereferenced fields from the
> > > > EnhancementStructure response, and the LDPath fields are ignored. Also,
> > > > there is a potential bug in that code if no dereferencing field is
> > > > configured for a certain entity property here:
> > > >
> > > > String targetFieldName = derefFields.get(property);
> > > >
> > > > targetFieldName would then be null. Instead of trying to index every
> > > > property, you should collect only the ones configured by the user (or
> > > > at least, if the user wants all of them, provide a configuration option
> > > > for that).
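
A minimal sketch of the fix described above, assuming the entity's properties are available as a plain map from property URI to values. The class and method names are hypothetical and are not part of the connector or of the Stanbol client; the point is only that unconfigured properties are skipped instead of being stored under a null key.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not taken from the connector source.
final class ConfiguredPropertyCollector {

    // entityProperties: property URI -> values found on the entity
    // derefFields:      property URI -> target field name configured by the user
    static Map<String, Set<String>> collect(
            Map<String, Collection<String>> entityProperties,
            Map<String, String> derefFields) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Collection<String>> e : entityProperties.entrySet()) {
            String targetFieldName = derefFields.get(e.getKey());
            if (targetFieldName == null) {
                // Property was not configured by the user: skip it rather
                // than storing its values under a null key.
                continue;
            }
            result.computeIfAbsent(targetFieldName, k -> new LinkedHashSet<>())
                  .addAll(e.getValue());
        }
        return result;
    }
}
```

An "index everything" option, as suggested above, could be layered on top by falling back to the property URI as the field name when that option is enabled.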
> > > >
> > > > Anyway, going back to the LDPath issue, please take into account that
> > > > when you define a field you must use a custom namespace and prefix so
> > > > that you can later retrieve that property from the entity. If you
> > > > don't do that, Stanbol will provide a random namespace for that
> > > > property. Check this example from the RedLink SDK:
> > > >
> > > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > > >
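
To illustrate the namespace advice, here is a small sketch of an LDPath program that declares its own prefix and defines one field under it. The prefix "ex", the namespace URI and the field name "label" are all assumptions made up for this example, and the LDPath syntax is written from memory of the LDPath documentation, so please check it against the linked RedLink test before relying on it.

```java
// Illustrative only: "ex", its namespace URI and the field name "label"
// are assumptions for this sketch, not values used by the connector.
final class LdPathFieldNaming {
    static final String PREFIX = "ex";
    static final String NAMESPACE = "http://example.org/mcf/";

    // Builds an LDPath program that declares the custom prefix and
    // defines a single field in that namespace.
    static String program() {
        return "@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#> ;\n"
             + "@prefix " + PREFIX + " : <" + NAMESPACE + "> ;\n"
             + PREFIX + ":label = rdfs:label :: xsd:string ;\n";
    }

    // Full property URI under which the field should then be retrievable
    // from the entity, because the namespace was chosen explicitly.
    static String propertyUri(String fieldName) {
        return NAMESPACE + fieldName;
    }
}
```

With an explicit namespace, the connector can look up `propertyUri("label")` on each returned entity instead of guessing a generated one.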
> > > > Hope that helps
> > > >
> > > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com>
> > wrote:
> > > >
> > > > > The next step would be to pull this code into an svn branch.  This is
> > > > > something I can tackle after the 2.3 release candidate is put together.
> > > > >
> > > > > Thanks,
> > > > > Karl
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <
> > djayakody@zaizi.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Rafa,
> > > > > >
> > > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > comments
> > > > > > inline below.
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org>
> > wrote:
> > > > > >
> > > > > > > Hi Dileepa,
> > > > > > >
> > > > > > > This clearly seems to be going in the right direction now, in my
> > > > > > > opinion. Quick comments after a first review:
> > > > > > >
> > > > > > >
> > > > > > >    - Rejecting a document because it can't be enhanced is rather
> > > > > > >    harsh. You are preventing a document from being indexed at all
> > > > > > >    because the enhancement didn't succeed; it is probably better
> > > > > > >    just to let it continue through the workflow within the system
> > > > > > >
> > > > > >
> > > > > > Got your point. Will remove that part from the code
> > > > > >
> > > > > >
> > > > > > >    - As far as I can deduce from the code, you are correctly
> > > > > > >    extracting the configured dereferenced fields, but you are not
> > > > > > >    processing the LDPath results at all
> > > > > > >
> > > > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > > > retrieve the enhancement result according to the LDPath program (which
> > > > > > is given as a text string in the connector UI).
> > > > > > If the user has not defined an LDPath program and has added dereference
> > > > > > fields in the UI instead, then the enhancement request will be built
> > > > > > using the dereference fields as enhancer parameters.
> > > > > >
> > > > > >
> > > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > > transformation UI, then I just call the given enhancement chain without
> > > > > > any other enhancer parameters.
> > > > > >
> > > > > > Please refer to the code segment below where I do this, and let me
> > > > > > know if it needs more improvements.
> > > > > >
> > > > > >             // ldpath program is given priority if it's set
> > > > > >             if (ldPath != null)
> > > > > >             {
> > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).setLDpathProgram(ldPath).build();
> > > > > >             }
> > > > > >             else if (!derefFields.isEmpty())
> > > > > >             {
> > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).setDereferencingFields(
> > > > > >                         derefFields.keySet()).build();
> > > > > >             }
> > > > > >             else
> > > > > >             {
> > > > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > > > >             }
> > > > > >             eRes = enhancerClient.enhance(parameters);
> > > > > >
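
The three-way choice described above can be sketched on its own, without the Stanbol client classes. This is a hypothetical helper, not code from the connector. It also assumes, which the thread does not confirm, that an LDPath text box left empty in the UI may arrive as an empty string rather than null, so blank input is treated as "not set".

```java
import java.util.Collection;

// Hypothetical helper for illustration; names are not from the connector.
final class EnhancementRequestMode {

    enum Mode { LDPATH, DEREFERENCE_FIELDS, CHAIN_ONLY }

    // The LDPath program has priority, then dereference fields,
    // then a plain call to the enhancement chain.
    static Mode choose(String ldPath, Collection<String> derefFields) {
        if (ldPath != null && !ldPath.trim().isEmpty()) {
            return Mode.LDPATH;
        }
        if (derefFields != null && !derefFields.isEmpty()) {
            return Mode.DEREFERENCE_FIELDS;
        }
        return Mode.CHAIN_ONLY;
    }
}
```

The caller would then build the matching EnhancerParameters for the returned mode, as in the quoted segment.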
> > > > > >
> > > > > > Thanks,
> > > > > > Dileepa
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Rafa
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
> > > > djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > As per our discussion, I have modified the Stanbol connector so that
> > > > > > > > it adds all extracted entity URIs and entity attributes to the
> > > > > > > > repository document as fields.
> > > > > > > >
> > > > > > > > I have committed this code on a separate branch of our GitHub
> > > > > > > > project, sensefy-connectors.
> > > > > > > > You can find the source code here:
> > > > > > > >
> > > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > > Let me know your feedback.
> > > > > > > >
> > > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > > enhancement results, and share it with you.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dileepa
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <
> > daddywri@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Dileepa,
> > > > > > > > >
> > > > > > > > > You cannot create sub-documents in a transformation connector.
> > > > > > > > > And adding that capability to the framework is not possible; we
> > > > > > > > > would be missing key bookkeeping logic if that was allowed.
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> > > > > > djayakody@zaizi.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Karl,
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > > >
> > > > > > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > > > > > response with the requested enhancement details for the content
> > > > > > > > > > enhancement request.
> > > > > > > > > > For example, for a request like "Paris is a city in France", the
> > > > > > > > > > following RDF response [1] is given by Stanbol.
> > > > > > > > > >
> > > > > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
> > > > > > > > > > response to generate the entity abstractions and add them to the
> > > > > > > > > > MCF repository document. Currently in the Stanbol connector we
> > > > > > > > > > add these entity abstractions as JSON strings to a multi-valued
> > > > > > > > > > 'entities' field in the repository document, and we parse that
> > > > > > > > > > JSON in the SolrWrapper output connector to index them in
> > > > > > > > > > separate Solr cores (primary documents, linked entities and
> > > > > > > > > > entity types with their attributes).
> > > > > > > > > >
> > > > > > > > > > Can we have a primary repository document and create
> > > > > > > > > > sub-documents for the extracted entities? Is it possible to
> > > > > > > > > > generate sub-documents for a repository document in a
> > > > > > > > > > transformation connector?
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > > Dileepa
> > > > > > > > > >
> > > > > > > > > > [1] Sample Stanbol response
> > > > > > > > > >
> > > > > > > > > > {
> > > > > > > > > >   "@context": {
> > > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > > >     "dc:created": {
> > > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > > > >     "enhancer:confidence": {
> > > > > > > > > >       "@type": "xsd:double"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer:end": {
> > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > > >       "@type": "@id"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > > >       "@type": "@id"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > > >       "@type": "@id"
> > > > > > > > > >     },
> > > > > > > > > >     "enhancer:start": {
> > > > > > > > > >       "@type": "xsd:int"
> > > > > > > > > >     },
> > > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > > >     "foaf:depiction": {
> > > > > > > > > >       "@type": "@id"
> > > > > > > > > >     },
> > > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > > >   },
> > > > > > > > > >   "@graph": [
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place"
> > > > > > > > > >       ],
> > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > > > >       ],
> > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > > > >       },
> > > > > > > > > >       "rdfs:label": [
> > > > > > > > > >         {
> > > > > > > > > >           "@language": "en",
> > > > > > > > > >           "@value": "France"
> > > > > > > > > >         },
> > > > > > > > > >         {
> > > > > > > > > >           "@language": "fr",
> > > > > > > > > >           "@value": "France"
> > > > > > > > > >         }
> > > > > > > > > >       ]
> > > > > > > > > >     },
> > > > > > > > > >
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing",
> > > > > > > > > >         "schema:Place"
> > > > > > > > > >       ],
> > > > > > > > > >       "foaf:depiction": [
> > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > > > >       ],
> > > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > > >       "rdfs:comment": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > > > > >       },
> > > > > > > > > >       "rdfs:label": [
> > > > > > > > > >
> > > > > > > > > >         {
> > > > > > > > > >           "@language": "en",
> > > > > > > > > >           "@value": "Paris"
> > > > > > > > > >         },
> > > > > > > > > >         {
> > > > > > > > > >           "@language": "fr",
> > > > > > > > > >           "@value": "Paris"
> > > > > > > > > >         }
> > > > > > > > > >       ]
> > > > > > > > > >     },
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > > >       "enhancer:end": 5,
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:start": 0
> > > > > > > > > >     },
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > >     },
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > >     },
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > > >     },
> > > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:start": 12
> > > > > > > > > >     }
> > > > > > > > > >   ]
> > > > > > > > > > }
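
In the sample response, each EntityAnnotation points at its TextAnnotation through dc:relation and carries an enhancer:confidence. One possible way to reduce that to a single linked entity per mention is sketched below. It assumes the annotations have already been parsed into simple value objects; the class names are hypothetical and are not the Stanbol client's own types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value object for one EntityAnnotation in the response.
final class EntitySuggestion {
    final String textAnnotationId; // value of dc:relation
    final String entityUri;        // value of enhancer:entity-reference
    final double confidence;       // value of enhancer:confidence

    EntitySuggestion(String textAnnotationId, String entityUri, double confidence) {
        this.textAnnotationId = textAnnotationId;
        this.entityUri = entityUri;
        this.confidence = confidence;
    }
}

final class BestSuggestionSelector {

    // Keeps, for every TextAnnotation id, the suggestion with the highest
    // confidence. On a tie, the suggestion seen first is kept.
    static Map<String, EntitySuggestion> bestPerTextAnnotation(List<EntitySuggestion> suggestions) {
        Map<String, EntitySuggestion> best = new LinkedHashMap<>();
        for (EntitySuggestion s : suggestions) {
            EntitySuggestion current = best.get(s.textAnnotationId);
            if (current == null || s.confidence > current.confidence) {
                best.put(s.textAnnotationId, s);
            }
        }
        return best;
    }
}
```

Applied to the three EntityAnnotations shown above, this would keep France (confidence 1.0) over Vichy France for the "France" mention, and Paris Commune for the "Paris" mention, since that is the only suggestion for it in this trimmed sample.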
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <
> > > > daddywri@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > >
> > > > > > > > > > > Repository connectors have an abstraction that allows them to
> > > > > > > > > > > generate compound documents (where a document has a primary
> > > > > > > > > > > identifier, and there are subdocuments that share that primary
> > > > > > > > > > > identifier and have a secondary identifier).  This sounds a bit
> > > > > > > > > > > like what you are describing.  Does Stanbol work by decorating
> > > > > > > > > > > an existing document, or does it work by generating all content
> > > > > > > > > > > for a document?
> > > > > > > > > > >
> > > > > > > > > > > Karl
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > > > > > > > djayakody@zaizi.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > While thanking you all for your input on the Stanbol
> > > > > > > > > > > > connector requirement, I would like to continue with
> > > > > > > > > > > > modifying the Stanbol connector to be compatible with any
> > > > > > > > > > > > output connector. If you guys can give some guidance on how
> > > > > > > > > > > > the entity metadata should be added to the repository
> > > > > > > > > > > > document, I can modify the Stanbol connector accordingly.
> > > > > > > > > > > >
> > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity
> > > > > > > > > > > > metadata to the repo.doc as key-value pairs.
> > > > > > > > > > > > However, this idea is not yet clear to me. There could be N
> > > > > > > > > > > > entities in a document, and each of them will have some
> > > > > > > > > > > > common attributes such as name, id and type, plus specific
> > > > > > > > > > > > attributes for the particular entity type.
> > > > > > > > > > > > I'm not clear on how to maintain that structure of N
> > > > > > > > > > > > entities with their attributes in a repo.document as
> > > > > > > > > > > > key-value pairs and make them LDPath compatible for
> > > > > > > > > > > > retrieval in an output connector.
> > > > > > > > > > > >
> > > > > > > > > > > > @Rafa
> > > > > > > > > > > > If you can please elaborate on your suggestion, it would be
> > > > > > > > > > > > greatly helpful to me.
> > > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Dileepa
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <
> > > > > > daddywri@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector
> > > > > > > > > > > > > into one that plays by the rules.  It may be possible for
> > > > > > > > > > > > > someone on the team here to do that, but it won't be me;
> > > > > > > > > > > > > I'm seriously overextended at the moment.  It would be best
> > > > > > > > > > > > > if someone who knew the connector well could do the
> > > > > > > > > > > > > necessary work.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Karl
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > > > > > > > rharoapache@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this
> > > > > > > > > > > > > > I was expecting the connector to work by just extracting
> > > > > > > > > > > > > > the entities and entity metadata and putting them as
> > > > > > > > > > > > > > plain metadata of the documents, probably following an
> > > > > > > > > > > > > > LDPath query configuration
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is probably OK for Sensefy, but I don’t think this
> > > > > > > > > > > > > > would be suitable for inclusion in the project. But this
> > > > > > > > > > > > > > is only my opinion. Of course, a version of the connector
> > > > > > > > > > > > > > that fully respects the ManifoldCF architecture would be
> > > > > > > > > > > > > > more than welcome, in my opinion
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David
> > Pérez
> > > > > > Morales
> > > > > > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > > > > > > > > > > > > > requirement for an internal project which has nothing
> > > > > > > > > > > > > > > to do with the normal operation of Manifold, so forcing
> > > > > > > > > > > > > > > users to use Solr does not fit the Manifold philosophy.
> > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with
> > > > > > > > > > > > > > > such a big dependency, which will fit almost no use
> > > > > > > > > > > > > > > case, is not very useful.
> > > > > > > > > > > > > > > You should think of a way to convert the Stanbol
> > > > > > > > > > > > > > > connector into a normal Transformation connector
> > > > > > > > > > > > > > > without assuming that a specific output connector will
> > > > > > > > > > > > > > > be used.
> > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> The SolrWrapper code is here:
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency on
> > > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > > > > > > > > >> will update here. I will also provide documentation for configuring
> > > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > >       In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be easier
> > > > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture
> > > > > > > > > > > > > > > >> > > > diagram for the connector.
> > > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received
> > > > > > > > > > > > > > > >> > > > this email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard
> > > > > > > > > > > > > > > >> > > > copy by an authorised signatory.
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi Karl,

Yes, I will improve the code with Rafa's reviews and then we can import it
to mcf code base.

Thanks
Dileepa

On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <da...@gmail.com> wrote:

> Ok, it seems premature for me to try to import this from Github today, so
> I'll wait until the dust settles a bit further first.
>
> Karl
>
>
> On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <dj...@zaizi.com>
> wrote:
>
> > Thanks a lot, Rafa, for pointing that out. It was a big miss, as I haven't
> > tested the LDPath configuration part yet. More improvements are needed.
> > I will make the required improvements as pointed out.
> >
> > Regards,
> > Dileepa
> >
> >
> > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:
> >
> > > Hi Dileepa,
> > >
> > > The problem is not in that part of the code; rather, it is in this part:
> > >
> > > if (entity != null) {
> > >     Collection<String> properties = entity.getProperties();
> > >     for (String property : properties) {
> > >         String targetFieldName = derefFields.get(property);
> > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > >         if (propValues == null) {
> > >             propValues = new HashSet<String>();
> > >         }
> > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > >         propValues.addAll(entityPropValues);
> > >         entityPropertyMap.put(targetFieldName, propValues);
> > >     }
> > > }
> > >
> > > You are collecting only the configured dereferenced fields from the
> > > EnhancementStructure response, and LDPath fields are ignored. Also, there
> > > is a potential bug in that code if no dereferencing field is configured
> > > for a certain entity property here:
> > >
> > > String targetFieldName = derefFields.get(property);
> > >
> > > targetFieldName would be null in that case. Instead of trying to index
> > > every property, you should collect only the ones configured by the user
> > > (or, if the user wants all of them, at least provide a configuration
> > > option for that).
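
The null-key problem described above can be sketched as follows. This is only a minimal illustration, not the connector's actual code: the class name `EntityPropertyCollector` is hypothetical, and plain maps stand in for the Stanbol entity (`entityProps`) and for the configured `derefFields` mapping (`fieldMapping`).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EntityPropertyCollector {

    /**
     * Copies only those entity properties that have a configured target field.
     * entityProps:  property name -> values (stand-in for the Stanbol entity)
     * fieldMapping: property name -> target field name (stand-in for derefFields)
     */
    public static Map<String, Set<String>> collect(
            Map<String, List<String>> entityProps,
            Map<String, String> fieldMapping) {
        Map<String, Set<String>> result = new HashMap<>();
        if (entityProps == null) {
            return result;
        }
        for (Map.Entry<String, List<String>> entry : entityProps.entrySet()) {
            String targetFieldName = fieldMapping.get(entry.getKey());
            if (targetFieldName == null) {
                // No field configured for this property: skip it instead of
                // storing its values under a null key.
                continue;
            }
            result.computeIfAbsent(targetFieldName, k -> new LinkedHashSet<>())
                  .addAll(entry.getValue());
        }
        return result;
    }
}
```

With this shape, a property the user did not configure is simply dropped, which matches the suggestion to index only the configured properties.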
> > >
> > > Anyway, going back to the LDPath issue, please take into account that when
> > > you define a field you must use a custom namespace and prefix so that you
> > > can retrieve that property from the entity later. If you don't do that,
> > > Stanbol will assign a random namespace to that property. Check this
> > > example from the RedLink SDK:
> > >
> > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > >
> > > Hope that helps
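
As a rough illustration of the namespace advice above, the sketch below builds an LDPath program whose field is declared under a custom prefix, and derives the full property URI under which the value would then be looked up. The namespace `http://example.org/mcf/fields/`, the prefix `mcf` and the class name are hypothetical, and the sketch assumes the `rdfs` and `xsd` prefixes are already known to the LDPath engine; it is not taken from the connector or from the RedLink SDK.

```java
public class LdPathFieldExample {

    // Hypothetical namespace and prefix, chosen for illustration only.
    static final String FIELD_NS = "http://example.org/mcf/fields/";
    static final String FIELD_PREFIX = "mcf";

    /** Builds an LDPath program declaring one field under the custom namespace. */
    public static String program(String localName, String path) {
        return "@prefix " + FIELD_PREFIX + " : <" + FIELD_NS + "> ;\n"
             + FIELD_PREFIX + ":" + localName + " = " + path + " ;\n";
    }

    /** Full property URI expected on the entity for the given field. */
    public static String fieldUri(String localName) {
        return FIELD_NS + localName;
    }
}
```

For example, `program("label", "rdfs:label[@en] :: xsd:string")` declares a field `mcf:label`, and `fieldUri("label")` gives the URI to look for on the returned entity.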
> > >
> > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com> wrote:
> > >
> > > > The next step would be to pull this code into an svn branch.  This is
> > > > something I can tackle after the 2.3 release candidate is put together.
> > > >
> > > > Thanks,
> > > > Karl
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > >
> > > > > Hi Rafa,
> > > > >
> > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > comments inline below.
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:
> > > > >
> > > > > > Hi Dileepa,
> > > > > >
> > > > > > This seems to be clearly going in the right direction now, in my opinion.
> > > > > > Quick comments after a first review:
> > > > > >
> > > > > >
> > > > > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > > > > >    You are preventing a document from finally being indexed because the
> > > > > >    enhancement didn't perform correctly; it is probably better to just let
> > > > > >    it continue the workflow within the system
> > > > > >
> > > > >
> > > > > Got your point. Will remove that part from the code
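
The "let the document continue" behaviour discussed here might look roughly like the sketch below. It is an assumption-laden illustration: `FailOpenEnhancement` and its `Enhancer` interface are hypothetical stand-ins for the call to Stanbol, and a plain map stands in for the repository document's fields. None of this is the ManifoldCF or Stanbol client API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailOpenEnhancement {

    /** Hypothetical stand-in for the enhancement call; not the real client API. */
    public interface Enhancer {
        Map<String, List<String>> enhance(String content) throws Exception;
    }

    /**
     * Returns the document fields plus any enhancement fields. If the
     * enhancement fails, the original fields are returned unchanged so the
     * document can still flow on to the output connector.
     */
    public static Map<String, List<String>> enrich(
            Map<String, List<String>> fields, String content, Enhancer enhancer) {
        Map<String, List<String>> out = new HashMap<>(fields);
        try {
            out.putAll(enhancer.enhance(content));
        } catch (Exception e) {
            // Enhancement failed: keep the original fields instead of
            // rejecting the document. A real connector would log this.
        }
        return out;
    }
}
```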
> > > > >
> > > > >
> > > > > >    - As far as I can deduce from the code, you are correctly extracting
> > > > > >    the configured dereferenced fields, but you are not processing the
> > > > > >    LDPath results at all
> > > > > >
> > > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > > retrieve the enhancement result according to the LDPath program (which
> > > > > is given as a text string in the connector UI).
> > > > > If the user has not defined an LDPath program and has added dereference
> > > > > fields in the UI instead, then the enhancement request will be built
> > > > > using the dereference fields as enhancer parameters.
> > > > >
> > > > >
> > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > transformation UI, then I just call the given enhancement chain without
> > > > > any other enhancer parameters.
> > > > >
> > > > > Please refer to the code segment below, where I do this, and let me know
> > > > > if it needs more improvements.
> > > > >
> > > > >             // ldpath program is given priority if it's set
> > > > >             if (ldPath != null)
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > >                         .setContent(content).setLDpathProgram(ldPath).build();
> > > > >             }
> > > > >             else if (!derefFields.isEmpty())
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > >                         .setContent(content)
> > > > >                         .setDereferencingFields(derefFields.keySet()).build();
> > > > >             }
> > > > >             else
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder().setChain(chain)
> > > > >                         .setContent(content).build();
> > > > >             }
> > > > >             eRes = enhancerClient.enhance(parameters);
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Rafa
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > As per our discussion, I have modified the Stanbol connector so that it
> > > > > > > adds all extracted entity URIs and entity attributes to the repository
> > > > > > > document as fields.
> > > > > > >
> > > > > > > I have committed this code on a separate branch of our GitHub project,
> > > > > > > sensefy-connectors.
> > > > > > > You can find the source code here:
> > > > > > >
> > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > > Let me know your feedback.
> > > > > > >
> > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > enhancement results, and share it with you.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dileepa
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Dileepa,
> > > > > > > >
> > > > > > > > You cannot create sub-documents in a transformation connector. And
> > > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > > missing key bookkeeping logic if that were allowed.
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Karl,
> > > > > > > > >
> > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > >
> > > > > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > > > > response with the requested enhancement details for the content
> > > > > > > > > enhancement request.
> > > > > > > > > For example, for a request like "Paris is a city in France", the
> > > > > > > > > following RDF response [1] is given by Stanbol.
> > > > > > > > >
> > > > > > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > > > > > > > and EntityAnnotations are extracted from the RDF response to generate
> > > > > > > > > the entity abstractions and add them to the mcf repository document.
> > > > > > > > > Currently in the Stanbol connector we add these entity abstractions as
> > > > > > > > > JSON strings to a multi-valued 'entities' field in the repository
> > > > > > > > > document, and we parse that JSON in the SolrWrapper output connector to
> > > > > > > > > index them in separate Solr cores (primary documents, linked entities
> > > > > > > > > and entity types with their attributes).
> > > > > > > > >
> > > > > > > > > Can we have a primary repository document and create sub-documents
> > > > > > > > > for the extracted entities? Is it possible to generate sub-documents
> > > > > > > > > for a repo-document in a transformation connector?
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > > Dileepa
> > > > > > > > >
> > > > > > > > > [1] Sample Stanbol response
> > > > > > > > >
> > > > > > > > > {
> > > > > > > > >   "@context": {
> > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > >     "dc:created": {
> > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > >     },
> > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > > >     "enhancer:confidence": {
> > > > > > > > >       "@type": "xsd:double"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:end": {
> > > > > > > > >       "@type": "xsd:int"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:start": {
> > > > > > > > >       "@type": "xsd:int"
> > > > > > > > >     },
> > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > >     "foaf:depiction": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > >   },
> > > > > > > > >   "@graph": [
> > > > > > > > >     {
> > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > >       "@type": [
> > > > > > > > >         "dbp-ont:Country",
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing",
> > > > > > > > >         "schema:Country",
> > > > > > > > >         "schema:Place"
> > > > > > > > >       ],
> > > > > > > > >       "foaf:depiction": [
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > > >       ],
> > > > > > > > >       "rdfs:comment": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > > >       },
> > > > > > > > >       "rdfs:label": [
> > > > > > > > >         {
> > > > > > > > >           "@language": "en",
> > > > > > > > >           "@value": "France"
> > > > > > > > >         },
> > > > > > > > >         {
> > > > > > > > >           "@language": "fr",
> > > > > > > > >           "@value": "France"
> > > > > > > > >         }
> > > > > > > > >       ]
> > > > > > > > >     },
> > > > > > > > >
> > > > > > > > >     {
> > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > >       "@type": [
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing",
> > > > > > > > >         "schema:Place"
> > > > > > > > >       ],
> > > > > > > > >       "foaf:depiction": [
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > > >       ],
> > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > >       "rdfs:comment": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > > > >       },
> > > > > > > > >       "rdfs:label": [
> > > > > > > > >
> > > > > > > > >         {
> > > > > > > > >           "@language": "en",
> > > > > > > > >           "@value": "Paris"
> > > > > > > > >         },
> > > > > > > > >         {
> > > > > > > > >           "@language": "fr",
> > > > > > > > >           "@value": "Paris"
> > > > > > > > >         }
> > > > > > > > >       ]
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > >       "enhancer:end": 5,
> > > > > > > > >       "enhancer:extracted-from":
> > > > > > > > >
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:start": 0
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Vichy France"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > > >         "@language": "en",
> > > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > > >       },
> > > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > > >         "dbp-ont:Country",
> > > > > > > > > >         "dbp-ont:Place",
> > > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > > >         "schema:Country",
> > > > > > > > > >         "schema:Place",
> > > > > > > > > >         "owl:Thing"
> > > > > > > > > >       ],
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > > >       "@type": [
> > > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > > >       ],
> > > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > > >       "enhancer:end": 18,
> > > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:start": 12
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
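
In the response above, each enhancer:EntityAnnotation points back to a TextAnnotation through dc:relation and carries its own enhancer:confidence. A minimal sketch of how a consumer might keep only the highest-confidence suggestion per mention follows. The class, field and method names are hypothetical and are not part of the Stanbol or ManifoldCF APIs; the sample values are copied from the JSON-LD above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Stanbol or ManifoldCF class.
final class BestEntityPerMention {

    /** Simplified stand-in for one enhancer:EntityAnnotation node. */
    static final class EntityAnnotation {
        final String relation;   // dc:relation: id of the TextAnnotation it refers to
        final String label;      // enhancer:entity-label
        final String reference;  // enhancer:entity-reference
        final double confidence; // enhancer:confidence

        EntityAnnotation(String relation, String label, String reference, double confidence) {
            this.relation = relation;
            this.label = label;
            this.reference = reference;
            this.confidence = confidence;
        }
    }

    /** Keeps, for each TextAnnotation id, the suggestion with the highest confidence. */
    static Map<String, EntityAnnotation> bestPerMention(List<EntityAnnotation> annotations) {
        Map<String, EntityAnnotation> best = new LinkedHashMap<>();
        for (EntityAnnotation a : annotations) {
            best.merge(a.relation, a,
                    (kept, candidate) -> candidate.confidence > kept.confidence ? candidate : kept);
        }
        return best;
    }

    /** The three EntityAnnotations from the response above. */
    static List<EntityAnnotation> sample() {
        List<EntityAnnotation> list = new ArrayList<>();
        list.add(new EntityAnnotation("urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
                "France", "http://dbpedia.org/resource/France", 1.0));
        list.add(new EntityAnnotation("urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
                "Vichy France", "http://dbpedia.org/resource/Vichy_France", 0.25715446));
        list.add(new EntityAnnotation("urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
                "Paris Commune", "http://dbpedia.org/resource/Paris_Commune", 0.1493264));
        return list;
    }

    public static void main(String[] args) {
        for (EntityAnnotation a : bestPerMention(sample()).values()) {
            System.out.println(a.label + " -> " + a.reference + " (" + a.confidence + ")");
        }
    }
}
```

Whether low-confidence suggestions should be dropped or indexed alongside the best one is a design choice the thread leaves open; a confidence threshold would be an easy extension of the same loop.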
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Dileepa,
> > > > > > > > > > >
> > > > > > > > > > > Repository connectors have an abstraction that allows them to generate
> > > > > > > > > > > compound documents (where a document has a primary identifier, and there
> > > > > > > > > > > are subdocuments that share that primary identifier and have a secondary
> > > > > > > > > > > identifier).  This sounds a bit like what you are describing.  Does Stanbol
> > > > > > > > > > > work by decorating an existing document, or does it work by generating all
> > > > > > > > > > > content for a document?
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > While thanking you all for your input on Stanbol connector requirement, I
> > > > > > > > > > > > would like to continue with modifying the Stanbol connector to be
> > > > > > > > > > > > compatible with any output connector. If you guys can give some guidance on
> > > > > > > > > > > > how the entity metadata should be added to the repository document I can
> > > > > > > > > > > > modify the stanbol connector accordingly.
> > > > > > > > > > > >
> > > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > > > > > > > > repo.doc as key value pairs.
> > > > > > > > > > > > However this idea is not yet clear to me. There could be 'N' number of
> > > > > > > > > > > > entities in a document and each of them will have some common attributes
> > > > > > > > > > > > such as name, id, type and specific attributes for particular entity type.
> > > > > > > > > > > > I'm not clear on how to maintain that structure of N number of entities
> > > > > > > > > > > > with their attributes in a repo.document as key value pairs and make them
> > > > > > > > > > > > LDPath compatible for retrieval in an output connector.
> > > > > > > > > > > >
> > > > > > > > > > > > @Rafa
> > > > > > > > > > > > If you can please elaborate on your suggestion it would be greatly helpful
> > > > > > > > > > > > to me.
> > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dileepa
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one that
> > > > > > > > > > > > > plays by the rules.  It may be possible for someone on the team here to do
> > > > > > > > > > > > > that, but it won't be me; I'm seriously overextended at the moment.  It
> > > > > > > > > > > > > would be best if someone who knew the connector well could do the necessary
> > > > > > > > > > > > > work.
> > > > > > > > > > > >
> > > > > > > > > > > > Karl
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I must agree with Antonio. When I started to work on this I was expecting
> > > > > > > > > > > > > > the connector to work by just extracting the entities and entities metadata
> > > > > > > > > > > > > > and put them as plain metadata of the documents, probably following LDPATH
> > > > > > > > > > > > > > queries configuration
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is probably ok for Sensefy but I don’t think this could be suitable
> > > > > > > > > > > > > > to be included in the project. But this is only my opinion. Of course, a
> > > > > > > > > > > > > > version of the connector that fully respect the ManifoldCF architecture
> > > > > > > > > > > > > > would be more than welcome in my opinion
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > > > > > > > > internal project which has nothing to do here with a normal operation of
> > > > > > > > > > > > > > > Manifold, so forcing the users to use Solr does not fit the Manifold
> > > > > > > > > > > > > > > philosophy.
> > > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > > dependency which will not fit almost any use case is not very useful.
> > > > > > > > > > > > > > > You should think a way to convert Stanbol connector into a normal
> > > > > > > > > > > > > > > Transformation connector without assuming that a specific output connector
> > > > > > > > > > > > > > > will be used.
> > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > > > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > > > > > > > > >> requires 3 separate solr cores configured with a specific Solr schema for
> > > > > > > > > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency with
> > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > > > > > > > > >> will update here. And also I will provide documentations for configuring
> > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> > > > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > >       In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > >> > > Marmotta is a drop in replacement for Clarezza on Stanbol it may be easier
> > > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Thanks you for the feedback and offering your help in this.
> > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating a architecture diagram
> > > > > > > > > > > > > > >> > > > for the connector.
> > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
>


Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
Ok, it seems premature for me to try to import this from Github today, so
I'll wait until the dust settles a bit further first.

Karl


On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Thanks a lot Rafa for pointing that out. Big miss, as I didn't test the
> LDPath configuration part yet. More improvements to be done.
> I will do the required improvements as pointed out.
>
> Regards,
> Dileepa
>
>
> On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:
>
> > Hi Dileepa,
> >
> > The problem is not in that part on the code, it is rather on this part:
> >
> > if (entity != null) {
> >     Collection<String> properties = entity.getProperties();
> >     for (String property : properties) {
> >         String targetFieldName = derefFields.get(property);
> >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> >         if (propValues == null) {
> >             propValues = new HashSet<String>();
> >         }
> >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> >         propValues.addAll(entityPropValues);
> >         entityPropertyMap.put(targetFieldName, propValues);
> >     }
> > }
> >
> > You are collecting from the EnhancementStructure response only the
> > configured dereferenced fields; LDPath fields are ignored. Also, there
> > is a potential bug in that code if there is no dereferencing field
> > configured for a certain entity property here:
> >
> > String targetFieldName = derefFields.get(property);
> >
> > targetFieldName would be null then. Instead of trying to index every
> > property, you should just collect the ones configured by the user (or at
> > least, if the user wants all of them, provide a configuration option for
> > that).
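
The null-handling point above can be sketched as follows. This is a minimal illustration, assuming the entity's properties are available as a plain map from property name to values; EntityFieldCollector and collect are hypothetical names, and the map stands in for the entity object used in the connector code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not the connector's actual code.
final class EntityFieldCollector {

    /**
     * Copies entity property values into target fields, but only for properties
     * that have a configured target field name. Unmapped properties are skipped
     * instead of being stored under a null key.
     */
    static Map<String, Set<String>> collect(Map<String, Collection<String>> entityProperties,
                                            Map<String, String> derefFields) {
        Map<String, Set<String>> fields = new HashMap<>();
        for (Map.Entry<String, Collection<String>> e : entityProperties.entrySet()) {
            String targetFieldName = derefFields.get(e.getKey());
            if (targetFieldName == null) {
                continue; // property was not configured by the user
            }
            fields.computeIfAbsent(targetFieldName, k -> new HashSet<>()).addAll(e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> props = new HashMap<>();
        props.put("rdfs:label", Arrays.asList("Paris"));
        props.put("geo:lat", Arrays.asList("48.8567"));

        Map<String, String> derefFields = new HashMap<>();
        derefFields.put("rdfs:label", "entity_label"); // geo:lat is deliberately left unmapped

        System.out.println(collect(props, derefFields));
    }
}
```

An "index everything" option, as suggested above, could be added by falling back to the property name itself when no target field is configured.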
> >
> > Anyway, going back to the LDPath issue, please take into account that when you
> > define a field you must use a custom namespace and prefix so that you can
> > later retrieve that property from the entity. If you don't do that,
> > Stanbol will provide a random namespace for that property. Check this
> > example from the RedLink SDK:
> >
> > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> >
> > Hope that helps
> >
> > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com> wrote:
> >
> > > The next step would be to pull this code into an svn branch.  This is
> > > something I can tackle after the 2.3 release candidate is put together.
> > >
> > > Thanks,
> > > Karl
> > >
> > >
> > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > wrote:
> > >
> > > > Hi Rafa,
> > > >
> > > > Thanks for reviewing my code and for your feedback. Please see my comments
> > > > inline below.
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:
> > > >
> > > > > Hi Dileepa,
> > > > >
> > > > > This seems to be going in the right direction clearly now in my opinion.
> > > > > Quick comments after a first review:
> > > > >
> > > > >
> > > > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > > > >    You are preventing a document from being indexed because the
> > > > >    enhancement didn't perform correctly; it is probably better just to let
> > > > >    it continue the workflow within the system
> > > > >
> > > >
> > > > Got your point. Will remove that part from the code
> > > >
> > > >
> > > > >    - As I can deduce from the code, you are correctly extracting the
> > > > >    configured dereferenced fields, but you are not processing the
> > > > >    LDPath results at all
> > > > >
> > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > retrieve the enhancement result according to the LDPath program (which is
> > > > given as a text string in the connector UI).
> > > > If the user has not defined an LDPath program and has added dereference
> > > > fields in the UI instead, then the enhancement request will be built using
> > > > the dereference fields as enhancer parameters.
> > > >
> > > >
> > > > If neither an LDPath program nor dereference fields are given in the
> > > > transformation UI, then I just call the given enhancement chain without any
> > > > other enhancer parameters.
> > > >
> > > > Please refer to the code segment below where I do this and let me know if it
> > > > needs more improvements.
> > > >
> > > >             // ldpath program is given priority if it's set
> > > >             if (ldPath != null)
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > >                         .setLDpathProgram(ldPath).build();
> > > >             }
> > > >             else if (!derefFields.isEmpty())
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > >                         .setDereferencingFields(derefFields.keySet()).build();
> > > >             }
> > > >             else
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > >             }
> > > >             eRes = enhancerClient.enhance(parameters);
> > > >
> > > >
> > > > Thanks,
> > > > Dileepa
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > > Rafa
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
> > djayakody@zaizi.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > > As per our discussion, I have modified the Stanbol connector so that
> > > > > > > it adds all extracted entity URIs and entity attributes to the
> > > > > > > repository document as fields.
> > > > > >
> > > > > > > I have committed this code on a separate branch of our GitHub
> > > > > > > project, sensefy-connectors.
> > > > > > > You can find the source code here:
> > > > > > >
> > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > >
> > > > > > > Let me know your feedback.
> > > > > >
> > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > enhancement results, and share it with you.
> > > > > >
> > > > > > Thanks,
> > > > > > Dileepa
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Dileepa,
> > > > > > >
> > > > > > > > You cannot create sub-documents in a transformation connector.  And
> > > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > > missing key bookkeeping logic if that were allowed.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> > > > djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > Thanks a lot for the pointer.
> > > > > > > >
> > > > > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > > > > response with the requested enhancement details for each content
> > > > > > > > > enhancement request.
> > > > > > > > > For example, for a request like "Paris is in France", the following
> > > > > > > > > RDF response [1] is given by Stanbol.
> > > > > > > >
> > > > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
> > > > > > > > > response to generate the entity abstractions and add them to the MCF
> > > > > > > > > repository document. Currently, in the Stanbol connector we add these
> > > > > > > > > entity abstractions as JSON strings to a multi-valued 'entities' field
> > > > > > > > > in the repository document, and we parse that JSON in the SolrWrapper
> > > > > > > > > output connector to index it in separate Solr cores (primary
> > > > > > > > > documents, linked entities, and entity types with their attributes).
> > > > > > > >
> > > > > > > > > Can we have a primary repository document and create sub-documents
> > > > > > > > > for the extracted entities? Is it possible to generate sub-documents
> > > > > > > > > for a repository document in a transformation connector?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Dileepa
> > > > > > > >
> > > > > > > > [1] Sample Stanbol response
> > > > > > > >
> > > > > > > > {
> > > > > > > >   "@context": {
> > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > >     "dc:created": {
> > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > >     },
> > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > >     "enhancer:confidence": {
> > > > > > > >       "@type": "xsd:double"
> > > > > > > >     },
> > > > > > > >     "enhancer:end": {
> > > > > > > >       "@type": "xsd:int"
> > > > > > > >     },
> > > > > > > >     "enhancer:entity-reference": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:entity-type": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:extracted-from": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:start": {
> > > > > > > >       "@type": "xsd:int"
> > > > > > > >     },
> > > > > > > >     "entityhub": "
> > > > > > > http://stanbol.apache.org/ontology/entityhub/entityhub#
> > > > > > > > ",
> > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > >     "foaf:depiction": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > >     "schema": "http://schema.org/",
> > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > >   },
> > > > > > > >   "@graph": [
> > > > > > > >     {
> > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > >       "@type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place"
> > > > > > > >       ],
> > > > > > > >       "foaf:depiction": [
> > > > > > > >         "
> > > > > > > >
> > > > >
> > http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg
> > > > > > ",
> > > > > > > >         "
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png
> > > > > > > > "
> > > > > > > >       ],
> > > > > > > >       "rdfs:comment": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France, officially the French Republic,
> is a
> > > > > > > > unitary semi-presidential republic in Western Europe with
> > several
> > > > > > > > overseas territories and islands located on other continents
> > and
> > > in
> > > > > > > > the Indian, Pacific, and Atlantic oceans. Metropolitan France
> > > > extends
> > > > > > > > from the Mediterranean Sea to the English Channel and the
> North
> > > > Sea,
> > > > > > > > and from the Rhine to the Atlantic Ocean. It is often
> referred
> > to
> > > > as
> > > > > > > > l’Hexagone because of the geometric shape of its territory."
> > > > > > > >       },
> > > > > > > >       "rdfs:label": [
> > > > > > > >         {
> > > > > > > >           "@language": "en",
> > > > > > > >           "@value": "France"
> > > > > > > >         },
> > > > > > > >         {
> > > > > > > >           "@language": "fr",
> > > > > > > >           "@value": "France"
> > > > > > > >         },
> > > > > > > >       ]
> > > > > > > >     },
> > > > > > > >
> > > > > > > >     {
> > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > >       "@type": [
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "dbp-ont:Settlement",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing",
> > > > > > > >         "schema:Place"
> > > > > > > >       ],
> > > > > > > >       "foaf:depiction": [
> > > > > > > >         "
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > ",
> > > > > > > >         "
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg
> > > > > > > > "
> > > > > > > >       ],
> > > > > > > >       "geo:lat": 48.8567,
> > > > > > > >       "geo:long": 2.3508,
> > > > > > > >       "rdfs:comment": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is the capital and largest city of
> > > France.
> > > > > It
> > > > > > > > is situated on the river Seine, in northern France, at the
> > heart
> > > of
> > > > > > > > the Île-de-France region (or Paris Region, French: Région
> > > > > parisienne).
> > > > > > > > As of January 2008 the city of Paris, within its
> administrative
> > > > > limits
> > > > > > > > largely unchanged since 1860, has an estimated population of
> > > > > 2,211,297
> > > > > > > > and a metropolitan population of 12,089,098, and is one of
> the
> > > most
> > > > > > > > populated metropolitan areas in Europe."
> > > > > > > >       },
> > > > > > > >       "rdfs:label": [
> > > > > > > >
> > > > > > > >         {
> > > > > > > >           "@language": "en",
> > > > > > > >           "@value": "Paris"
> > > > > > > >         },
> > > > > > > >         {
> > > > > > > >           "@language": "fr",
> > > > > > > >           "@value": "Paris"
> > > > > > > >         },
> > > > > > > >       ]
> > > > > > > >     },
> > > > > > > >    }
> > > > > > > >     {
> > > > > > > >       "@id":
> > > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > >       "dc:creator":
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > >       "enhancer:end": 5,
> > > > > > > >       "enhancer:extracted-from":
> > > > > > > >
> > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "enhancer:selected-text": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris"
> > > > > > > >       },
> > > > > > > >       "enhancer:selection-context": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is in France"
> > > > > > > >       },
> > > > > > > >       "enhancer:start": 0
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id":
> > > > "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator":
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation":
> > > > > > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "
> > > > > http://dbpedia.org/resource/France
> > > > > > ",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from":
> > > > > > > >
> > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id":
> > > > "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator":
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation":
> > > > > > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Vichy France"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "
> > > > > > > > http://dbpedia.org/resource/Vichy_France",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from":
> > > > > > > >
> > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id":
> > > > "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator":
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation":
> > > > > > > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris Commune"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "
> > > > > > > > http://dbpedia.org/resource/Paris_Commune",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from":
> > > > > > > >
> > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id":
> > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > >       "dc:creator":
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > >       "enhancer:end": 18,
> > > > > > > >       "enhancer:extracted-from":
> > > > > > > >
> > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "enhancer:selected-text": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France"
> > > > > > > >       },
> > > > > > > >       "enhancer:selection-context": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is in France"
> > > > > > > >       },
> > > > > > > >       "enhancer:start": 12
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <
> > daddywri@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Dileepa,
> > > > > > > > >
> > > > > > > > > > Repository connectors have an abstraction that allows them to
> > > > > > > > > > generate compound documents (where a document has a primary
> > > > > > > > > > identifier, and there are subdocuments that share that primary
> > > > > > > > > > identifier and have a secondary identifier).  This sounds a bit like
> > > > > > > > > > what you are describing.  Does Stanbol work by decorating an existing
> > > > > > > > > > document, or does it work by generating all content for a document?
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > > > > > djayakody@zaizi.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > While thanking you all for your input on the Stanbol connector
> > > > > > > > > > > requirement, I would like to continue with modifying the Stanbol
> > > > > > > > > > > connector to be compatible with any output connector. If you can give
> > > > > > > > > > > some guidance on how the entity metadata should be added to the
> > > > > > > > > > > repository document, I can modify the Stanbol connector accordingly.
> > > > > > > > > > >
> > > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata to
> > > > > > > > > > > the repository document as key-value pairs.
> > > > > > > > > > > However, this idea is not yet clear to me. There could be N entities
> > > > > > > > > > > in a document, and each of them will have some common attributes such
> > > > > > > > > > > as name, id and type, plus specific attributes for the particular
> > > > > > > > > > > entity type.
> > > > > > > > > > > I'm not clear on how to maintain that structure of N entities with
> > > > > > > > > > > their attributes in a repository document as key-value pairs and make
> > > > > > > > > > > them LDPath-compatible for retrieval in an output connector.
> > > > > > > > > > >
> > > > > > > > > > > @Rafa
> > > > > > > > > > > If you could elaborate on your suggestion, it would be greatly
> > > > > > > > > > > helpful to me.
> > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dileepa
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <
> > > > daddywri@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector into one
> > > > > > > > > > > > that plays by the rules.  It may be possible for someone on the team
> > > > > > > > > > > > here to do that, but it won't be me; I'm seriously overextended at
> > > > > > > > > > > > the moment.  It would be best if someone who knew the connector well
> > > > > > > > > > > > could do the necessary work.
> > > > > > > > > > >
> > > > > > > > > > > Karl
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > > > > > rharoapache@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > I must agree with Antonio. When I started to work on this, I was
> > > > > > > > > > > > > expecting the connector to work by just extracting the entities and
> > > > > > > > > > > > > their metadata and putting them as plain metadata of the documents,
> > > > > > > > > > > > > probably following an LDPath query configuration.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > This is probably OK for Sensefy, but I don't think it is suitable
> > > > > > > > > > > > > for inclusion in the project. But this is only my opinion. Of course,
> > > > > > > > > > > > > a version of the connector that fully respects the ManifoldCF
> > > > > > > > > > > > > architecture would be more than welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez
> > > > Morales
> > > > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for
> > > > > > > > > > > > > > an internal project which has nothing to do with the normal
> > > > > > > > > > > > > > operation of Manifold, and forcing users to use Solr does not fit
> > > > > > > > > > > > > > the Manifold philosophy.
> > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > > > > > > > dependency, which will not fit almost any use case, is not very
> > > > > > > > > > > > > > useful.
> > > > > > > > > > > > > > You should think of a way to convert the Stanbol connector into a
> > > > > > > > > > > > > > normal transformation connector without assuming that a specific
> > > > > > > > > > > > > > output connector will be used.
> > > > > > > > > > > > > Regards
> > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > > > > > > djayakody@zaizi.com
> > > > > > > > >:
> > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You
> > can
> > > > > check
> > > > > > it
> > > > > > > > out
> > > > > > > > > > > from
> > > > > > > > > > > > our
> > > > > > > > > > > > >> github repo here:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> It requires the SolrWrapper output connector which
> > > > indexes
> > > > > > > > > enhanced
> > > > > > > > > > > > >> documents, entities and entityTypes in separate
> Solr
> > > > > cores.
> > > > > > > > > > Basically
> > > > > > > > > > > it
> > > > > > > > > > > > >> requires 3 separate solr cores configured with a
> > > > specific
> > > > > > Solr
> > > > > > > > > > schema
> > > > > > > > > > > > for
> > > > > > > > > > > > >> primary documents, entities and entityTypes
> > > separately.
> > > > > This
> > > > > > > was
> > > > > > > > > > done
> > > > > > > > > > > > for
> > > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol
> > > > connector's
> > > > > > > > > dependency
> > > > > > > > > > > > with
> > > > > > > > > > > > >> SolrWrapper and have it working with any output
> > > > connector.
> > > > > > > > > > > > >> Please note that the Stanbol connector currently
> > has a
> > > > bug
> > > > > > in
> > > > > > > > the
> > > > > > > > > UI
> > > > > > > > > > > > >> (editSpecification) which I'm working on at the
> > > moment.
> > > > > > After
> > > > > > > > > fixing
> > > > > > > > > > > > that I
> > > > > > > > > > > > >> will update here. And also I will provide
> > > documentations
> > > > > for
> > > > > > > > > > > configuring
> > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David
> Pérez
> > > > > Morales
> > > > > > <
> > > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > It is not the list for that, but Marmotta is
> > already
> > > > > > > > integrated
> > > > > > > > > in
> > > > > > > > > > > > Apache
> > > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > > >> >
> > https://issues.apache.org/jira/browse/STANBOL-1165
> > > .
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Anyway, as I said this is not the list for that,
> > so
> > > > > let's
> > > > > > > use
> > > > > > > > > the
> > > > > > > > > > > > proper
> > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > > > > joshua.dunham@gmail.com
> > > > > > > > > > >:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > >       In case you were interested, I pinged
> the
> > > > list a
> > > > > > few
> > > > > > > > > days
> > > > > > > > > > > ago
> > > > > > > > > > > > >> > asking
> > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you.
> > > > > > > > > > > > > >> > > Since Marmotta is a drop-in replacement for Clerezza on Stanbol,
> > > > > > > > > > > > > >> > > it may be easier for you to go this way.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing
> this
> > > > > problem
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > > >> > development
> > > > > > > > > > > > >> > > staff at my company for assistance. If you
> like
> > > the
> > > > > > > Marmotta
> > > > > > > > > > > > approach
> > > > > > > > > > > > >> we
> > > > > > > > > > > > >> > > may gain more traction solving the same
> > > integration.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so
> > the
> > > > > effect
> > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > same
> > > > > > > > > > > > >> > > except not using the Stanbol API for data
> import
> > > in
> > > > > > favor
> > > > > > > of
> > > > > > > > > > > > Marmotta.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa
> Jayakody <
> > > > > > > > > > > djayakody@zaizi.com
> > > > > > > > > > > > >
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an
> > > > > > > > > > > > > >> > > > architecture diagram for the connector.
> > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > >> > > > This message should be regarded as
> > confidential.
> > > > If
> > > > > > you
> > > > > > > > have
> > > > > > > > > > > > received
> > > > > > > > > > > > >> > > this
> > > > > > > > > > > > >> > > > email in error please notify the sender and
> > > > destroy
> > > > > it
> > > > > > > > > > > > immediately.
> > > > > > > > > > > > >> > > > Statements of intent shall only become
> binding
> > > > when
> > > > > > > > > confirmed
> > > > > > > > > > in
> > > > > > > > > > > > hard
> > > > > > > > > > > > >> > > copy
> > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales
> > > with
> > > > > the
> > > > > > > > > > > registration
> > > > > > > > > > > > >> > number
> > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook
> House,
> > > 229
> > > > > > > > Shepherds
> > > > > > > > > > > Bush
> > > > > > > > > > > > >> Road,
> > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> --
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> ------------------------------
> > > > > > > > > > > > >> This message should be regarded as confidential.
> If
> > > you
> > > > > have
> > > > > > > > > > received
> > > > > > > > > > > > this
> > > > > > > > > > > > >> email in error please notify the sender and
> destroy
> > it
> > > > > > > > > immediately.
> > > > > > > > > > > > >> Statements of intent shall only become binding
> when
> > > > > > confirmed
> > > > > > > in
> > > > > > > > > > hard
> > > > > > > > > > > > copy
> > > > > > > > > > > > >> by an authorised signatory.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Zaizi Ltd is registered in England and Wales with
> > the
> > > > > > > > registration
> > > > > > > > > > > > number
> > > > > > > > > > > > >> 6440931. The Registered Office is Brook House, 229
> > > > > Shepherds
> > > > > > > > Bush
> > > > > > > > > > > Road,
> > > > > > > > > > > > >> London W6 7AN.
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > ------------------------------
> > > > > > > > > > This message should be regarded as confidential. If you
> > have
> > > > > > received
> > > > > > > > > this
> > > > > > > > > > email in error please notify the sender and destroy it
> > > > > immediately.
> > > > > > > > > > Statements of intent shall only become binding when
> > confirmed
> > > > in
> > > > > > hard
> > > > > > > > > copy
> > > > > > > > > > by an authorised signatory.
> > > > > > > > > >
> > > > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > > > registration
> > > > > > > > number
> > > > > > > > > > 6440931. The Registered Office is Brook House, 229
> > Shepherds
> > > > Bush
> > > > > > > Road,
> > > > > > > > > > London W6 7AN.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > ------------------------------
> > > > > > > > This message should be regarded as confidential. If you have
> > > > received
> > > > > > > this
> > > > > > > > email in error please notify the sender and destroy it
> > > immediately.
> > > > > > > > Statements of intent shall only become binding when confirmed
> > in
> > > > hard
> > > > > > > copy
> > > > > > > > by an authorised signatory.
> > > > > > > >
> > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > registration
> > > > > > number
> > > > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds
> > Bush
> > > > > Road,
> > > > > > > > London W6 7AN.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > ------------------------------
> > > > > > This message should be regarded as confidential. If you have
> > received
> > > > > this
> > > > > > email in error please notify the sender and destroy it
> immediately.
> > > > > > Statements of intent shall only become binding when confirmed in
> > hard
> > > > > copy
> > > > > > by an authorised signatory.
> > > > > >
> > > > > > Zaizi Ltd is registered in England and Wales with the
> registration
> > > > number
> > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > > Road,
> > > > > > London W6 7AN.
> > > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > ------------------------------
> > > > This message should be regarded as confidential. If you have received
> > > this
> > > > email in error please notify the sender and destroy it immediately.
> > > > Statements of intent shall only become binding when confirmed in hard
> > > copy
> > > > by an authorised signatory.
> > > >
> > > > Zaizi Ltd is registered in England and Wales with the registration
> > number
> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> Road,
> > > > London W6 7AN.
> > > >
> > >
> >
>
> --
>
> ------------------------------
> This message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy
> by an authorised signatory.
>
> Zaizi Ltd is registered in England and Wales with the registration number
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> London W6 7AN.
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Thanks a lot, Rafa, for pointing that out. That was a big miss, as I haven't
tested the LDPath configuration part yet. More improvements are needed, and
I will make the changes you pointed out.

Regards,
Dileepa


On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rh...@apache.org> wrote:

> Hi Dileepa,
>
> The problem is not in that part of the code; it is rather in this part:
>
> if (entity != null) {
>     Collection<String> properties = entity.getProperties();
>     for (String property : properties) {
>         String targetFieldName = derefFields.get(property);
>         Set<String> propValues = entityPropertyMap.get(targetFieldName);
>         if (propValues == null) {
>             propValues = new HashSet<String>();
>         }
>         Collection<String> entityPropValues = entity.getPropertyValues(property);
>         propValues.addAll(entityPropValues);
>         entityPropertyMap.put(targetFieldName, propValues);
>     }
> }
>
> You are collecting from the EnhancementStructure response only the
> configured dereferenced fields; the LDPath fields are ignored. Also, there
> is a potential bug in that code if no dereferencing field is
> configured for a certain entity property, here:
>
> String targetFieldName = derefFields.get(property);
>
> targetFieldName would then be null. Instead of trying to index every
> property, you should collect only the ones configured by the user (or at
> least, if the user wants all of them, provide a configuration option for
> that).
>
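A minimal sketch of the guard described above: skip any entity property that has no configured target field, so a null field name never reaches the result map. It uses plain Java collections as a hypothetical stand-in for the Stanbol client's entity object; the class name DerefFieldCollector and the method collect are illustrative assumptions, not code from the connector.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DerefFieldCollector {

    /**
     * Collects values only for properties that the user configured.
     *
     * entityProperties: property URI -> values (hypothetical stand-in for the entity).
     * derefFields: property URI -> target field name, as configured by the user.
     */
    static Map<String, Set<String>> collect(Map<String, Collection<String>> entityProperties,
                                            Map<String, String> derefFields) {
        Map<String, Set<String>> entityPropertyMap = new HashMap<>();
        for (Map.Entry<String, Collection<String>> entry : entityProperties.entrySet()) {
            String targetFieldName = derefFields.get(entry.getKey());
            if (targetFieldName == null) {
                // No field configured for this property: skip it instead of
                // storing its values under a null key.
                continue;
            }
            entityPropertyMap
                    .computeIfAbsent(targetFieldName, k -> new HashSet<>())
                    .addAll(entry.getValue());
        }
        return entityPropertyMap;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> entityProperties = new HashMap<>();
        entityProperties.put("rdfs:label", Arrays.asList("France"));
        entityProperties.put("foaf:depiction", Arrays.asList("http://example.org/flag.svg"));

        // Only rdfs:label is configured, so foaf:depiction is ignored.
        Map<String, String> derefFields = new HashMap<>();
        derefFields.put("rdfs:label", "label");

        System.out.println(collect(entityProperties, derefFields));
    }
}
```

Indexing every property would need an explicit configuration option on top of this, as suggested above.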
> Anyway, going back to the LDPath issue, please take into account that when
> you define a field you must use a custom namespace and prefix so that you
> can later retrieve that property from the entity. If you don't do that,
> Stanbol will assign a random namespace to that property. Check this
> example from the RedLink SDK:
>
>
> https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
>
> Hope that helps
>
> On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com> wrote:
>
> > The next step would be to pull this code into an svn branch.  This is
> > something I can tackle after the 2.3 release candidate is put together.
> >
> > Thanks,
> > Karl
> >
> >
> > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Hi Rafa,
> > >
> > > Thanks for reviewing my code and for your feedback. Please see my
> > > comments inline below.
> > >
> > >
> > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > This now clearly seems to be going in the right direction, in my
> > > > opinion. Quick comments after a first review:
> > > >
> > > >
> > > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > > >    You are preventing a document from being indexed because the
> > > >    enhancement didn't perform correctly; it is probably better just to
> > > >    let it continue through the workflow within the system
> > > >
> > >
> > > Got your point. Will remove that part from the code
> > >
> > >
> > > >    - As far as I can deduce from the code, you are correctly extracting
> > > >    the configured dereferenced fields, but you are not processing the
> > > >    LDPath results at all
> > > >
> > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > retrieve the enhancement result according to the LDPath program (which
> > > is given as a text string in the connector UI).
> > > If the user has not defined an LDPath program and has added dereference
> > > fields in the UI instead, then the enhancement request is built using
> > > the dereference fields as enhancer parameters.
> > >
> > >
> > > If neither an LDPath program nor dereference fields are given in the
> > > transformation UI, then I just call the given enhancement chain without
> > > any other enhancer parameters.
> > >
> > > Please refer to the code segment below, where I do this, and let me
> > > know if it needs more improvements.
> > >
> > >             // ldpath program is given priority if it's set
> > >             if (ldPath != null)
> > >             {
> > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > >                         .setLDpathProgram(ldPath).build();
> > >             }
> > >             else if (!derefFields.isEmpty())
> > >             {
> > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > >                         .setDereferencingFields(derefFields.keySet()).build();
> > >             }
> > >             else
> > >             {
> > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > >             }
> > >             eRes = enhancerClient.enhance(parameters);
> > >
> > >
> > > Thanks,
> > > Dileepa
> > >
> > >
> > > >
> > > > Cheers,
> > > > Rafa
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <
> djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > As per our discussion I have modified the Stanbol connector so that
> > > > > it adds all extracted entity URIs and entity attributes to the
> > > > > repository document as fields.
> > > > >
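A rough sketch of what adding entity URIs and attributes to a document as multi-valued fields could look like. Everything here is an assumption for illustration: the Entity class, the "entity_uri" field name, and the use of attribute names as field names are hypothetical and are not taken from the connector's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EntityFieldFlattener {

    /** Hypothetical stand-in for an extracted entity: a URI plus its attribute values. */
    static final class Entity {
        final String uri;
        final Map<String, List<String>> attributes;

        Entity(String uri, Map<String, List<String>> attributes) {
            this.uri = uri;
            this.attributes = attributes;
        }
    }

    /** Field name used for entity URIs; an assumption, not the connector's actual name. */
    static final String ENTITY_URI_FIELD = "entity_uri";

    /** Flattens N entities into multi-valued document fields (field name -> values). */
    static Map<String, List<String>> flatten(List<Entity> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Entity entity : entities) {
            fields.computeIfAbsent(ENTITY_URI_FIELD, k -> new ArrayList<>()).add(entity.uri);
            for (Map.Entry<String, List<String>> attr : entity.attributes.entrySet()) {
                // Values of the same attribute from different entities share one field.
                fields.computeIfAbsent(attr.getKey(), k -> new ArrayList<>()).addAll(attr.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, List<String>> franceAttrs = new LinkedHashMap<>();
        franceAttrs.put("label", Arrays.asList("France"));
        Map<String, List<String>> parisAttrs = new LinkedHashMap<>();
        parisAttrs.put("label", Arrays.asList("Paris"));

        List<Entity> entities = Arrays.asList(
                new Entity("http://dbpedia.org/resource/France", franceAttrs),
                new Entity("http://dbpedia.org/resource/Paris", parisAttrs));

        System.out.println(flatten(entities));
    }
}
```

A flat layout like this works with any output connector, but it drops the link between an entity and its own attribute values, which is the structural concern raised earlier in the thread.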
> > > > > I have committed this code on a separate branch of our GitHub
> > > > > project, sensefy-connectors.
> > > > > You can find the source code here:
> > > > >
> > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > >
> > > > > Let me know your feedback.
> > > > >
> > > > > I will write a blog post on how to add it to a connection and get
> > > > > enhancement results, and share it with you.
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Dileepa,
> > > > > >
> > > > > > You cannot create sub-documents in a transformation connector.
> > > > > > And adding that capability to the framework is not possible; we
> > > > > > would be missing key bookkeeping logic if that was allowed.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> > > djayakody@zaizi.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Karl,
> > > > > > >
> > > > > > > Thanks a lot for the pointer.
> > > > > > >
> > > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > > response with the requested enhancement details for each content
> > > > > > > enhancement request. For example, for a request like "Paris is a
> > > > > > > city in France", Stanbol returns the RDF response [1] below.
> > > > > > >
> > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
> > > > > > > response to generate the entity abstractions and add them to the
> > > > > > > MCF repository document. Currently, in the Stanbol connector, we
> > > > > > > add these entity abstractions as JSON strings to a multi-valued
> > > > > > > 'entities' field in the repository document, and we parse that
> > > > > > > JSON in the SolrWrapper output connector to index them in
> > > > > > > separate Solr cores (primary documents, linked entities, and
> > > > > > > entity types with their attributes).
> > > > > > >
> > > > > > > Can we have a primary repository document and create
> > > > > > > sub-documents for the extracted entities? Is it possible to
> > > > > > > generate sub-documents for a repository document in a
> > > > > > > transformation connector?
> > > > > > >
> > > > > > > Thanks.
> > > > > > > Dileepa
> > > > > > >
> > > > > > > [1] Sample Stanbol response
> > > > > > >
> > > > > > > {
> > > > > > >   "@context": {
> > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > >     "dc:created": {
> > > > > > >       "@type": "xsd:dateTime"
> > > > > > >     },
> > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > >     "enhancer:confidence": {
> > > > > > >       "@type": "xsd:double"
> > > > > > >     },
> > > > > > >     "enhancer:end": {
> > > > > > >       "@type": "xsd:int"
> > > > > > >     },
> > > > > > >     "enhancer:entity-reference": {
> > > > > > >       "@type": "@id"
> > > > > > >     },
> > > > > > >     "enhancer:entity-type": {
> > > > > > >       "@type": "@id"
> > > > > > >     },
> > > > > > >     "enhancer:extracted-from": {
> > > > > > >       "@type": "@id"
> > > > > > >     },
> > > > > > >     "enhancer:start": {
> > > > > > >       "@type": "xsd:int"
> > > > > > >     },
> > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > >     "foaf:depiction": {
> > > > > > >       "@type": "@id"
> > > > > > >     },
> > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > >     "schema": "http://schema.org/",
> > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > >   },
> > > > > > >   "@graph": [
> > > > > > >     {
> > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > >       "@type": [
> > > > > > >         "dbp-ont:Country",
> > > > > > >         "dbp-ont:Place",
> > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > >         "owl:Thing",
> > > > > > >         "schema:Country",
> > > > > > >         "schema:Place"
> > > > > > >       ],
> > > > > > >       "foaf:depiction": [
> > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > >       ],
> > > > > > >       "rdfs:comment": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > >       },
> > > > > > >       "rdfs:label": [
> > > > > > >         {
> > > > > > >           "@language": "en",
> > > > > > >           "@value": "France"
> > > > > > >         },
> > > > > > >         {
> > > > > > >           "@language": "fr",
> > > > > > >           "@value": "France"
> > > > > > >         }
> > > > > > >       ]
> > > > > > >     },
> > > > > > >
> > > > > > >     {
> > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > >       "@type": [
> > > > > > >         "dbp-ont:Place",
> > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > >         "dbp-ont:Settlement",
> > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > >         "owl:Thing",
> > > > > > >         "schema:Place"
> > > > > > >       ],
> > > > > > >       "foaf:depiction": [
> > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > >       ],
> > > > > > >       "geo:lat": 48.8567,
> > > > > > >       "geo:long": 2.3508,
> > > > > > >       "rdfs:comment": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > >       },
> > > > > > >       "rdfs:label": [
> > > > > > >         {
> > > > > > >           "@language": "en",
> > > > > > >           "@value": "Paris"
> > > > > > >         },
> > > > > > >         {
> > > > > > >           "@language": "fr",
> > > > > > >           "@value": "Paris"
> > > > > > >         }
> > > > > > >       ]
> > > > > > >     },
> > > > > > >     {
> > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > >       "@type": [
> > > > > > >         "enhancer:Enhancement",
> > > > > > >         "enhancer:TextAnnotation"
> > > > > > >       ],
> > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > >       "enhancer:end": 5,
> > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > >       "enhancer:selected-text": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Paris"
> > > > > > >       },
> > > > > > >       "enhancer:selection-context": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Paris is in France"
> > > > > > >       },
> > > > > > >       "enhancer:start": 0
> > > > > > >     },
> > > > > > >     {
> > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > >       "@type": [
> > > > > > >         "enhancer:Enhancement",
> > > > > > >         "enhancer:EntityAnnotation"
> > > > > > >       ],
> > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > >       "enhancer:confidence": 1.0,
> > > > > > >       "enhancer:entity-label": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "France"
> > > > > > >       },
> > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > >       "enhancer:entity-type": [
> > > > > > >         "dbp-ont:Country",
> > > > > > >         "dbp-ont:Place",
> > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > >         "schema:Country",
> > > > > > >         "schema:Place",
> > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > >         "owl:Thing"
> > > > > > >       ],
> > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > >       "entityhub:site": "dbpedia"
> > > > > > >     },
> > > > > > >     {
> > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > >       "@type": [
> > > > > > >         "enhancer:Enhancement",
> > > > > > >         "enhancer:EntityAnnotation"
> > > > > > >       ],
> > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > >       "enhancer:entity-label": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Vichy France"
> > > > > > >       },
> > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > >       "enhancer:entity-type": [
> > > > > > >         "dbp-ont:Country",
> > > > > > >         "dbp-ont:Place",
> > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > >         "schema:Country",
> > > > > > >         "schema:Place",
> > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > >         "owl:Thing"
> > > > > > >       ],
> > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > >       "entityhub:site": "dbpedia"
> > > > > > >     },
> > > > > > >     {
> > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > >       "@type": [
> > > > > > >         "enhancer:Enhancement",
> > > > > > >         "enhancer:EntityAnnotation"
> > > > > > >       ],
> > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > >       "enhancer:entity-label": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Paris Commune"
> > > > > > >       },
> > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > >       "enhancer:entity-type": [
> > > > > > >         "dbp-ont:Country",
> > > > > > >         "dbp-ont:Place",
> > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > >         "schema:Country",
> > > > > > >         "schema:Place",
> > > > > > >         "owl:Thing"
> > > > > > >       ],
> > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > >       "entityhub:site": "dbpedia"
> > > > > > >     },
> > > > > > >     {
> > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > >       "@type": [
> > > > > > >         "enhancer:Enhancement",
> > > > > > >         "enhancer:TextAnnotation"
> > > > > > >       ],
> > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > >       "enhancer:end": 18,
> > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > >       "enhancer:selected-text": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "France"
> > > > > > >       },
> > > > > > >       "enhancer:selection-context": {
> > > > > > >         "@language": "en",
> > > > > > >         "@value": "Paris is in France"
> > > > > > >       },
> > > > > > >       "enhancer:start": 12
> > > > > > >     }
> > > > > > >   ]
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <
> daddywri@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Dileepa,
> > > > > > > >
> > > > > > > > Repository connectors have an abstraction that allows them to
> > > > > > > > generate compound documents (where a document has a primary
> > > > > > > > identifier, and there are subdocuments that share that primary
> > > > > > > > identifier and have a secondary identifier).  This sounds a bit
> > > > > > > > like what you are describing.  Does Stanbol work by decorating
> > > > > > > > an existing document, or does it work by generating all content
> > > > > > > > for a document?
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > > > > djayakody@zaizi.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > While thanking you all for your input on the Stanbol
> > > > > > > > > connector requirement, I would like to continue with
> > > > > > > > > modifying the Stanbol connector to be compatible with any
> > > > > > > > > output connector. If you can give some guidance on how the
> > > > > > > > > entity metadata should be added to the repository document,
> > > > > > > > > I can modify the Stanbol connector accordingly.
> > > > > > > > >
> > > > > > > > > From Rafa's comments, I gathered that we can add the entity
> > > > > > > > > metadata to the repository document as key-value pairs.
> > > > > > > > > However, this idea is not yet clear to me. There could be N
> > > > > > > > > entities in a document, and each of them will have some
> > > > > > > > > common attributes such as name, id and type, plus specific
> > > > > > > > > attributes for its particular entity type.
> > > > > > > > > I'm not clear on how to maintain that structure of N
> > > > > > > > > entities with their attributes in a repository document as
> > > > > > > > > key-value pairs and make them LDPath-compatible for
> > > > > > > > > retrieval in an output connector.
> > > > > > > > >
> > > > > > > > > @Rafa
> > > > > > > > > If you could elaborate on your suggestion, it would be
> > > > > > > > > greatly helpful to me.
> > > > > > > > > All other suggestions are also welcome.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dileepa
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <
> > > daddywri@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I, too, agree.  Somebody will need to turn this connector
> > > > > > > > > > into one that plays by the rules.  It may be possible for
> > > > > > > > > > someone on the team here to do that, but it won't be me;
> > > > > > > > > > I'm seriously overextended at the moment.  It would be
> > > > > > > > > > best if someone who knew the connector well could do the
> > > > > > > > > > necessary work.
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > > > > rharoapache@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I must agree with Antonio. When I started to work on
> > > > > > > > > > > this I was expecting the connector to work by just
> > > > > > > > > > > extracting the entities and entity metadata and putting
> > > > > > > > > > > them as plain metadata on the documents, probably
> > > > > > > > > > > following an LDPath query configuration.
> > > > > > > > > > >
> > > > > > > > > > > This is probably OK for Sensefy, but I don’t think it
> > > > > > > > > > > would be suitable for inclusion in the project. But this
> > > > > > > > > > > is only my opinion. Of course, a version of the
> > > > > > > > > > > connector that fully respects the ManifoldCF
> > > > > > > > > > > architecture would be more than welcome, in my opinion.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez
> > > Morales
> > > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi
> > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > > > > > > > > > > requirement for an internal project which has nothing
> > > > > > > > > > > > to do with the normal operation of Manifold, and
> > > > > > > > > > > > forcing users to use Solr does not fit the Manifold
> > > > > > > > > > > > philosophy.
> > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector
> > > > > > > > > > > > with such a big dependency, which will not fit almost
> > > > > > > > > > > > any use case, is not very useful.
> > > > > > > > > > > > You should think of a way to convert the Stanbol
> > > > > > > > > > > > connector into a normal transformation connector
> > > > > > > > > > > > without assuming that a specific output connector
> > > > > > > > > > > > will be used.
> > > > > > > > > > > > Regards
> > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > > > > > djayakody@zaizi.com
> > > > > > > >:
> > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > >>
> > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You
> can
> > > > check
> > > > > it
> > > > > > > out
> > > > > > > > > > from
> > > > > > > > > > > our
> > > > > > > > > > > >> github repo here:
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > >>
> > > > > > > > > > > >> It requires the SolrWrapper output connector which
> > > indexes
> > > > > > > > enhanced
> > > > > > > > > > > >> documents, entities and entityTypes in separate Solr
> > > > cores.
> > > > > > > > > Basically
> > > > > > > > > > it
> > > > > > > > > > > >> requires 3 separate solr cores configured with a
> > > specific
> > > > > Solr
> > > > > > > > > schema
> > > > > > > > > > > for
> > > > > > > > > > > >> primary documents, entities and entityTypes
> > separately.
> > > > This
> > > > > > was
> > > > > > > > > done
> > > > > > > > > > > for
> > > > > > > > > > > >> our specific use-case.
> > > > > > > > > > > >>
> > > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > >>
> > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol
> > > connector's
> > > > > > > > dependency
> > > > > > > > > > > with
> > > > > > > > > > > >> SolrWrapper and have it working with any output
> > > connector.
> > > > > > > > > > > >> Please note that the Stanbol connector currently
> has a
> > > bug
> > > > > in
> > > > > > > the
> > > > > > > > UI
> > > > > > > > > > > >> (editSpecification) which I'm working on at the
> > moment.
> > > > > After
> > > > > > > > fixing
> > > > > > > > > > > that I
> > > > > > > > > > > >> will update here. And also I will provide
> > documentations
> > > > for
> > > > > > > > > > configuring
> > > > > > > > > > > >> the connector.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks,
> > > > > > > > > > > >> Dileepa
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez
> > > > Morales
> > > > > <
> > > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > It is not the list for that, but Marmotta is
> already
> > > > > > > integrated
> > > > > > > > in
> > > > > > > > > > > Apache
> > > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > > >> >
> https://issues.apache.org/jira/browse/STANBOL-1165
> > .
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Anyway, as I said this is not the list for that,
> so
> > > > let's
> > > > > > use
> > > > > > > > the
> > > > > > > > > > > proper
> > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Regards
> > > > > > > > > > > >> >
> > > > > > > > > > > >> >
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > > > joshua.dunham@gmail.com
> > > > > > > > > >:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > >       In case you were interested, I pinged the
> > > list a
> > > > > few
> > > > > > > > days
> > > > > > > > > > ago
> > > > > > > > > > > >> > asking
> > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > I got some great tips on how to do this which
> > could
> > > > help
> > > > > > > you.
> > > > > > > > > > Since
> > > > > > > > > > > > >> > > Marmotta is a drop in replacement for Clerezza
> on
> > > > > Stanbol
> > > > > > it
> > > > > > > > may
> > > > > > > > > > be
> > > > > > > > > > > >> > easier
> > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this
> > > > problem
> > > > > to
> > > > > > > the
> > > > > > > > > > > >> > development
> > > > > > > > > > > >> > > staff at my company for assistance. If you like
> > the
> > > > > > Marmotta
> > > > > > > > > > > approach
> > > > > > > > > > > >> we
> > > > > > > > > > > >> > > may gain more traction solving the same
> > integration.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so
> the
> > > > effect
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > the
> > > > > > > > > > > >> > same
> > > > > > > > > > > >> > > except not using the Stanbol API for data import
> > in
> > > > > favor
> > > > > > of
> > > > > > > > > > > Marmotta.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Best,
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > -J
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > > > > > djayakody@zaizi.com
> > > > > > > > > > > >
> > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an
> > > > > > > > > > > > >> > > > architecture diagram for the connector.
> > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > --
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > >> > > > This message should be regarded as
> confidential.
> > > If
> > > > > you
> > > > > > > have
> > > > > > > > > > > received
> > > > > > > > > > > >> > > this
> > > > > > > > > > > >> > > > email in error please notify the sender and
> > > destroy
> > > > it
> > > > > > > > > > > immediately.
> > > > > > > > > > > >> > > > Statements of intent shall only become binding
> > > when
> > > > > > > > confirmed
> > > > > > > > > in
> > > > > > > > > > > hard
> > > > > > > > > > > >> > > copy
> > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales
> > with
> > > > the
> > > > > > > > > > registration
> > > > > > > > > > > >> > number
> > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House,
> > 229
> > > > > > > Shepherds
> > > > > > > > > > Bush
> > > > > > > > > > > >> Road,
> > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >> --
> > > > > > > > > > > >>
> > > > > > > > > > > >> ------------------------------
> > > > > > > > > > > >> This message should be regarded as confidential. If
> > you
> > > > have
> > > > > > > > > received
> > > > > > > > > > > this
> > > > > > > > > > > >> email in error please notify the sender and destroy
> it
> > > > > > > > immediately.
> > > > > > > > > > > >> Statements of intent shall only become binding when
> > > > > confirmed
> > > > > > in
> > > > > > > > > hard
> > > > > > > > > > > copy
> > > > > > > > > > > >> by an authorised signatory.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Zaizi Ltd is registered in England and Wales with
> the
> > > > > > > registration
> > > > > > > > > > > number
> > > > > > > > > > > >> 6440931. The Registered Office is Brook House, 229
> > > > Shepherds
> > > > > > > Bush
> > > > > > > > > > Road,
> > > > > > > > > > > >> London W6 7AN.
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > ------------------------------
> > > > > > > > > This message should be regarded as confidential. If you
> have
> > > > > received
> > > > > > > > this
> > > > > > > > > email in error please notify the sender and destroy it
> > > > immediately.
> > > > > > > > > Statements of intent shall only become binding when
> confirmed
> > > in
> > > > > hard
> > > > > > > > copy
> > > > > > > > > by an authorised signatory.
> > > > > > > > >
> > > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > > registration
> > > > > > > number
> > > > > > > > > 6440931. The Registered Office is Brook House, 229
> Shepherds
> > > Bush
> > > > > > Road,
> > > > > > > > > London W6 7AN.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > ------------------------------
> > > > > > > This message should be regarded as confidential. If you have
> > > received
> > > > > > this
> > > > > > > email in error please notify the sender and destroy it
> > immediately.
> > > > > > > Statements of intent shall only become binding when confirmed
> in
> > > hard
> > > > > > copy
> > > > > > > by an authorised signatory.
> > > > > > >
> > > > > > > Zaizi Ltd is registered in England and Wales with the
> > registration
> > > > > number
> > > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds
> Bush
> > > > Road,
> > > > > > > London W6 7AN.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > ------------------------------
> > > > > This message should be regarded as confidential. If you have
> received
> > > > this
> > > > > email in error please notify the sender and destroy it immediately.
> > > > > Statements of intent shall only become binding when confirmed in
> hard
> > > > copy
> > > > > by an authorised signatory.
> > > > >
> > > > > Zaizi Ltd is registered in England and Wales with the registration
> > > number
> > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > Road,
> > > > > London W6 7AN.
> > > > >
> > > >
> > >
> > > --
> > >
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received
> > this
> > > email in error please notify the sender and destroy it immediately.
> > > Statements of intent shall only become binding when confirmed in hard
> > copy
> > > by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration
> number
> > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > London W6 7AN.
> > >
> >
>

-- 

------------------------------
This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy 
by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
London W6 7AN. 

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Dileepa,

The problem is not in that part of the code; it is in this part:

    if (entity != null) {
        Collection<String> properties = entity.getProperties();
        for (String property : properties) {
            String targetFieldName = derefFields.get(property);
            Set<String> propValues = entityPropertyMap.get(targetFieldName);
            if (propValues == null) {
                propValues = new HashSet<String>();
            }
            Collection<String> entityPropValues = entity.getPropertyValues(property);
            propValues.addAll(entityPropValues);
            entityPropertyMap.put(targetFieldName, propValues);
        }
    }

You are collecting only the configured dereferenced fields from the
EnhancementStructure response, so the LDPath fields are ignored. There is
also a potential bug in that code when no dereferencing field is
configured for a given entity property, here:

String targetFieldName = derefFields.get(property);

targetFieldName would be null in that case. Instead of trying to index
every property, you should collect only the ones configured by the user
(or, if the user wants all of them, provide a configuration option for
that).
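
A minimal sketch of that suggestion, not the connector's actual code: the
loop quoted above is rewritten so that properties without a configured
target field are skipped instead of being stored under a null key.
EntityView is a hypothetical stand-in that assumes only the two accessors
visible in the quoted snippet (getProperties and getPropertyValues);
ConfiguredPropertyCollector is likewise an illustrative name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the entity type; only the two accessors from the quoted code are assumed. */
interface EntityView {
    Collection<String> getProperties();
    Collection<String> getPropertyValues(String property);
}

/** Illustrative helper (name is an assumption, not part of the connector). */
final class ConfiguredPropertyCollector {
    private ConfiguredPropertyCollector() {}

    /**
     * Collects values only for properties that have a configured target field.
     * derefFields maps an entity property to the target field name.
     */
    static Map<String, Set<String>> collect(EntityView entity, Map<String, String> derefFields) {
        Map<String, Set<String>> entityPropertyMap = new HashMap<>();
        if (entity == null) {
            return entityPropertyMap;
        }
        for (String property : entity.getProperties()) {
            String targetFieldName = derefFields.get(property);
            if (targetFieldName == null) {
                // No field configured for this property: skip it rather than
                // accumulating its values under a null key.
                continue;
            }
            entityPropertyMap
                    .computeIfAbsent(targetFieldName, k -> new HashSet<>())
                    .addAll(entity.getPropertyValues(property));
        }
        return entityPropertyMap;
    }
}
```

An "index all properties" option, if wanted, could be a separate flag that
falls back to the property name itself as the target field; that is left
out here to keep the sketch small.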

Anyway, going back to the LDPath issue: when you define a field you must
use a custom namespace and prefix so that you can later retrieve that
property from the entity. If you don't, Stanbol will assign a random
namespace to that property. See this example from the RedLink SDK:

https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
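
As a rough illustration of the point about namespaces (a sketch under
assumptions, not taken from the linked test): the helper below builds an
LDPath program whose fields all live under one custom prefix, and derives
the full property URI to look up on the entity afterwards. The prefix "ex",
the namespace http://example.org/mcf/ and the class name are hypothetical.
The "@prefix ... ;" and "field = path :: type ;" forms follow LDPath syntax
as I understand it, and the sketch assumes rdfs and xsd are already known
to the server; declare them too if they are not.

```java
import java.util.Map;

/** Illustrative helper; the prefix and namespace are hypothetical values. */
final class LdPathFieldNames {
    static final String PREFIX = "ex";
    static final String NAMESPACE = "http://example.org/mcf/";

    private LdPathFieldNames() {}

    /**
     * Builds an LDPath program that declares the custom prefix and one field
     * per entry. Keys are local field names, values are LDPath path
     * expressions including the type, e.g. "rdfs:label :: xsd:string".
     */
    static String program(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("@prefix ").append(PREFIX).append(" : <").append(NAMESPACE).append("> ;\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append(PREFIX).append(':').append(field.getKey())
              .append(" = ").append(field.getValue()).append(" ;\n");
        }
        return sb.toString();
    }

    /** Full property URI under which the field is expected to appear on the entity. */
    static String propertyUri(String localName) {
        return NAMESPACE + localName;
    }
}
```

With a field named "label", the program declares ex:label and the connector
would then read the values back using propertyUri("label") rather than a
server-generated name.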

Hope that helps

On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <da...@gmail.com> wrote:

> The next step would be to pull this code into an svn branch.  This is
> something I can tackle after the 2.3 release candidate is put together.
>
> Thanks,
> Karl
>
>
> On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <dj...@zaizi.com>
> wrote:
>
> > Hi Rafa,
> >
> > Thanks for reviewing my code and for your feedback. Please see my
> comments
> > inline below.
> >
> >
> > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:
> >
> > > Hi Dileepa,
> > >
> > > This seems to be going in the right direction clearly now in my
> opinion.
> > > Quick comments after a first review:
> > >
> > >
> > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > >    You are preventing a document from finally being indexed because the
> > >    enhancement didn't perform correctly; it is probably better to just
> > >    let it continue through the workflow within the system.
> > >
> >
> > Got your point. Will remove that part from the code
> >
> >
> > >    - As far as I can deduce from the code, you are correctly extracting
> > >    the configured dereferenced fields, but you are not processing the
> > >    LDPath results at all.
> > >
> > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > retrieve the enhancement result according to the LDPath program (which is
> > given as a text string in the connector UI).
> > If the user has not defined an LDPath program and has added dereference
> > fields in the UI instead, then the enhancement request will be built using
> > the dereference fields as enhancer parameters.
> >
> >
> > If neither an LDPath program nor dereference fields are given in the
> > transformation UI, then I just call the given enhancement chain without
> > any other enhancer parameters.
> >
> > Please refer to the code segment below where I do this, and let me know if
> > it needs more improvements.
> >
> >             // ldpath program is given priority if it's set
> >             if (ldPath != null)
> >             {
> >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> >                         .setLDpathProgram(ldPath).build();
> >             }
> >             else if (!derefFields.isEmpty())
> >             {
> >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> >                         .setDereferencingFields(derefFields.keySet()).build();
> >             }
> >             else
> >             {
> >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> >             }
> >             eRes = enhancerClient.enhance(parameters);
> >
> >
> > Thanks,
> > Dileepa
> >
> >
> > >
> > > Cheers,
> > > Rafa
> > >
> > >
> > >
> > >
> > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <dj...@zaizi.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > As per our discussion I have modified the Stanbol Connector so that
> it
> > > adds
> > > > all extracted entity URIs and entity attributes to the repository
> > > document
> > > > as fields.
> > > >
> > > > On a separate branch I have committed this code to our github project
> > > > sensefy-connectors.
> > > > You can find the source code here:
> > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > Let me know your feedback.
> > > >
> > > > I will write a blog post on how to add it in a connection and get
> > > > enhancement results and share it with you.
> > > >
> > > > Thanks,
> > > > Dileepa
> > > >
> > > >
> > > >
> > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Dileepa,
> > > > >
> > > > > You cannot create sub-documents in a transformation connector.  And
> > > > adding
> > > > > that capability to the framework is not possible; we would be
> missing
> > > key
> > > > > bookkeeping logic if that was allowed.
> > > > >
> > > > > Karl
> > > > >
> > > > >
> > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> > djayakody@zaizi.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Karl,
> > > > > >
> > > > > > Thanks a lot for the pointer.
> > > > > >
> > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > response with the requested enhancement details for the content
> > > > > > enhancement request. For example, for a request like "Paris is in
> > > > > > France", the following RDF response [1] is given by Stanbol.
> > > > > >
> > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
> > > > > > response to generate the entity abstractions and add them to the
> > > > > > MCF repository document. Currently in the Stanbol connector we add
> > > > > > these entity abstractions as JSON strings to a multi-valued
> > > > > > 'entities' field in the repository document, and we parse that JSON
> > > > > > in the SolrWrapper output connector to index them in separate Solr
> > > > > > cores (primary documents, linked entities and entity types with
> > > > > > their attributes).
> > > > > >
> > > > > > Can we have a primary repository document and create sub-documents
> > > > > > for the extracted entities? Is it possible to generate sub-documents
> > > > > > for a repo-document in a transformation connector?
> > > > > >
> > > > > > Thanks.
> > > > > > Dileepa
> > > > > >
> > > > > > [1] Sample Stanbol response
> > > > > >
> > > > > > {
> > > > > >   "@context": {
> > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > >     "dc:created": {
> > > > > >       "@type": "xsd:dateTime"
> > > > > >     },
> > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > >     "enhancer:confidence": {
> > > > > >       "@type": "xsd:double"
> > > > > >     },
> > > > > >     "enhancer:end": {
> > > > > >       "@type": "xsd:int"
> > > > > >     },
> > > > > >     "enhancer:entity-reference": {
> > > > > >       "@type": "@id"
> > > > > >     },
> > > > > >     "enhancer:entity-type": {
> > > > > >       "@type": "@id"
> > > > > >     },
> > > > > >     "enhancer:extracted-from": {
> > > > > >       "@type": "@id"
> > > > > >     },
> > > > > >     "enhancer:start": {
> > > > > >       "@type": "xsd:int"
> > > > > >     },
> > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > >     "foaf:depiction": {
> > > > > >       "@type": "@id"
> > > > > >     },
> > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > >     "schema": "http://schema.org/",
> > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > >   },
> > > > > >   "@graph": [
> > > > > >     {
> > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > >       "@type": [
> > > > > >         "dbp-ont:Country",
> > > > > >         "dbp-ont:Place",
> > > > > >         "dbp-ont:PopulatedPlace",
> > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > >         "owl:Thing",
> > > > > >         "schema:Country",
> > > > > >         "schema:Place"
> > > > > >       ],
> > > > > >       "foaf:depiction": [
> > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > >       ],
> > > > > >       "rdfs:comment": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > >       },
> > > > > >       "rdfs:label": [
> > > > > >         {
> > > > > >           "@language": "en",
> > > > > >           "@value": "France"
> > > > > >         },
> > > > > >         {
> > > > > >           "@language": "fr",
> > > > > >           "@value": "France"
> > > > > >         }
> > > > > >       ]
> > > > > >     },
> > > > > >
> > > > > >     {
> > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > >       "@type": [
> > > > > >         "dbp-ont:Place",
> > > > > >         "dbp-ont:PopulatedPlace",
> > > > > >         "dbp-ont:Settlement",
> > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > >         "owl:Thing",
> > > > > >         "schema:Place"
> > > > > >       ],
> > > > > >       "foaf:depiction": [
> > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > >       ],
> > > > > >       "geo:lat": 48.8567,
> > > > > >       "geo:long": 2.3508,
> > > > > >       "rdfs:comment": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > >       },
> > > > > >       "rdfs:label": [
> > > > > >         {
> > > > > >           "@language": "en",
> > > > > >           "@value": "Paris"
> > > > > >         },
> > > > > >         {
> > > > > >           "@language": "fr",
> > > > > >           "@value": "Paris"
> > > > > >         }
> > > > > >       ]
> > > > > >     },
> > > > > >     {
> > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > >       "@type": [
> > > > > >         "enhancer:Enhancement",
> > > > > >         "enhancer:TextAnnotation"
> > > > > >       ],
> > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > >       "dc:type": "dbp-ont:Place",
> > > > > >       "enhancer:confidence": 0.6017613,
> > > > > >       "enhancer:end": 5,
> > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > >       "enhancer:selected-text": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Paris"
> > > > > >       },
> > > > > >       "enhancer:selection-context": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Paris is in France"
> > > > > >       },
> > > > > >       "enhancer:start": 0
> > > > > >     },
> > > > > >     {
> > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > >       "@type": [
> > > > > >         "enhancer:Enhancement",
> > > > > >         "enhancer:EntityAnnotation"
> > > > > >       ],
> > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > >       "enhancer:confidence": 1.0,
> > > > > >       "enhancer:entity-label": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "France"
> > > > > >       },
> > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > >       "enhancer:entity-type": [
> > > > > >         "dbp-ont:Country",
> > > > > >         "dbp-ont:Place",
> > > > > >         "dbp-ont:PopulatedPlace",
> > > > > >         "schema:Country",
> > > > > >         "schema:Place",
> > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > >         "owl:Thing"
> > > > > >       ],
> > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > >       "entityhub:site": "dbpedia"
> > > > > >     },
> > > > > >     {
> > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > >       "@type": [
> > > > > >         "enhancer:Enhancement",
> > > > > >         "enhancer:EntityAnnotation"
> > > > > >       ],
> > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > >       "enhancer:confidence": 0.25715446,
> > > > > >       "enhancer:entity-label": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Vichy France"
> > > > > >       },
> > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > >       "enhancer:entity-type": [
> > > > > >         "dbp-ont:Country",
> > > > > >         "dbp-ont:Place",
> > > > > >         "dbp-ont:PopulatedPlace",
> > > > > >         "schema:Country",
> > > > > >         "schema:Place",
> > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > >         "owl:Thing"
> > > > > >       ],
> > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > >       "entityhub:site": "dbpedia"
> > > > > >     },
> > > > > >     {
> > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > >       "@type": [
> > > > > >         "enhancer:Enhancement",
> > > > > >         "enhancer:EntityAnnotation"
> > > > > >       ],
> > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > >       "enhancer:confidence": 0.1493264,
> > > > > >       "enhancer:entity-label": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Paris Commune"
> > > > > >       },
> > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > >       "enhancer:entity-type": [
> > > > > >         "dbp-ont:Country",
> > > > > >         "dbp-ont:Place",
> > > > > >         "dbp-ont:PopulatedPlace",
> > > > > >         "schema:Country",
> > > > > >         "schema:Place",
> > > > > >         "owl:Thing"
> > > > > >       ],
> > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > >       "entityhub:site": "dbpedia"
> > > > > >     },
> > > > > >     {
> > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > >       "@type": [
> > > > > >         "enhancer:Enhancement",
> > > > > >         "enhancer:TextAnnotation"
> > > > > >       ],
> > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > >       "dc:type": "dbp-ont:Place",
> > > > > >       "enhancer:confidence": 0.99354976,
> > > > > >       "enhancer:end": 18,
> > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > >       "enhancer:selected-text": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "France"
> > > > > >       },
> > > > > >       "enhancer:selection-context": {
> > > > > >         "@language": "en",
> > > > > >         "@value": "Paris is in France"
> > > > > >       },
> > > > > >       "enhancer:start": 12
> > > > > >     }
> > > > > >   ]
> > > > > > }
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Dileepa,
> > > > > > >
> > > > > > > Repository connectors have an abstraction that allows them to
> > > > generate
> > > > > > > compound documents (where a document has a primary identifier,
> > and
> > > > > there
> > > > > > > are subdocuments that share that primary identifier and have a
> > > > > secondary
> > > > > > > identifier).  This sounds a bit like what you are describing.
> > Does
> > > > > > Stanbol
> > > > > > > work by decorating an existing document, or does it work by
> > > > generating
> > > > > > all
> > > > > > > content for a document?
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > > > djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > >
> > > > > > > > While thanking you all for your input on the Stanbol connector
> > > > > > > > requirement, I would like to continue with modifying the Stanbol
> > > > > > > > connector to be compatible with any output connector. If you can
> > > > > > > > give some guidance on how the entity metadata should be added to
> > > > > > > > the repository document, I can modify the Stanbol connector
> > > > > > > > accordingly.
> > > > > > > >
> > > > > > > > From Rafa's comments, I gathered we can add the entity metadata
> > > > > > > > to the repository document as key-value pairs.
> > > > > > > > However, this idea is not yet clear to me. There could be N
> > > > > > > > entities in a document, and each of them will have some common
> > > > > > > > attributes such as name, id and type, plus attributes specific to
> > > > > > > > its entity type.
> > > > > > > > I'm not clear on how to maintain that structure of N entities
> > > > > > > > with their attributes in a repository document as key-value pairs
> > > > > > > > and make them LDPath compatible for retrieval in an output
> > > > > > > > connector.
> > > > > > > >
> > > > > > > > @Rafa
> > > > > > > > If you can please elaborate on your suggestion it would be
> > > greatly
> > > > > > > helpful
> > > > > > > > to me.
> > > > > > > > All other suggestions are also welcome.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dileepa
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <
> > daddywri@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > I, too, agree.  Somebody will need to turn this connector
> > into
> > > > one
> > > > > > that
> > > > > > > > > plays by the rules.  It may be possible for someone on the
> > team
> > > > > here
> > > > > > to
> > > > > > > > do
> > > > > > > > > that, but it won't be me; I'm seriously overextended at the
> > > > moment.
> > > > > > It
> > > > > > > > > would be best if someone who knew the connector well could
> do
> > > the
> > > > > > > > necessary
> > > > > > > > > work.
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > > > rharoapache@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > I must agree with Antonio. When I started to work on this I
> > > > > > > > > > > was expecting the connector to work by just extracting the
> > > > > > > > > > > entities and entity metadata and putting them as plain
> > > > > > > > > > > metadata of the documents, probably following an LDPath
> > > > > > > > > > > query configuration.
> > > > > > > > > > >
> > > > > > > > > > > This is probably OK for Sensefy, but I don't think it would
> > > > > > > > > > > be suitable for inclusion in the project. But this is only
> > > > > > > > > > > my opinion. Of course, a version of the connector that fully
> > > > > > > > > > > respects the ManifoldCF architecture would be more than
> > > > > > > > > > > welcome, in my opinion.
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez
> > Morales
> > > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi
> > > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > > requirement
> > > > > > for
> > > > > > > an
> > > > > > > > > > > internal project which has nothing to do here with a
> > normal
> > > > > > > operation
> > > > > > > > > of
> > > > > > > > > > > Manifold, so forcing the users to use Solr does not fit
> > the
> > > > > > > Manifold
> > > > > > > > > > > philosophy.
> > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with
> > > such
> > > > a
> > > > > > big
> > > > > > > > > > > dependency which will not fit almost any use case is
> not
> > > very
> > > > > > > useful.
> > > > > > > > > > > You should think a way to convert Stanbol connector
> into
> > a
> > > > > normal
> > > > > > > > > > > Transformation connector without assuming that a
> specific
> > > > > output
> > > > > > > > > > connector
> > > > > > > > > > > will be used.
> > > > > > > > > > > Regards
> > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > > > > djayakody@zaizi.com
> > > > > > >:
> > > > > > > > > > >> Hi guys,
> > > > > > > > > > >>
> > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can
> > > check
> > > > it
> > > > > > out
> > > > > > > > > from
> > > > > > > > > > our
> > > > > > > > > > >> github repo here:
> > > > > > > > > > >>
> > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > >>
> > > > > > > > > > >> It requires the SolrWrapper output connector which
> > indexes
> > > > > > > enhanced
> > > > > > > > > > >> documents, entities and entityTypes in separate Solr
> > > cores.
> > > > > > > > Basically
> > > > > > > > > it
> > > > > > > > > > >> requires 3 separate solr cores configured with a
> > specific
> > > > Solr
> > > > > > > > schema
> > > > > > > > > > for
> > > > > > > > > > >> primary documents, entities and entityTypes
> separately.
> > > This
> > > > > was
> > > > > > > > done
> > > > > > > > > > for
> > > > > > > > > > >> our specific use-case.
> > > > > > > > > > >>
> > > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > > >>
> > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > >>
> > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol
> > connector's
> > > > > > > dependency
> > > > > > > > > > with
> > > > > > > > > > >> SolrWrapper and have it working with any output
> > connector.
> > > > > > > > > > >> Please note that the Stanbol connector currently has a
> > bug
> > > > in
> > > > > > the
> > > > > > > UI
> > > > > > > > > > >> (editSpecification) which I'm working on at the
> moment.
> > > > After
> > > > > > > fixing
> > > > > > > > > > that I
> > > > > > > > > > >> will update here. And also I will provide
> documentations
> > > for
> > > > > > > > > configuring
> > > > > > > > > > >> the connector.
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks,
> > > > > > > > > > >> Dileepa
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez
> > > Morales
> > > > <
> > > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > >> >
> > > > > > > > > > >> > It is not the list for that, but Marmotta is already
> > > > > > integrated
> > > > > > > in
> > > > > > > > > > Apache
> > > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> .
> > > > > > > > > > >> >
> > > > > > > > > > >> > Anyway, as I said this is not the list for that, so
> > > let's
> > > > > use
> > > > > > > the
> > > > > > > > > > proper
> > > > > > > > > > >> > list for these things.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Regards
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > > joshua.dunham@gmail.com
> > > > > > > > >:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >       In case you were interested, I pinged the
> > list a
> > > > few
> > > > > > > days
> > > > > > > > > ago
> > > > > > > > > > >> > asking
> > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I got some great tips on how to do this which
> could
> > > help
> > > > > > you.
> > > > > > > > > Since
> > > > > > > > > > > >> > > Marmotta is a drop in replacement for Clerezza on
> > > > Stanbol
> > > > > it
> > > > > > > may
> > > > > > > > > be
> > > > > > > > > > >> > easier
> > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this
> > > problem
> > > > to
> > > > > > the
> > > > > > > > > > >> > development
> > > > > > > > > > >> > > staff at my company for assistance. If you like
> the
> > > > > Marmotta
> > > > > > > > > > approach
> > > > > > > > > > >> we
> > > > > > > > > > >> > > may gain more traction solving the same
> integration.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the
> > > effect
> > > > > > would
> > > > > > > > be
> > > > > > > > > > the
> > > > > > > > > > >> > same
> > > > > > > > > > >> > > except not using the Stanbol API for data import
> in
> > > > favor
> > > > > of
> > > > > > > > > > Marmotta.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Best,
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > -J
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > > > > djayakody@zaizi.com
> > > > > > > > > > >
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Thank you for the feedback and offering your
> help
> > > in
> > > > > > this.
> > > > > > > > > > >> > > > Let me get back to you on where to start the
> code
> > > > base.
> > > > > > > > > > >> > > > As the first step, I would like to start by
> > > creating a
> > > > > > > > > > architecture
> > > > > > > > > > >> > > diagram
> > > > > > > > > > >> > > > for the connector.
> > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > --
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > >> > > > This message should be regarded as confidential.
> > If
> > > > you
> > > > > > have
> > > > > > > > > > received
> > > > > > > > > > >> > > this
> > > > > > > > > > >> > > > email in error please notify the sender and
> > destroy
> > > it
> > > > > > > > > > immediately.
> > > > > > > > > > >> > > > Statements of intent shall only become binding
> > when
> > > > > > > confirmed
> > > > > > > > in
> > > > > > > > > > hard
> > > > > > > > > > >> > > copy
> > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales
> with
> > > the
> > > > > > > > > registration
> > > > > > > > > > >> > number
> > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House,
> 229
> > > > > > Shepherds
> > > > > > > > > Bush
> > > > > > > > > > >> Road,
> > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
The next step would be to pull this code into an svn branch.  This is
something I can tackle after the 2.3 release candidate is put together.

Thanks,
Karl


On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi Rafa,
>
> Thanks for reviewing my code and for your feedback. Please see my comments
> inline below.
>
>
> On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:
>
> > Hi Dileepa,
> >
> > This seems to be going in the right direction clearly now in my opinion.
> > Quick comments after a first review:
> >
> >
> >    - Rejecting a document because it can't be enhanced is kind of harsh.
> >    You are preventing a document from being indexed because the
> >    enhancement didn't succeed; it is probably better to let it continue
> >    through the workflow within the system
> >
>
> Got your point. Will remove that part from the code
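One way to read the suggestion above is a "fail open" transformation step: if enhancement throws, the document continues with its original fields instead of being rejected. The following is only a minimal sketch of that idea. The class name, the method name, and the use of a plain map in place of a repository document are assumptions made for illustration; none of this is the connector's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a plain Map stands in for the document's metadata fields.
final class FailOpenSketch {
    static Map<String, String> enhanceOrPassThrough(
            Map<String, String> fields,
            UnaryOperator<Map<String, String>> enhancer) {
        try {
            // Work on a copy so a failure midway cannot leave partial fields behind.
            return enhancer.apply(new HashMap<>(fields));
        } catch (RuntimeException e) {
            // Enhancement failed: keep the document in the pipeline unchanged.
            return fields;
        }
    }
}
```

With this shape, a failing enhancement only costs the extra entity fields; the document itself still reaches the output connector.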
>
>
> >    - As far as I can tell from the code, you are correctly extracting the
> >    configured dereferenced fields, but you are not processing the
> >    LDPath results at all
> >
> I'm passing the LDPath program as an enhancer parameter to Stanbol to
> retrieve the enhancement result according to the LDPath program (which is
> given as a text string in the connector UI).
> If the user has not defined an LDPath program and has added dereference
> fields in the UI instead, then the enhancement request is built using the
> dereference fields as enhancer parameters.
>
> If neither an LDPath program nor dereference fields are given in the
> transformation UI, then I just call the given enhancement chain without any
> other enhancer parameters.
>
> Please refer below code segment where I do this and let me know if it needs
> more improvements.
>
>             // ldpath program is given priority if it's set
>             if (ldPath != null)
>             {
>                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
>                         .setLDpathProgram(ldPath).build();
>             }
>             else if (!derefFields.isEmpty())
>             {
>                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
>                         .setDereferencingFields(derefFields.keySet()).build();
>             }
>             else
>             {
>                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
>             }
>             eRes = enhancerClient.enhance(parameters);
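The precedence described above (LDPath program first, then dereference fields, then the bare chain) can be isolated into a small, testable helper. This is a sketch only: the enum and method names are hypothetical, and it does not use the Stanbol client classes. It also treats a blank LDPath string as "not set", which is an assumption; the snippet above checks only for null.

```java
import java.util.Map;

// Hypothetical sketch of the three-way choice; not the connector's actual code.
final class EnhancementModeSketch {
    enum Mode { LDPATH, DEREFERENCE_FIELDS, CHAIN_ONLY }

    static Mode chooseMode(String ldPath, Map<String, ?> derefFields) {
        // Assumption: an empty text box in the UI should count as "no LDPath program".
        if (ldPath != null && !ldPath.trim().isEmpty()) {
            return Mode.LDPATH;
        }
        if (derefFields != null && !derefFields.isEmpty()) {
            return Mode.DEREFERENCE_FIELDS;
        }
        return Mode.CHAIN_ONLY;
    }
}
```

Keeping the decision separate from the request building makes it easy to unit test the case where both an LDPath program and dereference fields are configured.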
>
>
> Thanks,
> Dileepa
>
>
> >
> > Cheers,
> > Rafa
> >
> >
> >
> >
> > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > As per our discussion I have modified the Stanbol Connector so that it
> > > adds all extracted entity URIs and entity attributes to the repository
> > > document as fields.
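To make the "entities as fields" idea concrete, here is a minimal sketch of one possible flattening. Everything in it is an assumption for illustration: the class name, the `entity_uri` field, and the `entity_<attribute>` naming convention are hypothetical, and a map of multi-valued fields stands in for the repository document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: input maps entity URI -> (attribute name -> values).
final class EntityFieldsSketch {
    static Map<String, List<String>> flatten(Map<String, Map<String, List<String>>> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, List<String>>> entity : entities.entrySet()) {
            // Every entity contributes its URI to one shared multi-valued field.
            add(fields, "entity_uri", entity.getKey());
            for (Map.Entry<String, List<String>> attr : entity.getValue().entrySet()) {
                for (String value : attr.getValue()) {
                    // One multi-valued field per attribute name, shared by all entities.
                    add(fields, "entity_" + attr.getKey(), value);
                }
            }
        }
        return fields;
    }

    private static void add(Map<String, List<String>> fields, String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}
```

The trade-off of this layout is that the link between a value and the entity it came from is lost once the fields are merged, which is acceptable for faceting and search but not for rebuilding per-entity records in an output connector.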
> > >
> > > On a separate branch I have committed this code to our github project
> > > sensefy-connectors.
> > > You can find the source code here:
> > >
> > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > Let me know your feedback.
> > >
> > > I will write a blog post on how to add it to a connection and get
> > > enhancement results, and share it with you.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > >
> > >
> > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com>
> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > You cannot create sub-documents in a transformation connector.  And
> > > > adding that capability to the framework is not possible; we would be
> > > > missing key bookkeeping logic if that was allowed.
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <
> djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Hi Karl,
> > > > >
> > > > > Thanks a lot for the pointer.
> > > > >
> > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > response containing the requested enhancement details for each content
> > > > > enhancement request. For example, for a request like "Paris is a city
> > > > > in France", Stanbol returns the RDF response [1] below.
> > > > >
> > > > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > > > and EntityAnnotations are extracted from the RDF response to generate
> > > > > the entity abstractions and add them to the MCF repository document.
> > > > > Currently, the Stanbol connector adds these entity abstractions as JSON
> > > > > strings to a multi-valued 'entities' field in the repository document,
> > > > > and the SolrWrapper output connector parses that JSON to index into
> > > > > separate Solr cores (primary documents, linked entities, and entity
> > > > > types with their attributes).
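As an illustration of the extraction step described above, the sketch below walks the nodes of an already parsed `@graph` array and keeps, for each TextAnnotation, the EntityAnnotation with the highest `enhancer:confidence`. This is a hypothetical policy and hypothetical class and method names, not what the connector is known to do; the property names are taken from the sample response [1], and JSON parsing is assumed to have happened elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each graph node is a Map of JSON-LD property -> value.
final class BestAnnotationSketch {
    /** Returns TextAnnotation id -> entity URI of its most confident EntityAnnotation. */
    static Map<String, String> bestEntityPerMention(List<Map<String, Object>> graph) {
        Map<String, String> bestRef = new LinkedHashMap<>();
        Map<String, Double> bestScore = new LinkedHashMap<>();
        for (Map<String, Object> node : graph) {
            Object types = node.get("@type");
            if (!(types instanceof List) || !((List<?>) types).contains("enhancer:EntityAnnotation")) {
                continue; // skip entities, TextAnnotations, and anything untyped
            }
            Object mention = node.get("dc:relation");
            Object ref = node.get("enhancer:entity-reference");
            Object conf = node.get("enhancer:confidence");
            if (!(mention instanceof String) || !(ref instanceof String) || !(conf instanceof Number)) {
                continue; // incomplete annotation
            }
            double score = ((Number) conf).doubleValue();
            Double current = bestScore.get(mention);
            if (current == null || score > current) {
                bestScore.put((String) mention, score);
                bestRef.put((String) mention, (String) ref);
            }
        }
        return bestRef;
    }
}
```

Applied to the sample response, the two candidates for the "France" mention would be reduced to the DBpedia France entity, since its confidence is higher than that of Vichy France.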
> > > > >
> > > > > Can we have a primary repository document and create sub-documents
> > > > > for the extracted entities? Is it possible to generate sub-documents
> > > > > for a repo-document in a transformation connector?
> > > > >
> > > > > Thanks.
> > > > > Dileepa
> > > > >
> > > > > [1] Sample Stanbol response
> > > > >
> > > > > {
> > > > >   "@context": {
> > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > >     "dc": "http://purl.org/dc/terms/",
> > > > >     "dc:created": {
> > > > >       "@type": "xsd:dateTime"
> > > > >     },
> > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > >     "enhancer:confidence": {
> > > > >       "@type": "xsd:double"
> > > > >     },
> > > > >     "enhancer:end": {
> > > > >       "@type": "xsd:int"
> > > > >     },
> > > > >     "enhancer:entity-reference": {
> > > > >       "@type": "@id"
> > > > >     },
> > > > >     "enhancer:entity-type": {
> > > > >       "@type": "@id"
> > > > >     },
> > > > >     "enhancer:extracted-from": {
> > > > >       "@type": "@id"
> > > > >     },
> > > > >     "enhancer:start": {
> > > > >       "@type": "xsd:int"
> > > > >     },
> > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > >     "foaf:depiction": {
> > > > >       "@type": "@id"
> > > > >     },
> > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > >     "schema": "http://schema.org/",
> > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > >   },
> > > > >   "@graph": [
> > > > >     {
> > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > >       "@type": [
> > > > >         "dbp-ont:Country",
> > > > >         "dbp-ont:Place",
> > > > >         "dbp-ont:PopulatedPlace",
> > > > >         "http://www.opengis.net/gml/_Feature",
> > > > >         "owl:Thing",
> > > > >         "schema:Country",
> > > > >         "schema:Place"
> > > > >       ],
> > > > >       "foaf:depiction": [
> > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > >       ],
> > > > >       "rdfs:comment": {
> > > > >         "@language": "en",
> > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > >       },
> > > > >       "rdfs:label": [
> > > > >         {
> > > > >           "@language": "en",
> > > > >           "@value": "France"
> > > > >         },
> > > > >         {
> > > > >           "@language": "fr",
> > > > >           "@value": "France"
> > > > >         }
> > > > >       ]
> > > > >     },
> > > > >
> > > > >     {
> > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > >       "@type": [
> > > > >         "dbp-ont:Place",
> > > > >         "dbp-ont:PopulatedPlace",
> > > > >         "dbp-ont:Settlement",
> > > > >         "http://www.opengis.net/gml/_Feature",
> > > > >         "owl:Thing",
> > > > >         "schema:Place"
> > > > >       ],
> > > > >       "foaf:depiction": [
> > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > >       ],
> > > > >       "geo:lat": 48.8567,
> > > > >       "geo:long": 2.3508,
> > > > >       "rdfs:comment": {
> > > > >         "@language": "en",
> > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > >       },
> > > > >       "rdfs:label": [
> > > > >         {
> > > > >           "@language": "en",
> > > > >           "@value": "Paris"
> > > > >         },
> > > > >         {
> > > > >           "@language": "fr",
> > > > >           "@value": "Paris"
> > > > >         }
> > > > >       ]
> > > > >     },
> > > > >     {
> > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > >       "@type": [
> > > > >         "enhancer:Enhancement",
> > > > >         "enhancer:TextAnnotation"
> > > > >       ],
> > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > >       "dc:type": "dbp-ont:Place",
> > > > >       "enhancer:confidence": 0.6017613,
> > > > >       "enhancer:end": 5,
> > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > >       "enhancer:selected-text": {
> > > > >         "@language": "en",
> > > > >         "@value": "Paris"
> > > > >       },
> > > > >       "enhancer:selection-context": {
> > > > >         "@language": "en",
> > > > >         "@value": "Paris is in France"
> > > > >       },
> > > > >       "enhancer:start": 0
> > > > >     },
> > > > >     {
> > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > >       "@type": [
> > > > >         "enhancer:Enhancement",
> > > > >         "enhancer:EntityAnnotation"
> > > > >       ],
> > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > >       "enhancer:confidence": 1.0,
> > > > >       "enhancer:entity-label": {
> > > > >         "@language": "en",
> > > > >         "@value": "France"
> > > > >       },
> > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > >       "enhancer:entity-type": [
> > > > >         "dbp-ont:Country",
> > > > >         "dbp-ont:Place",
> > > > >         "dbp-ont:PopulatedPlace",
> > > > >         "schema:Country",
> > > > >         "schema:Place",
> > > > >         "http://www.opengis.net/gml/_Feature",
> > > > >         "owl:Thing"
> > > > >       ],
> > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > >       "entityhub:site": "dbpedia"
> > > > >     },
> > > > >     {
> > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > >       "@type": [
> > > > >         "enhancer:Enhancement",
> > > > >         "enhancer:EntityAnnotation"
> > > > >       ],
> > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > >       "enhancer:confidence": 0.25715446,
> > > > >       "enhancer:entity-label": {
> > > > >         "@language": "en",
> > > > >         "@value": "Vichy France"
> > > > >       },
> > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > >       "enhancer:entity-type": [
> > > > >         "dbp-ont:Country",
> > > > >         "dbp-ont:Place",
> > > > >         "dbp-ont:PopulatedPlace",
> > > > >         "schema:Country",
> > > > >         "schema:Place",
> > > > >         "http://www.opengis.net/gml/_Feature",
> > > > >         "owl:Thing"
> > > > >       ],
> > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > >       "entityhub:site": "dbpedia"
> > > > >     },
> > > > >     {
> > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > >       "@type": [
> > > > >         "enhancer:Enhancement",
> > > > >         "enhancer:EntityAnnotation"
> > > > >       ],
> > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > >       "enhancer:confidence": 0.1493264,
> > > > >       "enhancer:entity-label": {
> > > > >         "@language": "en",
> > > > >         "@value": "Paris Commune"
> > > > >       },
> > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > >       "enhancer:entity-type": [
> > > > >         "dbp-ont:Country",
> > > > >         "dbp-ont:Place",
> > > > >         "dbp-ont:PopulatedPlace",
> > > > >         "schema:Country",
> > > > >         "schema:Place",
> > > > >         "owl:Thing"
> > > > >       ],
> > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > >       "entityhub:site": "dbpedia"
> > > > >     },
> > > > >     {
> > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > >       "@type": [
> > > > >         "enhancer:Enhancement",
> > > > >         "enhancer:TextAnnotation"
> > > > >       ],
> > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > >       "dc:type": "dbp-ont:Place",
> > > > >       "enhancer:confidence": 0.99354976,
> > > > >       "enhancer:end": 18,
> > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > >       "enhancer:selected-text": {
> > > > >         "@language": "en",
> > > > >         "@value": "France"
> > > > >       },
> > > > >       "enhancer:selection-context": {
> > > > >         "@language": "en",
> > > > >         "@value": "Paris is in France"
> > > > >       },
> > > > >       "enhancer:start": 12
> > > > >     }
> > > > >   ]
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Dileepa,
> > > > > >
> > > > > > Repository connectors have an abstraction that allows them to
> > > generate
> > > > > > compound documents (where a document has a primary identifier,
> and
> > > > there
> > > > > > are subdocuments that share that primary identifier and have a
> > > > secondary
> > > > > > identifier).  This sounds a bit like what you are describing.
> Does
> > > > > Stanbol
> > > > > > work by decorating an existing document, or does it work by
> > > generating
> > > > > all
> > > > > > content for a document?
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > > djayakody@zaizi.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > >
> > > > > > > While thanking you all for your input on Stanbol connector
> > > > > requirement, I
> > > > > > > would like to continue with modifying the Stanbol connector to
> be
> > > > > > > compatible with any output connector. If you guys can give some
> > > > > guidance
> > > > > > on
> > > > > > > how the entity metadata should be added to the repository
> > document
> > > I
> > > > > can
> > > > > > > modify the stanbol connector accordingly.
> > > > > > >
> > > > > > > From Rafa's comments, I gathered we can add the entity metadata
> > to
> > > > the
> > > > > > > repo.doc as key value pairs.
> > > > > > > However this idea is not yet clear to me. There could be 'N'
> > number
> > > > of
> > > > > > > entities in a document and each of them will have some common
> > > > > attributes
> > > > > > > such as name, id, type and specific attributes for particular
> > > entity
> > > > > > type.
> > > > > > > I'm not clear on how to maintain that structure of N number of
> > > > entities
> > > > > > > with their attributes in a repo.document as key value pairs and
> > > make
> > > > > them
> > > > > > > LDPath compatible for retrieval in an output connector.
> > > > > > >
> > > > > > > @Rafa
> > > > > > > If you can please elaborate on your suggestion it would be
> > greatly
> > > > > > helpful
> > > > > > > to me.
> > > > > > > All other suggestions are also welcome.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dileepa
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <
> daddywri@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > I, too, agree.  Somebody will need to turn this connector
> into
> > > one
> > > > > that
> > > > > > > > plays by the rules.  It may be possible for someone on the
> team
> > > > here
> > > > > to
> > > > > > > do
> > > > > > > > that, but it won't be me; I'm seriously overextended at the
> > > moment.
> > > > > It
> > > > > > > > would be best if someone who knew the connector well could do
> > the
> > > > > > > necessary
> > > > > > > > work.
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > > rharoapache@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I must agree with Antonio. When I started to work on this I
> > was
> > > > > > > expecting
> > > > > > > > > the connector to work by just extracting the entities and
> > > > entities
> > > > > > > > metadata
> > > > > > > > > and put them as plain metadata of the documents, probably
> > > > following
> > > > > > > > LDPATH
> > > > > > > > > queries configuration
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is probably ok for Sensefy but I don’t think this
> could
> > be
> > > > > > > suitable
> > > > > > > > > to be included in the project. But this is only my opinion.
> > Of
> > > > > > course,
> > > > > > > a
> > > > > > > > > version of the connector that fully respect the ManifoldCF
> > > > > > architecture
> > > > > > > > > would be more than welcome in my opinion
> > > > > > > > >
> > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez
> Morales
> > > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi
> > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > requirement
> > > > > for
> > > > > > an
> > > > > > > > > > internal project which has nothing to do here with a
> normal
> > > > > > operation
> > > > > > > > of
> > > > > > > > > > Manifold, so forcing the users to use Solr does not fit
> the
> > > > > > Manifold
> > > > > > > > > > philosophy.
> > > > > > > > > > In my opinion, at this moment, a Stanbol connector with
> > such
> > > a
> > > > > big
> > > > > > > > > > dependency which will not fit almost any use case is not
> > very
> > > > > > useful.
> > > > > > > > > > You should think a way to convert Stanbol connector into
> a
> > > > normal
> > > > > > > > > > Transformation connector without assuming that a specific
> > > > output
> > > > > > > > > connector
> > > > > > > > > > will be used.
> > > > > > > > > > Regards
> > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > > > djayakody@zaizi.com
> > > > > >:
> > > > > > > > > >> Hi guys,
> > > > > > > > > >>
> > > > > > > > > >> I have developed a Stanbol connector for MCF. You can
> > check
> > > it
> > > > > out
> > > > > > > > from
> > > > > > > > > our
> > > > > > > > > >> github repo here:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > >>
> > > > > > > > > >> It requires the SolrWrapper output connector which
> indexes
> > > > > > enhanced
> > > > > > > > > >> documents, entities and entityTypes in separate Solr
> > cores.
> > > > > > > Basically
> > > > > > > > it
> > > > > > > > > >> requires 3 separate solr cores configured with a
> specific
> > > Solr
> > > > > > > schema
> > > > > > > > > for
> > > > > > > > > >> primary documents, entities and entityTypes separately.
> > This
> > > > was
> > > > > > > done
> > > > > > > > > for
> > > > > > > > > >> our specific use-case.
> > > > > > > > > >>
> > > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > >>
> > > > > > > > > >> Perhaps we can discuss and remove the Stanbol
> connector's
> > > > > > dependency
> > > > > > > > > with
> > > > > > > > > >> SolrWrapper and have it working with any output
> connector.
> > > > > > > > > >> Please note that the Stanbol connector currently has a
> bug
> > > in
> > > > > the
> > > > > > UI
> > > > > > > > > >> (editSpecification) which I'm working on at the moment.
> > > After
> > > > > > fixing
> > > > > > > > > that I
> > > > > > > > > >> will update here. And also I will provide documentations
> > for
> > > > > > > > configuring
> > > > > > > > > >> the connector.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >> Dileepa
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez
> > Morales
> > > <
> > > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Joshua
> > > > > > > > > >> >
> > > > > > > > > >> > It is not the list for that, but Marmotta is already
> > > > > integrated
> > > > > > in
> > > > > > > > > Apache
> > > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > > >> >
> > > > > > > > > >> > Anyway, as I said this is not the list for that, so
> > let's
> > > > use
> > > > > > the
> > > > > > > > > proper
> > > > > > > > > >> > list for these things.
> > > > > > > > > >> >
> > > > > > > > > >> > Regards
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > joshua.dunham@gmail.com
> > > > > > > >:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > >> > >
> > > > > > > > > >> > >       In case you were interested, I pinged the
> list a
> > > few
> > > > > > days
> > > > > > > > ago
> > > > > > > > > >> > asking
> > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > >> > >
> > > > > > > > > >> > > I got some great tips on how to do this which could
> > help
> > > > > you.
> > > > > > > > Since
> > > > > > > > > >> > > Marmotta is a drop in replacement for Clarezza on
> > > Stanbol
> > > > it
> > > > > > may
> > > > > > > > be
> > > > > > > > > >> > easier
> > > > > > > > > >> > > for you to take this way.
> > > > > > > > > >> > >
> > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this
> > problem
> > > to
> > > > > the
> > > > > > > > > >> > development
> > > > > > > > > >> > > staff at my company for assistance. If you like the
> > > > Marmotta
> > > > > > > > > approach
> > > > > > > > > >> we
> > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > >> > >
> > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the
> > effect
> > > > > would
> > > > > > > be
> > > > > > > > > the
> > > > > > > > > >> > same
> > > > > > > > > >> > > except not using the Stanbol API for data import in
> > > favor
> > > > of
> > > > > > > > > Marmotta.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Best,
> > > > > > > > > >> > >
> > > > > > > > > >> > > -J
> > > > > > > > > >> > >
> > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > > > djayakody@zaizi.com
> > > > > > > > > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Hi all,
> > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thank you for the feedback and offering your help
> > in
> > > > > this.
> > > > > > > > > >> > > > Let me get back to you on where to start the code
> > > base.
> > > > > > > > > >> > > > As the first step, I would like to start by
> > creating a
> > > > > > > > > architecture
> > > > > > > > > >> > > diagram
> > > > > > > > > >> > > > for the connector.
> > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks,
> > > > > > > > > >> > > > Dileepa
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > --
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > ------------------------------
> > > > > > > > > >> > > > This message should be regarded as confidential.
> If
> > > you
> > > > > have
> > > > > > > > > received
> > > > > > > > > >> > > this
> > > > > > > > > >> > > > email in error please notify the sender and
> destroy
> > it
> > > > > > > > > immediately.
> > > > > > > > > >> > > > Statements of intent shall only become binding
> when
> > > > > > confirmed
> > > > > > > in
> > > > > > > > > hard
> > > > > > > > > >> > > copy
> > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with
> > the
> > > > > > > > registration
> > > > > > > > > >> > number
> > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229
> > > > > Shepherds
> > > > > > > > Bush
> > > > > > > > > >> Road,
> > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >>
> > > > > > > > > >> ------------------------------
> > > > > > > > > >> This message should be regarded as confidential. If you
> > have
> > > > > > > received
> > > > > > > > > this
> > > > > > > > > >> email in error please notify the sender and destroy it
> > > > > > immediately.
> > > > > > > > > >> Statements of intent shall only become binding when
> > > confirmed
> > > > in
> > > > > > > hard
> > > > > > > > > copy
> > > > > > > > > >> by an authorised signatory.
> > > > > > > > > >>
> > > > > > > > > >> Zaizi Ltd is registered in England and Wales with the
> > > > > registration
> > > > > > > > > number
> > > > > > > > > >> 6440931. The Registered Office is Brook House, 229
> > Shepherds
> > > > > Bush
> > > > > > > > Road,
> > > > > > > > > >> London W6 7AN.
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > ------------------------------
> > > > > > > This message should be regarded as confidential. If you have
> > > received
> > > > > > this
> > > > > > > email in error please notify the sender and destroy it
> > immediately.
> > > > > > > Statements of intent shall only become binding when confirmed
> in
> > > hard
> > > > > > copy
> > > > > > > by an authorised signatory.
> > > > > > >
> > > > > > > Zaizi Ltd is registered in England and Wales with the
> > registration
> > > > > number
> > > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds
> Bush
> > > > Road,
> > > > > > > London W6 7AN.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > ------------------------------
> > > > > This message should be regarded as confidential. If you have
> received
> > > > this
> > > > > email in error please notify the sender and destroy it immediately.
> > > > > Statements of intent shall only become binding when confirmed in
> hard
> > > > copy
> > > > > by an authorised signatory.
> > > > >
> > > > > Zaizi Ltd is registered in England and Wales with the registration
> > > number
> > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > Road,
> > > > > London W6 7AN.
> > > > >
> > > >
> > >
> > > --
> > >
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received
> > this
> > > email in error please notify the sender and destroy it immediately.
> > > Statements of intent shall only become binding when confirmed in hard
> > copy
> > > by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration
> number
> > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > London W6 7AN.
> > >
> >
>
> --
>
> ------------------------------
> This message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy
> by an authorised signatory.
>
> Zaizi Ltd is registered in England and Wales with the registration number
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> London W6 7AN.
>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi Rafa,

Thanks for reviewing my code and for your feedback. Please see my comments
inline below.


On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rh...@apache.org> wrote:

> Hi Dileepa,
>
> This seems to be going in the right direction clearly now in my opinion.
> Quick comments after a first review:
>
>
>    - Rejecting a document because it can't be enhanced is kind of tough.
>    You are preventing a document to be finally indexed because the
> enhancement
>    didn't perform correctly, probably it is better just to let them
> continue
>    the workflow within the system
>

Got your point. I will remove that part from the code.
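
To make the intended behaviour concrete, here is a minimal, self-contained
sketch of letting a document pass through when enhancement fails. It does
not use the real ManifoldCF or Stanbol APIs: PassThroughSketch, enrich and
the Function-based enhancer are hypothetical names chosen for illustration
only, and document metadata is modelled as a plain map of multi-valued
fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: not the connector's actual code.
final class PassThroughSketch {

    /**
     * Returns the fields to hand to the next pipeline stage.
     * If the enhancer throws, the original fields are forwarded unchanged
     * instead of rejecting the document.
     */
    static Map<String, List<String>> enrich(
            Map<String, List<String>> fields,
            String content,
            Function<String, Map<String, List<String>>> enhancer) {

        Map<String, List<String>> extra;
        try {
            extra = enhancer.apply(content);
        } catch (RuntimeException ex) {
            // Enhancement failed: keep going with no additional fields.
            extra = Collections.emptyMap();
        }

        // Copy the original fields so the caller's lists are never mutated.
        Map<String, List<String>> out = new LinkedHashMap<>();
        fields.forEach((k, v) -> out.put(k, new ArrayList<>(v)));

        // Append enhancement fields, merging with existing multi-valued fields.
        extra.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        return out;
    }
}
```

In the real connector the same idea would mean catching the enhancement
error, logging it, and still sending the unmodified document down the
pipeline.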


>    - As I can deduce for the code, you are correctly extracting the
>    configured dereferenced fields, but you are not processing at all the
>    LDPath results
>
I'm passing the LDPath program as an enhancer parameter to Stanbol, so the
enhancement result is retrieved according to the LDPath program (which is
given as a text string in the connector UI).
If the user has not defined an LDPath program and has added dereference
fields in the UI instead, then the enhancement request is built using the
dereference fields as enhancer parameters.


If neither an LDPath program nor dereference fields are given in the
transformation UI, then I just call the given enhancement chain without any
other enhancer parameters.

Please refer to the code segment below, where I do this, and let me know if
it needs further improvement.

            // ldpath program is given priority if it's set
            if (ldPath != null)
            {
                parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
                        .setLDpathProgram(ldPath).build();
            }
            else if (!derefFields.isEmpty())
            {
                // otherwise fall back to the configured dereference fields
                parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
                        .setDereferencingFields(derefFields.keySet()).build();
            }
            else
            {
                // neither is configured: call the chain with no extra parameters
                parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
            }
            eRes = enhancerClient.enhance(parameters);
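
The priority order above can also be written as a small standalone
function, which makes it easy to test separately from Stanbol. This is
only a sketch: EnhancerModeSketch, Mode and select are hypothetical names
that do not exist in the connector. It also treats a blank LDPath string
as "not set", on the assumption that the UI may hand back an empty string
rather than null.

```java
import java.util.Collection;

// Hypothetical sketch: names are illustrative, not part of the connector.
final class EnhancerModeSketch {

    enum Mode { LDPATH, DEREFERENCE_FIELDS, CHAIN_ONLY }

    /**
     * Chooses how the enhancement request should be built:
     * an LDPath program wins, then dereference fields, then the bare chain.
     */
    static Mode select(String ldPath, Collection<String> derefFields) {
        // Assumption: a blank program is the same as no program.
        if (ldPath != null && !ldPath.trim().isEmpty()) {
            return Mode.LDPATH;
        }
        if (derefFields != null && !derefFields.isEmpty()) {
            return Mode.DEREFERENCE_FIELDS;
        }
        return Mode.CHAIN_ONLY;
    }
}
```

With the mode decided up front, the three builder calls reduce to a single
switch, and the blank-string case no longer sends an empty LDPath program
to Stanbol.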


Thanks,
Dileepa


>
> Cheers,
> Rafa
>
>
>
>
> On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <dj...@zaizi.com>
> wrote:
>
> > Hi All,
> >
> > As per our discussion I have modified the Stanbol Connector so that it
> adds
> > all extracted entity URIs and entity attributes to the repository
> document
> > as fields.
> >
> > On a separate branch I have committed this code to our github project
> > sensefy-connectors.
> > You can find the source code here:
> >
> >
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > Let me know your feedback.
> >
> > I will write a blog post on how to add it in a connection and get
> > enhancement results and share it with you.
> >
> > Thanks,
> > Dileepa
> >
> >
> >
> > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com> wrote:
> >
> > > Hi Dileepa,
> > >
> > > You cannot create sub-documents in a transformation connector.  And
> > adding
> > > that capability to the framework is not possible; we would be missing
> key
> > > bookkeeping logic if that was allowed.
> > >
> > > Karl
> > >
> > >
> > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <dj...@zaizi.com>
> > > wrote:
> > >
> > > > Hi Karl,
> > > >
> > > > Thanks a lot for the pointer.
> > > >
> > > > Stanbol doesn't update an existing document, it generates a new
> > response
> > > > with requested enhancement details for the content enhancement
> request.
> > > > For example for a request like : "Paris is a city in France"
> following
> > > RDF
> > > > response [1] is given by Stanbol.
> > > >
> > > > In the Stanbol connector, enhancement artifacts such as
> TextAnnotations
> > > > and EntityAnnotations are extracted from the RDF response, to
> generate
> > > the
> > > > entity abstractions and add them to the mcf repository document.
> > > Currently
> > > > in the Stanbol connector we have added these entity abstractions as
> > JSON
> > > > strings to a multi-valued 'entities' field in the repository document
> > and
> > > > we parse that JSON in the SolrWrapper output connector to index in
> > > separate
> > > > Solr cores (primary documents, linked entities and entity types with
> > > their
> > > > attributes).
> > > >
> > > > Can we have a primary repository document and create sub
> documents
> > > for
> > > > the extracted entities? Is it possible to generate sub documents for
> a
> > > > repo-document in a transformation connector?
> > > >
> > > > Thanks.
> > > > Dileepa
> > > >
> > > > [1] Sample Stanbol response
> > > >
> > > > {
> > > >   "@context": {
> > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > >     "dc": "http://purl.org/dc/terms/",
> > > >     "dc:created": {
> > > >       "@type": "xsd:dateTime"
> > > >     },
> > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > >     "enhancer:confidence": {
> > > >       "@type": "xsd:double"
> > > >     },
> > > >     "enhancer:end": {
> > > >       "@type": "xsd:int"
> > > >     },
> > > >     "enhancer:entity-reference": {
> > > >       "@type": "@id"
> > > >     },
> > > >     "enhancer:entity-type": {
> > > >       "@type": "@id"
> > > >     },
> > > >     "enhancer:extracted-from": {
> > > >       "@type": "@id"
> > > >     },
> > > >     "enhancer:start": {
> > > >       "@type": "xsd:int"
> > > >     },
> > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > >     "foaf:depiction": {
> > > >       "@type": "@id"
> > > >     },
> > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > >     "schema": "http://schema.org/",
> > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > >   },
> > > >   "@graph": [
> > > >     {
> > > >       "@id": "http://dbpedia.org/resource/France",
> > > >       "@type": [
> > > >         "dbp-ont:Country",
> > > >         "dbp-ont:Place",
> > > >         "dbp-ont:PopulatedPlace",
> > > >         "http://www.opengis.net/gml/_Feature",
> > > >         "owl:Thing",
> > > >         "schema:Country",
> > > >         "schema:Place"
> > > >       ],
> > > >       "foaf:depiction": [
> > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > >       ],
> > > >       "rdfs:comment": {
> > > >         "@language": "en",
> > > >         "@value": "France, officially the French Republic, is a
> > > > unitary semi-presidential republic in Western Europe with several
> > > > overseas territories and islands located on other continents and in
> > > > the Indian, Pacific, and Atlantic oceans. Metropolitan France extends
> > > > from the Mediterranean Sea to the English Channel and the North Sea,
> > > > and from the Rhine to the Atlantic Ocean. It is often referred to as
> > > > l’Hexagone because of the geometric shape of its territory."
> > > >       },
> > > >       "rdfs:label": [
> > > >         {
> > > >           "@language": "en",
> > > >           "@value": "France"
> > > >         },
> > > >         {
> > > >           "@language": "fr",
> > > >           "@value": "France"
> > > >         },
> > > >       ]
> > > >     },
> > > >
> > > >     {
> > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > >       "@type": [
> > > >         "dbp-ont:Place",
> > > >         "dbp-ont:PopulatedPlace",
> > > >         "dbp-ont:Settlement",
> > > >         "http://www.opengis.net/gml/_Feature",
> > > >         "owl:Thing",
> > > >         "schema:Place"
> > > >       ],
> > > >       "foaf:depiction": [
> > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > >       ],
> > > >       "geo:lat": 48.8567,
> > > >       "geo:long": 2.3508,
> > > >       "rdfs:comment": {
> > > >         "@language": "en",
> > > >         "@value": "Paris is the capital and largest city of France. It
> > > > is situated on the river Seine, in northern France, at the heart of
> > > > the Île-de-France region (or Paris Region, French: Région parisienne).
> > > > As of January 2008 the city of Paris, within its administrative limits
> > > > largely unchanged since 1860, has an estimated population of 2,211,297
> > > > and a metropolitan population of 12,089,098, and is one of the most
> > > > populated metropolitan areas in Europe."
> > > >       },
> > > >       "rdfs:label": [
> > > >
> > > >         {
> > > >           "@language": "en",
> > > >           "@value": "Paris"
> > > >         },
> > > >         {
> > > >           "@language": "fr",
> > > >           "@value": "Paris"
> > > >         },
> > > >       ]
> > > >     },
> > > >     {
> > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > >       "@type": [
> > > >         "enhancer:Enhancement",
> > > >         "enhancer:TextAnnotation"
> > > >       ],
> > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > >       "dc:type": "dbp-ont:Place",
> > > >       "enhancer:confidence": 0.6017613,
> > > >       "enhancer:end": 5,
> > > >       "enhancer:extracted-from":
> > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > >       "enhancer:selected-text": {
> > > >         "@language": "en",
> > > >         "@value": "Paris"
> > > >       },
> > > >       "enhancer:selection-context": {
> > > >         "@language": "en",
> > > >         "@value": "Paris is in France"
> > > >       },
> > > >       "enhancer:start": 0
> > > >     },
> > > >     {
> > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > >       "@type": [
> > > >         "enhancer:Enhancement",
> > > >         "enhancer:EntityAnnotation"
> > > >       ],
> > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > >       "dc:relation":
> > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > >       "enhancer:confidence": 1.0,
> > > >       "enhancer:entity-label": {
> > > >         "@language": "en",
> > > >         "@value": "France"
> > > >       },
> > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > >       "enhancer:entity-type": [
> > > >         "dbp-ont:Country",
> > > >         "dbp-ont:Place",
> > > >         "dbp-ont:PopulatedPlace",
> > > >         "schema:Country",
> > > >         "schema:Place",
> > > >         "http://www.opengis.net/gml/_Feature",
> > > >         "owl:Thing"
> > > >       ],
> > > >       "enhancer:extracted-from":
> > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > >       "entityhub:site": "dbpedia"
> > > >     },
> > > >     {
> > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > >       "@type": [
> > > >         "enhancer:Enhancement",
> > > >         "enhancer:EntityAnnotation"
> > > >       ],
> > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > >       "dc:relation":
> > > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > >       "enhancer:confidence": 0.25715446,
> > > >       "enhancer:entity-label": {
> > > >         "@language": "en",
> > > >         "@value": "Vichy France"
> > > >       },
> > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > >       "enhancer:entity-type": [
> > > >         "dbp-ont:Country",
> > > >         "dbp-ont:Place",
> > > >         "dbp-ont:PopulatedPlace",
> > > >         "schema:Country",
> > > >         "schema:Place",
> > > >         "http://www.opengis.net/gml/_Feature",
> > > >         "owl:Thing"
> > > >       ],
> > > >       "enhancer:extracted-from":
> > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > >       "entityhub:site": "dbpedia"
> > > >     },
> > > >     {
> > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > >       "@type": [
> > > >         "enhancer:Enhancement",
> > > >         "enhancer:EntityAnnotation"
> > > >       ],
> > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > >       "dc:relation":
> > > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > >       "enhancer:confidence": 0.1493264,
> > > >       "enhancer:entity-label": {
> > > >         "@language": "en",
> > > >         "@value": "Paris Commune"
> > > >       },
> > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > >       "enhancer:entity-type": [
> > > >         "dbp-ont:Country",
> > > >         "dbp-ont:Place",
> > > >         "dbp-ont:PopulatedPlace",
> > > >         "schema:Country",
> > > >         "schema:Place",
> > > >         "owl:Thing"
> > > >       ],
> > > >       "enhancer:extracted-from":
> > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > >       "entityhub:site": "dbpedia"
> > > >     },
> > > >     {
> > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > >       "@type": [
> > > >         "enhancer:Enhancement",
> > > >         "enhancer:TextAnnotation"
> > > >       ],
> > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > >       "dc:type": "dbp-ont:Place",
> > > >       "enhancer:confidence": 0.99354976,
> > > >       "enhancer:end": 18,
> > > >       "enhancer:extracted-from":
> > > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > >       "enhancer:selected-text": {
> > > >         "@language": "en",
> > > >         "@value": "France"
> > > >       },
> > > >       "enhancer:selection-context": {
> > > >         "@language": "en",
> > > >         "@value": "Paris is in France"
> > > >       },
> > > >       "enhancer:start": 12
> > > >     }
> > > >   ]
> > > > }
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Dileepa,
> > > > >
> > > > > Repository connectors have an abstraction that allows them to
> > generate
> > > > > compound documents (where a document has a primary identifier, and
> > > there
> > > > > are subdocuments that share that primary identifier and have a
> > > secondary
> > > > > identifier).  This sounds a bit like what you are describing.  Does
> > > > Stanbol
> > > > > work by decorating an existing document, or does it work by
> > generating
> > > > all
> > > > > content for a document?
> > > > >
> > > > > Karl
> > > > >
> > > > >
> > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> > djayakody@zaizi.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > >
> > > > > > While thanking you all for your input on Stanbol connector
> > > > requirement, I
> > > > > > would like to continue with modifying the Stanbol connector to be
> > > > > > compatible with any output connector. If you guys can give some
> > > > guidance
> > > > > on
> > > > > > how the entity metadata should be added to the repository
> document
> > I
> > > > can
> > > > > > modify the stanbol connector accordingly.
> > > > > >
> > > > > > From Rafa's comments, I gathered we can add the entity metadata
> to
> > > the
> > > > > > repo.doc as key value pairs.
> > > > > > However this idea is not yet clear to me. There could be 'N'
> number
> > > of
> > > > > > entities in a document and each of them will have some common
> > > > attributes
> > > > > > such as name, id, type and specific attributes for particular
> > entity
> > > > > type.
> > > > > > I'm not clear on how to maintain that structure of N number of
> > > entities
> > > > > > with their attributes in a repo.document as key value pairs and
> > make
> > > > them
> > > > > > LDPath compatible for retrieval in an output connector.
> > > > > >
> > > > > > @Rafa
> > > > > > If you can please elaborate on your suggestion it would be
> greatly
> > > > > helpful
> > > > > > to me.
> > > > > > All other suggestions are also welcome.
> > > > > >
> > > > > > Thanks,
> > > > > > Dileepa
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > I, too, agree.  Somebody will need to turn this connector into
> > one
> > > > that
> > > > > > > plays by the rules.  It may be possible for someone on the team
> > > here
> > > > to
> > > > > > do
> > > > > > > that, but it won't be me; I'm seriously overextended at the
> > moment.
> > > > It
> > > > > > > would be best if someone who knew the connector well could do
> the
> > > > > > necessary
> > > > > > > work.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> > rharoapache@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > I must agree with Antonio. When I started to work on this I
> was
> > > > > > expecting
> > > > > > > > the connector to work by just extracting the entities and
> > > entities
> > > > > > > metadata
> > > > > > > > and put them as plain metadata of the documents, probably
> > > following
> > > > > > > LDPATH
> > > > > > > > queries configuration
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > This is probably ok for Sensefy but I don’t think this could
> be
> > > > > > suitable
> > > > > > > > to be included in the project. But this is only my opinion.
> Of
> > > > > course,
> > > > > > a
> > > > > > > > version of the connector that fully respect the ManifoldCF
> > > > > architecture
> > > > > > > > would be more than welcome in my opinion
> > > > > > > >
> > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > > <ad...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > requirement
> > > > for
> > > > > an
> > > > > > > > > internal project which has nothing to do here with a normal
> > > > > operation
> > > > > > > of
> > > > > > > > > Manifold, so forcing the users to use Solr does not fit the
> > > > > Manifold
> > > > > > > > > philosophy.
> > > > > > > > > In my opinion, at this moment, a Stanbol connector with
> such
> > a
> > > > big
> > > > > > > > > dependency which will not fit almost any use case is not
> very
> > > > > useful.
> > > > > > > > > You should think a way to convert Stanbol connector into a
> > > normal
> > > > > > > > > Transformation connector without assuming that a specific
> > > output
> > > > > > > > connector
> > > > > > > > > will be used.
> > > > > > > > > Regards
> > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > > djayakody@zaizi.com
> > > > >:
> > > > > > > > >> Hi guys,
> > > > > > > > >>
> > > > > > > > >> I have developed a Stanbol connector for MCF. You can
> check
> > it
> > > > out
> > > > > > > from
> > > > > > > > our
> > > > > > > > >> github repo here:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > >>
> > > > > > > > >> It requires the SolrWrapper output connector which indexes
> > > > > enhanced
> > > > > > > > >> documents, entities and entityTypes in separate Solr
> cores.
> > > > > > Basically
> > > > > > > it
> > > > > > > > >> requires 3 separate solr cores configured with a specific
> > Solr
> > > > > > schema
> > > > > > > > for
> > > > > > > > >> primary documents, entities and entityTypes separately.
> This
> > > was
> > > > > > done
> > > > > > > > for
> > > > > > > > >> our specific use-case.
> > > > > > > > >>
> > > > > > > > >> The SolrWrapper code is here :
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > >>
> > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's
> > > > > dependency
> > > > > > > > with
> > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > >> Please note that the Stanbol connector currently has a bug
> > in
> > > > the
> > > > > UI
> > > > > > > > >> (editSpecification) which I'm working on at the moment.
> > After
> > > > > fixing
> > > > > > > > that I
> > > > > > > > >> will update here. And also I will provide documentations
> for
> > > > > > > configuring
> > > > > > > > >> the connector.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Dileepa
> > > > > > > > >>
> > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez
> Morales
> > <
> > > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Joshua
> > > > > > > > >> >
> > > > > > > > >> > It is not the list for that, but Marmotta is already
> > > > integrated
> > > > > in
> > > > > > > > Apache
> > > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > > >> >
> > > > > > > > >> > Anyway, as I said this is not the list for that, so
> let's
> > > use
> > > > > the
> > > > > > > > proper
> > > > > > > > >> > list for these things.
> > > > > > > > >> >
> > > > > > > > >> > Regards
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > joshua.dunham@gmail.com
> > > > > > >:
> > > > > > > > >> >
> > > > > > > > >> > > Hey Dileepa,
> > > > > > > > >> > >
> > > > > > > > >> > >       In case you were interested, I pinged the list a
> > few
> > > > > days
> > > > > > > ago
> > > > > > > > >> > asking
> > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > >> > >
> > > > > > > > >> > > I got some great tips on how to do this which could
> help
> > > > you.
> > > > > > > Since
> > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on
> > Stanbol
> > > it
> > > > > may
> > > > > > > be
> > > > > > > > >> > easier
> > > > > > > > >> > > for you to take this way.
> > > > > > > > >> > >
> > > > > > > > >> > > I'm not a Java programmer but I'm bringing this
> problem
> > to
> > > > the
> > > > > > > > >> > development
> > > > > > > > >> > > staff at my company for assistance. If you like the
> > > Marmotta
> > > > > > > > approach
> > > > > > > > >> we
> > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > >> > >
> > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the
> effect
> > > > would
> > > > > > be
> > > > > > > > the
> > > > > > > > >> > same
> > > > > > > > >> > > except not using the Stanbol API for data import in
> > favor
> > > of
> > > > > > > > Marmotta.
> > > > > > > > >> > >
> > > > > > > > >> > > Best,
> > > > > > > > >> > >
> > > > > > > > >> > > -J
> > > > > > > > >> > >
> > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > > djayakody@zaizi.com
> > > > > > > > >
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > Hi all,
> > > > > > > > >> > > >
> > > > > > > > > >> > > > Thank you for the feedback and offering your help
> in
> > > > this.
> > > > > > > > >> > > > Let me get back to you on where to start the code
> > base.
> > > > > > > > >> > > > As the first step, I would like to start by
> creating a
> > > > > > > > architecture
> > > > > > > > >> > > diagram
> > > > > > > > >> > > > for the connector.
> > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks,
> > > > > > > > >> > > > Dileepa
> > > > > > > > >> > > >
> > > > > > > > >> > > > --
> > > > > > > > >> > > >
> > > > > > > > >> > > > ------------------------------
> > > > > > > > >> > > > This message should be regarded as confidential. If
> > you
> > > > have
> > > > > > > > received
> > > > > > > > >> > > this
> > > > > > > > >> > > > email in error please notify the sender and destroy
> it
> > > > > > > > immediately.
> > > > > > > > >> > > > Statements of intent shall only become binding when
> > > > > confirmed
> > > > > > in
> > > > > > > > hard
> > > > > > > > >> > > copy
> > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with
> the
> > > > > > > registration
> > > > > > > > >> > number
> > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229
> > > > Shepherds
> > > > > > > Bush
> > > > > > > > >> Road,
> > > > > > > > >> > > > London W6 7AN.
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>

-- 

------------------------------
This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy 
by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
London W6 7AN. 

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@apache.org>.
Hi Dileepa,

This now seems to be going in the right direction, in my opinion.
Quick comments after a first review:


   - Rejecting a document because it can't be enhanced is rather harsh.
   It prevents the document from being indexed at all just because the
   enhancement failed. It is probably better to let such documents
   continue through the workflow.
   - As far as I can tell from the code, you are correctly extracting the
   configured dereferenced fields, but you are not processing the LDPath
   results at all.
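
A minimal sketch of the first point above, written in Python for brevity
(the actual connector is Java). The names `transform`, `EnhancementError`,
`demo_ok` and `demo_fail` are hypothetical and are not ManifoldCF or Stanbol
APIs; the sketch only shows the "pass through on failure" behaviour.

```python
from typing import Callable, Dict, List

Fields = Dict[str, List[str]]


class EnhancementError(Exception):
    """Hypothetical error: the enhancer could not process the content."""


def transform(fields: Fields, content: str,
              enhance: Callable[[str], Fields]) -> Fields:
    """Return a copy of the fields, enriched if enhancement succeeds."""
    merged = {name: list(values) for name, values in fields.items()}
    try:
        extra = enhance(content)
    except EnhancementError:
        # Enhancement failed: pass the document on unchanged instead of
        # rejecting it, so it is still indexed.
        return merged
    for name, values in extra.items():
        merged.setdefault(name, []).extend(values)
    return merged


def demo_ok(content: str) -> Fields:
    """Stand-in enhancer that always finds one entity."""
    return {"entity_uri": ["http://dbpedia.org/resource/Paris"]}


def demo_fail(content: str) -> Fields:
    """Stand-in enhancer that always fails."""
    raise EnhancementError("enhancer unavailable")
```

With `demo_fail` the document comes back with its original fields only, so
the output connector would still receive it.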


Cheers,
Rafa




On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi All,
>
> As per our discussion I have modified the Stanbol Connector so that it adds
> all extracted entity URIs and entity attributes to the repository document
> as fields.
>
> On a separate branch I have committed this code to our github project
> sensefy-connectors.
> You can find the source code here:
>
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> Let me know your feedback.
>
> I will write a blog post on how to add it to a connection and get
> enhancement results, and share it with you.
>
> Thanks,
> Dileepa
>
>
>
> On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com> wrote:
>
> > Hi Dileepa,
> >
> > You cannot create sub-documents in a transformation connector.  And
> adding
> > that capability to the framework is not possible; we would be missing key
> > bookkeeping logic if that was allowed.
> >
> > Karl
> >
> >
> > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <dj...@zaizi.com>
> > wrote:
> >
> > > Hi Karl,
> > >
> > > Thanks a lot for the pointer.
> > >
> > > Stanbol doesn't update an existing document; it generates a new
> > > response containing the requested enhancement details for each content
> > > enhancement request. For example, for a request like "Paris is in
> > > France", Stanbol returns the following RDF response [1].
> > >
> > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > and EntityAnnotations are extracted from the RDF response to generate
> > > the entity abstractions, which are added to the MCF repository document.
> > > Currently the Stanbol connector adds these entity abstractions as JSON
> > > strings to a multi-valued 'entities' field in the repository document,
> > > and the SolrWrapper output connector parses that JSON to index into
> > > separate Solr cores (primary documents, linked entities, and entity
> > > types with their attributes).
> > >
> > > Can we have a primary repository document and create sub-documents
> > > for the extracted entities? Is it possible to generate sub-documents
> > > for a repository document in a transformation connector?
> > >
> > > Thanks.
> > > Dileepa
> > >
> > > [1] Sample Stanbol response
> > >
> > > {
> > >   "@context": {
> > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > >     "dc": "http://purl.org/dc/terms/",
> > >     "dc:created": {
> > >       "@type": "xsd:dateTime"
> > >     },
> > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > >     "enhancer:confidence": {
> > >       "@type": "xsd:double"
> > >     },
> > >     "enhancer:end": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "enhancer:entity-reference": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:entity-type": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:extracted-from": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:start": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > >     "foaf:depiction": {
> > >       "@type": "@id"
> > >     },
> > >     "owl": "http://www.w3.org/2002/07/owl#",
> > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > >     "schema": "http://schema.org/",
> > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > >   },
> > >   "@graph": [
> > >     {
> > >       "@id": "http://dbpedia.org/resource/France",
> > >       "@type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Country",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > >       ],
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > >       },
> > >       "rdfs:label": [
> > >         {
> > >           "@language": "en",
> > >           "@value": "France"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "France"
> > >         }
> > >       ]
> > >     },
> > >
> > >     {
> > >       "@id": "http://dbpedia.org/resource/Paris",
> > >       "@type": [
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "dbp-ont:Settlement",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > >       ],
> > >       "geo:lat": 48.8567,
> > >       "geo:long": 2.3508,
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > >       },
> > >       "rdfs:label": [
> > >
> > >         {
> > >           "@language": "en",
> > >           "@value": "Paris"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "Paris"
> > >         }
> > >       ]
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.6017613,
> > >       "enhancer:end": 5,
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "Paris"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 0
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation":
> > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 1.0,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation":
> > > "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 0.25715446,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Vichy France"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation":
> > > "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "enhancer:confidence": 0.1493264,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Paris Commune"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.99354976,
> > >       "enhancer:end": 18,
> > >       "enhancer:extracted-from":
> > > "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 12
> > >     }
> > >   ]
> > > }
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com>
> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > Repository connectors have an abstraction that allows them to
> generate
> > > > compound documents (where a document has a primary identifier, and
> > there
> > > > are subdocuments that share that primary identifier and have a
> > secondary
> > > > identifier).  This sounds a bit like what you are describing.  Does
> > > Stanbol
> > > > work by decorating an existing document, or does it work by
> generating
> > > all
> > > > content for a document?
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <
> djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > >
> > > > > While thanking you all for your input on Stanbol connector
> > > requirement, I
> > > > > would like to continue with modifying the Stanbol connector to be
> > > > > compatible with any output connector. If you guys can give some
> > > guidance
> > > > on
> > > > > how the entity metadata should be added to the repository document
> I
> > > can
> > > > > modify the stanbol connector accordingly.
> > > > >
> > > > > From Rafa's comments, I gathered we can add the entity metadata to
> > the
> > > > > repo.doc as key value pairs.
> > > > > However this idea is not yet clear to me. There could be 'N' number
> > of
> > > > > entities in a document and each of them will have some common
> > > attributes
> > > > > such as name, id, type and specific attributes for particular
> entity
> > > > type.
> > > > > I'm not clear on how to maintain that structure of N number of
> > entities
> > > > > with their attributes in a repo.document as key value pairs and
> make
> > > them
> > > > > LDPath compatible for retrieval in an output connector.
> > > > >
> > > > > @Rafa
> > > > > If you can please elaborate on your suggestion it would be greatly
> > > > helpful
> > > > > to me.
> > > > > All other suggestions are also welcome.
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > >
> > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I, too, agree.  Somebody will need to turn this connector into
> one
> > > that
> > > > > > plays by the rules.  It may be possible for someone on the team
> > here
> > > to
> > > > > do
> > > > > > that, but it won't be me; I'm seriously overextended at the
> moment.
> > > It
> > > > > > would be best if someone who knew the connector well could do the
> > > > > necessary
> > > > > > work.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <
> rharoapache@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > I must agree with Antonio. When I started to work on this I was
> > > > > expecting
> > > > > > > the connector to work by just extracting the entities and
> > entities
> > > > > > metadata
> > > > > > > and put them as plain metadata of the documents, probably
> > following
> > > > > > LDPATH
> > > > > > > queries configuration
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > This is probably ok for Sensefy but I don’t think this could be
> > > > > suitable
> > > > > > > to be included in the project. But this is only my opinion. Of
> > > > course,
> > > > > a
> > > > > > > version of the connector that fully respect the ManifoldCF
> > > > architecture
> > > > > > > would be more than welcome in my opinion
> > > > > > >
> > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > <ad...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > > The removal of the SolrWrapper is a must. It was a
> requirement
> > > for
> > > > an
> > > > > > > > internal project which has nothing to do here with a normal
> > > > operation
> > > > > > of
> > > > > > > > Manifold, so forcing the users to use Solr does not fit the
> > > > Manifold
> > > > > > > > philosophy.
> > > > > > > > In my opinion, at this moment, a Stanbol connector with such
> a
> > > big
> > > > > > > > dependency which will not fit almost any use case is not very
> > > > useful.
> > > > > > > > You should think a way to convert Stanbol connector into a
> > normal
> > > > > > > > Transformation connector without assuming that a specific
> > output
> > > > > > > connector
> > > > > > > > will be used.
> > > > > > > > Regards
> > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <
> > djayakody@zaizi.com
> > > >:
> > > > > > > >> Hi guys,
> > > > > > > >>
> > > > > > > >> I have developed a Stanbol connector for MCF. You can check
> it
> > > out
> > > > > > from
> > > > > > > our
> > > > > > > >> github repo here:
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > >>
> > > > > > > >> It requires the SolrWrapper output connector which indexes
> > > > enhanced
> > > > > > > >> documents, entities and entityTypes in separate Solr cores.
> > > > > Basically
> > > > > > it
> > > > > > > >> requires 3 separate solr cores configured with a specific
> Solr
> > > > > schema
> > > > > > > for
> > > > > > > >> primary documents, entities and entityTypes separately. This
> > was
> > > > > done
> > > > > > > for
> > > > > > > >> our specific use-case.
> > > > > > > >>
> > > > > > > >> The SolrWrapper code is here :
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > >>
> > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's
> > > > dependency
> > > > > > > with
> > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > >> Please note that the Stanbol connector currently has a bug
> in
> > > the
> > > > UI
> > > > > > > >> (editSpecification) which I'm working on at the moment.
> After
> > > > fixing
> > > > > > > that I
> > > > > > > >> will update here. And also I will provide documentations for
> > > > > > configuring
> > > > > > > >> the connector.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Dileepa
> > > > > > > >>
> > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> <
> > > > > > > >> adperezmorales@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Joshua
> > > > > > > >> >
> > > > > > > >> > It is not the list for that, but Marmotta is already
> > > integrated
> > > > in
> > > > > > > Apache
> > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > >> >
> > > > > > > >> > Anyway, as I said this is not the list for that, so let's
> > use
> > > > the
> > > > > > > proper
> > > > > > > >> > list for these things.
> > > > > > > >> >
> > > > > > > >> > Regards
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > joshua.dunham@gmail.com
> > > > > >:
> > > > > > > >> >
> > > > > > > >> > > Hey Dileepa,
> > > > > > > >> > >
> > > > > > > >> > >       In case you were interested, I pinged the list a
> few
> > > > days
> > > > > > ago
> > > > > > > >> > asking
> > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > >> > >
> > > > > > > >> > > I got some great tips on how to do this which could help
> > > you.
> > > > > > Since
> > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on
> Stanbol
> > it
> > > > may
> > > > > > be
> > > > > > > >> > easier
> > > > > > > >> > > for you to take this way.
> > > > > > > >> > >
> > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem
> to
> > > the
> > > > > > > >> > development
> > > > > > > >> > > staff at my company for assistance. If you like the
> > Marmotta
> > > > > > > approach
> > > > > > > >> we
> > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > >> > >
> > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect
> > > would
> > > > > be
> > > > > > > the
> > > > > > > >> > same
> > > > > > > >> > > except not using the Stanbol API for data import in
> favor
> > of
> > > > > > > Marmotta.
> > > > > > > >> > >
> > > > > > > >> > > Best,
> > > > > > > >> > >
> > > > > > > >> > > -J
> > > > > > > >> > >
> > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <
> > > > > > djayakody@zaizi.com
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > Hi all,
> > > > > > > >> > > >
> > > > > > > > >> > > > Thank you for the feedback and offering your help in
> > > this.
> > > > > > > >> > > > Let me get back to you on where to start the code
> base.
> > > > > > > >> > > > As the first step, I would like to start by creating a
> > > > > > > architecture
> > > > > > > >> > > diagram
> > > > > > > >> > > > for the connector.
> > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Dileepa
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > >
> > > > > > > >> > > > ------------------------------
> > > > > > > >> > > > This message should be regarded as confidential. If
> you
> > > have
> > > > > > > received
> > > > > > > >> > > this
> > > > > > > >> > > > email in error please notify the sender and destroy it
> > > > > > > immediately.
> > > > > > > >> > > > Statements of intent shall only become binding when
> > > > confirmed
> > > > > in
> > > > > > > hard
> > > > > > > >> > > copy
> > > > > > > >> > > > by an authorised signatory.
> > > > > > > >> > > >
> > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the
> > > > > > registration
> > > > > > > >> > number
> > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229
> > > Shepherds
> > > > > > Bush
> > > > > > > >> Road,
> > > > > > > >> > > > London W6 7AN.
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi All,

As per our discussion I have modified the Stanbol Connector so that it adds
all extracted entity URIs and entity attributes to the repository document
as fields.
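
A minimal sketch of what "entity URIs and attributes as fields" could look like. It is written in Python for brevity rather than the connector's Java, and the helper entities_to_fields and the field names (entity_uri, entity_label, entity_type) are hypothetical, not the names the connector actually uses.

```python
from collections import defaultdict


def entities_to_fields(entities, prefix="entity_"):
    """Flatten a list of entity dicts into multi-valued document fields.

    Hypothetical helper: each attribute name becomes one field, and the
    values from all entities are appended to that field in order.
    """
    fields = defaultdict(list)
    for entity in entities:
        for attribute, value in entity.items():
            # An attribute may hold one value or a list (e.g. several types).
            values = value if isinstance(value, list) else [value]
            fields[prefix + attribute].extend(str(v) for v in values)
    return dict(fields)


entities = [
    {"uri": "http://dbpedia.org/resource/Paris", "label": "Paris",
     "type": ["dbp-ont:Place", "dbp-ont:Settlement"]},
    {"uri": "http://dbpedia.org/resource/France", "label": "France",
     "type": ["dbp-ont:Country", "dbp-ont:Place"]},
]
fields = entities_to_fields(entities)
```

With this layout, single-valued attributes stay aligned by position (the first entity_uri belongs with the first entity_label), but multi-valued ones such as entity_type lose their per-entity grouping, which is the trade-off of flattening into plain key/value fields.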

I have committed this code on a separate branch of our GitHub project,
sensefy-connectors.
You can find the source code here:
https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
Let me know your feedback.

I will write a blog post on how to add it to a connection and get
enhancement results, and share it with you.

Thanks,
Dileepa



On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Dileepa,
>
> You cannot create sub-documents in a transformation connector.  And adding
> that capability to the framework is not possible; we would be missing key
> bookkeeping logic if that was allowed.
>
> Karl
> > > > > > >> > > > --
> > > > > > >> > > >
> > > > > > >> > > > ------------------------------
> > > > > > >> > > > This message should be regarded as confidential. If you
> > have
> > > > > > received
> > > > > > >> > > this
> > > > > > >> > > > email in error please notify the sender and destroy it
> > > > > > immediately.
> > > > > > >> > > > Statements of intent shall only become binding when
> > > confirmed
> > > > in
> > > > > > hard
> > > > > > >> > > copy
> > > > > > >> > > > by an authorised signatory.
> > > > > > >> > > >
> > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the
> > > > > registration
> > > > > > >> > number
> > > > > > >> > > > 6440931. The Registered Office is Brook House, 229
> > Shepherds
> > > > > Bush
> > > > > > >> Road,
> > > > > > >> > > > London W6 7AN.
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >> --
> > > > > > >>
> > > > > > >> ------------------------------
> > > > > > >> This message should be regarded as confidential. If you have
> > > > received
> > > > > > this
> > > > > > >> email in error please notify the sender and destroy it
> > > immediately.
> > > > > > >> Statements of intent shall only become binding when confirmed
> in
> > > > hard
> > > > > > copy
> > > > > > >> by an authorised signatory.
> > > > > > >>
> > > > > > >> Zaizi Ltd is registered in England and Wales with the
> > registration
> > > > > > number
> > > > > > >> 6440931. The Registered Office is Brook House, 229 Shepherds
> > Bush
> > > > > Road,
> > > > > > >> London W6 7AN.
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > ------------------------------
> > > > This message should be regarded as confidential. If you have received
> > > this
> > > > email in error please notify the sender and destroy it immediately.
> > > > Statements of intent shall only become binding when confirmed in hard
> > > copy
> > > > by an authorised signatory.
> > > >
> > > > Zaizi Ltd is registered in England and Wales with the registration
> > number
> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> Road,
> > > > London W6 7AN.
> > > >
> > >
> >
> > --
> >
> > ------------------------------
> > This message should be regarded as confidential. If you have received
> this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard
> copy
> > by an authorised signatory.
> >
> > Zaizi Ltd is registered in England and Wales with the registration number
> > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > London W6 7AN.
> >
>

-- 

------------------------------
This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy 
by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
London W6 7AN. 

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
Hi Dileepa,

You cannot create sub-documents in a transformation connector, and adding
that capability to the framework is not possible; we would be missing key
bookkeeping logic if that were allowed.

Karl


On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi Karl,
>
> Thanks a lot for the pointer.
>
> Stanbol doesn't update an existing document; it generates a new response
> containing the enhancement details requested in the content enhancement
> request. For example, for a request like "Paris is in France", Stanbol
> returns the following RDF response [1].
>
> In the Stanbol connector, enhancement artifacts such as TextAnnotations
> and EntityAnnotations are extracted from the RDF response to generate
> entity abstractions, which are added to the MCF repository document.
> Currently the connector adds these entity abstractions as JSON strings to
> a multi-valued 'entities' field in the repository document, and the
> SolrWrapper output connector parses that JSON to index into separate Solr
> cores (primary documents, linked entities, and entity types with their
> attributes).
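
The extraction step described above can be sketched roughly as follows. This is an illustration in Python, not the connector's Java code: the function extract_entities and the keys of the serialized JSON (uri, label, confidence, types, mention) are assumptions, since the actual layout of the 'entities' field is not shown in this thread. The sample graph reuses two nodes from the response in [1].

```python
import json


def extract_entities(response):
    """Return one JSON string per EntityAnnotation in a Stanbol-style graph."""
    graph = response.get("@graph", [])
    # Index TextAnnotations by id so each entity can be linked to its mention.
    mentions = {
        node["@id"]: node
        for node in graph
        if "enhancer:TextAnnotation" in node.get("@type", [])
    }
    entities = []
    for node in graph:
        if "enhancer:EntityAnnotation" not in node.get("@type", []):
            continue
        # dc:relation points from the entity back to the text it was found in.
        mention = mentions.get(node.get("dc:relation"), {})
        entities.append(json.dumps({
            "uri": node["enhancer:entity-reference"],
            "label": node["enhancer:entity-label"]["@value"],
            "confidence": node["enhancer:confidence"],
            "types": node.get("enhancer:entity-type", []),
            "mention": mention.get("enhancer:selected-text", {}).get("@value"),
        }, sort_keys=True))
    return entities


sample = {"@graph": [
    {"@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
     "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
     "enhancer:selected-text": {"@language": "en", "@value": "France"},
     "enhancer:start": 12, "enhancer:end": 18},
    {"@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
     "enhancer:confidence": 1.0,
     "enhancer:entity-label": {"@language": "en", "@value": "France"},
     "enhancer:entity-reference": "http://dbpedia.org/resource/France",
     "enhancer:entity-type": ["dbp-ont:Country", "dbp-ont:Place"]},
]}
entities = extract_entities(sample)
```

Each string in the returned list would be one value of the multi-valued 'entities' field; an output connector can then call json.loads on each value to recover the entity and its attributes.
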
>
> Can we have a primary repository document and create sub-documents for
> the extracted entities? Is it possible to generate sub-documents for a
> repository document in a transformation connector?
>
> Thanks.
> Dileepa
>
> [1] Sample Stanbol response
>
> {
>   "@context": {
>     "dbp-ont": "http://dbpedia.org/ontology/",
>     "dc": "http://purl.org/dc/terms/",
>     "dc:created": {
>       "@type": "xsd:dateTime"
>     },
>     "enhancer": "http://fise.iks-project.eu/ontology/",
>     "enhancer:confidence": {
>       "@type": "xsd:double"
>     },
>     "enhancer:end": {
>       "@type": "xsd:int"
>     },
>     "enhancer:entity-reference": {
>       "@type": "@id"
>     },
>     "enhancer:entity-type": {
>       "@type": "@id"
>     },
>     "enhancer:extracted-from": {
>       "@type": "@id"
>     },
>     "enhancer:start": {
>       "@type": "xsd:int"
>     },
>     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
>     "foaf": "http://xmlns.com/foaf/0.1/",
>     "foaf:depiction": {
>       "@type": "@id"
>     },
>     "owl": "http://www.w3.org/2002/07/owl#",
>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
>     "schema": "http://schema.org/",
>     "xsd": "http://www.w3.org/2001/XMLSchema#"
>   },
>   "@graph": [
>     {
>       "@id": "http://dbpedia.org/resource/France",
>       "@type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Country",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
>       ],
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
>       },
>       "rdfs:label": [
>         {
>           "@language": "en",
>           "@value": "France"
>         },
>         {
>           "@language": "fr",
>           "@value": "France"
>         }
>       ]
>     },
>
>     {
>       "@id": "http://dbpedia.org/resource/Paris",
>       "@type": [
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "dbp-ont:Settlement",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
>       ],
>       "geo:lat": 48.8567,
>       "geo:long": 2.3508,
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
>       },
>       "rdfs:label": [
>         {
>           "@language": "en",
>           "@value": "Paris"
>         },
>         {
>           "@language": "fr",
>           "@value": "Paris"
>         }
>       ]
>     },
>     {
>       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.6017613,
>       "enhancer:end": 5,
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "Paris"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 0
>     },
>     {
>       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 1.0,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 0.25715446,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Vichy France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator":
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "enhancer:confidence": 0.1493264,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Paris Commune"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator":
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.99354976,
>       "enhancer:end": 18,
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 12
>     }
>   ]
> }
>
>
>
>
>
>
> On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com> wrote:
>
> > Hi Dileepa,
> >
> > Repository connectors have an abstraction that allows them to generate
> > compound documents (where a document has a primary identifier, and there
> > are subdocuments that share that primary identifier and have a secondary
> > identifier).  This sounds a bit like what you are describing.  Does Stanbol
> > work by decorating an existing document, or does it work by generating all
> > content for a document?
> >
> > Karl

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi Karl,

Thanks a lot for the pointer.

Stanbol doesn't update an existing document; it generates a new response
containing the requested enhancement details for each content enhancement
request. For example, for a request like "Paris is in France", Stanbol
returns the RDF response shown in [1].

In the Stanbol connector, enhancement artifacts such as TextAnnotations
and EntityAnnotations are extracted from the RDF response to generate the
entity abstractions, which are then added to the MCF repository document.
Currently the Stanbol connector adds these entity abstractions as JSON
strings to a multi-valued 'entities' field in the repository document, and
the SolrWrapper output connector parses that JSON to index the data in
separate Solr cores (primary documents, linked entities, and entity types
with their attributes).
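
To make the extraction step concrete, here is a minimal sketch in Python
(not the connector's actual Java code). It assumes a response in the shape
of the sample in [1], trimmed and with shortened IDs. The output field names
("uri", "label", "types", "start", "end") and the min_confidence threshold
are my own assumptions for illustration, not what the connector emits.

```python
import json

# A trimmed-down response in the same shape as the sample in [1].
# The enhancement IDs are shortened for readability.
SAMPLE_RESPONSE = """
{
  "@graph": [
    {
      "@id": "urn:enhancement-e9c9c187",
      "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
      "enhancer:selected-text": {"@language": "en", "@value": "France"},
      "enhancer:start": 12,
      "enhancer:end": 18
    },
    {
      "@id": "urn:enhancement-b2855552",
      "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
      "dc:relation": "urn:enhancement-e9c9c187",
      "enhancer:confidence": 1.0,
      "enhancer:entity-label": {"@language": "en", "@value": "France"},
      "enhancer:entity-reference": "http://dbpedia.org/resource/France",
      "enhancer:entity-type": ["dbp-ont:Country", "dbp-ont:Place"]
    }
  ]
}
"""


def extract_entities(response_text, min_confidence=0.5):
    """Return one JSON string per EntityAnnotation at or above min_confidence."""
    graph = json.loads(response_text)["@graph"]
    # Index TextAnnotations by @id so each EntityAnnotation can find its mention.
    mentions = {
        node["@id"]: node
        for node in graph
        if "enhancer:TextAnnotation" in node.get("@type", [])
    }
    entities = []
    for node in graph:
        if "enhancer:EntityAnnotation" not in node.get("@type", []):
            continue
        if node.get("enhancer:confidence", 0.0) < min_confidence:
            continue
        # dc:relation points from the EntityAnnotation to its TextAnnotation.
        mention = mentions.get(node.get("dc:relation"), {})
        entity = {
            "uri": node["enhancer:entity-reference"],
            "label": node["enhancer:entity-label"]["@value"],
            "types": node.get("enhancer:entity-type", []),
            "start": mention.get("enhancer:start"),
            "end": mention.get("enhancer:end"),
        }
        entities.append(json.dumps(entity, sort_keys=True))
    return entities


# Values that would go into the multi-valued 'entities' field.
entity_field_values = extract_entities(SAMPLE_RESPONSE)
```

Each string in entity_field_values is self-contained, so a consumer only
needs a JSON parser to read it. The confidence filter is one possible way to
drop weak candidates such as "Vichy France" in the full sample.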

Can we have a primary repository document and create subdocuments for
the extracted entities? Is it possible to generate subdocuments for a
repository document in a transformation connector?

Thanks.
Dileepa

[1] Sample Stanbol response

{
  "@context": {
    "dbp-ont": "http://dbpedia.org/ontology/",
    "dc": "http://purl.org/dc/terms/",
    "dc:created": {
      "@type": "xsd:dateTime"
    },
    "enhancer": "http://fise.iks-project.eu/ontology/",
    "enhancer:confidence": {
      "@type": "xsd:double"
    },
    "enhancer:end": {
      "@type": "xsd:int"
    },
    "enhancer:entity-reference": {
      "@type": "@id"
    },
    "enhancer:entity-type": {
      "@type": "@id"
    },
    "enhancer:extracted-from": {
      "@type": "@id"
    },
    "enhancer:start": {
      "@type": "xsd:int"
    },
    "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "foaf:depiction": {
      "@type": "@id"
    },
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "http://dbpedia.org/resource/France",
      "@type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing",
        "schema:Country",
        "schema:Place"
      ],
      "foaf:depiction": [
        "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
        "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
      ],
      "rdfs:comment": {
        "@language": "en",
        "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
      },
      "rdfs:label": [
        {
          "@language": "en",
          "@value": "France"
        },
        {
          "@language": "fr",
          "@value": "France"
        }
      ]
    },

    {
      "@id": "http://dbpedia.org/resource/Paris",
      "@type": [
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "dbp-ont:Settlement",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing",
        "schema:Place"
      ],
      "foaf:depiction": [
        "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
        "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
      ],
      "geo:lat": 48.8567,
      "geo:long": 2.3508,
      "rdfs:comment": {
        "@language": "en",
        "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
      },
      "rdfs:label": [

        {
          "@language": "en",
          "@value": "Paris"
        },
        {
          "@language": "fr",
          "@value": "Paris"
        }
      ]
    },
    {
      "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:TextAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.740Z",
      "dc:creator":
"org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
      "dc:type": "dbp-ont:Place",
      "enhancer:confidence": 0.6017613,
      "enhancer:end": 5,
      "enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "enhancer:selected-text": {
        "@language": "en",
        "@value": "Paris"
      },
      "enhancer:selection-context": {
        "@language": "en",
        "@value": "Paris is in France"
      },
      "enhancer:start": 0
    },
    {
      "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "enhancer:confidence": 1.0,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "France"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/France",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing"
      ],
      "enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "enhancer:confidence": 0.25715446,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "Vichy France"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing"
      ],
      "enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
      "enhancer:confidence": 0.1493264,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "Paris Commune"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "owl:Thing"
      ],
      "enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:TextAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.740Z",
      "dc:creator":
"org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
      "dc:type": "dbp-ont:Place",
      "enhancer:confidence": 0.99354976,
      "enhancer:end": 18,
      "enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "enhancer:selected-text": {
        "@language": "en",
        "@value": "France"
      },
      "enhancer:selection-context": {
        "@language": "en",
        "@value": "Paris is in France"
      },
      "enhancer:start": 12
    }
  ]
}






On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Dileepa,
>
> Repository connectors have an abstraction that allows them to generate
> compound documents (where a document has a primary identifier, and there
> are subdocuments that share that primary identifier and have a secondary
> identifier).  This sounds a bit like what you are describing.  Does Stanbol
> work by decorating an existing document, or does it work by generating all
> content for a document?
>
> Karl


Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
Hi Dileepa,

Repository connectors have an abstraction that allows them to generate
compound documents (where a document has a primary identifier, and there
are subdocuments that share that primary identifier and have a secondary
identifier).  This sounds a bit like what you are describing.  Does Stanbol
work by decorating an existing document, or does it work by generating all
content for a document?
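
As a rough illustration of the compound-document idea above, the sketch below models a primary document and its subdocuments that share one primary identifier and differ by a secondary identifier. The class names, the "#" separator, and the idea of mapping each extracted entity to a subdocument are all assumptions made for illustration; none of them are ManifoldCF classes or an agreed design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a compound-document key; not a ManifoldCF class.
final class ComponentKey {
    final String primaryId;   // identifier shared by the whole compound document
    final String secondaryId; // identifier of one subdocument; null for the primary document

    ComponentKey(String primaryId, String secondaryId) {
        this.primaryId = primaryId;
        this.secondaryId = secondaryId;
    }

    @Override
    public String toString() {
        // The "#" separator is an arbitrary choice for this sketch.
        return secondaryId == null ? primaryId : primaryId + "#" + secondaryId;
    }
}

final class CompoundDocument {
    // Builds one key for the primary document and one per entity subdocument.
    static List<ComponentKey> keysFor(String primaryId, List<String> entityIds) {
        List<ComponentKey> keys = new ArrayList<>();
        keys.add(new ComponentKey(primaryId, null));
        for (String entityId : entityIds) {
            keys.add(new ComponentKey(primaryId, entityId));
        }
        return keys;
    }
}
```

Whether this fits depends on the answer to the question above: it only makes sense if each entity is treated as content in its own right rather than as decoration on the existing document.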

Karl


On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <dj...@zaizi.com>
wrote:

> Hi All,
>
>
> While thanking you all for your input on Stanbol connector requirement, I
> would like to continue with modifying the Stanbol connector to be
> compatible with any output connector. If you guys can give some guidance on
> how the entity metadata should be added to the repository document I can
> modify the stanbol connector accordingly.
>
> From Rafa's comments, I gathered we can add the entity metadata to the
> repo.doc as key value pairs.
> However this idea is not yet clear to me. There could be 'N' number of
> entities in a document and each of them will have some common attributes
> such as name, id, type and specific attributes for particular entity type.
> I'm not clear on how to maintain that structure of N number of entities
> with their attributes in a repo.document as key value pairs and make them
> LDPath compatible for retrieval in an output connector.
>
> @Rafa
> If you can please elaborate on your suggestion it would be greatly helpful
> to me.
> All other suggestions are also welcome.
>
> Thanks,
> Dileepa

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Dileepa Jayakody <dj...@zaizi.com>.
Hi All,


Thank you all for your input on the Stanbol connector requirements. I
would like to continue modifying the Stanbol connector so that it is
compatible with any output connector. If you can give some guidance on
how the entity metadata should be added to the repository document, I can
modify the Stanbol connector accordingly.

From Rafa's comments, I gathered that we can add the entity metadata to the
repository document as key-value pairs.
However, this idea is not yet clear to me. A document could contain N
entities, and each of them will have some common attributes such as name,
id and type, plus attributes specific to its entity type.
I'm not clear on how to maintain that structure of N entities with their
attributes in a repository document as key-value pairs, or how to make them
LDPath-compatible for retrieval in an output connector.
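
One way to picture the structure in question, as a minimal sketch and not an agreed design: flatten the N entities into parallel multi-valued fields, where position i in every field belongs to entity i. The Entity class, the field-name prefixes, and the empty-string padding are all hypothetical, and the sketch assumes the target document model accepts multi-valued string fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for one extracted entity; not a ManifoldCF or Stanbol class.
final class Entity {
    final String id;
    final String name;
    final String type;
    final Map<String, String> extra; // attributes specific to the entity type

    Entity(String id, String name, String type, Map<String, String> extra) {
        this.id = id;
        this.name = name;
        this.type = type;
        this.extra = extra;
    }
}

final class EntityFlattener {
    // Flattens N entities into parallel multi-valued fields. Index i in every
    // array refers to entity i; attributes an entity lacks are padded with "".
    static Map<String, String[]> flatten(List<Entity> entities) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (int i = 0; i < entities.size(); i++) {
            Entity e = entities.get(i);
            put(columns, "entity_id", i, e.id);
            put(columns, "entity_name", i, e.name);
            put(columns, "entity_type", i, e.type);
            for (Map.Entry<String, String> attr : e.extra.entrySet()) {
                put(columns, "entity_attr_" + attr.getKey(), i, attr.getValue());
            }
        }
        Map<String, String[]> fields = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> column : columns.entrySet()) {
            List<String> values = column.getValue();
            while (values.size() < entities.size()) {
                values.add(""); // pad for later entities that lack this attribute
            }
            fields.put(column.getKey(), values.toArray(new String[0]));
        }
        return fields;
    }

    private static void put(Map<String, List<String>> columns, String key, int index, String value) {
        List<String> values = columns.computeIfAbsent(key, k -> new ArrayList<>());
        while (values.size() < index) {
            values.add(""); // pad for earlier entities that lack this attribute
        }
        values.add(value);
    }
}
```

The padding keeps the arrays aligned so an output connector can rebuild each entity by index. The cost is that the grouping is only implicit, which is the weakness the question above is pointing at.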

@Rafa
If you could elaborate on your suggestion, it would be very helpful to me.
All other suggestions are also welcome.

Thanks,
Dileepa


On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <da...@gmail.com> wrote:

> I, too, agree.  Somebody will need to turn this connector into one that
> plays by the rules.  It may be possible for someone on the team here to do
> that, but it won't be me; I'm seriously overextended at the moment.  It
> would be best if someone who knew the connector well could do the necessary
> work.
>
> Karl


Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Karl Wright <da...@gmail.com>.
I, too, agree.  Somebody will need to turn this connector into one that
plays by the rules.  It may be possible for someone on the team here to do
that, but it won't be me; I'm seriously overextended at the moment.  It
would be best if someone who knew the connector well could do the necessary
work.

Karl


On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rh...@gmail.com> wrote:

> I must agree with Antonio. When I started to work on this I was expecting
> the connector to work by just extracting the entities and entities metadata
> and put them as plain metadata of the documents, probably following LDPATH
> queries configuration
>
>
>
>
> This is probably ok for Sensefy but I don’t think this could be suitable
> to be included in the project. But this is only my opinion. Of course, a
> version of the connector that fully respect the ManifoldCF architecture
> would be more than welcome in my opinion

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Rafa Haro <rh...@gmail.com>.
I must agree with Antonio. When I started working on this, I expected the
connector to simply extract the entities and their metadata and add them to
the documents as plain metadata, probably driven by configured LDPath
queries.
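
To make that expectation concrete, here is a minimal sketch of a configuration-driven mapping from entity properties to plain document metadata. Plain property names stand in for LDPath expressions, and no LDPath library is used; the class name, the field names, and the map-based entity representation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataMapper {
    // mapping: output field name -> entity property to read. The property name
    // is a simplified stand-in for an LDPath expression.
    static Map<String, String[]> apply(Map<String, String> mapping,
                                       List<Map<String, String>> entities) {
        Map<String, String[]> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : mapping.entrySet()) {
            List<String> values = new ArrayList<>();
            for (Map<String, String> entity : entities) {
                String value = entity.get(rule.getValue());
                if (value != null) {
                    values.add(value); // entities lacking the property are skipped
                }
            }
            metadata.put(rule.getKey(), values.toArray(new String[0]));
        }
        return metadata;
    }
}
```

Because the result is only field names and string values, any output connector could consume it without knowing anything about Stanbol.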

This is probably OK for Sensefy, but I don't think it is suitable for
inclusion in the project. That is only my opinion, though. Of course, a
version of the connector that fully respects the ManifoldCF architecture
would be more than welcome.

On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
<ad...@gmail.com> wrote:

> Hi
> The removal of the SolrWrapper is a must. It was a requirement for an
> internal project which has nothing to do here with a normal operation of
> Manifold, so forcing the users to use Solr does not fit the Manifold
> philosophy.
> In my opinion, at this moment, a Stanbol connector with such a big
> dependency which will not fit almost any use case is not very useful.
> You should think a way to convert Stanbol connector into a normal
> Transformation connector without assuming that a specific output connector
> will be used.
> Regards

Re: ManifoldCF transformation connector for Apache Stanbol

Posted by Antonio David Pérez Morales <ad...@gmail.com>.
Hi

Removing the SolrWrapper dependency is a must. It was a requirement for an
internal project and has nothing to do with the normal operation of
Manifold, so forcing users to use Solr does not fit the Manifold
philosophy.

In my opinion, a Stanbol connector with such a big dependency, which will
fit almost no use case, is not very useful at the moment.

You should think of a way to convert the Stanbol connector into a normal
transformation connector that does not assume a specific output connector
will be used.

Regards


2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <dj...@zaizi.com>:

> Hi guys,
>
> I have developed a Stanbol connector for MCF. You can check it out from our
> github repo here:
>
> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>
> It requires the SolrWrapper output connector which indexes enhanced
> documents, entities and entityTypes in separate Solr cores. Basically it
> requires 3 separate solr cores configured with a specific Solr schema for
> primary documents, entities and entityTypes separately. This was done for
> our specific use-case.
>
> The SolrWrapper code is here :
>
> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>
> Perhaps we can discuss and remove the Stanbol connector's dependency with
> SolrWrapper and have it working with any output connector.
> Please note that the Stanbol connector currently has a bug in the UI
> (editSpecification) which I'm working on at the moment. After fixing that I
> will update here. And also I will provide documentations for configuring
> the connector.
>
> Thanks,
> Dileepa