
Posted to user@manifoldcf.apache.org by Joshua Dunham <jo...@gmail.com> on 2015/07/05 02:55:24 UTC

Output Connector - Apache Marmotta

Hi ManifoldCF Users (and Devs)

    I'm wondering if ManifoldCF can work for my use case. I have some
random MySQL and Oracle DBs that I would like to connect to, extract
certain known bits of info from, format each a certain way, and
then store in Apache Marmotta [1]. Marmotta is an RDF triple
store for linked data, so I would need to parse the MySQL and
Oracle info and store it in a linked format. Creating the
relationships etc. is no problem for me; I just need something that
lets me do this specifically.

    From what I've read, ManifoldCF can connect to MySQL and Oracle
(via non-distributed libraries) and store the results in several
target data stores. What isn't clear is:
(A) How I define the data to grab, whether with some SQL statement or the like.
(B) How to use this data as individual variables which I can arrange
into a linked data relationship (a ManifoldCF mapping module?)
(C) How difficult it would be to connect to Marmotta's web service(s).
I'm not familiar with the exact mechanism, but I saw ManifoldCF has
support for Elasticsearch, so maybe I could put something together that
talks to Marmotta.

Would this be possible? If so, could someone point me in the right direction?

  Thanks!
   -Joshua


[1] - http://marmotta.apache.org/index.html
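A minimal sketch of the "linked format" step, assuming each database row has already become a set of key-value metadata (which is how Rafa describes a RepositoryDocument further down the thread). It builds a SPARQL 1.1 INSERT DATA update from one row. The function names, the example.org IRIs and the one-predicate-per-field vocabulary are hypothetical, and sending the update to Marmotta (via SPARQL update queries or Marmotta's Java client, as Rafa mentions) is left out.

```python
def sparql_literal(value: str) -> str:
    """Quote a string as a SPARQL literal, escaping the characters that
    cannot appear raw inside double quotes."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_sparql_insert(subject_iri: str, metadata: dict, vocab: str) -> str:
    """Turn one record's metadata into a SPARQL 1.1 INSERT DATA update.

    Each metadata field becomes one triple:
        <subject_iri> <vocab + field name> "value" .
    Field names are assumed to be safe to append to the vocabulary IRI
    (hypothetical convention for this sketch).
    """
    triples = [
        f"  <{subject_iri}> <{vocab}{name}> {sparql_literal(str(value))} ."
        for name, value in sorted(metadata.items())
    ]
    return "INSERT DATA {\n" + "\n".join(triples) + "\n}"


if __name__ == "__main__":
    # Prints an INSERT DATA block with one triple per metadata field.
    print(to_sparql_insert("http://example.org/addresses/1",
                           {"city": "New York", "country": "USA"},
                           "http://example.org/vocab#"))
```

Sorting the fields keeps the generated update deterministic, which makes it easier to diff or test; a real mapping would likely use an established vocabulary rather than one predicate per column name.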

Re: Output Connector - Apache Marmotta

Posted by Karl Wright <da...@gmail.com>.

Hi Joshua,

"What is not apparent is how to use the metadata adjuster to interact with
the variables in the Data query. I've followed the guide and made a simple
hello, False, ${city} statement but the only bits that are written into the
file are the contents of the $DATACOLUMN variable. So, given a simple
address book in a database with columns id, street, city, region, country,
post code, latitude, longitude ... how should I approach making such a data
query? "

(1) In the JDBC connector, every column that you include in your data
query, other than the special ones like $DATACOLUMN, is treated as a
metadata value, and the name of the metadata value is the name of the
column.  Behavior differs somewhat between JDBC drivers; for example,
some drivers convert column names to upper case.  See:
https://manifoldcf.apache.org/release/release-2.1/en_US/end-user-documentation.html#jdbcrepository
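Karl's point (1), applied to Joshua's address book, can be sketched outside ManifoldCF with Python's built-in sqlite3 module. This only models the behavior described above and is not the connector's code: the aliases mcf_id, mcf_url and mcf_data are hypothetical stand-ins for the special columns ($(URLCOLUMN), $(DATACOLUMN) and friends), the post_code column name is an assumption, and sqlite's || operator replaces MySQL's CONCAT().

```python
import sqlite3

# Hypothetical aliases for the special columns. In a real ManifoldCF data
# query these positions use $(URLCOLUMN), $(DATACOLUMN) and so on.
SPECIAL = {"mcf_id", "mcf_url", "mcf_data"}

# sqlite's || operator stands in for MySQL's CONCAT(); {idlist} stands in
# for the list of document IDs the connector substitutes.
DATA_QUERY = """
SELECT id AS mcf_id,
       'addresses/' || id AS mcf_url,
       street || ', ' || city AS mcf_data,
       street, city, region, country, post_code, latitude, longitude
FROM addresses
WHERE id IN ({idlist})
"""


def fetch_documents(conn, ids):
    """Run the data query and split each row into id, url, content and
    metadata, treating every non-special column as a metadata field
    named after the column."""
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.execute(DATA_QUERY.format(idlist=placeholders), list(ids))
    names = [col[0] for col in cur.description]
    docs = []
    for row in cur.fetchall():
        values = dict(zip(names, row))
        docs.append({
            "id": values["mcf_id"],
            "url": values["mcf_url"],
            "content": values["mcf_data"],
            "metadata": {k: str(v) for k, v in values.items()
                         if k not in SPECIAL},
        })
    return docs


# Joshua's address book, with "post code" assumed to be a post_code column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, "
             "city TEXT, region TEXT, country TEXT, post_code TEXT, "
             "latitude REAL, longitude REAL)")
conn.execute("INSERT INTO addresses VALUES "
             "(1, '350 5th Ave', 'New York', 'NY', 'USA', '10118', 40.75, -73.99)")
docs = fetch_documents(conn, [1])
print(docs[0]["url"])                 # addresses/1
print(docs[0]["metadata"]["city"])    # New York
```

With a JDBC driver that upper-cases column names, the equivalent metadata name would be CITY rather than city, so adjuster rules would likely need to match the driver's casing.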

(2) The Metadata Adjuster takes what the JDBC connector outputs, and allows
you to manipulate the metadata values according to certain rules.  The
documentation is here:
https://manifoldcf.apache.org/release/release-2.1/en_US/end-user-documentation.html#metadataadjuster
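Joshua's end goal (city:New_York) is a per-field rewrite of those metadata values. The function below only emulates that effect on a plain Python dict so the intended transformation is unambiguous; prefix_fields is a hypothetical name, and the Metadata Adjuster's own rule syntax is the one in the documentation linked above, not this code.

```python
def prefix_fields(metadata, fields):
    """Return a copy of metadata in which each named field's value is
    rewritten as '<field>:<value>', with spaces turned into underscores.
    Fields not listed, or not present, are left untouched."""
    adjusted = dict(metadata)
    for name in fields:
        if name in adjusted:
            adjusted[name] = f"{name}:{adjusted[name].replace(' ', '_')}"
    return adjusted


row_metadata = {"city": "New York", "country": "USA"}
print(prefix_fields(row_metadata, ["city"]))
# {'city': 'city:New_York', 'country': 'USA'}
```

Returning a copy rather than mutating the input mirrors a pipeline stage that hands modified metadata downstream while leaving the upstream values alone.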

If you don't seem to be making any progress, please provide the exact data
query you are using, and a screenshot or paste of the job view that
includes the metadata adjuster specification, so I can see what you are
trying to do more precisely.

Karl


On Wed, Sep 9, 2015 at 3:48 PM, Joshua Dunham <jo...@gmail.com>
wrote:

Re: Output Connector - Apache Marmotta

Posted by Joshua Dunham <jo...@gmail.com>.

Could you shed any light on the middle part?

=====

What is not apparent is how to use the metadata adjuster to interact with the variables in the Data query. I've followed the guide and made a simple hello, False, ${city} statement, but the only bits written into the file are the contents of the $DATACOLUMN variable. So, given a simple address book in a database with columns id, street, city, region, country, post code, latitude, longitude... how should I approach writing such a data query?

My real use cases will be much more complicated, so I'm wondering if you have some explanation of how I should use that field, and maybe a small SQL snippet example with those columns? :) My end goal is to have a column called out and then use the metadata adjuster to simply prepend each column's value with a string. So if the city is 'New York', it would write out city:New_York or the like.

=====
Thx in advance!

-J

> On Sep 9, 2015, at 1:53 PM, Karl Wright <da...@gmail.com> wrote:

Re: Output Connector - Apache Marmotta

Posted by Karl Wright <da...@gmail.com>.

Hi Joshua,

"My question is: why would I need to set up different transform modules?
Since there is no real config to do in the transform connector (all the
good stuff seems to be under Task config), I'm not sure why I would need to
make more than one rather than keep reusing one by changing the transform
settings under the task."

While the Metadata Adjuster transformer has no configuration, the model
that MCF uses for transformers is just like the model it uses for other
kinds of connectors.  Pretend for a moment that you needed to call an
external system to do content extraction, and you will see the point.
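Karl's point is that transformation connections, like repository and output connections, split settings into a connection level that is configured once and a job level that each job specifies. The plain-Python model below only illustrates that split and is not ManifoldCF's API: the class names, the rules dictionary and the extractor endpoint are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TransformationConnection:
    """Settings that describe how to reach a transformer, defined once.
    For a metadata adjuster this is effectively empty; for a content
    extractor backed by an external service it would hold that service's
    location (the endpoint used below is a made-up example)."""
    name: str
    endpoint: Optional[str] = None


@dataclass
class JobTransformationStage:
    """Per-job specification: which connection to use and the rules
    this particular job applies."""
    connection: TransformationConnection
    rules: dict = field(default_factory=dict)


adjuster = TransformationConnection("metadata-adjuster")
extractor = TransformationConnection("extractor", "http://extractor.example:8080/")

# Two jobs can share one connection while applying different rules...
job_a = JobTransformationStage(adjuster, {"prefix": ["city"]})
job_b = JobTransformationStage(adjuster, {"prefix": ["country"]})
# ...and a job that needs extraction points at a different connection.
job_c = JobTransformationStage(extractor)
```

Under this model, one Metadata Adjuster connection can serve many jobs; a second connection only earns its keep when the connection-level settings differ, as they would for two different external extraction services.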

Thanks,
Karl


On Wed, Sep 9, 2015 at 12:55 PM, Joshua Dunham <jo...@gmail.com>
wrote:

Re: Output Connector - Apache Marmotta

Posted by Joshua Dunham <jo...@gmail.com>.

Hi Karl, Rafa,

  I finally had some time to work on this, and I have a scheme which (largely) works very well, but I have a question, one stumbling block, and one comment.

First, my environment consists of ManifoldCF v2.1, MariaDB (into which I imported a small CSV for testing), and Marmotta 3.3.

The really interesting bits are in specifying the Task. I have MySQL input -> metadata adjuster -> filesystem output. MySQL is set up, the connection shows as OK, and on starting the job it does write files to the output folder.

Getting the list of IDs works well, no issue there, and I'm not using versioning or access tokens yet. The stumbling block has to do with setting up the Data Query and the best use of the $URL and $DATA variables. First: I've hijacked $URL with roughly CONCAT("addresses/", id) AS $(URLCOLUMN), which has the effect of creating a folder called addresses in the root of the output folder. Inside the addresses folder it makes numbered files corresponding to the row IDs. I can point the root folder path at the Marmotta import directory and even use the context templating feature (setting 'addresses' as the real context name). That's a really slick out-of-the-box integration hack.

What is not apparent is how to use the metadata adjuster to interact with the variables in the Data query. I've followed the guide and made a simple hello, False, ${city} statement, but the only bits written into the file are the contents of the $DATACOLUMN variable. So, given a simple address book in a database with columns id, street, city, region, country, post code, latitude, longitude... how should I approach writing such a data query? My real use cases will be much more complicated, so I'm wondering if you have some explanation of how I should use that field, and maybe a small SQL snippet example with those columns? :) My end goal is to have a column called out and then use the metadata adjuster to simply prepend each column's value with a string. So if the city is 'New York', it would write out city:New_York or the like.

=====

The comment is about a bit of sample data which could ship with the source. It would be very educational if there were a complex but real configuration of ManifoldCF that links to a sqlite3 file as input and maybe uses a different table in the same database as output.

=====

My question is: why would I need to set up different transform modules? Since there is no real config to do in the transform connector (all the good stuff seems to be under Task config), I'm not sure why I would need to make more than one rather than keep reusing one by changing the transform settings under the task.

Thank you!

J

> On 5 July 2015 at 17:27, Karl Wright <da...@gmail.com> wrote:
> Hi Joshua,
> 
> My take:
> 
> --> (A) How I define the data to grab, whether some SQL statement or the
> like. <--
> 
> Have a look at the user documentation here:
> https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
> 
> It should be pretty clear how you define what you are looking for.
> 
> --> (B) How to use this data as individual variables which I can arrange
> into a linked data relationship (ManifoldCF mapping module?) <--
> 
> Rafa's previous reply about the RepositoryDocument is appropriate. 
> Basically, an output connector will be handed one of those objects for every
> MCF "document".  The javadoc for it is here:
> 
> https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
> 
> --> (C) How difficult would it be to connect to Marmotta's webservice(s).
> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> support for elasticsearch so maybe I could put something together that
> talks to Marmotta..<--
> 
> You can readily write your own output connector.  There's a book, in fact,
> describing how to do that.  See:
> 
> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> 
> ... and read Chapter 9.
> 
> Thanks,
> Karl

Re: Output Connector - Apache Marmotta

Posted by Karl Wright <da...@gmail.com>.

Hi Joshua,

My take:

--> (A) How I define the data to grab, whether some SQL statement or the
like. <--

Have a look at the user documentation here:
https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository

It should be pretty clear how you define what you are looking for.

--> (B) How to use this data as individual variables which I can arrange
into a linked data relationship (ManifoldCF mapping module?) <--

Rafa's previous reply about the RepositoryDocument is appropriate.
Basically, an output connector will be handed one of those objects for
every MCF "document".  The javadoc for it is here:

https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
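To make that concrete, here is a rough Java sketch of what a hypothetical Marmotta output connector might do with each document's metadata: emit one N-Triples statement per field value. A plain Map<String, String[]> stands in for the document's metadata. The real class exposes fields through methods such as getFields() and getFieldAsStrings(), so check the javadoc above. The subject URI and predicate namespace are made-up examples, and field names are assumed to be IRI-safe; a name like "post code" would need encoding first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the per-document step of a hypothetical Marmotta output connector.
// A plain Map stands in for RepositoryDocument metadata (field -> values).
class MetadataToNTriples {

    // Escapes the characters that must not appear raw inside an
    // N-Triples string literal: backslash, double quote, LF and CR.
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // One triple per (field, value) pair: the document URI is the subject,
    // predicateBase + field name is the predicate (field names assumed
    // IRI-safe), and each value becomes a plain string literal.
    static List<String> toTriples(String documentUri, String predicateBase,
                                  Map<String, String[]> metadata) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String[]> field : metadata.entrySet()) {
            for (String value : field.getValue()) {
                lines.add("<" + documentUri + "> <" + predicateBase + field.getKey()
                        + "> \"" + escapeLiteral(value) + "\" .");
            }
        }
        return lines;
    }
}
```

Multi-valued fields simply produce several triples with the same subject and predicate, which maps naturally onto RDF.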

--> (C) How difficult would it be to connect to Marmotta's webservice(s).
I'm not familiar with the exact mechanism, but I saw ManifoldCF has
support for elasticsearch so maybe I could put something together that
talks to Marmotta..<--

You can readily write your own output connector.  There's a book, in fact,
describing how to do that.  See:

https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs

... and read Chapter 9.
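For a rough idea of the transport side of such a connector, the Java 11 sketch below builds (but does not send) an HTTP POST carrying a SPARQL 1.1 Update body. It uses the application/sparql-update media type from the SPARQL 1.1 Protocol. The endpoint URL is an assumption, so check where your Marmotta instance exposes SPARQL Update before relying on it. The wiring into the connector's document-handling method from Chapter 9 is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Sketch: the request a hypothetical Marmotta output connector might send.
class MarmottaUpdateRequest {

    // Assumed endpoint; the actual SPARQL update URL of your install may differ.
    static final String DEFAULT_ENDPOINT = "http://localhost:8080/marmotta/sparql/update";

    // Builds (but does not send) a POST whose body is a SPARQL 1.1 Update
    // string, sent "directly" with Content-Type application/sparql-update.
    static HttpRequest build(String endpoint, String sparqlUpdate) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(sparqlUpdate))
                .build();
    }
}
```

Sending it would be a call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), followed by a check of the response status code.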

Thanks,
Karl


On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <jo...@gmail.com>
wrote:

> That sounds promising. Would you recommend ManifoldCF for this? If so,
> do you know of any resources which I can use to get up to speed with
> using it in this way?
>
> -J

Re: Output Connector - Apache Marmotta

Posted by Joshua Dunham <jo...@gmail.com>.

That sounds promising. Would you recommend ManifoldCF for this? If so,
do you know of any resources which I can use to get up to speed with
using it in this way?

-J

On 4 July 2015 at 21:48,  <rh...@gmail.com> wrote:
> Hi Joshua,
>
> The ManifoldCF unit logic in terms of indexing is the Repository Document
> which, simplifying a lot, model a document composed by content plus metadata
> (key-value). It should be relative easy to tripifly that structure and push
> it to Marmotta using SPARQL update queries or Marmotta’s java client for
> adding resources.
> The Generic Database connector uses a set of queries for crawling the
> database. You should have to use that queries to get you data. I’m not
> completely sure if each record result is converted directly to a Repository
> Document, that is something that I would need to check.
>
> Hope that helps,
> Cheers, Rafa

Re: Output Connector - Apache Marmotta

Posted by rh...@gmail.com.

Hi Joshua, 

The basic unit ManifoldCF works with for indexing is the RepositoryDocument, which, simplifying a lot, models a document composed of content plus metadata (key-value pairs). It should be relatively easy to turn that structure into triples and push it to Marmotta using SPARQL Update queries or Marmotta's Java client for adding resources.
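A small Java sketch of the SPARQL Update route: wrap already-serialized triple statements (for example, N-Triples lines) in an INSERT DATA request, optionally targeting a named graph. The graph URI and the input triples are illustrative only. The Marmotta Java client option is not shown.

```java
import java.util.List;

// Sketch: wraps triple statements into a SPARQL 1.1 INSERT DATA request body.
class InsertDataBuilder {

    // triples: statements already serialized as N-Triples lines ending in " .".
    // graphUri: optional named graph; null inserts into the default graph.
    static String insertData(List<String> triples, String graphUri) {
        StringBuilder sb = new StringBuilder("INSERT DATA {\n");
        String indent = "  ";
        if (graphUri != null) {
            sb.append("  GRAPH <").append(graphUri).append("> {\n");
            indent = "    ";
        }
        for (String triple : triples) {
            sb.append(indent).append(triple).append('\n');
        }
        if (graphUri != null) {
            sb.append("  }\n");
        }
        return sb.append("}").toString();
    }
}
```

Keeping one named graph per source table or database would make it easy to clear and reload that data later.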

The Generic Database connector uses a set of queries to crawl the database, and you would use those queries to get your data. I'm not completely sure whether each result record is converted directly into a RepositoryDocument; that is something I would need to check.

Hope that helps, 

Cheers, Rafa

On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <jo...@gmail.com>
wrote:

> Hi ManifoldCF Users (and Devs)
>     I'm wondering if ManifoldCF can work in my use case. I have some
> random mySQL and Oracle DB's that I would like to connect to and
> extract certain known bits of info, format them each a certain way and
> then store the info in Apache Marmotta [1]. Marmotta is an RDF triple
> store for linked data so I would need to parse and store the mySQL and
> Oracle DB's info into a linked format, which is no problem for me to
> create the relationships etc, I just need something that would let me
> specifically do this.
>     From what I've read, ManifoldCF can connect to mySQL and Oracle
> (via non-distributed libraries), and store the results out in several
> target data stores. What isn't clear is
> (A) How I define the data to grab, whether some SQL statement or the like.
> (B) How to use this data as individual variables which I can arrange
> into a linked data relationship (ManifoldCF mapping module?)
> (C) How difficult would it be to connect to Marmotta's webservice(s).
> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> support for elasticsearch so maybe I could put something together that
> talks to Marmotta..
> Would this be possible? If so, could someone point me in the right direction?
>   Thanks!
>    -Joshua
> [1] - http://marmotta.apache.org/index.html