Posted to dev@rya.apache.org by Geoffry Roberts <th...@gmail.com> on 2018/03/14 21:28:10 UTC

Loading Rya via Map/Reduce: is this the best I can do?

All,

Am I doing things the best way?

I have a pile of data that I need to load into Rya.  I must first convert
it into RDF, then do the load.  I am using map/reduce because I have a lot
of data.

I have an HDFS directory full of RDF in N-Triples format.
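
Each file holds one triple per line, e.g.:

<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .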

I have a mapper like this:

protected void map(LongWritable key, RyaStatementWritable value, Context ctx)
        throws IOException, InterruptedException {
    // RyaStatementWritable gives me a RyaStatement like this:
    RyaStatement ryaStatement = value.getRyaStatement();

    // At this point I find myself having to convert the RyaStatement
    // into an OpenRDF Statement like this:
    Sail ryaSail = RyaSailFactory.getInstance(conf);
    ValueFactory vf = ryaSail.getValueFactory();
    Statement stmt = vf.createStatement(vf.createURI(sS), vf.createURI(sP),
            vf.createURI(sO));

    // NullWritable.get() rather than a raw null, which Hadoop cannot serialize.
    ctx.write(NullWritable.get(), stmt);
}
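
A lighter-weight variant might be to skip the Sail entirely, if a conversion
utility exists for this (a sketch, assuming
org.apache.rya.api.resolver.RyaToRdfConversions provides a convertStatement
method):

protected void map(LongWritable key, RyaStatementWritable value, Context ctx)
        throws IOException, InterruptedException {
    // Convert directly, without standing up a Sail just for its ValueFactory.
    Statement stmt = RyaToRdfConversions.convertStatement(value.getRyaStatement());
    ctx.write(NullWritable.get(), stmt);
}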

In my reducer, I use AccumuloLoadStatements to load Rya like this:

protected void reduce(NullWritable key, Iterable<Statement> stmts,
        Reducer<NullWritable, Statement, NullWritable, NullWritable>.Context ctx)
        throws IOException, InterruptedException {
    // No super.reduce() here: the default implementation would consume the
    // iterable and re-emit every value before we could load it.
    AccumuloLoadStatements load = ...omitted for brevity...

    try {
        load.loadStatements(instance, stmts);
    } catch (RyaClientException e) {
        log.error("Failed to load statements into Rya", e);
    }
}


Thanks

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Loading Rya via Map/Reduce: is this the best I can do?

Posted by Geoffry Roberts <th...@gmail.com>.
Yup, it was a visibility issue.  Silly me, I forgot all about that.  I'm
just now getting back to Accumulo following a two-year hiatus.

Ugh! Scan work good now. Me see everything.  Life is good.

Thanks for the help.

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Loading Rya via Map/Reduce: is this the best I can do?

Posted by Jeff Dasch <hc...@gmail.com>.
It is possible that visibilities are the culprit.  Does your Accumulo user
have the same authorizations you specified on the command line for the
RdfFileInputTool?   By CL scan, you're referring to an Accumulo Shell scan,
right?  You can use the Accumulo Shell command "getauths" to verify your
user's authorizations.
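
For example (an illustrative shell session; the user, table, and auth names
here are placeholders):

getauths -u rya_user
setauths -u rya_user -s U,FOUO
scan -t rya_spo -s U,FOUO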


Re: Loading Rya via Map/Reduce: is this the best I can do?

Posted by Geoffry Roberts <th...@gmail.com>.
RdfFileInputTool worked, thanks for the help.

Do I have a visibility problem?

I ran the tool and it showed 47 records were inserted--good.  I see the
tables were created as expected with the right prefix.

But when I attempt a CL scan, I get one line out that appears to be telling
me which version of Rya I am using.   I did both the load and the scan as
the same user.

Not sure what to make of this.

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Loading Rya via Map/Reduce: is this the best I can do?

Posted by Jeff Dasch <hc...@gmail.com>.
Geoffry,

Take a look at the RdfFileInputTool [1] in the rya.mapreduce module.  It
doesn't look like the shaded jar was uploaded to Maven, so you will likely
need to build that artifact yourself by including the "-P mr" profile when
building Rya.
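
Building it might look something like this from the root of the Rya source
tree (a guess at the exact invocation; adjust the module path and flags to
your checkout):

mvn clean package -pl mapreduce -am -P mr -DskipTests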

There are instructions for loading data with the RdfFileInputTool here [2],
but they appear to be out of date.  I haven't tried it recently, but this
command, based on the unit test [3], should work:

hadoop jar target/rya.mapreduce-3.2.12-shaded.jar \
  org.apache.rya.accumulo.mr.tools.RdfFileInputTool \
  -Dac.zk=zoo1,zoo2,zoo3 -Dac.instance=accumulo -Dac.username=root \
  -Dac.pwd=password -Dac.auth=auths -Dac.cv=auths -Drdf.tablePrefix=rya_ \
  -Drdf.format=N-Triples /hdfs/path/to/triplefiles


[1]
https://github.com/apache/incubator-rya/blob/master/mapreduce/src/main/java/org/apache/rya/accumulo/mr/tools/RdfFileInputTool.java
[2]
https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/loaddata.md
[3]
https://github.com/apache/incubator-rya/blob/master/mapreduce/src/test/java/org/apache/rya/accumulo/mr/tools/RdfFileInputToolTest.java


