Posted to dev@rya.apache.org by "pranav.puri" <pr...@orkash.com> on 2016/10/17 10:43:25 UTC

regarding the ingestion of ttl files

Hi everyone,

I am getting an exception while ingesting the infobox properties Turtle file from the DBpedia dataset into Rya.
The file size is 9.8 GB.
Here is the exception:
Caused by: org.openrdf.repository.RepositoryException: org.openrdf.sail.SailException: mvm.rya.api.persist.RyaDAOException: java.io.IOException: mvm.rya.api.resolver.triple.TripleRowResolverException: mvm.rya.api.resolver.RyaTypeResolverException: Exception occurred serializing data[1000000000000]
	at org.openrdf.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:287)
	at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:469)
	at mvm.rya.rdftriplestore.utils.CombineContextsRdfInserter.handleStatement(CombineContextsRdfInserter.java:137)
	at org.openrdf.rio.turtle.TurtleParser.reportStatement(TurtleParser.java:1081)
	at org.openrdf.rio.turtle.TurtleParser.parseObject(TurtleParser.java:482)
	at org.openrdf.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:405)
	at org.openrdf.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:377)
	at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:362)
	at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:250)
	at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:205)
	at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:148)
	at org.openrdf.repository.util.RDFLoader.loadInputStreamOrReader(RDFLoader.java:325)
	at org.openrdf.repository.util.RDFLoader.load(RDFLoader.java:222)
	at mvm.rya.rdftriplestore.RyaSailRepositoryConnection.add(RyaSailRepositoryConnection.java:61)

Regards
Pranav


Re: regarding the ingestion of ttl files

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
I'm sorry, but bulk ingest won't help.  The issue occurs when Rya persists
data to the datastore.  This mechanism is the same for both bulk ingest and
the simple Sail Java client.

You probably want the Distribution Zip.  It's kinda large, but it should
have everything you need.

Since we don't have an official release, try downloading a snapshot from
[1] or our latest release candidate from [2].  Let us know if you run into
any issues or if you find any strange behavior along the way.

--Aaron

[1]
https://repository.apache.org/content/repositories/snapshots/org/apache/rya/rya.indexing.example/3.2.10-incubating-SNAPSHOT/rya.indexing.example-3.2.10-incubating-20161013.201206-1-distribution.zip

[2]
https://repository.apache.org/content/groups/staging/org/apache/rya/rya.indexing.example/3.2.10-incubating/rya.indexing.example-3.2.10-incubating-distribution.zip




On Wed, Oct 19, 2016 at 8:24 AM pranav.puri <pr...@orkash.com> wrote:

Thanks for filing the ticket.

Will I be able to overcome this problem if I ingest this file with the bulk
ingest jar using MapReduce?
I was also trying to run the MapReduce job using the example given
in the Git documentation, but
on building the project I could not find
accumulo.rya-3.0.4-SNAPSHOT-shaded.jar in the target folder.

Where can I find this jar?

Regards
Pranav


Re: regarding the ingestion of ttl files

Posted by "pranav.puri" <pr...@orkash.com>.
Thanks for filing the ticket.

Will I be able to overcome this problem if I ingest this file with the bulk
ingest jar using MapReduce?
I was also trying to run the MapReduce job using the example given
in the Git documentation, but
on building the project I could not find
accumulo.rya-3.0.4-SNAPSHOT-shaded.jar in the target folder.

Where can I find this jar?

Regards
Pranav

On Monday 17 October 2016 07:16 PM, Aaron D. Mihalik wrote:
> Thanks Pranav.
>
> I verified that this is an issue with Rya, and filed a ticket here [1].
> The ticket includes some very simple example code that reproduces the issue.
>
> Basically, Rya is calling Java's Integer.parseInt("1000000000000"), and
> that is throwing an exception.
>
> A workaround would be to remove the offending data from the DBpedia dataset or
> modify AccumuloRyaDAO.commit(AccumuloRyaDAO.java:291) to discard that exception
> and move on with the ingest.  Both workarounds will result in data loss.
>
> --Aaron


Re: regarding the ingestion of ttl files

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
Thanks Pranav.

I verified that this is an issue with Rya, and filed a ticket here [1].
The ticket includes some very simple example code that reproduces the issue.

Basically, Rya is calling Java's Integer.parseInt("1000000000000"), and
that is throwing an exception.
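
To illustrate the failure described above: the value 1000000000000 is larger
than Integer.MAX_VALUE (2147483647), so Integer.parseInt rejects it with a
NumberFormatException. The following is a minimal standalone sketch, not the
reproduction code from the ticket; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical demo class; it is not part of Rya.
class ParseIntOverflow {

    /** Returns true if the decimal string can be parsed as a 32-bit Java int. */
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for non-numeric input and for values outside the int range.
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "1000000000000"; // the value reported in the exception

        System.out.println(fitsInInt(value));       // prints false
        System.out.println(Long.parseLong(value));  // prints 1000000000000
        System.out.println(new BigInteger(value));  // prints 1000000000000
    }
}
```

The same string parses without error as a long or a BigInteger, which is
consistent with the problem being the 32-bit range rather than the data itself.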

A workaround would be to remove the offending data from the DBpedia dataset or
modify AccumuloRyaDAO.commit(AccumuloRyaDAO.java:291) to discard that exception
and move on with the ingest.  Both workarounds will result in data loss.
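
As a rough sketch of the first workaround (dropping the offending data before
ingest), a line-oriented pre-filter could look like the following. It assumes,
which the thread does not confirm, that the file has one triple per line and
that the problematic values are literals typed as xsd:integer or xsd:int
written with the full datatype IRI; prefixed forms such as xsd:integer and bare
numeric literals are not handled. The class name, the regular expression, and
the file names in the usage note are hypothetical. The dropped triples are
lost, as with any form of this workaround.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-filter; it is not part of Rya.
class IntLiteralFilter {

    // Assumption: integer literals look like "123"^^<http://www.w3.org/2001/XMLSchema#integer>.
    private static final Pattern XSD_INTEGER = Pattern.compile(
            "\"([+-]?\\d+)\"\\^\\^<http://www\\.w3\\.org/2001/XMLSchema#(?:integer|int)>");

    private static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    /** Returns true if the line has no integer literal outside the 32-bit int range. */
    static boolean isSafe(String line) {
        Matcher m = XSD_INTEGER.matcher(line);
        while (m.find()) {
            BigInteger v = new BigInteger(m.group(1));
            if (v.compareTo(MIN) < 0 || v.compareTo(MAX) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stream line by line so a multi-gigabyte file never has to fit in memory.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintStream out = new PrintStream(System.out, false, "UTF-8");
        long skipped = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (isSafe(line)) {
                out.println(line);
            } else {
                skipped++; // this triple is dropped, i.e. data loss
            }
        }
        out.flush();
        System.err.println("Skipped " + skipped + " line(s) with out-of-range integer literals");
    }
}
```

After compiling with javac, it could be run as
`java IntLiteralFilter < infobox_properties.ttl > infobox_properties_filtered.ttl`,
and the filtered file ingested in place of the original.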

--Aaron


