Posted to users@jena.apache.org by 锐光刘 <ti...@gmail.com> on 2012/01/05 17:12:02 UTC

How to read and parse a large OWL file?

Thank you very much for your help! But there is still a problem that
I can't solve by myself.
I think most developers have run into the same Java heap problem: when we
read a big OWL file with Jena, it loads the whole file into memory and
parses it into statements that take much more space than the raw OWL file.
Our team keeps hitting the Java heap space error. We want to build a Hadoop
system and use MapReduce to solve the problem, but it seems we would have to
modify the Jena source code to fit it into the Hadoop framework, wouldn't we?
Or could you show me how to read and parse a large OWL file, such as
DBpedia or Opencyc?
Thanks again for your help!
Best wishes!

Exception thrown while processing a large OWL file

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
    at java.lang.StringBuffer.append(StringBuffer.java:320)
    at com.hp.hpl.jena.graph.impl.LiteralLabelImpl.toString(LiteralLabelImpl.java:256)
    at com.hp.hpl.jena.graph.Node_Literal.toString(Node_Literal.java:52)
    at com.hp.hpl.jena.rdf.model.impl.StatementBase.objectString(StatementBase.java:168)
    at com.hp.hpl.jena.rdf.model.impl.StatementBase.toString(StatementBase.java:156)
    at GetFunctions.Propertys.GetClassPropertyWithOutput(Propertys.java:33)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:76)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetSubClassWithOutput(Classes.java:78)
    at GetFunctions.Classes.GetAllClassWithOutput(Classes.java:43)
    at AIndex.AllDetails.main(AllDetails.java:18)

Exception thrown when reading a large OWL file

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.hp.hpl.jena.mem.HashedBunchMap.newKeyArray(HashedBunchMap.java:25)
    at com.hp.hpl.jena.mem.HashedBunchMap.grow(HashedBunchMap.java:66)
    at com.hp.hpl.jena.mem.HashedBunchMap.put(HashedBunchMap.java:56)
    at com.hp.hpl.jena.mem.faster.NodeToTriplesMapFaster.add(NodeToTriplesMapFaster.java:32)
    at com.hp.hpl.jena.mem.GraphTripleStoreBase.add(GraphTripleStoreBase.java:50)
    at com.hp.hpl.jena.mem.faster.GraphMemFaster.performAdd(GraphMemFaster.java:32)
    at com.hp.hpl.jena.graph.impl.SimpleBulkUpdateHandler.add(SimpleBulkUpdateHandler.java:39)
    at com.hp.hpl.jena.graph.impl.WrappedBulkUpdateHandler.add(WrappedBulkUpdateHandler.java:36)
    at com.hp.hpl.jena.rdf.arp.JenaHandler.bulkUpdate(JenaHandler.java:83)
    at com.hp.hpl.jena.rdf.arp.JenaHandler.statement(JenaHandler.java:76)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.triple(XMLHandler.java:100)
    at com.hp.hpl.jena.rdf.arp.impl.ParserSupport.triple(ParserSupport.java:240)
    at com.hp.hpl.jena.rdf.arp.states.WantDescription.aPredAndObj(WantDescription.java:101)
    at com.hp.hpl.jena.rdf.arp.states.WantPropertyElement.theObject(WantPropertyElement.java:196)
    at com.hp.hpl.jena.rdf.arp.states.WantTypedLiteral.endElement(WantTypedLiteral.java:37)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.endElement(XMLHandler.java:147)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNamespaceBinder.handleEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLNamespaceBinder.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:142)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:158)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:145)
    at com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:215)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:197)
    at com.hp.hpl.jena.ontology.impl.OntModelImpl.read(OntModelImpl.java:2048)
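
The read trace above runs out of heap while the in-memory graph
(com.hp.hpl.jena.mem.*) grows during OntModelImpl.read. When the goal is only
a simple tally rather than a queryable model, a streaming pass over the
RDF/XML avoids building that graph at all. The following is a minimal sketch
using the JDK's StAX parser, not Jena; the class and method names
(OwlClassCounter, countOwlClasses) are hypothetical. It only counts
<owl:Class> elements as they appear in the XML, so it is not an RDF parser
and will miss classes declared through rdf:type.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams through RDF/XML and counts owl:Class elements
// without keeping the document or any triples in memory.
final class OwlClassCounter {
    static final String OWL_NS = "http://www.w3.org/2002/07/owl#";

    static long countOwlClasses(InputStream in) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            long count = 0;
            try {
                while (reader.hasNext()) {
                    // Only start tags named Class in the OWL namespace are counted.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && OWL_NS.equals(reader.getNamespaceURI())
                            && "Class".equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            // Malformed XML is reported as an unchecked exception.
            throw new IllegalStateException("Could not parse XML input", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the OWL (RDF/XML) file.
        try (InputStream in =
                new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println("owl:Class elements: " + countOwlClasses(in));
        }
    }
}
```

Memory use here stays roughly constant regardless of file size, because only
the current parser event is held at any time. Anything that needs queries or
inference still calls for a real RDF store.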

Re: How to read and parse a large OWL file?

Posted by Andy Seaborne <an...@apache.org>.

Hi there,

How big is the file and how much heap are you using?

Also, how are you reading the data?
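
For the heap question, a quick way to see what the JVM actually has available
is to ask the runtime directly. This is a minimal sketch using only
java.lang.Runtime; the class name HeapReport and its methods are hypothetical.

```java
// Hypothetical helper: reports the JVM heap limit and current usage.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Upper bound on the heap the JVM will try to use (as set by -Xmx), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Approximate heap currently in use, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        // The two values are sampled at slightly different instants,
        // so clamp at zero to keep the result sensible.
        return Math.max(0L, rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Running it with and without an explicit limit, for example
"java -Xmx2g HeapReport", shows whether the setting is taking effect.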

As Paolo says, inference takes memory, and the more inference capability
you apply, the more memory you need.  But files like DBpedia and Opencyc can
be read as RDF (without inference) and loaded into a database.

	Andy


On 05/01/12 17:54, Paolo Castagna wrote:
> Hi,
> have you tried to load your data using TDB?
>
>   "TDB is a component of Jena for RDF storage and query,
>    as well as the full range of Jena APIs."
>
> Documentation is here:
> http://incubator.apache.org/jena/documentation/tdb/
>
> ... do you also need/want inference?
>
> See also:
> http://incubator.apache.org/jena/documentation/io/riot.html#inference
> ... which at the moment is doing (only) RDF Schema (but some parts of
> OWL could perhaps be added in a similar way).
>
> Paolo

Re: How to read and parse a large OWL file?

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi,
have you tried to load your data using TDB?

 "TDB is a component of Jena for RDF storage and query,
  as well as the full range of Jena APIs."

Documentation is here:
http://incubator.apache.org/jena/documentation/tdb/

... do you also need/want inference?

See also:
http://incubator.apache.org/jena/documentation/io/riot.html#inference
... which at the moment is doing (only) RDF Schema (but some parts of
OWL could perhaps be added in a similar way).

Paolo
