Posted to dev@clerezza.apache.org by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org> on 2010/08/25 09:48:16 UTC

[jira] Created: (CLEREZZA-286) Inserting large literals takes too long

Inserting large literals takes too long
---------------------------------------

                 Key: CLEREZZA-286
                 URL: https://issues.apache.org/jira/browse/CLEREZZA-286
             Project: Clerezza
          Issue Type: Improvement
            Reporter: Reto Bachmann-Gmür


As binary resources are stored as literal values in the graph, inserting literals of many MB is not a rare usage scenario. Inserting such literals, however, takes very long and thus requires a very long write-lock on the graph (e.g. on my laptop, 9 seconds when uploading a 40MB file with TDB and 3 seconds with Sesame).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907640#action_12907640 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

The current cycle causes problems because, independently of the start level, the provider will never be activated before the TcManager, leaving time for MGraphs to be created without using the externalizer.


[jira] Assigned: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reto Bachmann-Gmür reassigned CLEREZZA-286:
-------------------------------------------

    Assignee: Reto Bachmann-Gmür


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902340#action_12902340 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

The solution approach I'm planning to implement is the following:

- On write operations, objects of type base64 are stored (possibly decoded, i.e. just their values) as files whose directory and file name are based on a strong hash of the content. Instead of the literal, a URI (a urn:x-something scheme?) encoding the hash/filename is stored in the underlying graph.
- On read operations, if the object is of that particular URI type, it is replaced with the literal reconstructed from the file.

Issues:
- this prevents the SPARQL fastlane for queries containing such a literal
- as the literal could be stored by its binary value, it is awkward to transform it to base64 on the storage layer and then recreate the byte[] in the literal factory
- this can quite easily be implemented in a single storage provider (i.e. in the TDB provider), but one could imagine this being a storage option available for any StorageProvider
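The write and read replacement described above could be sketched roughly as follows. This is a minimal illustration, not the committed patch: the `urn:x-literal:` prefix, the two-character directory fan-out and the choice of SHA-256 are all assumptions (the comment only asks for "a strong hash" and a "urn:x-something" scheme).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical content-addressed store for externalized literal values. */
final class LiteralFileStore {
    // Assumed URN prefix; the issue only proposes "a urn:x-something scheme".
    static final String URN_PREFIX = "urn:x-literal:";

    private final Path baseDir;

    LiteralFileStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String hash(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    /** Directory from the first two hex chars, file name from the rest. */
    Path fileFor(String hash) {
        return baseDir.resolve(hash.substring(0, 2)).resolve(hash.substring(2));
    }

    /** Write path: stores the value once and returns the URN to keep in the graph. */
    String store(byte[] value) throws IOException {
        String h = hash(value);
        Path file = fileFor(h);
        if (!Files.exists(file)) {
            // Identical content maps to the same file, so it is written only once.
            // Not safe against two threads storing the same value concurrently.
            Files.createDirectories(file.getParent());
            Files.write(file, value);
        }
        return URN_PREFIX + h;
    }

    /** Read path: resolves a URN produced by store() back to the original bytes. */
    byte[] load(String urn) throws IOException {
        if (!urn.startsWith(URN_PREFIX)) {
            throw new IllegalArgumentException("Not an externalized literal: " + urn);
        }
        return Files.readAllBytes(fileFor(urn.substring(URN_PREFIX.length())));
    }
}
```

Because the name is derived from the content, storing the same 40MB value twice costs one file, and the graph only ever holds a short URN instead of the multi-MB lexical form.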


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902474#action_12902474 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Yes, exactly: the cycle definitely stinks, but preventing it requires either quite some code duplication or increased complexity in the TcManager (outsourcing the delegation to "real" providers to another service).


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903794#action_12903794 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

I committed a patch. It is not yet part of any launcher, and I'm not yet closing the issue, as I think some more investigation into memory usage and performance consequences is needed. Speed could additionally be improved with:
- lazy writing of files: the file is written in the background; until it is written, the literal can be accessed from memory
- value literals forwarding the value (like the byte array) to the store and computing the lexical form only on demand
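The first idea, lazy writing, might look roughly like the sketch below: the value stays in an in-memory map until a background task has written it, so readers never wait for the disk. The class and method names are hypothetical, and the sketch ignores crash safety (a value still pending at shutdown is lost unless the executor is drained via `close()`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical writer that persists literal files on a background thread. */
final class LazyLiteralWriter implements AutoCloseable {
    private final Map<Path, byte[]> pending = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** Returns immediately; the file is written later by the writer thread. */
    void write(Path file, byte[] value) {
        pending.put(file, value);
        writer.submit(() -> {
            try {
                Files.createDirectories(file.getParent());
                Files.write(file, value);
                // The value is on disk now, so the in-memory copy can be dropped.
                pending.remove(file, value);
            } catch (IOException e) {
                // Leave the value in memory so reads keep working; a real
                // implementation would need to retry or report the failure.
            }
        });
    }

    /** Serves the value from memory while its write is pending, else from disk. */
    byte[] read(Path file) throws IOException {
        byte[] inMemory = pending.get(file);
        return inMemory != null ? inMemory : Files.readAllBytes(file);
    }

    /** Waits for all pending writes to finish. */
    @Override
    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The entry is removed only after the file has been written, so a reader either finds it in memory or finds the complete file on disk; with this ordering the write lock on the graph would no longer have to cover the disk I/O.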


To test the patch, do the following:
- start clerezza
- install the bundle and set its start level to 3
- shutdown clerezza
- delete the MGraphs that shall use the externalizer (i.e. the content graph) from the Felix cache
- start clerezza again

If everything worked, you will notice http://tpf.localhost/content.graph-externalizedliterals alongside the normal content graph.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Manuel Innerhofer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904666#action_12904666 ] 

Manuel Innerhofer commented on CLEREZZA-286:
--------------------------------------------

I tried it out and here is what I found:
- The literals returned seem to be malformed. E.g. when I request an InfoDiscobit, the following exception occurs:
org.apache.clerezza.rdf.core.InvalidLiteralTypeException: Cannot create a class [B from a literal of type <>
	at org.apache.clerezza.rdf.core.impl.SimpleLiteralFactory$ByteArrayConverter.createObject(SimpleLiteralFactory.java:91)
	at org.apache.clerezza.rdf.core.impl.SimpleLiteralFactory$ByteArrayConverter.createObject(SimpleLiteralFactory.java:78)
	at org.apache.clerezza.rdf.core.impl.SimpleLiteralFactory.createObject(SimpleLiteralFactory.java:287)
	at org.apache.clerezza.platform.content.InfoDiscobit.getData(InfoDiscobit.java:89)
In the .rdf output this typed literal has no datatype, and the literal itself starts with "http://www.w3.org/2001/XMLSchema#base64Binary^^".

- I find the procedure to create an MGraph with externalized literals pretty awkward. I know that doing it the way you described (setting the start level etc.) is provisional. But assuming this bundle will be delivered with the platform, how will I create an MGraph that shouldn't externalize its literals? Do I have to stop/deactivate the LiteralExternalizingProvider service, create the MGraph that shouldn't have externalized literals, and then start LiteralExternalizingProvider again?


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902351#action_12902351 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Having started to implement it, I think it could be just as easy to make a TDB-independent solution, as follows:
- LiteralExternalizingProvider is a TcProvider with a high priority that supports the creation of MGraphs for URIs (graph names) that do not end with "-externalizedliterals"
- When an MGraph is to be created, it uses the TcManager to create an MGraph with the requested name + the string "-externalizedliterals", so that another provider will create this MGraph (referred to as baseMGraph)
- all read and write operations are directed to baseMGraph, doing the appropriate replacement of the literals
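The delegation in the last two points could be sketched as below. To stay self-contained, the sketch uses a hypothetical minimal `Triple` record and a plain `Collection` as the base graph instead of the real Clerezza `MGraph`/`TcProvider` types, and it models the externalizer as a pair of functions; only the `-externalizedliterals` suffix comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical minimal triple; the object is either a literal or a reference to one. */
record Triple(String subject, String predicate, Object object) {}

/** Hypothetical wrapper: callers see real literals, the base graph holds references. */
final class ExternalizingGraph {
    static final String SUFFIX = "-externalizedliterals";

    private final Collection<Triple> baseGraph;      // created by another provider under name + SUFFIX
    private final UnaryOperator<Object> externalize; // write path: large literal -> reference
    private final UnaryOperator<Object> internalize; // read path: reference -> literal

    ExternalizingGraph(Collection<Triple> baseGraph,
                       UnaryOperator<Object> externalize,
                       UnaryOperator<Object> internalize) {
        this.baseGraph = baseGraph;
        this.externalize = externalize;
        this.internalize = internalize;
    }

    /** Name under which the base graph is requested from the TcManager. */
    static String baseGraphName(String requestedName) {
        return requestedName + SUFFIX;
    }

    /** Writes go to the base graph with the object replaced by its reference. */
    boolean add(Triple t) {
        return baseGraph.add(new Triple(t.subject(), t.predicate(), externalize.apply(t.object())));
    }

    /** Reads come from the base graph with references turned back into literals. */
    List<Triple> triples() {
        List<Triple> result = new ArrayList<>();
        for (Triple t : baseGraph) {
            result.add(new Triple(t.subject(), t.predicate(), internalize.apply(t.object())));
        }
        return result;
    }
}
```

In a real provider, `externalize` would only touch base64 literals and leave every other object unchanged, so ordinary triples pass through untouched.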

Issue:
- There's a circular reference between this provider and the TcManager, but as the references can be dynamic this shouldn't be a problem. The alternative would be to reimplement significant parts of TcManager (which could have the advantage of preventing duplicated locking and security checks).


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Hasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902466#action_12902466 ] 

Hasan commented on CLEREZZA-286:
--------------------------------

Reto,

OK. What I don't understand is how you would implement createMGraph in LiteralExternalizingProvider. It will be invoked by TcManager when an MGraph is to be created.

Something like this?
TcManager.createMGraph invokes LiteralExternalizingProvider.createMGraph
LiteralExternalizingProvider.createMGraph checks the graph name:
if it ends with -externalizedliterals, it throws an unsupported exception
if it does not end with -externalizedliterals, it invokes TcManager.createMGraph, appending -externalizedliterals to the graph name
Then TcManager.createMGraph invokes LiteralExternalizingProvider.createMGraph again

OK. I think this should work, if the cycle does not pose a problem.
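The call sequence above can be traced with a small simulation. Everything here is hypothetical (a simplified `Provider` contract, a priority-ordered `Manager`, and `String` standing in for `UriRef` and `MGraph`); it only shows why the recursion terminates: the externalizing provider refuses names that already carry the suffix, so the manager falls through to the next provider.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified provider contract. */
interface Provider {
    /** Returns a description of the created graph, or throws if unsupported. */
    String createGraph(String name);
}

/** Hypothetical manager that tries providers in priority order. */
final class Manager {
    final List<Provider> providersByPriority = new ArrayList<>();

    String createGraph(String name) {
        for (Provider p : providersByPriority) {
            try {
                return p.createGraph(name);
            } catch (UnsupportedOperationException e) {
                // This provider declined; try the next one.
            }
        }
        throw new UnsupportedOperationException("No provider for " + name);
    }
}

/** Hypothetical stand-in for LiteralExternalizingProvider. */
final class ExternalizingProvider implements Provider {
    static final String SUFFIX = "-externalizedliterals";
    private final Manager manager; // the circular reference discussed in this issue

    ExternalizingProvider(Manager manager) {
        this.manager = manager;
    }

    @Override
    public String createGraph(String name) {
        if (name.endsWith(SUFFIX)) {
            // Second call: decline so a lower-priority provider creates the base graph.
            throw new UnsupportedOperationException(name);
        }
        // First call: ask the manager (and thus this provider again) for the base graph.
        String base = manager.createGraph(name + SUFFIX);
        return "externalizing(" + base + ")";
    }
}
```

With the externalizing provider registered first and a storage provider second, a request for `http://mygraph/` yields `externalizing(tdb(http://mygraph/-externalizedliterals))`, i.e. exactly one extra round through the provider list.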


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909170#action_12909170 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Cycle problem now solved. Should this be added to the TDB launcher, or should there be an additional launcher "tdb-externalizing"?


[jira] Closed: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reto Bachmann-Gmür closed CLEREZZA-286.
---------------------------------------

    Resolution: Fixed

Did not create a launcher; added installation documentation to the Javadoc of the provider.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902343#action_12902343 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

As a first implementation, I will tie the solution to TDB, not yet caring about possible performance improvements with respect to the LiteralFactory.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902454#action_12902454 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Hasan,
 
Whether literals are externalized or not is a matter of configuration of the Clerezza instance and is completely transparent to the application; it is not up to the app to decide. If you create an MGraph <http://mygraph/> (made the URI legal) with the externalizer installed, this will effectively also create the MGraph <http://mygraph/-externalizedliterals>, where instead of the literals you'll have references to them. Applications and users will not access this MGraph, except maybe for backup purposes.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906789#action_12906789 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Fixed that one; now investigating scenarios with a different order of graph creation.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907232#action_12907232 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Thanks, Manuel, for your suggestions.

The file remains on disk because the cost of checking whether another triple references the same hash seems much higher than the wasted disk space. We could add a cleaning utility later that removes literal files no longer referenced in the underlying base graph.
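The cleaning utility mentioned above could be a simple mark-and-sweep over the literal directory. This sketch assumes (hypothetically) that files are laid out as `<first two hex chars>/<rest of hash>` and that the caller has already collected the set of hashes still referenced in the base graph; it does not handle writes that happen concurrently with the sweep.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sweeper for literal files no longer referenced by any triple. */
final class LiteralFileCleaner {
    private LiteralFileCleaner() {}

    /** Deletes unreferenced files below baseDir and returns how many were removed. */
    static int sweep(Path baseDir, Set<String> referencedHashes) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(baseDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path file : files) {
            // Reassemble the hash from "<dir>/<file>" (assumed two-level layout).
            String hash = file.getParent().getFileName().toString() + file.getFileName();
            if (!referencedHashes.contains(hash)) {
                Files.delete(file);
                deleted++;
            }
        }
        return deleted;
    }
}
```

Running this occasionally, outside the write lock, keeps the cost away from the insert path, which is the trade-off argued for above.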

I will implement the lazy reading feature you suggest.

I imagine adding a ValueTypedLiteral<T> type, which extends TypedLiteral with a getValue() method returning T; we could also introduce a DataTypedLiteral, which extends ValueTypedLiteral<byte[]> and provides a getInputStream() method.
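The two proposed types might look roughly like this. `TypedLiteral` below is a stripped-down stand-in, not the real Clerezza interface, and `ByteArrayLiteral` is a hypothetical implementation; the point is that `getValue()` hands out the bytes without a base64 round trip, while the lexical form is only computed on demand.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Base64;

/** Simplified stand-in for a typed literal (datatype URI plus lexical form). */
interface TypedLiteral {
    String getDataType();
    String getLexicalForm();
}

/** Proposed: a typed literal that also exposes its value as a Java object. */
interface ValueTypedLiteral<T> extends TypedLiteral {
    T getValue();
}

/** Proposed: binary data that can additionally be read as a stream. */
interface DataTypedLiteral extends ValueTypedLiteral<byte[]> {
    InputStream getInputStream();
}

/** Hypothetical implementation backed by an in-memory byte array. */
final class ByteArrayLiteral implements DataTypedLiteral {
    static final String BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary";
    private final byte[] data;

    ByteArrayLiteral(byte[] data) {
        this.data = data.clone(); // defensive copy keeps the literal immutable
    }

    @Override
    public String getDataType() {
        return BASE64;
    }

    // Encoded only when somebody actually asks for the lexical form.
    @Override
    public String getLexicalForm() {
        return Base64.getEncoder().encodeToString(data);
    }

    @Override
    public byte[] getValue() {
        return data.clone();
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(data);
    }
}
```

A store that recognizes `DataTypedLiteral` could write `getInputStream()` straight to a file, avoiding the awkward base64 encode/decode step mentioned earlier in this issue.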


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Hasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902407#action_12902407 ] 

Hasan commented on CLEREZZA-286:
--------------------------------

I don't quite get your description. 
If I want to create an MGraph http://mygraph without externalization of base64 literals, then I will call TcManager.createMGraph("http://mygraph"),
and since LiteralExternalizingProvider has a higher weight, it will be invoked by TcManager. But then what will it actually do?
- Call TcManager with an extended graphName
or
- Throw an unsupported exception, since the graphName does not end with "-externalizedliterals"

Assuming it calls TcManager with an extended graphName, what will TcManager do then? Iterate through the provider list again and call LiteralExternalizingProvider again?

Another alternative:
If we don't "misuse" the graph name parameter in createMGraph, we may need to provide a second signature for createMGraph that takes a second parameter specifying whether base64 literals are externalized. But then we have to change every implementation of TcProvider to implement this method, or we change TcProvider from an interface to an abstract class.
What are other consequences here?
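The abstract-class alternative could be sketched as follows: the new two-argument `createMGraph` gets a body that falls back to the existing method, so existing providers only change `implements` to `extends`. All names here, including the `boolean externalizeLiterals` parameter, are hypothetical, and `String` stands in for `UriRef`/`MGraph`.

```java
/** Hypothetical abstract base replacing the TcProvider interface. */
abstract class AbstractTcProvider {
    /** Existing operation that every provider already implements. */
    public abstract String createMGraph(String name);

    /**
     * New overload. Providers that know nothing about externalization inherit
     * this fallback, which only refuses requests that ask for externalization.
     */
    public String createMGraph(String name, boolean externalizeLiterals) {
        if (externalizeLiterals) {
            throw new UnsupportedOperationException("Literal externalization not supported");
        }
        return createMGraph(name);
    }
}

/** An existing provider: unchanged apart from "extends" instead of "implements". */
final class PlainProvider extends AbstractTcProvider {
    @Override
    public String createMGraph(String name) {
        return "plain:" + name;
    }
}

/** A provider that opts in by overriding the new overload. */
final class ExternalizingCapableProvider extends AbstractTcProvider {
    @Override
    public String createMGraph(String name) {
        return createMGraph(name, false);
    }

    @Override
    public String createMGraph(String name, boolean externalizeLiterals) {
        return (externalizeLiterals ? "externalized:" : "plain:") + name;
    }
}
```

One consequence of this route is that providers can no longer extend another base class, since Java allows only one superclass; in Java 8 and later, a `default` method on the interface would give the same fallback without that restriction.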


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Manuel Innerhofer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906715#action_12906715 ] 

Manuel Innerhofer commented on CLEREZZA-286:
--------------------------------------------

Step 1: upload an image, e.g.:
curl -uadmin:admin -T image.jpg -H "Content-type: image/jpeg" http://localhost:8080/image
Step 2: browse to the image URL (http://localhost:8080/image) in a browser

When I do that, the exception above occurs.


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Manuel Innerhofer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907177#action_12907177 ] 

Manuel Innerhofer commented on CLEREZZA-286:
--------------------------------------------

When I add a triple containing a typed literal of datatype base64 and remove it again, the file remains on the hard disk. Maybe this is intentional?
Otherwise it seems to work now.
Here are some suggestions:
- You read the FileInputStream in parseLiteral() in ExternalizingMGraph and use it to instantiate a TypedLiteralImpl. To improve performance, you could implement a TypedLiteral that does not read the FileInputStream until getLexicalForm() or getDataType() is called.
- You wrote "as the literal could be stored by its binary value it is awkward transforming it to base64 on the storage layer and recreate the byte[] in the literal-factory". If this problem were solved, would it be possible to extend the TypedLiteral interface with a method returning an InputStream? A new TypedLiteral implementation could directly return the FileInputStream, which would be streamed directly back to a client (e.g. in org.apache.clerezza.platform.content.InfoDiscobitWriter), thereby avoiding unnecessary memory usage for buffering the literal string.
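Both suggestions, a literal that defers reading its file and one that hands out a stream, could be sketched as below. `FileBackedLiteral` is a hypothetical name and deliberately does not implement the real Clerezza `TypedLiteral` interface, so the example stays self-contained; it also assumes the file holds the decoded bytes rather than the base64 text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical base64Binary literal whose content stays on disk until needed. */
final class FileBackedLiteral {
    static final String BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary";

    private final Path file;             // assumed to hold the decoded bytes
    private volatile String lexicalForm; // cached after the first read

    FileBackedLiteral(Path file) {
        this.file = file; // no I/O here, so constructing the literal stays cheap
    }

    String getDataType() {
        return BASE64; // known without touching the file
    }

    String getLexicalForm() {
        String result = lexicalForm;
        if (result == null) {
            try {
                result = Base64.getEncoder().encodeToString(Files.readAllBytes(file));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            lexicalForm = result; // benign race: at worst the file is read twice
        }
        return result;
    }

    /** Streams the raw bytes, e.g. straight to an HTTP response, without buffering. */
    InputStream openStream() throws IOException {
        return Files.newInputStream(file);
    }
}
```

Caching the lexical form trades memory for repeated reads; for multi-MB values it may be preferable not to cache and to steer callers such as a content writer towards `openStream()`.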


[jira] Commented: (CLEREZZA-286) Inserting large literals takes too long

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CLEREZZA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906513#action_12906513 ] 

Reto Bachmann-Gmür commented on CLEREZZA-286:
---------------------------------------------

Hi Manuel, could you indicate steps to reproduce the described problems?

To your question "how will I create an MGraph that shouldn't externalize its literals?": that's not currently possible (apart from the hacks you describe), but I think this is a more general issue. There is no support for constraining which TcProvider is chosen; triple collections are always created by the one with the highest priority. It might be desirable to be able to tell the TcManager what the requirements are, so that the best provider for that purpose is chosen (criteria might be read performance, write performance, size, leanification, ...).
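Selecting a provider by requirement could be sketched as below. Nothing like this exists in the code discussed in this issue: the `Feature` enum (including the `EXTERNALIZED_LITERALS` entry added for Manuel's case), the `ProviderInfo` record and the required/excluded filtering are all hypothetical.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical properties a caller might ask for (or rule out) when creating a graph. */
enum Feature { READ_PERFORMANCE, WRITE_PERFORMANCE, SMALL_SIZE, LEANIFICATION, EXTERNALIZED_LITERALS }

/** Hypothetical description of a registered provider. */
record ProviderInfo(String name, int priority, Set<Feature> features) {}

final class ProviderSelector {
    private ProviderSelector() {}

    /**
     * Keeps providers offering every required feature and none of the excluded
     * ones; among those, the highest priority wins (today's only rule).
     */
    static Optional<ProviderInfo> select(List<ProviderInfo> providers,
                                         Set<Feature> required, Set<Feature> excluded) {
        return providers.stream()
                .filter(p -> p.features().containsAll(required))
                .filter(p -> Collections.disjoint(p.features(), excluded))
                .max(Comparator.comparingInt(ProviderInfo::priority));
    }
}
```

With empty constraints this reduces to the current behaviour, while "an MGraph that shouldn't externalize its literals" becomes a call that excludes `Feature.EXTERNALIZED_LITERALS`.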