Posted to dev@jena.apache.org by "Matthew Jarvis (JIRA)" <ji...@apache.org> on 2014/10/24 23:26:35 UTC

[jira] [Issue Comment Deleted] (JENA-804) Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files

     [ https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Jarvis updated JENA-804:
--------------------------------
    Comment: was deleted

(was: I attached TdbGrowthTests.java, which illustrates the growth of the TDB when the same models are repeatedly added and removed. In each iteration of the test, I add 1,000 models (each containing a single triple) to the TDB in one transaction, then remove the same models in a separate transaction. At the end of each iteration, there are 0 triples in the TDB. Every 1,000 iterations, I validate the triple count (making sure it is zero) and output the TDB size. This is the output I'm seeing from the test:

Initial TDB size: 192.000 MB
Size of TDB after 1000 iterations: 435.431 MB
Size of TDB after 2000 iterations: 722.679 MB
Size of TDB after 3000 iterations: 1009.927 MB
Size of TDB after 4000 iterations: 1297.175 MB
Size of TDB after 5000 iterations: 1584.423 MB
Size of TDB after 6000 iterations: 1872.048 MB
Size of TDB after 7000 iterations: 2163.431 MB
Size of TDB after 8000 iterations: 2402.679 MB
Size of TDB after 9000 iterations: 2689.927 MB
Size of TDB after 10000 iterations: 2977.175 MB)
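The growth rate in the output above can be checked with a little arithmetic. The Java class below is a hypothetical helper (it is not part of the attached TdbGrowthTests.java); it copies the reported sizes verbatim and assumes each sample was taken exactly 1,000 iterations after the previous one. Most 1,000-iteration intervals add exactly 287.248 MB, and the overall average is roughly 0.279 MB per iteration, that is, per 1,000 single-triple models added and then removed.

```java
// Hypothetical helper, not part of TdbGrowthTests.java: recomputes growth
// figures from the TDB sizes printed in the deleted comment (copied verbatim).
class TdbGrowthFigures {
    // Reported TDB sizes in MB after 0, 1000, 2000, ..., 10000 iterations.
    static final double[] SIZES_MB = {
        192.000, 435.431, 722.679, 1009.927, 1297.175, 1584.423,
        1872.048, 2163.431, 2402.679, 2689.927, 2977.175
    };
    // Assumption: samples are exactly this many iterations apart.
    static final int ITERATIONS_PER_SAMPLE = 1000;

    // Growth between consecutive samples, in MB.
    static double[] deltasMb() {
        double[] d = new double[SIZES_MB.length - 1];
        for (int i = 1; i < SIZES_MB.length; i++) {
            d[i - 1] = SIZES_MB[i] - SIZES_MB[i - 1];
        }
        return d;
    }

    // Average growth per add/remove iteration over the whole run, in MB.
    static double mbPerIteration() {
        int iterations = (SIZES_MB.length - 1) * ITERATIONS_PER_SAMPLE;
        return (SIZES_MB[SIZES_MB.length - 1] - SIZES_MB[0]) / iterations;
    }

    public static void main(String[] args) {
        double[] d = deltasMb();
        for (int i = 0; i < d.length; i++) {
            System.out.printf("iterations %5d-%5d: +%.3f MB%n",
                    i * ITERATIONS_PER_SAMPLE, (i + 1) * ITERATIONS_PER_SAMPLE, d[i]);
        }
        System.out.printf("average: %.4f MB per iteration%n", mbPerIteration());
    }
}
```

Since every iteration ends with 0 triples, a store that recycled freed space would be expected to plateau after the first few iterations rather than grow at a steady rate.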

> Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-804
>                 URL: https://issues.apache.org/jira/browse/JENA-804
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>    Affects Versions: Jena 2.11.2, TDB 1.0.2
>         Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
>            Reporter: Keith Wells
>         Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh
>
>
> We have a product based on Jena TDB in which we both insert and delete quads. We understand the architectural decision to favor performance over space by not cleaning up deleted node IDs from the indexes. However, the disk usage suggests that Jena TDB is not reusing space it has previously allocated. Based on this comment, something appears to be wrong with file space utilization:
> http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3CCE7D7929.2A707%25rvesse@dotnetrdf.org%3E: "The indexes won't shrink - TDB never gives disk space back to the OS -  but disk space is reused when reallocated within the same JVM.".
> In this scenario, in the same JVM with NO server stops or starts, we add 27765 graphs to IndexTdb and immediately remove them, repeating this process several times.
> {noformat}
> Step          MB      Bytes  Diff (Bytes)
> Start        193  203239424
>
> Reindex 5    249  262066176      58826752
> Reindex 6    249  262086656         20480
> Reindex 10   298  312500224      50413568
> Reindex 11   298  312520704         20480
> Reindex 12   298  312541184         20480
> Reindex 13   298  312586240         45056
> Reindex 14   306  320995328       8409088
> Reindex 15   330  346181632      25186304
> Reindex 16   330  346198538         16906
> Reindex 17   346  362999808      16801270
> Reindex 18   346  363020288         20480
> Reindex 19   346  363040768         20480
> Reindex 20   346  363061248         20480
> Reindex 21   346  363081728         20480
> Reindex 22   354  371490816       8409088
> Reindex 23   378  396677120      25186304
>
> End          193  203239424
> {noformat}
> The system starts with 193 MB of data allocated by indexTdb. A reindex consists of a remove followed by an add of these graphs. As the data show, the size of indexTdb on disk increases dramatically after repeatedly removing and adding graphs: after Reindex 23, 378 MB of disk space is used. If Jena TDB reused allocated space, there would be no need to allocate more space beyond what is used by deleted node IDs (unless node ID storage is consuming all of this space?). Jena does not appear to be reusing the allocated disk space. At the very end of this scenario, we exported the N-Quads and reloaded them, which brought the disk usage back to the original 193 MB.
> We believe Jena TDB is not reusing the space allocated by the TDB file system within the same JVM.
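The expectation in the report, and in the quoted mailing-list comment, is that freed space gets recycled within the same JVM, so repeated remove/add cycles should level off instead of growing. The toy Java model below illustrates that distinction only; the class `BlockFileModel` and its methods are hypothetical and make no claim about how TDB's block manager or node table actually allocate space. It contrasts a file that only ever grows (freed blocks are abandoned) with one that keeps a free list and hands freed blocks back out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model, NOT Jena TDB code: a file made of fixed-size blocks,
// with or without reuse of freed blocks.
class BlockFileModel {
    private final boolean reuseFreedBlocks;
    private final Deque<Integer> freeList = new ArrayDeque<>();
    private int fileBlocks = 0; // blocks ever appended = file size in blocks

    BlockFileModel(boolean reuseFreedBlocks) {
        this.reuseFreedBlocks = reuseFreedBlocks;
    }

    int allocate() {
        if (reuseFreedBlocks && !freeList.isEmpty()) {
            return freeList.pop(); // recycle a freed block; file does not grow
        }
        return fileBlocks++; // append a new block; file grows by one
    }

    void free(int block) {
        if (reuseFreedBlocks) {
            freeList.push(block); // remember the block for later reuse
        }
        // Without reuse the block is abandoned and the file keeps its size.
    }

    int fileBlocks() {
        return fileBlocks;
    }

    // Runs repeated "add n items, then remove them" cycles, one block per item,
    // and returns the final file size in blocks.
    static int sizeAfterCycles(boolean reuse, int cycles, int itemsPerCycle) {
        BlockFileModel file = new BlockFileModel(reuse);
        int[] held = new int[itemsPerCycle];
        for (int c = 0; c < cycles; c++) {
            for (int i = 0; i < itemsPerCycle; i++) held[i] = file.allocate();
            for (int i = 0; i < itemsPerCycle; i++) file.free(held[i]);
        }
        return file.fileBlocks();
    }

    public static void main(String[] args) {
        System.out.println("no reuse: " + sizeAfterCycles(false, 20, 100) + " blocks");
        System.out.println("reuse:    " + sizeAfterCycles(true, 20, 100) + " blocks");
    }
}
```

With reuse, the file stays at the high-water mark of a single cycle; without it, the file grows with every cycle, which is the pattern both the table and the deleted comment's output show.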



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)