You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/04/30 18:35:05 UTC

[GitHub] [jena] afs opened a new issue, #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

afs opened a new issue, #1279:
URL: https://github.com/apache/jena/issues/1279

   See PR #1273.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] afs commented on issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

Posted by "afs (via GitHub)" <gi...@apache.org>.
afs commented on issue #1279:
URL: https://github.com/apache/jena/issues/1279#issuecomment-1566242716

   Yes - term-quality.
   
   A `GraphMem2` sounds like a good plan; just the switch to term-equality means we need to migrate somehow. Jena5 is likely later this year.
   
   No support `Iterator#remove` is generally a good idea. It is at best an "optional extra" not a core part of iterators IMO.
   
   The extra space usage - this I am in two minds about. Faster is better but so is having an storage that optimizes for space. It doesn't sound if it is so fundamental as to be irreversible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] arne-bdt commented on issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

Posted by "arne-bdt (via GitHub)" <gi...@apache.org>.
arne-bdt commented on issue #1279:
URL: https://github.com/apache/jena/issues/1279#issuecomment-1562406230

   There's a straightforward way to improve the speed of `GraphMem` for `Graph#add` by nearly 50% and `Graph#delete` by 10-35%. This change has almost no impact on any other operation. However, it uses an additional array for the hashCodes in `HashCommon`. As a result, the memory consumption of the indexing structures (excluding the space for the triples) would increase by 31-51%.
   
   I propose not to implement this for the deprecated `GraphMem`, but instead to create a successor `GraphMem2` with the following changes:
   - Speed up `Graph#add` and `Graph#delete` accepting the given memory cost
   - Remove support for Iterator#remove to reduce complexity 
     (other implementations like `DatasetGraphInMemory` don't support this either)
   - Switch to term-equality as proposed in [issue#1867](https://github.com/apache/jena/issues/1867#issuecomment-1546931793) mostly by:
     - Removing calls to `org.apache.jena.graph.Node#sameValueAs`
     - Replacing usages of `org.apache.jena.graph.Node#getIndexingValue` with the Node itself
   
   What do you say?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] arne-bdt commented on issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

Posted by "arne-bdt (via GitHub)" <gi...@apache.org>.
arne-bdt commented on issue #1279:
URL: https://github.com/apache/jena/issues/1279#issuecomment-1566270071

   Thanks, that's the information I needed to proceed.
   
   Adding and removing the array with the hashes is straightforward. It might also be quite simple to have two variants of the graph store. That way, users can choose between the fast variant or the low memory one.
   
   Speaking of which:
   There are two fields tracking changes for potential concurrent modifications:
   1. `org.apache.jena.mem.ArrayBunch#changes`
   2. `org.apache.jena.mem.HashCommon#changes`
   
   The one in `ArrayBunch` is volatile, but the one in `HashCommon` is not. `ArrayBunch` is totally fine with integer overflow, since it checks for equality when verifying: 
   `if (changes != initialChanges) throw new ConcurrentModificationException();`
   
   HashCommon, however, wouldn't handle an integer overflow correctly: 
   `if (changes > initialChanges) throw new ConcurrentModificationException();`
   
   The document header of `java.util.HashMap` states:
   ` * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
    * as it is, generally speaking, impossible to make any hard guarantees in the
    * presence of unsynchronized concurrent modification.  Fail-fast iterators
    * throw {@code ConcurrentModificationException} on a best-effort basis.
    * Therefore, it would be wrong to write a program that depended on this
    * exception for its correctness: <i>the fail-fast behavior of iterators
    * should be used only to detect bugs.</i>`
   
   Tracking the changes here might be the safest possible way. However, `HashCommon` should then also make `changes` volatile and compare for equality rather than using a greater than comparison.
   
   On the other hand, if we removed both fields and checked `org.apache.jena.mem.HashCommon#size` and `org.apache.jena.mem.ArrayBunch#size` instead, we would save memory (since there are many instances of both classes) and avoid incrementing a second variable on `#add` and `#remove` operations. The iterators should still help detect concurrent modifications in almost any scenario. Only if there are an equal number of `#add` and `#remove` calls, so that the total size of the graph is not affected, could there be undetected concurrent modifications while executing an iterator. This seems unlikely enough to me that it would prevent anyone from detecting a concurrency bug.
   What is your opinion in this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] afs closed issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

Posted by "afs (via GitHub)" <gi...@apache.org>.
afs closed issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.
URL: https://github.com/apache/jena/issues/1279


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] arne-bdt commented on issue #1279: Improve the implementation of in-memory, general purpose, non-transactional graphs.

Posted by "arne-bdt (via GitHub)" <gi...@apache.org>.
arne-bdt commented on issue #1279:
URL: https://github.com/apache/jena/issues/1279#issuecomment-1544157955

   My work on an alternative to GraphMem is progressing slowly.
   So I decided to contribute a few improvements to the existing GraphMem on the way.
   Please review my [PR](https://github.com/apache/jena/pull/1865).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org