You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by GitBox <gi...@apache.org> on 2022/11/26 22:00:58 UTC

[GitHub] [maven-indexer] mbien opened a new pull request, #277: IndexDataReader could use Sets.

mbien opened a new pull request, #277:
URL: https://github.com/apache/maven-indexer/pull/277

   Noticed that both, the single threaded and the multi threaded impl used CHMs. Instead of only changing `readIndexST` to plain maps, I switched everything to Sets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-indexer] mbien commented on a diff in pull request #277: IndexDataReader could use Sets.

Posted by GitBox <gi...@apache.org>.
mbien commented on code in PR #277:
URL: https://github.com/apache/maven-indexer/pull/277#discussion_r1032837361


##########
indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java:
##########
@@ -277,17 +277,17 @@ private IndexWriter tempWriter( final String name ) throws IOException
     }
 
     private void addToIndex( final Document doc, final IndexingContext context, final IndexWriter indexWriter,
-                             final ConcurrentMap<String, Boolean> rootGroups,
-                             final ConcurrentMap<String, Boolean> allGroups )
+                             final Set<String> rootGroups,
+                             final Set<String> allGroups )
             throws IOException
     {
         ArtifactInfo ai = IndexUtils.constructArtifactInfo( doc, context );
         if ( ai != null )
         {
             indexWriter.addDocument( IndexUtils.updateDocument( doc, context, false, ai ) );
 
-            rootGroups.putIfAbsent( ai.getRootGroup(), Boolean.TRUE );
-            allGroups.putIfAbsent( ai.getGroupId(), Boolean.TRUE );
+            rootGroups.add( ai.getRootGroup() );
+            allGroups.add( ai.getGroupId() );

Review Comment:
   `add` should be a little bit faster compared to `putIfAbsent` in the concurrent case. But since so much IO is going on it goes down in the noise.



##########
indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java:
##########
@@ -165,8 +165,8 @@ private IndexDataReadResult readIndexMT( IndexWriter w, IndexingContext context
 
         final Document theEnd = new Document();
 
-        ConcurrentMap<String, Boolean> rootGroups = new ConcurrentHashMap<>();
-        ConcurrentMap<String, Boolean> allGroups = new ConcurrentHashMap<>();
+        Set<String> rootGroups = ConcurrentHashMap.newKeySet();
+        Set<String> allGroups = ConcurrentHashMap.newKeySet();

Review Comment:
   I tried to size them appropriately (central has about 70k groups, and resizing CHMs has a larger impact than normal maps), but since this made no performance difference I left the defaults as is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-indexer] mbien commented on a diff in pull request #277: IndexDataReader could use Sets.

Posted by GitBox <gi...@apache.org>.
mbien commented on code in PR #277:
URL: https://github.com/apache/maven-indexer/pull/277#discussion_r1032837233


##########
indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java:
##########
@@ -165,8 +165,8 @@ private IndexDataReadResult readIndexMT( IndexWriter w, IndexingContext context
 
         final Document theEnd = new Document();
 
-        ConcurrentMap<String, Boolean> rootGroups = new ConcurrentHashMap<>();
-        ConcurrentMap<String, Boolean> allGroups = new ConcurrentHashMap<>();
+        Set<String> rootGroups = ConcurrentHashMap.newKeySet();
+        Set<String> allGroups = ConcurrentHashMap.newKeySet();

Review Comment:
   I tried to size them appropriately (central has about 70k groups, and resizing CHMs has a larger impact than normal maps), but since this made no performance difference I left the defaults as is.
   
   (tested only twice so the sample size is small though, noise is a bit high)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-indexer] cstamas merged pull request #277: [MINDEXER-181] IndexDataReader should use Sets

Posted by GitBox <gi...@apache.org>.
cstamas merged PR #277:
URL: https://github.com/apache/maven-indexer/pull/277


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-indexer] cstamas commented on pull request #277: [MINDEXER-181] IndexDataReader should use Sets

Posted by GitBox <gi...@apache.org>.
cstamas commented on PR #277:
URL: https://github.com/apache/maven-indexer/pull/277#issuecomment-1328696142

   @mbien anything more you want to add or this PR is good to go?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [maven-indexer] mbien commented on pull request #277: [MINDEXER-181] IndexDataReader should use Sets

Posted by GitBox <gi...@apache.org>.
mbien commented on PR #277:
URL: https://github.com/apache/maven-indexer/pull/277#issuecomment-1328883441

   > @mbien anything more you want to add or this PR is good to go?
   
   would be all for this PR. 
   
   Wanted to investigate if it would be possible to do a shallow extraction (comparable to git clone --depth) based on artifact age or number of versions since it might be interesting for IDE use. As a test i tried to run with `indexUpdateRequest.setDocumentFilter((doc) -> false);` but this did produce an index with identical size even though the filter method was called. But first I probably try to take a look what is taking up all the space in the first place - maybe there are other ways to reduce footprint. But that would be all for later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@maven.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org