You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/02 08:20:05 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new pull request, #4079: Replace RwLock and Mutex by using DashMap

yahoNanJing opened a new pull request, #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #4077 .
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] metesynnada commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
metesynnada commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1301132871

   > I'm surprised any of these data structures are contended, and so dashmap is unlikely to yield benefits here? It may even be slower.
   > 
   > Do you have any benchmark results you could share?
   
   Actually, it is better in most use cases. You can [check ](https://github.com/xacrimon/conc-map-bench). It is quite similar to Java's [ConcurrentHashMap](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb merged PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1305704793

   Thanks again everyone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1302527533

   Until the fix for https://github.com/apache/arrow-datafusion/issues/4100 is merged, clippy will be failing on this PR as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1301558786

   Thanks @alamb and @tustvold, I'll try to add some benchmark testing before this PR be accepted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
ozankabak commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1301886965

   My two cents: I think it is more about encapsulating certain low-level operations regarding the data structure within the data structure itself, vs. exposing them in higher-level code. I think this is probably what @xudong963 means by cleaner, and I agree.
   
   I also agree with @tustvold that typical workloads will not result in contention, without which performance improvements will not manifest.
   
   If this change would merge, the benefits would be:
   1. Code with better separation of concern (although the benefit is not as much as the Ballista change)
   2. Potential performance gain in the unrealistic-looking contention scenarios
   3. Being more in-line with Ballista codebase
   The main disadvantage that I can see:
   1. Introducing a dependency to Datafusion (albeit it is fairly commonly-used one)
   
   I don't think the argument either way is super strong, but it seems to me that pros somewhat outweigh the cons (unless I am missing something).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1300062274

   Looks like maybe dashmap was added in https://github.com/apache/arrow-ballista/pull/319 but there aren't any performance measurements that I could see there


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1304702522

   Thanks @ozankabak. Just rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1301327646

   Those benchmarks are all for high throughput workloads, in this case I would be extremely surprised to see these hashmaps showing up in profiles. In the absence of a compelling benchmark it is hard for me to approve this... It is a non-trivial additional dependency, not to mention one that I've run into API issues with in the past, for an unclear benefit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1300054560

   I agree I think it would be good to have some more evidence that using a new third-party library adds benefits other than that libraries marketing materials


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1300057638

   Or maybe put another way, if a different locking implementation improved performance significantly in Ballista, I wonder if we should investigate ways to avoid locking all together (copy-on-write, for example) 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1302107891

   I agree this makes the code look cleaner and I will defer to your judgement that the pros outweigh the cons. It looks like this just needs to have the datafusion-cli Cargo.lock file updated and it should be good to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1304780062

   Let's plan to merge this tomorrow if we don't hear any additional comments 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on pull request #4079: Replace RwLock and Mutex by using DashMap

Posted by GitBox <gi...@apache.org>.
ozankabak commented on PR #4079:
URL: https://github.com/apache/arrow-datafusion/pull/4079#issuecomment-1304543737

   @yahoNanJing, given that #4100 is merged, this can merge if you resolve the conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org