You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "brancz (via GitHub)" <gi...@apache.org> on 2023/08/22 12:17:17 UTC

[GitHub] [arrow] brancz opened a new issue, #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

brancz opened a new issue, #37306:
URL: https://github.com/apache/arrow/issues/37306

   ### Describe the enhancement requested
   
   Our profiling data shows clearly that even though dictionary unification is working brilliantly, there is still plenty of CPU time that is used that is unnecessary. Take this profiling data: https://pprof.me/6681ca5
   
   This shows that ~43% is spent because of `convTslice` within `getvalFn`, because the value has to be treated as an `interface{}` and is then subsequently added to the MemoTable using non-type-safe `GetOrInsert`.
   
   I see 3 potential ways forward:
   
   1) Somehow optimize this away entirely, though I don't see how.
   2) Offer a specific DictionaryUnifier per type (these could be added one by one as they're needed).
   3) Move `hashing` out of the `internal` package so that people can build their own `DictionaryUnifier`s.
   
   @zeroshade thoughts?
   
   ### Component(s)
   
   Go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on issue #37306:
URL: https://github.com/apache/arrow/issues/37306#issuecomment-1688390235

   Haha, perfectly fine. It's on my list of things to do, particularly because of the performance potential. In the meantime, I agree with you that both solutions 2 *and* 3 are viable ideas. I'll definitely take a look at review the PR you added. I guess whichever one of us can manage to get to this first will put up a PR for it :smile: 
   
   If I do get around to this, I'll make sure to tag you on the PR since I'd be interested to see if you notice any performance improvements in your use cases (might even be interesting to see if you could simplify some of your use cases enough to contribute potential benchmark functions).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] brancz commented on issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "brancz (via GitHub)" <gi...@apache.org>.
brancz commented on issue #37306:
URL: https://github.com/apache/arrow/issues/37306#issuecomment-1688393823

   > If I do get around to this, I'll make sure to tag you on the PR since I'd be interested to see if you notice any performance improvements in your use cases (might even be interesting to see if you could simplify some of your use cases enough to contribute potential benchmark functions).
   
   I wish ... unfortunately I don't even have local benchmarks for my code for this. It's all coming from profiling in production and using that data to inform changes.
   
   > Haha, perfectly fine. It's on my list of things to do, particularly because of the performance potential. In the meantime, I agree with you that both solutions 2 and 3 are viable ideas. I'll definitely take a look at review the PR you added. I guess whichever one of us can manage to get to this first will put up a PR for it 😄
   
   Amazing! Sounds good!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] brancz commented on issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "brancz (via GitHub)" <gi...@apache.org>.
brancz commented on issue #37306:
URL: https://github.com/apache/arrow/issues/37306#issuecomment-1688383652

   I have a feeling you must have thought about this much much longer than I have, for me that's a pretty unknown amount of time to spend, and I won't be able to get to something like that for the next 1-2 months.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade closed issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade closed issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`
URL: https://github.com/apache/arrow/issues/37306


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] brancz commented on issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "brancz (via GitHub)" <gi...@apache.org>.
brancz commented on issue #37306:
URL: https://github.com/apache/arrow/issues/37306#issuecomment-1688085216

   I think I would tend towards 2 _and_ 3, but don't feel strongly. Either way, I'm already starting to implement a `BinaryDictionaryUnifier`, which I'm happy to contribute.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on issue #37306: [Go] Dictionary unifier uses unnecessarily expensive `getvalFn`

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on issue #37306:
URL: https://github.com/apache/arrow/issues/37306#issuecomment-1688378620

   So, @brancz I think it's been long enough at this point that I'd be fine with updating the Arrow lib to go1.18+ and introducing generics right into the main library (since go1.21 is officially out, the standard would be that we can go up to 1.19 safely). Do you want to try your hand at upgrading the unifier / memo table / etc. to fully leverage generics for the performance improvements?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org