GH-48897: [C++] Benchmark and optimize CountSetBits #48898
Conversation
|
Also @AntoinePrv FYI |
|
@ursabot please benchmark |
|
Hmm, it seems performance is behind the expected theoretical throughput. From Agner Fog's instruction tables, I see that AMD Zen 2 should be able to sustain 4 POPCNT operations/cycle (reciprocal throughput = 0.25), i.e. 32 bytes/cycle on 64-bit ints. |
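The arithmetic behind that estimate can be sketched as follows. This is only an illustration of the calculation in the comment above; the helper names `PeakBytesPerCycle` and `PeakGBPerSecond` are hypothetical, and the clock frequency is a free parameter since the thread does not state one.

```cpp
#include <cstdint>

// Hypothetical helpers for a back-of-the-envelope throughput bound.
// reciprocal_throughput: cycles per POPCNT instruction (0.25 on Zen 2,
// per the instruction-table figure quoted above).
// word_bytes: bytes consumed by each POPCNT (8 for 64-bit operands).
constexpr double PeakBytesPerCycle(double reciprocal_throughput, int word_bytes) {
  return static_cast<double>(word_bytes) / reciprocal_throughput;
}

// Converts bytes/cycle into GB/s for an assumed clock frequency in GHz.
// The clock value is an input because the discussion does not give one.
constexpr double PeakGBPerSecond(double bytes_per_cycle, double clock_ghz) {
  return bytes_per_cycle * clock_ghz;
}

// 8 bytes / 0.25 cycles = 32 bytes per cycle, matching the comment above.
static_assert(PeakBytesPerCycle(0.25, 8) == 32.0, "expected 32 bytes/cycle");
```

This bound only accounts for POPCNT issue rate; loads, loop overhead and memory bandwidth can all keep a real benchmark below it.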
08383d7 to 9921e9d
|
Ok, the nested for-loop is un-nested by gcc 15.2.0... |
9921e9d to d0f45cf
|
Updated benchmark numbers after I hand-unrolled the loop. |
|
@github-actions crossbow submit -g cpp |
zanmato1984
left a comment
+1
|
Revision: d0f45cf Submitted crossbow builds: ursacomputing/crossbow @ actions-cdbe33a753 |
|
@ursabot please benchmark |
|
@rok I have deleted the branch, so I'm not sure that can work? |
|
I see the event on Kubernetes, but the GitHub API token had expired, so it couldn't post back. |
|
Trying on #48907 |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit ed35594. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Counting the set bits in a null bitmap is an operation that comes up often, so it is useful to have a more precise idea of its performance.
What changes are included in this PR?
CountSetBits.
Local results (AMD Zen 2):
Local results (Intel(R) Core(TM) Ultra 7 255H):
Are these changes tested?
By running said benchmark manually (and by Continuous Benchmarking).
Are there any user-facing changes?
No.