unicorn/qemu/target-i386
Emilio G. Cota 3dc16ebca3
target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

atomic_add-bench: 5000000 ops/thread, [0,1] range

25 ++---------+----------+---------+----------+----------+----------+---++
+ atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 +Emaster +-N--+ ++
|| |
|++ |
|| |
15 +++ ++
|N| |
|+| |
10 ++| ++
|+|+ |
| | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E|
|+E+E+- +++ +E+------+E+-- |
5 ++|+ ++
|+N+H+--- +++ |
++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- |
0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,2] range

25 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 ++master +-N--+ ++
|E| |
|++ |
||E |
15 ++| ++
|N|| |
|+|| ---+E+------+E+-----+E+------+E|
10 ++| | ---+E+------+E+-----+E+--- +++ +++
||H+E+--+E+-- |
|+++++ |
| || |
5 ++|+H+-- +++ ++
|+N+ - ---+H+------+H+------ |
+ +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H|
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,8] range

40 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
35 +cmpxchg +-H--+ ++
| master +-N--+ ---+E+------+E+------+E+-----+E+------+E|
30 ++| ---+E+-- +++ ++
| | -+E+--- |
25 ++E ---- +++ ++
|+++++ -+E+ |
20 +E+ E-- +++ ++
|H|+++ |
|+| +H+------- |
15 ++H+ ---+++ +H+------ ++
|N++H+-- +++--- +H+------++|
10 ++ +++ - +++ ---+H+ +++ +H+
| | +H+-----+H+------+H+-- |
5 ++| +++ ++
++N+N+--+N++ + + + + + |
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,128] range

160 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
140 +cmpxchg +-H--+ +++ +++ ++
| master +-N--+ E--------E------+E+------++|
120 ++ --| | +++ E+
| -- +++ +++ ++|
100 ++ - ++
| +++- +++ ++|
80 ++ -+E+ -+H+------+H+------H--------++
| ---- ---- +++ H|
| ---+E+-----+E+- ---+H+ ++|
60 ++ +E+--- +++ ---+H+--- ++
| --+++ ---+H+-- |
40 ++ +E+-+H+--- ++
| +H+ |
20 +EE+ ++
+N+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,1024] range

350 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
300 +cmpxchg +-H--+ +++
| master +-N--+ +++ ||
| +++ | ----E|
250 ++ | ----E---- ++
| ----E--- | ---+H|
200 ++ -+E+--- +++ ---+H+--- ++
| ---- -+H+-- |
| +E+ +++ ---- +++ |
150 ++ ---+++ ---+H+- ++
| --- -+H+-- |
100 ++ ---+E+ ---- +++ ++
| +++ ---+E+-----+H+- |
| -+E+------+H+-- |
50 ++ +E+ ++
+EE+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu
2018-02-27 23:43:22 -05:00
..
arch_memory_mapping.c
bpt_helper.c cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() 2018-02-24 17:25:28 -05:00
cc_helper.c
cc_helper_template.h
cpu-qom.h target-i386: List CPU models using subclass list 2018-02-26 08:17:04 -05:00
cpu.c target-i386: Don't use cpu->migratable when filtering features 2018-02-26 09:51:14 -05:00
cpu.h target-i386: Move xsave component mask to features array 2018-02-26 04:45:35 -05:00
excp_helper.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
fpu_helper.c target-i386: Use struct X86XSaveArea in fpu_helper.c 2018-02-26 03:38:53 -05:00
helper.c cpus: pass CPUState to run_on_cpu helpers 2018-02-26 04:54:55 -05:00
helper.h target-i386: remove helper_lock() 2018-02-27 23:43:22 -05:00
int_helper.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
Makefile.objs
mem_helper.c target-i386: remove helper_lock() 2018-02-27 23:43:22 -05:00
misc_helper.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
mpx_helper.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
ops_sse.h
ops_sse_header.h
seg_helper.c target-i386: Fixed syscall posssible segfault 2018-02-26 02:36:09 -05:00
shift_helper_template.h
smm_helper.c target-i386: Include log.h in smm_helper 2018-02-24 03:06:07 -05:00
svm.h Clean up ill-advised or unusual header guards 2018-02-25 04:22:46 -05:00
svm_helper.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
TODO
topology.h pc: Add x86_topo_ids_from_apicid() 2018-02-25 20:31:36 -05:00
translate.c target-i386: remove helper_lock() 2018-02-27 23:43:22 -05:00
unicorn.c
unicorn.h