cputlb: Fix for self-modifying writes across page boundaries

As it currently stands, QEMU does not properly handle self-modifying code
when the write is unaligned and crosses a page boundary. The procedure
for handling a write to the current translation block is to write-protect
the current translation block, catch the write, split up the translation
block into the current instruction (which remains write-protected so that
the current instruction is not modified) and the remaining instructions
in the translation block, and then restore the CPU state to before the
write occurred so the write will be retried and successfully executed.
However, since unaligned writes across pages are split into one-byte
writes for simplicity, writes to the second page (which is not the
current TB) may succeed before a write to the current TB is attempted,
and since these writes are not invalidated before resuming state after
splitting the TB, these writes will be performed a second time, thus
corrupting the second page. Credit goes to Patrick Hulin for
discovering this.

In recent 64-bit versions of Windows running in emulated mode, this
results in either being very unstable (a BSOD after a couple minutes of
uptime), or being entirely unable to boot. Windows performs one or more
8-byte unaligned self-modifying writes (xors) which intersect the end
of the current TB and the beginning of the next TB, which runs into the
aforementioned issue. This commit fixes that issue by making the
unaligned write loop perform the writes in forwards order, instead of
reverse order. This way, QEMU immediately tries to write to the current
TB, and splits the TB before any write to the second page is executed.
The write then proceeds as intended. With this patch applied, I am able
to boot and use Windows 7 64-bit and Windows 10 64-bit in QEMU without
KVM.

Per Richard Henderson's input, this patch also ensures the second page
is in the TLB before executing the write loop, to ensure the second
page is mapped.

The original discussion of the issue is located at
http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg02161.html.

Backports commit 81daabaf7a572f138a8b88ba6eea556bdb0cce46 from qemu
This commit is contained in:
Samuel Damashek 2018-02-25 03:12:09 -05:00 committed by Lioncash
parent 04c423b081
commit 670d81367b
No known key found for this signature in database
GPG key ID: 4E3C3CC1031BA9C7

View file

@ -670,12 +670,25 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
if (DATA_SIZE > 1
&& unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>= TARGET_PAGE_SIZE)) {
int i;
int i, index2;
target_ulong page2, tlb_addr2;
do_unaligned_access:
/* XXX: not efficient, but simple */
/* Note: relies on the fact that tlb_fill() does not remove the
* previous page from the TLB cache. */
for (i = DATA_SIZE - 1; i >= 0; i--) {
/* Ensure the second page is in the TLB. Note that the first page
is already guaranteed to be filled, and that the second page
cannot evict the first. */
page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
if (page2 != (tlb_addr2 & (TARGET_PAGE_MASK | TLB_INVALID_MASK))
&& !VICTIM_TLB_HIT(addr_write, page2)) {
tlb_fill(ENV_GET_CPU(env), page2, MMU_DATA_STORE,
mmu_idx, retaddr);
}
/* XXX: not efficient, but simple. */
/* This loop must go in the forward direction to avoid issues
with self-modifying code in Windows 64-bit. */
for (i = 0; i < DATA_SIZE; ++i) {
/* Little-endian extract. */
uint8_t val8 = (uint8_t)(val >> (i * 8));
/* Note the adjustment at the beginning of the function.
@ -806,12 +819,25 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
if (DATA_SIZE > 1
&& unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>= TARGET_PAGE_SIZE)) {
int i;
int i, index2;
target_ulong page2, tlb_addr2;
do_unaligned_access:
/* Ensure the second page is in the TLB. Note that the first page
is already guaranteed to be filled, and that the second page
cannot evict the first. */
page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
if (page2 != (tlb_addr2 & (TARGET_PAGE_MASK | TLB_INVALID_MASK))
&& !VICTIM_TLB_HIT(addr_write, page2)) {
tlb_fill(ENV_GET_CPU(env), page2, MMU_DATA_STORE,
mmu_idx, retaddr);
}
/* XXX: not efficient, but simple */
/* Note: relies on the fact that tlb_fill() does not remove the
* previous page from the TLB cache. */
for (i = DATA_SIZE - 1; i >= 0; i--) {
/* This loop must go in the forward direction to avoid issues
with self-modifying code. */
for (i = 0; i < DATA_SIZE; ++i) {
/* Big-endian extract. */
uint8_t val8 = (uint8_t)(val >> (((DATA_SIZE - 1) * 8) - (i * 8)));
/* Note the adjustment at the beginning of the function.