Jan Kiszka [Sun, 5 Apr 2015 09:55:07 +0000 (11:55 +0200)]
x86: Do not call vmload/vmsave on every VM exit
Benchmarks indicate that we can gain about 160 cycles per VM exit &
reentry by only saving/restoring MSR_GS_BASE. We don't touch the other
state that vmload/vmsave deals with.
Specifically, we don't depend on a valid TR/TSS while in root mode
because Jailhouse has neither in userspace nor uses the IST for
interrupts or exceptions, thus does not try to access the TSS.
We still need to perform vmload on handover (actually, we only need to
load MSR_GS_BASE, but vmload is simpler) and after VCPU reset. And since
we no longer save the full state, for shutdown we also need to pull the
missing information for arch_cpu_restore directly from the registers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
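The idea can be pictured with a minimal, runnable sketch; the MSR state is modeled as a plain struct and all names are hypothetical (on real hardware this is a wrmsr to MSR_GS_BASE, 0xc0000101):

```c
#include <stdint.h>

/* Hypothetical stand-in for the CPU's MSR state so the sketch runs in
 * user space; on real hardware this would be a wrmsr instruction. */
struct msr_file {
	uint64_t gs_base;	/* MSR_GS_BASE (0xc0000101) */
};

/* Exit path sketch: restore only the host GS base, skipping the full
 * vmload of TR/LDTR, KernelGSBase and the SYSCALL/SYSENTER MSRs that
 * root mode never relies on. */
static void restore_host_state(struct msr_file *msrs, uint64_t host_gs_base)
{
	msrs->gs_base = host_gs_base;
}
```

vmload would additionally reload TR, LDTR, KernelGSBase and the SYSCALL/SYSENTER MSRs; the commit's point is that none of these are needed on every exit.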
Jan Kiszka [Sun, 5 Apr 2015 08:52:32 +0000 (10:52 +0200)]
x86: Make FS_BASE MSR restoration VMX-specific
SVM does not touch this MSR on VM exit and thus does not require the
restoration that arch_cpu_restore performed so far. Make it VMX-specific
so that we can drop a few lines of code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 5 Apr 2015 07:19:33 +0000 (09:19 +0200)]
x86: Make SYSENTER MSR restoration VMX-specific
SVM does not overwrite these MSRs on VM exit and thus does not require
the restoration that arch_cpu_restore performed so far. Make them
VMX-specific so that we can drop a few lines of code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 11:27:59 +0000 (13:27 +0200)]
x86: Refactor SVM version of vcpu_activate_vmm
We can reduce the assembly required in vcpu_activate_vmm by reordering
svm_vmexit to svm_vmentry, i.e. pulling the VM entry logic to the front.
Moreover, RAX can be loaded directly. Furthermore, there is no need to
declare clobbered variables, as we will not return from the assembly
block, which is already declared via __builtin_unreachable.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 5 Apr 2015 07:36:44 +0000 (09:36 +0200)]
x86: Simplify set_svm_segment_from_segment
No need to complain: segment.access_rights is generic, as it simply
holds bits 8..23 of the second descriptor dword. The additional invalid
bit, used only by VMX, can be ignored by SVM, and it already is, even
when leaving out the explicit test.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
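A sketch of the generic field described above, assuming a hypothetical helper name (not the actual Jailhouse code): access_rights is just bits 8..23 of the second descriptor dword, so both vendors can consume it without translation.

```c
#include <stdint.h>

/* Illustrative helper: extract the generic access_rights field, i.e.
 * bits 8..23 of the second descriptor dword. SVM simply ignores the
 * VMX-only invalid bit kept above the hardware attribute bits. */
static uint16_t access_rights_from_desc(uint32_t desc_dword2)
{
	return (desc_dword2 >> 8) & 0xffff;
}
```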
Jan Kiszka [Sat, 4 Apr 2015 15:32:14 +0000 (17:32 +0200)]
x86: Pass vmcb instead of cpu_data to some internal SVM functions
update_efer, svm_parse_mov_to_cr and svm_handle_apic_access have no use
for cpu_data itself and only convert it directly into a vmcb reference.
So pass the vmcb instead and save some statements.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 12:57:39 +0000 (14:57 +0200)]
x86: Remove traces of cpuid interception from SVM
There is no foreseeable need to intercept cpuid on AMD. On Intel, we
are not asked whether we want to, so we have to execute it on behalf of
the cell. But here we can simply let it happen.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 06:22:49 +0000 (08:22 +0200)]
x86: Remove guest registers parameter from vcpu_handle_msr_read/write
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 06:20:33 +0000 (08:20 +0200)]
x86: Remove guest registers parameter from vcpu_handle_mmio_access
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 06:02:21 +0000 (08:02 +0200)]
x86: Remove guest registers and cell parameters from x86_pci_config_handler
The function only works against the current CPU and thus should not
take the misleading parameters. Guest registers are no longer required,
and the cell reference can be obtained inline.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 05:53:18 +0000 (07:53 +0200)]
x86: Rework RAX register accessors of PCI layer
Stop requiring that the guest registers are passed down to the
accessors. Access handlers always operate on the issuing CPU and can
thus obtain the register state themselves. Rename the accessors to make
clear that they work on guest registers.
This allows us to drop the guest_regs parameter from
data_port_in/out_handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 18:04:44 +0000 (20:04 +0200)]
x86: Remove guest registers parameter from i8042_access_handler
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 13:33:25 +0000 (15:33 +0200)]
x86: Remove parameters from x2apic_handle_read/write
The function only works against the current CPU and thus should not
take the misleading parameters. We can now retrieve the per-cpu data
structure and the guest registers inside the function.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 6 Apr 2015 18:19:34 +0000 (20:19 +0200)]
x86: Remove guest registers parameter from vcpu_handle_xsetbv
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 13:03:22 +0000 (15:03 +0200)]
x86: Remove guest registers parameter from vcpu_handle_hypercall
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 13:02:19 +0000 (15:02 +0200)]
x86: Remove guest registers parameter from vcpu_deactivate_vmm
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 12:47:52 +0000 (14:47 +0200)]
x86: Remove guest registers parameter from vcpu_reset
The function only works against the current CPU and thus should not
take the misleading parameter. The necessary reference can now be
obtained from the per-cpu data structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 12:26:08 +0000 (14:26 +0200)]
x86: Enable direct access to per-cpu guest registers
Now that the guest registers are saved at the same location on the
per-cpu stack for both Intel and AMD, we can enable direct access via
the per-cpu data structure. This will allow dropping the guest
registers parameter from most functions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
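The pattern this enables can be sketched as follows; this_cpu_data() is stubbed here (in Jailhouse it derives the per-cpu structure from the current stack), and all names are illustrative:

```c
#include <stdint.h>

/* Illustrative stand-ins for the per-cpu structures. */
struct registers_stub { uint64_t rax; };
struct per_cpu_stub  { struct registers_stub guest_regs; };

static struct per_cpu_stub cpu_data_instance;

/* Stub: the real implementation locates the per-cpu data from the
 * current stack pointer. */
static struct per_cpu_stub *this_cpu_data(void)
{
	return &cpu_data_instance;
}

/* before: uint64_t read_guest_rax(struct registers_stub *guest_regs);
 * after:  the reference is obtained internally, no parameter needed */
static uint64_t read_guest_rax(void)
{
	return this_cpu_data()->guest_regs.rax;
}
```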
Jan Kiszka [Fri, 3 Apr 2015 11:46:28 +0000 (13:46 +0200)]
x86: Reorder stack layout in svm_vmexit
Push the guest registers first so that they end up at the same location
on the stack as on Intel. This will allow addressing them generically
via the per_cpu structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 17:21:32 +0000 (19:21 +0200)]
x86: Allow index-based guest register access without type casts
Convert struct registers into a union and provide a by_index array for
index-based access. This is used by various handlers that parse guest
instructions and so far use a blunt type cast on the structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
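A minimal sketch of such a union (the field order follows the x86 register encoding; the exact layout in Jailhouse may differ): the named view and the by_index view alias the same storage, so instruction parsers can write regs->by_index[reg] without a cast.

```c
#include <stdint.h>

/* Named fields and an index-based array share storage. The order
 * matches the x86 register encoding: 0=rax, 1=rcx, ..., 7=rdi. */
union registers {
	struct {
		uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi;
	};
	uint64_t by_index[8];
};
```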
Jan Kiszka [Sat, 4 Apr 2015 11:07:03 +0000 (13:07 +0200)]
x86: Retrieve vcpu_mmio_intercept from vcpu_handle_mmio_access
Analogously to vcpu_handle_io_access, define the vendor callback
vcpu_vendor_get_mmio_intercept and call it from vcpu_handle_mmio_access
instead of passing it to the handler. For consistency reasons, rename
vcpu_pf_intercept to vcpu_mmio_intercept.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Apr 2015 10:23:09 +0000 (12:23 +0200)]
x86: Retrieve vcpu_io_intercept from vcpu_handle_io_access
Convert the vendor-specific functions into vcpu_vendor_get_io_intercept
and invoke that one from vcpu_handle_io_access. That offloads this
burden from the callers of vcpu_handle_io_access and takes us further
towards consistent vendor callbacks for such purposes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
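The callback pattern can be sketched like this, with a stub standing in for the VMX/SVM implementation and all types illustrative:

```c
#include <stdint.h>

struct vcpu_io_intercept {
	uint16_t port;
	unsigned int size;
	int in;		/* direction: non-zero for port reads */
};

/* Stub standing in for the vendor implementation (vmx.c / svm.c),
 * which decodes the intercept from the VMCS/VMCB exit information. */
static void vcpu_vendor_get_io_intercept(struct vcpu_io_intercept *io)
{
	io->port = 0x64;	/* illustrative values */
	io->size = 1;
	io->in = 1;
}

/* The generic handler queries the details itself instead of having
 * every caller pass them in. */
static uint16_t vcpu_handle_io_access(void)
{
	struct vcpu_io_intercept io;

	vcpu_vendor_get_io_intercept(&io);
	return io.port;
}
```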
Jan Kiszka [Fri, 3 Apr 2015 17:51:51 +0000 (19:51 +0200)]
x86: Remove cpu_data parameter from vcpu_park
The function only works against the current CPU, thus should avoid to
take the misleading parameter. The implementations can obtain the
reference inline as needed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 3 Apr 2015 09:06:54 +0000 (11:06 +0200)]
x86: Block write access to MTRR registers
Linux does not try to rewrite them on CPU hotplug if they are identical
to other CPUs' registers, and our non-root cells have no business
touching them either. This effectively freezes the MTRRs after handover
and ensures consistent states for both the hypervisor and all cells
across all CPUs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
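The blocked range can be sketched as a predicate over MSR numbers (values per the Intel SDM; this is an illustration, not the actual Jailhouse check):

```c
#include <stdint.h>
#include <stdbool.h>

/* Deny guest writes to the MTRR MSRs: the variable-range base/mask
 * pairs (0x200..0x20f), the fixed-range registers and
 * IA32_MTRR_DEF_TYPE (0x2ff). */
static bool is_mtrr_msr(uint32_t msr)
{
	return (msr >= 0x200 && msr <= 0x20f) ||	/* variable ranges */
	       msr == 0x250 ||				/* MTRRfix64K_00000 */
	       msr == 0x258 || msr == 0x259 ||		/* MTRRfix16K_* */
	       (msr >= 0x268 && msr <= 0x26f) ||	/* MTRRfix4K_* */
	       msr == 0x2ff;				/* IA32_MTRR_DEF_TYPE */
}
```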
Jan Kiszka [Fri, 3 Apr 2015 08:48:19 +0000 (10:48 +0200)]
x86: Emulate MTRR enable/disable
We assume that cells will only flip the enable flag of
IA32_MTRR_DEF_TYPE, leaving the rest of the register in its default
state (the one found during handover). SVM already implemented this but
emulated the disabled state by modifying the host PAT.
The approach here is less invasive: it only changes the effective guest
PAT to 0 in case MTRRs are off. And it provides this for Intel as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
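The core of the emulation reduces to one decision, sketched below (bit 11 of IA32_MTRR_DEF_TYPE is the MTRR enable flag, and a PAT value of 0 makes every entry type 0, i.e. uncacheable; function names are illustrative):

```c
#include <stdint.h>

#define MTRR_ENABLE	(1ULL << 11)	/* E flag in IA32_MTRR_DEF_TYPE */

/* If the cell disabled the MTRRs, force the effective guest PAT to 0
 * so all memory becomes uncacheable; otherwise use the shadowed PAT. */
static uint64_t effective_guest_pat(uint64_t pat_shadow,
				    uint64_t mtrr_def_type)
{
	return (mtrr_def_type & MTRR_ENABLE) ? pat_shadow : 0;
}
```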
Jan Kiszka [Thu, 2 Apr 2015 08:15:40 +0000 (10:15 +0200)]
x86: Maintain PAT shadow
For emulating the MTRR-disabled state, we will have to modify the
effective guest PAT state soon. This prepares for it by keeping PAT in
a per-cpu shadow field and intercepting accesses to the MSR.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 29 Mar 2015 10:19:47 +0000 (12:19 +0200)]
x86: Switch between host and guest PAT
Do not allow the guest to mess with the PAT MSR in a way that also
affects the host. This may cause the host to run in uncached mode,
slowing it down, or, even worse, to access MMIO with caches enabled,
which will cause inconsistencies.
On Intel, we have to require and enable the related save/restore
feature. On AMD, we need to intercept the MSR accesses and map them on
the g_pat field of the VMCB.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
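The AMD side can be sketched as an MSR-write intercept that redirects IA32_PAT (MSR 0x277) into the VMCB's g_pat field, leaving the host PAT untouched; the structures below are illustrative stand-ins:

```c
#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_PAT	0x277

/* Toy VMCB with just the guest PAT field. */
struct vmcb_stub { uint64_t g_pat; };

/* Returns true if the write was handled. Writes to IA32_PAT only
 * change the guest's view via the VMCB; the host PAT stays intact. */
static bool svm_handle_msr_write(struct vmcb_stub *vmcb, uint32_t msr,
				 uint64_t value)
{
	if (msr == MSR_IA32_PAT) {
		vmcb->g_pat = value;
		return true;
	}
	return false;
}
```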
Jan Kiszka [Sat, 28 Mar 2015 11:02:04 +0000 (12:02 +0100)]
x86: Prevent interference by Intel perf counters
Make it simple but safe: disable perf counters during setup and prevent
cells from modifying the corresponding MSRs. This avoids having to
switch the MSRs during vmentry/exit, but it also blocks perf & friends
while Jailhouse is active.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 18 Mar 2015 07:50:25 +0000 (08:50 +0100)]
core: Instrument relevant return paths for error tracing
This instruments return paths so that the origin of important errors can
be tracked down. Two previously explicit error outputs are replaced
with trace_error.
We do not instrument -ENOMEM cases unless they relate to allocations
from the remapping pool. All other -ENOMEM cases boil down to a too
small hypervisor region.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
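Such instrumentation can look like the following sketch (the macro shape and output format are illustrative, not Jailhouse's exact implementation); it reports the code with its origin and passes it through, so return paths become "return trace_error(-EINVAL);":

```c
#include <stdio.h>

/* Report an error code together with its origin, then yield the code
 * unchanged. Uses a GNU C statement expression. */
#define trace_error(code) ({						\
	printf("error %d at %s:%d\n", (code), __FILE__, __LINE__);	\
	(code);								\
})
```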
Jan Kiszka [Wed, 18 Mar 2015 07:43:50 +0000 (08:43 +0100)]
core: Introduce error return code tracing
A number of errors that can be reported during setup or while
reconfiguring cells are hard to trace down to their detailed reasons
because of the limited number of error codes available through POSIX.
This introduces a non-invasive mechanism to instrument error return
paths in the hypervisor and report the origin of a specific error code
in the form
Jan Kiszka [Wed, 18 Mar 2015 07:39:19 +0000 (08:39 +0100)]
x86: Bring host CR4 into well-defined state during setup
Analogously to CR0: Avoid any uncertainty about the state of CR4 left
behind by Linux: check for unexpectedly set reserved bits or required-1
bits, and otherwise set our own state.
A side effect of this change is that VMX's vcpu_exit will no longer
clear VMXE in CR4 but only in the cached Linux state that
arch_cpu_restore will write back.
CC: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 18 Mar 2015 07:11:34 +0000 (08:11 +0100)]
x86: Reformat and cleanup CR4 constants
Encode the CR4 constants in a more readable form, add the soon-required
XSAVE feature bit and remove the unused PGE. Also add a mask of the
reserved bits that need to be left as-is on modifications.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 16 Mar 2015 08:21:58 +0000 (09:21 +0100)]
x86: Bring host CR0 into well-defined state during setup
Avoid any uncertainty about the state of CR0 left behind by Linux: check
for unexpectedly set reserved bits or required-1 bits, and otherwise set
our own state.
CC: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 17 Mar 2015 09:34:59 +0000 (10:34 +0100)]
x86: Rework CR0/CR4 restriction handling for VMX
First of all, we want to reuse the restrictions for setting the host
CRx values as well. In addition, the current implementation benefits
from more documentation, caching of those static values and checking
their consistency across all CPUs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 16 Mar 2015 08:18:56 +0000 (09:18 +0100)]
x86: Reformat and extend CR0 constants
Encode the CR0 constants in a more readable form and add some bits we
will need soon. Also add a mask of the reserved bits that need to be
left as-is on modifications.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
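The reformatting can be pictured like this (bit positions per the Intel SDM; the actual macro names in Jailhouse may differ): shift-encoded flag constants plus a reserved-bits mask derived as the complement of the defined bits.

```c
/* CR0 flags, encoded by bit position for readability. */
#define X86_CR0_PE	(1UL << 0)
#define X86_CR0_MP	(1UL << 1)
#define X86_CR0_EM	(1UL << 2)
#define X86_CR0_TS	(1UL << 3)
#define X86_CR0_ET	(1UL << 4)
#define X86_CR0_NE	(1UL << 5)
#define X86_CR0_WP	(1UL << 16)
#define X86_CR0_AM	(1UL << 18)
#define X86_CR0_NW	(1UL << 29)
#define X86_CR0_CD	(1UL << 30)
#define X86_CR0_PG	(1UL << 31)

/* Everything not architecturally defined must be left as-is. */
#define X86_CR0_RESERVED						\
	(~(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS |		\
	   X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM |		\
	   X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
```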
Jan Kiszka [Mon, 16 Mar 2015 07:07:05 +0000 (08:07 +0100)]
x86: Add MSR whitelisting to to-do list
We currently allow access to almost all MSRs (except for APIC-related
ones). This has to be changed into a whitelist approach to prevent
cells from manipulating CPU state in ways we have not validated as safe.
CC: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Add a parse_ivrs() function that extracts the relevant bits of
information from the ACPI IVRS table, which describes the AMD IOMMU
units found in the system.
As VT-d and AMD-Vi impose slightly different requirements on the PCI
device configuration (e.g. the PCI root complex), move the sanity
checks into the corresponding functions to account for these
discrepancies.
Signed-off-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 12 Mar 2015 07:00:32 +0000 (08:00 +0100)]
ci: Add Coverity model for kmalloc
kmalloc can actually sanitize a tainted size parameter if given the
right GFP flags, namely GFP_USER (to properly tag the request origin)
and __GFP_NOWARN (to avoid a WARN_ON when hitting the kmalloc limit).
Model
this for Coverity so that it no longer complains about the correct
pattern we use in jailhouse_cmd_cell_create.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 11 Mar 2015 06:39:43 +0000 (07:39 +0100)]
driver: Correctly tag kmalloc allocation on behalf of user space
When the provided config size is beyond the kmalloc limits, it may
trigger a WARN_ON. Avoid this by tagging the allocation with
__GFP_NOWARN. Also properly tag it as GFP_USER instead of GFP_KERNEL.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 10 Mar 2015 06:27:43 +0000 (07:27 +0100)]
driver: Improve input validation to make code scanners happier
We trust the configuration files passed down to the driver already
because they define the isolation set up by the hypervisor and can
therefore screw up the system in various ways.
Nevertheless, we can and should improve basic consistency checks of
config fields that influence allocations and copy operations. This will
detect some corruptions/inconsistencies earlier and also satisfies the
Coverity scanner.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 10 Mar 2015 13:09:23 +0000 (14:09 +0100)]
x86: Address sparse warnings about missing UL tags for constants
Automatic type conversion has saved us in all these cases so far, but
we had better avoid surprises in the future, and another finding
actually turned out to be a bug.
JAILHOUSE_BASE requires special wrapping as it is also used in assembly
(the linker script) where the UL tag is not understood.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
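The special wrapping for constants shared with assembly usually follows this pattern (sketch with illustrative names, not the exact Jailhouse macros): the UL suffix is pasted on only when the header is not being preprocessed for the assembler, which would not understand it.

```c
/* Apply the UL suffix only for C; the assembler (and linker script)
 * sees the bare constant. */
#ifdef __ASSEMBLY__
#define _UL(x)		x
#else
#define _UL(x)		x##UL
#endif

/* Illustrative base address, usable from both C and assembly. */
#define EXAMPLE_BASE	_UL(0xfffffffff0000000)
```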