Jan Kiszka [Fri, 18 Jul 2014 07:50:21 +0000 (09:50 +0200)]
inmates: Add PCI services to inmates framework
Provide library services for PCI config space access, bus scanning,
capability scanning (non-extended only so far) and MSI vector
programming (MSI-X to be added later).
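To illustrate the PCI services, here is a minimal sketch of two pieces. The first is config space addressing via the legacy port-based mechanism (I/O ports 0xCF8/0xCFC). The second is a walk over the standard (non-extended) capability list. The function names are hypothetical, not the inmates library's actual API. The real port I/O (outl/inl) is left out so that the code runs on plain buffers.

```c
#include <stdint.h>

/* Legacy PCI configuration mechanism #1 (PC-compatible I/O ports). */
#define PCI_CONFIG_ADDR		0xcf8
#define PCI_CONFIG_DATA		0xcfc

/* PCI status register bit: capability list present. */
#define PCI_STATUS_CAP_LIST	0x10

/*
 * Value to write to PCI_CONFIG_ADDR before accessing PCI_CONFIG_DATA.
 * bdf packs bus[15:8], device[7:3] and function[2:0]; only the first
 * 256 bytes of config space are reachable this way.
 */
uint32_t pci_config_address(uint16_t bdf, unsigned int reg)
{
	return (1u << 31) |		/* enable bit */
	       ((uint32_t)bdf << 8) |	/* bus, device, function */
	       (reg & 0xfc);		/* dword-aligned register offset */
}

/* Sub-dword accesses use the matching byte lane of the data port. */
uint16_t pci_config_data_port(unsigned int reg)
{
	return PCI_CONFIG_DATA + (reg & 0x3);
}

/*
 * Walk the standard capability list in a copy of the 256-byte config
 * space. Returns the offset of the capability or 0 if not found.
 */
unsigned int pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
	unsigned int pos, guard = 48;	/* protects against looping lists */

	if (!(cfg[0x06] & PCI_STATUS_CAP_LIST))
		return 0;
	pos = cfg[0x34] & 0xfc;		/* capabilities pointer */
	while (pos >= 0x40 && guard--) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;	/* next pointer */
	}
	return 0;
}
```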
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 17 Jul 2014 18:50:33 +0000 (20:50 +0200)]
inmates: Add IOAPIC demo
Simple demonstration and test of using the IOAPIC within a non-root
cell: rob the ACPI IRQ and wait for events on this line, e.g. a power
button push. Read the warning before using it.
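Here is a hedged sketch of how such a demo might encode an IOAPIC redirection table entry, following the 82093AA IOAPIC register layout. The helper names are hypothetical. The pin used by the ACPI IRQ (SCI), as well as its trigger mode and polarity, has to come from the firmware tables (the FADT's SCI_INT field) rather than being hard-coded.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_REDIR_TBL_START	0x10

/* Register index of the low dword of a pin's redirection entry;
 * the high dword (destination field) follows at index + 1. */
unsigned int ioapic_redir_index(unsigned int pin)
{
	return IOAPIC_REDIR_TBL_START + 2 * pin;
}

/* Fixed delivery mode (bits 10:8 = 000), physical destination mode. */
uint64_t ioapic_redir_entry(uint8_t vector, uint8_t dest_apic_id,
			    bool level_triggered, bool active_low, bool masked)
{
	uint64_t entry = vector;		/* bits 7:0: vector */

	if (active_low)
		entry |= 1ull << 13;		/* input pin polarity */
	if (level_triggered)
		entry |= 1ull << 15;		/* trigger mode */
	if (masked)
		entry |= 1ull << 16;		/* interrupt mask */
	entry |= (uint64_t)dest_apic_id << 56;	/* destination, bits 63:56 */
	return entry;
}
```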
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 17 Jul 2014 15:18:21 +0000 (17:18 +0200)]
inmates: Add memory services to inmates framework
This adds a primitive memory allocator (without release) and a page
mapper (without unmap) to the inmates library. MMIO accessors are also
included. Those used for intercepted resources are written in assembly
to ensure that only instructions supported by the hypervisor's MMIO
parser are used. With these services, inmates can now access
memory-mapped devices.
The allocator uses the lower memory starting from the first page.
Document this as well as the remaining memory layout.
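A minimal sketch of an allocator without release (a bump allocator) follows. It assumes the heap starts at the first page boundary (0x1000). Both that start address and the alloc() signature are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096ul

/* Assumed heap start: the first page boundary above address 0. */
static uintptr_t heap_pos = PAGE_SIZE;

/* Hand out memory linearly; nothing is ever freed.
 * align must be a power of two. */
void *alloc(size_t size, size_t align)
{
	uintptr_t base = (heap_pos + align - 1) & ~(uintptr_t)(align - 1);

	heap_pos = base + size;
	return (void *)base;
}
```

Since nothing is released, the allocator state is a single position pointer. That is sufficient for inmates which set up their data structures once at startup.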
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 16 Jul 2014 19:49:25 +0000 (21:49 +0200)]
inmates: Factor out interrupt library services
This simplifies registering interrupt handlers and also moves the EOI
ACK into library code. Only 64-bit is supported so far. Still, we need
to fix the definition of s64/u64 and make read/write_msr compatible
with 32-bit builds.
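The registration model could look roughly like the following sketch. Handlers no longer acknowledge the interrupt themselves because the library dispatcher sends the EOI. All names are hypothetical. The EOI write is stubbed out with a counter so the code runs outside a cell.

```c
#include <stddef.h>

#define NUM_VECTORS	256

typedef void (*int_handler_t)(void);

static int_handler_t int_handlers[NUM_VECTORS];

/* Stub: on real hardware this would write the local APIC EOI register
 * (e.g. the x2APIC EOI MSR 0x80b). */
static unsigned int eoi_count;
static void send_eoi(void)
{
	eoi_count++;
}

void int_set_handler(unsigned int vector, int_handler_t handler)
{
	if (vector < NUM_VECTORS)
		int_handlers[vector] = handler;
}

/* Called from the low-level entry stub with the vector number. */
void int_dispatch(unsigned int vector)
{
	if (vector < NUM_VECTORS && int_handlers[vector])
		int_handlers[vector]();
	send_eoi();	/* the library, not the handler, acknowledges */
}
```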
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 16 Jul 2014 11:46:49 +0000 (13:46 +0200)]
inmates: Map Comm Region always at 0x100000 for inmates framework
Standardize mapping and access to the Comm Region within the inmates
framework. This reduces the work required for new inmates. We will move
it higher once paging services are available, so that larger inmates
can be created.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 16 Jul 2014 11:05:42 +0000 (13:05 +0200)]
inmates: Refactor folder structure
Move common code into inmates/lib and showcases into inmates/demos to
prepare for a reusable and extensible inmates framework. Also split
along architecture dependencies, as we will get code for non-x86
architectures one day.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 21 Jul 2014 06:48:07 +0000 (08:48 +0200)]
x86: Handle more SIB cases in MMIO instruction parser
This adds, among other things, support for using r12 as the address
register in MMIO accesses, and it actually simplifies the code. We can
ignore the SIB scale (SS) and index fields in MOD 0, as these only
affect the memory address, which we obtain differently.
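Because the hardware reports the faulting address, the parser mainly needs to know how many bytes the ModRM-based operand encoding occupies. Below is a hedged sketch of that length computation for 32/64-bit addressing (no address-size prefix, no 16-bit forms). It is not Jailhouse's parser. For example, `mov eax, [r12]` encodes as 41 8B 04 24: r12 as a base always needs a SIB byte (ModRM.rm = 4, SIB.base = 4 plus REX.B).

```c
#include <stdint.h>

/*
 * Length of the ModRM byte plus optional SIB byte and displacement.
 * Returns 0 for register operands (mod == 3), which cannot trigger an
 * MMIO exit. Scale and index never matter: the address comes from the
 * hardware, only the encoding length is needed to skip the instruction.
 */
unsigned int modrm_operand_len(const uint8_t *insn)
{
	uint8_t modrm = insn[0];
	unsigned int mod = modrm >> 6, rm = modrm & 7;
	unsigned int len = 1;			/* ModRM itself */

	if (mod == 3)
		return 0;
	if (rm == 4) {				/* SIB byte follows */
		len++;
		/* mod 0 with SIB base 5: disp32, no base register */
		if (mod == 0 && (insn[1] & 7) == 5)
			len += 4;
	} else if (mod == 0 && rm == 5) {
		len += 4;			/* RIP-relative disp32 */
	}
	if (mod == 1)
		len += 1;			/* disp8 */
	else if (mod == 2)
		len += 4;			/* disp32 */
	return len;
}
```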
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 17 Jul 2014 18:48:12 +0000 (20:48 +0200)]
x86: Add support for REX.B to MMIO instruction parser
In none of the supported modes is REX.B relevant to us: it only
influences the memory address (by selecting the address register), and
we obtain that address differently. We can therefore ignore this bit,
extending the set of supported MMIO instructions.
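Here is a sketch of REX decoding for this purpose, assuming 64-bit mode. Only REX.R (and REX.W for the operand size) matters to an MMIO emulator, because it selects the data register. REX.B and REX.X only take part in address calculation. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct rex_prefix {
	bool present;
	bool w;	/* 64-bit operand size */
	bool r;	/* extends ModRM.reg: the data register */
	bool x;	/* extends SIB.index: address only, ignorable */
	bool b;	/* extends ModRM.rm / SIB.base: address only, ignorable */
};

struct rex_prefix decode_rex(uint8_t byte)
{
	struct rex_prefix rex = { false, false, false, false, false };

	if ((byte & 0xf0) != 0x40)	/* REX is 0x40..0x4f in 64-bit mode */
		return rex;
	rex.present = true;
	rex.w = byte & 0x8;
	rex.r = byte & 0x4;
	rex.x = byte & 0x2;
	rex.b = byte & 0x1;
	return rex;
}

/* Number (0..15) of the register holding the value read or written. */
unsigned int data_register(struct rex_prefix rex, uint8_t modrm)
{
	return ((modrm >> 3) & 7) | (rex.r ? 8 : 0);
}
```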
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 8 Jul 2014 06:06:21 +0000 (08:06 +0200)]
x86: Fix argument widths of hypercall ABI
So far, the x86 hypercall ABI defined 64-bit arguments and return codes.
However, our interface header took and returned only 32 bits. This
slipped through unnoticed because physical addresses beyond 4G are
usually not passed to the Cell Create hypercall, the only place where
it practically matters.
Fix the issue and extend the ABI to also support 32-bit callers. We
define the hypercall code and return value to be 32 bits wide; argument
widths now correspond to the caller's mode: 64 bits in IA-32e mode, 32
bits otherwise. While the root cell still has to run in 64-bit mode,
non-root cells in other modes can now invoke the hypercalls as well.
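On the hypervisor side, the width rule could be sketched as shown here. The register assignment (hypercall code in RAX, arguments in RDI and RSI) and the way the caller's mode is determined are assumptions for illustration. They do not describe the exact Jailhouse implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the saved guest registers. */
struct guest_regs {
	uint64_t rax, rdi, rsi;
};

/* The hypercall code is always taken from the lower 32 bits. */
uint32_t hypercall_code(const struct guest_regs *regs)
{
	return (uint32_t)regs->rax;
}

/* Arguments: full 64 bits for IA-32e mode callers, 32 bits otherwise,
 * so stale upper register halves of 32-bit callers are ignored. */
uint64_t hypercall_arg(uint64_t reg, bool long_mode)
{
	return long_mode ? reg : (uint32_t)reg;
}
```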
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 7 Jul 2014 17:01:00 +0000 (19:01 +0200)]
x86: Clear APIC on every SIPI event
The current logic only ensures that we clear the APIC when the CPU
enters the virtual wait-for-SIPI state. However, this does not cover the
case when we transfer a CPU from the root to a non-root cell. For this,
we only stop the CPU and then reset it directly via a pseudo SIPI. This
change moves the clearing to the point where we are about to deliver the
SIPI.
The change has the positive side effect of moving potentially costly
APIC clearing out of the control_lock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Add a helper script to generate a configuration for the root cell.
The script can also generate another script that collects all the
necessary files on a remote machine.
Both scripts can be accessed through the jailhouse command.
Signed-off-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Jul 2014 14:41:17 +0000 (16:41 +0200)]
configs: Add Q35 machine support to QEMU VM
Add the required PCI devices to the QEMU config so that it works with
both the default i440FX and the newer Q35 machine. This is transitional
until Q35 gains VT-d support; then we will drop the i440FX bits.
Open the whole 0xC0xx port range for PCI devices to be more tolerant
regarding ordering or other changes.
While at it, drop the unneeded permission to talk to the first legacy
PIC.
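A hedged sketch of the permission model behind such port ranges follows. It uses a bitmap with one bit per I/O port, where a set bit denies (intercepts) access, analogous to the VMX I/O bitmaps. The helper functions are hypothetical. The cell configuration files express the same thing statically as an initialized array.

```c
#include <stdint.h>

#define PIO_PORTS	0x10000

/* One bit per I/O port; a set bit denies (intercepts) the access. */
static uint8_t pio_bitmap[PIO_PORTS / 8];

void pio_set_access(unsigned int first, unsigned int last, int allow)
{
	for (unsigned int port = first; port <= last; port++) {
		if (allow)
			pio_bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
		else
			pio_bitmap[port / 8] |= (uint8_t)(1u << (port % 8));
	}
}

int pio_allowed(unsigned int port)
{
	return !(pio_bitmap[port / 8] & (1u << (port % 8)));
}
```

Usage in the spirit of the change: deny everything, then open 0xC000..0xC0FF for the PCI devices while leaving the first PIC (0x20/0x21) closed.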
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Jul 2014 09:32:14 +0000 (11:32 +0200)]
inmates/configs: Pick up PM timer address from Comm Region
Instead of probing it, use the information that is now provided via the
Communication Region. This requires mapping the region into the
tiny-demo cell as well.
We can now drop all explicit port permissions from the inmates' cell
configurations, as the hypervisor now grants this access automatically.
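A minimal sketch of turning raw PM timer readings into elapsed time follows. It assumes a 24-bit counter width (some chipsets provide 32 bits, which the FADT flags) and the ACPI-specified frequency of 3.579545 MHz. Reading the port itself, an inl from the address taken from the Comm Region, is omitted. The function name is hypothetical.

```c
#include <stdint.h>

#define PM_TIMER_HZ	3579545ull	/* ACPI PM timer frequency */
#define NS_PER_SEC	1000000000ull
#define PM_TIMER_MASK	0x00ffffffu	/* assumed 24-bit counter */

/* Elapsed nanoseconds between two raw readings, tolerating one wrap. */
uint64_t pm_timer_delta_ns(uint32_t start, uint32_t now)
{
	uint32_t ticks = (now - start) & PM_TIMER_MASK;

	return ticks * NS_PER_SEC / PM_TIMER_HZ;
}
```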
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Jul 2014 09:06:07 +0000 (11:06 +0200)]
x86: Provide PM timer access to all cells
Export the PM timer address via the Communication Region to non-root
cells and allow access to that port for all cells. This is safe as the
PM timer hardware is specified to be read-only.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Jul 2014 09:37:48 +0000 (11:37 +0200)]
x86: Allow to specify the PM timer address via the system configuration
This enables the hypervisor to forward the information to non-root cells
and to permit access to the resource. We could also parse the ACPI table
in the hypervisor, but this approach is much simpler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 5 Jul 2014 06:48:59 +0000 (08:48 +0200)]
x86: Drop redundant vmx_invept
There is no need to call invept in vmx_cell_init as well. We already
perform this for all CPUs involved in a cell creation (root and new
cell CPUs) via arch_config_commit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 20 Jun 2014 23:05:03 +0000 (01:05 +0200)]
tools: Prepare further command line tool extensions
Allow adding more extension scripts to the command line tool. We define
a structure that describes an extension by command/subcommand and
provides the help text to be displayed by the tool. The extension
script itself has to be called jailhouse-<command>-<subcommand>. We
look for it in $PATH, extended by the tool's directory and
/usr/lib/jailhouse.
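Here is a minimal sketch of the lookup convention described above, using a hypothetical extension descriptor. It only builds the script name and the extended search path. The tool would then run the script, e.g. via setenv("PATH", ...) followed by execvp().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor of a command line tool extension. */
struct extension {
	const char *cmd, *subcmd, *help;
};

/* Build "jailhouse-<command>-<subcommand>"; returns 0 on success. */
int extension_script_name(const struct extension *ext, char *buf, size_t len)
{
	int n;

	/* Refuse names that could escape the search path. */
	if (strchr(ext->cmd, '/') || strchr(ext->subcmd, '/'))
		return -1;
	n = snprintf(buf, len, "jailhouse-%s-%s", ext->cmd, ext->subcmd);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Append the tool's own directory and /usr/lib/jailhouse to PATH. */
int extension_search_path(const char *path, const char *tool_dir,
			  char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s%s%s:/usr/lib/jailhouse",
			 path ? path : "", (path && *path) ? ":" : "",
			 tool_dir);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```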
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 14 Jun 2014 06:36:57 +0000 (08:36 +0200)]
core/driver: Extend "CPU Get State" to "CPU Get Info" hypercall
Add a second argument to control which per-CPU information is
retrieved via JAILHOUSE_HC_CPU_GET_INFO. For now, there will only be
JAILHOUSE_CPU_INFO_STATE, which provides the original hypercall service.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 13 Jun 2014 13:08:27 +0000 (15:08 +0200)]
core: Clean up jailhouse_system_config_size
Instead of open-coding the content of struct jailhouse_system for
calculating its size, simply subtract from sizeof(*system) the part of
system->system that is already included in jailhouse_cell_config_size().
This simplifies future extensions of struct jailhouse_system.
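The following sketch shows the size calculation with heavily simplified, hypothetical stand-in structures. The fixed part of the embedded cell descriptor is counted both by sizeof(*system) and by the cell size helper, so it is subtracted once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct memory_region {
	uint64_t phys_start, virt_start, size, flags;
} __attribute__((packed));

struct cell_desc {
	char name[32];
	uint32_t num_memory_regions;	/* regions follow the descriptor */
} __attribute__((packed));

struct system_desc {
	struct memory_region hypervisor_memory;
	struct cell_desc system;	/* root cell, variable data follows */
} __attribute__((packed));

size_t cell_config_size(const struct cell_desc *cell)
{
	return sizeof(*cell) +
	       cell->num_memory_regions * sizeof(struct memory_region);
}

size_t system_config_size(const struct system_desc *system)
{
	/* sizeof(*system) already counts the fixed part of system->system,
	 * and cell_config_size() counts it again: subtract it once. */
	return sizeof(*system) - sizeof(system->system) +
	       cell_config_size(&system->system);
}
```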
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 2 Jun 2014 10:17:48 +0000 (12:17 +0200)]
x86: Add support for IOAPIC access control
This adds basic access control to the IOAPIC. Based on the IRQ chip
configuration, we permit or deny writes to redirection table entries.
This may require integration with interrupt remapping later on.
We furthermore allow reads from other valid IOAPIC registers but deny
any other write accesses.
EOI writes are currently passed through. This will have to be revisited
as well when interrupt remapping is added.
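Below is a hedged sketch of the access decision for an IOWIN access. It takes the register index previously selected via IOREGSEL and the pin bitmap from the IRQ chip configuration. Register indices follow the 82093AA datasheet. The function itself and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_REG_ID		0x00
#define IOAPIC_REG_VER		0x01
#define IOAPIC_REG_ARB		0x02
#define IOAPIC_REDIR_TBL_START	0x10

/*
 * Decide whether an IOWIN access to the selected register is permitted.
 * pin_bitmap marks the pins assigned to the cell; num_pins (<= 32 here)
 * would be derived from the version register (max. redirection entry + 1).
 */
bool ioapic_access_permitted(unsigned int index, bool is_write,
			     uint32_t pin_bitmap, unsigned int num_pins)
{
	if (index >= IOAPIC_REDIR_TBL_START &&
	    index < IOAPIC_REDIR_TBL_START + 2 * num_pins) {
		unsigned int pin = (index - IOAPIC_REDIR_TBL_START) / 2;

		/* Reads are fine, writes only to the cell's own pins. */
		return !is_write || (pin_bitmap & (1u << pin));
	}
	if (index == IOAPIC_REG_ID || index == IOAPIC_REG_VER ||
	    index == IOAPIC_REG_ARB)
		return !is_write;	/* other valid registers: read only */
	return false;			/* invalid register */
}
```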
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 2 Jun 2014 10:11:27 +0000 (12:11 +0200)]
core/configs: Change IRQ line access control modeling
Change the configuration file to manage access to IRQ lines at the IRQ
chip level. Each IRQ chip config entry consists of an address, typically
the chip's MMIO address, a unique identifier that will be used for
interrupt remapping on x86, and a bitmap controlling access to the
individual IRQ pins of that chip. This will simplify access control
checks for IRQ chips.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 11 Jun 2014 17:31:16 +0000 (19:31 +0200)]
x86: Park a CPU when a VM entry failed
Do not give up a CPU merely because VM entry failed. For whatever reason, we
may have loaded an invalid CPU state from which we can still recover by
resetting the virtual CPU. This also simplifies the exit handling.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 6 Jun 2014 07:05:11 +0000 (09:05 +0200)]
x86: Refactor vtd constant definitions
Make use of BIT_MASK and refactor the constant definitions used for vtd
into a consistent form that is more easily verifiable against the spec.
Drop some unused constants.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 5 Jun 2014 14:42:36 +0000 (16:42 +0200)]
core: Generic memory region mapping for cell creation with rollback on errors
Pull the memory region mappings that currently happen in vmx and vtd
into generic code paths. This allows us to properly roll back on errors
during cell creation.
We now perform the arch-specific cell initialization first, then
transfer CPUs and finally remap the memory regions. For the rollback, we
can simply use the infrastructure available for cell destruction, both
at the generic level and inside vmx/vtd.
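Here is a minimal sketch of the pattern. The destruction path tolerates regions that were never mapped, so a failed creation can simply call it for rollback. The region type and the map/unmap functions are hypothetical. Mapping an empty region is used to simulate an error.

```c
#include <stddef.h>

/* Hypothetical region descriptor with per-region map state. */
struct region {
	unsigned long start, size;
	int mapped;
};

static int map_region(struct region *r)
{
	if (r->size == 0)	/* simulated mapping error */
		return -1;
	r->mapped = 1;
	return 0;
}

static void unmap_region(struct region *r)
{
	r->mapped = 0;
}

/* Tolerates regions that were never mapped, so it also serves as the
 * rollback path for a half-done cell creation. */
void cell_destroy_regions(struct region *regions, size_t count)
{
	for (size_t n = 0; n < count; n++)
		if (regions[n].mapped)
			unmap_region(&regions[n]);
}

int cell_create_regions(struct region *regions, size_t count)
{
	for (size_t n = 0; n < count; n++) {
		int err = map_region(&regions[n]);

		if (err) {
			cell_destroy_regions(regions, count);
			return err;
		}
	}
	return 0;
}
```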
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 15 Jun 2014 05:35:26 +0000 (07:35 +0200)]
core: Do not flush hypervisor TLB on unrelated page table changes
We only need to flush TLBs when page_map_create and page_map_destroy
work against the hypervisor page table. Other page tables require
arch-specific flushes, which we perform in arch_config_commit.
This measurably speeds up Jailhouse activation, e.g. when a significant
number of EPT and VT-d page table changes are performed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 15 Jun 2014 05:17:42 +0000 (07:17 +0200)]
x86: Drop TLB flushes on cell configuration changes
The host TLB only requires flushing on hypervisor page table changes.
These only happen on the CPU that performs guest configuration changes
and only for mapping regions that are per-CPU. This is already handled
by flushes in page_map_create/destroy.
Hypervisor page mappings that are relevant for all CPUs are created
during setup. This is done on the setup master CPU before any other CPU
initializes and flushes its caches by switching to the hypervisor page
table.
So we can drop x86_tlb_flush_all altogether. Rename the flush_caches
flag to flush_virt_caches to reflect that we only request guest-to-host
cache invalidations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 6 Jun 2014 09:08:03 +0000 (11:08 +0200)]
core: Introduce arch_config_commit
This function allows us to consistently flush affected caches after
configuration changes. We did this after cell creation and partially
after destruction, but forgot about it on load/start. Flushing is now
extended to the CPU performing the changes as well as to all CPUs of a
created or destroyed cell.
This change also enables splitting up IOMMU activation and the related
root cell and memory region mapping setup, a precondition for generic
memory region mapping.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 5 Jun 2014 14:36:42 +0000 (16:36 +0200)]
x86: Guard vtd_add_device_to_cell against addition of existing devices
Avoid adding an already registered device to a cell, and specifically
reporting this on the console. This case can soon happen when rolling
back failed cell creations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 5 Jun 2014 16:25:39 +0000 (18:25 +0200)]
core/configs: Clean up config structure alignment/packing
Instead of sprinkling aligned(1) attributes, we need to pack all config
structures as well as the containers we define in the config files
themselves. Clean this up, also dropping the now unneeded padding from
jailhouse_cell_desc.
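The difference can be demonstrated with GCC/Clang attributes and hypothetical structures. aligned(1) cannot lower the alignment of a structure, so its internal padding remains on common ABIs. packed removes that padding and yields a layout independent of the compiler's padding rules.

```c
#include <stdint.h>

/* aligned(1) on a type cannot decrease its alignment (GCC only lets it
 * increase), so the padding between 'flags' and 'size' remains. */
struct entry_aligned {
	uint8_t flags;
	uint32_t size;
} __attribute__((aligned(1)));

/* packed removes the internal padding, giving a stable binary layout. */
struct entry_packed {
	uint8_t flags;
	uint32_t size;
} __attribute__((packed));

/* Containers defined in config files need packing too, or padding could
 * appear between a descriptor and the arrays that follow it. */
struct config_container {
	struct entry_packed desc;
	uint64_t regions[2];
} __attribute__((packed));
```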
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 6 Jun 2014 14:19:39 +0000 (16:19 +0200)]
x86: Fold vmx/vtd_root_cell_shrink into vmx/vtd_cell_init
This primarily fixes a regression of 46ab6c2f1e: due to that reordering,
we were first adding devices to a new cell, then removing them from the
root cell, which effectively disabled them in the context table.
Analyzing the content of vmx/vtd_root_cell_shrink shows that we are
better off folding them into the corresponding cell_init functions. We
fix the ordering issue while doing this.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 6 Jun 2014 16:58:53 +0000 (18:58 +0200)]
x86: Make vtd_remove_device_from_cell more robust against non-existent devices
Do not crash if we call vtd_remove_device_from_cell for a device that is
not assigned to a cell, or for which not even a context table exists.
This allows using vtd_remove_device_from_cell e.g. for rolling back
half-done configurations after an error occurred.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 14 Jun 2014 07:36:08 +0000 (09:36 +0200)]
x86: Rework evaluation of MSR_IA32_VMX_TRUE_*_CTLS
The SDM recommends keeping default1-class controls enabled if they are
unknown to the VMM. This applies to most of those bits. Even worse, by
using the TRUE_*_CTLS, we kept DEBUG_CONTROLS saving/loading disabled on
most machines, corrupting the related state on VM exit.
Switch to the "untrue" capability MSRs, except for CR3 loading/storing.
This ensures that default1 bits are kept enabled on future CPUs as
well.
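How a VMM typically derives a control value from a capability MSR can be sketched as follows, based on the SDM's description: the low dword holds the allowed 0-settings and the high dword the allowed 1-settings. With the non-TRUE MSRs, default1 controls appear as must-be-one bits, which is why they stay enabled even if the VMM does not know them. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Adjust a desired VMX control word to a capability MSR value:
 * low dword  = allowed 0-settings (a set bit forces the control to 1),
 * high dword = allowed 1-settings (a clear bit forces the control to 0).
 */
uint32_t vmx_adjust_controls(uint32_t desired, uint64_t cap_msr)
{
	uint32_t must_be_one = (uint32_t)cap_msr;
	uint32_t may_be_one = (uint32_t)(cap_msr >> 32);

	return (desired | must_be_one) & may_be_one;
}
```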
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 6 Jun 2014 06:07:01 +0000 (08:07 +0200)]
core: Add BIT_MASK macro and document BYTE_MASK
BIT_MASK helps define constants according to hardware specifications
when bits [m:n] (m > n) form a field in a register or data structure
entry. While at it, also document the BYTE_MASK macro.
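Here is a sketch of such a macro, whose exact definition in the Jailhouse headers may differ. It yields the mask for bits [last:first] of a 64-bit value, so a field documented as bits [7:4] becomes BIT_MASK(7, 4). The field accessor and example constant are hypothetical.

```c
#include <stdint.h>

/* Mask for bits [last:first] of a 64-bit value, last >= first. */
#define BIT_MASK(last, first) \
	((0xffffffffffffffffULL >> (64 - ((last) + 1 - (first)))) << (first))

/* Hypothetical example: a 4-bit field in bits [7:4] of a register. */
#define EXAMPLE_FIELD_MASK	BIT_MASK(7, 4)

uint64_t get_field(uint64_t reg, uint64_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```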
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 2 Jun 2014 10:09:53 +0000 (12:09 +0200)]
x86: Further improve EPT error reporting
Avoid double error reporting in vmx_handle_ept_violation if an access
handler already did this. Also correct the access direction message; it
was inverted.
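For reference, the access direction can be derived from the EPT violation exit qualification. Its bits 0, 1 and 2 flag a data read, a data write and an instruction fetch, respectively (Intel SDM). Below is a hedged sketch with hypothetical names.

```c
#include <stdint.h>

/* Access type bits of the EPT violation exit qualification. */
#define EPT_QUAL_READ	(1ull << 0)
#define EPT_QUAL_WRITE	(1ull << 1)
#define EPT_QUAL_FETCH	(1ull << 2)

enum ept_access { EPT_ACCESS_READ, EPT_ACCESS_WRITE, EPT_ACCESS_FETCH };

static const char *const ept_access_names[] = {
	[EPT_ACCESS_READ] = "read",
	[EPT_ACCESS_WRITE] = "write",
	[EPT_ACCESS_FETCH] = "fetch",
};

enum ept_access ept_access_type(uint64_t qualification)
{
	if (qualification & EPT_QUAL_WRITE)
		return EPT_ACCESS_WRITE;
	if (qualification & EPT_QUAL_FETCH)
		return EPT_ACCESS_FETCH;
	return EPT_ACCESS_READ;
}

/* Text for the error report. */
const char *ept_access_name(uint64_t qualification)
{
	return ept_access_names[ept_access_type(qualification)];
}
```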
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 3 Jun 2014 15:38:20 +0000 (17:38 +0200)]
x86: Avoid crashes under QEMU due to missing DMAR units
Make sure we do not crash in the hypervisor when adding or removing
cells with PCI devices under QEMU. These hacks will be removed once
emulated VT-d is available.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 14 May 2014 09:40:10 +0000 (11:40 +0200)]
x86: Improve reporting of EPT violations
Report details about the EPT violation also when the MMIO parser fails.
While at it, remove the term "EPT" from the output: what is reported
is an invalid MMIO or RAM access.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 14 May 2014 09:14:04 +0000 (11:14 +0200)]
x86: Fix assembly constraints of write_gdtr/idtr
Copy & paste mistake: write_gdtr and write_idtr do not return anything
in the descriptor table structure; they read from it. This broke the
hypervisor setup with certain optimizing compilers, noticed in
particular with the old gcc 4.4.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 13 May 2014 14:36:46 +0000 (16:36 +0200)]
core: Fix tear-down order in setup error path
We have to do the arch shutdown before restoring the CPU state to
Linux, as we would otherwise lack the mappings required for MMIO access.
On x86, VT-d shutdown would then cause a crash.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 9 May 2014 16:08:20 +0000 (18:08 +0200)]
tools: Fix loading of multiple images
Regression introduced by the regression fix c7fc4f1b04: we were
incrementing the image pointer twice, once in the loop control statement
and a second time in the loop body. Remove the latter.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 6 May 2014 15:16:52 +0000 (17:16 +0200)]
x86: Reset virtual CPU before parking it
Setting the HLT activity state may bring the vCPU into an invalid
state, namely when SS.DPL != 0. Instead of fixing this case and risking
missing another one, simply do a full reset, which brings the vCPU into
a known-good state.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 5 May 2014 17:21:06 +0000 (19:21 +0200)]
core: Avoid exposing register set to pci_mmio_access_handler
This handler is generic and should not assume anything about how
registers can be accessed. While at it, replace the open-coded MMIO
accesses with the appropriate helpers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Tue, 22 Apr 2014 13:19:04 +0000 (17:19 +0400)]
core: Add support for a guest to access memory-mapped PCI configuration space
This patch continues the PIO-based support for accessing PCI config
space: config space can now also be reached via MMIO, so the filtering
logic is very similar. Read accesses to PCI config space are allowed
only for devices owned by the cell. Write accesses are regulated
according to a whitelist.
There are some limitations, though:
- Only 4-byte operations are supported
- The guest must only use instructions 0x8b and 0x89 (read/write
  through intermediate registers)
- All-ones writes are not supported
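Here is a sketch of how an MMIO address inside the memory-mapped config space (ECAM) window could be decoded and checked against the size restriction above. The window base would come from the ACPI MCFG table or the system configuration. The names are hypothetical. The instruction restriction belongs to the MMIO parser and is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

struct pci_cfg_access {
	uint16_t bdf;	/* bus[15:8], device[7:3], function[2:0] */
	uint16_t reg;	/* register offset within the function, 0..4095 */
};

/*
 * Each function owns a 4 KiB config space in the window:
 * offset = bus << 20 | device << 15 | function << 12 | register.
 * Only aligned 4-byte accesses are accepted.
 */
bool pci_decode_mmcfg(uint64_t addr, uint64_t mmcfg_base, unsigned int size,
		      struct pci_cfg_access *out)
{
	uint64_t offset = addr - mmcfg_base;

	if (addr < mmcfg_base || offset >= (256ull << 20))
		return false;			/* outside the window */
	if (size != 4 || (offset & 3))
		return false;			/* unsupported access width */
	out->bdf = (uint16_t)(offset >> 12);
	out->reg = (uint16_t)(offset & 0xfff);
	return true;
}
```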
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
[Jan: style adjustments]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>