Jan Kiszka [Mon, 10 Mar 2014 15:16:15 +0000 (16:16 +0100)]
driver/core: Move page offset field from header into hypervisor core
No need to pass this information in from the loader driver, needlessly
extending the bootstrap interface. We can perfectly calculate the page
offset during paging setup and store it in a global variable.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 6 Mar 2014 15:34:37 +0000 (16:34 +0100)]
driver: Round up image region mappings to page boundaries
This fixes a Linux oops when loading images of non-page-aligned size.
More precisely, ioremap_page_range becomes unhappy when we try to map
partial pages.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
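The rounding described above can be sketched as follows. This is a minimal illustration, not the driver's actual code: SKETCH_PAGE_SIZE and sketch_page_align are hypothetical names, and a 4 KiB page size is assumed (in the kernel, PAGE_SIZE and PAGE_ALIGN would normally serve this purpose).

```c
/* Assumed page size for illustration only; must be a power of two. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Round a size up to the next multiple of the page size, so that a
 * mapping request never ends in the middle of a page.
 */
static unsigned long sketch_page_align(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

An image of 4097 bytes would thus be mapped as two full pages rather than one page plus a partial one.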
Jan Kiszka [Wed, 5 Mar 2014 07:55:34 +0000 (08:55 +0100)]
core: Factor out cpu_id_valid
Will be reused soon to validate CPU IDs passed in via a hypercall. For
this reason, we use unsigned long as ID type because this is also the
type of hypercall arguments.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:27:23 +0000 (20:27 +0100)]
core: Set cell state to "failed" if all its CPUs have failed
A cell that has crashed all its CPUs can be marked as failed. This means
that the root cell can destroy it even when it would otherwise ask for
permission first - there is no need to ask anymore, we are already deep
into an unordered cell shutdown at this point.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:27:14 +0000 (20:27 +0100)]
core: Mark CPU as "failed" after any violation
When a CPU causes a fault in guest mode, mark it as "failed" until we
forward it from the root cell or pass it back to it on cell
destruction.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:18:10 +0000 (20:18 +0100)]
x86: Move fault handler to control.c
The fault module became so trivial that we can perfectly host it as part
of control.c, saving one set of code and header files. Rename the
exception handler to x86_exception_handler in order to mark it
architecture specific.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:14:51 +0000 (20:14 +0100)]
core: Factor out generic panic_stop/halt services
These functions already contain too much generic logic, and panic_halt
will gain even more soon. Move them under the hood of the control module
and split them up into generic and arch-specific pieces.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 17:15:47 +0000 (18:15 +0100)]
core/x86: Provide "Cell Get State" hypercall
Implement the hypercall to retrieve the cell state. This is based on the
information the cell provides via its communication page (as long as it
is alive). So the value may be corrupt, and we need to check it before
returning it to the caller.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 08:12:42 +0000 (09:12 +0100)]
Documentation: Update and extend interface descriptions
First of all, introduce a glossary for key terms used in the Jailhouse
docs. It introduces the new terms "root cell" and "non-root cell". Then
extend and refactor the hypervisor interface descriptions, specifically
adding hypercalls for obtaining some basic diagnostic data. This data is
supposed to be exposed by the driver via sysfs. Start documenting its
structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 2 Mar 2014 18:31:12 +0000 (19:31 +0100)]
core/driver: Switch to ID-based cell addressing scheme
Return the cell ID on cell creation and request this ID instead of the
cell name for destruction. Will also help to keep future per-cell
hypercalls simple.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 1 Mar 2014 11:33:32 +0000 (12:33 +0100)]
driver: Track all active cells
Create a kobject for every existing cell, including the root cell, and
add it to sysfs. This will allow us to export state information etc. about
active cells later on and to maintain additional data over the lifetime
of a cell in the driver.
Moreover, we can now avoid trying to create a cell twice. This only
triggers a memory access violation when writing to the reserved memory
of the existing cell, effectively offlining the Linux CPU that tries it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 17 Feb 2014 10:02:36 +0000 (11:02 +0100)]
tools: Add support for loading multiple images during cell creation
Augment the power of "jailhouse cell create" by supporting the loading
of multiple images during cell creation. This allows, e.g., specifying
cell code and data separately.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 17 Feb 2014 08:51:22 +0000 (09:51 +0100)]
driver: Properly select memory region for image loading
Do not simply assume that the first memory region of a cell will take
the preloaded image. Rather, walk the list of regions, picking the one
that can completely take the image. Bail out if no region is found.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
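A rough sketch of such a region walk, under stated assumptions: struct sketch_region, sketch_find_region and the notion of a target address for the image are hypothetical names chosen for illustration and do not reflect the driver's real data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified description of a cell memory region. */
struct sketch_region {
	unsigned long long phys_start;
	unsigned long long size;
};

/*
 * Walk the region list and return the first region that can take the
 * image completely, or NULL if none is found so the caller can bail out.
 */
static const struct sketch_region *
sketch_find_region(const struct sketch_region *regions, size_t count,
		   unsigned long long target, unsigned long long image_size)
{
	size_t i;

	for (i = 0; i < count; i++) {
		const struct sketch_region *r = &regions[i];
		unsigned long long offset;

		if (target < r->phys_start)
			continue;
		offset = target - r->phys_start;
		/* The image must start and end inside this region. */
		if (offset <= r->size && image_size <= r->size - offset)
			return r;
	}
	return NULL;
}
```

Comparing against the remaining space (size minus offset) instead of adding target and image size avoids an overflow for addresses near the top of the address space.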
Jan Kiszka [Sun, 16 Feb 2014 19:12:28 +0000 (20:12 +0100)]
driver: Factor out load_image
In preparation of processing multiple jailhouse_preload_image entries
and validating their content more carefully, encapsulate the existing
logic in a separate function.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 16 Feb 2014 18:27:41 +0000 (19:27 +0100)]
driver: Stop clearing cell memory
The current code assumes that the first memory region in the cell
configuration is RAM when clearing it. This is fragile. But it is also
unnecessary: we can require that the cell clear its memory as needed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Thu, 13 Feb 2014 12:13:05 +0000 (16:13 +0400)]
x86: Add fault reporting in VT-d
The reporting facilitates the configuration of PCI devices. If there is
an error, a corresponding message is shown on the console.
The reporting is implemented as delivery of NMIs via MSI to one of the
Linux cell's cores.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Tue, 4 Feb 2014 13:37:06 +0000 (17:37 +0400)]
x86: Add ffsl() and rename ffz() to ffzl() for consistency
These bit operations are intended to be used instead of built-in functions.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
[Jan: Renamed ffz placeholder for ARM as well]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Feb 2014 16:51:24 +0000 (17:51 +0100)]
x86: Fix enabling of 1G hugepages
These are not supported by all CPUs, in fact. So check support and
otherwise clear the corresponding page_size field in hv_paging. We keep
x86_64_paging as template for all the 4-level paging modes on x86-64.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Feb 2014 16:28:41 +0000 (17:28 +0100)]
core: Prevent hugepage creation if physical and virtual addresses are unaligned
We can only create hugepages if both the virtual and the physical
address are aligned on the page size. Without this check, we
crashed, e.g., on configurations that placed the hypervisor on physical
start addresses that were not 2M-aligned.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
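The alignment condition can be illustrated with a short sketch. The function name sketch_hugepage_aligned is hypothetical, and the page size is assumed to be a power of two; the real check in the paging code may include further conditions.

```c
#include <stdbool.h>

/*
 * A hugepage of the given size may only be used if both the virtual and
 * the physical address are multiples of that size. A page_size of 0 means
 * that no page can be created at this level.
 */
static bool sketch_hugepage_aligned(unsigned long long virt,
				    unsigned long long phys,
				    unsigned long long page_size)
{
	if (page_size == 0)
		return false;
	return ((virt | phys) & (page_size - 1)) == 0;
}
```

With a 2M page size, a hypervisor placed at a physical address that is only 4K-aligned fails this test, and the mapping has to fall back to smaller pages.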
Jan Kiszka [Mon, 10 Feb 2014 10:42:33 +0000 (11:42 +0100)]
x86: Switch GDT, IDT and segments before enabling hypervisor page table
The Linux GDT and IDT are no longer accessible once we have switched to the
hypervisor page table. So we need to move the latter after the switching
of those tables and segment registers to avoid occasional crashes during
hypervisor enabling.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 9 Feb 2014 10:08:27 +0000 (11:08 +0100)]
core: Move per_cpu::cpu_id initialization out of assembly code
We can trivially initialize this field during early setup if we pass the
value to entry(). Removes one offset define that needs to be kept in
manual sync with struct per_cpu.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 5 Feb 2014 10:35:18 +0000 (11:35 +0100)]
x86: Detect and use guest paging mode for MMIO parsing
This fixes the assumption that our guests would only trigger MMIO in IA32e
mode. This does not handle other modes yet, but it lays the foundation
and prevents misinterpreting paging structures.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 6 Feb 2014 17:30:07 +0000 (18:30 +0100)]
core: Introduce and apply guest_paging_structures
Just like paging_structures describes a host-side page table hierarchy
via its paging mode and the root table pointer, guest_paging_structures
shall now provide information about the guest-side page table. The only
but important difference is that the reference to the root table is a
guest-physical address. Therefore, to avoid mixing it up with host-side
tables, we use different types.
This abstraction will help passing a guest page table reference around.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 5 Feb 2014 08:21:55 +0000 (09:21 +0100)]
x86: Enable hugepages in all hypervisor, EPT and VT-d page tables
Arm support for hugepage creation by adding the required sizes and
callbacks to the 64-bit paging mode. When deriving the paging modes of
EPT and VT-d, we now need to take their capabilities into account and
have to clear page_size at those levels that are not supported by the
underlying hardware.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Feb 2014 17:03:24 +0000 (18:03 +0100)]
core: Add support for creating page tables with hugepages
When adding support for generating hugepages during page_map_create, we
also need to address the issue of overwriting or splitting up such
pages. When partially unmapping a hugepage, we need to break it up
first, then unmap the included pages. This break-up may fail when we are
short on memory, thus page_map_destroy may actually fail now, and we
have to take this into account on the caller side.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Feb 2014 21:35:04 +0000 (22:35 +0100)]
core: Switch to table-driven page table construction and interpretation
Switch page table creation and interpretation to a new, fully
table-driven scheme. It is much more regular and also more flexible when
it comes to supporting more paging modes, specifically on x86 (32-bit
paging, PAE etc.) in order to extend MMIO support. It also lays the
foundation for creating hugepages, which will reduce TLB pressure and
memory usage. So far only reading of hugepages is supported.
A paging mode is now defined via an array of paging structures. An array
entry represents a page table level, starting with the root level. Each
paging structure contains a number of handlers to set or get entries at
the corresponding level. It also contains a page size value which is
non-zero in case the page table level supports terminal entries that
point to a physical page address. This implies that the final element in
the paging structure array must have a non-zero page size field.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
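To make the array layout more tangible, here is a reduced sketch of what such a paging mode description might look like for 4-level x86-64 paging. All names (struct sketch_paging, sketch_x86_64_paging, the index helpers) are hypothetical, and only a single handler per level is shown; the real structure carries more handlers for getting and setting entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One array element describes one page table level. */
struct sketch_paging {
	/* Non-zero if this level supports terminal (page) entries. */
	uint64_t page_size;
	/* Handler returning the entry index for a virtual address. */
	unsigned int (*entry_index)(uint64_t virt);
};

static unsigned int sketch_pml4_index(uint64_t virt)
{
	return (unsigned int)((virt >> 39) & 0x1ff);
}

static unsigned int sketch_pdpt_index(uint64_t virt)
{
	return (unsigned int)((virt >> 30) & 0x1ff);
}

static unsigned int sketch_pd_index(uint64_t virt)
{
	return (unsigned int)((virt >> 21) & 0x1ff);
}

static unsigned int sketch_pt_index(uint64_t virt)
{
	return (unsigned int)((virt >> 12) & 0x1ff);
}

/* Root level first; 1G and 2M entries are the optional hugepage levels. */
static const struct sketch_paging sketch_x86_64_paging[] = {
	{ 0,          sketch_pml4_index },
	{ 1ULL << 30, sketch_pdpt_index },
	{ 1ULL << 21, sketch_pd_index },
	{ 1ULL << 12, sketch_pt_index },
};

/* The final level must be able to hold terminal entries. */
static bool sketch_paging_mode_valid(const struct sketch_paging *mode,
				     size_t levels)
{
	return levels > 0 && mode[levels - 1].page_size != 0;
}
```

Generic code can then iterate over the array from the root downwards, calling the handlers of each level without knowing the concrete paging mode.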
Jan Kiszka [Sun, 26 Jan 2014 18:26:28 +0000 (19:26 +0100)]
core: Introduce paging_structures abstraction
This structure shall eventually hold a reference to both the paging
hierarchy and how it is read or manipulated. So far, we only encapsulate
the root table reference and update all sites that deal with page tables.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Fri, 31 Jan 2014 05:43:03 +0000 (09:43 +0400)]
x86: Fix a bug with writing zeros to register IOTLB_REG containing RsvdP field
Register IOTLB_REG contains RsvdP fields. Their values must be preserved
on writing in accordance with the specification. Operations that accessed
this kind of register in an unsafe way were replaced with new helpers.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
[Jan: style fix of VTD_IOTLB_R_MASK]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
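The idea of preserving RsvdP bits on a write can be sketched as a read-modify-write step. This is an illustration only: sketch_merge_rsvdp is a hypothetical helper, and the mask value is passed in by the caller rather than taken from the VT-d register definitions.

```c
#include <stdint.h>

/*
 * Compute the value to write back to a register that contains
 * reserved-and-preserved (RsvdP) bits: bits covered by rsvdp_mask are
 * taken from the current register content, all others from new_value.
 */
static uint64_t sketch_merge_rsvdp(uint64_t current, uint64_t new_value,
				   uint64_t rsvdp_mask)
{
	return (current & rsvdp_mask) | (new_value & ~rsvdp_mask);
}
```

A caller would first read the register, pass the result as current, and write the merged value instead of writing new_value directly, which would zero the reserved bits.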
Ivan Kolchin [Wed, 29 Jan 2014 12:42:02 +0000 (16:42 +0400)]
core: Add functions to read/write field values of 32/64-bit registers
These functions serve the following aims: describing registers in a
uniform way, reading and writing field values easily without open-coded
bit operations, making the code more readable, and preventing accidental
changes of register content.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
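A minimal sketch of mask-based field accessors, assuming a field is described by a contiguous bit mask. The names sketch_get_field and sketch_set_field are hypothetical and are not the functions introduced by this commit.

```c
#include <stdint.h>

/* Number of trailing zero bits in a non-zero mask (0 for an empty mask). */
static unsigned int sketch_mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	while (mask != 0 && (mask & 1) == 0) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Extract the field described by mask, shifted down to bit 0. */
static uint64_t sketch_get_field(uint64_t reg, uint64_t mask)
{
	return (reg & mask) >> sketch_mask_shift(mask);
}

/* Return reg with the field described by mask replaced by value. */
static uint64_t sketch_set_field(uint64_t reg, uint64_t mask, uint64_t value)
{
	return (reg & ~mask) | ((value << sketch_mask_shift(mask)) & mask);
}
```

Because the setter only touches bits inside the mask, neighbouring fields of the register keep their content.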
Jan Kiszka [Thu, 30 Jan 2014 11:16:55 +0000 (12:16 +0100)]
core/driver: Switch hypervisor to fixed virtual address layout
Now that the driver always puts us at the same virtual address, we can
compile this into the hypervisor as well. On x86, we switch to the code
model "kernel", i.e. all virtual addresses have the higher 32 bits set.
This allows us to drop -fpic, -fpie and the global offset table. And the
entry field in the header is now absolute.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 30 Jan 2014 11:27:02 +0000 (12:27 +0100)]
core: Fix output of hypervisor text start
Though we reduced the header size in dfe32d1ba6, the text segment is
still 16-byte aligned. Better introduce an explicit mark for the start
instead of relying on address calculations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 29 Jan 2014 15:27:16 +0000 (16:27 +0100)]
driver: Map hypervisor at fixed virtual address
First step to overcome relocation of the hypervisor and to stabilize its
configuration footprint after loading: We reserve a fixed virtual
address range from the kernel and simply map the hypervisor there. The
address is, of course, architecture specific and may even require
adjustments per target. But the advantages of having a stable
configuration in memory that can rather easily be checked after setup and
the simplifications in the hypervisor code when it will always have the
same virtual address outweigh this.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 29 Jan 2014 12:50:22 +0000 (13:50 +0100)]
driver: Fault-in hypervisor core pages before shutting down
Linux tends to apply changes to kernel mappings lazily to mm structs. If
this hits us in the middle of the world switch during shutdown, we will
triple-fault. Avoid this by touching all hypervisor core and per-CPU
pages in the IPI handler before triggering the hypercall.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 29 Jan 2014 12:20:37 +0000 (13:20 +0100)]
core: Allow dummy-read to hypervisor core region from Linux cell
In order to support the shutdown process which may have to fault-in the
hypervisor mapping into Linux address space, allow read access to the
physical region that contains the hypervisor core and the per-CPU data
structures. We simply expose an empty (zeroed) page to Linux.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 26 Jan 2014 18:40:46 +0000 (19:40 +0100)]
core: Move TEMPORARY_MAPPING_BASE define to generic header
The temporary mapping region is always located at the beginning of the
remapping pool. That's already encoded into generic code, so move the
symbolic address over as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 26 Jan 2014 18:13:03 +0000 (19:13 +0100)]
core: Get rid of unusual term "foreign"
In order to avoid using unusual terms for well-known things, rename
page_map_get_foreign_page to page_map_get_guest_page. Also avoid the
term foreign in related constants, calling the mapping target area now
temporary mapping region.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 26 Jan 2014 10:40:10 +0000 (11:40 +0100)]
core: Factor out FOREIGN_MAPPING_CPU_BASE
Encapsulate the calculation for the start of the per-CPU mapping region.
We use a macro to avoid having to include percpu.h from paging.h which
would create circular dependencies soon.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 28 Jan 2014 16:41:41 +0000 (17:41 +0100)]
core/driver: Consolidate bss_start/end header fields to core_size
We only need to know how large the hypervisor core is, i.e. the part
that is loaded into RAM during initialization. That this size is derived
from the end of the bss section can be seen in our linker script.
bss_start was completely unused so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 24 Jan 2014 16:15:26 +0000 (17:15 +0100)]
x86: Clean up xAPIC-related magics and register accesses
Instead of open-coding, use mmio_read/write for accessing xAPIC
registers. Wrap the address calculations with the helper macro and
use a symbolic constant for the xAPIC ID shift.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 18 Jan 2014 21:37:31 +0000 (22:37 +0100)]
x86: Fix SIPI processing
Make sure that we only deliver a SIPI vector when there is actually one
pending. We park the CPU while in wait-for-SIPI state. If we receive an
IPI before a SIPI was defined, x86_handle_events will deliver a random
SIPI vector. Avoid this by encoding SIPI availability via the
sipi_vector fields.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 17 Jan 2014 16:07:06 +0000 (17:07 +0100)]
x86: Fix stand-alone inclusion of vmx.h and vtd.h
This ensures compliance with our (yet unwritten) rule that all headers
should allow stand-alone inclusion. Exceptions are headers used also by
guests or the driver.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 17 Jan 2014 13:27:06 +0000 (14:27 +0100)]
core: Introduce per-page TLB flushing
Reduce the overhead of MMIO parsing specifically by introducing a
per-page TLB flush. Restrict the existing global one to x86, that's
where it is used so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 16 Jan 2014 19:47:08 +0000 (20:47 +0100)]
core: Properly translate guest physical addresses in page_map_get_foreign_page
We were incorrectly using a fixed page table offset for translating
physical addresses read from guest page tables to host physical
addresses. This approach totally neglected guest address space limits
and fragmentation. Fix it by asking the architecture to translate a
guest physical address. With VMX, we simply walk the EPT table of the
caller's CPU.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 16 Jan 2014 17:54:12 +0000 (18:54 +0100)]
core: Reintroduce page_map_virt2phys
This was quick: We actually need this function to obtain guest->host
physical address translations during page_map_get_foreign_page.
Reintroduce it, but with adjusted interface:
First, the only page table offset we need is the one of the hypervisor
because this function will only be used for walking page tables that are
fully mapped into the hypervisor address space.
Second, to align its interface to the companion functions page_map_create
and page_map_destroy, introduce a variable level parameter.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 16 Jan 2014 17:49:11 +0000 (18:49 +0100)]
core: Pass caller's per_cpu to page_map_get_foreign_page
Replace mapping_region and page_table_offset parameters with the
caller's per_cpu struct. All information can be obtained from it, and we
will need it for the upcoming changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>