Jan Kiszka [Sun, 13 Apr 2014 06:30:53 +0000 (08:30 +0200)]
core/driver: Remove multi-arg support for hypercalls
There is no scenario in sight where we may need to pass more than one
argument to a hypercall. So remove the related infrastructure and update
the ABI documentation for zero or single-argument hypercalls only.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 23 Apr 2014 08:35:08 +0000 (10:35 +0200)]
x86: Permit PCI capability writes until we moderate them
9f49a9b899 blocks any write access to the PCI config space that is not
explicitly allowed. This includes capabilities which we need to properly
moderate (or even virtualize: MSI[-X]) later on. For now we need to
permit caps access again as machines will otherwise break too easily
over "Invalid PIO write".
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Tue, 15 Apr 2014 06:15:26 +0000 (10:15 +0400)]
x86: Add handler of accesses to PCI configuration space via I/O ports
Guest attempts to access ports 0xcf8 and 0xcfc are processed. String
and REP-prefixed instructions are not supported for this space. Ownership
of the device a cell tries to access is checked. If the cell doesn't own it,
the hypervisor returns 0xFFFFFFFF to it. Read accesses to owned devices are
not restricted. Writing all 1's to specific registers such as BARs or the
expansion ROM address is not currently supported.
Writes to registers are moderated by white lists.
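As an illustration of the port-based access mechanism described above, here is a minimal sketch of how a value written to port 0xcf8 decomposes under the standard PCI configuration mechanism #1 layout. The struct and function names are hypothetical and not taken from the Jailhouse sources; only the bit layout and the 0xFFFFFFFF result for devices the cell does not own come from the PCI convention and the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch, not a Jailhouse structure. */
struct pci_cfg_addr {
	bool enabled;     /* bit 31: configuration access enabled */
	uint8_t bus;      /* bits 23..16 */
	uint8_t device;   /* bits 15..11 */
	uint8_t function; /* bits 10..8 */
	uint8_t reg;      /* bits 7..2, kept as dword-aligned byte offset */
};

/* Split a value written to port 0xcf8 into its fields. */
static struct pci_cfg_addr decode_cf8(uint32_t value)
{
	struct pci_cfg_addr addr = {
		.enabled  = (value >> 31) & 1,
		.bus      = (value >> 16) & 0xff,
		.device   = (value >> 11) & 0x1f,
		.function = (value >> 8) & 0x07,
		.reg      = value & 0xfc,
	};
	return addr;
}

/* Value a cell sees on a read via 0xcfc: all ones if it does not own
 * the addressed device, the unmodified hardware value otherwise. */
static uint32_t read_result(bool cell_owns_device, uint32_t hw_value)
{
	return cell_owns_device ? hw_value : 0xffffffff;
}
```

The ownership lookup itself and the write white lists are left out, since their data structures are not described here.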
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 12 Apr 2014 07:11:50 +0000 (09:11 +0200)]
core: Document implicit synchronization between cell/cpu_get_state and cell_create/destroy
It may not be obvious why we do not need to synchronize with cell
creation/destruction while accessing cell data structures from the
get_state hypercalls.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 9 Apr 2014 16:35:44 +0000 (18:35 +0200)]
tools: Fix freeing of image memory after cell creation
Regression of 95666fd1: We are now allocating multiple images, and the
image variable does not point to a valid address when we try to free the
former only image at the end of cell_create. Properly loop over all the
images for freeing.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 8 Apr 2014 15:39:56 +0000 (17:39 +0200)]
core: Fix page_alloc for more than BITS_PER_LONG pages
The start_mask in find_next_free_page has to be 0 if the start page
number can be divided by BITS_PER_LONG, but it was ~0UL so far. Due to
this bug, we weren't able to allocate more than BITS_PER_LONG (64 on
x86) pages in one run.
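To make the role of the start mask concrete, here is a simplified sketch of a bitmap search. It is not the actual find_next_free_page implementation: the flat bitmap parameter and the start_mask_for helper are assumptions for illustration. The mask marks the pages below the start page in the first word as used. When the start page is word-aligned there is nothing to hide, so the mask must be 0. The aligned case needs explicit handling in this sketch because shifting by the full word width is undefined in C.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define INVALID_PAGE_NR (~0UL)

/* Mask that marks all pages below 'start' within its bitmap word as used. */
static unsigned long start_mask_for(unsigned long start)
{
	unsigned long bit = start % BITS_PER_LONG;

	if (bit == 0)
		return 0;	/* word-aligned start: no pages to hide */
	return ~0UL >> (BITS_PER_LONG - bit);
}

/* Return the first free page at or after 'start', or INVALID_PAGE_NR. */
static unsigned long find_next_free_page(const unsigned long *used_bitmap,
					 unsigned long pages,
					 unsigned long start)
{
	unsigned long words = (pages + BITS_PER_LONG - 1) / BITS_PER_LONG;
	unsigned long mask, word, val, bit, page;

	if (start >= pages)
		return INVALID_PAGE_NR;

	mask = start_mask_for(start);
	for (word = start / BITS_PER_LONG; word < words; word++) {
		val = used_bitmap[word] | mask;
		mask = 0;	/* the mask only applies to the first word */
		if (val == ~0UL)
			continue;
		/* val has at least one clear bit, so this terminates */
		for (bit = 0; val & (1UL << bit); bit++)
			;
		page = word * BITS_PER_LONG + bit;
		return page < pages ? page : INVALID_PAGE_NR;
	}
	return INVALID_PAGE_NR;
}
```

With a mask of ~0UL for an aligned start, the whole first word would look used, so a search starting exactly at page BITS_PER_LONG would skip a completely free word.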
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Apr 2014 18:32:54 +0000 (20:32 +0200)]
core: Fix regressions of generic root cell shrinking
This commit restores unmapping of a new cell's memory regions from the
root cell. Two small but fatal bugs broke it:
- inverted error check in arch_unmap_memory_region (regression of 738dffd234)
- instead of setting mem.virt_start from phys_start, as stated in the
comment, we did the opposite (regression of a033f90c8b)
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Apr 2014 09:01:17 +0000 (11:01 +0200)]
core: Implement generic root cell shrinking of memory regions
Use arch_unmap_memory_region to implement the unmapping of a new cell's
memory regions from the root cell. This simplifies the code and even
allows us to perform a roll-back on errors (always provided we still
have enough memory after a potential hugepage breakup).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 6 Apr 2014 08:53:37 +0000 (10:53 +0200)]
core: Make arch_unmap_memory_region return error codes
We are going to use arch_unmap_memory_region also for scenarios where
hugepages might require breakups, thus we will need to handle potential
errors. Prepare arch_unmap_memory_region and its implementations for
this by propagating the error of page_map_destroy.
We can still ignore the return code in cell_destroy, so move the
corresponding comment.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 3 Apr 2014 08:39:26 +0000 (10:39 +0200)]
core: Rename "Linux cell" to "root cell"
As we refer to the first cell that contains the boot-strap Linux as
"root cell" in the documentation, change the internal naming accordingly
in order to be more consistent. This affects linux_cell primarily, but
also a few related variable and function names as well as a couple of
console messages.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 4 Apr 2014 10:45:12 +0000 (12:45 +0200)]
configs: Update QEMU VM config
Reflects changes in latest QEMU: PCI range starts at lower address,
e1000 and virtio-9p flipped positions. Also fix a harmless mistake in
the denied range while at it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 27 Mar 2014 11:35:24 +0000 (12:35 +0100)]
core: Take page offset of config data into account when mapping the data
We have to take the offset of the configuration data on the first page
into account when calculating the mapping size. Otherwise we may fail to
map its last page.
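The arithmetic behind this fix can be sketched as follows. The 4 KiB page size, the macro names and the pages_to_map helper are assumptions for this example rather than the hypervisor's actual code. The point is that the in-page offset of the data has to be added to its size before rounding up.

```c
#define PAGE_SIZE	4096UL
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* Number of pages needed to map 'size' bytes starting at 'phys_addr'. */
static unsigned long pages_to_map(unsigned long phys_addr, unsigned long size)
{
	unsigned long offset = phys_addr & ~PAGE_MASK;	/* offset in first page */

	return PAGE_ALIGN(offset + size) / PAGE_SIZE;
}
```

For example, 4096 bytes starting at offset 0x800 within a page span two pages, while rounding up the size alone would yield one page and leave the tail unmapped.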
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 25 Mar 2014 18:19:48 +0000 (19:19 +0100)]
core: Fix dependencies of arch files
Commit bf4918207a changed the way we include the arch subdirs into the
hypervisor build but broke the dependency check for arch/*/built-in.o.
Fix it by forcing the arch build steps to be performed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 18 Mar 2014 14:22:37 +0000 (15:22 +0100)]
core: Fix parallel build
Maybe not the nicest solution, but we need to teach kbuild the
dependency between arch/$(SRCARCH)/built-in.o and the arch subdir. The
only known way to do this is to convert subdir-y into an explicit rule to
build the arch subdir.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 14 Mar 2014 13:12:13 +0000 (14:12 +0100)]
x86: Fix build on 32-bit userlands
If building for a 32-bit target userland, the toolchain defaults to
32-bit as well. While we gain -m64 automatically while building the
kernel module, we need to inject it explicitly for the hypervisor and
the inmates.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 11 Mar 2014 11:32:59 +0000 (12:32 +0100)]
core: Remove caller restriction from Hypervisor Get Info hypercall
There is no sensitive information to hide here. Rather, we want to be
able to use this hypercall also from non-root cells, e.g. to monitor the
system setup progress.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Mar 2014 15:16:15 +0000 (16:16 +0100)]
driver/core: Move page offset field from header into hypervisor core
No need to pass this information in from the loader driver, needlessly
extending the bootstrap interface. We can perfectly calculate the page
offset during paging setup and store it in a global variable.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 6 Mar 2014 15:34:37 +0000 (16:34 +0100)]
driver: Round up image region mappings to page boundaries
This fixes a Linux oops when loading images of non-page-aligned size.
More precisely, ioremap_page_range becomes unhappy when we try to map
partial pages.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 5 Mar 2014 07:55:34 +0000 (08:55 +0100)]
core: Factor out cpu_id_valid
Will be reused soon to validate CPU IDs passed in via a hypercall. For
this reason, we use unsigned long as ID type because this is also the
type of hypercall arguments.
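A minimal sketch of such a check, assuming the set of valid CPUs is kept as a bitmap of unsigned long words (the parameter layout is hypothetical). Taking the ID as unsigned long means a hypercall argument is compared at full width. If it were narrowed to int first, a value like 0x100000001 could be silently truncated to 1.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* True if cpu_id is within the bitmap and its bit is set. */
static bool cpu_id_valid(unsigned long cpu_id, const unsigned long *cpu_set,
			 unsigned long cpu_set_bits)
{
	if (cpu_id >= cpu_set_bits)
		return false;	/* also rejects huge or "negative" values */
	return (cpu_set[cpu_id / BITS_PER_LONG] >>
		(cpu_id % BITS_PER_LONG)) & 1UL;
}
```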
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:27:23 +0000 (20:27 +0100)]
core: Set cell state to "failed" if all its CPUs have failed
A cell that has crashed all its CPUs can be marked as failed. This means
that the root cell can destroy it even when it would otherwise ask for
permission first - there is no need to ask anymore, we are already deep
into an unordered cell shutdown at this point.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:27:14 +0000 (20:27 +0100)]
core: Mark CPU as "failed" after any violation
Mark a CPU that caused a fault in guest mode as "failed" until we
forward it from the root cell or pass it back to it on cell
destruction.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:18:10 +0000 (20:18 +0100)]
x86: Move fault handler to control.c
The fault module became so trivial that we can perfectly host it as part
of control.c, saving one set of code and header files. Rename the
exception handler to x86_exception_handler in order to mark it
architecture specific.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 19:14:51 +0000 (20:14 +0100)]
core: Factor out generic panic_stop/halt services
These functions already contain too much generic logic, and panic_halt
will gain even more soon. Move them under the hood of the control module
and split them up into generic and arch-specific pieces.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 17:15:47 +0000 (18:15 +0100)]
core/x86: Provide "Cell Get State" hypercall
Implement the hypercall to retrieve the cell state. This is based on the
information the cell provides via its communication page (as long as it
is alive). So the value may be corrupt, and we need to check it before
returning it to the caller.
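The validation step could look roughly like the sketch below. The state names, their numeric values and the -EINVAL error code are placeholders chosen for illustration, not the constants of the actual hypercall ABI. The idea is simply that a value read from memory the cell can write is only passed on if it matches a known state.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical state values for this sketch. */
enum {
	CELL_STATE_RUNNING        = 0,
	CELL_STATE_RUNNING_LOCKED = 1,
	CELL_STATE_SHUT_DOWN      = 2,
	CELL_STATE_FAILED         = 3,
};

/* Return the reported state if it is a known one, an error otherwise. */
static long checked_cell_state(uint32_t reported)
{
	switch (reported) {
	case CELL_STATE_RUNNING:
	case CELL_STATE_RUNNING_LOCKED:
	case CELL_STATE_SHUT_DOWN:
	case CELL_STATE_FAILED:
		return reported;
	default:
		/* the cell wrote garbage into its communication page */
		return -EINVAL;
	}
}
```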
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 4 Mar 2014 08:12:42 +0000 (09:12 +0100)]
Documentation: Update and extend interface descriptions
First of all, introduce a glossary for key terms used in the Jailhouse
docs. It introduces the new terms "root cell" and "non-root cell". Then
extend and refactor the hypervisor interface descriptions, specifically
adding hypercalls for obtaining some basic diagnostic data. This data is
supposed to be exposed by the driver via sysfs. Start documenting its
structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 2 Mar 2014 18:31:12 +0000 (19:31 +0100)]
core/driver: Switch to ID-based cell addressing scheme
Return the cell ID on cell creation and request this ID instead of the
cell name for destruction. Will also help to keep future per-cell
hypercalls simple.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 1 Mar 2014 11:33:32 +0000 (12:33 +0100)]
driver: Track all active cells
Create a kobject for every existing cell, including the root cell, and
add it to sysfs. This will allow exporting state information etc. about
active cells later on and to maintain additional data over the lifetime
of a cell in the driver.
Moreover, we can now avoid trying to create a cell twice. This only
triggers a memory access violation when writing to the reserved memory
of the existing cell, effectively offlining the Linux CPU that tries it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 17 Feb 2014 10:02:36 +0000 (11:02 +0100)]
tools: Add support for loading multiple images during cell creation
Augment the power of "jailhouse cell create" by supporting the loading of
multiple images during cell creation. This allows, e.g., to specify cell
code and data separately.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 17 Feb 2014 08:51:22 +0000 (09:51 +0100)]
driver: Properly select memory region for image loading
Do not simply assume that the first memory region of a cell will take
the preloaded image. Rather, walk the list of regions, picking the one
that can completely take the image. Bail out if no region is found.
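The region walk can be sketched like this. The mem_region struct, its fields and the notion of a per-image target address are assumptions for the example and do not mirror the driver's real data structures. A region qualifies only if the image fits into it entirely, and the comparisons are arranged to avoid overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor for this sketch. */
struct mem_region {
	uint64_t virt_start;
	uint64_t size;
};

/* Find the region that fully contains [target, target + image_size).
 * Returns NULL if no region can take the image, so the caller can bail out. */
static const struct mem_region *
find_region_for_image(const struct mem_region *regions, size_t num_regions,
		      uint64_t target, uint64_t image_size)
{
	size_t i;

	for (i = 0; i < num_regions; i++) {
		const struct mem_region *r = &regions[i];
		uint64_t offset;

		if (target < r->virt_start)
			continue;
		offset = target - r->virt_start;
		if (offset <= r->size && image_size <= r->size - offset)
			return r;
	}
	return NULL;
}
```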
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 16 Feb 2014 19:12:28 +0000 (20:12 +0100)]
driver: Factor out load_image
In preparation of processing multiple jailhouse_preload_image entries
and validating their content more carefully, encapsulate the existing
logic in a separate function.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 16 Feb 2014 18:27:41 +0000 (19:27 +0100)]
driver: Stop clearing cell memory
The current code assumes that the first memory region in the cell
configuration is RAM when clearing it. This is fragile. But it is also
unnecessary: we can require that the cell clear its memory as needed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Thu, 13 Feb 2014 12:13:05 +0000 (16:13 +0400)]
x86: Add fault reporting in VT-d
The reporting facilitates the configuration of PCI devices. If there is
an error, a corresponding message is shown on the console.
The reporting is implemented as delivery of NMI-interrupts via MSI to
one of the Linux cell's cores.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ivan Kolchin [Tue, 4 Feb 2014 13:37:06 +0000 (17:37 +0400)]
x86: Add ffsl() and rename ffz() to ffzl() for consistency
These bit operations are intended to be used instead of compiler built-in functions.
Signed-off-by: Ivan Kolchin <ivan.kolchin@siemens.com>
[Jan: Renamed ffz placeholder for ARM as well]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Feb 2014 16:51:24 +0000 (17:51 +0100)]
x86: Fix enabling of 1G hugepages
These are not supported by all CPUs, in fact. So check support and
otherwise clear the corresponding page_size field in hv_paging. We keep
x86_64_paging as template for all the 4-level paging modes on x86-64.
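As a sketch of the check: 1 GiB page support is reported in CPUID leaf 0x80000001, EDX bit 26. The helper below takes the EDX value as a parameter so that it stays independent of how CPUID is executed. The function name and the convention that a page_size of 0 means "no hugepage at this level" are assumptions for illustration.

```c
#include <stdint.h>

#define CPUID_EDX_PAGE1GB	(1u << 26)	/* CPUID.80000001H:EDX[26] */
#define PAGE_SIZE_1G		(1ULL << 30)

/* Derive the page size to use at one level from the template value.
 * The 1G entry is cleared when the CPU lacks support for it. */
static uint64_t effective_page_size(uint64_t template_page_size,
				    uint32_t cpuid_ext_edx)
{
	if (template_page_size == PAGE_SIZE_1G &&
	    !(cpuid_ext_edx & CPUID_EDX_PAGE1GB))
		return 0;	/* assumed to mean: no hugepage here */
	return template_page_size;
}
```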
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Feb 2014 16:28:41 +0000 (17:28 +0100)]
core: Prevent hugepage creation if physical and virtual addresses are unaligned
We can only create hugepages if both the virtual and the physical address
are aligned to the page size. Without this check, we
crashed, e.g., on configurations that placed the hypervisor on physical
start addresses that were not 2M-aligned.
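The alignment condition can be sketched as a small predicate. The function name and parameters are hypothetical, and page_size is assumed to be a power of two. OR-ing the two addresses lets a single mask test cover both.

```c
#include <stdbool.h>
#include <stdint.h>

/* A hugepage of 'page_size' bytes may only be used if both addresses
 * are aligned to it and the remaining range covers a full page. */
static bool can_use_hugepage(uint64_t phys, uint64_t virt,
			     uint64_t remaining, uint64_t page_size)
{
	return ((phys | virt) & (page_size - 1)) == 0 &&
	       remaining >= page_size;
}
```

With a physical start that is only 4K-aligned, as in the failing configurations mentioned above, the predicate is false for 2M pages and the mapping has to fall back to smaller pages.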
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 10 Feb 2014 10:42:33 +0000 (11:42 +0100)]
x86: Switch GDT, IDT and segments before enabling hypervisor page table
The Linux GDT and IDT are no longer accessible once we have switched to the
hypervisor page table. So we need to move the latter after the switching
of those tables and segment registers to avoid occasional crashes during
hypervisor enabling.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 9 Feb 2014 10:08:27 +0000 (11:08 +0100)]
core: Move per_cpu::cpu_id initialization out of assembly code
We can trivially initialize this field during early setup if we pass the
value to entry(). Removes one offset define that needs to be kept in
manual sync with struct per_cpu.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 5 Feb 2014 10:35:18 +0000 (11:35 +0100)]
x86: Detect and use guest paging mode for MMIO parsing
This fixes the assumption that our guests would only trigger MMIO in IA32e
mode. This does not handle other modes yet, but it lays the foundation
and prevents misinterpreting paging structures.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 6 Feb 2014 17:30:07 +0000 (18:30 +0100)]
core: Introduce and apply guest_paging_structures
Just like paging_structures describes a host-side page table hierarchy
via its paging mode and the root table pointer, guest_paging_structures
shall now provide information about the guest-side page table. The only
but important difference is that the reference to the root table is a
guest-physical address. Therefore, to avoid mixing it up with host-side
tables, we use different types.
This abstraction will help passing a guest page table reference around.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 5 Feb 2014 08:21:55 +0000 (09:21 +0100)]
x86: Enable hugepages in all hypervisor, EPT and VT-d page tables
Arm support for hugepage creation by adding the required sizes and
callbacks to the 64-bit paging mode. When deriving the paging modes of
EPT and VT-d, we now need to take their capabilities into account and
have to clear page_size at those levels that are not supported by the
underlying hardware.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>