Jan Kiszka [Wed, 24 Feb 2016 09:19:54 +0000 (10:19 +0100)]
x86: Filter out physical address that can't be handled by DMAR units
Make sure that we do not try to program DMAR page tables with physical
addresses beyond the supported range (39 or 48 bits, depending on the
number of page table levels).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
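The two limits follow from the page-table layout: a 4 KiB page offset covers 12 bits, and each table level translates 9 more, so 3 levels reach 39 bits and 4 levels reach 48 bits. Below is a minimal sketch of such a filter. It assumes the level count is already known, and the function names are hypothetical, not Jailhouse's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 12 offset bits for a 4 KiB page plus 9 bits per page-table level:
 * 3 levels -> 39 bits, 4 levels -> 48 bits. */
static unsigned int dmar_addr_width(unsigned int levels)
{
	return 12 + 9 * levels;
}

/* Hypothetical check: reject a physical range [phys, phys + size) that
 * reaches beyond what the DMAR page tables can express. Written so that
 * phys + size is never computed and thus cannot overflow. */
static bool dmar_range_supported(uint64_t phys, uint64_t size,
				 unsigned int levels)
{
	uint64_t limit = 1ULL << dmar_addr_width(levels);

	return size <= limit && phys <= limit - size;
}
```

For example, a region starting at 1 << 39 would be filtered out with 3-level tables but accepted with 4-level tables.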
Jan Kiszka [Sat, 20 Feb 2016 18:10:22 +0000 (19:10 +0100)]
x86: Account for DMAR units with multi-page register sets
The fault reporting registers we use may be placed in a 2nd or even 3rd
page. Account for such cases by using the MMIO region size now provided
via the system config.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 20 Feb 2016 18:09:49 +0000 (19:09 +0100)]
core, configs, tools: Prepare for variable IOMMU register set sizes
Introduce a size field to struct jailhouse_iommu and fill it via the
config generator. The information can be retrieved from the ACPI tables
for AMD. On Intel, we have to inspect the Linux mappings instead, so DMAR
must now be enabled while retrieving the system information.
Based on patches by Valentine Sinitsyn.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
For both AMD and Intel, we need to store not only base address but also
a size to map the complete MMIO region. Moreover, AMD requires a number
of PCI device parameters for the IOMMU. Introduce struct jailhouse_iommu
that will encapsulate all required data.
Based on patches by Valentine Sinitsyn.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 3 Feb 2016 17:55:23 +0000 (18:55 +0100)]
inmates: e1000-demo: Enable queues explicitly
Newer NICs require us to enable the RX and TX queues explicitly. Although
they should be enabled after reset, at least the I350 refuses to work otherwise.
As the related bit is harmless or even unused on older NICs, do this
unconditionally (just like ipxe does).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 26 Jan 2016 08:27:40 +0000 (09:27 +0100)]
x86: Make debug UART port configurable via system config
We already allow enabling a VGA console via the system config, so let's
make the UART port configurable this way as well: phys_start will hold
the port, and flags must not have JAILHOUSE_MEM_IO set in order to
distinguish it from the memory-mapped VGA console. And by leaving
phys_start at 0, we can even turn off the console now.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 25 Jan 2016 17:20:37 +0000 (18:20 +0100)]
core, driver: Pass rounded-up core size in hypervisor header
Hypervisor and root kernel may have different ideas about PAGE_SIZE.
This will cause wrong hypervisor core size calculations as seen on arm64
with 64K Linux PAGE_SIZE.
Avoid this trap by moving the round-up into the hypervisor code, passing
a ready-to-be-used size value in the header.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
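The mismatch is easy to see with a generic power-of-two round-up. This is a minimal sketch with a hypothetical helper name. The same raw core size rounds to different values under a 4K and a 64K PAGE_SIZE, which is why the header now carries the already rounded value.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of page_size, which must be a
 * power of two. */
static uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}
```

With a raw size of 0x12345, a 4K round-up yields 0x13000, while a 64K round-up yields 0x20000.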
Daniel Sangorrin [Thu, 21 Jan 2016 01:31:26 +0000 (10:31 +0900)]
vga: Add support for VGA text buffer output on x86
Hypervisor messages are useful for debugging and are
typically sent to the serial port. Unfortunately, x86
computers often lack a serial port. This patch allows
hypervisor messages to be redirected to a screen by leveraging
the traditional VGA text buffer mode.
Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
[Jan: avoid row_line writeback in panic case, remove redundant braces] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
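In VGA text mode, the screen is an 80x25 array of 16-bit cells: the low byte holds the character and the high byte holds the color attribute. The sketch below illustrates output with line wrapping and scrolling. It is a hypothetical simplification, not the code from this patch, and it uses an ordinary array in place of the real buffer at physical address 0xb8000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VGA_COLS	80
#define VGA_ROWS	25
#define VGA_ATTR	0x07	/* light grey on black */

/* Stand-in for the text buffer that lives at 0xb8000 on real hardware. */
static uint16_t vga_buf[VGA_ROWS * VGA_COLS];
static unsigned int vga_row, vga_col;

/* Move all lines up by one and blank the last line. */
static void vga_scroll(void)
{
	unsigned int i;

	memmove(vga_buf, vga_buf + VGA_COLS,
		(VGA_ROWS - 1) * VGA_COLS * sizeof(vga_buf[0]));
	for (i = 0; i < VGA_COLS; i++)
		vga_buf[(VGA_ROWS - 1) * VGA_COLS + i] = (VGA_ATTR << 8) | ' ';
}

/* Write one character, wrapping at the line end and scrolling at the
 * bottom of the screen. */
static void vga_putc(char c)
{
	if (c != '\n') {
		vga_buf[vga_row * VGA_COLS + vga_col] =
			(VGA_ATTR << 8) | (uint8_t)c;
		if (++vga_col < VGA_COLS)
			return;
	}
	vga_col = 0;
	if (vga_row < VGA_ROWS - 1)
		vga_row++;
	else
		vga_scroll();
}
```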
Jan Kiszka [Sat, 9 Jan 2016 06:15:59 +0000 (07:15 +0100)]
configs: Update Banana Pi configs to make use of unaligned MMIO regions
Split up the MMIO page 0x1c20000 on the Allwinner A20 into CCU,
interrupt controller, GPIOs and the timer. GPIOs are further broken up
to allow assigning port H to the gic-demo cell, along with the CCU (to
control the UART timing).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 8 Jan 2016 18:18:34 +0000 (19:18 +0100)]
core: Add support for sub-page MMIO regions
This allows specifying MMIO memory regions that do not start or end on
page boundaries. Instead of mapping full pages into the cell, sub-page
MMIO requires intercepting accesses to those pages, validating all
parameters against the target memory region and then performing the
access in hypervisor context, provided the validation was successful.
As the access can now fail in hypervisor context, we need to be more
picky: besides read/write permissions, alignment and access widths can
now be checked as well. These attributes are specified via the
JAILHOUSE_MEM_IO_* flags.
Sub-page MMIO is surely not a fast path. It not only requires world
switches between cell and hypervisor, the current implementation also
uses dynamic mappings. This is easier to implement than a static mapping
scheme, but surely not faster. We may revisit this design later on,
ideally towards a 1:1 mapping scheme.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
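The validation step can be sketched as a single predicate over a trapped access. This is a minimal illustration under assumptions. The flag names and values below are made up and only mirror the idea of the JAILHOUSE_MEM_IO_* flags, the struct and function are hypothetical, and the real hypervisor code may order or combine the checks differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only; the real JAILHOUSE_MEM_* and
 * JAILHOUSE_MEM_IO_* definitions may differ. */
#define MEM_READ		0x0001
#define MEM_WRITE		0x0002
#define MEM_IO_UNALIGNED	0x0100
#define MEM_IO_8		0x0200
#define MEM_IO_16		0x0400
#define MEM_IO_32		0x0800
#define MEM_IO_64		0x1000

struct region {
	uint64_t start;		/* need not be page-aligned */
	uint64_t size;
	uint32_t flags;
};

/* Validate a trapped access before the hypervisor performs it. */
static bool subpage_access_ok(const struct region *r, uint64_t addr,
			      unsigned int width, bool is_write)
{
	uint32_t width_flag;

	switch (width) {
	case 1: width_flag = MEM_IO_8; break;
	case 2: width_flag = MEM_IO_16; break;
	case 4: width_flag = MEM_IO_32; break;
	case 8: width_flag = MEM_IO_64; break;
	default: return false;
	}

	/* The whole access must lie inside the region. */
	if (addr < r->start || width > r->size ||
	    addr - r->start > r->size - width)
		return false;
	/* Direction must be permitted. */
	if (!(r->flags & (is_write ? MEM_WRITE : MEM_READ)))
		return false;
	/* Access width must be permitted. */
	if (!(r->flags & width_flag))
		return false;
	/* Misaligned accesses only if explicitly allowed. */
	if ((addr & (width - 1)) && !(r->flags & MEM_IO_UNALIGNED))
		return false;
	return true;
}
```

Only if this returns true would the hypervisor perform the access on behalf of the cell. Otherwise the access is treated as an unhandled MMIO access.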
Jan Kiszka [Thu, 7 Jan 2016 17:21:55 +0000 (18:21 +0100)]
core: Remove memory regions check
Most of the checks will be removed when adding sub-page memory region
support. Instead, we will eventually need some offline validation outside
the hypervisor.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 7 Jan 2016 17:10:20 +0000 (18:10 +0100)]
arm: Remove useless warning from arm_mmio_perform_access
This function is only called with size 1, 2 or 4. This is ensured by
arch_handle_dabt, the only (indirect) caller, which generates the size
accordingly (1 << sas) and filters out sizes > 4.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 11 Aug 2015 07:20:41 +0000 (09:20 +0200)]
configs: Add cache region to x86 demo cells
Assuming we have more than 4 units of L3 cache on systems that support
L3 partitioning, assign the first 2 units (e.g. 2 MB on a Xeon D 1540)
to apic-demo, the 3rd to tiny-demo. Also the non-root Linux config gets
the first 2 units (it cannot run in parallel with the other demos). All
this is for testing the management logic and will later be used to
benchmark the partitioning.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 11 Aug 2015 07:05:24 +0000 (09:05 +0200)]
x86: Introduce Cache Allocation Technology support for Intel CPUs
CAT is a CPU feature first added to Xeon D and certain Xeon E5 v3
processors. So far, it allows specifying access restrictions to the L3
cache, including complete isolation between different entities.
This adds CAT control to Jailhouse on a per-cell level. The user is free
to specify a contiguous access mask for each cell, use that mask
exclusively (typical case), share any overlaps with the root cell
(JAILHOUSE_CACHE_ROOTSHARED), or simply use the root cell mask. If
nothing else is specified, the root cell uses the full cache (until
non-root cells shrink it).
Due to the hardware-induced requirement to have a contiguous bitmask,
shrinking the root mask on cell creation and extending it again on
destruction is not trivial. Not at all.
When creating a new cell, we may punch a hole into the root mask. In
that case, we also remove the lower half from the root mask and
accumulate those bits in a "freed mask" for reuse once the hole closes
again. And if we are unlucky, adding a cell empties the current root
mask. Then we have to look into the freed mask and switch to it if it's
non-empty.
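The shrinking rule can be sketched with plain 32-bit masks. This is a hypothetical reconstruction from the description above, not the actual Jailhouse implementation. It assumes that the cell's mask is itself contiguous and that switching to the freed mask moves those bits back into the root mask, clearing the freed mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if mask is non-empty and its set bits form one contiguous run. */
static bool mask_contiguous(uint32_t mask)
{
	if (!mask)
		return false;
	while (!(mask & 1))
		mask >>= 1;
	/* A run of ones plus one is a power of two. */
	return (mask & (mask + 1)) == 0;
}

/* Take cell_mask away from the root mask. If this punches a hole, move
 * the part below the hole into the freed mask so that the root mask stays
 * contiguous. If the root mask runs empty, switch to the freed mask. */
static void cat_shrink_root(uint32_t *root, uint32_t *freed,
			    uint32_t cell_mask)
{
	*root &= ~cell_mask;

	if (*root && !mask_contiguous(*root)) {
		/* All bits below the lowest bit of the cell mask. */
		uint32_t below = (cell_mask & (~cell_mask + 1u)) - 1;

		*freed |= *root & below;
		*root &= ~below;
	}
	if (!*root && *freed) {
		*root = *freed;
		*freed = 0;
	}
}
```

For example, starting from a 20-bit root mask 0xfffff, taking 0x3 leaves 0xffffc. Taking 0xf0 afterwards punches a hole, so 0xc moves to the freed mask and the root mask becomes 0xfff00.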
When restoring the root mask on cell destruction, we choose a simple
algorithm that first collects all released bits in the freed mask, then
tries to merge that mask bit-wise with the current root cell mask. On
success we restart the freed mask walk to ensure that all contiguous
bits are merged.
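The restore walk can be sketched as follows. This is a hypothetical illustration of the described algorithm, not the Jailhouse code, and the function name is made up. Freed bits are moved into the root mask one at a time, and only when the result stays contiguous. After every successful merge, the walk restarts from bit 0, because the newly added bit may make a previously rejected neighbor mergeable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if mask is non-empty and its set bits form one contiguous run. */
static bool mask_contiguous(uint32_t mask)
{
	if (!mask)
		return false;
	while (!(mask & 1))
		mask >>= 1;
	return (mask & (mask + 1)) == 0;
}

/* Return the bits of a destroyed cell to the root cell: collect them in
 * the freed mask, then merge freed bits into the root mask as long as the
 * root mask stays contiguous. */
static void cat_restore_root(uint32_t *root, uint32_t *freed,
			     uint32_t cell_mask)
{
	unsigned int bit;

	*freed |= cell_mask;
restart:
	for (bit = 0; bit < 32; bit++) {
		uint32_t b = 1u << bit;

		if ((*freed & b) && mask_contiguous(*root | b)) {
			*root |= b;
			*freed &= ~b;
			goto restart;	/* new bit may enable further merges */
		}
	}
}
```

Bits that cannot be attached to the root mask yet, such as 0x3 next to a root mask of 0xf00, remain in the freed mask until a later destruction closes the gap.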
One may wonder why we do not reallocate masks fully dynamically and
automatically on each reconfiguration instead of requiring explicit
allocation via the config. The reason is that we do not want to
invalidate cache allocations of those cells that are not involved in a
reconfiguration.
That is a lot of complexity for a mechanism that looked so simple at
first sight. Let's just hope that there is a noteworthy benefit in
restricting CAT bitmasks in hardware this way.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 11 Aug 2015 06:58:38 +0000 (08:58 +0200)]
core, tools: Introduce cache regions to the cell configuration
Allow specifying cache regions so that the hypervisor can partition
cache usage accordingly whenever the hardware supports this.
The specification of their start location and size depends on the
architecture-specific partitioning support. So far, only L3 caches
are definable, either as a unified cache or further partitioned into code
and data (to cater for Intel's CAT and CDP). As with memory regions, caches
are usually taken from the root cell on non-root cell creation, but they
can also be declared as shared with the root cell.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 10 Jan 2016 08:42:43 +0000 (09:42 +0100)]
inmates: arm: Make LED blinking in gic-demo optional
This is both a test/demo case for command line parsing on ARM and a
feature to control the LED signal in the gic-demo on Banana Pi. The
green LED will now only blink if "blinking_led" is specified as inmate
command line option.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 4 Jan 2016 10:31:49 +0000 (11:31 +0100)]
inmates: x86: Add optional cache pollution to apic-demo
When "pollute_cache" is specified as a command line parameter of the
apic-demo, the demo will fill each cache line with a pattern in each
measurement loop. Up to 512 KB of cache can be polluted this way.
This allows testing the L3 cache partitioning features of recent Intel CPUs:
The cache pollution will dirty the L1 and L2 data caches so that the
next loop iteration will access L3. If that cache is shared, latencies
will rise as other cells use the cache as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
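A pollution pass only has to touch every cache line of a buffer that is larger than L1 and L2. The following is a minimal sketch, not the apic-demo code. It assumes a 64-byte line size, which is typical for x86, and uses hypothetical names. The buffer is volatile so that the compiler cannot drop the stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE	64		/* assumed, typical for x86 */
#define POLLUTE_SIZE	(512 * 1024)	/* the 512 KB mentioned above */

static volatile uint8_t pollution_buf[POLLUTE_SIZE];

/* Store to one byte per cache line, which pulls every line of the buffer
 * into the data caches as a dirty line and evicts whatever the
 * measurement loop had cached before. */
static void pollute_cache(uint8_t pattern)
{
	size_t off;

	for (off = 0; off < POLLUTE_SIZE; off += CACHE_LINE_SIZE)
		pollution_buf[off] = pattern;
}
```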
Jan Kiszka [Mon, 4 Jan 2016 10:28:08 +0000 (11:28 +0100)]
inmates: x86: Allow to bypass TSC and APIC timer calibration
Make use of the command line feature and introduce the "tsc_freq" and
"apic_freq" parameters. When provided, these values are used directly
instead of running calibrations against the PM timer.
This is particularly useful when running micro-benchmarks that are
sensitive to the inherent small variations of the calibrations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 4 Jan 2016 10:25:09 +0000 (11:25 +0100)]
tools: jailhouse: Add support for string loading
Extend the "cell load" command with a variant where a string provided
along with the command is loaded into the cell memory. This can be used
together with the new command line feature to pass parameters to inmates
that support this.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 4 Jan 2016 10:17:11 +0000 (11:17 +0100)]
inmates: Add support for command line parameters
This provides support for parsing string, integer (long long type) and
boolean command line parameters. The former two need to be in the form
of "name=value" so that cmdline_parse_str/int will return the extracted
value. Boolean parameters are just of the form "name", and
cmdline_parse_bool will return true if this pattern is found. Parameters
need to be separated by blanks.
The parameters can be passed to the inmate by loading the string at an
architecture-specific location. That is 0xf0000 on x86 and 0x100 on ARM
so far. Note that the inmate has to reserve an appropriately sized
buffer via the CMDLINE_BUFFER macro.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
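The parsing rules (blank-separated tokens, "name=value" for strings and integers, a bare "name" for booleans) can be sketched as below. These functions are hypothetical stand-ins, not the inmate library's cmdline_parse_str/int/bool, whose exact signatures may differ. The sketch also takes the command line as an argument instead of reading it from the CMDLINE_BUFFER. Integers are parsed with strtoll in base 0, so hex values would be accepted as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Find "name" as a whole blank-separated token or as the name part of
 * "name=value". Returns a pointer just past the name, or NULL. */
static const char *find_param(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')
			p++;
		if (strncmp(p, name, len) == 0 &&
		    (p[len] == '\0' || p[len] == ' ' || p[len] == '='))
			return p + len;
		while (*p && *p != ' ')	/* skip to the next token */
			p++;
	}
	return NULL;
}

/* Boolean parameter: present as a bare "name". */
static bool parse_bool(const char *cmdline, const char *name)
{
	const char *p = find_param(cmdline, name);

	return p && *p != '=';
}

/* Integer parameter "name=value"; returns def if absent. */
static long long parse_int(const char *cmdline, const char *name,
			   long long def)
{
	const char *p = find_param(cmdline, name);

	if (!p || *p != '=')
		return def;
	return strtoll(p + 1, NULL, 0);	/* stops at the next blank */
}

/* String parameter "name=value", copied into buf; returns def if absent. */
static const char *parse_str(const char *cmdline, const char *name,
			     char *buf, size_t size, const char *def)
{
	const char *p = find_param(cmdline, name);
	size_t n = 0;

	if (!p || *p != '=' || size == 0)
		return def;
	for (p++; *p && *p != ' ' && n < size - 1; p++)
		buf[n++] = *p;
	buf[n] = '\0';
	return buf;
}
```

Given "tsc_freq=2400000000 pollute_cache", parse_int yields 2400000000 for "tsc_freq" and parse_bool is true for "pollute_cache". A prefix such as "tsc" does not match.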
Jan Kiszka [Sat, 2 Jan 2016 14:52:08 +0000 (15:52 +0100)]
inmates: arm: Initialize bss programmatically
Aligns ARM with x86: initialize bss with a small assembly loop before
inmate_main is invoked. This allows moving it after the other sections,
effectively removing it from the image file.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 7 Jan 2016 07:54:17 +0000 (08:54 +0100)]
x86: Properly roll back failing IOAPIC cell initialization
We have to release already allocated resources if ioapic_get_or_add_phys
fails. At least the arch.ioapics array should be freed again, but
possibly also previously claimed root cell IOAPIC pins.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 27 Dec 2015 18:02:43 +0000 (19:02 +0100)]
x86: Rename local result variable
The return value of vtd_emulate_inv_int does not follow the typical "0 or
negative error code" convention but is an IR table index on success.
Avoid any confusion by using the more neutral variable name "result".
No functional changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 24 Sep 2015 19:53:06 +0000 (21:53 +0200)]
tools: config-create: Add stubs for extended capabilities
Scan the extended capability space of PCI express devices and leave
a stub for anything that is detected. For SR-IOV, the size is already
encoded; other capabilities still need to be filled in. This doesn't extend
write permission to any capability yet, standard or extended.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
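Extended capabilities start at config space offset 0x100. Each one begins with a dword header that holds the capability ID in bits 15:0, the version in bits 19:16, and the offset of the next capability in bits 31:20. The sketch below walks this chain over a 4 KiB config-space snapshot. It is written in C for consistency with the rest of the code here, whereas the actual config generator is a Python tool. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXT_CAP_START	0x100
#define PCI_EXT_CAP_ID_SRIOV	0x0010

/* Read a little-endian dword from a 4 KiB config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return cfg[off] | cfg[off + 1] << 8 | cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/* Record ID and offset of each extended capability by following the
 * next pointers starting at 0x100. Returns the number found. */
static unsigned int scan_ext_caps(const uint8_t *cfg, uint16_t *ids,
				  uint16_t *offsets, unsigned int max)
{
	unsigned int off = PCI_EXT_CAP_START, n = 0;

	while (off >= PCI_EXT_CAP_START && off <= 4096 - 4 && n < max) {
		uint32_t hdr = cfg_read32(cfg, off);

		if (hdr == 0 || hdr == 0xffffffff)
			break;		/* no (more) extended capabilities */
		ids[n] = hdr & 0xffff;
		offsets[n] = off;
		n++;
		off = (hdr >> 20) & 0xffc;	/* next pointer, dword-aligned */
	}
	return n;
}
```

A next pointer of 0 ends the walk, and the max limit protects against malformed, cyclic chains.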
Jan Kiszka [Thu, 24 Sep 2015 19:48:05 +0000 (21:48 +0200)]
core: Tag PCI express extended capability IDs via highest bit
PCI express extended capabilities span a separate ID space. In order to
use the same jailhouse_pci_capability structure as for PCI capabilities
and also to avoid extending the ID field, reserve the highest bit, bit 15, to
tag extended IDs. PCI so far only uses the lowest 5 bits and apparently
expands linearly, so we won't see any conflicts in the foreseeable
future.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
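The tagging prevents collisions between the two ID spaces. For example, 0x10 is the standard PCI Express capability, while extended ID 0x0010 is SR-IOV. Below is a minimal sketch of the encoding. The macro and helper names are hypothetical and not necessarily those used in the Jailhouse headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of the 16-bit capability ID marks a PCIe extended capability. */
#define PCI_EXT_CAP_FLAG	0x8000

static uint16_t tag_ext_cap_id(uint16_t ext_id)
{
	return ext_id | PCI_EXT_CAP_FLAG;
}

static bool is_ext_cap(uint16_t id)
{
	return id & PCI_EXT_CAP_FLAG;
}

/* Strip the tag to get the ID as found in config space. */
static uint16_t cap_raw_id(uint16_t id)
{
	return id & ~PCI_EXT_CAP_FLAG;
}
```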
Jan Kiszka [Tue, 11 Aug 2015 04:35:11 +0000 (06:35 +0200)]
x86: vmx: Micro-cleanup in vcpu_vendor_cell_init
Return the error code directly instead of taking the indirect route via
the pre-initialized err variable. This prevents a future refactoring from
accidentally breaking this relationship.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 21 Dec 2015 23:29:14 +0000 (00:29 +0100)]
x86: Report only unhandled PIO accesses in the common handler
This realigns vcpu_handle_io_access with vcpu_handle_mmio_access, a
consistency that got lost in 46ad4efeb8: errors detected by handlers are
already reported there.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 21 Dec 2015 23:53:45 +0000 (00:53 +0100)]
x86: Intercept #AC and #DB to prevent guest-triggered microcode loops
This addresses CVE-2015-5307 and CVE-2015-8104 [1] for Jailhouse:
malicious cells may bring VCPUs into a state where the CPU will
infinitely loop over microcode, providing the hypervisor no chance to
interrupt these loops anymore. To prevent this, we have to intercept #AC
and #DB and reflect the exceptions back to the cell.
Whether a guest is trapped in an exception loop can be detected by checking
the exception exit statistics, which are now recorded: a large number of
exception exits per second (typically >1 million) indicates this.
Jan Kiszka [Mon, 21 Dec 2015 23:50:36 +0000 (00:50 +0100)]
x86: Enhance x86_handle_events to x86_check_events
There is now quite some commonality between svm and vmx when it comes to
checking for pending events. Move those parts into x86_check_events,
which becomes the extended version of x86_handle_events. Only a small
difference is now left behind in vmx_check_events(): the preemption
timer has to be disabled before the check.
Just like x86_handle_events, x86_check_events only operates on the
caller's CPU. So remove the cpu_data parameter on this occasion.
We can remove the "sipi_vector = -1" after x86_enter_wait_for_sipi now
because we no longer return that value from x86_check_events, and
sipi_vector is not evaluated elsewhere because cpu_data->wait_for_sipi
is true.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ralf Ramsauer [Tue, 27 Oct 2015 16:26:24 +0000 (17:26 +0100)]
Respect size of io bitmap in vcpu_cell_init()
The previous code copied the I/O bitmap without regard to its actual size.
This patch simplifies the copying and respects the size of the
destination.
The vcpu functions were using sizeof() to determine the size of the
dynamically allocated I/O bitmap, which cannot work because sizeof() on
a pointer yields the size of the pointer. Assign this value statically per
sub-architecture (Intel or AMD).
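The underlying pitfall is generic C. For a heap-allocated buffer, only a pointer is available, and sizeof() applied to it returns the pointer size, typically 8 bytes on x86-64, rather than the buffer size. The following minimal sketch passes the size explicitly. IO_BITMAP_SIZE and clone_io_bitmap are hypothetical, and the real per-vendor sizes differ because VMX and SVM use different I/O bitmap layouts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value; the real size is fixed per sub-architecture. */
#define IO_BITMAP_SIZE	(8 * 1024)

/* Copy an I/O bitmap into a fresh allocation. The size must be passed in
 * (or known statically): sizeof(src) would only yield the pointer size. */
static unsigned char *clone_io_bitmap(const unsigned char *src, size_t size)
{
	unsigned char *dst = malloc(size);

	if (dst)
		memcpy(dst, src, size);
	return dst;
}
```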
Xuguo Wang [Thu, 15 Oct 2015 07:13:26 +0000 (15:13 +0800)]
Documentation: articles: LJ-article-04-2015.txt
This document is meant for newcomers, so its wording and commands must be
accurate. However, the section "Configs and inmates" contains this
command:
sudo tools/jailhouse cell stat apic-demo
while the correct command is:
sudo tools/jailhouse cell stats apic-demo
Hence this patch.
Reported-by: Xuguo Wang <huddy1985@gmail.com> Signed-off-by: Xuguo Wang <huddy1985@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 15 Oct 2015 08:53:20 +0000 (10:53 +0200)]
ci: Update Travis Ubuntu environment
The utopic packages are no longer available; we need vivid. It's also a
good point to try out the beta environment based on trusty in the hope of
reducing the number of updates.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 25 Sep 2015 17:47:18 +0000 (19:47 +0200)]
x86: svm: Fix broken FS base on deactivation
After f93e23934b, we no longer call vmsave and thus no longer find the
right FS base there. This caused sporadic crashes of "jailhouse disable"
on return to userspace.
Fix it by loading the value from the corresponding MSR.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 18 Sep 2015 16:02:10 +0000 (18:02 +0200)]
core: pci: Fix MMCONFIG handling for root cell
Reorder the initialization in pci_init so that MMCONFIG is set up before
pci_cell_init is invoked for the root cell. Calling pci_cell_init
earlier has the undesired effect that the MMCONFIG region is not
registered for the root cell, and all related accesses will fail with
generic MMIO errors.
Jan Kiszka [Wed, 16 Sep 2015 07:22:23 +0000 (09:22 +0200)]
inmates: x86: Add support for TSC-based timing
Provide a service to calibrate the TSC against the PM timer and read out
the current time in nanoseconds. This service is much faster than the
slow PM timer, and it's also not affected by chipset-induced delays.
Note that the simplistic algorithm only supports measuring relative time
spans of a couple of seconds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
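The following is a minimal sketch of such a service, under stated assumptions. It is not the inmate library code, and the function names are hypothetical. Calibration counts TSC ticks over an interval measured with the ACPI PM timer, which runs at 3.579545 MHz. Conversion then scales a TSC delta to nanoseconds. With plain 64-bit arithmetic, the product delta * 10^9 overflows once delta exceeds about 1.8 * 10^10 ticks, which is roughly 6 s at 3 GHz. This is one plausible reason such a simple scheme only suits short relative spans.

```c
#include <assert.h>
#include <stdint.h>

#define PM_TIMER_HZ	3579545ULL	/* ACPI PM timer frequency */

/* TSC frequency in Hz from ticks counted over pm_ticks PM timer ticks. */
static uint64_t calibrate_tsc_freq(uint64_t tsc_ticks, uint64_t pm_ticks)
{
	return tsc_ticks * PM_TIMER_HZ / pm_ticks;
}

/* Convert a TSC delta to nanoseconds. Overflows for deltas beyond
 * ~1.8e10 ticks, so only short relative spans are supported. */
static uint64_t tsc_delta_to_ns(uint64_t delta, uint64_t tsc_freq)
{
	return delta * 1000000000ULL / tsc_freq;
}
```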
Antonios Motakis [Wed, 12 Aug 2015 16:21:58 +0000 (18:21 +0200)]
core: printk: include asm/bitops.h directly
Currently, hypervisor/printk.c assumes that asm/bitops.h will be
included by asm/spinlock.h. Since printk.c uses bitops directly, include
the right header file explicitly.
Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com>
[Jan: adjust to alphabetic ordering] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Ralf Ramsauer [Thu, 13 Aug 2015 23:23:58 +0000 (01:23 +0200)]
hypervisor, driver: Added signature for .cell files
Insert a signature field into struct jailhouse_cell_desc and struct
jailhouse_system. The Jailhouse kernel driver will refuse to load
a system configuration as a cell configuration and vice versa.
Signed-off-by: Ralf Ramsauer <ralf@ramses-pyramidenbau.de>
[Jan: also adjust Linux loader script] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
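The check amounts to comparing a fixed-length magic string at the start of each descriptor before interpreting the rest. The following is a minimal sketch. The struct layout is simplified and hypothetical, and the 6-byte "JHSYST"/"JHCELL" strings are assumptions, not confirmed by this log. Because the signature is not NUL-terminated, the comparison uses memcmp rather than strcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed signature strings, for illustration only. */
#define SYSTEM_SIGNATURE	"JHSYST"
#define CELL_SIGNATURE		"JHCELL"
#define SIGNATURE_LEN		6

struct config_header {
	char signature[SIGNATURE_LEN];	/* not NUL-terminated */
	/* ... remaining descriptor fields omitted ... */
};

static bool is_cell_config(const struct config_header *hdr)
{
	return memcmp(hdr->signature, CELL_SIGNATURE, SIGNATURE_LEN) == 0;
}

static bool is_system_config(const struct config_header *hdr)
{
	return memcmp(hdr->signature, SYSTEM_SIGNATURE, SIGNATURE_LEN) == 0;
}
```

A driver would reject a "cell create" request whose file fails is_cell_config(), and an enable request whose file fails is_system_config().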
Jan Kiszka [Wed, 5 Aug 2015 10:05:20 +0000 (12:05 +0200)]
arm: Migrate irqchips to generic MMIO dispatcher
Register the GIC distributor and, for the GICv3, also the redistributor
regions with the generic MMIO dispatcher. This allows dropping the
GIC-specific MMIO dispatching from arch_handle_dabt.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>