Describe previously undocumented functions as well as the communication
region structure.
For the latter, we have to expand the generic COMM_REGION_GENERIC_HEADER
macro during a doxygen run. This is achieved by including the generic
header from within the arch-specific one, but only for doxygen
processing. This special treatment is required because doxygen processes
each file directly, even if it should have been processed indirectly
already (here asm/jailhouse_hypercall.h via jailhouse/hypercall.h).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 14 Jul 2015 05:16:50 +0000 (07:16 +0200)]
core, inmates: Move \r injection into console_write / arch_dbg_write
This moves the injection of \r before \n into the console_write and
arch_dbg_write implementations. That causes some minor duplication but
also fixes the injection for strings passed via %s. Furthermore, it
allows skipping the injection for consoles that may not need it.
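A minimal sketch of the idea, assuming a hypothetical byte-level backend named uart_putc (here a stand-in that only records output in a buffer). Because the translation runs on the fully formatted message, newlines that arrived through %s arguments get their \r as well.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UART/debug-port backend: records output. */
static char out_buf[256];
static size_t out_len;

static void uart_putc(char c)
{
	if (out_len < sizeof(out_buf) - 1)
		out_buf[out_len++] = c;
	out_buf[out_len] = '\0';
}

/*
 * Sketch of a console_write that emits '\r' before every '\n'. It sees
 * the final message, so newlines coming from %s arguments are covered.
 */
static void console_write(const char *msg)
{
	for (; *msg; msg++) {
		if (*msg == '\n')
			uart_putc('\r');
		uart_putc(*msg);
	}
}
```

A console that does not need the translation would simply call its putc in a plain loop.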
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Tue, 14 Jul 2015 05:29:35 +0000 (07:29 +0200)]
arm: Use more panic_printk for fatal error messages
Fatal errors that will leave CPUs unusable and may occur in parallel on
multiple CPUs should be reported via panic_printk to maintain
readability of the output. Adjust some locations for unexpected HYP
exits and failing PSCI_CPU_OFF.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 13 Jul 2015 07:06:54 +0000 (09:06 +0200)]
arm: Unmap virtual GIC on cell destruction
This fixes a leak on cell destruction: we left the GICv2 mapped and thus
did not free all paging structures. This also means we need to run the
irqchip cleanup before destroying the cell MMU.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 13 Jul 2015 07:01:51 +0000 (09:01 +0200)]
arm: Account for errors during irqchip cell_init
The cell_init callback of GICv2 should report the result of the mapping
request and thus needs a channel to return an error code. Extend the call
chain accordingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 13 Jul 2015 06:40:52 +0000 (08:40 +0200)]
arm: Fix arm_page_table_empty
The pt_entry_t type is a pointer to an entry, not the entry type itself.
So sizeof(pt_entry_t) gave us an entry size of 4 instead of 8, and we
overran the table during empty checks. This specifically caused page
leaks during cell destruction.
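To illustrate the fix, here is a minimal sketch in portable C. It assumes 4 KiB tables of 64-bit LPAE descriptors and treats bit 0 as the valid bit. The name page_table_empty is illustrative, not the hypervisor's exact code. On a 64-bit host both sizeof expressions happen to be 8, so the bug only shows on 32-bit ARM, where PAGE_SIZE / sizeof(pt_entry_t) yields 1024 iterations over a 512-entry table.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* LPAE descriptors are 64 bits wide, even on 32-bit ARM. */
typedef uint64_t *pt_entry_t;	/* a pointer to an entry, not the entry */

/*
 * Fixed check: derive the number of entries from sizeof(*pte), the
 * descriptor size, not from sizeof(pt_entry_t), the pointer size.
 */
static bool page_table_empty(pt_entry_t page_table)
{
	unsigned long n;
	pt_entry_t pte;

	for (n = 0, pte = page_table; n < PAGE_SIZE / sizeof(*pte); n++, pte++)
		if (*pte & 0x1)		/* bit 0: descriptor valid */
			return false;
	return true;
}
```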
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 12 Jul 2015 08:25:21 +0000 (10:25 +0200)]
arm: smp: Concentrate non-PSCI logic in Versatile Express module
We only keep the non-PSCI CPU hotplug support around for the sake of
old Versatile Express boards/models. No new boards will be accepted that
do not support the PSCI standard. Therefore, concentrate all functions
that were once considered reusable in the smp-vexpress module, folding
them into their only callers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 12 Jul 2015 08:22:46 +0000 (10:22 +0200)]
arm: Fix coding style of asm blocks
This aligns them with our (kernel) coding style: indent multi-line asm
blocks, end each line with \n\t in multi-line blocks, remove the ending
in single-line statements. No functional changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 10 Jul 2015 20:21:43 +0000 (22:21 +0200)]
core: pci: Add support for devices with more than 16 MSI-X vectors
There are PCI devices with far more than 16 MSI-X vectors in the field;
some users reported up to 80. We don't want to increase the statically
allocated MSI-X shadow table that much, as it would quickly increase the
memory usage.
Instead, implement an on-demand allocation pattern like the one we
already use for CPU bitmaps: up to 16 vectors are allocated statically;
if more are needed, allocation switches to a dynamic scheme.
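A minimal sketch of that static-then-dynamic pattern. All type and field names are hypothetical, and calloc/free stand in for the hypervisor's own page allocator. An MSI-X table entry is 16 bytes per the PCI specification.

```c
#include <stdint.h>
#include <stdlib.h>

#define STATIC_MSIX_VECTORS 16	/* size of the embedded shadow table */

/* One MSI-X table entry (address, data, vector control: 16 bytes). */
struct msix_entry {
	uint64_t address;
	uint32_t data;
	uint32_t ctrl;
};

/* Hypothetical device state, reduced to the MSI-X shadow table. */
struct pci_device_sketch {
	unsigned int num_msix_vectors;
	struct msix_entry msix_vector_array[STATIC_MSIX_VECTORS];
	struct msix_entry *msix_vectors;	/* static or dynamic table */
};

/* Use the embedded array when it suffices, else allocate dynamically. */
static int msix_table_init(struct pci_device_sketch *dev, unsigned int vectors)
{
	dev->num_msix_vectors = vectors;
	if (vectors <= STATIC_MSIX_VECTORS) {
		dev->msix_vectors = dev->msix_vector_array;
		return 0;
	}
	dev->msix_vectors = calloc(vectors, sizeof(struct msix_entry));
	return dev->msix_vectors ? 0 : -1;
}

/* Only free what was dynamically allocated. */
static void msix_table_exit(struct pci_device_sketch *dev)
{
	if (dev->msix_vectors != dev->msix_vector_array)
		free(dev->msix_vectors);
	dev->msix_vectors = NULL;
}
```

Accessors then always go through msix_vectors, so the rest of the code does not care which storage is in use.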
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Jul 2015 20:04:28 +0000 (22:04 +0200)]
core: ivshmem: Convert static virt_pci_bar information into constants
There is no need to carry the virt_pci_bar array in each endpoint
structure. The flags field is unused, and the sizes can easily be
expressed as constants because they never change.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Jul 2015 21:31:27 +0000 (23:31 +0200)]
core: ivshmem: Fix cell disconnection
Move the disconnect call before the potential endpoint copy operation.
Otherwise, we risk updating the stale second entry rather than the now
active first one.
This change also ensures that disconnect is performed even for the last
endpoint. This will allow us to put cleanup tasks into that function
that have to be executed unconditionally.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Jul 2015 21:14:30 +0000 (23:14 +0200)]
core: ivshmem: Mark BARs as 64-bit again
Regression of 294110a887: just as physical devices fill their bar array
during setup, virtual devices need to do so as well. Namely, the 64-bit
flag got lost during the migration to generic BAR emulation.
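A minimal sketch of what filling the BAR array means for a 64-bit memory BAR. It uses the type encoding from the PCI specification: bit 0 clear for memory, bits 2:1 set to 10b for 64-bit. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory BAR type field (bits 2:1) per the PCI specification. */
#define PCI_BAR_MEM_TYPE_MASK	0x6
#define PCI_BAR_MEM_TYPE_64	0x4

/*
 * A 64-bit memory BAR spans two BAR slots: the low dword carries the
 * address bits 31:4 plus the type flags, the next slot the upper half.
 */
static void fill_64bit_mem_bar(uint32_t *bars, unsigned int idx, uint64_t addr)
{
	bars[idx] = (uint32_t)(addr & ~0xfULL) | PCI_BAR_MEM_TYPE_64;
	bars[idx + 1] = (uint32_t)(addr >> 32);
}

static bool bar_is_64bit(uint32_t bar)
{
	return !(bar & 0x1) &&
	       (bar & PCI_BAR_MEM_TYPE_MASK) == PCI_BAR_MEM_TYPE_64;
}
```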
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 4 Jul 2015 11:36:53 +0000 (13:36 +0200)]
x86: Prevent usage of MMX, SSE, and AVX by compiler
The compiler may decide to use MMX, SSE or even AVX for copying data or
similar purposes. Prevent this because we neither initialize the related
units nor save/restore their state between the different worlds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 1 Jul 2015 05:03:20 +0000 (07:03 +0200)]
x86: Embed page for EPT/NPT root_table into cell structure
Both Intel and AMD need this page and currently allocate it
programmatically. We can save some logic, specifically error handling,
by reserving the page in the cell structure.
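A minimal sketch of the idea with a hypothetical, heavily reduced cell structure. The aligned attribute (a GCC/Clang extension) keeps the embedded page page-aligned. This assumes the cell structure itself is allocated page-aligned, for example from a page allocator.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical, heavily reduced cell structure. */
struct cell_sketch {
	unsigned int id;
	/* ... other per-cell state ... */

	/*
	 * Root page of the EPT/NPT hierarchy, reserved as part of the cell.
	 * No separate allocation means no allocation-failure path.
	 */
	uint8_t root_table_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
};
```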
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
"Get to know" Jailhouse originally appeared in Linux Journal issue
252 (April 2015). As of May 2015, it can be redistributed freely,
so add its slightly updated version to Documentation.
Signed-off-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
We are preparing to import yet another article about Jailhouse, so it
makes sense to have a dedicated place to store them. Also, add a
timestamp to each article's filename so that its publication date is
easy to tell.
Signed-off-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 24 May 2015 08:39:28 +0000 (10:39 +0200)]
configs: Add a linux-x86-demo cell configuration
This demonstrates non-root Linux booting. It targets the QEMU
reference setup but can easily be tailored for physical setups as well.
The config contains an ivshmem device to demonstrate both PCI device
discovery and inter-cell communication. Of the four available CPUs in
the QEMU setup, three are assigned to the cell to show that SMP works.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 24 May 2015 08:10:22 +0000 (10:10 +0200)]
tools, inmates: Add "cell linux" subcommand to jailhouse tool
This adds support for loading and booting paravirtualized x86 Linux
kernels in non-root cells. The jailhouse tool is extended for this
purpose with a new subcommand "cell linux" that accepts the cell
configuration, the kernel image and an optional initrd as input. A
kernel command line can be specified as well. The script then creates
the cell (unless it already exists) and loads the kernel, the initrd, a
special boot loader and the required parameters for that loader into the
cell RAM. Finally, it starts the cell.
The interface between the Python helper and the boot loader inmate is
based on the kernel's boot_params structure with a custom setup_data
extension. The former is initialized by the Python helper, specifically
to inform Linux about the location of its initrd and the command line.
It also contains an e820 list to report the memory layout. The
setup_data is filled by the boot loader with information about the PM
timer address and the available CPUs as well as their physical APIC IDs.
For that purpose, the Linux cell requires a communication region.
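A sketch of the data structures involved. The setup_data header and the packed 20-byte e820 entry follow the layouts of the Linux x86 boot protocol. The payload structure and fill_payload helper are purely hypothetical: their field names, order and sizes are illustrative and not Jailhouse's actual ABI.

```c
#include <stdint.h>

/* Generic setup_data node from the Linux x86 boot protocol. */
struct setup_data {
	uint64_t next;		/* physical address of next node, 0 = end */
	uint32_t type;
	uint32_t len;		/* payload length following this header */
	uint8_t data[];
};

/* One e820 entry as stored in boot_params (packed, 20 bytes). */
struct boot_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;		/* 1 = usable RAM */
} __attribute__((packed));

/* Hypothetical payload the boot loader inmate could fill in. */
struct jh_setup_payload_sketch {
	uint16_t pm_timer_address;
	uint16_t num_cpus;
	uint8_t apic_ids[255];
};

/* Hypothetical helper: record PM timer port and the cell's APIC IDs. */
static int fill_payload(struct jh_setup_payload_sketch *p, uint16_t pm_timer,
			const uint8_t *ids, uint16_t n)
{
	uint16_t i;

	if (n > sizeof(p->apic_ids))
		return -1;
	p->pm_timer_address = pm_timer;
	p->num_cpus = n;
	for (i = 0; i < n; i++)
		p->apic_ids[i] = ids[i];
	return 0;
}
```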
Although the loader script is currently x86-only, extension to ARM is
surely feasible as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 24 May 2015 08:00:02 +0000 (10:00 +0200)]
inmates: Add infrastructure for inmates that serve as tools
We will add an x86 inmate that supports booting Linux in non-root cells.
This lays the foundation for such tools, including their installation
into $(libexecdir)/jailhouse.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 27 Apr 2015 18:42:13 +0000 (20:42 +0200)]
configs: Extend inmates memory in qemu config
Reduce the hypervisor memory to 6 MB, which is still plenty, so that we
can create more or larger inmates. Reorder and extend the description
accordingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 2 May 2015 10:40:05 +0000 (12:40 +0200)]
x86: Ignore writes to the xAPIC ID register
Writing to the APIC ID register is legal in xAPIC mode but is ignored by
recent CPU models. Linux, for example, performs such a write during
boot-up, and ignoring it is both cheap and helps keep para-virtualization
needs low.
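A minimal sketch of such a write handler. The register offsets (ID at 0x20, ICR at 0x300/0x310) are the architectural xAPIC ones. The result enum and the handler name are hypothetical, and the pass-through default is a simplification.

```c
#include <stdint.h>

#define XAPIC_REG_ID		0x20	/* local APIC ID */
#define XAPIC_REG_ICR_LO	0x300
#define XAPIC_REG_ICR_HI	0x310

/* Hypothetical outcome of an intercepted xAPIC write. */
enum apic_write_result {
	APIC_WRITE_IGNORED,
	APIC_WRITE_HANDLED,
	APIC_WRITE_PASSTHROUGH,
};

/*
 * ID writes are accepted but dropped, like on recent CPUs, so a guest
 * such as Linux needs no patch to avoid them.
 */
static enum apic_write_result xapic_handle_write_sketch(unsigned int reg,
							uint32_t val)
{
	(void)val;

	switch (reg) {
	case XAPIC_REG_ID:
		return APIC_WRITE_IGNORED;
	case XAPIC_REG_ICR_LO:
	case XAPIC_REG_ICR_HI:
		return APIC_WRITE_HANDLED;	/* would go to ICR emulation */
	default:
		return APIC_WRITE_PASSTHROUGH;
	}
}
```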
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 1 May 2015 11:00:09 +0000 (13:00 +0200)]
x86: Implement standard hypervisor detection protocol
This provides cpuid-based Jailhouse detection conforming to the protocol
also used by other major hypervisors: set bit 31 of ecx for function
0x01, provide a signature via function 0x40000000 and a so far empty
feature set via function 0x40000001.
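A guest-side sketch of this detection protocol. It works on raw register values so that it runs anywhere. Leaf 0x01 ECX bit 31 flags a hypervisor, and leaf 0x40000000 returns a 12-byte signature in EBX, ECX, EDX, in that order. The NUL-padded "Jailhouse" string in the usage is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID_HYPERVISOR_BIT	(1U << 31)	/* leaf 0x01, ECX */

/* Bit 31 of ECX from leaf 0x01 signals "running under a hypervisor". */
static bool hypervisor_present(uint32_t leaf1_ecx)
{
	return leaf1_ecx & CPUID_HYPERVISOR_BIT;
}

/*
 * Leaf 0x40000000: signature bytes in EBX, ECX, EDX order, least
 * significant byte first (native order on little-endian x86).
 */
static bool signature_matches(uint32_t ebx, uint32_t ecx, uint32_t edx,
			      const char sig[12])
{
	char buf[12];

	memcpy(buf, &ebx, 4);
	memcpy(buf + 4, &ecx, 4);
	memcpy(buf + 8, &edx, 4);
	return memcmp(buf, sig, 12) == 0;
}
```

A real guest would feed these helpers with the outputs of the cpuid instruction, checking leaf 0x01 first.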
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 1 May 2015 10:12:27 +0000 (12:12 +0200)]
x86: Always intercept cpuid
Refactor vmx_handle_cpuid to vcpu_handle_cpuid and ensure that both VMX
and SVM use it for emulating guest cpuid invocations. That means SVM has
to intercept it now.
We will need this to reliably indicate the presence of Jailhouse to our
inmates.
CC: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 18 May 2015 06:51:27 +0000 (08:51 +0200)]
core: ivshmem: Use generic BAR emulation
Simplify the code by relying on the PCI core to emulate BAR writes. This
only requires setting the bar_mask fields of ivshmem devices properly in
configs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 18 May 2015 05:22:09 +0000 (07:22 +0200)]
core: ivshmem: Refactor pci_ivshmem_cfg_write
We can simplify this by passing in the bias-shifted row (dword) value to
be written along with the access byte mask. pci_ivshmem_cfg_write then
just needs to combine the new value with those of the other bytes as
needed, and we can drop all the size-specific dispatching.
This also lays the foundation for reusing generic BAR emulation.
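A minimal sketch of the byte-mask merge. Both helper names are hypothetical. The value is shifted to its byte position within the dword row ("bias-shifted"), and the mask has 0xff for each byte the access covers.

```c
#include <stdint.h>

/* Build bias-shifted value and byte mask for a 1-, 2- or 4-byte access. */
static void cfg_prepare_access(unsigned int address, unsigned int size,
			       uint32_t raw, uint32_t *value, uint32_t *mask)
{
	unsigned int bias = address % 4;
	uint32_t size_mask = size == 4 ? 0xffffffffU : (1U << (size * 8)) - 1;

	*mask = size_mask << (bias * 8);
	*value = raw << (bias * 8);
}

/* Merge the written bytes into the current row, keeping the others. */
static uint32_t cfg_merge_write(uint32_t old, uint32_t value, uint32_t mask)
{
	return (old & ~mask) | (value & mask);
}
```

With this split, one code path serves all access sizes, which is what makes the size-specific dispatching unnecessary.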
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 17 May 2015 09:50:04 +0000 (11:50 +0200)]
core: Add basic BAR write emulation for physical PCI devices
This enables cells to explore the size of PCI device resources by
writing 1's to base address registers and then reading back which bits
got modified. We didn't support this so far because Linux in the root
cell had already retrieved the sizes before Jailhouse ran, and other
cells could have been customized to use preconfigured information.
However, adding this feature only increases the code by a few dozen
lines while making life for preexisting inmate OSes, including Linux,
significantly easier. Moreover, we will save some code again when
switching ivshmem's BAR emulation to this version.
Note that this does NOT allow cells to remap PCI device resources in
their address space. That would require more effort for limited
benefit. Given that we preconfigure all BARs, neither Linux nor other
OSes need to change them. Any attempt to do so will simply have no
effect.
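A minimal sketch of the sizing handshake from both sides. It assumes a per-BAR mask of writable bits (called bar_mask here). It models only the value a cell reads back; the physical BAR and the cell's mappings stay untouched, which is why a remapping attempt has no effect. The function names are hypothetical.

```c
#include <stdint.h>

/* Only bar_mask bits take the written value; flag bits are preserved. */
static uint32_t bar_emulate_write(uint32_t current, uint32_t written,
				  uint32_t bar_mask)
{
	return (written & bar_mask) | (current & ~bar_mask);
}

/* Cell side: after writing all 1s, derive the size from the read-back. */
static uint32_t bar_size_from_readback(uint32_t readback)
{
	/* clear the memory BAR flag bits 3:0, invert, add one */
	return ~(readback & ~0xfU) + 1;
}
```

For a 4 KiB prefetchable memory BAR, a bar_mask of 0xfffff000 makes an all-ones write read back as 0xfffff008, from which the cell computes a size of 0x1000.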
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 17 May 2015 09:06:50 +0000 (11:06 +0200)]
core, tools: Add BAR masks to jailhouse_pci_device
Add a new field per BAR to the PCI device configuration. It allows
masking the modifiable part of a BAR before storing writes. This will
support BAR write emulation, which is required to make PCI resource
sizes explorable by cells.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 17 May 2015 08:47:33 +0000 (10:47 +0200)]
core: pci: Rework config space header write moderation
Switch to a more powerful array-based write access control for the PCI
config space header. The array consists of tuples, each controlling the
access to one dword row. Access can be denied, permitted, or emulated as
read-only (i.e., ignored). As before, a mask selects the bytes of the
row for which the access type applies.
This new model allows us to properly describe which registers of the
bridge header we effectively want to freeze as read-only so that Linux can
rescan buses.
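A minimal sketch of such a tuple array. The concrete policy for the first four rows of a type 1 (bridge) header is illustrative only. So is the rule that an access touching bytes outside a row's mask is denied; that is an assumption of this sketch.

```c
#include <stdint.h>

enum cfg_access { CFG_DENY, CFG_PERMIT, CFG_RO_EMULATE };

/* One tuple per dword row: covered bytes and the access type. */
struct cfg_access_rule {
	uint32_t mask;
	enum cfg_access type;
};

/* Hypothetical rules for the first rows of a bridge header. */
static const struct cfg_access_rule bridge_rules[] = {
	[0x00 / 4] = { 0xffffffff, CFG_RO_EMULATE },	/* vendor/device ID */
	[0x04 / 4] = { 0x0000ffff, CFG_PERMIT },	/* command */
	[0x08 / 4] = { 0xffffffff, CFG_RO_EMULATE },	/* class, revision */
	[0x0c / 4] = { 0xffffffff, CFG_RO_EMULATE },	/* header type etc. */
};

/* Decide a write to 'row' covering the bytes in 'access_mask'. */
static enum cfg_access cfg_check_write(unsigned int row, uint32_t access_mask)
{
	const struct cfg_access_rule *rule;

	if (row >= sizeof(bridge_rules) / sizeof(bridge_rules[0]))
		return CFG_DENY;
	rule = &bridge_rules[row];
	if ((access_mask & rule->mask) != access_mask)
		return CFG_DENY;	/* touches bytes outside the rule */
	return rule->type;
}
```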
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Mon, 18 May 2015 06:49:52 +0000 (08:49 +0200)]
core: ivshmem: Improve error reporting
The warning in ivshmem_update_msix actually reports a fatal condition
(callers will fail the CPU when we return an error code), and we need
some additional reporting on MMIO accesses. The latter ensures that we
learn where the problem was detected instead of just getting a register
dump.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
An ivshmem PCI device always has a valid ivshmem_endpoint pointer; this
is ensured by ivshmem_connect_cell, called during device initialization.
Nothing invalidates the pointer during the device's lifetime, so we can
remove the related NULL checks.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sun, 17 May 2015 08:39:19 +0000 (10:39 +0200)]
core: pci: Skip architecture hooks on virtual device addition/removal
arch_pci_add_device and arch_pci_remove_device acted as nops for virtual
PCI devices so far, and there is no change in sight. So stop calling the
hooks from pci_add/remove_virtual_device, drop related checks from the
vtd code and rename functions that work on physical devices to clarify
their scope.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 15 May 2015 07:57:41 +0000 (09:57 +0200)]
driver: Prevent disabling when there are offlined CPUs
If Linux has offlined some of the CPUs on its own, i.e., not to pass
them to other cells, and the hypervisor is then disabled, those CPUs
will not be released. Attempts to online them again later on will fail.
Reject disable requests in such a case.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 14 May 2015 14:26:52 +0000 (16:26 +0200)]
inmates: x86: Add basic SMP support
Under Jailhouse, all the cell CPUs are started in parallel. To enable
SMP inmates, the entry code records their number and their APIC IDs (up
to the current limit of 255). Only the first CPU arriving at the entry
check will call inmate_main, the others are parked in halt state.
Inmates can use the recorded parameters to pick up all CPUs by sending
them regular INIT/SIPI signals. We use the entry path for this case as
well: ap_entry is introduced as an alternative entry function pointer.
If it is non-NULL, the CPU will bypass the SMP startup procedure and
call that function.
The library is extended to provide a boot-up barrier and a single-CPU
wakeup service. It also adds a simple IPI service.
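A sketch of the per-CPU decision at the entry check, written with C11 atomics for illustration; the actual entry code may differ. All names except ap_entry and inmate_main are hypothetical. A CPU woken later via INIT/SIPI finds ap_entry set and calls it. Otherwise exactly one CPU wins the race and runs inmate_main, while the others park.

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_int entry_claimed;		/* 0 until the first CPU arrives */
static void (*volatile ap_entry)(void);	/* set before waking APs */

enum cpu_role { CPU_RUNS_MAIN, CPU_PARKED, CPU_RUNS_AP_ENTRY };

/*
 * Decision taken at the entry check. In real code, "parked" would be a
 * cli/hlt loop and "runs main" a call to inmate_main.
 */
static enum cpu_role entry_dispatch(void)
{
	int expected = 0;

	if (ap_entry != NULL)
		return CPU_RUNS_AP_ENTRY;
	if (atomic_compare_exchange_strong(&entry_claimed, &expected, 1))
		return CPU_RUNS_MAIN;
	return CPU_PARKED;
}
```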
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 15 May 2015 06:20:35 +0000 (08:20 +0200)]
x86: Report number of CPUs via communication region
Append a field to the x86-specific part of the communication region to
inform non-root cells about the number of CPUs they can expect to show
up during boot.
We can generalize this when ARM has a need as well, but it's more likely
that it will use device trees instead (which are underdeveloped on x86).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 9 May 2015 14:57:08 +0000 (16:57 +0200)]
inmates: Build library archive and link it implicitly
Kbuild already comes with support for building lib.a archives from a set
of objects. Use this to build inmate libraries for x86, in both 64-bit
and 32-bit form, and for ARM. Link against the correct libraries
implicitly so that the demos no longer have to state their dependencies
explicitly. This also allows using the inmate libraries from folders
other than demos because the library objects are now built only once.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 9 May 2015 14:51:34 +0000 (16:51 +0200)]
inmates: Define entry point in linker script
Export the reset addresses as symbols and define them as the entry
points of our inmates in the linker scripts. We will bundle the headers together
with the other library objects in archives, and defining entry points
will ensure that the related sections will be included in the final
binary. This will simplify the inmate rules significantly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 15 May 2015 06:09:40 +0000 (08:09 +0200)]
x86: Fix and clean up APIC ICR write handling
apic_handle_icr_write so far expects the hi_val in the format that
corresponds to the APIC mode in use. Internally, it then normalizes it
into x2APIC-mode format. That complicates its usage and actually
enabled the bug that x2apic_handle_write did not convert the
cell-provided value into the required format.
Simplify and fix things by changing the API of apic_handle_icr_write to
accept the destination only in x2APIC format. That's much easier because
both callers can hard-code the conversion (none or shift by 24 bits) as
they know the input format.
The only side effect is that apic_send_ipi will now report errors with
ICR.hi always in x2APIC format, independent of the delivery path.
Probably even an advantage.
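A minimal sketch of the conversions the two callers hard-code. In xAPIC mode, ICR high holds the 8-bit destination in bits 31:24. In x2APIC mode, the upper half of the 64-bit ICR already is the 32-bit destination. The helper names are hypothetical.

```c
#include <stdint.h>

/* xAPIC path: extract the destination field (bits 31:24). */
static uint32_t xapic_icr_hi_to_x2apic_dest(uint32_t icr_hi)
{
	return icr_hi >> 24;
}

/* x2APIC path: the high dword is the destination as-is. */
static uint32_t x2apic_icr_hi_to_x2apic_dest(uint32_t icr_hi)
{
	return icr_hi;
}
```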
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Fri, 15 May 2015 05:57:29 +0000 (07:57 +0200)]
x86: Flush pending events when reprogramming the VT-d error interrupt
There seems to be a risk that in-flight error events still use the
address and data registers while we reprogram them. In practice, this
shouldn't happen on a correctly configured system because all valid
interrupt sources are silenced at this point. Nevertheless, play it
safe, just like Linux does.
However, there is no reason to also read back after unmasking (like
Linux does) because the hardware injects pending events when the mask is
cleared.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Wed, 13 May 2015 17:22:24 +0000 (19:22 +0200)]
tools: config-create: Fix IOMMU unit number of IOAPICs
IOAPICs under the control of IOMMUs with unit number >= 1 were not
described correctly in the generated configs due to a stupid naming
mistake that Python cannot report.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 9 May 2015 06:00:41 +0000 (08:00 +0200)]
arm: Clean up hypervisor stage 1 memory attributes
Of the many attributes defined (some probably wrong), only three are
actually used: normal memory, device and non-cacheable. Validate those and drop
the rest. We can re-add more as needed.
See ARM ARM B4.1.104.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Sat, 9 May 2015 05:54:53 +0000 (07:54 +0200)]
arm: Fix stage 2 memory attributes
The definition of memory attributes for stage 2 translations was wrong.
These attributes consist of only 4 bits, but the defines covered 8. Set
the proper values for the two types we use: normal memory and devices.
See ARM ARM B3.6.2 and B3.8.5 for details.
This fixes the enforcement of read-only or write-only cell memory
regions.
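A sketch of why the width matters, assuming the LPAE stage 2 descriptor layout: MemAttr[3:0] in bits 5:2 and the access permissions HAP[2:1] in bits 7:6 (01b read, 10b write). An 8-bit attribute value such as 0xff shifted into place spills into the HAP bits and grants read/write, defeating read-only or write-only regions. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define S2_MEMATTR_SHIFT	2
#define S2_MEMATTR_NORMAL_WB	0xfULL	/* outer and inner write-back */
#define S2_MEMATTR_DEVICE	0x1ULL	/* device memory */
#define S2_MEMATTR(attr)	((uint64_t)(attr) << S2_MEMATTR_SHIFT)

#define S2_HAP_SHIFT		6
#define S2_HAP_READ		(1ULL << S2_HAP_SHIFT)
#define S2_HAP_WRITE		(2ULL << S2_HAP_SHIFT)

/* Compose the lower attribute bits of a stage 2 descriptor (sketch). */
static uint64_t s2_attrs(uint64_t memattr, int readable, int writable)
{
	uint64_t desc = S2_MEMATTR(memattr);

	if (readable)
		desc |= S2_HAP_READ;
	if (writable)
		desc |= S2_HAP_WRITE;
	return desc;
}
```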
Reported-and-tested-by: Philipp Rosenberger <ilu@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Jan Kiszka [Thu, 7 May 2015 17:27:12 +0000 (19:27 +0200)]
core: Disable non-root PCI devices on shutdown
We already disable PCI devices that are removed when a cell is
destroyed, but we should also do this on hypervisor shutdown to prevent
those devices from later annoying Linux with unexpected activity.
The diff is larger than the change itself because it re-indents the
shutdown loop to maintain readability.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>