Sunday, June 30, 2013

Detecting virtualization: How to know if you're running inside a virtual machine

As you may know, the Linux kernel has some paravirtualization routines and setups to collaborate with the hypervisor in case it is running inside a virtual machine (VM). The paravirtualization components can be modules to be loaded (for instance, the virtio family) or the the initialization of an interface for paravirtualization routines (more on that in a future blog post). But how does the Linux kernel knows when it is running inside a VM to prepare the ground for paravirtualization?

This blog post is mostly dedicated to virtualization using KVM (Kernel-based virtual machine) as a hypervisor on x86 platforms, though the detection of virtualization is basically the same for any other hypervisor.

Everything starts in the setup_arch() function, in file arch/x86/kernel/setup.c from the kernel source tree. Here is a snippet of this function:
void __init setup_arch(char **cmdline_p)
  init_hypervisor_platform(); // (1)
  kvm_guest_init(); // (2)
1: Try to detect if the kernel is running inside a VM, and which hypervisor is used.
2: Try to initialize the kernel as a kvm guest. This function won't do anything if the system is not running inside a kvm guest.

Now, let's take a closer look at the function init_hypervisor_platform(), in file arch/x86/kernel/cpu/hypervisor.c :
void __init init_hypervisor_platform(void)
  detect_hypervisor_vendor(); // (3)

  if (!x86_hyper)

  if (x86_hyper->init_platform)
3: detect_hypervisor_vendor() is the function where all the magic happens, let's take a closer look at it (from the same file in the kernel source tree):
static inline void __init
  const struct hypervisor_x86 *h, * const *p;

  for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) {
    h = *p;
    if (h->detect()) { // (4)
      x86_hyper = h;
      printk(KERN_INFO "Hypervisor detected: %s\n", h->name);

Here, "hypervisors" is an array of all the hypervisors that the Linux kernel knows about and that can be detected. This array is initialized in the same file and contains hypervisor_x86 structures that have already been created for the following hypervisors: Xen, VMWare, HyperV and KVM. The hypervisor_x86 structure is just a set of function pointers, each of which points to a function within each hypervisor's infrastructure. In (4), the detect() function of each hypervisor is called, until one of them succeeds. For KVM, the detect function pointer points to kvm_detect(). Let's take a look at it (from file arch/x86/kernel/kvm.c) :
static bool __init kvm_detect(void)
  if (!kvm_para_available())
    return false;
  return true;
And into kvm_para_available(). Now this is where the actual detection is done :
static inline bool kvm_para_available(void)
  unsigned int eax, ebx, ecx, edx;
  char signature[13];

  if (boot_cpu_data.cpuid_level < 0)
    return false; /* So we don't blow up on old processors */

  if (cpu_has_hypervisor) {
    cpuid(KVM_CPUID_SIGNATURE, &eax, &ebx, &ecx, &edx); // (5)
    memcpy(signature + 0, &ebx, 4);
    memcpy(signature + 4, &ecx, 4);
    memcpy(signature + 8, &edx, 4);
    signature[12] = 0;

    if (strcmp(signature, "KVMKVMKVM") == 0)
      return true;

  return false;

What this function mainly does is asking for the signature of the CPU, using the cpuid() function, which gives the results in registers eax, ecx and edx. These registers which form the signature are then compared to the string "KVMKVMKVM", which would indicate that KVM is used as a hypervisor in case the comparison matches.
6: From Intel's documentation: Intel 64 and IA-32 Architectures Software Developer Manuals, Volume 3, section 25.1.2 - Instructions That Cause VM Exits Unconditionally: the cpuid instruction is listed as causing a VM exit whenever it is executed in VMX non-root mode. The hypervisor can then trap this instruction and deal with it appropriately.
Note here that it is not the function cpuid() called in (5) that traps, but the assembly instruction that it ends up calling.

Now, let's take a look at an interesting function in the KVM module, do_cpuid_ent(), from file arch/x86/kvm/cpuid.c :
static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
       u32 index, int *nent, int maxnent)
    static const char signature[12] = "KVMKVMKVM\0\0";
    const u32 *sigptr = (const u32 *)signature;
    entry->eax = KVM_CPUID_FEATURES;
    entry->ebx = sigptr[0];
    entry->ecx = sigptr[1];
    entry->edx = sigptr[2];
Note that this code is executed in KVM module, on the host system, unlike to the other code snippets shown earlier, which were executed by the guest kernel directly inside the VM.
The cpuid instruction from the guest is trapped by the hypervisor, giving the execution to the KVM module in the host to handle it. The KVM module then sets "KVMKVMKVM" as a signature for the CPU and gives the execution back to the VM. Finally, the CPU signature that the guest sees matches what the kernel is looking for to conclude that KVM is the hypervisor used, thus running inside a VM.
The same approach is used for other hypervisors, depending on the signature of the CPU. "Microsoft Hv" indicates Microsoft's HyperV, and "VMWareVMWare" indicates VMWare's hypervisor.

1 comment:

  1. Nice blog... This blog share very useful information about linux hypervisor comparison. Thanks for sharing


Registering a probe to a kernel module using Systemtap

I was trying to register a probe on a function of a kernel module of mine using Systemtap. The .stp file was fairly simple: $> cat mymod...