clock_gettime() is a function that, as its name suggests, gives the time. clock_gettime() has a VDSO implementation on x86 architectures.
The VDSO (virtual dynamic shared object) is a small shared library that the kernel maps into the address space of every user application. It allows the kernel to export functions to userland so that userspace processes can use them without the overhead of a system call.
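To make the overhead argument concrete, here is a minimal sketch that reads the same clock through the C library (which normally goes through the VDSO) and through an explicit system call. The helper names read_clock_libc, read_clock_syscall and average_call_cost_ns are made up for this illustration, and the sketch assumes a 64-bit x86 Linux system with glibc.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*clock_reader)(clockid_t, struct timespec *);

/* Read a clock through the C library, which normally uses the VDSO. */
int read_clock_libc(clockid_t id, struct timespec *ts)
{
    return clock_gettime(id, ts);
}

/* Read the same clock by forcing a real system call, bypassing the VDSO. */
int read_clock_syscall(clockid_t id, struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, id, ts);
}

/* Average cost in nanoseconds of one call to `reader`, over `iterations` calls. */
double average_call_cost_ns(clock_reader reader, long iterations)
{
    struct timespec start, end, scratch;
    int64_t elapsed;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        reader(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
            + (end.tv_nsec - start.tv_nsec);
    return (double)elapsed / (double)iterations;
}
```

Comparing average_call_cost_ns(read_clock_libc, n) with average_call_cost_ns(read_clock_syscall, n) for a large n should show the difference on a system where the VDSO path is in use; the actual numbers depend on the hardware and the kernel.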
clock_gettime() requires two arguments: the first is the id of the wanted clock, and the second is a pointer to a struct timespec variable in which the values will be stored. struct timespec is simply a structure that contains two fields, tv_sec for seconds and tv_nsec for nanoseconds:
struct timespec {
    __kernel_time_t tv_sec;  /* seconds */
    long tv_nsec;            /* nanoseconds */
};
Note: This blog post focuses on the clock ids CLOCK_MONOTONIC and CLOCK_REALTIME, as these are the clocks that the LTTng tracer uses for userspace tracing to put a timestamp on recorded events.
The value returned by clock_gettime() is relative to a certain time reference, i.e. some specific event in the past. The main difference on Linux between CLOCK_MONOTONIC and CLOCK_REALTIME is this reference.
CLOCK_REALTIME gives the "real time", as in the wall clock time, or the time on your watch. Its time reference is the Unix epoch, which is defined as 1 January 1970, 00:00:00 UTC. If I call:
clock_gettime(CLOCK_REALTIME, &ts);
at the time I am writing this post, the returned values are the following:
ts.tv_sec = 1383065479, ts.tv_nsec = 750367192.
If we take the number of seconds and convert it to years (dividing it by 3600, then 24, then 365.25), we get about 43.83. This means that 43.83 years had elapsed between the epoch and the moment I called clock_gettime(CLOCK_REALTIME, &ts). This also means that if I manually change the clock (or the date) of my system, this change will affect the value returned by clock_gettime(CLOCK_REALTIME, &ts). Note that this is also true for time changes made by NTP. Thus, the time given by the CLOCK_REALTIME clock is not monotonic: it is not guaranteed to keep increasing, and can jump backwards or forwards.
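The conversion above can be written as a small helper. This is only a sketch: seconds_to_years and seconds_to_utc are hypothetical names, and the second helper uses the standard gmtime_r() to turn the same count of seconds into a calendar date, which for the sample value should land in late October 2013.

```c
#include <time.h>

/* Convert a count of seconds into years of 365.25 days. */
double seconds_to_years(time_t seconds)
{
    return (double)seconds / 3600.0 / 24.0 / 365.25;
}

/* Break a count of seconds since the epoch into a UTC calendar date. */
int seconds_to_utc(time_t seconds, struct tm *out)
{
    return gmtime_r(&seconds, out) != NULL ? 0 : -1;
}
```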
This helps us introduce the other clock id, CLOCK_MONOTONIC. This clock is, as you could have guessed, updated in a monotonic fashion. In other words, consecutive reads of this clock never go backwards, even if the clock of my system is changed (NTP can still gradually adjust its rate, but it cannot make it jump). Its time reference is the boot time of the system. Note that this is specific to Linux, and not to all POSIX systems: POSIX only says that the clock counts from an unspecified point in the past. The time returned by clock_gettime(CLOCK_MONOTONIC, &ts) is the elapsed time since the system boot, not counting time spent suspended. If I call:
clock_gettime(CLOCK_MONOTONIC, &ts);
I get the following values:
ts.tv_sec = 103941, ts.tv_nsec = 959414826
This means that my (Linux) system booted 103941/3600 = 28.87 hours ago. We can clearly see why this time reference guarantees monotonicity. The elapsed time since boot is independent from the wall clock time. If I change the clock of my system, the value given by the CLOCK_MONOTONIC clock is still relative to the boot time, which hasn't changed.
As you can see, CLOCK_MONOTONIC is better for ordering events during the lifetime of a session, whereas CLOCK_REALTIME is better when an absolute time is needed. LTTng uses the monotonic clock to assign a timestamp to the recorded events in a trace. However, since it is more useful to have an actual wall clock time, LTTng stores the difference between CLOCK_REALTIME and CLOCK_MONOTONIC at the beginning of tracing in a metadata file. When LTTng is done tracing, a conversion from time since boot to absolute time can be made by adding that value to all recorded timestamps.
Now let's take a look at the source code of the VDSO implementation of clock_gettime(), in file arch/x86/vdso/vclock_gettime.c from the kernel source tree:
notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
    int ret = VCLOCK_NONE;

    switch (clock) {
    case CLOCK_REALTIME:
        ret = do_realtime(ts);
        break;
    case CLOCK_MONOTONIC:
        ret = do_monotonic(ts);
        break;
    case CLOCK_REALTIME_COARSE:
        return do_realtime_coarse(ts);
    case CLOCK_MONOTONIC_COARSE:
        return do_monotonic_coarse(ts);
    }

    if (ret == VCLOCK_NONE)
        return vdso_fallback_gettime(clock, ts);
    return 0;
}
This code snippet simply calls the time function corresponding to the requested clock id. Assuming we asked for CLOCK_MONOTONIC, let's take a look at the do_monotonic() function, from the same file:
notrace static int do_monotonic(struct timespec *ts)
{
    unsigned long seq;
    u64 ns;
    int mode;

    ts->tv_nsec = 0;
    do {
        seq = read_seqcount_begin(&gtod->seq);
        mode = gtod->clock.vclock_mode;
        ts->tv_sec = gtod->monotonic_time_sec;
        ns = gtod->monotonic_time_snsec;
        ns += vgetsns(&mode);
        ns >>= gtod->clock.shift;
    } while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
    timespec_add_ns(ts, ns);

    return mode;
}
As you can see, all this function does is "fill" the ts structure that was given as a parameter with the current values of tv_sec and tv_nsec. The do-while loop is simply a synchronization scheme and can be ignored for now. ts->tv_sec is set to gtod->monotonic_time_sec, while ts->tv_nsec is derived from gtod->monotonic_time_snsec plus the returned value of vgetsns(), for finer granularity. That sum is shifted right by gtod->clock.shift to obtain plain nanoseconds before timespec_add_ns() adds it to ts.
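The arithmetic in do_monotonic() can be reproduced in userspace to see what the shift does. This is a sketch with made-up names (compose_time and its parameters); it assumes that the base value and the delta are both expressed in nanoseconds shifted left by `shift` bits, as the kernel code above suggests.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Combine a snapshot (whole seconds plus shifted nanoseconds) with a shifted
 * delta, following the same steps as do_monotonic().
 */
struct timespec compose_time(time_t base_sec, uint64_t base_snsec,
                             uint64_t delta_snsec, unsigned int shift)
{
    struct timespec ts = { .tv_sec = base_sec, .tv_nsec = 0 };
    uint64_t ns = base_snsec + delta_snsec;   /* still in shifted units */

    ns >>= shift;                             /* back to plain nanoseconds */

    /* Carry whole seconds out of ns, which is the job of timespec_add_ns(). */
    ts.tv_sec += (time_t)(ns / NSEC_PER_SEC);
    ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
    return ts;
}
```

Adding the two values before shifting keeps the fractions of a nanosecond that each of them carries, so two half nanoseconds add up to a full one instead of both being truncated to zero.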
gtod is simply a structure that mirrors the actual values kept in the kernel, which userspace processes can't access. Therefore, the values in gtod have to be updated regularly. This update happens in update_vsyscall(struct timekeeper *tk), from file arch/x86/kernel/vsyscall_64.c:
void update_vsyscall(struct timekeeper *tk)
{
    struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;

    write_seqcount_begin(&vdata->seq);

    /* copy vsyscall data */
    [...]
    vdata->monotonic_time_sec = tk->xtime_sec              // (1)
                    + tk->wall_to_monotonic.tv_sec;
    vdata->monotonic_time_snsec = tk->xtime_nsec           // (2)
                    + (tk->wall_to_monotonic.tv_nsec
                        << tk->shift);
    while (vdata->monotonic_time_snsec >=
                    (((u64)NSEC_PER_SEC) << tk->shift)) {
        vdata->monotonic_time_snsec -=
                    ((u64)NSEC_PER_SEC) << tk->shift;
        vdata->monotonic_time_sec++;
    }
    [...]

    write_seqcount_end(&vdata->seq);
}
In (1), monotonic_time_sec is set, and in (2), monotonic_time_snsec is set. These are the values that are "exported" to userland via the vsyscall_gtod_data structure. By digging a little more into the kernel source, we can get an idea of how and when this structure is updated.
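Before looking at when the update happens, here is a userspace sketch of the synchronization scheme that pairs write_seqcount_begin()/write_seqcount_end() on the writer side with the do-while loop on the reader side. It is not the kernel's implementation: struct time_snapshot and both functions are hypothetical, it assumes a single writer, and it uses C11 atomics with their default (sequentially consistent) ordering instead of the kernel's lighter barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for vsyscall_gtod_data: a sequence counter plus payload. */
struct time_snapshot {
    atomic_uint seq;           /* odd while an update is in progress */
    _Atomic uint64_t sec;
    _Atomic uint64_t snsec;
};

/* Writer side (single writer assumed): bump seq, update, bump seq again. */
void snapshot_write(struct time_snapshot *s, uint64_t sec, uint64_t snsec)
{
    atomic_fetch_add(&s->seq, 1);      /* seq becomes odd: update started */
    atomic_store(&s->sec, sec);
    atomic_store(&s->snsec, snsec);
    atomic_fetch_add(&s->seq, 1);      /* seq becomes even: update finished */
}

/* Reader side: retry until both fields were read with no update in between. */
void snapshot_read(struct time_snapshot *s, uint64_t *sec, uint64_t *snsec)
{
    unsigned int before, after;

    do {
        before = atomic_load(&s->seq);
        *sec = atomic_load(&s->sec);
        *snsec = atomic_load(&s->snsec);
        after = atomic_load(&s->seq);
    } while ((before & 1u) || before != after);
}
```

The reader never blocks the writer: it only detects that an update overlapped with its reads and tries again, which matters when the writer runs in interrupt context.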
Depending on the frequency of "ticks" - see CONFIG_HZ:
Hardware timer interrupt (generated by the Programmable Interval Timer - PIT)
  -> tick_periodic();
    -> do_timer(1);
      -> update_wall_time();
        -> timekeeping_update(tk, false);
          -> update_vsyscall(tk);
Or (on tickless kernels - see CONFIG_NO_HZ):
smp_apic_timer_interrupt()
  -> irq_enter()
    -> tick_check_idle()
      -> tick_check_nohz()
        -> tick_nohz_update_jiffies()
          -> tick_do_update_jiffies64()
            -> do_timer(ticks) // e.g. ticks = 1344
              -> update_wall_time();
                -> timekeeping_update(tk, false);
                  -> update_vsyscall(tk);
So, to sum things up: clock_gettime() gives some values that are updated regularly, plus an interpolation to give better precision for the nanoseconds value. How regularly are these values updated? Simply upon timer interrupts.
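One way to observe this from userspace is to compare CLOCK_MONOTONIC with CLOCK_MONOTONIC_COARSE, the clock id that the VDSO code above dispatches to do_monotonic_coarse(). The sketch below assumes that the coarse clock returns the last updated value without interpolation, so that the gap between the two readings approximates the interpolated part; the helper names are invented, and the reported resolution depends on the kernel configuration.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Resolution of a clock in nanoseconds, or -1 on error. */
int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec res;

    if (clock_getres(id, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * NSEC_PER_SEC + res.tv_nsec;
}

/* Gap in nanoseconds between the fine clock and the coarse one, read back to back. */
int64_t interpolation_ns(void)
{
    struct timespec coarse, fine;

    clock_gettime(CLOCK_MONOTONIC_COARSE, &coarse);
    clock_gettime(CLOCK_MONOTONIC, &fine);
    return (int64_t)(fine.tv_sec - coarse.tv_sec) * NSEC_PER_SEC
         + (fine.tv_nsec - coarse.tv_nsec);
}
```

On a typical system, clock_resolution_ns(CLOCK_MONOTONIC_COARSE) is expected to be in the range of the tick period (a few milliseconds), while clock_resolution_ns(CLOCK_MONOTONIC) is expected to be much smaller.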