// Patrick Louis

Unix system calls

System calls

(Transcript of the podcast)

The most common Unix system calls. A quick overview.

Intro

System calls are one subject that scares many people. Actually most of the low level stuffs happening on the operating system scares a lot of people. I admit, I was a bit afraid of dealing with this subject. Not because it’s hard or anything but because it’s something that we’re not used to dealing with everyday, it’s like a hidden magic spell.

I was also afraid of dealing with this subject because I thought I could make mistakes while explaining it and giving other people false assumptions about the mechanism of their Unix operating systems.

But that’s ok… We’ll explain everything slowly.

In this podcast we’re discussing system calls on Unix operating systems, it’s gonna be a quick overview of what’s happening there. If you’re someone that hardly know anything about them then it’s the episode you need to listen to.

Here we go.

I’m venam and you’re listening to the nixers podcast!

What’s a system call

What it is

Let’s go over the definition of what is a system call.

A system call is a way to request a service from the kernel of the OS from the userland. It’s the interface that sits between processes running on the machine and the operating system.

The services offered by the kernel vary from one OS to another, but as we’ll see there’s something that sticks and stays coherent between all the Unix-like operating systems.

Most importantly the system calls are an abstration layer between the hardware and the user-space. It’s the same system calls for different hardware architecture, which means you don’t have to change anything to your program in user-space for it to be portable from cpu brand to brand, it’s ubiquitous.

It also generalizes functions across programming languages, any programming language can access the system calls. For the programmer it’s just another function to call.

In this definition, we’ve mentioned a bunch of reasons why systems calls are useful but what else can we say about why we have them.

Couldn’t this have been done another way?

Why?

So why do we have an interface between the OS, the processes, and the hardware?

Let’s review our definition of what an operating system kernel is:

A kernel is the operating system software running in protected mode and having access to the hardware’s privileged registers. The kernel is not a separate process running on the system. It is the guts of the operating system, which controls the scheduling of processes to achieve multitasking, and provides a set of routines, constantly in memory, to which every user-space process has access.

So the kernel is always “In-memory” and scheduling processes for multi-tasking, and it provides functions. But what’s that “protected mode”?

Most CPUs, most processors, have a security model built-in or also called CPU modes. The common one is the rings model which specifies multiple privileges levels at which a software can be executed. The kernel is executed in unrestricted access mode, it can do anything allowed by the cpu, read any part of memory, etc… All other programs run in a layer above. They are limited to their own address space and can’t mess up the harware devices, they are limited to their level of access to resources. They are prevented from it at a hardware level.

That is what is meant by protected mode and this is a must in any multi-tasking operating system.

The concept of rings protection was introduced in Multics, an OS which highly influenced the development of Unix.

This concept is core to Unix with its preemptive multitasking, where the cpu clock interrupts rapidly and routinely between processes, switching control from one to the other.

So what does this have to do with system calls, you might ask.

Programs need to access devices and components otherwise nothing would work.

That’s where system calls enter the scene.

They provide well defined and safe implementations for those operations. The OS handles the highest level of privileges and allow applications to request access to them via system calls.

The system call initiate an interrupt or also called “trap” which puts the CPU into elevated privilege and then it passes control to the kernel which handles arguments and determines if the call should be done or not. It then does its thing and return to the normal privilege level and pass back the control to the calling program.

This is similar to multitasking where the cpu switches control using interrupts, as we’ve said.

Couldn’t this have been done another way? Instead of having a central unit that controls the access to the important parts of the system and hardwares, a part that is there for us not to mess up our machine.

Well, no current operating systems do. There is however a concept called an exokernel where the operating system doesn’t offer a general abstraction of the resources but forces the application developers to make decisions about those hardware abstractions instead of the kernel.

That is in opposition with a microkernel and a monolithic architecture. The monolithic architecture is the most common among Unix-like operating systems.

So, then system calls are there as a way to protect you from yourself destroying your machine.

That was the why. Now let’s take a look at those system calls, from an outer point of view and inner point of view.

What it looks like to the programmer - Library as middle man

As we said, system calls are like a library or API that sits between normal programs and the kernel.

On Unix-like systems, that API is usually part of an implementation of the C library (libc), such as glibc, that provides wrapper functions for the system calls, often named the same as the system calls they invoke.

Those system calls can be implemented across programming languages (partly because other programming languages have a lower C layer but they can also be done directly in the language if it has assembly facilities) and will look to the programmer just like another function. But in actuality, the code for the function is contained within the kernel.

That C library wrapper, other than exposing an ordinary function to the outside world, is made in a way that is modular and portable.

It’s not actually the C library but the assembly code that is implemented in it.

It works this way: The wrapper places the arguments to be passed to the system call in the appropriate register in the appropriate way and also sets a unique system call number for the kernel to call.

This way the API is portable, those unique system call numbers are stable.

So the call to the library function itself does not cause a switch to kernel mode directly, it’s when this part of the code with the code or number of the system call that is sent to the kernel that it’s executed.

This is highly implementation and platform dependent unlike the number assigned to the system call itself at that level above.

This level is called the application binary interface and it unstable, it changes through time. However, the name of the system calls don’t, like we said: They are an abstraction.

Super little Details

At the low level, there are differences between the Unix-like OS in the way the system calls are managed and received by the kernel.

The Linux and BSDs both need to have them written in assembly.

FreeBSD supports both the BSD style of system calls and the Linux style.

In the BSD world they use the C calling convention, also known as cdecl, which stands for C declaration. A declaration that originates from the C programming language.

That means that any program written in any language can access the kernel, as long as they can understand C functions.

The kernel is access using int 80h, also both on Linux and the BSDs.

interrupt vector 0x80

Specifically the convention for Free|Open|Net|DragonFly]BSD UNIX System Calls is that they are done by passing the parameters by pushing them to the stack and then doing the int $0x80 instruction.

kernel:
	int	80h	; Call kernel
	ret

open:
	push	dword mode
	push	dword flags
	push	dword path
	mov	eax, 5
	call	kernel
	add	esp, byte 12
	ret

On Linux the difference is in the way the parameters are passed to the system call. The parameters, however, are not passed on the stack but in EBX, ECX, EDX, ESI, EDI, EBP: are used for passing 6 parameters to system calls. The return value is in %eax. All other registers (including EFLAGS) are preserved across the int 0x80. So, ABCD, registers, and they’re filled in the order of the CPU endian.

open:
	mov	eax, 5
	mov	ebx, path
	mov	ecx, flags
	mov	edx, mode
	int	80h

For both Linux and BSDs the system call number is passed by filling the %eax register.

And more generally speaking the arguments are filled just before that but in different ways.

Let’s note that FreeBSD gives you the choice to use the Linux way of doing system calls only if the kernel has Linux emulation installed. Moreover you need to specify that a program is branded Linux. You do that using the brandelf tool:

% brandelf -t Linux filename

As a note here, if you want to write any program in assembly it’ll always come down to this: You wanna interact with your system so you’re gonna be doing the usual jumps and loops but other than that it’s all just about filling registers to do the system calls and managing memory.

Let’s now discuss more about the CPUs.

CPU - The underlying principles

As we’ve said, system calls in most Unix-like systems are processed in kernel mode which is done by changing the processor execution mode to a more privileged one. This, however, does not mean that there’s gonna be a process or context switch, it’s not a switch of process, it’s just a temporary delegation to the kernel while the calling process is waiting for the response.

We’ve mentioned that earlier.

But what does happen when it’s running in a multithreaded application. As we know, threads in the Unix world are small processes with their own IDs.

There are many ways to handle this situation. Most Unix-like OS use the one-to-one model, which means that every threads get attached to a distinct kernel-level thread during the system call. This solves the issue of blocking system calls.

Let’s mention that there are other ways to do that such as:

Many-to-one model: All system calls from any user thread in a process are handled by a single kernel-level thread. Which means every thread has to wait for the other to finish

Many-to-many model: In this model a pool of user threads is mapped to a pool of kernel threads. All system calls from a user thread pool are handled by the threads in their corresponding kernel thread pool

Hybrid model: This model implements both many to many and one to one model depending upon choice made by the kernel. This is found in old versions of IRIX, HP-UX and Solaris.

Let’s go back to the CPU.

Different architectures give out different facilities.

One of them for example is found in the x86 instruction set that contains the SYSCALL/SYSRET and SYSENTER/SYSEXIT, both implemented by AMD and Intel vendors. Those are fast control transfer instruction designed to quickly transfer control to the kernel for a system call without the overhead of an interrupt

Another one of those nifty mechanism is the old x86 call gate, which allows programs to directly call a kernel function using a safe control transfer mechanism.

Let’s talk about real examples of system calls and what are the ones available on most Unix.

Examples

Let’s read a little excerpt

On Unix, Unix-like and other POSIX-compliant operating systems, popular system calls are open, read, write, close, wait, exec, fork, exit, and kill. Many modern operating systems have hundreds of system calls. For example, Linux and OpenBSD each have over 300 different calls, NetBSD has close to 500, FreeBSD has over 500, while Plan 9 has 51.

POSIX… Wait, we haven’t mentioned POSIX yet.

What’s that thing? Let’s go!

POSIX?

What’s that POSIX thing? Posix stands for Portable Operating System Interface and it’s a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.

POSIX standards are about the API of an OS and the command line shells and utility interfaces of Unix-like OS.

So, it’s a standard that is there for the compatibility between Unix-like OS but as with all standards the list of points to follow is huge and most just partly fullfil it and that’s not really an issue as long as they take the most important bits and pieces.

All the modern and most popular Unix-like OS are only partly adhering to it. Only OSX amongst the “new team” is fully compliant.

Other than that you have AIX, HP-UX, IRIX, Solaris, Tru64, UnixWare, QNX, Neutrino

Weirdly enough those are all mostly closed source operating systems which is suspiciously annoying.

But not being fully compliant doesn’t mean that the system isn’t Unix-like.

With Unix?

There’s another standard called the Single UNIX specification.

It shows if a system can be compliant to be qualified as a “UNIX” trademark. Again a very commercial way of seeing what Unix really is.

And very few BSD and Linux-based operating systems are submitted for compliance with the Single UNIX Specification. Also, again, only closed source Unix-like OSs adhere to it.

Unix is more about the philosophy and the way of working with this multitasking OS, taking the spirit back from the Bell Labs.

Well, what does that all have to do with system calls? It turns out that those standards have a set of functions that are sometimes implemented as system calls.

Categories of syscalls

POSIX calls can be implemented in the standard libarary or as system call.

It is a specification and does not “know” about syscalls which, in the POSIX view, are an implementation detail.

Nothing mandates the way they are implemented. They can even be implemented in non-Unix like OS.

To know which one are system calls you need to see which one overlaps with them.

Let’s talk about POSIX.

POSIX is divided in two parts: The system interfaces, and the commands and utilities

We’re not interested in the commands and utilities and only interested in the system interface, and only if those are also system calls.

There are 5 main categories of system calls:

  • Process Control
  • File management
  • Device Management
  • Information Maintenance
  • Communication

They overlap with some of the features in POSIX. Such as process creation and control in POSIX overlaps with process control, Clock and timers in POSIX overlaps with information maintenance.

What is to know here is that there are way more POSIX specifications than would be needed for system calls. So there’s more of a chance that a POSIX specification would not be a system call.

The list of POSIX specifications are quite extensive. Ranging from thread creation, managing shared memory, pipes, timers, bus error, signals, etc..

You can read more about those from the links in the show notes.

POSIX.1: Core Services (incorporates Standard ANSI C) (IEEE Std 1003.1-1988)
	Process Creation and Control
	Signals
	Floating Point Exceptions
	Segmentation / Memory Violations
	Illegal Instructions
	Bus Errors
	Timers
	File and Directory Operations
	Pipes
	C Library (Standard C)
	I/O Port Interface and Control
	Process Triggers

POSIX.1b: Real-time extensions (IEEE Std 1003.1b-1993, later appearing as librt—the Realtime Extensions library)[8])
	Priority Scheduling
	Real-Time Signals
	Clocks and Timers
	Semaphores
	Message Passing
	Shared Memory
	Asynchronous and Synchronous I/O
	Memory Locking Interface

POSIX.1c: Threads extensions (IEEE Std 1003.1c-1995)
	Thread Creation, Control, and Cleanup
	Thread Scheduling
	Thread Synchronization
	Signal Handling

The most common ones

To find the most common let’s do something crazy. Let’s check the source of openbsd, netbsd, linux, and freebsd, and list the common ones, we can even know if they are in POSIX.

That will answer if the common system calls are all POSIX or if there are some exceptions.

There are 136 common system calls betwen openbsd, netbsd, linux and freebsd.

They are the following:

accept
access
acct
bind
chdir
chmod
chown
chroot
clock_getres
clock_gettime
clock_settime
close
connect
dup
dup2
execve
exit
faccessat
fchdir
fchmod
fchmodat
fchown
fchownat
fcntl
flock
fork
fstat
fsync
ftruncate
getdents
getegid
geteuid
getgid
getgroups
getitimer
getpeername
getpgid
getpgrp
getpid
getppid
getpriority
getrlimit
getrusage
getsid
getsockname
getsockopt
gettimeofday
getuid
ioctl
kill
lchown
link
linkat
listen
lseek
lstat
madvise
mincore
mkdir
mkdirat
mknod
mknodat
mlock
mlockall
mmap
mount
mprotect
msgctl
msgget
msgrcv
msgsnd
msync
munlock
munlockall
munmap
nanosleep
open
openat
pipe2
poll
preadv
ptrace
pwritev
read
readlink
readlinkat
readv
reboot
recvfrom
recvmsg
rename
renameat
rmdir
sched_yield
select
semget
semop
sendmsg
sendto
setgid
setgroups
setitimer
setpgid
setpriority
setregid
setreuid
setrlimit
setsid
setsockopt
settimeofday
setuid
shmat
shmctl
shmdt
shmget
shutdown
sigaltstack
sigpending
sigprocmask
sigsuspend
socket
socketpair
stat
symlink
symlinkat
sync
truncate
umask
unlink
unlinkat
utimensat
utimes
vfork
wait4
write
writev

Only 5 aren’t POSIX:

flock ioctl mount reboot wait4

But overall they’re mostly POSIX, 97% of the time when they are common with other OSs.


Categories a system call can be part of:

Process Control
	load
	execute
	end, abort
	create process (for example, fork on Unix-like systems, or NtCreateProcess in the Windows NT Native API)
	terminate process
	get/set process attributes
	wait for time, wait event, signal event
	allocate, free memory
File management
	create file, delete file
	open, close
	read, write, reposition
	get/set file attributes
Device Management
	request device, release device
	read, write, reposition
	get/set device attributes
	logically attach or detach devices
Information Maintenance
	get/set time or date
	get/set system data
	get/set process, file, or device attributes
Communication
	create, delete communication connection
	send, receive messages
	transfer status information
	attach or detach remote devices

Tips and tools

man syscalls Check the source

Tools such as ktrace (BSD), strace (Linux), DTrace (Solaris), and truss allow a process to execute from start and report all system calls the process invokes, or can attach to an already running process and intercept any system call made by said process if the operation does not violate the permissions of the user. This special ability of the program is usually also implemented with a system call, e.g. strace is implemented with ptrace or system calls on files in procfs.


References:




If you want to have a more in depth discussion I'm always available by email or irc. We can discuss and argue about what you like and dislike, about new ideas to consider, opinions, etc..
If you don't feel like "having a discussion" or are intimidated by emails then you can simply say something small in the comment sections below and/or share it with your friends.