Asynchronous file I/O on Linux: the epoll API

If you have written even the simplest C/C++ program, you have probably come across file reading/writing, either through classes like ifstream (C++) or by using operating system calls like open and read.

In this blog post, I will present a more performant and more CPU-efficient way of performing file operations: asynchronous I/O.

What is asynchronous I/O?

Two ways of doing input and output (I/O) operations exist: synchronous and asynchronous.

Synchronous (and blocking) I/O: the executing program waits until the I/O operation is finished.

When using (blocking) synchronous input and output, a program will start an I/O operation (reading or writing) and will wait until this operation is finished. After the operation is finished, the program is able to use the resulting data (in the case of reading) and is able to perform calculations or other I/O operations based on the result of the previous operation.

Asynchronous I/O: starting the operation still takes a bit of time, but the program continues with its other instructions only to be notified later when the I/O operation finishes.

With asynchronous I/O, on the other hand, a program starts an I/O operation and immediately continues executing further instructions without waiting for the result of the operation. It is the task of the programmer (you!) to handle the resulting data and/or status information once the operation is finished.

The main advantage of asynchronous I/O is that your program no longer has to wait for a read or write operation to finish. This independence lets you handle user input, perform calculations or start more asynchronous I/O operations in the meantime. With blocking, synchronous I/O, the waiting thread cannot do any of this until the I/O operation finishes.
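To make the difference concrete, here is a minimal sketch of a read that returns immediately instead of waiting. It uses a descriptor in non-blocking mode, which is a simpler technique than epoll but illustrates the same idea of not waiting for data. The helper names make_nonblocking and try_read are hypothetical and chosen for this illustration only; the descriptor is assumed to be a pipe or socket, since O_NONBLOCK has no effect on regular files.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: switch a descriptor to non-blocking mode.
bool make_nonblocking(int fd)
{
	const int flags = fcntl(fd, F_GETFL, 0);
	return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: returns true and fills `out` if data was available.
// Returns false if no data is ready yet, at end of input, or on error.
bool try_read(int fd, std::string& out)
{
	char buffer[64];
	const ssize_t bytes_read = read(fd, buffer, sizeof(buffer));
	if(bytes_read > 0)
	{
		out.assign(buffer, static_cast<size_t>(bytes_read));
		return true;
	}
	// With O_NONBLOCK set, "no data yet" shows up as -1 with errno set to
	// EAGAIN (or EWOULDBLOCK) instead of the call sleeping until data arrives.
	return false;
}
```

A caller could create a pipe, call make_nonblocking on its read end and then call try_read in between other work: the call returns false right away while the pipe is empty, and true once the other end has written something.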

Implementing asynchronous file I/O on Linux using epoll

A simple way of showing how asynchronous I/O works on Linux is by looking at the example of asynchronously reading from a file.

There are many different ways in which you could implement asynchronous I/O on Linux. In this post, we choose to go with epoll. The epoll API provides a set of system calls that enable you to track I/O events on file descriptors, network connections, etc.

To start this epoll event listener, call the following function to create a list of file descriptors to track:

int epoll_list_fd = epoll_create1(0);

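This call creates an epoll list and returns a file descriptor referring to it (or -1 on error). The list is initially empty, and we can add file descriptors (handles representing opened files) to it to track events on them, such as data being ready to read. Note that epoll only accepts descriptors that support polling, such as pipes, FIFOs, sockets and terminals. For a regular file on disk, epoll_ctl fails with EPERM, so the path opened below should refer to a pollable file such as a FIFO: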

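// Requires <fcntl.h> (open), <sys/epoll.h> (epoll_ctl) and <cstdio> (perror).
// First open a file (for reading). open returns -1 on failure.
int file_descriptor = open("myfile.txt", O_RDONLY);

// Add this file descriptor to the list of descriptors to handle events on.
// First, create a zero-initialized struct describing the information of the open file (file descriptor, whether to track input/output, etc.)
struct epoll_event epoll_event = {};
epoll_event.events = EPOLLIN; // Only poll for input events (reading).
epoll_event.data.fd = file_descriptor;

// epoll_ctl returns -1 on failure, for example with errno set to EPERM
// when the descriptor does not support epoll.
if(-1 == epoll_ctl(epoll_list_fd, EPOLL_CTL_ADD, file_descriptor, &epoll_event))
{
	perror("epoll_ctl");
}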
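Whether a given file can be tracked depends on its type, so it can be useful to check the result of the registration. The following is a minimal sketch under that assumption; register_for_input and supports_epoll are hypothetical helper names, not part of the epoll API.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: registers fd for input events on the given epoll list.
// Returns 0 on success, or the errno value reported by epoll_ctl.
int register_for_input(int epoll_fd, int fd)
{
	struct epoll_event event = {};
	event.events = EPOLLIN;
	event.data.fd = fd;
	if(-1 == epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event))
	{
		return errno;
	}
	return 0;
}

// Hypothetical helper: returns true if the file at `path` can be tracked by epoll.
bool supports_epoll(const char* path)
{
	const int epoll_fd = epoll_create1(0);
	// O_NONBLOCK keeps open() from waiting for a writer when path is a FIFO.
	const int fd = open(path, O_RDONLY | O_NONBLOCK);
	const bool ok = epoll_fd != -1 && fd != -1 && 0 == register_for_input(epoll_fd, fd);
	if(fd != -1)
	{
		close(fd);
	}
	if(epoll_fd != -1)
	{
		close(epoll_fd);
	}
	return ok;
}
```

For a regular file on disk, register_for_input is expected to return EPERM and supports_epoll to return false, while the read end of a pipe can be registered successfully.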

Now, call the epoll_wait function, which fills a buffer with the file descriptors that have events on them. Remember that for asynchronous I/O, we must not block the thread that opens the file. To do this, we run epoll_wait on a separate thread, which sleeps (and thus uses no processor time) until an event arrives or the given timeout of the epoll_wait call expires:

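// Requires <sys/epoll.h>, <unistd.h> (read, close), <cerrno> and <cstdio> (printf).
// Create an event buffer, which will be filled with events on opened files.
const int EVENT_BUFFER_SIZE = 64;
struct epoll_event epoll_event_buffer[EVENT_BUFFER_SIZE];

while(true)
{
	// Call epoll_wait with a timeout of 1000 milliseconds.
	const int fds_ready = epoll_wait(epoll_list_fd, epoll_event_buffer, EVENT_BUFFER_SIZE, 1000);

	// Function epoll_wait returned.
	if(-1 == fds_ready)
	{
		if(EINTR == errno)
		{
			// Interrupted by a signal, try again.
			continue;
		}
		// An error occurred.
		break;
	}

	// Handle any file descriptors with events (fds_ready is 0 after a timeout).
	for(int i = 0; i < fds_ready; i++)
	{
		const struct epoll_event& epoll_event = epoll_event_buffer[i];
		// Handle the event now, by reading in data and printing it.
		const int fd = epoll_event.data.fd;
		const size_t READ_BUFFER_SIZE = 64;
		char read_buffer[READ_BUFFER_SIZE];
		const ssize_t bytes_read = read(fd, read_buffer, READ_BUFFER_SIZE - 1);
		if(bytes_read <= 0)
		{
			// End of input (0) or read error (-1): stop tracking this descriptor.
			epoll_ctl(epoll_list_fd, EPOLL_CTL_DEL, fd, nullptr);
			close(fd);
			continue;
		}
		// Terminate the data so it can be printed as a string.
		read_buffer[bytes_read] = '\0';
		printf("Bytes read: %zd\n", bytes_read);
		printf("%s\n", read_buffer);
	}
}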
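The pieces above can be combined into one small program. The following is a minimal sketch, not the code from the repository linked at the end of this post: it assumes a pipe as the data source (since epoll needs a pollable descriptor) and uses std::thread for the separate thread. The function names read_until_eof and run_reader_demo are hypothetical.

```cpp
#include <cerrno>
#include <string>
#include <sys/epoll.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: collects everything readable from fd until end of input,
// sleeping inside epoll_wait whenever no data is available.
std::string read_until_eof(int fd)
{
	std::string result;
	const int epoll_fd = epoll_create1(0);
	if(-1 == epoll_fd)
	{
		return result;
	}

	struct epoll_event event = {};
	event.events = EPOLLIN;
	event.data.fd = fd;
	bool done = (-1 == epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event));

	while(!done)
	{
		struct epoll_event ready;
		const int count = epoll_wait(epoll_fd, &ready, 1, 1000);
		if(-1 == count && EINTR == errno)
		{
			continue; // Interrupted by a signal, try again.
		}
		if(-1 == count)
		{
			break; // A real error occurred.
		}
		if(0 == count)
		{
			continue; // Timeout expired without any event.
		}

		char buffer[64];
		const ssize_t bytes_read = read(ready.data.fd, buffer, sizeof(buffer));
		if(bytes_read > 0)
		{
			result.append(buffer, static_cast<size_t>(bytes_read));
		}
		else
		{
			done = true; // 0 means end of input, -1 means a read error.
		}
	}

	close(epoll_fd);
	return result;
}

// Hypothetical demo: the reader runs on its own thread while the calling
// thread stays free to do other work (here: producing the data).
std::string run_reader_demo()
{
	int pipe_fds[2];
	if(-1 == pipe(pipe_fds))
	{
		return "";
	}

	std::string received;
	std::thread reader([&received, &pipe_fds] { received = read_until_eof(pipe_fds[0]); });

	const char message[] = "hello from the main thread";
	const ssize_t written = write(pipe_fds[1], message, sizeof(message) - 1);
	(void)written; // A real program would check for short or failed writes.
	close(pipe_fds[1]); // Closing the write end signals end of input to the reader.

	reader.join(); // After join, `received` is safe to read on this thread.
	close(pipe_fds[0]);
	return received;
}
```

Calling run_reader_demo() from main returns the text that travelled through the pipe. Depending on the toolchain, the program may need to be compiled with the -pthread flag.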

Conclusion

Using epoll on Linux enables us to easily use high-performance asynchronous I/O in our programs. The next time you are writing a networking application or reading data from files, consider using asynchronous I/O to increase your program's performance.

The full source code for an asynchronous file reader, including easy-to-use classes and functions, can be found at: https://github.com/daankolthof/async-io-linux-filesystem.