This report was prepared by Nikita Ronja Gillmann as a part of Google Summer of Code 2020.
This is my second and final report for the Google Summer of Code project I am working on for NetBSD.
My code can be found at github.com/nikicoon/src in the gsoc2020 branch; at the time of writing some of it is still missing. The test facilities and logs can be found at github.com/teknokatze/gsoc2020. A diff is available on GitHub; it will later be split into several patches before it is sent to QA for merging.
The initial, defined goal of this project was to make system(3) and popen(3) use posix_spawn(3) internally, which was completed in June. For the second part I was given the task of replacing fork+exec calls in our standard shell (sh) in one scenario. As with the previous goal, the implementation is meant to show whether the initial motivation, a performance improvement, holds; if it does not, we collect metrics showing why posix_spawn() should be avoided in this case. In practice, this second part meant that I had to add and change code in the kernel, add a new public libc function, and understand shell internals.
Summary of part 1
Prior work: In GSoC 2012 Charles Zhang added the posix_spawn syscall. According to its SF repository, at the time this was an in-kernel implementation of posix_spawn that provided performance benefits compared to FreeBSD and other systems, which had userspace implementations in 2012. This may still hold today; I have not compared other systems, libcs and kernels in detail.
After 1 week of reading POSIX and writing code, 2 weeks of coding and another 1.5 weeks of bugfixes, I successfully made system(3) and popen(3) use posix_spawn internally.
The biggest challenge for me was reading and understanding the POSIX standard. I am used to reading more formal books, but I can't remember working with the POSIX standard directly before.
system(3) was changed to use posix_spawnattr_ (where we used sigaction before) and posix_spawn (which replaced execve + vfork calls).
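The change described above can be illustrated with a minimal sketch. This is not the NetBSD system(3) code: the function name spawn_system is hypothetical, and the sketch leaves out what a real system(3) also has to do in the parent (ignoring SIGINT and SIGQUIT, blocking SIGCHLD, handling a NULL command). It only shows how spawn attributes stand in for the sigaction calls that used to run between vfork and execve.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper (not NetBSD's system(3)): run cmd through
 * "/bin/sh -c" with posix_spawn and return the wait status, or -1.
 */
int
spawn_system(const char *cmd)
{
	posix_spawnattr_t attr;
	sigset_t defaults;
	pid_t pid;
	int status, error;
	char *argv[] = { "sh", "-c", (char *)cmd, NULL };

	assert(cmd != NULL);

	if (posix_spawnattr_init(&attr) != 0)
		return -1;

	/*
	 * Ask for SIGINT and SIGQUIT to be reset to their default
	 * disposition in the child. With vfork + execve this would be
	 * done with sigaction() calls in the child before the exec.
	 */
	sigemptyset(&defaults);
	sigaddset(&defaults, SIGINT);
	sigaddset(&defaults, SIGQUIT);
	posix_spawnattr_setsigdefault(&attr, &defaults);
	posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);

	/* posix_spawn returns an error number instead of setting errno. */
	error = posix_spawn(&pid, "/bin/sh", NULL, &attr, argv, environ);
	posix_spawnattr_destroy(&attr);
	if (error != 0) {
		errno = error;
		return -1;
	}

	/* Wait for the child, retrying if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return status;
}
```

A caller would inspect the result with WIFEXITED() and WEXITSTATUS(), just as with system(3).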
popen(3) and popenve(3)
Since the popen and popenve implementations in NetBSD's libc use a couple of shared helper functions, I was able to change both functions while keeping the majority of the changes focused on one of the helper functions (pdes_child).
pdes_child, an internal function in popen.c, now takes one more argument (const char *cmd) for the command to pass to posix_spawn which is called in pdes_child.
At a high level, what happens in pdes_child() and popen is that we first lock the pidlist_mutex. Then we create a file action list that closes the descriptors of all concurrent popen() / popenve() instances and the unneeded side of the pipe, and moves the remaining side to stdin/stdout. We unlock the pidlist_mutex. Finally we return, destroying the list.
In the new version of this helper function, which now handles the majority of what popen/popenve did, we have to initialize a file_actions object, which by default contains no file actions for posix_spawn() to perform. Since we need error handling, cleanup, and a common return value for the functions calling pdes_child(), we make use of goto in some parts of this function.
The close() and dup2() calls are replaced by the corresponding posix_spawn file_actions functions. These are library calls, not syscalls: they record a series of actions to be performed by a posix_spawn operation.
After this series of actions, we call _readlockenv() and then call posix_spawn with the file_action object and the other arguments to be executed. If it succeeds, we return the pid of the child to popen; otherwise we return -1. In both cases we destroy the file_action object before we proceed.
In popen and popenve our code has been reduced to just the 'pid == -1' branch, everything else happens in pdes_child() now.
After readlockenv we call pdes_child and pass it the command to execute in the posix_spawn'd child process; if pdes_child returns -1 we run the old error handling code. Likewise for popenve.
The outcome of the first part is that, thanks to how posix_spawn is implemented in NetBSD, we reduced the number of syscalls made for popen and system. A full test with proper timing should confirm this; my reading is based on comparing old and new logs with ktrace and kdump.
sh, posix_spawn actions, libc and kernel - Part 2
The main goal of part 2 of this project was to determine which simple cases of (v)fork + exec in sh(1) I could replace, and to replace them with posix_spawn where it makes sense.
fork needs to create a new address space by cloning the existing one, or, in the case of vfork, update at least some reference counts. posix_spawn can avoid most of this as it creates the new address space from scratch.
The current posix_spawn as defined in POSIX has no good way to do tcsetpgrp, and we found that fish just avoids posix_spawn for foreground processes.
Roughly speaking, modern BSDs handle "#!" execution in the kernel (probably since before the 1990s; systems which didn't handle this most likely started to disappear in the mid to late 90s). Our main concern so far was therefore the default case ('NORMALCMD') of the cmd switch in the evalcmd function.
After adjusting the function to use posix_spawn, I hit an issue when running the curses application htop: htop would run, but input would not be accepted properly (the key sequences pressed are visible). In pre-posix_spawn sh, every subprocess that sh (v)forked runs forkchild() to set up the subprocess's environment. With posix_spawn, we need to arrange posix_spawn actions to do the same thing.
The intermediate resolution was to switch FORK_FG processes to fork+exec again. For foreground processes with job control we're in an interactive shell, so the performance benefit is small enough in this case to be negligible. It's really only for shell scripts that it matters.
Next I implemented a posix_spawn file_action, with the prototype
int posix_spawn_file_actions_addtcsetpgrp(posix_spawn_file_actions_t *fa, int fildes)
The kernel part of this was implemented inline in sys/kern/kern_exec.c, in the function handle_posix_spawn_file_actions() for the new case 'FAE_TCSETPGRP'.
The new version of the code is still in testing and debugging phase and at the time of writing not included in my repository (it will be published after Google Summer of Code when I'm done moving).
posix_spawnp kernel implementation
According to a conversation with kre@, the posix_spawnp() implementation we have just iterates over $PATH, calling posix_spawn until it succeeds. For some changes we might want a kernel implementation of posix_spawnp(), as the path search is supposed to happen in the kernel so that the file actions are only ever run once:
some of the file actions may be "execute once only"; they can't be repeated. For example, consider handling "set -C; cat foo >file": the file can only be created once, and that has to happen before the exec (as the fd needs to be made stdout). Then the exec part of posix_spawn is attempted. If that fails because it can't find "cat" in $HOME/bin (or whatever is first in $PATH) and we move along to the next entry (maybe /bin, it doesn't really matter), then the repeated file action fails, as the file now exists and "set -C" (noclobber mode) demands that we cannot open an already existing file. It would be nice for this if there were "clean up on failure" actions, but that is likely to be very difficult to get right, and each would need to be attached to a file action, so that only those which had been performed would result in cleanup attempts.
Replacing all of fork+exec in sh
Ideally we could replace all of (v)fork + exec with posix_spawn. According to my mentors, (v)fork incurs pmap synchronisation, which constructing the VM space from scratch with posix_spawn avoids. Fewer IPIs (inter-processor interrupts) matter for small processes too.
Future directions could involve a posix_spawn action for an arbitrary ioctl.
My thanks go to fellow NetBSD developers for answering questions, most recently kre@ for sharing invaluable sh knowledge, and Riastradh and Jörg as the mentors I've interacted with most of the time, for their often in-depth explanations as well as for allowing me to ask questions I sometimes felt were too obvious. I also thank my friends for putting up with my "weird" working schedule. Lastly, I would like to thank the Google Summer of Code program for continuing through the ongoing pandemic and giving students the chance to work on projects full-time.