Program & Terminal Settings for Text Editing

I’ve been curious about text editors for a while, in particular about how terminal-based text editors work, so I’ve known about the Build Your Own Text Editor tutorial for some time. Its initial steps include things that are worth reiterating or expanding on a little, out of an interest in C and low-level programming.

It starts with a program to read key presses from the user:

#include <unistd.h>

int main() {
  char c;
  while (read(STDIN_FILENO, &c, 1) == 1);
  return 0;
}
  

Reference

unistd.h

Short for Unix standard, this is a header file declaring constants and functions that form part of a standardised operating system API, the Portable Operating System Interface, or POSIX.

STDIN_FILENO

This is a file descriptor, which is an index into a table of input/output resources, such as files and data streams, that are available to the running process. When the process starts, the numbers 0, 1 and 2 will be the indexes of the table entries that point to the data streams for input, output and errors; STDIN_FILENO is the constant for 0. When a program is run from a shell, each of these will by default be the terminal.
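
To make the numbering concrete, here is a minimal sketch that writes a string to whichever descriptor it is given. The helper name write_str is my own invention for illustration; write, STDOUT_FILENO and STDERR_FILENO come from unistd.h.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string to the given file descriptor.
   Returns the number of bytes written, or -1 on error, like write itself. */
ssize_t write_str(int fd, const char *s) {
  return write(fd, s, strlen(s));
}
```

Calling write_str(STDOUT_FILENO, "to output\n") and write_str(STDERR_FILENO, "to errors\n") from a shell should show both lines in the terminal, since descriptors 1 and 2 both point at it by default.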

read

The read function takes three arguments and returns a number.

  • The first is a file descriptor, a number pointing to an input source, in this case the terminal which is being pointed to by the number 0.
  • The second is the memory address at which the bytes that are read will be stored, in this case the memory address of a single character.
  • The third is the maximum number of bytes to read in one call; a char occupies 1 byte of memory, so 1 byte is read at a time.
  • The return value is the number of bytes that were actually read, here 1 or 0, or -1 if an error occurred.

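According to the code, 1 character is read repeatedly until read returns something other than 1, which happens at the end of the input stream (Ctrl-D typed at the start of a line in the terminal) or on an error. By default the terminal is in canonical mode, so the characters only reach the program once Enter is pressed.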
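
The same loop can be wrapped in a function that makes the three possible return values of read visible. This is only a sketch; count_bytes is a name I made up, and it takes the file descriptor as a parameter so that it can be tried on a pipe as well as on the terminal.

```c
#include <unistd.h>

/* Hypothetical helper: read one byte at a time from fd until read()
   stops returning 1. Returns the number of bytes consumed if the loop
   ended at end of input (read returned 0), or -1 if read reported an error. */
long count_bytes(int fd) {
  long total = 0;
  char c;
  ssize_t n;
  while ((n = read(fd, &c, 1)) == 1) {
    total++;
  }
  return n == 0 ? total : -1;
}
```

Calling count_bytes(STDIN_FILENO) and then typing a few lines followed by Ctrl-D should return the number of bytes typed, newlines included.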


#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

struct termios orig_termios;

void disableRawMode() {
  tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios);
}

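void enableRawMode() {
  // Save the current settings first so that disableRawMode can restore them.
  tcgetattr(STDIN_FILENO, &orig_termios);
  atexit(disableRawMode);

  // Modify a copy, leaving orig_termios untouched.
  struct termios raw = orig_termios;
  raw.c_lflag &= ~(ECHO);

  tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
}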

int main() {
  enableRawMode();
  
  char c;
  while (read(STDIN_FILENO, &c, 1) == 1 && c != 'q');
  return 0;
}
  

Reference

termios

Short for terminal I/O settings(?), this is a header file declaring the termios struct and the functions for interfacing with the terminal, used here to change the way the terminal processes input and output.

atexit

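From C’s standard library, it registers a function to be called when the process terminates normally, that is by returning from main or by calling exit. The function is not called if the process is killed by a signal or ends through abort or _exit.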
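
A small sketch of the registration on its own, separate from any terminal settings. The names say_goodbye and register_cleanup are hypothetical; atexit returns 0 when the registration succeeds.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cleanup handler: runs when the process exits normally. */
static void say_goodbye(void) {
  puts("cleanup ran");
}

/* Registers the handler; returns 0 on success, as atexit does. */
int register_cleanup(void) {
  return atexit(say_goodbye);
}
```

If main calls register_cleanup() and then returns, the handler's message should be printed on the way out. If the process is killed with a signal instead, it should not appear.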

tcgetattr

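Short for terminal control get attributes, this takes a file descriptor and places the settings for the associated terminal in the struct provided for the second parameter. The file descriptor has to refer to a terminal; if it refers to a regular file or a pipe, the call fails with the error ENOTTY.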
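
Whether a given descriptor has terminal settings at all can be checked before relying on them. This is a sketch with a made-up function name, has_terminal_settings; isatty and tcgetattr are the real calls.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to a terminal whose settings
   could be read, 0 otherwise (for example when fd is a pipe or a file). */
int has_terminal_settings(int fd) {
  struct termios t;
  return isatty(fd) && tcgetattr(fd, &t) == 0;
}
```

Run from a shell, has_terminal_settings(STDIN_FILENO) should give 1, but 0 when the program's input is redirected from a file or piped from another program.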

c_lflag

This is a set of bit flags that holds the terminal’s local modes (presumably as opposed to remote modes, although there is no such field among the other flags). One local mode is ECHO, which echoes back to you the characters you input into the terminal. &= ~(ECHO) is a bitwise operation that clears the ECHO bit in c_lflag, leaving every other bit unchanged, to turn off ECHO.
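
The bitwise step can be looked at in isolation. In this sketch without_echo is a hypothetical function; tcflag_t is the type that termios.h uses for the flag fields.

```c
#include <termios.h>

/* Hypothetical helper: return flags with the ECHO bit cleared.
   ~ECHO has every bit set except ECHO, so the AND keeps all other bits. */
tcflag_t without_echo(tcflag_t flags) {
  return flags & ~(tcflag_t)ECHO;
}
```

Passing in ECHO | ICANON should give back a value with only ICANON still set, and passing in a value that already lacks ECHO should give it back unchanged.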

TCSAFLUSH

This applies the modified terminal settings after all output written to the terminal has been transmitted and discards all unread input to the terminal.


The remaining settings to enable raw mode are as follows:

raw.c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
raw.c_oflag &= ~(OPOST);
raw.c_cflag |= (CS8);
raw.c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);

Reference

BRKINT

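A “break condition” here has nothing to do with breakpoints in a debugger. It is a state of a serial line in which the line is held at zero for longer than it takes to send one character. With BRKINT on, a break condition causes a SIGINT signal to be sent to the program, much like pressing Ctrl-C. Turning it off prevents that, although it is unlikely to matter with a modern terminal emulator.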

ICRNL

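When on, this changes any carriage return (byte 13) in the input into a newline (byte 10). Turning it off means that Enter and Ctrl-M are both read as 13.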

INPCK

A parity check is where a 0 or a 1 is added to a sequence of bits to mark whether the number of 1s in the uncorrupted sequence is even or odd. If the recipient finds a count of 1s that does not match the parity bit, it will know that the data has been corrupted during transmission. INPCK enables this check on input. It is a form of error detection that I suppose isn’t necessary when using the terminal as a text editor since the program input is simply being displayed back to the user.
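
As a worked example of the idea (not of anything the terminal driver exposes), here is a sketch of even parity over a 7-bit character with the parity bit stored in the 8th bit. Both function names are my own.

```c
/* Hypothetical helper: the parity bit that makes the total number of 1s
   even, computed over the low 7 bits of the input. */
int even_parity_bit(unsigned char data7) {
  int ones = 0;
  for (int i = 0; i < 7; i++) {
    ones += (data7 >> i) & 1;
  }
  return ones % 2;
}

/* Hypothetical helper: returns 1 if all 8 bits of the byte together
   contain an even number of 1s, which is what the receiver checks. */
int has_even_parity(unsigned char byte) {
  int ones = 0;
  for (int i = 0; i < 8; i++) {
    ones += (byte >> i) & 1;
  }
  return ones % 2 == 0;
}
```

The letter C is 1000011 in binary, which has three 1s, so its parity bit is 1 and the transmitted byte is 11000011. If any single bit of that byte is flipped on the way, has_even_parity returns 0.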

ISTRIP

When on, ISTRIP strips the 8th bit off each input byte, setting it to 0. The 8th bit was historically used for parity, but here the flag is turned off so that all 8 bits of every byte are kept. That matters for any character outside 7-bit ASCII, for example the bytes of a UTF-8 encoded character.
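
What stripping the 8th bit does to a byte can be sketched with a single mask. The function name strip_eighth_bit is hypothetical; it imitates the effect on one byte rather than calling anything in termios.

```c
/* Hypothetical helper: clear the 8th bit (bit 7) of a byte and keep
   the low seven bits, which is the effect of ISTRIP on each input byte. */
unsigned char strip_eighth_bit(unsigned char c) {
  return c & 0x7F;
}
```

Plain ASCII passes through unchanged, but the two UTF-8 bytes of é, 0xC3 and 0xA9, come out as 0x43 and 0x29, the characters C and ), so the original character is lost.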

IXON

When on, this setting enables software flow control on output: Ctrl-S pauses the transmission of output data and Ctrl-Q resumes it. Turning it off makes Ctrl-S and Ctrl-Q available to be read as ordinary bytes.
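
Once these keys reach the program, they arrive as small numbers. A Ctrl combination sends the letter with its top three bits cleared, which this sketch reproduces; ctrl_key is a name I chose for illustration.

```c
/* Hypothetical helper: the byte a terminal sends for Ctrl plus a letter,
   which is the letter's code with the top three bits cleared. */
int ctrl_key(char letter) {
  return letter & 0x1f;
}
```

So Ctrl-S is read as 19 and Ctrl-Q as 17, and in the same way Ctrl-M is 13, the carriage return, and Ctrl-C is 3.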

OPOST

Turning this off disables implementation-defined output processing. The most visible example is the translation of each newline written to the terminal into a carriage return followed by a newline. With OPOST off, the program has to write both characters itself to start a new line at the left edge of the screen.
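
One piece of output processing that I'm sure of is the terminal turning a written newline into a carriage return plus a newline. With OPOST off the program has to send both itself, which could be sketched as follows. The helper write_line_raw is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write text followed by an explicit "\r\n",
   as needed when the terminal no longer adds the carriage return.
   Returns the total number of bytes written, or -1 on error. */
ssize_t write_line_raw(int fd, const char *text) {
  ssize_t a = write(fd, text, strlen(text));
  if (a < 0) return -1;
  ssize_t b = write(fd, "\r\n", 2);
  if (b < 0) return -1;
  return a + b;
}
```

Without the carriage return, each new line of output would start in the column where the previous one ended, giving a staircase effect.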

CS8

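Sets the number of bits in a character to 8. CS8 is a bit mask rather than a single flag, which is why it is set with |= instead of being cleared with &= ~. This is usually the case already on modern systems, but was not so historically.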

ICANON

Turning this off disables canonical mode. Most critically, this means that input is made available byte by byte as it is typed and not line by line. Line editing keys such as ERASE are also no longer handled by the terminal.

IEXTEN

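This looks like the partner of OPOST, and turning it off disables implementation-defined input processing. This refers to extra special characters such as Ctrl-V, which normally makes the terminal pass the next character typed through literally. Characters such as ERASE, which is interpreted as an action to erase the preceding input character, belong to canonical mode instead.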

ISIG

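Turning this off disables the signals normally generated from the keyboard: Ctrl-C (SIGINT, which interrupts the program), Ctrl-Z (SIGTSTP, which suspends it) and Ctrl-\ (SIGQUIT). The keys are read as ordinary bytes instead.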
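
Putting the four lines together with the save-and-restore step gives something like the following. It is a sketch under my own assumptions rather than the tutorial's exact code: the names enable_raw_mode, restore_terminal, saved_termios and saved_fd are invented here, the file descriptor is passed in as a parameter, and the return values are checked so that the function reports failure when the descriptor is not a terminal.

```c
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical globals: the settings in effect before raw mode was enabled. */
static struct termios saved_termios;
static int saved_fd = -1;

/* Put the terminal back the way it was found. */
static void restore_terminal(void) {
  if (saved_fd != -1) {
    tcsetattr(saved_fd, TCSAFLUSH, &saved_termios);
  }
}

/* Hypothetical helper: switch fd to raw mode and arrange for the original
   settings to be restored on normal exit. Returns 0 on success and -1 on
   failure, for example when fd is not a terminal. */
int enable_raw_mode(int fd) {
  struct termios current;
  if (tcgetattr(fd, &current) == -1) return -1;
  if (atexit(restore_terminal) != 0) return -1;
  saved_termios = current;
  saved_fd = fd;

  struct termios raw = current;  /* modify a copy, keep the original */
  raw.c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
  raw.c_oflag &= ~(OPOST);
  raw.c_cflag |= (CS8);
  raw.c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);
  return tcsetattr(fd, TCSAFLUSH, &raw);
}
```

Calling enable_raw_mode(STDIN_FILENO) at the start of main and then running the read loop from earlier should deliver each key press immediately, without echo, until q is typed.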