I recently mentioned a personal project I’m working on to create a text editor, which I’m calling nabu. Tonight, I thought I’d take some time off from writing code to talk a bit about the data structure I’m using for the nabu character buffer: the gap buffer.
At its core, a character buffer is just a mutable list of characters. A list is an abstract data structure that holds an indexed set of elements. A mutable list can be modified, so you can replace any element with a new element, as well as add and remove elements at any index. A key feature of a list is random access, meaning you can access any element from the set by using its index.
There are lots of ways to implement a list data structure. The most straight forward is an array list, where a sequence of consecutive memory locations are used to store the elements, one element per location. An array list is really simple, and really memory efficient. It’s really fast to iterate over, great for random access, and modifying existing elements is extremely quick. However, inserting and deleting elements is generally quite slow. In the worst case for an insert, you need to reallocate the entire buffer (in order to get enough space for the new element), which involves copying every existing element to the new location, and then copy every element again to move them out of the way for the new element. Deletion is similar, though doesn’t usually require reallocation.
Another implementation option is a linked list, where the elements are stored in a set of nodes. Each node stores a single element as well as a pointer (reference) to the next node in the list. Linked lists are pretty efficient at element modification, insertion, and deletion. They’re relatively quick at iterating, but terrible at random access (in its simplest implementation, a linked list requires iterating all the way from the beginning to the element in question). They can also require a significant amount of memory overhead; in the case of a character buffer, a linked list can easily double the memory requirements, or worse (e.g., adding a 32-bit pointer for each 32-bit Unicode character).
Insertion and deletion are critically important for a text editor, so an array list would be a costly choice of implementation. A linked list would be better in this regard, but the memory overhead would be significant, as would the slowness of random access (imagine moving the cursor from the beginning of a document to the end by iterating one character at a time).
A third way to implement a list is called a gap buffer. Structurally, it’s very much like an array list but with a simple modification that makes insertion and deletion much cheaper. For this reason, gap buffers are commonly found in text editor implementations, including my own implementation in nabu.
So what is a gap buffer? Simply put, it’s an array list with a gap in it. Imagine you have an array list but you know you’re going to be appending some elements to it. You can pre-allocate additional space at the end of the array for the future elements. Conceptually, the buffer now has a gap at the end that can be filled in with added elements.