Gap Buffers

I recently mentioned a personal project I’m working on to create a text editor, which I’m calling nabu. Tonight, I thought I’d take some time off from writing code to talk a bit about the data structure I’m using for the nabu character buffer: the gap buffer.

At its core, a character buffer is just a mutable list of characters. A list is an abstract data structure that holds an indexed set of elements. A mutable list can be modified, so you can replace any element with a new element, as well as add and remove elements at any index. A key feature of a list is random access, meaning you can access any element from the set by using its index.

There are lots of ways to implement a list data structure. The most straight forward is an array list, where a sequence of consecutive memory locations are used to store the elements, one element per location. An array list is really simple, and really memory efficient. It’s really fast to iterate over, great for random access, and modifying existing elements is extremely quick. However, inserting and deleting elements is generally quite slow. In the worst case for an insert, you need to reallocate the entire buffer (in order to get enough space for the new element), which involves copying every existing element to the new location, and then copy every element again to move them out of the way for the new element. Deletion is similar, though doesn’t usually require reallocation.

Another implementation option is a linked list, where the elements are stored in a set of nodes. Each node stores a single element as well as a pointer (reference) to the next node in the list. Linked lists are pretty efficient at element modification, insertion, and deletion. They’re relatively quick at iterating, but terrible at random access (in its simplest implementation, a linked list requires iterating all the way from the beginning to the element in question). They can also require a significant amount of memory overhead; in the case of a character buffer, a linked list can easily double the memory requirements, or worse (e.g., adding a 32-bit pointer for each 32-bit Unicode character).

Insertion and deletion are critically important for a text editor, so an array list would be a costly choice of implementation. A linked list would be better in this regard, but the memory overhead would be significant, as would the slowness of random access (imagine moving the cursor from the beginning of a document to the end by iterating one character at a time).

A third way to implement a list is called a gap buffer. Structurally, it’s very much like an array list but with a simple modification that makes insertion and deletion much cheaper. For this reason, gap buffers are commonly found in text editor implementations, including my own implementation in nabu.

So what is a gap buffer? Simply put, it’s an array list with a gap in it. Imagine you have an array list but you know you’re going to be appending some elements to it. You can pre-allocate additional space at the end of the array for the future elements. Conceptually, the buffer now has a gap at the end that can be filled in with added elements.

Continue reading “Gap Buffers”

Gap Buffers

Allocation in Gap Buffers

I (very recently) posted about the gap buffer data structure I’m using for nabu, my home brewed text editor project. In this post, I’m going to talk a little about how to manage allocating the gap in the gap buffer.

On first blush, you might think, Memory is cheap, just allocate something you aren’t likely to exhaust and be done with it. A large gap definitely makes insert operations better. The bigger it is, the less often you will fill it, and the less often you need to reallocate and copy all the trailing text out of the way of the newly enlarged gap.

However, there is a conflicting operation that prefers smaller gaps: moving the caret large distances If you’re moving the caret by one character, forward for backward, it’s independent of the size of the buffer. If you’re moving the caret by a handful of characters forward or backward, you can just loop and do the single-character move operation multiple times. However, if you’re trying to move the caret a significant distance, this looping can be costly, and you’ll need to perform the jump operation.

The jump operation places the caret at any location in the buffer by moving the gap. This is done by copying all the characters between the current location and the new location forward or backward in order to fill in the old gap and expose the new gap. The important point here is that you also need to copy all the characters that are about to be overlayed by the new gap, so they aren’t lost. That means the bigger the gap, the more costly this operation is.

So how do you decide between allocating a large gap to make insertions better, and allocating a small gap to make jumps better?

One idea I have is to use a simple adaptive heuristic to determine how much space to add to the buffer each time you need to. The heuristic has two pieces: one that advocates on behalf of inserts to keep the gap large, and one that advocates for jumps to keep the gap small.

The first piece works as follows: anytime you need to allocate more space to expand the buffer, it means you didn’t allocate enough the last time, so this time, allocate more.

The second works for the jump: anytime you have a non-empty gap when you perform a jump, it means you allocated too much space last time, so next time allocate less.

This is managed with a single integer field, called growSize. It starts out at some reasonable value, say 20. Any time you have to grow the gap, increase growSize by some amount, say fifty percent, and then allocate that much more additional space for the gap. Any time you do a jump, reduce the growSize by some fraction of the current size of the gap.

Over time, this will hopefully dial in to a pretty good value, based on the behavior of the particular writer. If they jump around a lot, then growSize will tend to be pretty small. If they tend to insert long stretches of text without moving around much, it will tend to be large. I haven’t actually tried this out, and it will probably need to be tuned to get good behavior, but the idea sounds interesting enough to pursue a bit.

Allocation in Gap Buffers

Dead Simple Observables in Python

This is a really quick python module I banged out one evening while watching “Astronaut Wives”. It implements some basics of observables and reactive programming principles. It’s not meant as a serious library for you to use, more of a way to show how simple observables really are.

For a great introduction to reactive programming, check out The introduction to Reactive Programming you’ve been missing, by Andre Staltz.

If you want an observable library to actually use, you can check out Reactive-Extensions, including RxPy for python.


Why wouldn’t I create my own text editor?

I love Vim, except now that I’ve used Atom, I hate Vim. Not because I like Atom, I hate Atom, too. But it’s so much prettier than Vim!

So I’m no longer satisfied with any text editor that I’ve tried. So obviously, I’ll just create my own.

Continue reading “Why wouldn’t I create my own text editor?”