I (very recently) posted about the gap buffer data structure I’m using for nabu, my home brewed text editor project. In this post, I’m going to talk a little about how to manage allocating the gap in the gap buffer.
On first blush, you might think, Memory is cheap, just allocate something you aren’t likely to exhaust and be done with it. A large gap definitely makes insert operations better. The bigger it is, the less often you will fill it, and the less often you need to reallocate and copy all the trailing text out of the way of the newly enlarged gap.
However, there is a conflicting operation that prefers smaller gaps: moving the caret large distances If you’re moving the caret by one character, forward for backward, it’s independent of the size of the buffer. If you’re moving the caret by a handful of characters forward or backward, you can just loop and do the single-character move operation multiple times. However, if you’re trying to move the caret a significant distance, this looping can be costly, and you’ll need to perform the jump operation.
The jump operation places the caret at any location in the buffer by moving the gap. This is done by copying all the characters between the current location and the new location forward or backward in order to fill in the old gap and expose the new gap. The important point here is that you also need to copy all the characters that are about to be overlayed by the new gap, so they aren’t lost. That means the bigger the gap, the more costly this operation is.
So how do you decide between allocating a large gap to make insertions better, and allocating a small gap to make jumps better?
One idea I have is to use a simple adaptive heuristic to determine how much space to add to the buffer each time you need to. The heuristic has two pieces: one that advocates on behalf of inserts to keep the gap large, and one that advocates for jumps to keep the gap small.
The first piece works as follows: anytime you need to allocate more space to expand the buffer, it means you didn’t allocate enough the last time, so this time, allocate more.
The second works for the jump: anytime you have a non-empty gap when you perform a jump, it means you allocated too much space last time, so next time allocate less.
This is managed with a single integer field, called growSize. It starts out at some reasonable value, say 20. Any time you have to grow the gap, increase growSize by some amount, say fifty percent, and then allocate that much more additional space for the gap. Any time you do a jump, reduce the growSize by some fraction of the current size of the gap.
Over time, this will hopefully dial in to a pretty good value, based on the behavior of the particular writer. If they jump around a lot, then growSize will tend to be pretty small. If they tend to insert long stretches of text without moving around much, it will tend to be large. I haven’t actually tried this out, and it will probably need to be tuned to get good behavior, but the idea sounds interesting enough to pursue a bit.