Site Moved

I’ve moved this blog to a self-hosted site. It’s quite a bit slower, unfortunately, but it gives me more control over the site, so I can do things like embedding code snippets and latex.

So if you want to follow along, change your shortcuts and head on over to:

Site Moved


image courtesy of Noah Pendleton

We set aside Friday afternoons for these epic innovation sessions. We call it Insanely Creative Friday, or ICF. The torch represents the flame of creativity.

So we go to a local sweat lodge, drink psychotropic tea, and try to balance right on the edge of lucidity during a totally mad cap, off the wall brain storming session where nothing is out of bounds!



Anonymous Extensible Identifiers (AXID)

It seems like every organization has their own arbitrary way of assigning numerical identifiers to people. You have a driver’s license number, a social security number, a tax ID, an employee ID, a voter ID, and on and on. Wouldn’t it be nice if everyone just had one single ID? And wouldn’t it be nice if you didn’t have to look this ID up in some kind of database? What if you could figure it out just from your knowledge of a person, even incomplete knowledge? But at the same time, a person’s identifier doesn’t actually reveal any information about them?

Ok, so privacy experts will probably warn about using a common ID for everything, but let’s just ignore them and think about how this could work.

Continue reading “Anonymous Extensible Identifiers (AXID)”

Anonymous Extensible Identifiers (AXID)

Battery Testing Brainteaser

Our team has recently started working on some brain teasers to sharpen our minds, expose us to new ideas, and keep up our fun factor.

This is our first one:

You are camping with your friends. You have a flashlight for any emergency and have brought 8 batteries along with you. Your brother calls to tell you that four of those batteries are already dead.
Your flashlight requires two working batteries to run. What is the least number of pairs you will need to test to guarantee that you can get the flashlight on?

To clarify, we’re counting the time when you actually load the working pair into the flashlight as a test. So even if you’ve done N tests and now you know for a fact that a certain pair is working, the answer is still N+1 if that working pair isn’t already guaranteed to be in the flashlight.

And “guaranteed” is key. Obviously, it may be the case that the first pair of batteries you test works. That doesn’t mean the answer is one, because in other cases you won’t be so lucky. You need to prove that there is a strategy you can use so that you always find a working pair of batteries within N tries, and furthermore that there is no strategy that will work with fewer than N tries.

Spoiler: Solution and Proof

So below is a solution to this brain teaser. It’s a really interesting problem, so [blah blah blah] ruin the funĀ [blah blah blah].

Continue reading “Battery Testing Brainteaser”

Battery Testing Brainteaser

Gap Buffers

I recently mentioned a personal project I’m working on to create a text editor, which I’m calling nabu. Tonight, I thought I’d take some time off from writing code to talk a bit about the data structure I’m using for the nabu character buffer: the gap buffer.

At its core, a character buffer is just a mutable list of characters. A list is an abstract data structure that holds an indexed set of elements. A mutable list can be modified, so you can replace any element with a new element, as well as add and remove elements at any index. A key feature of a list is random access, meaning you can access any element from the set by using its index.

There are lots of ways to implement a list data structure. The most straight forward is an array list, where a sequence of consecutive memory locations are used to store the elements, one element per location. An array list is really simple, and really memory efficient. It’s really fast to iterate over, great for random access, and modifying existing elements is extremely quick. However, inserting and deleting elements is generally quite slow. In the worst case for an insert, you need to reallocate the entire buffer (in order to get enough space for the new element), which involves copying every existing element to the new location, and then copy every element again to move them out of the way for the new element. Deletion is similar, though doesn’t usually require reallocation.

Another implementation option is a linked list, where the elements are stored in a set of nodes. Each node stores a single element as well as a pointer (reference) to the next node in the list. Linked lists are pretty efficient at element modification, insertion, and deletion. They’re relatively quick at iterating, but terrible at random access (in its simplest implementation, a linked list requires iterating all the way from the beginning to the element in question). They can also require a significant amount of memory overhead; in the case of a character buffer, a linked list can easily double the memory requirements, or worse (e.g., adding a 32-bit pointer for each 32-bit Unicode character).

Insertion and deletion are critically important for a text editor, so an array list would be a costly choice of implementation. A linked list would be better in this regard, but the memory overhead would be significant, as would the slowness of random access (imagine moving the cursor from the beginning of a document to the end by iterating one character at a time).

A third way to implement a list is called a gap buffer. Structurally, it’s very much like an array list but with a simple modification that makes insertion and deletion much cheaper. For this reason, gap buffers are commonly found in text editor implementations, including my own implementation in nabu.

So what is a gap buffer? Simply put, it’s an array list with a gap in it. Imagine you have an array list but you know you’re going to be appending some elements to it. You can pre-allocate additional space at the end of the array for the future elements. Conceptually, the buffer now has a gap at the end that can be filled in with added elements.

Continue reading “Gap Buffers”

Gap Buffers

Allocation in Gap Buffers

I (very recently) posted about the gap buffer data structure I’m using for nabu, my home brewed text editor project. In this post, I’m going to talk a little about how to manage allocating the gap in the gap buffer.

On first blush, you might think, Memory is cheap, just allocate something you aren’t likely to exhaust and be done with it. A large gap definitely makes insert operations better. The bigger it is, the less often you will fill it, and the less often you need to reallocate and copy all the trailing text out of the way of the newly enlarged gap.

However, there is a conflicting operation that prefers smaller gaps: moving the caret large distances If you’re moving the caret by one character, forward for backward, it’s independent of the size of the buffer. If you’re moving the caret by a handful of characters forward or backward, you can just loop and do the single-character move operation multiple times. However, if you’re trying to move the caret a significant distance, this looping can be costly, and you’ll need to perform the jump operation.

The jump operation places the caret at any location in the buffer by moving the gap. This is done by copying all the characters between the current location and the new location forward or backward in order to fill in the old gap and expose the new gap. The important point here is that you also need to copy all the characters that are about to be overlayed by the new gap, so they aren’t lost. That means the bigger the gap, the more costly this operation is.

So how do you decide between allocating a large gap to make insertions better, and allocating a small gap to make jumps better?

One idea I have is to use a simple adaptive heuristic to determine how much space to add to the buffer each time you need to. The heuristic has two pieces: one that advocates on behalf of inserts to keep the gap large, and one that advocates for jumps to keep the gap small.

The first piece works as follows: anytime you need to allocate more space to expand the buffer, it means you didn’t allocate enough the last time, so this time, allocate more.

The second works for the jump: anytime you have a non-empty gap when you perform a jump, it means you allocated too much space last time, so next time allocate less.

This is managed with a single integer field, called growSize. It starts out at some reasonable value, say 20. Any time you have to grow the gap, increase growSize by some amount, say fifty percent, and then allocate that much more additional space for the gap. Any time you do a jump, reduce the growSize by some fraction of the current size of the gap.

Over time, this will hopefully dial in to a pretty good value, based on the behavior of the particular writer. If they jump around a lot, then growSize will tend to be pretty small. If they tend to insert long stretches of text without moving around much, it will tend to be large. I haven’t actually tried this out, and it will probably need to be tuned to get good behavior, but the idea sounds interesting enough to pursue a bit.

Allocation in Gap Buffers

The Saccharine world of modern Programming Languages

ES6, the latest version of ECMAScript (and by extension, JavaScript), was finally approved last month and ever since, the web has been blowing up with articles diving in to the newest features. This is not going to be one of those articles.

But I will link to one: Getting Started with JavaScript ES6 Destructuring

Destructuring in ES6 is a somewhat more developed form of the unpacking assignment syntax that python and some other languages have had for a while. In a nutshell, it lets you easily assign to multiple variables in a single statement, by automatically pulling values out of an array or object (or both).

In the world of programming language design, this type of feature is referred to as syntactic sugar. Sugar is something that saves you keystrokes or generally makes it easier to write the code, without actually providing any functionality that wasn’t previously there.

I have mixed feelings about syntactic sugar. The stodgy old dog programmer in me thinks sugar is wasteful. It complicates the language, obscures concepts and commonalities, and is only of value to people who are lazy and/or can’t type fast.

However, the side of me who has spent the last six or seven years writing extensively in python (arguably one of the more cloying languages out there) doesn’t want to have to write a separate statement to assign each element of a sequence to a different variable, and doesn’t want to have to write for loops to populate a sequence when I could just use a comprehension.

Continue reading “The Saccharine world of modern Programming Languages”

The Saccharine world of modern Programming Languages