shardulc

Semantic highlighting

Jul 10, 2021

Most code editors perform syntax highlighting, i.e. visually indicating syntactic properties like ‘does this keyword define a new function?’ or ‘does this ( have a matching )?’. The latter is often useful. But what about the former? Linus Akesson raises many objections to this kind of ‘part-of-speech’ highlighting, and I largely agree with him: I think its benefits for legibility and understandability are questionable.

“Semantics are more important than syntax”, he writes, and “it would be a good idea to configure our tools to help us find [bugs], rather than […] miss them”. But his conclusion doesn’t quite follow: “[e]dit your code without syntax highlighting, or […] with just two colours (one for comments and one for code)”. Why not improve highlighting to be more useful to the programmer?

Evan Brooks proposes highlighting the semantics of code by giving lexicographically adjacent variable names adjacent colors in a continuous color space (with demo implementations). The intention is good, but if I’m understanding correctly (I haven’t tried the tools myself), it doesn’t help with code comprehension beyond perhaps tracing identifiers through code, which a good editor should support in some other way anyway. It would also run into the problem of too many colors that are too subtly different to tell apart.
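To put a number on the color-space problem, here is a minimal Python sketch of the idea as described above. It is my own illustration, not Brooks’s implementation: sort the identifiers, then space their hues evenly around the color wheel so that lexicographic neighbors get neighboring hues. The functions `identifier_colors` and `hue_step_degrees` are hypothetical, and the fixed saturation and value are arbitrary choices.

```python
import colorsys


def identifier_colors(names, saturation=0.6, value=0.9):
    """Map each distinct identifier to a hex color (hypothetical scheme).

    Identifiers are sorted lexicographically and given evenly spaced hues,
    so names that sort next to each other get hues that sit next to each other.
    """
    ordered = sorted(set(names))
    colors = {}
    for i, name in enumerate(ordered):
        hue = i / len(ordered)  # fraction of the way around the hue circle, in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


def hue_step_degrees(n_identifiers):
    """Hue difference between lexicographic neighbors when n identifiers share the wheel."""
    return 360 / n_identifiers


print(identifier_colors(["total", "count", "index"]))
print(hue_step_degrees(40))  # 9.0 degrees between neighbors
```

With three identifiers, neighbors are 120 degrees of hue apart; with forty, they are only 9 degrees apart, which is roughly where ‘adjacent colors’ stops being distinguishable on screen.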

Brian Will takes a different approach, focusing on higher-level semantics. He wants to be able to visually distinguish scopes: “non-local names from imported modules, non-local names of the current module, standard library names (and reserved words), […] definitions of local names, […] uses of local names”. (For inspiration, Will points to a ‘scope highlighting’ idea from Douglas Crockford, of JavaScript and JSON fame.) I like this a lot! And my thoughts immediately jump to Bret Victor’s vision for Learnable Programming, not just for learning but also for working effectively with code. The visual distinction is an invitation to probe the program more deeply. When I write to this variable, which lines of code read that value? Which objects are keeping around references to variables with indefinite extent? (More about scope and extent, if you’re curious.) Which of these external function calls are costly, rare, or worth trying to eliminate?
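Unlike part-of-speech highlighting, scope highlighting needs name resolution, but for some languages that is cheap to get. Here is a minimal Python sketch (my own illustration, not Will’s or Crockford’s tooling) that uses the standard library’s `symtable` module to sort the names used inside one top-level function into rough versions of Will’s categories. The category labels and `classify_function_names` are hypothetical. Python builtins stand in for ‘standard library names’, imports are not split into standard-library and third-party, and the sketch classifies names rather than individual occurrences, so Will’s ‘definition’ versus ‘use’ split (which would need source positions from `ast`) is left out.

```python
import builtins
import symtable

# Hypothetical category labels, loosely modeled on Brian Will's list of scopes.
IMPORTED, MODULE, BUILTIN, LOCAL, UNKNOWN = "imported", "module", "builtin", "local", "unknown"


def classify_function_names(source, func_name):
    """Return {name: category} for every name used in a top-level function."""
    module_table = symtable.symtable(source, "<example>", "exec")
    module_symbols = module_table.get_symbols()
    imported = {s.get_name() for s in module_symbols if s.is_imported()}
    defined = {s.get_name() for s in module_symbols if s.is_assigned()}

    # The function's own symbol table is a child of the module's table.
    func_table = next(
        t for t in module_table.get_children() if t.get_name() == func_name
    )

    categories = {}
    for sym in func_table.get_symbols():
        name = sym.get_name()
        if sym.is_local():  # parameters and names bound inside the function
            categories[name] = LOCAL
        elif name in imported:  # bound by an import at module level
            categories[name] = IMPORTED
        elif name in defined:  # assigned or def'd at module level
            categories[name] = MODULE
        elif hasattr(builtins, name):  # stand-in for "standard library names"
            categories[name] = BUILTIN
        else:
            categories[name] = UNKNOWN
    return categories


SOURCE = """
import math
SCALE = 2

def area(r):
    total = math.pi * r * r
    return round(total * SCALE)
"""

print(classify_function_names(SOURCE, "area"))
```

An editor would map each category to a color or font style. Note that `pi` never shows up: it is an attribute lookup on `math`, not a name, which is exactly the kind of distinction that syntax-only highlighting cannot make and scope analysis gets for free.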

You may argue: that’s what your compiler and/or runtime environment computes anyway! Why not run your program to get the answers? Why duplicate the work in your editor? I’d respond: yes, exactly! If you’re going to run your code anyway, you might as well do it live and incrementally, instead of relying on a lengthy feedback loop. To quote Victor, contrast the typical programmer’s workflow with this:

[M]ost musicians don’t compose entire melodies in their head and then write them down; instead, they noodle around on a instrument for a while, playing with patterns and reacting to what they hear, adjusting and sculpting. […] An essential aspect of a painter’s canvas and a musical instrument is the immediacy with which the artist gets something there to react to. A canvas or sketchbook serves as an “external imagination”, where an artist can grow an idea from birth to maturity by continuously reacting to what’s in front of him.

(Honestly, just read the whole essay; it’s great.)