In 1845, Edgar Allan Poe wrote “The Purloined Letter”, a tale of a blackmailer who confounded police by hiding a letter in plain sight. Nearly two centuries later, a real-life generation of predators may use a similar technique to exploit JavaScript vulnerabilities. Understanding the method begins with a look at how text read by billions of people makes it to the screen.
ASCII and Unicode: The Roots of the Exploits
While most programmers know several ASCII codes by heart, it’s a good bet most coders don’t know what the acronym stands for. The American Standard Code for Information Interchange emerged in the early 1960s to standardize the output of teleprinters. The specification supported 128 characters, including several non-printing control characters.
Fast forward to the late 1980s. A pair of Xerox engineers and a counterpart from Apple began discussions for a universal code to resolve ASCII’s shortcomings. Xerox had a long involvement with printing, and Apple’s Macintosh computers were riding the desktop publishing boom. The result of the partnership was Unicode. This new standard incorporated ASCII’s code to preserve backward compatibility but included substantially more characters and non-printing codes to fine-tune text display.
Unicode version 13 defines 143,859 characters. This character set comprises practically every written language on the planet, including many obscure tongues. With this many characters, it’s understandable that several characters with different codes would closely resemble other characters. Characters with this close resemblance are called homoglyphs, and they form the seed of a nasty exploit.
The Homoglyph Exploit: Masters of Disguise
Using homoglyphs for nefarious purposes is nothing new. Homoglyphs have been used for years to spoof well-known URLs and lead unsuspecting clickers to spam websites. Later on, predators used homoglyphs to mimic the URLs of familiar banks to lure users into revealing their account passwords.
The year 2021 brought a more sophisticated exploit: homoglyphs used to insert malicious code. Just like Poe’s fictional blackmailer, the predators hide the code in plain sight. In their 2021 paper “Trojan Source: Invisible Vulnerabilities”, University of Cambridge researchers Nicholas Boucher and Ross Anderson outlined Unicode exploits that slipped past the compilers and interpreters of nearly all of the world’s popular programming languages.
Using the classic “Hello World” program as a starting point, the British pair constructed two JavaScript functions with these identifiers:
sayHello
sayНello
What appears to be an English language “H” in the second function is a Cyrillic character. According to the Cambridge team, the latter function compiles in Node.js v16.4.1 without throwing either an error or a warning. Boucher and Anderson argue that predators could thus easily write malicious code for well-known functions and merely exchange one character for a homoglyph. If the functions make it past code reviewers, backdoors open to countless servers.
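A minimal sketch of why the two names never collide. The second name is written with the escape \u041D so the Cyrillic letter cannot be mistaken for a Latin one; the object name api and both return strings are placeholders invented for this illustration, not code from the paper.

```javascript
// Both names are built from explicit escapes so the difference is visible.
const LATIN_NAME = "sayHello";          // fourth character is U+0048 LATIN CAPITAL LETTER H
const CYRILLIC_NAME = "say\u041Dello";  // fourth character is U+041D CYRILLIC CAPITAL LETTER EN

// Hypothetical module: the two entries look the same on screen
// but are stored under two different keys.
const api = {
  [LATIN_NAME]: () => "Hello, World!",
  [CYRILLIC_NAME]: () => "placeholder for attacker-controlled behavior",
};

console.log(Object.keys(api).length);                    // 2
console.log(LATIN_NAME === CYRILLIC_NAME);               // false
console.log(LATIN_NAME.codePointAt(3).toString(16));     // 48
console.log(CYRILLIC_NAME.codePointAt(3).toString(16));  // 41d
```

Comparing code points rather than rendered glyphs is the only reliable way to tell the two apart.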
While Boucher and Anderson toiled on their paper, German cybersecurity consultant Wolfgang Ettlinger outlined another possible homoglyph exploit. Examine this code fragment:
if(environmentǃ=ENV_PROD){
    return true;
}
This code appears to use the “does not equal” operator to compare the variable environment with the constant ENV_PROD. Guess again. A genuine “does not equal” operator combines an exclamation point with an equal sign. In this code, what appears to be an exclamation point is an alveolar click, a pronunciation indicator for languages in Africa and Australia. Instead of evaluating a comparison, the JavaScript engine assigns the value of the constant ENV_PROD to a variable named environmentǃ. The alveolar click is an allowable character for a JavaScript identifier, and Ettlinger successfully tested this code with Node.js v14.
Because the result of an assignment is the assigned value, the condition is truthy whenever ENV_PROD holds a non-empty string, so this code block returns true in every environment, production included. Using this technique, a malicious coder could write a server access verification block that performs no actual authorization check: a wide-open back door. What could be worse than hard-to-spot bogus operators? Try some invisible malicious characters on for size.
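The difference between the genuine operator and the lookalike can be sketched side by side. Several details here are assumptions made for illustration: the value "PROD", the function names, and the explicit let declaration, which is added so the sketch also runs in strict mode. The click letter is written as the escape \u01C3 so that it cannot be confused with an exclamation point.

```javascript
const ENV_PROD = "PROD"; // assumed value; the original fragment does not show it

// Genuine check: "!=" compares the argument with the constant.
function isNotProdGenuine(environment) {
  if (environment != ENV_PROD) {
    return true;
  }
  return false;
}

// Lookalike check: "environment\u01C3" is a separate identifier, so the
// condition is an assignment whose value is the truthy string ENV_PROD.
function isNotProdSpoofed(environment) {
  let environment\u01C3; // declared here only so the sketch runs in strict mode
  if (environment\u01C3 = ENV_PROD) {
    return true;
  }
  return false;
}

console.log(isNotProdGenuine("PROD")); // false
console.log(isNotProdSpoofed("PROD")); // true
console.log(isNotProdSpoofed("DEV"));  // true
```

The spoofed function never reads its argument at all, which is why its result does not change with the environment.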
The Invisible Character Exploit: Attack of the Bidis
While English-based ASCII always renders a consecutive string of character codes left-to-right, Unicode accommodates bidirectional languages, such as Hebrew and Arabic. With a bidirectional language, words read right-to-left, while numbers read left-to-right. When phrases from conventional and bidirectional languages appear on the same page, accurately rendering the resulting text is a typographic challenge. Unicode solves this problem with nine non-printing bidirectional control characters. Nicknamed Bidis, these characters provide the toolbox for a clever exploit.
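For reference, the nine control characters can be written down as a small lookup table. The abbreviations and code points below follow the Unicode Bidirectional Algorithm; the names BIDI_CONTROL_NAMES and listBidiControls are invented for this sketch.

```javascript
// The nine bidirectional formatting characters, keyed by code point.
const BIDI_CONTROL_NAMES = new Map([
  ["\u202A", "LRE"], // Left-to-Right Embedding
  ["\u202B", "RLE"], // Right-to-Left Embedding
  ["\u202D", "LRO"], // Left-to-Right Override
  ["\u202E", "RLO"], // Right-to-Left Override
  ["\u2066", "LRI"], // Left-to-Right Isolate
  ["\u2067", "RLI"], // Right-to-Left Isolate
  ["\u2068", "FSI"], // First Strong Isolate
  ["\u202C", "PDF"], // Pop Directional Formatting
  ["\u2069", "PDI"], // Pop Directional Isolate
]);

// Report the position and abbreviation of every Bidi control in a string.
function listBidiControls(text) {
  const found = [];
  Array.from(text).forEach((ch, index) => {
    const name = BIDI_CONTROL_NAMES.get(ch);
    if (name) found.push({ index, name });
  });
  return found;
}

console.log(listBidiControls("abc\u2067def\u2069"));
// [ { index: 3, name: 'RLI' }, { index: 7, name: 'PDI' } ]
```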
Things Are Not What They Seem
In their paper, Boucher and Anderson provide a proof-of-concept exploit using the Bidi invisible characters to sneak malicious code past human eyeballs. Here is what a code reviewer would see in a typical viewer:
var isAdmin = false;
/* begin admins only */ if (isAdmin) {
    console.log("You are an admin.");
/* end admins only */ }
This code appears to be an innocent Node.js verification block, only logging users with administrator status. The compiler sees something very different. Here’s Boucher and Anderson’s exploit code, with the Bidis displayed as three-letter codes:
var isAdmin = false;
/* RLO } LRIif(isAdmin)PDI LRI begin admins only */
    console.log("You are an admin.");
/* end admins only RLO { */
In the code the compiler sees, there is no conditional test of the variable isAdmin in line two; the entire line is a comment block. This code block logs the administrator message whether isAdmin is true or false. Step by step, here’s how the Bidi codes work their deceptive magic.
- The RLO Bidi in line two flips the right curly bracket into a left bracket and relocates it to the end of the line.
- Sticking with line two, the LRI and PDI codes bracketing if(isAdmin) relocate this phrase in between the closing comment block indicator and the opening curly bracket.
- Tidying things up in line two, the second LRI Bidi snugs begin admins only up to the opening comment block indicator.
- On line four, the RLO Bidi strikes again, this time flipping a left curly bracket to a right bracket and relocating the bracket outside the closing comment indicator.
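The effect of the steps above can be checked by running both versions. This is a sketch under stated assumptions: the exploit text is rebuilt from the three-letter listing using escapes, log is a hypothetical stand-in for console.log so the output can be captured, and new Function is used only to evaluate the sample source inside the demonstration.

```javascript
const RLO = "\u202E"; // Right-to-Left Override
const LRI = "\u2066"; // Left-to-Right Isolate
const PDI = "\u2069"; // Pop Directional Isolate

// What the reviewer believes the code says.
const honestSource = [
  "var isAdmin = false;",
  "/* begin admins only */ if (isAdmin) {",
  "    log('You are an admin.');",
  "/* end admins only */ }",
].join("\n");

// What the engine actually receives: lines two and four are pure comments.
const exploitSource = [
  "var isAdmin = false;",
  "/*" + RLO + " } " + LRI + "if(isAdmin)" + PDI + " " + LRI + " begin admins only */",
  "    log('You are an admin.');",
  "/* end admins only " + RLO + " { */",
].join("\n");

// Evaluate a source string and collect everything it passes to log().
function run(source) {
  const messages = [];
  new Function("log", source)((message) => messages.push(message));
  return messages;
}

console.log(run(honestSource));  // []
console.log(run(exploitSource)); // [ 'You are an admin.' ]
```

The honest version stays silent because isAdmin is false, while the exploit version logs unconditionally because its if statement exists only inside a comment.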
Prevalence of the Exploits
While the invisible character exploits work, what is the probability of encountering these JavaScript security vulnerabilities? Boucher and Anderson scanned GitHub from January 2021 to October 2021 but could not turn up code patterns that matched their exploits. The pair found a handful of patterns using Bidi characters but were unwilling to definitively peg the applications as malicious. Nonetheless, the exploits defined by Boucher, Anderson, and Ettlinger demand concrete action.
Remedies: The Need for Community Cooperation
How should the developer community defend against these JavaScript vulnerabilities? Ettlinger argues for an ASCII-only policy. The German security pro notes that many developers work in English regardless of nationality. Ettlinger also points out that there are well-established ASCII substitute character pairs for non-ASCII characters, and coders working outside the English language ecosystem quickly commit the substitute pairs to memory.
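An ASCII-only policy is simple to enforce mechanically. The following is a rough sketch rather than Ettlinger’s own tooling: findNonAscii is a hypothetical helper that reports the line, column, and code point of every character above U+007F, and a real policy would probably need exceptions for string literals and comments.

```javascript
// Report every non-ASCII character in a source string.
function findNonAscii(source) {
  const findings = [];
  source.split("\n").forEach((line, lineIndex) => {
    let column = 0;
    for (const ch of line) {
      column += 1;
      const codePoint = ch.codePointAt(0);
      if (codePoint > 0x7f) {
        const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
        findings.push({ line: lineIndex + 1, column, codePoint: "U+" + hex });
      }
    }
  });
  return findings;
}

// Sample input containing the two lookalikes discussed earlier.
const sample = [
  "function say\u041Dello() {}",
  "if (environment\u01C3=ENV_PROD) {}",
].join("\n");

console.log(findNonAscii(sample));
```

Both lookalikes are flagged with their position, which gives a reviewer something concrete to inspect.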
While Ettlinger’s remedy does indeed wall off avenues of attack, it’s an open question how readily the JavaScript coding community would accept it. Part of JavaScript’s mission was democratizing programming, and an ASCII-only policy would fly in the face of that goal. The Cambridge duo instead advocates a nuanced and multilayered approach.
Compilers, Pipelines, and Code Editors
Beginning with the JavaScript compiler, the British team recommends banning unterminated Bidi characters. In other words, each character that opens an override, embedding, or isolate should have a matching character that closes it. If an unmatched Bidi character appears in the code, the Cambridge team argues the compiler should throw an error, not merely display a warning.
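One possible shape for such a matched-pair check is sketched below. It is a simplification, not the paper’s algorithm or any engine’s behavior: it counts openers and closers per line, pairs embeddings and overrides with PDF and isolates with PDI, and ignores the finer nesting rules of the Unicode Bidirectional Algorithm. The function name findUnterminatedBidi is invented here.

```javascript
const EMBEDDING_OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]); // LRE, RLE, LRO, RLO
const ISOLATE_OPENERS = new Set(["\u2066", "\u2067", "\u2068"]);             // LRI, RLI, FSI
const PDF = "\u202C"; // closes an embedding or override
const PDI = "\u2069"; // closes an isolate

// Return one entry per line that still has open Bidi controls at its end.
function findUnterminatedBidi(source) {
  const problems = [];
  source.split("\n").forEach((line, index) => {
    let openEmbeddings = 0;
    let openIsolates = 0;
    for (const ch of line) {
      if (EMBEDDING_OPENERS.has(ch)) openEmbeddings += 1;
      else if (ISOLATE_OPENERS.has(ch)) openIsolates += 1;
      else if (ch === PDF && openEmbeddings > 0) openEmbeddings -= 1;
      else if (ch === PDI && openIsolates > 0) openIsolates -= 1;
    }
    const unterminated = openEmbeddings + openIsolates;
    if (unterminated > 0) problems.push({ line: index + 1, unterminated });
  });
  return problems;
}

const balanced = "label: \u2066text\u2069";
const suspicious = "/*\u202E } \u2066if(isAdmin)\u2069 \u2066 begin admins only */";

console.log(findUnterminatedBidi(balanced));   // []
console.log(findUnterminatedBidi(suspicious)); // [ { line: 1, unterminated: 2 } ]
```

A compiler following the recommendation would turn a non-empty result into a hard error instead of printing it.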
While Boucher and Anderson insist that the compiler is the ultimate solution to the exploit, the pair readily acknowledges the hazard that already may exist in the worldwide codebase. Application-building pipelines could serve as a layer of defense with a Bidi matched-pair scan. For code repositories, the team argues that web-based user interfaces should display Bidi override characters as visible tokens.
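A pipeline or repository viewer could surface the hidden characters along these lines. This is an illustrative sketch: showBidiControls and scanForBidi are hypothetical names, and the token format, a code point in angle brackets, is an arbitrary choice.

```javascript
// Matches LRE, RLE, PDF, LRO, RLO (U+202A to U+202E) and LRI, RLI, FSI, PDI (U+2066 to U+2069).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// Replace each invisible control with a visible token.
function showBidiControls(text) {
  return text.replace(
    BIDI_CONTROLS,
    (ch) => "<U+" + ch.codePointAt(0).toString(16).toUpperCase() + ">"
  );
}

// Return the lines of a source string that contain Bidi controls.
function scanForBidi(source) {
  const findings = [];
  source.split("\n").forEach((line, index) => {
    const shown = showBidiControls(line);
    if (shown !== line) findings.push({ line: index + 1, shown });
  });
  return findings;
}

const source = "var isAdmin = false;\n/*\u202E } \u2066if(isAdmin)\u2069 \u2066 begin admins only */";
console.log(scanForBidi(source));
```

A build step could fail whenever scanForBidi returns any findings, while a web interface could display the shown text in place of the raw line.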
Boucher and Anderson next zero in on code editors. The team argues that code editors should implement the same matched-pair Bidi algorithm they advocate for compilers and pipelines. The pair praised the Vim editor, a spartan terminal application bundled with most Unix-like operating systems, for displaying the codepoint numbers of the otherwise invisible Bidi characters. Conversely, heavy hitters like Microsoft’s Visual Studio Code and Apple’s Xcode received a chilly reception for not yet providing coders any weaponry to root out Unicode exploits. Ettlinger praises JetBrains’ WebStorm integrated development environment for providing a “Non-ASCII characters in an identifier” warning to head off homoglyph exploits.
SOOS: Knowing Where To Look
In Poe’s “The Purloined Letter”, the blackmailer’s scheme is ultimately foiled by a detective who knew where to look. The real-life trio of researchers on the hunt for JavaScript vulnerabilities embodies a similar focus. Thankfully, the answer lies within open-source software composition analysis tools like SOOS. A scan with SOOS compares your code — dependencies included — against more than 175,000 known vulnerabilities. SOOS then prioritizes trouble spots to keep your team’s focus where it belongs. Capable, priced right, and ready to integrate with your existing pipeline, SOOS is the SCA solution for developers who know where to look.