Deobfuscating JavaScript via AST: Replacing References to Constant Variables with Their Actual Value
Preface
This article assumes a preliminary understanding of Abstract Syntax Tree structure and BabelJS. If you need a refresher, you can read my introductory article on the usage of Babel.
Definition of a Constant Variable
For our purposes, a constant variable is any variable that meets all three of the following conditions:
The variable is declared AND initialized at the same time.
The variable is initialized to a literal value, e.g. StringLiteral, NumericLiteral, BooleanLiteral, etc.
The variable is never reassigned another value anywhere in the script.
Therefore, a variable’s declaration keyword (let, var, const) has no bearing on whether or not it is a constant.
Here is a quick example:
const a = [1, 2, 3];
var d = 12;
let e = "String!";
let f = 13;
let g;

f += 2;

console.log(a, d, e, f);
g = 14;
In this example:
a is not a constant, since it’s initialized to an ArrayExpression, not a Literal.
d is a constant: it is declared and initialized in the same statement, its initial value is a NumericLiteral, and it is never reassigned.
e is a constant: it is declared and initialized in the same statement, its initial value is a StringLiteral, and it is never reassigned.
f is not a constant, since it is reassigned after initialization: f += 2.
g is not a constant, since it is not declared and initialized at the same time.
It’s important to understand why a variable that is declared but not initialized doesn’t count as a constant. Take the following script as an example:
let foo; // Declaration only, no initial value
console.log(foo); // => undefined
foo = 2;
console.log(foo); // => 2
Console Output:
undefined
2
If, in this case, we tried to substitute the value later assigned to foo (2) for each reference of foo:
let foo; // Declaration only, no initial value
console.log(2); // => 2, NOT undefined!
foo = 2;
console.log(2); // => 2
Console Output:
2
2
This clearly breaks the original functionality of the script, because the substitution ignores the state of the variable at each point in the script. Therefore, we must check all three conditions when deciding whether a variable is a constant.
I’ll now discuss an example where substituting in constant variables can be useful for deobfuscation purposes.
Examples
Let’s say we have a very simple, unobfuscated script that looks like this:
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4) {
    console.log(xhr.status);
    console.log(xhr.responseText);
  }
};

xhr.send();
};
Analysis Methodology
Obviously, the obfuscated script is much more difficult to read. If you were to manually deobfuscate it, you’d have to look up each referenced variable and replace each occurrence of it with its actual value. That could get tedious for a large number of variables, so we’re going to do it the Babel way. As always, let’s start by pasting the code into AST Explorer.
Our targets of interest are the extra variable declarations. Let’s take a closer look at one of them:
So, the target node appears to be of type VariableDeclaration. However, each VariableDeclaration contains an array of VariableDeclarators. It is the VariableDeclarator that actually holds the information about a variable, including its id and init values. So, the node type we should actually focus on is VariableDeclarator.
Recall that we want to identify all constant variables, then replace all their references with their actual value. It’s important to note that variables in different scopes (e.g. local vs. global) may share the same name but have different values. So, the solution isn’t as simple as blindly replacing all matching identifiers with their initial value.
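Here is a small, plain JavaScript sketch of why name-based replacement goes wrong. All names in it (label, describeLocal, describeGlobal, describeLocalNaive) are hypothetical and only serve the illustration.

```javascript
const label = "global";

function describeLocal() {
  const label = "local"; // shadows the outer `label` inside this function only
  return label;
}

function describeGlobal() {
  return label; // resolves to the outer binding
}

// What a blind, name-based replacement would turn describeLocal into:
// every `label` becomes "global", regardless of the scope it belongs to.
function describeLocalNaive() {
  return "global";
}

console.log(describeLocal());      // local
console.log(describeGlobal());     // global
console.log(describeLocalNaive()); // global (the behavior changed)
```

Both label variables are constants by our definition, yet they must be replaced independently, each only within its own scope.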
This would be a convoluted process if not for Babel’s Scope API. I won’t dive too deep into the available scope APIs, but you can refer to the Babel Plugin Handbook to learn more about them. In our case, the scope.getBinding(${identifierName}) method will be incredibly useful: it returns a Binding object whose constant property tells us whether the variable is ever reassigned, and whose referencePaths property lists the paths of all its references.
Putting all this knowledge together, the steps for creating the deobfuscator are as follows:
Traverse the AST in search of VariableDeclarators. If one is found:
Check if the variable is initialized. If it is, check that the initial value is a Literal type. If not, skip the node by returning.
Use the path.scope.getBinding(${identifierName}) method with the name of the current variable as the argument.
Store the returned constant and referencePaths properties in their own respective variables.
Check if the constant property is true. If it isn’t, skip the node by returning.
Loop through all NodePaths in the referencePaths array, and replace them with the current VariableDeclarator’s initial value (path.node.init).
After finishing the loop, remove the original VariableDeclarator node since it has no further use.
/**
 * Deobfuscator.js
 * The babel script used to deobfuscate the target file
 *
 */
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const t = require("@babel/types");
const generate = require("@babel/generator").default;
const beautify = require("js-beautify");
const { readFileSync, writeFile } = require("fs");

/**
 * Main function to deobfuscate the code.
 * @param source The source code of the file to be deobfuscated
 *
 */
function deobfuscate(source) {
  // Parse AST of Source Code
  const ast = parser.parse(source);

  // Visitor for replacing constants
  const replaceRefsToConstants = {
    VariableDeclarator(path) {
      const { id, init } = path.node;
      // Ensure the variable is initialized to a Literal type.
      if (!t.isLiteral(init)) return;
      const { constant, referencePaths } = path.scope.getBinding(id.name);
      // Make sure it's constant
      if (!constant) return;
      // Loop through all references and replace them with the actual value.
      for (const referencedPath of referencePaths) {
        referencedPath.replaceWith(init);
      }
      // Delete the now useless VariableDeclarator
      path.remove();
    },
  };

  // Execute the visitor
  traverse(ast, replaceRefsToConstants);

  // Code Beautification
  let deobfCode = generate(ast, { comments: false }).code;
  deobfCode = beautify(deobfCode, {
    indent_size: 2,
    space_in_empty_paren: true,
  });
  // Output the deobfuscated result
  writeCodeToFile(deobfCode);
}

/**
 * Writes the deobfuscated code to output.js
 * @param code The deobfuscated code
 */
function writeCodeToFile(code) {
  let outputPath = "output.js";
  writeFile(outputPath, code, (err) => {
    if (err) {
      console.log("Error writing file", err);
    } else {
      console.log(`Wrote file to ${outputPath}`);
    }
  });
}
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4) {
    console.log(xhr.status);
    console.log(xhr.responseText);
  }
};

xhr.send();
};
And the code is restored. Even better than the original actually, since we substituted in the url variable too!
Conclusion
Substitution of constant variables is a must-know deobfuscation technique. It’ll usually be one of your first steps in the deobfuscation, combined with constant folding. If you would like to learn about constant folding, you can read my article about it here.
This article also gave a nice introduction to one of the useful Babel API methods. Unfortunately, there isn’t much good documentation out there aside from the Babel Plugin Handbook. However, you can discover a lot more useful features Babel has to offer by reading its source code, or using the debugger of an IDE to list and test helper methods (the latter of which I personally prefer 😄).
If you’re interested, you can find the source code for all the examples in this repository.
Okay, that’s all I have for you today. I hope that this article helped you learn something new. Thanks for reading, and happy reversing!