BlackGuard Analysis - Deobfuscation Using Dnlib
07 June 2022
By Jacob Pimental
.NET binaries are normally easier to reverse engineer than natively compiled binaries. Just like Java .jar files, they can be decompiled to reveal something close to the original source code; however, writing deobfuscators for these binaries is not usually straightforward for newer analysts. This post walks through writing a string deobfuscator for the BlackGuard infostealer.
- What is BlackGuard
- String Obfuscation
- Intro to Dnlib
- Retrieve Initial Byte Array
- Replacing Deobfuscation Functions
- Replacing Base64-Decoding Functions
- Conclusion
What is BlackGuard
The Blackguard infostealer was discovered by Zscaler in March of 2022 and retailed for a monthly price of $200 or a lifetime price of $700. Blackguard focuses on stealing information from web browsers, VPN Clients, messaging clients, FTP clients, and “cold” cryptocurrency wallets. It heavily obfuscates its strings through an encrypted byte array which is described in the rest of this article.
Hacking forum post advertising Blackguard. Retrieved from: https://www.zscaler.com/blogs/security-research/analysis-blackguard-new-info-stealer-malware-being-sold-russian-hacking
String Obfuscation
For string obfuscation, Blackguard uses a large byte array that is decrypted via XOR at the start of execution. Multiple functions each grab a subset of this byte array after it is decrypted and return it decoded as a UTF-8 string. Since each deobfuscation function is randomly placed, there is no pattern to the offsets. DnSpy lacks the custom in-line commenting and scripting commonly found in Cutter or other disassemblers, which makes deobfuscation a daunting task. Without a library like Dnlib, the analyst would have to manually pull out each section of the array and decode it by hand.
Setting up byte array used for deobfuscation
Multiple deobfuscation functions that pull strings from the byte array
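To make the scheme concrete, here is a minimal, Dnlib-free sketch of an index-dependent XOR over a byte array. The `b ^ i ^ 170` form and the constant 170 are taken from the decryption loop that appears later in this post; the class name, method name, and sample string are made up for illustration.

```csharp
using System;
using System.Text;

public static class XorSchemeSketch
{
    // XOR every byte with its index and a constant. XOR is its own inverse,
    // so the same routine both "encrypts" and "decrypts" the array.
    public static byte[] Transform(byte[] data, int key = 170)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ i ^ key);
        return result;
    }

    public static void Main()
    {
        byte[] plain = Encoding.UTF8.GetBytes("passwords.txt"); // hypothetical string
        byte[] encrypted = Transform(plain);     // what would be stored in the binary
        byte[] decrypted = Transform(encrypted); // what the sample rebuilds at startup
        Console.WriteLine(Encoding.UTF8.GetString(decrypted));
    }
}
```

Because the index is part of the key, the same plaintext byte encrypts differently at different positions, which is why the raw array does not look like text in a hex view.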
Intro to Dnlib
We can use Dnlib to read .NET binaries and make changes to their instructions. This library is at the root of tools such as DnSpy and the popular De4Dot deobfuscation tool. We can use it to rewrite the obfuscation functions in our Blackguard sample to their string representations.
The easiest way to do this would be to walk the classes and methods of the binary to retrieve the byte array. Next, we can create a list of all the methods that return deobfuscated strings and replace calls to them with the string itself. First, we need to load the module from our binary in Dnlib. This is done by defining a ModuleContext and passing that to ModuleDefMD.Load. This will return an object that can be used to read metadata about a module including class definitions and .NET instructions.
ModuleContext modCtx = ModuleDef.CreateModuleContext();
ModuleDefMD module = ModuleDefMD.Load(@"path\to\blackguard.exe", modCtx);
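As a rough sketch of what "walking the classes and methods" looks like once the module is loaded, the helper below yields every method that has an IL body. The class and method names are illustrative, and the sample path is read from the command line rather than hard-coded. It uses GetTypes(), which also returns nested types, whereas module.Types only lists top-level ones.

```csharp
using System;
using System.Collections.Generic;
using dnlib.DotNet;

public static class ModuleWalkSketch
{
    // Yields every method that carries a body we can inspect.
    public static IEnumerable<MethodDef> MethodsWithBodies(ModuleDef module)
    {
        foreach (TypeDef type in module.GetTypes())
            foreach (MethodDef method in type.Methods)
                if (method.HasBody)
                    yield return method;
    }

    public static void Main(string[] args)
    {
        // args[0]: path to the sample (an assumption for this sketch)
        ModuleContext modCtx = ModuleDef.CreateModuleContext();
        ModuleDefMD module = ModuleDefMD.Load(args[0], modCtx);
        foreach (MethodDef method in MethodsWithBodies(module))
            Console.WriteLine($"{method.FullName}: {method.Body.Instructions.Count} instructions");
    }
}
```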
Retrieve Initial Byte Array
It is important to know which instructions we are looking for when retrieving the initial byte array. In DnSpy, it is possible to view the Intermediate Language (IL) instructions of a particular function. IL is the bytecode that .NET compilers emit from high-level code; the .NET runtime then compiles it to native code just in time during execution. You can liken this to the Java bytecode that is executed by the Java Virtual Machine (JVM) at runtime.
Looking at the function that sets up the initial byte array, we can see the instruction newarr, which allocates a new array and pushes a reference to it onto the stack. We also see a call to InitializeArray. This function takes an object of type System.Array and a RuntimeFieldHandle. A RuntimeFieldHandle is a handle to a field; here, that field's initial data is stored in the module and is used to populate the array.
Setting up the byte array
Before the call to InitializeArray, the ldtoken instruction on line 6 is used to push a value onto the stack. According to the .NET Documentation, this instruction is used to “convert a metadata token into its runtime representation”. This will most likely be the RuntimeFieldHandle for the field that holds the initial data for our array. We can easily grab this data in Dnlib by reading the operand’s InitialValue property.
Finally, we see a few xor instructions used later in the method that are part of the decryption loop. These are good indications that this function sets up the byte array.
XOR Instructions used in decryption loop
The easiest way to find the function that sets up the byte array is to look for a method that uses both the newarr and the xor instructions. The entire code for grabbing this array is:
// Grabs the key array from the binary and decrypts it
// Params: type (TypeDef): The class to look for the key in
static byte[] getKey(TypeDef type)
{
    // Loop through class methods
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;
        // Check if method declares a new array and has an xor instruction
        Instruction newarr = containsOpCode(method.Body.Instructions, OpCodes.Newarr);
        Instruction xor = containsOpCode(method.Body.Instructions, OpCodes.Xor);
        if (newarr != null && xor != null)
        {
            // Pass instructions to the getArrayData function below
            byte[] bArr = getArrayData(method.Body.Instructions);
            // No field-backed array in this method, so keep looking
            if (bArr == null)
                continue;
            // Work on a copy so the field data held by the module is not modified
            bArr = (byte[])bArr.Clone();
            // Decrypt the byte array once we have its initial value
            for (int i = 0; i < bArr.Length; i++)
            {
                int d = bArr[i] ^ i ^ 170;
                bArr[i] = (byte)d;
            }
            return bArr;
        }
    }
    return null;
}

// Gets array's data from method
static byte[] getArrayData(System.Collections.Generic.IList<Instruction> instructions)
{
    bool foundArr = false;
    foreach (var ins in instructions)
    {
        if (ins.OpCode == OpCodes.Newarr)
        {
            foundArr = true;
        }
        // If the current instruction's opcode is ldtoken
        // and we already passed the newarr instruction then
        // this must be the array's RuntimeFieldHandle.
        // ldtoken can also load type or method tokens, so only
        // accept operands that are field definitions.
        if (ins.OpCode == OpCodes.Ldtoken && foundArr && ins.Operand is FieldDef field)
        {
            // Return array's initial value
            return field.InitialValue;
        }
    }
    return null;
}
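The containsOpCode helper called from getKey is not shown in this post. A minimal sketch of what it might look like, assuming it simply returns the first instruction with a matching opcode (or null when there is none), is:

```csharp
using System.Collections.Generic;
using dnlib.DotNet.Emit;

public static class OpCodeSearchSketch
{
    // Hypothetical helper: returns the first instruction that uses `opCode`,
    // or null if the instruction list does not contain it.
    public static Instruction containsOpCode(IList<Instruction> instructions, OpCode opCode)
    {
        foreach (Instruction ins in instructions)
        {
            if (ins.OpCode == opCode)
                return ins;
        }
        return null;
    }
}
```

Returning the instruction rather than a bool matches how getKey uses the result, which only checks it against null.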
Replacing Deobfuscation Functions
Creating List of Deobfuscation Functions
To find the deobfuscation functions and replace them, we first need to find the function that grabs a subset of the decrypted byte array, converts it to a string, and returns the value. The easiest way to do this is to look for calls to the GetString function. Since GetString is a virtual instance method of System.Text.Encoding, the binary invokes it with the callvirt instruction, which dispatches on the runtime type of the Encoding object. We can look for usage of this instruction and verify that it is being used to call GetString. The code looks something like this:
static MethodDef getDecryptMethod(TypeDef type)
{
    // Loop through methods in class
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;
        // Loop through method instructions
        foreach (var ins in method.Body.Instructions)
        {
            // Check if instruction is using the callvirt opcode
            if (ins.OpCode == OpCodes.Callvirt)
            {
                // Verify that it is calling GetString
                if (ins.Operand.ToString().Contains("GetString"))
                {
                    // Return the method definition itself
                    return method;
                }
            }
        }
    }
    return null;
}
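Matching on the substring "GetString" alone would also match any unrelated method that happens to contain that name. If that turns out to be a problem on a given sample, a slightly stricter check could compare against the declaring type as well. The sketch below is one way to do that; the helper name is made up, and it relies on the operand's FullName containing the declaring type followed by `::` and the method name.

```csharp
using dnlib.DotNet;
using dnlib.DotNet.Emit;

public static class GetStringMatchSketch
{
    // True when `ins` is a callvirt whose target is a GetString overload
    // declared on System.Text.Encoding.
    public static bool IsEncodingGetString(Instruction ins)
    {
        if (ins.OpCode != OpCodes.Callvirt)
            return false;
        var callee = ins.Operand as IMethod;
        return callee != null
            && callee.FullName.Contains("System.Text.Encoding::GetString(");
    }
}
```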
IL Code of the Main Deobfuscation Function
Once we have the method definition of the main deobfuscation function (the one that calls GetString), we can look for other methods that call it. The two integer constants pushed right before each call are the offset and length of the string inside the byte array. Once we have those, we can store the calling method's definition together with its deobfuscated string in a Dictionary so it is easy to look up later. This will be useful when we go to replace the calls to these deobfuscation functions. The code for finding these functions looks like the following:
/* Params:
   TypeDef type: Class to look for deobfuscation functions in
   byte[] kArr: Byte array used for decryption
   MethodDef decryptMethod: Main deobfuscation method definition
*/
static Dictionary<MethodDef, string> decryptFuncs(TypeDef type, byte[] kArr, MethodDef decryptMethod)
{
    Dictionary<MethodDef, string> funcs = new Dictionary<MethodDef, string>();
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;
        // Starting at index two to grab previous two instructions once we
        // find the call to the main deobfuscation method
        for (var i = 2; i < method.Body.Instructions.Count; i++)
        {
            // Previous two instructions will be parameters for method
            Instruction prevIns = method.Body.Instructions[i - 1];
            Instruction prevprevIns = method.Body.Instructions[i - 2];
            Instruction ins = method.Body.Instructions[i];
            // Look for call instruction that calls the main deobfuscation method
            if (ins.OpCode == OpCodes.Call && ins.Operand == decryptMethod)
            {
                // Skip calls whose arguments are not integer constants,
                // since GetLdcI4Value() would throw on them
                if (!prevIns.IsLdcI4() || !prevprevIns.IsLdcI4())
                    continue;
                // Store the method as the key and the deobfuscated string as
                // the value; the indexer avoids an exception if the same
                // method is seen more than once
                funcs[method] = decryptString(prevprevIns.GetLdcI4Value(), prevIns.GetLdcI4Value(), kArr);
            }
        }
    }
    return funcs;
}

// Returns deobfuscated string: x is the offset into kArr, y is the length in bytes
static string decryptString(int x, int y, byte[] kArr)
{
    return Encoding.UTF8.GetString(kArr, x, y);
}
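As a small worked example of the offset and length pair, the sketch below slices a made-up "decrypted" table the same way decryptString does. The table contents and names are hypothetical; only the Encoding.UTF8.GetString(bytes, index, count) call mirrors the code above.

```csharp
using System;
using System.Text;

public static class SliceDecodeSketch
{
    // Decode `count` bytes of `table`, starting at `offset`, as UTF-8.
    public static string Slice(byte[] table, int offset, int count)
        => Encoding.UTF8.GetString(table, offset, count);

    public static void Main()
    {
        // Hypothetical decrypted table holding three strings back to back
        byte[] table = Encoding.UTF8.GetBytes("ChromeWalletsTelegram");
        Console.WriteLine(Slice(table, 0, 6));   // Chrome
        Console.WriteLine(Slice(table, 6, 7));   // Wallets
        Console.WriteLine(Slice(table, 13, 8));  // Telegram
    }
}
```

Note that the length is counted in bytes, not characters, so a string containing multi-byte UTF-8 characters occupies more bytes than its visible length suggests.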
Finding and Replacing Deobfuscation Function Usage
Now that we have a map containing all deobfuscation functions and their corresponding strings, we can loop through the binary again and replace the usage of these functions. To do this we can simply replace the call instruction with ldstr followed by the string. The ldstr instruction pushes a string object onto the stack, exactly where the return value of the call would have been, which is useful in case the string is used as a parameter for another function. This means that the code will still function as it would with the obfuscation methods in place. The code for doing this is as follows:
// Grab dictionary of deobfuscation functions
var obfFuncs = getObfFuncs(module);
// Loop through each class in module
foreach (var type in module.Types)
{
    if (!type.HasMethods)
        continue;
    // Loop through each method in class
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;
        // Loop through each instruction in method
        foreach (var inst in method.Body.Instructions)
        {
            // Check if the current instruction is for calling a function
            if (inst.OpCode == OpCodes.Call && inst.Operand is MethodDef)
            {
                // Check to see if method being called is in dictionary
                if (obfFuncs.ContainsKey((MethodDef)inst.Operand))
                {
                    // Replace opcode with ldstr
                    inst.OpCode = OpCodes.Ldstr;
                    // Replace operand with deobfuscated string
                    inst.Operand = obfFuncs[(MethodDef)inst.Operand];
                }
            }
        }
    }
}
You can see in the below image the before and after of our deobfuscation efforts:
Deobfuscation Before and After
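The getObfFuncs helper used in the loop is not shown in this post. One plausible way to build it is to run the three earlier steps (key extraction, finding the main deobfuscation method, collecting the wrapper functions) on every class and merge the results. The sketch below is an assumption, not the original implementation: it takes the three functions as delegates so it compiles on its own, and it assumes the byte array, the main routine, and the wrappers all live in the same class.

```csharp
using System;
using System.Collections.Generic;
using dnlib.DotNet;

public static class ObfFuncsSketch
{
    // Hypothetical glue code: merges the per-class results into one dictionary.
    public static Dictionary<MethodDef, string> GetObfFuncs(
        ModuleDef module,
        Func<TypeDef, byte[]> getKey,
        Func<TypeDef, MethodDef> getDecryptMethod,
        Func<TypeDef, byte[], MethodDef, Dictionary<MethodDef, string>> decryptFuncs)
    {
        var all = new Dictionary<MethodDef, string>();
        foreach (TypeDef type in module.GetTypes())
        {
            byte[] key = getKey(type);
            MethodDef decryptMethod = getDecryptMethod(type);
            // Skip classes that do not hold both the byte array and the main routine
            if (key == null || decryptMethod == null)
                continue;
            foreach (var pair in decryptFuncs(type, key, decryptMethod))
                all[pair.Key] = pair.Value;
        }
        return all;
    }
}
```

With the functions from this post in scope, a call would look like `GetObfFuncs(module, getKey, getDecryptMethod, decryptFuncs)`.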
Replacing Base64-Decoding Functions
Now that we have deobfuscated all strings in the binary, we can see yet another issue. A lot of these strings are Base64-Encoded and are being decoded by another obfuscated function.
Base64-Encoded Strings Passed to Decoding Functions
A quick glance indicates these functions only return the Base64-Decoded string:
Base64-Decoding Function
We can use the same process to find these Base64-Decoding functions as we did for finding the string deobfuscation functions by looking for a call instruction for FromBase64String.
static List<MethodDef> getB64Funcs(ModuleDefMD module)
{
    // Generate empty list to store functions
    List<MethodDef> b64Funcs = new List<MethodDef>();
    // Loop through classes
    foreach (var type in module.Types)
    {
        if (!type.HasMethods)
            continue;
        // Loop through methods
        foreach (var method in type.Methods)
        {
            if (!method.HasBody)
                continue;
            // Loop through instructions
            foreach (var ins in method.Body.Instructions)
            {
                // Look for call to FromBase64String
                if (ins.OpCode == OpCodes.Call && ins.Operand.ToString().Contains("FromBase64String"))
                {
                    // Add found method to list once, then move on to the next method
                    b64Funcs.Add(method);
                    break;
                }
            }
        }
    }
    return b64Funcs;
}
From here, we can replace these functions with their string representation much like what we did with the deobfuscation functions. We must make sure that the parameter being passed to the Base64-Decoding function is a string, so we check that the previous instruction is ldstr before performing the replacement. Our final loop looks like the following:
// Get map of obfuscated functions
var obfFuncs = getObfFuncs(module);
// Get list of base64 functions
var b64Funcs = getB64Funcs(module);
// Loop through classes
foreach (var type in module.Types)
{
    if (!type.HasMethods)
        continue;
    // Loop through methods
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;
        // Loop to replace the obfuscated strings
        foreach (var inst in method.Body.Instructions)
        {
            if (inst.OpCode == OpCodes.Call && inst.Operand is MethodDef)
            {
                if (obfFuncs.ContainsKey((MethodDef)inst.Operand))
                {
                    inst.OpCode = OpCodes.Ldstr;
                    inst.Operand = obfFuncs[(MethodDef)inst.Operand];
                }
            }
        }
        // Loop through instructions again to find base64 functions
        for (var i = 1; i < method.Body.Instructions.Count; i++)
        {
            // Get previous instruction
            var prevIns = method.Body.Instructions[i - 1];
            var ins = method.Body.Instructions[i];
            if (ins.OpCode == OpCodes.Call && ins.Operand is MethodDef)
            {
                // Check to see if instruction is calling base64 function
                if (b64Funcs.Contains((MethodDef)ins.Operand))
                {
                    // Verify that previous instruction is a string
                    if (prevIns.Operand is string && prevIns.OpCode == OpCodes.Ldstr)
                    {
                        try
                        {
                            // Base64 decode string
                            string b64Dec = Encoding.ASCII.GetString(Convert.FromBase64String(prevIns.Operand.ToString()));
                            // Replace with ldstr instruction
                            ins.OpCode = OpCodes.Ldstr;
                            ins.Operand = b64Dec;
                            // Replace parameter passing with nop instruction as it's no longer needed
                            prevIns.OpCode = OpCodes.Nop;
                            // nop takes no operand, so clear the old string
                            prevIns.Operand = null;
                        }
                        catch
                        {
                            // Not valid Base64, so leave the call untouched
                            continue;
                        }
                    }
                }
            }
        }
    }
}
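The try/catch in the loop matters because Convert.FromBase64String throws a FormatException when its input is not valid Base64. The Dnlib-free sketch below isolates that decode step; the helper name and the sample strings are made up for illustration.

```csharp
using System;
using System.Text;

public static class Base64Sketch
{
    // Returns true and the decoded text when `input` is valid Base64,
    // false (and null) otherwise.
    public static bool TryDecode(string input, out string decoded)
    {
        try
        {
            decoded = Encoding.ASCII.GetString(Convert.FromBase64String(input));
            return true;
        }
        catch (FormatException)
        {
            decoded = null;
            return false;
        }
    }

    public static void Main()
    {
        if (TryDecode("TG9naW4gRGF0YQ==", out string text))
            Console.WriteLine(text); // Login Data
        Console.WriteLine(TryDecode("%%%", out _)); // False
    }
}
```

Encoding.ASCII replaces any byte above 0x7F with `?`, so if a sample stores non-ASCII text this way, Encoding.UTF8 may be the better choice.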
Now all of the strings in the binary are fully deobfuscated and we can continue our analysis.
Comparison of obfuscated vs deobfuscated output
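The snippets in this post only modify the module in memory. To view the result in DnSpy, the patched module has to be written back to disk. A minimal sketch, assuming the input and output paths are passed on the command line and using Dnlib's Write method, could look like this:

```csharp
using dnlib.DotNet;

public static class SaveModuleSketch
{
    // Writes the (patched) module to a new file.
    public static void Save(ModuleDef module, string outputPath)
    {
        module.Write(outputPath);
    }

    public static void Main(string[] args)
    {
        // args[0]: obfuscated sample, args[1]: where to write the cleaned copy
        ModuleContext modCtx = ModuleDef.CreateModuleContext();
        ModuleDefMD module = ModuleDefMD.Load(args[0], modCtx);
        // ... run the replacement loops from this post on `module` here ...
        Save(module, args[1]);
    }
}
```

Writing to a different path than the input keeps the original sample intact for comparison.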
Conclusion
If you would like to check out the complete code you can find it on my GitHub or my Projects Page. If you have any questions or feedback on this article, feel free to message me on my Twitter or LinkedIn.
Thanks for reading and happy reversing!