BlackGuard Analysis - Deobfuscation Using Dnlib

07 June 2022

By Jacob Pimental

.NET binaries are normally easier to reverse engineer than natively compiled binaries. Just like Java .jar files, they can be decompiled to reveal something close to the original source code; however, writing deobfuscators for these binaries is not usually straightforward for newer analysts. This post walks through writing a string deobfuscator for the Blackguard infostealer.

What is BlackGuard
String Obfuscation
Intro to Dnlib
Retrieve Initial Byte Array
Replacing Deobfuscation Functions
1. Creating List of Deobfuscation Functions
2. Finding and Replacing Deobfuscation Function Usage
Replacing Base64-Decoding Functions
Conclusion

What is BlackGuard

The Blackguard infostealer was discovered by Zscaler in March of 2022 and retailed for a monthly price of $200 or a lifetime price of $700. Blackguard focuses on stealing information from web browsers, VPN Clients, messaging clients, FTP clients, and “cold” cryptocurrency wallets. It heavily obfuscates its strings through an encrypted byte array which is described in the rest of this article.

Hacking forum post advertising Blackguard and describing its features. Retrieved from: https://www.zscaler.com/blogs/security-research/analysis-blackguard-new-info-stealer-malware-being-sold-russian-hacking

String Obfuscation

For string obfuscation, Blackguard uses a large byte array that is decrypted via XOR at the start of execution. After it is decrypted, multiple functions each grab a subset of this byte array and return it decoded as a UTF-8 string. Since each deobfuscation function is randomly placed, there is no pattern to the offsets. Because DnSpy lacks the custom in-line commenting and scripting commonly found in Cutter or other disassemblers, deobfuscation is a daunting task. Without a library like Dnlib, the analyst would have to manually pull out each section of the array and decode it by hand.

Byte array decryption function, which sets up the byte array used for deobfuscation

Multiple deobfuscation functions that pull strings from the byte array
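
To make the scheme concrete, here is a minimal, self-contained sketch of what code obfuscated this way could look like in C#. It is not taken from the sample: the class name, accessor names, and strings are all hypothetical, and the XOR step simply mirrors the decryption loop discussed later in this post.

```csharp
using System;
using System.Text;

// Toy model of the obfuscation scheme. Every name and string here is hypothetical.
public static class ObfuscatedStrings
{
    // XOR each byte with its index and a constant; applying it twice restores the input.
    private static byte[] Xor(byte[] data)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = unchecked((byte)(data[i] ^ i ^ 170));
        return result;
    }

    // Stands in for the encrypted array embedded in the binary.
    private static readonly byte[] Encrypted =
        Xor(Encoding.UTF8.GetBytes("passwords.txtwallet.dat"));

    // Decrypted once, before any accessor runs.
    private static readonly byte[] Blob = Xor(Encrypted);

    // Each accessor returns a fixed (offset, length) slice of the decrypted array.
    public static string A() => Encoding.UTF8.GetString(Blob, 0, 13);
    public static string B() => Encoding.UTF8.GetString(Blob, 13, 10);

    public static void Main()
    {
        Console.WriteLine(A());
        Console.WriteLine(B());
    }
}
```

In a decompiler, calls such as A() and B() reveal nothing about the strings they return, which is why the rest of this post focuses on recovering the array and the offset/length pairs.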

Intro to Dnlib

We can use Dnlib to read .NET binaries and make changes to their instructions. This library is at the root of tools such as DnSpy and the popular De4Dot deobfuscation tool. We can use it to rewrite the obfuscation functions in our Blackguard sample to their string representations.

The easiest way to do this would be to walk the classes and methods of the binary to retrieve the byte array. Next, we can create a list of all the methods that return deobfuscated strings and replace calls to them with the string itself. First, we need to load the module from our binary in Dnlib. This is done by defining a ModuleContext and passing that to ModuleDefMD.Load. This will return an object that can be used to read metadata about a module including class definitions and .NET instructions.

ModuleContext modCtx = ModuleDef.CreateModuleContext();
ModuleDefMD module = ModuleDefMD.Load(@"path\to\blackguard.exe", modCtx);

Retrieve Initial Byte Array

It is important to know which instructions we are looking for when retrieving the initial byte array. In DnSpy, it is possible to view the Intermediate Language (IL) instructions of a particular function. IL is the bytecode that high-level .NET languages such as C# are compiled to; the .NET runtime then translates it into native code during execution. You can liken this to the Java bytecode that is executed by the Java Virtual Machine (JVM) at runtime.

Looking at the function that sets up the initial byte array, we can see the newarr instruction, which allocates a new array and pushes a reference to it onto the stack. We also see a call to RuntimeHelpers.InitializeArray. This function takes an object of type System.Array and a RuntimeFieldHandle. The RuntimeFieldHandle refers to a field whose raw data is stored in the module, and that data is used to populate the array.

IL instructions used to set up the byte array (newarr, dup, ldtoken, call to InitializeArray)
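
For reference, this IL pattern is what the C# compiler typically emits for an array initializer with constant elements. The sketch below is an illustration rather than code recovered from the sample; the class name, field name, and values are made up, and the IL in the comment is abbreviated.

```csharp
using System;

public static class ArrayInitDemo
{
    // A constant array initializer like this one is typically compiled to:
    //   ldc.i4.8                 // array length
    //   newarr  System.Byte      // allocate the array
    //   dup                      // keep a reference for the call below
    //   ldtoken <field holding the raw bytes>
    //   call    RuntimeHelpers.InitializeArray(Array, RuntimeFieldHandle)
    public static readonly byte[] Data = { 0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02, 0x03, 0x04 };

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Data));
    }
}
```

Compiling a snippet like this and opening it in DnSpy's IL view is a quick way to get familiar with the pattern before hunting for it in a malware sample.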

Before the call to InitializeArray, the ldtoken instruction on line 6 is used to push a value onto the stack. According to the .NET Documentation, this instruction is used to “convert a metadata token into its runtime representation”. This will most likely be the RuntimeFieldHandle for the field that holds the initial data for our array. In Dnlib, we can grab this data by casting the instruction's operand to a FieldDef and reading its InitialValue property.

Finally, we see a few xor instructions used later in the method that are part of the decryption loop. These are good indications that this function sets up the byte array.

XOR Instructions used in decryption loop

The easiest way to find the function that sets up the byte array is to find the use of newarr and the xor instructions. The entire code for grabbing this array is:

// Requires: using System.Collections.Generic; using dnlib.DotNet; using dnlib.DotNet.Emit;

// Returns the first instruction with the given opcode, or null if there is none.
// This helper is not shown in the original listing; this is an assumed implementation.
static Instruction containsOpCode(IList<Instruction> instructions, OpCode opCode)
{
    foreach (var ins in instructions)
    {
        if (ins.OpCode == opCode)
            return ins;
    }
    return null;
}

// Will grab the key array from the binary
// Params: type (TypeDef): The class to look for the key in
static byte[] getKey(TypeDef type)
{
    // Loop through class methods
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;

        // Check if method declares a new array and has an xor instruction
        Instruction newarr = containsOpCode(method.Body.Instructions, OpCodes.Newarr);
        Instruction xor = containsOpCode(method.Body.Instructions, OpCodes.Xor);

        if (newarr != null && xor != null)
        {
            // Pass instructions to the getArrayData function below
            byte[] initial = getArrayData(method.Body.Instructions);
            if (initial == null)
                continue;

            // Work on a copy so the field data stored in the module is not modified
            byte[] bArr = (byte[])initial.Clone();

            // Decrypt the byte array once we have its initial value
            for (int i = 0; i < bArr.Length; i++)
            {
                int d = bArr[i] ^ i ^ 170;
                bArr[i] = (byte)d;
            }
            return bArr;
        }
    }
    return null;
}

// Gets array's data from method
static byte[] getArrayData(IList<Instruction> instructions)
{
    bool foundArr = false;
    foreach (var ins in instructions)
    {
        if (ins.OpCode == OpCodes.Newarr)
        {
            foundArr = true;
        }

        // If the current instruction's opcode is ldtoken, its operand is a field,
        // and we already passed the newarr instruction, then this must be the
        // field behind the array's RuntimeFieldHandle
        if (ins.OpCode == OpCodes.Ldtoken && foundArr && ins.Operand is FieldDef field)
        {
            // Return array's initial value
            return field.InitialValue;
        }
    }
    return null;
}
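
Because XOR is its own inverse, the same loop both encrypts and decrypts, which makes the routine easy to test in isolation. The following stand-alone sketch (no Dnlib required) uses a hypothetical helper name, XorWithIndex, and returns a new array instead of decrypting in place. The key 170 is the constant seen in this sample and should be treated as sample-specific.

```csharp
using System;
using System.Linq;
using System.Text;

public static class XorDemo
{
    // Same transform as the loop in getKey: byte ^ index ^ key, truncated to 8 bits.
    public static byte[] XorWithIndex(byte[] data, int key)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = unchecked((byte)(data[i] ^ i ^ key));
        return result;
    }

    public static void Main()
    {
        byte[] plain = Encoding.UTF8.GetBytes("example string");
        byte[] encrypted = XorWithIndex(plain, 170);
        byte[] decrypted = XorWithIndex(encrypted, 170);

        // The round trip restores the original bytes.
        Console.WriteLine(decrypted.SequenceEqual(plain));
        Console.WriteLine(Encoding.UTF8.GetString(decrypted));
    }
}
```

The cast to byte keeps only the low 8 bits of the result, so the round trip still holds for arrays longer than 256 bytes.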

Replacing Deobfuscation Functions

Creating List of Deobfuscation Functions

To find the deobfuscation functions and replace them, we first need to find the function that grabs a subset of the decrypted byte array, converts it to a string, and returns the value. The easiest way to do this is to look for calls to the GetString function. Because GetString is a virtual instance method of System.Text.Encoding, the compiler invokes it with the callvirt instruction rather than call. We can look for usage of this instruction and verify that it is being used to call GetString. The code looks something like this:

static MethodDef getDecryptMethod(TypeDef type)
{
    // Loop through methods in class
    foreach(var method in type.Methods)
    {
        if (!method.HasBody)
            continue;

        // Loop through method instructions
        foreach(var ins in method.Body.Instructions)
        {
            // Check if instruction is using the callvirt opcode
            if(ins.OpCode == OpCodes.Callvirt)
            {
                // Verify that it is calling GetString
                if (ins.Operand.ToString().Contains("GetString"))
                {
                    // Return the method definition itself
                    return method;
                }
            }
        }
    }
    return null;
}

IL Code of the Main Deobfuscation Function

Once we have the method definition of the main deobfuscation function, we can look for other methods that call it. For each one, we store the method definition along with its deobfuscated string in a Dictionary (C#'s hash map) so it is easy to look up later. This will be useful when we go to replace the calls to these deobfuscation functions. The code for finding these functions looks like the following:

/* Params:
    TypeDef type: Class to look for deobfuscation functions in
    byte[] kArr: Byte array used for decryption
    MethodDef decryptMethod: Main Deobfuscation Method Definition
*/
static Dictionary<MethodDef, string> decryptFuncs(TypeDef type, byte[] kArr, MethodDef decryptMethod)
{
    Dictionary<MethodDef, string> funcs = new Dictionary<MethodDef, string>();
    foreach(var method in type.Methods)
    {
        if (!method.HasBody)
            continue;

        // Starting at index two to grab previous two instructions once we 
        // find the call to the main deobfuscation method
        for (var i = 2; i < method.Body.Instructions.Count; i++)
        {
            // Previous two instructions will be parameters for method
            Instruction prevIns = method.Body.Instructions[i - 1];
            Instruction prevprevIns = method.Body.Instructions[i - 2];
            Instruction ins = method.Body.Instructions[i];

            // Look for call instruction that calls the main deobfuscation method,
            // and make sure both arguments are integer constants before reading them
            if (ins.OpCode == OpCodes.Call && ins.Operand == decryptMethod
                && prevIns.IsLdcI4() && prevprevIns.IsLdcI4())
            {
                // Store the method as the key and the deobfuscated string as the value
                // (the indexer avoids an exception if the method was already added)
                funcs[method] = decryptString(prevprevIns.GetLdcI4Value(), prevIns.GetLdcI4Value(), kArr);
            }
        }
        }
    }
    return funcs;
}

// Returns deobfuscated string
static string decryptString(int x, int y, byte[] kArr)
{
    return Encoding.UTF8.GetString(kArr, x, y);
}
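
The operand-recovery step can be tried without a real binary. The sketch below models instructions with a hypothetical Ins class (a stand-in, not Dnlib's Instruction type) and applies the same idea: find the call, read the two preceding integer constants as offset and length, and decode that slice of the array.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for an IL instruction; not Dnlib's Instruction class.
public sealed class Ins
{
    public string OpCode;
    public object Operand;
    public Ins(string opCode, object operand) { OpCode = opCode; Operand = operand; }
}

public static class WrapperScan
{
    // Returns the decoded string if the body contains
    // "ldc.i4 <offset>; ldc.i4 <length>; call <decryptName>", otherwise null.
    public static string Resolve(IList<Ins> body, string decryptName, byte[] key)
    {
        for (int i = 2; i < body.Count; i++)
        {
            Ins call = body[i], length = body[i - 1], offset = body[i - 2];
            if (call.OpCode == "call" && (call.Operand as string) == decryptName
                && offset.OpCode == "ldc.i4" && length.OpCode == "ldc.i4")
            {
                return Encoding.UTF8.GetString(key, (int)offset.Operand, (int)length.Operand);
            }
        }
        return null;
    }

    public static void Main()
    {
        // Stands in for the already-decrypted byte array.
        byte[] key = Encoding.UTF8.GetBytes("passwords.txtwallet.dat");
        var body = new List<Ins>
        {
            new Ins("ldc.i4", 13),
            new Ins("ldc.i4", 10),
            new Ins("call", "Decrypt"),
            new Ins("ret", null),
        };
        Console.WriteLine(Resolve(body, "Decrypt", key));
    }
}
```

Checking that both preceding instructions really are integer constants keeps the scan from misreading a method that builds its arguments some other way.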

Finding and Replacing Deobfuscation Function Usage

Now that we have a map containing all deobfuscation functions and their corresponding strings, we can loop through the binary again and replace the usage of these functions. To do this, we replace each call instruction with an ldstr instruction whose operand is the deobfuscated string. The ldstr instruction pushes a string reference onto the stack, just as the call's return value would have been, so the surrounding code still works when the string is passed as a parameter to another function. The code for doing this is as follows:

// Grab hashmap of deobfuscation functions
var obfFuncs = getObfFuncs(module);

// Loop through each class in module
foreach (var type in module.Types)
{
    if (!type.HasMethods)
        continue;

    // Loop through each method in class
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;

        // Loop through each instruction in method
        foreach (var inst in method.Body.Instructions)
        {
            // Check if the current instruction is for calling a function
            if (inst.OpCode == OpCodes.Call && inst.Operand is MethodDef)
            {
                // Check to see if method being called is in hashmap
                if (obfFuncs.ContainsKey((MethodDef)inst.Operand))
                {
                    // Replace opcode with ldstr
                    inst.OpCode = OpCodes.Ldstr;
                    // Replace operand with deobfuscated string
                    inst.Operand = obfFuncs[(MethodDef)inst.Operand];
                }
            }
        }
    }
}
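
One detail worth noting is that the loop mutates each instruction object in place rather than removing it and inserting a new one. In Dnlib, branch operands refer to their target Instruction objects, so rewriting in place keeps any branch that targets the call valid. The sketch below demonstrates the idea with a hypothetical Node class standing in for Dnlib's Instruction; it does not use Dnlib itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an IL instruction; not Dnlib's Instruction class.
public sealed class Node
{
    public string OpCode;
    public object Operand;
    public Node(string opCode, object operand) { OpCode = opCode; Operand = operand; }
}

public static class InPlaceRewrite
{
    // Replace "call <wrapper>" with "ldstr <string>" for every wrapper in the map.
    // Returns the number of instructions that were rewritten.
    public static int Rewrite(IList<Node> body, IDictionary<string, string> wrappers)
    {
        int replaced = 0;
        foreach (Node ins in body)
        {
            if (ins.OpCode == "call" && ins.Operand is string name
                && wrappers.TryGetValue(name, out string value))
            {
                ins.OpCode = "ldstr";
                ins.Operand = value;
                replaced++;
            }
        }
        return replaced;
    }

    public static void Main()
    {
        var call = new Node("call", "Wrapper1");
        var branch = new Node("br", call); // branch operand is the target object itself
        var body = new List<Node> { branch, call, new Node("ret", null) };

        Rewrite(body, new Dictionary<string, string> { ["Wrapper1"] = "wallet.dat" });

        // The branch still points at the same (now rewritten) instruction.
        Console.WriteLine(object.ReferenceEquals(branch.Operand, body[1]));
        Console.WriteLine(body[1].OpCode + " " + body[1].Operand);
    }
}
```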

You can see in the below image the before and after of our deobfuscation efforts:

Deobfuscation Before and After

Replacing Base64-Decoding Functions

Now that we have deobfuscated all strings in the binary, we can see yet another issue. A lot of these strings are Base64-Encoded and are being decoded by another obfuscated function.

Base64-Encoded Strings Passed to Decoding Functions

A quick glance indicates these functions only return the Base64-Decoded string:

Base64-Decoding Function

We can use the same process to find these Base64-Decoding functions as we did for finding the string deobfuscation functions by looking for a call instruction for FromBase64String.

static List<MethodDef> getB64Funcs(ModuleDefMD module)
{
    // Generate empty list to store function
    List<MethodDef> b64Funcs = new List<MethodDef>();
    // Loop through classes
    foreach (var type in module.Types)
    {
        if (!type.HasMethods)
            continue;
        // Loop through Methods
        foreach(var method in type.Methods)
        {
            if (!method.HasBody )
                continue;

            // Loop through instructions
            foreach(var ins in method.Body.Instructions)
            {
                // Look for call to FromBase64String
                if(ins.OpCode == OpCodes.Call && ins.Operand.ToString().Contains("FromBase64String"))
                {
                    // Add found method to list once, then move on to the next method
                    b64Funcs.Add(method);
                    break;
                }
            }
        }
    }
    return b64Funcs;
}

From here, we can replace these functions with their string representation much like what we did with the deobfuscation functions. We must make sure that the parameter being passed to the Base64-Decoding function is a string, so we check that the previous instruction is ldstr before performing the replacement. Our final loop looks like the following:

// Get map of obfuscated functions
var obfFuncs = getObfFuncs(module);

// Get list of base64 functions
var b64Funcs = getB64Funcs(module);

// Loop through classes
foreach (var type in module.Types)
{
    if (!type.HasMethods)
        continue;

    // Loop through methods
    foreach (var method in type.Methods)
    {
        if (!method.HasBody)
            continue;

        // Loop to replace the obfuscated strings    
        foreach (var inst in method.Body.Instructions)
        {
            if (inst.OpCode == OpCodes.Call && inst.Operand is MethodDef)
            {
                if (obfFuncs.ContainsKey((MethodDef)inst.Operand))
                {
                    inst.OpCode = OpCodes.Ldstr;
                    inst.Operand = obfFuncs[(MethodDef)inst.Operand];
                }
            }
        }

        // Loop through instructions again to find base64 functions
        for (var i = 1; i < method.Body.Instructions.Count; i++)
        {
            // Get previous instruction
            var prevIns = method.Body.Instructions[i - 1];
            var ins = method.Body.Instructions[i];
            if (ins.OpCode == OpCodes.Call && ins.Operand is MethodDef)
            {
                // Check to see if instruction is calling base64 function
                if (b64Funcs.Contains((MethodDef)ins.Operand))
                {
                    // Verify that previous instruction is a string
                    if (prevIns.Operand is string && prevIns.OpCode == OpCodes.Ldstr)
                    {
                        try
                        {
                            // base64 decode string
                            string b64Dec = Encoding.ASCII.GetString(Convert.FromBase64String(prevIns.Operand.ToString()));

                            // Replace with ldstr instruction
                            ins.OpCode = OpCodes.Ldstr;
                            ins.Operand = b64Dec;

                            // Replace parameter passing with nop instruction as it's no longer needed
                            prevIns.OpCode = OpCodes.Nop;
                            prevIns.Operand = null;
                        }
                        catch
                        {
                            continue;
                        }
                    }
                }
            }
        }
    }
}
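
The try/catch in the loop matters because not every string passed to these helpers is guaranteed to be valid Base64. A small stand-alone sketch of that guard, using a hypothetical helper name TryDecodeBase64, might look like this. It decodes as ASCII to match the loop above; if a sample stores non-ASCII text, UTF-8 would be the safer choice.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Returns true and the decoded text when the input is valid Base64; false otherwise.
    public static bool TryDecodeBase64(string input, out string decoded)
    {
        try
        {
            decoded = Encoding.ASCII.GetString(Convert.FromBase64String(input));
            return true;
        }
        catch (FormatException)
        {
            // Convert.FromBase64String throws FormatException on malformed input.
            decoded = null;
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryDecodeBase64("d2FsbGV0LmRhdA==", out string ok) ? ok : "<invalid>");
        Console.WriteLine(TryDecodeBase64("not base64!", out string bad) ? bad : "<invalid>");
    }
}
```

Catching only FormatException, rather than every exception, keeps unrelated failures visible while still skipping strings that are not Base64.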

Once the modified module is written back to disk (Dnlib exposes module.Write for this), all of the strings in the binary are fully deobfuscated and we can continue our analysis.

Comparison of obfuscated vs deobfuscated output

Conclusion

If you would like to check out the complete code you can find it on my GitHub or my Projects Page. If you have any questions or feedback on this article, feel free to message me on my Twitter or LinkedIn.

Thanks for reading and happy reversing!
