Linux is one of my favorite operating systems, but you seldom see malware for it, so I was pretty interested when Linux Malware was caught by my honeypot. This article will be my analysis of the sample, particularly the decryption function that was used throughout it. It’s a good example of why using your own encryption algorithm isn’t very secure.
Like with any analysis, I first toss the file into VirusTotal to see what is going on:
We can see here that only 34 out of 59 vendors identified this malware, not very surprising given that it’s a Linux binary. The binary is not packed and there isn’t anything extremely interesting in the VirusTotal analysis. The next thing to do is run it through rabin2 to get some basic information about it.
Again, nothing too interesting. This is a Linux binary that was coded in the C programming language. It starts to get interesting once you analyze the exports using the -E flag in rabin2.
You can see that all of the object and function names are defined for us. This was probably used while the author was debugging their malware. Now we won’t have to deal with radare2 creating names for functions since it already knows that they all are. Now we can pop the binary into radare2 and seek to the main function.
One of the first things I noticed was this snippet of code:
Not only does it show in the comments that the string is called strHost, it also shows that the name of the function that strHost is being passed into is called DecryptData. We can guess now that strHost probably holds some sort of hostname or address that the malware might connect to, and that DecryptData will decrypt this string so it will be usable, as opposed to what it is now, ‘yy123-e4213-mfs’. Next we can jump into DecryptData and see what that looks like.
To start off, we can see that the value 0 is moved into a counter variable (keep in mind, I renamed all of the variables in this function using the afvn command, this makes it a lot easier for me to analyze). It then starts the loop by checking to see if the counter variable is less than the string’s length that was provided as an argument. If it is then we go to the main logic of our loop.
This may seem pretty daunting for newer analysts but I’ll walk through what is going on. The program passes the current index of the string we are on to another counter variable (do not ask me why, because this could have been done without doing that). It then declares a new variable with the hex value of 0x55555556. This value is what is called a “magic number” and is used in improving the performance in division. This specific magic number comes out to 1431655766. If we multiply a number by this and then do a logical shift right by 32 bits, then it is the same result as dividing by 3! I will do my best to try to explain this, as it is the first I have heard about it but it is actually very interesting.
The example we will use for this is 6 divided by 3 which we all know is 2. Now in this case we would do 6 * 1431655766 which comes out to 8589934596, or in binary:
Now if we shift all of the bits to the right 32 spaces we get:
Which comes out to 2! It is a very interesting trick and teaches you a lot about how computer architecture works. It is dividing without having to divide! For more info on this I suggest reading this article.
So what the application is doing in essence is running through the counter and setting a variable to either 0, 1, or 2 based on the calculations made by using the magic number. If the variable is 1 then it jumps to one part of code, if the variable is not then it jumps to another. We will look at these two branches next.
So both branches do relatively the same thing with one minor change. To start off they both check to see if the character at the index has a hex value less than or equal to 0x20, which is just a space character. If it does, then we do not mess with that character, increase our counter and do the next iteration. If it does not then it checks to see if the value is equal to 0x7f, the DELETE character in ASCII. If it is then we do not do anything again and continue to the next iteration. If not then we continue with decryption. This is just setting the bounds of writable ASCII characters, 0x20–0x7f.
So if our magic variable that we calculated earlier is equal to 1 then we perform the left branch, which at the very end of the ASCII bounds check subtracts our letter by one for decryption. This means that B becomes A, D becomes C, and so on. If our magic variable does not equal 1 then we add our letter by one, A becomes B, C becomes D, and so on. That is all this algorithm does, which is not too frightening and makes it easy for us to decrypt the strings. Let us see what that strHost string (‘yy123-e4213-mfs’) becomes after running through this function.
First, I am going to assign each each character in the string the value of either 0, 1, or 2. This mimics the calculations done using that magic number.
Now for every character whose value is not 1 I will add its ASCII value by 1, and with every character whose value is 1 I will subtract its ASCII value by 1.
So the malware most likely connects to the host zx232.f3322.net, which is probably the command and control center.
I made a quick python script to decrypt any other strings I may come across.
And here is list of the other encrypted strings and their decrypted counterparts:
The rest of the sample is fairly straightforward. It copies itself into the file /etc/.zl, /tmp/.lz, and /etc/init.d/.zl. It then makes a copy of itself in rc2.d, rc3.d, rc4.d, and rc5.d and symbolically links them to the .zl file in /etc/init.d. Making these clones ensures persistence in the system as files in these folders are called on startup. This can be a host-based identifier of the malware if you notice these files in the system. It then pings the server at zx232.f3322.net on port 54188 to wait for some sort of response. This can be a network-based identifier of the malware. I have not seen the malware actually get a response yet, which can be caused by several reasons. For example, the server could no longer be in use, or the owner wasn’t sending out commands at the time, etc. I assume though that the program receives a command from the server and executes it.
The interesting part of this malware was mostly the decryption function, so this article was a little lesson on why home brewed cryptography is a bad idea. In this case it allowed us to decrypt the strings to find out more about what the malware is doing. Hopefully this was an interesting article for everyone as it was really fun analyzing this sample. As always, I am still a novice at malware analysis so if there is something I can improve on please tell me. You can contact me at my Twitter and Linkedin.
Thanks for reading, and happy reversing!