You probably already guessed it from the title’s name, API Hashing is used to obfuscate a binary in order to hide API names from static analysis tools, hindering a reverse engineer to understand the malware’s functionality.
A first approach to get an idea of an executable’s functionalities is to more or less dive through the functions and look out for API calls. If, for example a CreateFileW
function is called in a specific subroutine, it probably means that cross references or the routine itself implement some file handling functionalities. This won’t be possible if API Hashing is used.
Instead of calling the function directly, each API call has a corresponding checksum/hash. A hardcoded hash value might be retrieved and for each library function a checksum is computed. If the computed value matches the hash value we compare it against, we found our target.
In this case a reverse engineer needs to choose a different path to analyse the binary or deobfuscate it. This blog article will cover how the DanaBot banking trojan implements API Hashing and possibly the easiest way on how this can be defeated. The SHA256
of the binary I am dissecting here is added at the end of this blog post.
Deep diving into DanaBot
DanaBot itself is a banking trojan and has been around since atleast 2018 and was first discovered by ESET[1]. It is worth mentioning that it implements most of its functionalities in plugins, which are downloaded from the C2 server. I will focus on deobfuscating API Hashing in the first stage of DanaBot, a DLL which is dropped and persisted on the system, used to download further plugins.
Reversing the ResolvFuncHash routine
At the beginning of the function, the EAX
register stores a pointer to the DOS
header of the Dynamic Linked Library which, contains the function the binary wants to call. The corresponding hash of the yet unknown API function is stored in the EDX
register. The routine also contains a pile of junk instructions, obfuscating the actual use case for this function.
The hash is computed solely from the function name, so the first step is to get a pointer to all function names of the target library. Each DLL contains a table with all exported functions, which are loaded into memory. This Export Directory is always the first entry in the Data Directory array. The PE file format and its headers contain enough information to reach this mentioned directory by parsing header structures:
In the picture below, you can see an example of the mentioned junk instructions, as well as the critical block, which compares the computed hash with the checksum of the function we want to call. The routine iterates through all function names in the Export Directory and calculates the hash.
The loop breaks once the computed hash matches the value that is stored in the EDX
register since the beginning of this routine.
Reversing the hashing algorithm
The hashing algorithm is fairly simple and nothing too complicated. Junk instructions and opaque predicates complicate the process of reversing this routine.
The algorithm takes the nth
and the stringLength-n-1th
char of the function name and stores them, as well as capitalised versions into memory, resulting in a total of 4 characters. Each one of those characters is XOR'd
with the string length. Finally they are multiplied and the values are added up each time the loop is run and result in the hash value.
def get_hash(funcname):
"""Calculate the hash value for function name. Return hash value as integer"""
strlen = len(funcname)
# if the length is even, we encounter a different behaviour
i = 0
hashv = 0x0
while i < strlen:
if i == (strlen - 1):
ch1 = funcname[0]
else:
ch1 = funcname[strlen - 2 - i]
# init first character and capitalize it
ch = funcname[i]
uc_ch = ch.capitalize()
# Capitalize the second character
uc_ch1 = ch1.capitalize()
# Calculate all XOR values
xor_ch = ord(ch) ^ strlen
xor_uc_ch = ord(uc_ch) ^ strlen
xor_ch1 = ord(ch1) ^ strlen
xor_uc_ch1 = ord(uc_ch1) ^ strlen
# do the multiplication and XOR again with upper case character1
hashv += ((xor_ch * xor_ch1) * xor_uc_ch)
hashv = hashv ^ xor_uc_ch1
i += 1
return hashv
A python script for calculating the hash for a given function name is also uploaded on my github page[2] and free for everyone to use. I’ve also uploaded a text file with hashes for exported functions of commonly used DLLs.
Deobfuscation by Commenting
So now that we cracked the algorithm, we want to update our disassembly to know which hash value represents which function. As I’ve already mentioned, we want to focus on simplicity. The easiest way is to compute hash values for exported functions of commonly used DLLs and write them into a file.
With this file, we can write an IdaPython
script to comment the library function name next to the Api Hashing call. Luckily the Api Hashing function is always called with the same pattern:
- Move the wanted hash value into the
EDX
register - Move a
DWORD
intoEAX
register
First we retrieve all XRefs
of the Api Hashing function. Each XRef
will contain an address where the Api Hashing function is called at, which means that in atleast the 5 previous instructions, we will find the mentioned pattern. So we will fetch the previous instruction until we extract the wanted hash value, which is being pushed into EDX
. Finally we can use this immediate to extract the corresponding api function from the hash values we have generated before and comment the function name next to the Xref
address.
def add_comment(addr, hashv, api_table):
"""Write a comment at addr with the matching api function.Return True if a corresponding api hash was found."""
# remove the "h" at the end of the string
hashv = hex(int(hashv[:-1], 16))
keys = api_table.keys()
if hashv in keys:
apifunc = api_table[hashv]
print "Found ApiFunction = %s. Adding comment." % (apifunc,)
idc.MakeComm(addr, apifunc)
comment_added = True
else:
print "Api function for hash = %s not found" % (hashv,)
comment_added = False
return comment_added
def main():
"""Main"""
f = open(
"C:\\Users\\luffy\\Desktop\\Danabot\\05-07-2020\\Utils\\danabot_hash_table.txt", "r")
lines = f.readlines()
f.close()
api_table = get_api_table(lines)
i = 0
ii = 0
for xref in idautils.XrefsTo(0x2f2858):
i += 1
currentaddr = xref.frm
addr_minus = currentaddr - 0x10
while currentaddr >= addr_minus:
currentaddr = PrevHead(currentaddr)
is_mov = GetMnem(currentaddr) == "mov"
if is_mov:
dst_is_edx = GetOpnd(currentaddr, 0) == "edx"
# needs to be edx register to match pattern
if dst_is_edx:
src = GetOpnd(currentaddr, 1)
# immediate always ends with 'h' in IDA
if src.endswith("h"):
add_comment(xref.frm, src, api_table)
ii += 1
print "Total xrefs found %d" % (i,)
print "Total api hash functions deobfuscated %d" % (ii,)
if __name__ == '__main__':
main()
Conclusion
As reverse engineers, we will probably continue to encounter Api Hashing in various different ways. I hope I was able to show you some quick & dirty method or give you at least some fundament on how to beat this obfuscation technique. I also hope that, the next time a blue team fellow has to analyse DanaBot, this article might become handy to him and saves him some time reverse engineering this banking trojan.
IoCs
- Dropper =
e444e98ee06dc0e26cae8aa57a0cddab7b050db22d3002bd2b0da47d4fd5d78c
- DLL =
cde01a2eeb558545c57d5c71c75e9a3b70d71ea6bbeda790a0b871fcb1b76f49