A domain generation algorithm (DGA) is a routine that generates domain names dynamically. Consider the following example:
An actor registers the domain evil.com. The corresponding backdoor has this domain hardcoded into its code. Once the attacker infects a target with this malware, the backdoor starts contacting its C2 server.
As soon as a security company obtains the malware, it might blacklist the registered domain evil.com. This hinders any attempt by the malware to receive commands from the original C2.
Had a domain generation algorithm been used, the domain would instead be generated from a seed. The current date, for example, is a popular seed among malware authors. Simple domain blacklisting would no longer solve the problem, because the malware moves on to a new domain whenever the seed changes. The security company has to resort to different methods.
Generating domains dynamically makes it harder for defenders to stop the malware from contacting its C2 server. To do so, they need to understand the algorithm.
Example implementation of a DGA
A quick and dirty implementation of such an algorithm (loosely based on Wikipedia[1]) could look like this:
"""Example implementation of a domain generation algorithm."""
import sys
import time
import random


def gen_domain(month, day, hour, minute):
    """Generate a domain based on the given time values and return it."""
    print(
        f"[+] Gen domain based on month={month} day={day} hour={hour} min={minute}")
    domain = ""
    for _ in range(8):
        # Scramble each value: multiply by 8, then XOR with 0xF.
        month = (month * 8) ^ 0xF
        day = (day * 8) ^ 0xF
        hour = (hour * 8) ^ 0xF
        minute = (minute * 8) ^ 0xF
        # Map the product to a lowercase letter ('a' to 'y').
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


try:
    while True:
        # Random values stand in for the current date and time in this demo.
        d = gen_domain(random.randint(1, 12), random.randint(1, 30),
                       random.randint(0, 23), random.randint(0, 59))
        print(f"[+] Generated domain = {d}")
        time.sleep(5)
except KeyboardInterrupt:
    sys.exit()
In real malware, our DGA would use the current date and time as a seed; the demo loop feeds it random values instead, so that each run shows different domains. In every iteration, each parameter is multiplied by 8 and XOR'd with 0xF. Then all four values are multiplied with each other. The final operations (modulo 25, plus 0x61) make sure that we generate a lowercase letter, here in the range 'a' to 'y'. The output of this program looks like this:
[+] Gen domain based on month=12 day=2 hour=4 min=4
[+] Generated domain = taavtaab.com
[+] Gen domain based on month=3 day=10 hour=11 min=36
[+] Generated domain = kugxfkvx.com
[+] Gen domain based on month=2 day=27 hour=4 min=1
[+] Generated domain = kaasuapn.com
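The demo loop feeds random values into the generator. A minimal sketch of the date-seeded variant the text describes could look like the following. It assumes the malware reads the clock in UTC, and the helper name domain_for is made up for illustration. The point it shows is that the same timestamp always yields the same domain, which is what lets the malware and its operator agree on a domain without communicating.

```python
"""Sketch: seed the example DGA with the current time instead of random values."""
from datetime import datetime, timezone


def gen_domain(month, day, hour, minute):
    """Same arithmetic as the example above: 8 lowercase letters plus '.com'."""
    domain = ""
    for _ in range(8):
        month = (month * 8) ^ 0xF
        day = (day * 8) ^ 0xF
        hour = (hour * 8) ^ 0xF
        minute = (minute * 8) ^ 0xF
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


def domain_for(moment):
    """Hypothetical helper: derive the domain from a datetime object."""
    return gen_domain(moment.month, moment.day, moment.hour, moment.minute)


if __name__ == "__main__":
    # Assumption: the clock is read in UTC so all infected hosts agree.
    now = datetime.now(timezone.utc)
    print(f"[+] Domain for {now:%m-%d %H:%M} UTC = {domain_for(now)}")
```

Calling gen_domain(12, 2, 4, 4) reproduces the first domain from the sample output, taavtaab.com.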
Seed-based or Dictionary-based
There are two main approaches to implementing a domain generation algorithm: seed-based and dictionary-based. To keep things simple, we will not cover the hybrid approach.
Seed-based Approach
We already introduced the first one: our implementation is an algorithm that takes a seed as its input. Another example is how APT34 used such a seed-based algorithm in a campaign targeting a government organisation in the Middle East. The campaign was discovered by FireEye[2].
The APT group used a domain generation algorithm in one of its downloaders. The downloader was named BONDUPDATER by FireEye and is implemented in the PowerShell scripting language.
The script extracts the first 12 characters of the UUID and then enters a loop. In each iteration it generates a new random number and builds a domain by concatenating hardcoded and generated values. GetHostAddresses then tries to resolve the generated domain. If resolution fails, a new iteration starts. Once a generated domain is registered and resolves, the loop breaks.
Depending on the resolved IP address, the script triggers different actions.
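The loop described above can be sketched in Python as follows. This is not BONDUPDATER's actual code: the base domain, the range of the random number, the way the parts are concatenated, and the attempt limit are all assumptions for illustration. uuid.uuid4() stands in for whatever UUID the real script reads, and socket.gethostbyname plays the role of GetHostAddresses. The IP-dependent actions are left out.

```python
"""Sketch of a resolve-until-success loop (not BONDUPDATER's actual code)."""
import random
import socket
import uuid

# Hypothetical values, chosen for illustration only.
BASE_DOMAIN = "example.com"
MAX_ATTEMPTS = 10


def host_identifier():
    """First 12 characters of a UUID (a random one stands in for the real UUID)."""
    return uuid.uuid4().hex[:12]


def find_live_domain(host_id, resolve=socket.gethostbyname, rng=random,
                     max_attempts=MAX_ATTEMPTS):
    """Generate candidate domains until one resolves; return (domain, ip) or None."""
    for _ in range(max_attempts):
        number = rng.randint(1, 9999)  # the generated part of the domain
        candidate = f"{host_id}{number}.{BASE_DOMAIN}"
        try:
            ip = resolve(candidate)
        except OSError:
            continue  # resolution failed: start a new iteration
        return candidate, ip  # a registered domain was found: leave the loop
    return None
```

Passing the resolver as a parameter keeps the sketch testable without network access. The attempt limit is a choice made for this sketch; the loop described above simply runs until a domain resolves.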
Dictionary-based Approach
The second approach is a dictionary-based domain generation algorithm. Instead of deriving characters from a seed, the algorithm is given one or more lists of words. It randomly selects words from these lists and concatenates them to form a new domain. Suppobox[3] is one malware family that implements the dictionary-based approach[4].
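A minimal sketch of the dictionary-based idea is shown below. The word lists, the TLD and the function name are made up for illustration and are not taken from Suppobox. Seeding the random generator with the date is also an assumption, made so that the same day always produces the same domains.

```python
"""Sketch of a dictionary-based DGA; word lists and TLD are made up."""
import random
from datetime import date

# Hypothetical word lists; real families ship their own lists.
FIRST_WORDS = ["winter", "garden", "silver", "market", "doctor"]
SECOND_WORDS = ["school", "window", "friend", "number", "street"]


def dictionary_domains(day, count=3, tld=".net"):
    """Return `count` word-pair domains derived deterministically from a date."""
    rng = random.Random(day.toordinal())  # assumption: the date acts as the seed
    return [
        rng.choice(FIRST_WORDS) + rng.choice(SECOND_WORDS) + tld
        for _ in range(count)
    ]


if __name__ == "__main__":
    for domain in dictionary_domains(date.today()):
        print(f"[+] Generated domain = {domain}")
```

Domains built this way consist of ordinary words, so they look less random than the output of the seed-based example.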
Defeating Domain Generation Algorithms
The straightforward way to counter these algorithms is to reverse engineer the routine and predict future domains. One famous case of predicting future domains is the takedown of the Necurs botnet by Microsoft[5]. By understanding the DGA, they were able to predict the domains for the next 25 months.
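To illustrate the idea with the example DGA from earlier (not with the Necurs algorithm), the sketch below enumerates the domains it would produce on upcoming days. It assumes the malware evaluates the algorithm at one fixed time of day; a full enumeration would also iterate over every hour and minute. The function name upcoming_domains is hypothetical.

```python
"""Sketch: enumerate future domains of the example DGA once it is understood."""
from datetime import date, timedelta


def gen_domain(month, day, hour, minute):
    """Same arithmetic as the example DGA: 8 lowercase letters plus '.com'."""
    domain = ""
    for _ in range(8):
        month = (month * 8) ^ 0xF
        day = (day * 8) ^ 0xF
        hour = (hour * 8) ^ 0xF
        minute = (minute * 8) ^ 0xF
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


def upcoming_domains(start, days, hour=0, minute=0):
    """Return {date: domain} for `days` consecutive days, beginning at `start`."""
    schedule = {}
    for offset in range(days):
        current = start + timedelta(days=offset)
        schedule[current] = gen_domain(current.month, current.day, hour, minute)
    return schedule


if __name__ == "__main__":
    for day, domain in upcoming_domains(date.today(), 7).items():
        print(f"{day.isoformat()}  {domain}")
```

A list like this can be blocked or registered ahead of time. Because the example seed contains no year, its schedule repeats every year.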
I am no ML magician. However, a quick Google search shows that a lot of research is going on in this area. Machine-learning-based approaches to countering DGAs seem promising too.
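As a small taste of what such approaches work with, the sketch below computes one feature that is commonly mentioned in this context: the character entropy of a domain label. This is only an illustration under that assumption. It is a single hand-picked feature, not a classifier, and the function name is made up.

```python
"""Sketch: Shannon entropy of a domain label, a possible input feature for ML."""
import math
from collections import Counter


def label_entropy(domain):
    """Shannon entropy (bits per character) of the left-most label of a domain."""
    label = domain.split(".")[0]
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for name in ("taavtaab.com", "kugxfkvx.com", "kaasuapn.com"):
        print(f"{name}: {label_entropy(name):.2f} bits per character")
```

A real detector would combine many such features, or learn them directly from labelled domain lists.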