Reverse Engineering Malware Using Ghidra

v20260601

reverse-engineering-malware-with-ghidra

A deep dive into advanced malware reverse engineering techniques. Learn how to utilize Ghidra's powerful disassembler and decompiler to analyze complex malware binaries. This guide covers identifying C2 communication protocols, extracting cryptographic routines, understanding memory injection, and analyzing low-level assembly code for core logic and evasion techniques. Essential for professional cybersecurity analysts.

Malware Reverse Engineering Ghidra Disassembly Decompilation Cybersecurity Binary Analysis C2

Overview

Reverse Engineering Malware with Ghidra

When to Use

Static and dynamic analysis have identified suspicious functionality that requires deeper code-level understanding
You need to reverse engineer C2 communication protocols, encryption algorithms, or custom obfuscation
You need to understand the exact exploit mechanism or vulnerability targeted by a malware sample
You need to extract hardcoded configuration data (C2 addresses, encryption keys, campaign IDs) embedded in compiled code
You are developing precise YARA rules or detection signatures based on unique code patterns

Do not use for initial triage of unknown samples; perform static analysis with PEStudio and behavioral analysis with Cuckoo first.

Prerequisites

Ghidra 11.x installed (download from https://ghidra-sre.org/) with a matching JDK (JDK 17 for 11.0 and 11.1; JDK 21 for 11.2 and later)
Analysis VM isolated from production network (Windows or Linux host)
Familiarity with x86/x64 assembly language and Windows API conventions
PDB symbol files for Windows system DLLs to improve decompilation accuracy
Ghidra scripts repository (ghidra_scripts) for automated analysis tasks
Secondary reference: IDA Free or Binary Ninja for cross-validation of analysis results

Workflow

Step 1: Create Project and Import Binary

Set up a Ghidra project and import the malware sample:

1. Launch Ghidra: ghidraRun (Linux) or ghidraRun.bat (Windows)
2. File -> New Project -> Non-Shared Project -> Select directory
3. File -> Import File -> Select malware binary
4. Ghidra auto-detects format (PE, ELF, Mach-O) and architecture
5. Accept default import options (or specify base address if known)
6. Double-click imported file to open in CodeBrowser
7. When prompted, run Auto Analysis with default analyzers enabled
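
Before importing, it can help to confirm the container format yourself. The following is a minimal sketch of magic-byte detection for the three formats named above; it is not how Ghidra's loaders work internally, and the function name guess_format is hypothetical.

```python
import struct

# Mach-O magic values as they appear when the first four bytes are read big-endian.
# Note: 0xCAFEBABE is also the magic of Java class files, so treat that hit as a hint only.
MACHO_THIN = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}
MACHO_FAT = 0xCAFEBABE


def guess_format(data):
    """Return a coarse format label based on the leading bytes of a file."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # The DOS header field e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
        if len(data) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
            if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return "PE"
        return "MZ (no PE signature found)"
    if len(data) >= 4:
        (magic,) = struct.unpack_from(">I", data, 0)
        if magic in MACHO_THIN:
            return "Mach-O"
        if magic == MACHO_FAT:
            return "Mach-O (universal)"
    return "unknown"
```

A result of "unknown" or "MZ (no PE signature found)" on a sample that behavioral analysis showed to be executable is a hint that the file is wrapped, truncated, or a raw shellcode blob that needs a manually specified base address and language.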

Headless analysis for automation:

# Run Ghidra headless auto-analysis, then a post-analysis script
# ExportDecompilation.py is assumed to be your own script stored in the -scriptPath directory
/opt/ghidra/support/analyzeHeadless /tmp/ghidra_project MalwareProject \
  -import suspect.exe \
  -scriptPath /opt/ghidra/scripts/ \
  -postScript ExportDecompilation.py \
  -deleteProject
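
For batch processing, the same invocation can be assembled from Python. This is a sketch only: the helper names build_headless_cmd and run_headless are hypothetical, and the paths are the example values from the command above, not defaults of any tool.

```python
import subprocess


def build_headless_cmd(analyze_headless, project_dir, project_name,
                       sample, post_script, script_path, delete_project=True):
    """Build the analyzeHeadless argument list for one sample."""
    cmd = [
        analyze_headless, project_dir, project_name,
        "-import", sample,
        "-scriptPath", script_path,
        "-postScript", post_script,
    ]
    if delete_project:
        # Discard the temporary project once the script has finished.
        cmd.append("-deleteProject")
    return cmd


def run_headless(cmd, timeout_s=1800):
    """Run the command and return the CompletedProcess; the timeout is an arbitrary choice."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_s, check=False)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with sample names that contain spaces or shell metacharacters, which is common for files pulled from a malware repository.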

Step 2: Identify Key Functions and Entry Points

Navigate the binary to locate critical code sections:

Navigation Strategy:
━━━━━━━━━━━━━━━━━━━
1. Start at entry point (OEP) - follow execution from _start/WinMain
2. Check Symbol Tree for imported functions (Window -> Symbol Tree)
3. Search for cross-references to suspicious APIs:
   - VirtualAlloc/VirtualAllocEx (memory allocation for injection)
   - CreateRemoteThread (remote thread injection)
   - CryptEncrypt/CryptDecrypt (encryption operations)
   - InternetOpen/HttpSendRequest (C2 communication)
   - RegSetValueEx (persistence via registry)
4. Use Search -> For Strings to find embedded URLs, IPs, and paths
5. Check the Functions window sorted by size (large functions often contain core logic)
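
The string and API checks in this strategy can be approximated outside Ghidra for a quick cross-check. The sketch below scans raw bytes for printable ASCII runs and flags the API names listed above plus URL and IPv4 candidates. It assumes ASCII strings only (no UTF-16) and exact name matches, and the function name triage_strings is hypothetical.

```python
import re

# API names and rationale taken from the list above.
SUSPICIOUS_APIS = {
    "VirtualAlloc": "memory allocation for injection",
    "VirtualAllocEx": "memory allocation for injection",
    "CreateRemoteThread": "remote thread injection",
    "CryptEncrypt": "encryption operations",
    "CryptDecrypt": "encryption operations",
    "InternetOpen": "C2 communication",
    "HttpSendRequest": "C2 communication",
    "RegSetValueEx": "persistence via registry",
}

ASCII_RUN = re.compile(rb"[\x20-\x7e]{5,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def triage_strings(data):
    """Return API, URL, and IPv4 hits found in printable ASCII runs of data."""
    hits = {"apis": [], "urls": [], "ips": []}
    for match in ASCII_RUN.finditer(data):
        text = match.group().decode("ascii")
        # Also try the name without its A/W suffix (InternetOpenA, InternetOpenW).
        base = text[:-1] if text[-1:] in ("A", "W") else text
        for name in (text, base):
            if name in SUSPICIOUS_APIS:
                hits["apis"].append((text, SUSPICIOUS_APIS[name]))
                break
        hits["urls"].extend(URL_RE.findall(text))
        hits["ips"].extend(
            ip for ip in IPV4_RE.findall(text)
            if all(int(part) <= 255 for part in ip.split("."))
        )
    return hits
```

An empty result does not mean the sample is clean: packed or string-encrypted samples expose none of these until they are unpacked or the decryption routine has been run.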

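Ghidra default key bindings for efficient navigation (check Edit -> Tool Options -> Key Bindings if yours differ; Search -> For Strings is reached from the menu):

G            - Go to address
Ctrl+E       - Open the Decompiler window
S            - Search memory for byte patterns or strings
Ctrl+Shift+E - Search program text
Ctrl+Shift+F - Show references to the current location
L            - Rename label/function
;            - Add comment
T            - Choose data type (Listing)
Ctrl+L       - Retype variable or return value (Decompiler)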

Step 3: Analyze Decompiled Code

Use Ghidra's decompiler to understand function logic:

// Example: Ghidra decompiler output for a decryption routine
// Analyst renames variables and adds types for clarity

void decrypt_config(BYTE *encrypted_data, int data_len, BYTE *key, int key_len) {
    // XOR decryption with rolling key
    for (int i = 0; i < data_len; i++) {
        encrypted_data[i] = encrypted_data[i] ^ key[i % key_len];
    }
    return;
}

// Analyst actions in Ghidra:
// 1. Right-click parameters -> Retype to correct types (BYTE*, int)
// 2. Right-click variables -> Rename to meaningful names
// 3. Add comments explaining the algorithm
// 4. Set function signature to propagate types to callers
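
Once the routine is understood, re-implementing it outside Ghidra is a cheap way to confirm the analysis against bytes copied from the sample. A minimal Python sketch of the same rolling-key XOR follows; the helper names are hypothetical, and the known-plaintext helper assumes you can guess part of the decrypted config (for example a leading "http://").

```python
def xor_rolling(data, key):
    """Rolling-key XOR, mirroring decrypt_config(): out[i] = data[i] ^ key[i % len(key)]."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_key_prefix(ciphertext, known_plaintext):
    """XOR ciphertext with guessed plaintext to expose the leading key bytes."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))
```

Because XOR is its own inverse, the same function both encrypts and decrypts, so running it twice with the extracted key should return the original bytes.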

Step 4: Trace C2 Communication Logic

Follow the network communication code path:

Analysis Steps for C2 Protocol Reverse Engineering:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Find InternetOpenA/WinHttpOpen call -> trace to wrapper function
2. Follow data flow from encrypted config -> URL construction
3. Identify HTTP method (GET/POST), headers, and body format
4. Locate response parsing logic (JSON parsing, custom binary protocol)
5. Map the C2 command dispatcher (switch/case or jump table)
6. Document the command set (download, execute, exfiltrate, update, uninstall)
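
Steps 5 and 6 are easier to keep consistent if the recovered command set lives in one small table that decoders and reports can share. The sketch below uses the illustrative command codes from the sample report in the Output Format section; the enum member names are hypothetical, and a real sample will have its own values.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command IDs recovered from the dispatcher's switch/jump table (illustrative values)."""
    DOWNLOAD_EXECUTE = 0x01
    SHELL = 0x02
    UPLOAD = 0x03
    UPDATE_CONFIG = 0x04
    UNINSTALL = 0x05


def describe_command(code):
    """Return the documented name for a command ID, or a placeholder for unmapped IDs."""
    try:
        return Command(code).name
    except ValueError:
        return "UNKNOWN_0x{:02X}".format(code)
```

Returning a placeholder instead of raising keeps a traffic decoder running when it meets a command ID whose handler has not been reversed yet, and the placeholder itself marks where analysis is still incomplete.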

Ghidra Script for extracting C2 configuration:

# Ghidra Python (Jython) script: extract_c2_config.py
# Run via Script Manager in Ghidra
# Lists code locations that reach InternetOpenA/InternetOpenW, a common
# starting point for locating C2 setup code.

symbol_table = currentProgram.getSymbolTable()

for symbol in symbol_table.getExternalSymbols():
    if "InternetOpen" not in symbol.getName():
        continue
    # References to an external symbol often come from its import table slot
    # or a thunk rather than from the call itself, so follow one more level.
    for ref in getReferencesTo(symbol.getAddress()):
        source = ref.getFromAddress()
        print("{} referenced from: {}".format(symbol.getName(), source))
        for caller in getReferencesTo(source):
            print("  C2 init candidate at: {}".format(caller.getFromAddress()))

Step 5: Analyze Encryption and Obfuscation

Identify and document cryptographic routines:

Common Malware Encryption Patterns:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
XOR Cipher:     Loop with XOR operation, often single-byte or rolling key
RC4:            Two loops (KSA + PRGA), 256-byte S-box initialization
AES:            Look for S-box constants (0x63, 0x7C, 0x77...) or calls to CryptEncrypt
Base64:         64-character lookup table (A-Za-z0-9+/), with = used as padding
Custom:         Combination of arithmetic operations (ADD, SUB, ROL, ROR with XOR)

Identification Tips:
- Search for constants: AES S-box, CRC32 table, MD5 init values
- Look for loop structures operating on byte arrays
- Check for Windows Crypto API usage (CryptAcquireContext -> CryptCreateHash -> CryptHashData -> CryptDeriveKey -> CryptEncrypt)
- FindCrypt Ghidra plugin automatically identifies crypto constants
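
The constant search in these tips can be sketched in a few lines. This is a simplified illustration of the idea, not the FindCrypt implementation: it looks only for contiguous little-endian tables, so it will miss constants that the compiler embedded as instruction immediates. The function name find_crypto_constants is hypothetical.

```python
import struct

SIGNATURES = {
    # First row of the AES forward S-box.
    "AES S-box": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    # MD5 initial state; SHA-1 starts with the same four words.
    "MD5 init values": struct.pack("<4I", 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476),
    # First four entries of the standard CRC32 lookup table.
    "CRC32 table": struct.pack("<4I", 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA),
    "Base64 alphabet": b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
}


def find_crypto_constants(data):
    """Return sorted (offset, name) pairs for every signature occurrence in data."""
    hits = []
    for name, needle in SIGNATURES.items():
        offset = data.find(needle)
        while offset != -1:
            hits.append((offset, name))
            offset = data.find(needle, offset + 1)
    return sorted(hits)
```

Each reported offset is a file offset; convert it to a virtual address using the section table before jumping to it in Ghidra, then follow cross-references from the table to find the routine that uses it.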

Step 6: Document Findings and Create Detection Signatures

Produce actionable intelligence from reverse engineering:

# Generate YARA rule from unique code patterns found in Ghidra
cat << 'EOF' > malware_family_x.yar
rule MalwareFamilyX_Decryptor {
    meta:
        description = "Detects MalwareX decryption routine"
        author = "analyst"
        date = "2025-09-15"
    strings:
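        // XOR decryption loop: mov al,[esi+ecx]; xor al,[edi+ecx]; mov [esi+ecx],al; inc ecx; cmp ecx,edx; jl <loop start>
        $decrypt = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F2 }
        // C2 URL path; if the config is encrypted on disk, this matches only unpacked samples or memory dumps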
        $c2_pattern = "/gate.php?id=" ascii
    condition:
        uint16(0) == 0x5A4D and $decrypt and $c2_pattern
}
EOF
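
Bytes copied from the Listing view usually need wildcards before they make a durable signature, because relative call and jump displacements and absolute addresses change between builds. The following sketch turns raw bytes into a YARA hex string with selected offsets wildcarded; the helper name to_yara_hex is hypothetical, and choosing which offsets to mask remains the analyst's judgment.

```python
def to_yara_hex(code, wildcard_offsets=()):
    """Format bytes as a YARA hex string, replacing the given offsets with ??."""
    wild = set(wildcard_offsets)
    tokens = ["??" if i in wild else "{:02X}".format(b) for i, b in enumerate(code)]
    return "{ " + " ".join(tokens) + " }"


# Example: a 5-byte relative call (opcode E8 + rel32) with the displacement masked.
call_bytes = bytes.fromhex("e810203040")
print(to_yara_hex(call_bytes, wildcard_offsets=range(1, 5)))
```

Masking too much makes the pattern match unrelated code, so it is worth testing the resulting rule against a set of known-clean binaries before deploying it.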

Key Concepts

Term	Definition
Disassembly	Converting machine code bytes into human-readable assembly language instructions; Ghidra's Listing view shows disassembled code
Decompilation	Lifting assembly code to pseudo-C representation for easier analysis; Ghidra's Decompile window provides this view
Cross-Reference (XREF)	Reference showing where a function or data address is called from or used; essential for tracing code execution flow
Control Flow Graph (CFG)	Visual representation of all possible execution paths through a function; reveals branching logic and loops
Original Entry Point (OEP)	The actual start address of the malware code after unpacking; packers redirect execution through an unpacking stub first
Function Signature	The return type, name, and parameter types of a function; applying correct signatures improves decompiler output quality
Ghidra Script	Python or Java automation script executed within Ghidra to perform batch analysis, pattern searching, or data extraction

Tools & Systems

Ghidra: NSA's open-source software reverse engineering suite with disassembler, decompiler, and scripting support for multiple architectures
IDA Pro/Free: Industry-standard interactive disassembler; IDA Free provides x86/x64 cloud-based decompilation
Binary Ninja: Commercial reverse engineering platform with modern UI and extensive API for plugin development
x64dbg: Open-source x64/x32 debugger for Windows used alongside Ghidra for dynamic debugging of malware
FindCrypt (Ghidra Plugin): Plugin that identifies cryptographic constants and algorithms in binary code

Common Scenarios

Scenario: Reversing Custom C2 Protocol

Context: Behavioral analysis shows encrypted traffic to an external IP on a non-standard port. Network signatures cannot detect variants because the protocol is proprietary. Deep reverse engineering is needed to understand the protocol structure.

Approach:

Import the unpacked sample into Ghidra and run full auto-analysis
Locate socket/WinHTTP API calls and trace backwards to the calling function
Identify the encryption routine called before data is sent (follow data flow from send/HttpSendRequest)
Reverse the encryption (XOR key extraction, RC4 key derivation, AES key location)
Map the command structure by analyzing the response parsing function (switch/case on command IDs)
Document the protocol format (header structure, command bytes, encryption method)
Create a protocol decoder script for network monitoring tools
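
The final step, a protocol decoder, might start from something like the sketch below. The frame layout (4-byte magic, 1-byte command ID, 4-byte little-endian payload length) and the magic value MWX1 are purely hypothetical placeholders; substitute the header structure you actually documented in the previous step.

```python
import struct

# Hypothetical header: magic (4 bytes), command ID (1 byte), payload length (uint32, little-endian).
HEADER = struct.Struct("<4sBI")


def parse_frames(stream, magic=b"MWX1"):
    """Split a reassembled byte stream into (command_id, payload) tuples."""
    frames = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        found_magic, command, length = HEADER.unpack_from(stream, offset)
        if found_magic != magic:
            raise ValueError("bad magic at offset {}".format(offset))
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # Truncated frame: wait for more data.
        frames.append((command, stream[start:end]))
        offset = end
    return frames
```

Payloads returned here are still encrypted if the sample encrypts above the framing layer, so the decoder would pass each one through the reversed cipher before interpreting the command arguments.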

Pitfalls:

Not running the full auto-analysis before starting manual analysis (missing function boundaries and type propagation)
Ignoring indirect calls through function pointers or vtables (use cross-references to data holding function addresses)
Spending time on library code that signature matching should have identified (Ghidra's Function ID, or FLIRT when cross-checking in IDA)
Not saving Ghidra project progress frequently (analysis state can be lost on crashes)

Output Format

REVERSE ENGINEERING ANALYSIS REPORT
=====================================
Sample:           unpacked_payload.exe
SHA-256:          abc123def456...
Architecture:     x86 (32-bit PE)
Ghidra Project:   MalwareX_Analysis

FUNCTION MAP
0x00401000  main()              - Entry point, initializes config
0x00401200  decrypt_config()    - XOR decryption with 16-byte key
0x00401400  init_c2()           - WinHTTP initialization, URL construction
0x00401800  c2_beacon()         - HTTP POST beacon with system info
0x00401C00  cmd_dispatcher()    - Switch on 5 command codes (0x01-0x05)
0x00402000  inject_process()    - Process hollowing into svchost.exe
0x00402400  persist_registry()  - HKCU Run key persistence
0x00402800  exfil_data()        - File collection and encrypted upload

C2 PROTOCOL
Method:           HTTPS POST to /gate.php
Encryption:       RC4 with derived key (MD5 of bot_id + campaign_key)
Bot ID Format:    MD5(hostname + username + volume_serial)
Beacon Interval:  60 seconds with 10% jitter
Command Set:
  0x01 - Download and execute file
  0x02 - Execute shell command
  0x03 - Upload file to C2
  0x04 - Update configuration
  0x05 - Uninstall and remove traces

ENCRYPTION DETAILS
Algorithm:        RC4
Key Derivation:   MD5(bot_id + "campaign_2025_q3")
Hardcoded Seed:   "campaign_2025_q3" at offset 0x00405A00

EXTRACTED IOCs
C2 URLs:          hxxps://update.malicious[.]com/gate.php
                  hxxps://backup.evil[.]net/gate.php (failover)
Campaign ID:      campaign_2025_q3
RC4 Key Material: [see encryption details above]
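
A report like this is more useful when the stated scheme has been reproduced outside the sample. The sketch below implements standard RC4 and the key derivation described in the sample report. The encoding details are assumptions: it treats bot_id as an ASCII string and uses the raw 16-byte MD5 digest as the RC4 key, which the report does not specify.

```python
import hashlib


def rc4(key, data):
    """Standard RC4: key scheduling (KSA) followed by keystream generation (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # KSA: the 256-byte S-box initialization loop
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # PRGA: one keystream byte per input byte
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def derive_key(bot_id, seed="campaign_2025_q3"):
    """MD5(bot_id + seed) as raw bytes (assumed encoding; verify against the sample)."""
    return hashlib.md5((bot_id + seed).encode("ascii")).digest()
```

If traffic captured during behavioral analysis does not decrypt to something structured with this key, the likely culprits are the assumed encodings (hex string versus raw digest) or a modified RC4 variant, both of which are worth re-checking in the decompiler.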

Info

Category Engineering

Size 13.78KB

Source mukul975/Anthropic-Cybersecurity-Skills

Updated At 2026-06-03