Do not use as the sole analysis method; YARA triage identifies known patterns but does not reveal new or unknown malware behaviors.
apt install yara or pip install yara-python)yara-python for scripted scanningApply community and commercial YARA rules to classify samples:
# Scan a single file
yara -s malware_rules.yar suspect.exe
# Scan a directory of samples
yara -r malware_rules.yar /path/to/samples/
# Scan with multiple rule files
yara -r rules/apt_rules.yar rules/ransomware_rules.yar rules/trojan_rules.yar suspect.exe
# Scan with timeout (prevent hanging on large files)
yara -t 30 malware_rules.yar suspect.exe
# Scan and show matching strings
yara -s -r malware_rules.yar suspect.exe
# Scan with compiled rules (faster for repeated scans)
yarac malware_rules.yar compiled_rules.yarc
yara compiled_rules.yarc suspect.exe
# Download community rule sets
git clone https://github.com/Yara-Rules/rules.git yara-community-rules
git clone https://github.com/Neo23x0/signature-base.git signature-base
# Scan with signature-base
yara -r signature-base/yara/*.yar suspect.exe
Create YARA rules based on strings extracted during malware analysis:
rule MalwareX_Strings {
meta:
description = "Detects MalwareX based on unique strings"
author = "analyst"
date = "2025-09-15"
reference = "Internal Analysis Report #1547"
hash = "e3b0c44298fc1c149afbf4c8996fb924"
tlp = "WHITE"
strings:
// C2 URL pattern
$url1 = "/gate.php?id=" ascii
$url2 = "/panel/connect.php" ascii
// Unique mutex name
$mutex = "Global\\CryptLocker_2025" ascii wide
// User-Agent string
$ua = "Mozilla/5.0 (compatible; MSIE 10.0)" ascii
// Registry persistence path
$reg = "Software\\Microsoft\\Windows\\CurrentVersion\\Run\\WindowsUpdate" ascii
// Campaign identifier
$campaign = "campaign_2025_q3" ascii
condition:
uint16(0) == 0x5A4D and // PE file (MZ header)
filesize < 500KB and // Size constraint
($url1 or $url2) and // At least one C2 URL
($mutex or $campaign) and // Campaign identifier
$ua // Specific User-Agent
}
Create rules matching specific code sequences:
rule MalwareX_Decryptor {
meta:
description = "Detects MalwareX XOR decryption routine"
author = "analyst"
date = "2025-09-15"
strings:
// XOR decryption loop (x86 assembly)
// mov al, [esi+ecx]
// xor al, [edi+ecx]
// mov [esi+ecx], al
// inc ecx
// cmp ecx, edx
// jl loop
$xor_loop = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 }
// RC4 KSA initialization (256-byte loop)
$rc4_ksa = { 33 C0 88 04 ?8 40 3D 00 01 00 00 7? }
// Embedded RSA public key marker
$rsa_key = { 06 02 00 00 00 A4 00 00 52 53 41 31 } // PUBLICKEYBLOB
condition:
uint16(0) == 0x5A4D and
($xor_loop or $rc4_ksa) and
$rsa_key
}
Leverage YARA's PE module for structural detection:
import "pe"
import "hash"
import "math"
rule MalwareX_PE_Characteristics {
meta:
description = "Detects MalwareX by PE structure and imports"
author = "analyst"
condition:
pe.is_pe and
// Compiled within specific timeframe
pe.timestamp > 1693526400 and // After 2023-09-01
pe.timestamp < 1727740800 and // Before 2024-10-01
// Specific import hash
pe.imphash() == "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or
// Suspicious import combination
(
pe.imports("kernel32.dll", "VirtualAllocEx") and
pe.imports("kernel32.dll", "WriteProcessMemory") and
pe.imports("kernel32.dll", "CreateRemoteThread") and
pe.imports("wininet.dll", "InternetOpenA")
) or
// High entropy .text section (packed)
(
for any section in pe.sections : (
section.name == ".text" and
math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
)
)
}
rule MalwareX_Rich_Header {
meta:
description = "Detects MalwareX by Rich header hash"
condition:
pe.is_pe and
hash.md5(pe.rich_signature.clear_data) == "abc123def456abc123def456abc123de"
}
Automate scanning of sample collections:
import yara
import os
import json
import hashlib
from datetime import datetime
# Compile all rule files
rule_files = {
"apt": "rules/apt_rules.yar",
"ransomware": "rules/ransomware_rules.yar",
"trojan": "rules/trojan_rules.yar",
"custom": "rules/custom_rules.yar",
}
rules = yara.compile(filepaths=rule_files)
# Scan sample directory
results = []
sample_dir = "/path/to/samples"
for filename in os.listdir(sample_dir):
filepath = os.path.join(sample_dir, filename)
if not os.path.isfile(filepath):
continue
with open(filepath, "rb") as f:
data = f.read()
sha256 = hashlib.sha256(data).hexdigest()
matches = rules.match(filepath)
result = {
"filename": filename,
"sha256": sha256,
"size": len(data),
"matches": [],
"classification": "UNKNOWN",
}
for match in matches:
result["matches"].append({
"rule": match.rule,
"namespace": match.namespace,
"tags": match.tags,
"strings": [(hex(s[0]), s[1], s[2].decode("utf-8", errors="replace")[:100])
for s in match.strings] if match.strings else []
})
if result["matches"]:
result["classification"] = result["matches"][0]["namespace"].upper()
results.append(result)
# Summary
classified = sum(1 for r in results if r["classification"] != "UNKNOWN")
print(f"Scanned: {len(results)} samples")
print(f"Classified: {classified} ({classified/len(results)*100:.1f}%)")
print(f"Unknown: {len(results)-classified}")
# Export results
with open("triage_results.json", "w") as f:
json.dump(results, f, indent=2)
Test rules for false positives and performance:
# Test rule syntax
yara -C custom_rules.yar
# Scan known-clean directory to check false positives
yara -r custom_rules.yar /path/to/clean_files/ > false_positives.txt
wc -l false_positives.txt
# Benchmark rule performance
time yara -r custom_rules.yar /path/to/large_sample_collection/
# Profile individual rule performance
yara -p custom_rules.yar suspect.exe
| Term | Definition |
|---|---|
| YARA Rule | Pattern matching rule defining strings, byte sequences, and conditions that identify a specific file or malware family |
| Condition | Boolean expression combining string matches, file properties, and module functions to determine if a rule matches |
| Hex String | Byte pattern with optional wildcards (??) and jumps ([N-M]) for matching machine code or binary data |
| PE Module | YARA module providing access to PE file properties (imports, sections, timestamps, resources) for structural matching |
| Imphash | MD5 hash of a PE file's import table; samples from the same family often share import hashes |
| Rich Header | Undocumented PE structure containing compiler/linker metadata; consistent within malware build environments |
| YARA-C | Compiled YARA rule format enabling faster scanning by pre-compiling rules for repeated use |
Context: Reverse engineering of a new malware sample has identified unique strings, byte patterns, and PE characteristics. YARA rules are needed for enterprise-wide hunting and ongoing detection.
Approach:
Pitfalls:
YARA TRIAGE RESULTS
=====================
Scan Date: 2025-09-15
Rule Sets: apt_rules (847 rules), ransomware_rules (312 rules),
trojan_rules (1,204 rules), custom_rules (45 rules)
Samples Scanned: 2,500
Processing Time: 47 seconds
CLASSIFICATION SUMMARY
APT: 12 samples (0.5%)
Ransomware: 187 samples (7.5%)
Trojan: 423 samples (16.9%)
Unknown: 1,878 samples (75.1%)
TOP MATCHING RULES
Rule Matches Family
MalwareX_C2_Beacon 45 MalwareX
LockBit3_Ransom_Note 38 LockBit 3.0
Emotet_Epoch5_Loader 32 Emotet
CobaltStrike_Beacon_Config 28 Cobalt Strike
QakBot_DLL_Loader 25 QakBot
SAMPLE DETAIL
File: suspect.exe
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
Matches:
[1] MalwareX_Strings (custom)
- $url1 at 0x4A20: "/gate.php?id="
- $mutex at 0x5100: "Global\\CryptLocker_2025"
[2] MalwareX_Decryptor (custom)
- $xor_loop at 0x401200: { 8A 04 0E 32 04 0F ... }
[3] MalwareX_PE_Characteristics (custom)
- PE import combination matched
Classification: MALWAREX (HIGH CONFIDENCE)