File Carving With Foremost

v20260426

performing-file-carving-with-foremost

Use Foremost to carve files from forensic disk images and unallocated space, configuring headers, targeted scans, and optional Scalpel modes to extract documents, media, and evidence when metadata is missing.

Forensics File-Carving Foremost Data-Recovery Evidence Disk-Image Scalpel Digital-Forensics

Get Skill

263 downloads

Overview

Performing File Carving with Foremost

When to Use

When recovering files from unallocated disk space or corrupted file systems
For extracting evidence from formatted or wiped storage media
When file system metadata is unavailable but raw data sectors contain evidence
During investigations requiring recovery of specific file types from raw images
As a complement to file system-based recovery for maximum evidence extraction

Prerequisites

Foremost installed on forensic workstation
Forensic disk image in raw (dd) format
Sufficient output storage (potentially larger than source)
Custom foremost.conf for specialized file types (optional)
Understanding of file signatures (magic bytes) for target file types
Scalpel as an alternative for performance-critical carving

Workflow

Step 1: Install and Configure Foremost

# Install Foremost
sudo apt-get install foremost

# Verify installation
foremost -V

# Review default configuration
cat /etc/foremost.conf

# The default foremost.conf supports:
# jpg, gif, png, bmp - Image formats
# avi, exe, mpg, wav - Media and executables
# riff, wmv, mov, pdf - Documents and video
# ole (doc/xls/ppt), zip, rar - Office and archives
# htm, cpp, java - Text/code files

# Create custom configuration for additional file types
cp /etc/foremost.conf /cases/case-2024-001/custom_foremost.conf

# Add custom file signatures
cat << 'EOF' >> /cases/case-2024-001/custom_foremost.conf
# Custom additions for investigation
# Format: extension  case_sensitive  max_size  header  footer
    docx    y    10000000    \x50\x4b\x03\x04    \x50\x4b\x05\x06
    xlsx    y    10000000    \x50\x4b\x03\x04    \x50\x4b\x05\x06
    pptx    y    10000000    \x50\x4b\x03\x04    \x50\x4b\x05\x06
    sqlite  y    50000000    \x53\x51\x4c\x69\x74\x65\x20\x66\x6f\x72\x6d\x61\x74
    pst     y    500000000   \x21\x42\x44\x4e
    eml     y    1000000     \x46\x72\x6f\x6d\x3a    \x0d\x0a\x0d\x0a
    evtx    y    50000000    \x45\x6c\x66\x46\x69\x6c\x65
EOF

Step 2: Run Foremost Against the Disk Image

# Basic carving of all supported file types
foremost -t all \
   -i /cases/case-2024-001/images/evidence.dd \
   -o /cases/case-2024-001/carved/foremost_all/

# Carve only specific file types
foremost -t jpg,png,pdf,doc,xls,zip \
   -i /cases/case-2024-001/images/evidence.dd \
   -o /cases/case-2024-001/carved/foremost_targeted/

# Use custom configuration
foremost -c /cases/case-2024-001/custom_foremost.conf \
   -i /cases/case-2024-001/images/evidence.dd \
   -o /cases/case-2024-001/carved/foremost_custom/

# Carve from a specific partition offset
# First, find partitions
mmls /cases/case-2024-001/images/evidence.dd
# Then carve from unallocated space only
# Extract unallocated space with blkls
blkls -o 2048 /cases/case-2024-001/images/evidence.dd \
   > /cases/case-2024-001/unallocated.dd

foremost -t all \
   -i /cases/case-2024-001/unallocated.dd \
   -o /cases/case-2024-001/carved/foremost_unalloc/

# Verbose mode for detailed progress
foremost -v -t all \
   -i /cases/case-2024-001/images/evidence.dd \
   -o /cases/case-2024-001/carved/foremost_verbose/ 2>&1 | \
   tee /cases/case-2024-001/carved/foremost_log.txt

# Indirect mode (process standard input)
dd if=/cases/case-2024-001/images/evidence.dd bs=512 skip=2048 | \
   foremost -t jpg,pdf -o /cases/case-2024-001/carved/foremost_pipe/

Step 3: Use Scalpel for High-Performance Carving

# Install Scalpel (faster alternative based on Foremost)
sudo apt-get install scalpel

# Edit Scalpel configuration (uncomment desired file types)
cp /etc/scalpel/scalpel.conf /cases/case-2024-001/scalpel.conf
# Uncomment lines for target file types in the config

# Run Scalpel
scalpel -c /cases/case-2024-001/scalpel.conf \
   -o /cases/case-2024-001/carved/scalpel/ \
   /cases/case-2024-001/images/evidence.dd

# Scalpel with file size limits
# Edit scalpel.conf to set appropriate max sizes:
# jpg  y  5000000  \xff\xd8\xff  \xff\xd9
# pdf  y  20000000 %PDF  %%EOF

Step 4: Process and Validate Carved Files

# Review Foremost audit report
cat /cases/case-2024-001/carved/foremost_all/audit.txt

# The audit.txt contains:
# - Number of files found per type
# - Start and end offsets
# - File sizes

# Validate carved files
python3 << 'PYEOF'
import os
import subprocess
from collections import defaultdict

carved_dir = '/cases/case-2024-001/carved/foremost_all/'
stats = defaultdict(lambda: {'total': 0, 'valid': 0, 'invalid': 0, 'size': 0})

for subdir in os.listdir(carved_dir):
    subdir_path = os.path.join(carved_dir, subdir)
    if not os.path.isdir(subdir_path) or subdir == 'audit.txt':
        continue

    for filename in os.listdir(subdir_path):
        filepath = os.path.join(subdir_path, filename)
        if not os.path.isfile(filepath):
            continue

        ext = subdir
        filesize = os.path.getsize(filepath)
        stats[ext]['total'] += 1
        stats[ext]['size'] += filesize

        # Validate file using 'file' command
        result = subprocess.run(['file', '--brief', filepath], capture_output=True, text=True)
        file_type = result.stdout.strip()

        if 'data' in file_type.lower() or 'empty' in file_type.lower():
            stats[ext]['invalid'] += 1
        else:
            stats[ext]['valid'] += 1

print("=== CARVED FILE VALIDATION ===\n")
print(f"{'Type':<10} {'Total':<8} {'Valid':<8} {'Invalid':<10} {'Total Size':<15}")
print("-" * 55)
for ext in sorted(stats.keys()):
    s = stats[ext]
    size_mb = s['size'] / (1024*1024)
    print(f"{ext:<10} {s['total']:<8} {s['valid']:<8} {s['invalid']:<10} {size_mb:>10.1f} MB")

# Remove zero-byte files
for subdir in os.listdir(carved_dir):
    subdir_path = os.path.join(carved_dir, subdir)
    if os.path.isdir(subdir_path):
        for filename in os.listdir(subdir_path):
            filepath = os.path.join(subdir_path, filename)
            if os.path.isfile(filepath) and os.path.getsize(filepath) == 0:
                os.remove(filepath)
PYEOF

# Hash all valid carved files
find /cases/case-2024-001/carved/foremost_all/ -type f ! -name "audit.txt" \
   -exec sha256sum {} \; > /cases/case-2024-001/carved/carved_file_hashes.txt

# Check against known-bad hash database
# Check against NSRL known-good database to filter

Step 5: Examine and Catalog Evidence Files

# Extract metadata from carved images (EXIF data including GPS)
exiftool -r -csv /cases/case-2024-001/carved/foremost_all/jpg/ \
   > /cases/case-2024-001/analysis/carved_image_metadata.csv

# Search carved documents for keywords
find /cases/case-2024-001/carved/foremost_all/pdf/ -name "*.pdf" -exec pdftotext {} - \; 2>/dev/null | \
   grep -iE '(confidential|secret|password|account|ssn|credit.card)' \
   > /cases/case-2024-001/analysis/keyword_hits_pdf.txt

# Generate thumbnails for image review
mkdir -p /cases/case-2024-001/carved/thumbnails/
find /cases/case-2024-001/carved/foremost_all/jpg/ -name "*.jpg" -exec \
   convert {} -thumbnail 200x200 /cases/case-2024-001/carved/thumbnails/{} \; 2>/dev/null

# Create evidence catalog
python3 << 'PYEOF'
import os, hashlib, csv, subprocess

catalog = []
carved_dir = '/cases/case-2024-001/carved/foremost_all/'

for subdir in sorted(os.listdir(carved_dir)):
    subdir_path = os.path.join(carved_dir, subdir)
    if not os.path.isdir(subdir_path):
        continue
    for filename in sorted(os.listdir(subdir_path)):
        filepath = os.path.join(subdir_path, filename)
        if not os.path.isfile(filepath):
            continue
        size = os.path.getsize(filepath)
        sha256 = hashlib.sha256(open(filepath, 'rb').read()).hexdigest()
        file_type = subprocess.run(['file', '--brief', filepath], capture_output=True, text=True).stdout.strip()

        catalog.append({
            'filename': filename,
            'type': subdir,
            'size': size,
            'sha256': sha256,
            'file_description': file_type[:100]
        })

with open('/cases/case-2024-001/analysis/carved_file_catalog.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['filename', 'type', 'size', 'sha256', 'file_description'])
    writer.writeheader()
    writer.writerows(catalog)

print(f"Catalog created with {len(catalog)} files")
PYEOF

Key Concepts

Concept	Description
File carving	Recovering files by searching for known header/footer byte sequences in raw data
File signature	Unique byte pattern at the start (header) or end (footer) identifying a file type
Unallocated space	Disk sectors not assigned to any file; primary target for carving
Fragmentation	When file data is stored in non-contiguous sectors, complicating carving
Header-footer carving	Extracting data between known file start and end signatures
False positives	Carved data matching file signatures but containing corrupt or unrelated content
Slack space	Unused bytes at the end of a file's last allocated cluster
Sector alignment	Files typically start at sector boundaries, improving carving accuracy

Tools & Systems

Tool	Purpose
Foremost	Original header-footer file carving tool developed for US Air Force OSI
Scalpel	High-performance file carver with configurable signatures
PhotoRec	Signature-based file recovery supporting 300+ formats
bulk_extractor	Extracts features (emails, URLs, credit cards) from raw data
blkls	Sleuth Kit tool extracting unallocated space from disk images
mmls	Partition table display for identifying carving targets
ExifTool	Metadata extraction from carved image and document files
hashdeep	Recursive hash computation for carved file cataloging

Common Scenarios

Scenario 1: Recovering Deleted Evidence Documents Run Foremost targeting doc, pdf, xlsx formats against the unallocated space extracted with blkls, validate carved documents, search content for case-relevant keywords, catalog and hash all recoverable documents, present as evidence.

Scenario 2: Image Recovery from Formatted Media Carve JPEG, PNG, GIF, BMP from a formatted USB drive image, extract EXIF metadata including GPS coordinates and camera information, generate thumbnails for rapid visual review, identify evidence-relevant images, document recovery chain.

Scenario 3: Email Recovery from Damaged PST Use custom foremost.conf with PST and EML signatures, carve email artifacts from damaged Outlook data file, attempt to open carved PST fragments in a viewer, extract individual EML messages, search for relevant communications.

Scenario 4: Database Recovery for Financial Investigation Configure Foremost to carve SQLite databases from unallocated space, recover application databases that were deleted, query recovered databases for financial records, cross-reference with known transaction data, document findings for prosecution.

Output Format

File Carving Summary:
  Tool: Foremost 1.5.7
  Source: evidence.dd (500 GB)
  Target: Unallocated space (234 GB)
  Duration: 1h 45m

  Files Carved:
    jpg:    2,345 files (1.8 GB) - Valid: 2,100 / Invalid: 245
    png:      234 files (456 MB) - Valid: 210 / Invalid: 24
    pdf:      156 files (890 MB) - Valid: 134 / Invalid: 22
    doc:       89 files (234 MB) - Valid: 67 / Invalid: 22
    xls:       45 files (123 MB) - Valid: 38 / Invalid: 7
    zip:       67 files (567 MB) - Valid: 52 / Invalid: 15
    exe:       34 files (234 MB) - Valid: 30 / Invalid: 4
    sqlite:    12 files (89 MB)  - Valid: 10 / Invalid: 2

  Total Files: 2,982 (3.4 GB recovered)
  Evidence-Relevant: 45 files flagged for review
  Audit Log: /cases/case-2024-001/carved/foremost_all/audit.txt
  File Catalog: /cases/case-2024-001/analysis/carved_file_catalog.csv

Info

Category Development

Name performing-file-carving-with-foremost

Version v20260426

Size 11.5KB

Source mukul975/Anthropic-Cybersecurity-Skills

Updated At 2026-05-10