Smash is a high-performance CLI tool for detecting duplicate files — fast. It works by slicing files or blobs into segments and hashing them with blazing-fast, non-cryptographic algorithms like xxhash or murmur3.
Built for speed and scale, smash is ideal for everything from low-bandwidth deduplication to analysing multi-terabyte datasets.
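To make the slicing idea concrete, here is a minimal shell sketch. It is not smash's actual implementation or slicing strategy: the names `SLICE_BYTES` and `slice_fingerprint` are hypothetical, and the POSIX `cksum` CRC stands in for xxhash or murmur3, which are not part of standard shell tooling. It fingerprints a file from its size plus checksums of its first and last slices, so large files are never read in full.

```shell
# Illustrative sketch only; not how smash itself is implemented.
# SLICE_BYTES and slice_fingerprint are hypothetical names.
SLICE_BYTES=8192

slice_fingerprint() {
    file=$1
    # File size in bytes (strip the padding some wc implementations add).
    size=$(wc -c < "$file" | tr -d '[:space:]')
    # Checksum only the first and last slices instead of the whole file.
    # cksum (CRC) is a stand-in for a fast non-cryptographic hash.
    head_sum=$(head -c "$SLICE_BYTES" "$file" | cksum | cut -d ' ' -f 1)
    tail_sum=$(tail -c "$SLICE_BYTES" "$file" | cksum | cut -d ' ' -f 1)
    printf '%s:%s:%s\n' "$size" "$head_sum" "$tail_sum"
}

# Usage: files that print the same fingerprint are duplicate candidates.
# slice_fingerprint ./some-file
```

Because this sketch only samples two slices, files that differ solely in the middle would still produce matching fingerprints, so matches here should be treated as candidates to verify rather than confirmed duplicates.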
Key Features
- Fast: Handles large files quickly via slicing
- Efficient: Optimised for low I/O and bandwidth-constrained environments
- Smart hashing: Supports multiple algorithms like xxhash, murmur3, and more
- Safe: Performs read-only scans of the filesystem
- Comprehensive: Detects duplicate and empty (0-byte) files
- Machine-friendly: JSON output compatible with tools like jq (examples, demos)
- Proven: Used to dedupe multi-terabyte astrophysics, image, and video datasets
smash does not delete duplicates. It generates detailed reports for you to safely review and act on.

Find duplicates in the linux/drivers source tree with smash (see our 🍿 other demos). Made with vhs!
The name comes from a prototype tool called SmartHash, written many years ago in C/ASM, whose source is now lost and which would be too hard to modernise. It operated on a similar concept of slicing and hashing (with CRC32, then later MD5).
Installation

You can download the latest binaries from GitHub Releases or use our simple installer script, which currently supports Linux, macOS, FreeBSD & Windows:
bash <(curl -s https://raw.githubusercontent.com/thushan/smash/main/install.sh)
It will download the latest version & extract it to its own folder for you.
Alternatively, you can install it with Go:
go install github.com/thushan/smash@latest
smash has been developed on Linux (Pop!_OS & Fedora) and tested on macOS, FreeBSD & Windows.
Docker
You can also run smash using Docker. Multi-architecture images (amd64/arm64) are available on GitHub Container Registry:
[!TIP]
Use the -t flag to allocate a pseudo-TTY for better output formatting with Docker.
We use the --rm flag to automatically remove the container after it exits, keeping
your environment clean in these examples.
# Pull the latest image
docker pull ghcr.io/thushan/smash:latest
# Scan current directory
docker run -t --rm -v "$PWD:/data" ghcr.io/thushan/smash:latest -r /data
# Scan with output file (saves to current directory)
docker run -t --rm -v "$PWD:/data" ghcr.io/thushan/smash:latest -r --silent -o /data/report.json /data
# Use the built-in /output directory (container includes a writable /output)
docker run -t --rm -v "$PWD:/data" -v "$PWD/output:/output" ghcr.io/thushan/smash:latest \
-r --silent -o /output/report.json /data
# Or create your own output directory
mkdir -p my-reports
docker run -t --rm -v "$PWD:/data" -v "$PWD/my-reports:/output" ghcr.io/thushan/smash:latest \
-r --silent -o /output/report.json /data
# Scan multiple directories with output
docker run -t --rm \
-v "$HOME/Documents:/docs:ro" \
-v "$HOME/Pictures:/pics:ro" \
-v "$PWD/output:/output" \
ghcr.io/thushan/smash:latest -r -o /output/report.json /docs /pics
# Windows PowerShell example
docker run --rm -v "${PWD}:/data" -v "${PWD}/output:/output" ghcr.io/thushan/smash:latest `
-r --silent -o /output/report.json /data
# Use a specific version
docker pull ghcr.io/thushan/smash:v1.0.0
Important notes:
- Output files must be written to mounted volumes (e.g., /data or /output)
- Use :ro for read-only mounts when you only need to scan directories
- The container runs as a non-root user, so ensure output directories are writable
The Docker image is based on Alpine Linux for a minimal footprint (~8MB) and runs as a non-root user for security.
Usage
# Basic usage - scan current directory
smash
# Recursive scan
smash -r
# Scan multiple directories
smash -r ~/Documents ~/Downloads
# Silent mode with report
smash -r --silent -o report.json ~/data
For detailed usage, see the User Guide.
Command Line Options
Key flags:
-r, --recurse - Scan subdirectories (required for recursive scanning)
-o, --output-file - Save results to JSON file
--silent - Suppress all output except errors
--algorithm - Choose hash algorithm (default: xxhash)
--exclude-dir - Skip directories (comma-separated)
--exclude-file - Skip files (comma-separated patterns)
Run smash --help for complete options.
Quick Examples
Find Duplicates
# In photos directory
smash -r ~/photos -o duplicates.json
# Across multiple drives
smash -r ~/Documents /mnt/backup/Documents
# Large video files only
smash -r --min-size=104857600 ~/Videos
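The `--min-size` value above appears to be given in bytes (104857600 is 100 MiB). Assuming that reading is correct, a small helper can make such thresholds easier to read; `mib_to_bytes` is a hypothetical name and not part of smash.

```shell
# Hypothetical helper: convert mebibytes to bytes for byte-based flags.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 100   # prints 104857600

# Possible usage, assuming --min-size takes a byte count:
# smash -r --min-size="$(mib_to_bytes 100)" ~/Videos
```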
Filter and Exclude
# Skip git and node_modules
smash -r --exclude-dir=.git,node_modules ~/projects
# Include empty files
smash -r --ignore-empty=false ~/data
# For network drives
smash -r --max-workers=4 /mnt/nas
# For many small files
smash -r --disable-slicing ~/documents
Working with Reports
# Generate report
smash -r ~/data -o report.json
# List all duplicates
jq -r '.analysis.dupes[].files[].path' report.json
# Show space wasted
jq '.analysis.summary.spaceWasted' report.json
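Since smash only reports duplicates, it can help to review them one group at a time. The sketch below relies solely on the `.analysis.dupes[].files[].path` fields queried above; the function name `list_dupe_groups` is hypothetical, and other parts of the report schema are not assumed.

```shell
# Hypothetical helper: print one line per duplicate group, with that group's
# file paths separated by tabs. Requires jq.
list_dupe_groups() {
    jq -r '.analysis.dupes[] | [.files[].path] | join("\t")' "$1"
}

# Usage:
# list_dupe_groups report.json
```

Each output line is one set of files that smash considers duplicates of each other, which makes it easier to decide which copy to keep before acting on the report.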
See the User Guide for detailed examples and advanced usage.
Contributing
We welcome contributions! Please see our Developer Guide for information on:
- Building from source
- Running tests
- Development workflow
- Docker development
- Release process
Acknowledgements
This project was made possible thanks to the following projects and people.
Testers - MarkB, JarredT, BenW, DencilW, JayT, ASV, TimW, RyanW, WilliamH, SpencerB, EmadA, ChrisE, AngelaB, LisaA, YousefI, JeffG, MattP
License
Copyright (c) Thushan Fernando and licensed under Apache License 2.0