Pure Go. Parallel workers. Pre-scan fingerprint filtering. Neither yarac nor yara-x can do what yara-g does at 88 workers.
$ curl -sSLf https://github.com/plan10/yara-g/releases/latest/download/yara-g-linux-amd64 | sudo tee /usr/local/bin/yara-g > /dev/null && sudo chmod +x /usr/local/bin/yara-g
$ yara-g --auto-workers rules.yar /corpus/ # 7,489 MB/s on AMD EPYC 7H12, 88 workers
Native -p N parallelism with per-worker buffers, NUMA-aware rule replication, and auto-worker detection. Scales linearly to 32 workers. yarac and yara-x are single-threaded.
Scan network streams, pipes, and partial data without materialising to disk. Neither yarac nor yara-x exposes a streaming API. yara-g achieves 100 MB/s streaming at 1,000 rules.
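To make the streaming idea concrete, here is a minimal sketch of scanning an io.Reader for one literal pattern without buffering the whole input. It is not yara-g's streaming API, which is not shown on this page; scanStream and scanString are hypothetical names, and the approach (carrying the last len(pattern)-1 bytes across chunk boundaries) is just one simple way to catch matches that straddle two reads.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// scanStream (hypothetical) reports whether pattern occurs anywhere in r.
// It reads at most chunkSize bytes at a time and carries the last
// len(pattern)-1 bytes forward so boundary-straddling matches are found.
func scanStream(r io.Reader, pattern []byte, chunkSize int) (bool, error) {
	if len(pattern) == 0 {
		return true, nil
	}
	if chunkSize < 1 {
		chunkSize = 1
	}
	keep := len(pattern) - 1
	buf := make([]byte, 0, keep+chunkSize)
	chunk := make([]byte, chunkSize)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			buf = append(buf, chunk[:n]...)
			if bytes.Contains(buf, pattern) {
				return true, nil
			}
			if len(buf) > keep {
				// Retain only the tail that could start a later match.
				buf = append(buf[:0], buf[len(buf)-keep:]...)
			}
		}
		if err == io.EOF {
			return false, nil
		}
		if err != nil {
			return false, err
		}
	}
}

// scanString is a convenience wrapper for in-memory input.
func scanString(s, pattern string, chunkSize int) bool {
	found, _ := scanStream(strings.NewReader(s), []byte(pattern), chunkSize)
	return found
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: some-command | scan PATTERN")
		os.Exit(2)
	}
	found, err := scanStream(os.Stdin, []byte(os.Args[1]), 64*1024)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	fmt.Println("match:", found)
}
```

Memory use stays bounded by chunkSize plus the pattern length, regardless of how long the stream is.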
Jaccard-similarity fingerprint index prunes non-matching files before expensive AC scanning. Cuts total scan time by 65% on large corpora. A capability no other engine has.
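For readers unfamiliar with the metric, the sketch below computes Jaccard similarity, |A ∩ B| / |A ∪ B|, over sets of byte n-grams and uses it as a skip-or-scan gate. How yara-g builds its fingerprints and picks a threshold is not described here, so ngrams, jaccard, mayMatch, and the threshold value are all illustrative assumptions. A production index would more likely use fixed-size sketches than exact sets.

```go
package main

import "fmt"

// ngrams returns the set of distinct n-byte substrings of data.
func ngrams(data []byte, n int) map[string]struct{} {
	set := make(map[string]struct{})
	for i := 0; i+n <= len(data); i++ {
		set[string(data[i:i+n])] = struct{}{}
	}
	return set
}

// jaccard returns |a ∩ b| / |a ∪ b|, or 0 when both sets are empty.
func jaccard(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for k := range a {
		if _, ok := b[k]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

// mayMatch (hypothetical gate) says whether a file is similar enough to the
// rule fingerprint to be worth a full scan. The threshold is an assumption.
func mayMatch(file, rules map[string]struct{}, threshold float64) bool {
	return jaccard(file, rules) >= threshold
}

func main() {
	file := ngrams([]byte("abcd"), 2)  // {ab, bc, cd}
	rules := ngrams([]byte("bcde"), 2) // {bc, cd, de}
	fmt.Println("similarity:", jaccard(file, rules))
	fmt.Println("scan file:", mayMatch(file, rules, 0.25))
}
```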
Persist compiled rules to disk with --rules-cache. Auto-invalidated on source change. 1.4× speedup at 5,000 rules. Skip recompilation entirely across sessions.
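One common way to get automatic invalidation is to key the cache on a hash of the rule source, so any edit produces a new key and the stale entry is simply never looked up again. The sketch below illustrates that idea only. Whether --rules-cache works this way is not stated here, and cacheKey, cachePath, and the .rulecache suffix are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// cacheKey derives a content-addressed key from the rule source text.
func cacheKey(source []byte) string {
	sum := sha256.Sum256(source)
	return hex.EncodeToString(sum[:])
}

// cachePath maps rule source to a cache file under dir. The file suffix is
// an illustrative choice, not a documented yara-g convention.
func cachePath(dir string, source []byte) string {
	return filepath.Join(dir, cacheKey(source)+".rulecache")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: cachekey RULES_FILE")
		os.Exit(2)
	}
	src, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	p := cachePath(os.TempDir(), src)
	if _, err := os.Stat(p); err == nil {
		fmt.Println("cache hit:", p)
	} else {
		fmt.Println("cache miss, would compile and write:", p)
	}
}
```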
Self-contained Go binary. No CGO, no shared library wrangling. Deploy to containers or embedded Linux. Cross-compile with a single go build.
Every pattern type, modifier, condition operator. All built-in modules (pe, elf, math, hash). Custom modules via the Go API. Backward-compatible with yarac test suite.
Built only when XOR or hex patterns fire. Eliminates 2 full file passes for the majority of non-XOR rule sets.
Channel + writer goroutine replaces global mutex. Workers never block on output — serialisation is decoupled from the scan hot path.
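The pattern described above can be sketched in a few lines of Go. This is an illustration of the general channel-plus-writer technique, not yara-g's source. The names match, runScan, and lineCounter are hypothetical, and the buffer size of 1024 is an arbitrary choice. Note that with a buffered channel a worker would still wait if the buffer filled up, so the buffer has to be sized for the expected match rate.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sync"
)

type match struct{ File, Rule string }

// lineCounter is a tiny io.Writer that counts newlines, handy for demos.
type lineCounter struct{ lines int }

func (c *lineCounter) Write(p []byte) (int, error) {
	for _, b := range p {
		if b == '\n' {
			c.lines++
		}
	}
	return len(p), nil
}

// runScan fans files out to worker goroutines. Workers send matches on a
// channel; a single writer goroutine owns out, so no mutex guards output.
// It returns the number of matches written.
func runScan(files []string, workers int, scan func(string) []match, out io.Writer) int {
	jobs := make(chan string)
	results := make(chan match, 1024)

	written := 0
	done := make(chan struct{})
	go func() { // the only goroutine that touches out
		defer close(done)
		w := bufio.NewWriter(out)
		defer w.Flush()
		for m := range results {
			fmt.Fprintf(w, "%s %s\n", m.Rule, m.File)
			written++
		}
	}()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				for _, m := range scan(f) {
					results <- m
				}
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()      // all workers finished sending
	close(results) // lets the writer drain and exit
	<-done
	return written
}

func main() {
	fake := func(f string) []match { return []match{{File: f, Rule: "demo_rule"}} }
	n := runScan(os.Args[1:], 4, fake, os.Stdout)
	fmt.Fprintln(os.Stderr, "matches written:", n)
}
```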
4 MB pre-allocated per worker. Eliminates sync.Pool contention at 88 concurrent workers. Zero pool overhead.
LiteralAC and HexGateAC merged into one pass over file data. Halves Phase A memory bandwidth consumption.
Dedicated I/O goroutines prefetch files while scan workers run. Decouples disk latency from CPU-bound scanning.
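A rough sketch of this kind of pipeline follows, assuming separate reader and scanner goroutine pools joined by a bounded channel. It is not yara-g's implementation. scanPrefetched, its parameters, and the loaded type are hypothetical, and at least one reader and one scanner are assumed.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

type loaded struct {
	path string
	data []byte
}

// scanPrefetched runs reader goroutines that load files ahead of the scanner
// goroutines. depth bounds how many loaded files wait in memory at once.
// It returns the total number of bytes handed to scan.
func scanPrefetched(paths []string, readers, scanners, depth int,
	read func(string) ([]byte, error), scan func(path string, data []byte)) int64 {

	todo := make(chan string)
	ready := make(chan loaded, depth)

	var rwg sync.WaitGroup
	for i := 0; i < readers; i++ {
		rwg.Add(1)
		go func() {
			defer rwg.Done()
			for p := range todo {
				data, err := read(p)
				if err != nil {
					continue // a real tool would report the error
				}
				ready <- loaded{p, data}
			}
		}()
	}

	var (
		mu    sync.Mutex
		total int64
		swg   sync.WaitGroup
	)
	for i := 0; i < scanners; i++ {
		swg.Add(1)
		go func() {
			defer swg.Done()
			for f := range ready {
				scan(f.path, f.data)
				mu.Lock()
				total += int64(len(f.data))
				mu.Unlock()
			}
		}()
	}

	for _, p := range paths {
		todo <- p
	}
	close(todo)
	rwg.Wait()
	close(ready)
	swg.Wait()
	return total
}

func main() {
	n := scanPrefetched(os.Args[1:], 2, 4, 8, os.ReadFile,
		func(path string, data []byte) {
			fmt.Printf("scanned %s (%d bytes)\n", path, len(data))
		})
	fmt.Println("total bytes:", n)
}
```

The depth parameter is the trade-off knob: a deeper queue hides more disk latency but holds more file data in memory.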
Scales with GOMAXPROCS. Inner goroutine spawn suppressed unless rule set is large enough. Eliminates 7,744 goroutine spawns per file-batch.
Reads /sys/devices/system/node/ topology. Defaults to socket-local core count. Prevents 2× efficiency loss from over-threading.
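As an illustration of what reading that topology involves, the sketch below parses the sysfs cpulist file for NUMA node 0 and counts the CPUs it lists. It is not yara-g's detection code. parseCPUList and nodeLocalCPUs are hypothetical names, the count is of logical CPUs (SMT siblings included), and a NUMA node does not always correspond to exactly one socket.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"strconv"
	"strings"
)

// parseCPUList counts the CPUs in a sysfs list such as "0-23,48-71".
func parseCPUList(s string) (int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, nil
	}
	total := 0
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		a, err := strconv.Atoi(lo)
		if err != nil {
			return 0, err
		}
		b := a
		if isRange {
			if b, err = strconv.Atoi(hi); err != nil {
				return 0, err
			}
		}
		if b < a {
			return 0, fmt.Errorf("bad range %q", part)
		}
		total += b - a + 1
	}
	return total, nil
}

// nodeLocalCPUs returns the logical CPU count of NUMA node 0 under root,
// falling back to runtime.NumCPU() when the topology is unavailable
// (for example on non-Linux systems).
func nodeLocalCPUs(root string) int {
	raw, err := os.ReadFile(filepath.Join(root, "node0", "cpulist"))
	if err != nil {
		return runtime.NumCPU()
	}
	n, err := parseCPUList(string(raw))
	if err != nil || n == 0 {
		return runtime.NumCPU()
	}
	return n
}

func main() {
	fmt.Println("suggested workers:", nodeLocalCPUs("/sys/devices/system/node"))
}
```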
Deep-copies all 4 AC tries per NUMA node. Workers use node-local memory. Eliminates cross-socket pointer chasing in the AC Walk hot path.
When a rule set has ≥200 patterns and the file is ≥1 MB, verification is distributed across up to 8 goroutines. Up to 2× speedup on files dominated by expensive Phase B patterns.
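The gating logic described above might look something like the following. This is a sketch under stated assumptions, not yara-g's code. verifierCount and verifyAll are hypothetical names, 1 MB is taken to mean 1<<20 bytes, and candidates are modelled as plain ints standing in for pattern hits that need verification.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const (
	minPatterns  = 200     // threshold taken from the text above
	minFileSize  = 1 << 20 // "1 MB", assumed here to mean 1 MiB
	maxVerifiers = 8       // upper bound taken from the text above
)

// verifierCount decides how many goroutines to use for verification.
// Small rule sets or small files stay single-threaded to avoid spawn cost.
func verifierCount(patterns int, fileSize int64, procs int) int {
	if patterns < minPatterns || fileSize < minFileSize {
		return 1
	}
	n := procs
	if n > maxVerifiers {
		n = maxVerifiers
	}
	if n < 1 {
		n = 1
	}
	return n
}

// verifyAll checks candidates with the given number of goroutines using a
// strided split, and returns how many candidates verified successfully.
func verifyAll(candidates []int, workers int, verify func(int) bool) int {
	var hits atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(candidates); i += workers {
				if verify(candidates[i]) {
					hits.Add(1)
				}
			}
		}(w)
	}
	wg.Wait()
	return int(hits.Load())
}

func main() {
	n := verifierCount(500, 4<<20, runtime.GOMAXPROCS(0))
	even := func(c int) bool { return c%2 == 0 }
	fmt.Println("goroutines:", n, "verified:", verifyAll([]int{1, 2, 3, 4, 5, 6}, n, even))
}
```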
7-repeat median · 96 vCPU dual-socket · Linux 6.12
The original YARA engine was never designed for multi-socket servers, NUMA topology, or corpus sizes in the billions. yara-g was. It brings parallel worker scaling, a pre-scan filter that runs in constant time regardless of rule count, and a streaming API that neither yarac nor yara-x provides. If you're scanning anything at scale (malware repositories, forensic corpora, DFIR pipelines), yara-g is the only engine that treats your hardware as it deserves.