Native C++ vs WASM Filters in Envoy: A Performance Benchmark
The Question
When you need custom request processing in Envoy — header manipulation, auth checks, rate limiting, payload transformation — you have two paths:
- Native C++ filter: Fork Envoy, write a C++ filter, compile it into the binary.
- WASM filter: Write the filter in Rust/Go/C++, compile to WebAssembly, load it at runtime.
WASM is the “right” answer for most teams — it’s portable, safe, and doesn’t require maintaining an Envoy fork. But how much performance do you actually give up? This post answers that question with real numbers.
The Test Filter
To make this a fair comparison, we implement the exact same logic in both approaches. The filter does:
- Read the `x-tenant-id` request header
- Look up the tenant in a local map (simulating a lightweight auth/routing decision)
- Add an `x-tenant-tier` header with the result to the request forwarded upstream
- If the header is missing or the tenant is unknown, return 403
This is representative of the kind of lightweight request processing that teams commonly build as custom filters.
Approach 1: Native C++ Filter
Setting Up the Envoy Fork
```bash
# Clone Envoy
git clone https://github.com/envoyproxy/envoy.git
cd envoy
git checkout v1.32.0  # pin to a release

# Create the filter directory
mkdir -p source/extensions/filters/http/tenant_check
```

The Filter Code
source/extensions/filters/http/tenant_check/tenant_check.h
```cpp
#pragma once

#include <string>

#include "absl/container/flat_hash_map.h"
#include "envoy/server/filter_config.h"
#include "source/extensions/filters/http/common/pass_through_filter.h"

namespace Envoy {
namespace Extensions {
namespace HttpFilters {
namespace TenantCheck {

// Decoder-only filter: inspects request headers and rejects unknown tenants.
class TenantCheckFilter : public Http::PassThroughDecoderFilter {
public:
  TenantCheckFilter();

  Http::FilterHeadersStatus decodeHeaders(Http::RequestHeaderMap& headers,
                                          bool end_stream) override;

private:
  // Process-wide, immutable tenant -> tier table (built once on first use).
  static const absl::flat_hash_map<std::string, std::string>& tenantTiers();
};

} // namespace TenantCheck
} // namespace HttpFilters
} // namespace Extensions
} // namespace Envoy
```
source/extensions/filters/http/tenant_check/tenant_check.cc
```cpp
#include "source/extensions/filters/http/tenant_check/tenant_check.h"

namespace Envoy {
namespace Extensions {
namespace HttpFilters {
namespace TenantCheck {

TenantCheckFilter::TenantCheckFilter() = default;

const absl::flat_hash_map<std::string, std::string>&
TenantCheckFilter::tenantTiers() {
  // 100 tenants across 3 tiers. Heap-allocated and intentionally never freed,
  // so the table is built once per process and shared by all worker threads.
  static const auto* tiers = new absl::flat_hash_map<std::string, std::string>{
      {"tenant-001", "enterprise"}, {"tenant-002", "professional"},
      {"tenant-003", "starter"},    {"tenant-004", "enterprise"},
      {"tenant-005", "professional"},
      // ... 95 more entries
      {"tenant-100", "starter"},
  };
  return *tiers;
}

Http::FilterHeadersStatus
TenantCheckFilter::decodeHeaders(Http::RequestHeaderMap& headers, bool) {
  auto tenant_header = headers.get(Http::LowerCaseString("x-tenant-id"));
  if (tenant_header.empty()) {
    decoder_callbacks_->sendLocalReply(Http::Code::Forbidden, "missing tenant id",
                                       nullptr, absl::nullopt, "");
    return Http::FilterHeadersStatus::StopIteration;
  }

  // absl::flat_hash_map<std::string, ...> supports heterogeneous lookup, so the
  // header value is looked up as a string_view without copying it.
  const auto& tiers = tenantTiers();
  auto it = tiers.find(tenant_header[0]->value().getStringView());
  if (it == tiers.end()) {
    decoder_callbacks_->sendLocalReply(Http::Code::Forbidden, "unknown tenant",
                                       nullptr, absl::nullopt, "");
    return Http::FilterHeadersStatus::StopIteration;
  }

  // Forward the tier to the upstream as a request header.
  headers.addCopy(Http::LowerCaseString("x-tenant-tier"), it->second);
  return Http::FilterHeadersStatus::Continue;
}

} // namespace TenantCheck
} // namespace HttpFilters
} // namespace Extensions
} // namespace Envoy
```
Building the Custom Envoy
```bash
# Register the filter in Envoy's build system first:
# add it to source/extensions/extensions_build_config.bzl

# Build (this takes 30-90 minutes on a modern machine)
bazel build -c opt //source/exe:envoy-static

# Verify the binary
./bazel-bin/source/exe/envoy-static --version
```

The native binary is a single static executable. The filter code is compiled directly into it with full compiler optimizations (`-c opt` enables `-O2`).
Approach 2: WASM Filter (Rust)
We use Rust with the proxy-wasm SDK. The same logic, compiled to a .wasm module.
Project Setup
```bash
cargo new --lib tenant-check-wasm
cd tenant-check-wasm
```

Cargo.toml
```toml
[package]
name = "tenant-check-wasm"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
proxy-wasm = "0.2.2"

[profile.release]
opt-level = "s"      # optimize for size
lto = true           # link-time optimization
strip = "debuginfo"
```

src/lib.rs
```rust
use proxy_wasm::traits::*;
use proxy_wasm::types::*;
use std::collections::HashMap;

proxy_wasm::main! {{
    // Envoy invokes this factory once per HTTP stream.
    proxy_wasm::set_http_context(|_, _| -> Box<dyn HttpContext> {
        Box::new(TenantCheckFilter::new())
    });
}}

struct TenantCheckFilter {
    tenant_tiers: HashMap<&'static str, &'static str>,
}

impl TenantCheckFilter {
    // Builds the tenant table. Because the factory above runs per stream,
    // this map is rebuilt for every request.
    fn new() -> Self {
        let mut tiers = HashMap::new();
        tiers.insert("tenant-001", "enterprise");
        tiers.insert("tenant-002", "professional");
        tiers.insert("tenant-003", "starter");
        tiers.insert("tenant-004", "enterprise");
        tiers.insert("tenant-005", "professional");
        // ... 95 more entries
        tiers.insert("tenant-100", "starter");
        TenantCheckFilter { tenant_tiers: tiers }
    }
}

impl Context for TenantCheckFilter {}

impl HttpContext for TenantCheckFilter {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        // Host call: copies the header value out of Envoy into WASM memory.
        let tenant_id = match self.get_http_request_header("x-tenant-id") {
            Some(id) => id,
            None => {
                self.send_http_response(403, vec![], Some(b"missing tenant id"));
                return Action::Pause;
            }
        };
        match self.tenant_tiers.get(tenant_id.as_str()) {
            Some(tier) => {
                // Host call: adds the header to the request forwarded upstream.
                self.add_http_request_header("x-tenant-tier", tier);
                Action::Continue
            }
            None => {
                self.send_http_response(403, vec![], Some(b"unknown tenant"));
                Action::Pause
            }
        }
    }
}
```

Building the WASM Module
```bash
# Add the WASM target
rustup target add wasm32-wasip1

# Build optimized
cargo build --target wasm32-wasip1 --release

# Check the output size
ls -lh target/wasm32-wasip1/release/tenant_check_wasm.wasm
# ~45 KB after optimization

# Optional: further optimize with wasm-opt
wasm-opt -O3 target/wasm32-wasip1/release/tenant_check_wasm.wasm \
  -o tenant_check_optimized.wasm
# ~38 KB
```

Loading the WASM Filter
In Envoy’s config (or via an Istio EnvoyFilter):
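```yaml
http_filters:
- name: envoy.filters.http.wasm
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
    config:
      vm_config:
        runtime: envoy.wasm.runtime.v8
        code:
          local:
            filename: /etc/envoy/tenant_check_optimized.wasm
      configuration:
        "@type": type.googleapis.com/google.protobuf.StringValue
        value: ""
# The router must remain the last HTTP filter in the chain.
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

Benchmark Methodology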
Environment
- Machine: AWS c6i.2xlarge (8 vCPUs, 16 GB RAM, Intel Ice Lake)
- OS: Ubuntu 22.04, kernel 6.5
- Envoy: v1.32.0
- Load generator: `wrk2` (constant-rate load generator, avoids coordinated omission)
- Upstream: Simple Go HTTP server returning 200 with a 128-byte JSON body
Topology
```text
wrk2 → Envoy (with filter) → upstream Go server
```

All three processes run on the same machine to eliminate network variance. Envoy runs with 2 worker threads.
Test Configurations
We test three configurations:
- Baseline: Envoy with no custom filter (just `envoy.filters.http.router`)
- Native: Envoy with the compiled-in C++ tenant check filter
- WASM: Stock Envoy with the WASM tenant check filter loaded
Test Parameters
- Warm-up: 30 seconds at target rate
- Test duration: 120 seconds per run
- Runs: 5 runs per configuration, results averaged
- Request rates: 1K, 5K, 10K, 20K, 50K RPS
- Connections: 64 persistent connections
- Headers: Each request includes `x-tenant-id: tenant-042`
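The parameter grid above works out to 3 configurations × 5 rates × 5 runs = 75 runs. If every run gets its own 30-second warm-up before the 120-second measurement, that is 75 × 150 s = 11,250 s, a little over three hours of wall-clock time. The sketch below enumerates that grid as wrk2 argument lists. It is a hypothetical stand-in for the orchestration script (the `Run` type and helper names are illustrative, not from the benchmark repo), and it assumes the per-configuration ports 10000, 10001, and 10002 used in the commands that follow.

```rust
/// One measured run: which Envoy instance to hit, at what constant rate.
#[derive(Debug, Clone, PartialEq)]
pub struct Run {
    pub config: &'static str,
    pub port: u16,
    pub rps: u32,
    pub repetition: u32,
}

/// Configurations and the listener port assumed for each Envoy instance.
const CONFIGS: [(&str, u16); 3] = [("baseline", 10000), ("native", 10001), ("wasm", 10002)];
const RATES: [u32; 5] = [1_000, 5_000, 10_000, 20_000, 50_000];
const RUNS_PER_POINT: u32 = 5;

/// Expands the full test matrix: every configuration x rate x repetition.
pub fn test_matrix() -> Vec<Run> {
    let mut runs = Vec::new();
    for &(config, port) in CONFIGS.iter() {
        for &rps in RATES.iter() {
            for repetition in 1..=RUNS_PER_POINT {
                runs.push(Run { config, port, rps, repetition });
            }
        }
    }
    runs
}

/// wrk2 arguments for the measured part of one run (flags as in the post).
pub fn wrk2_args(run: &Run) -> Vec<String> {
    vec![
        "-t4".to_string(),
        "-c64".to_string(),
        "-d120s".to_string(),
        format!("-R{}", run.rps),
        "--latency".to_string(),
        "-H".to_string(),
        "x-tenant-id: tenant-042".to_string(),
        format!("http://localhost:{}/api/data", run.port),
    ]
}

/// Total wall-clock seconds, assuming each run gets its own warm-up.
pub fn total_seconds(runs: usize, warmup_s: u64, measure_s: u64) -> u64 {
    runs as u64 * (warmup_s + measure_s)
}
```

Each argument list can be handed to `std::process::Command::new("wrk2").args(...)`; the warm-up would be a separate, shorter invocation at the same rate.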
Running the Benchmarks
```bash
# Baseline
wrk2 -t4 -c64 -d120s -R10000 --latency \
  -H "x-tenant-id: tenant-042" \
  http://localhost:10000/api/data

# Native filter
wrk2 -t4 -c64 -d120s -R10000 --latency \
  -H "x-tenant-id: tenant-042" \
  http://localhost:10001/api/data

# WASM filter
wrk2 -t4 -c64 -d120s -R10000 --latency \
  -H "x-tenant-id: tenant-042" \
  http://localhost:10002/api/data
```

Results
Latency (microseconds) at 10K RPS
| Percentile | Baseline | Native Filter | WASM Filter | WASM Overhead vs Native |
|---|---|---|---|---|
| p50 | 142 | 148 | 167 | +12.8% |
| p90 | 189 | 198 | 234 | +18.2% |
| p99 | 312 | 328 | 402 | +22.6% |
| p99.9 | 587 | 612 | 798 | +30.4% |
Latency at Different Request Rates (p99, microseconds)
| Rate | Baseline | Native | WASM | WASM Overhead vs Native |
|---|---|---|---|---|
| 1K | 198 | 205 | 241 | +17.6% |
| 5K | 245 | 258 | 312 | +20.9% |
| 10K | 312 | 328 | 402 | +22.6% |
| 20K | 478 | 501 | 634 | +26.5% |
| 50K | 1,245 | 1,312 | 1,789 | +36.4% |
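The "WASM Overhead vs Native" columns are the relative difference (WASM - Native) / Native. The absolute difference is worth computing alongside it, because that is the number that has to fit inside an end-to-end latency budget. A minimal sketch that reproduces the table values (helper names are illustrative):

```rust
/// Relative overhead of `wasm_us` over `native_us`, in percent, rounded to 0.1.
pub fn overhead_pct(native_us: f64, wasm_us: f64) -> f64 {
    let pct = (wasm_us - native_us) / native_us * 100.0;
    (pct * 10.0).round() / 10.0
}

/// Absolute overhead in microseconds.
pub fn overhead_us(native_us: f64, wasm_us: f64) -> f64 {
    wasm_us - native_us
}

/// (percentile, native latency, WASM latency) at 10K RPS, in microseconds,
/// copied from the first table above.
pub const AT_10K: [(&str, f64, f64); 4] = [
    ("p50", 148.0, 167.0),
    ("p90", 198.0, 234.0),
    ("p99", 328.0, 402.0),
    ("p99.9", 612.0, 798.0),
];
```

At 10K RPS this gives 19 microseconds at p50 and 74 microseconds at p99: the relative overhead grows toward the tail, and so does the absolute one.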
Throughput (Maximum Sustained RPS Before Saturation)
| Configuration | Max RPS (p99 < 1ms) | Max RPS (p99 < 5ms) |
|---|---|---|
| Baseline | 52,400 | 78,200 |
| Native | 49,800 | 74,100 |
| WASM | 38,600 | 58,900 |
Memory Usage (RSS, steady state at 10K RPS)
| Configuration | RSS (MB) | Delta vs Baseline |
|---|---|---|
| Baseline | 34.2 | — |
| Native | 35.1 | +0.9 MB |
| WASM | 48.7 | +14.5 MB |
The V8 runtime adds ~14.5 MB of resident memory for the WASM VMs. Envoy creates a separate VM instance for each worker thread, so this cost grows with the worker count rather than being split across threads; the figure here is for 2 workers.
CPU Usage (% of one core, at 10K RPS)
| Configuration | User CPU | System CPU | Total |
|---|---|---|---|
| Baseline | 18.3% | 5.1% | 23.4% |
| Native | 19.8% | 5.2% | 25.0% |
| WASM | 26.4% | 5.8% | 32.2% |
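Dividing CPU time by request rate turns these percentages into a per-request cost, assuming the figures are steady-state averages over the measurement window. 1% of one core is 10 ms of CPU per second; spread over 10,000 requests per second, that is 1 microsecond per request. So the baseline spends about 23.4 microseconds of CPU per request, the native filter adds about 1.6, and the WASM filter adds about 8.8. A small sketch of the conversion (function names are illustrative):

```rust
/// CPU time per request in microseconds, given CPU usage as a percentage of
/// one core and a steady request rate.
pub fn cpu_us_per_request(cpu_pct_of_core: f64, rps: f64) -> f64 {
    // cpu_pct / 100 core-seconds are consumed per wall-clock second,
    // shared by `rps` requests; scale seconds to microseconds.
    (cpu_pct_of_core / 100.0) / rps * 1_000_000.0
}

/// Extra CPU per request attributable to a filter, relative to the baseline.
pub fn filter_cpu_us(baseline_pct: f64, with_filter_pct: f64, rps: f64) -> f64 {
    cpu_us_per_request(with_filter_pct, rps) - cpu_us_per_request(baseline_pct, rps)
}
```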
Where Does the WASM Overhead Come From?
The 13-30% latency overhead (p50 through p99.9 at 10K RPS) isn’t from bad code. It’s structural:
1. V8 VM Context Switching
Every time Envoy calls into WASM, it crosses from the host (C++) into the V8 sandbox. This involves:
- Saving host registers
- Setting up the WASM stack frame
- Validating memory access boundaries
- Restoring host state on return
For our filter, the on_http_request_headers call crosses this boundary once, but internally it makes several host calls: get_http_request_header, add_http_request_header, etc. Each host call crosses the boundary again.
2. ABI Serialization
WASM can’t directly read Envoy’s C++ header map. When you call get_http_request_header("x-tenant-id"), the proxy-wasm ABI:
- Copies the header name from WASM linear memory to the host
- Looks up the header in Envoy’s internal map
- Copies the header value back into WASM linear memory
- Returns a pointer and length to the WASM code
The native filter accesses the header map directly with zero copies.
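A toy model in plain Rust makes the four steps concrete. It is not the real proxy-wasm ABI or Envoy's host implementation: `LinearMemory`, `host_get_header`, and the bump-pointer allocator are hypothetical. What it illustrates is that the guest can only see its own byte array, so the host copies the name out, looks it up, copies the value in, and hands back an offset and a length, with every access bounds-checked.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a module's linear memory: a flat byte array
/// plus a bump pointer acting as the guest's allocator.
pub struct LinearMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size], next_free: 0 }
    }

    /// Allocates space for `data` and copies it in; returns None instead of
    /// writing past the end of memory.
    pub fn write(&mut self, data: &[u8]) -> Option<usize> {
        let start = self.next_free;
        let end = start.checked_add(data.len())?;
        let dest = self.bytes.get_mut(start..end)?; // bounds check
        dest.copy_from_slice(data);
        self.next_free = end;
        Some(start)
    }

    /// Bounds-checked read of `len` bytes at offset `ptr`.
    pub fn read(&self, ptr: usize, len: usize) -> Option<&[u8]> {
        self.bytes.get(ptr..ptr.checked_add(len)?)
    }
}

/// Hypothetical host call modeled on the steps above.
pub fn host_get_header(
    host_headers: &HashMap<String, String>,
    guest: &mut LinearMemory,
    name_ptr: usize,
    name_len: usize,
) -> Option<(usize, usize)> {
    // 1. Copy the header name out of guest memory into a host-owned String.
    let name = std::str::from_utf8(guest.read(name_ptr, name_len)?).ok()?.to_owned();
    // 2. Look it up in the host's own header map.
    let value = host_headers.get(&name)?;
    // 3. Copy the value back into guest memory.
    let value_ptr = guest.write(value.as_bytes())?;
    // 4. Return an offset and length into guest memory, not a host pointer.
    Some((value_ptr, value.len()))
}
```

In the proxy-wasm spec the guest exports an allocation function (`proxy_on_memory_allocate`) that the host calls before copying a value in; the bump pointer here stands in for it.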
3. Memory Isolation
Every WASM memory access must stay inside the module’s linear memory. The V8 runtime enforces this with guard pages and trap-based bounds checking rather than an explicit compare on every access, which is fast but not free. The native filter accesses memory with no bounds checks beyond what the C++ code itself does.
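4. No Host-Specific SIMD or Intrinsics
The native filter benefits from absl::flat_hash_map, which uses SSE2 instructions on x86 to probe a whole group of slots at once. The WASM filter uses Rust’s HashMap, which is the same SwissTable design but, compiled for wasm32, falls back to a portable non-SIMD probing path. By default it also hashes keys with SipHash-1-3, which favors resistance to hash-flooding over raw speed. WASM does define a portable 128-bit SIMD extension, but it has to be enabled at build time, and CPU-specific instructions such as AVX2 are not available at all.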
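There is one more difference in the Rust code itself: set_http_context invokes its factory for every HTTP stream, so TenantCheckFilter::new() rebuilds the 100-entry HashMap on each request, while the C++ filter builds its table once. The sketch below shows a build-once table using `std::sync::OnceLock` (stable since Rust 1.70), with the decision logic pulled into a plain function so it can be unit-tested without Envoy. `tenant_tiers`, `Decision`, and `decide` are hypothetical names, the entries mirror the sample shown above, and this variant was not part of the measured runs.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Tenant -> tier table, built on first use and then shared by every
/// request handled by this VM instance.
fn tenant_tiers() -> &'static HashMap<&'static str, &'static str> {
    static TIERS: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TIERS.get_or_init(|| {
        HashMap::from([
            ("tenant-001", "enterprise"),
            ("tenant-002", "professional"),
            ("tenant-003", "starter"),
            ("tenant-004", "enterprise"),
            ("tenant-005", "professional"),
            // ... 95 more entries
            ("tenant-100", "starter"),
        ])
    })
}

/// What the filter should do with a request, independent of proxy-wasm.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Forward upstream with this `x-tenant-tier` value.
    Forward(&'static str),
    /// Reply 403 with this body.
    Reject(&'static str),
}

/// Pure decision logic over the `x-tenant-id` value, if present.
pub fn decide(tenant_id: Option<&str>) -> Decision {
    match tenant_id {
        None => Decision::Reject("missing tenant id"),
        Some(id) => match tenant_tiers().get(id) {
            Some(&tier) => Decision::Forward(tier),
            None => Decision::Reject("unknown tenant"),
        },
    }
}
```

Inside on_http_request_headers, the filter would call `decide(self.get_http_request_header("x-tenant-id").as_deref())`, mapping `Forward` to add_http_request_header plus `Action::Continue` and `Reject` to send_http_response(403, ...) plus `Action::Pause`.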
What Doesn’t Cause Overhead
- JIT compilation: V8 JIT-compiles WASM to native code at load time. Once warmed up, the filter code itself runs as native machine code. The overhead is from the ABI boundary, not interpretation.
- WASM module size: The 38 KB module loads in < 1ms. Module size doesn’t affect per-request latency.
Profiling: Where the Cycles Go
We use perf to profile where CPU time is spent in each configuration:
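```bash
# $(pgrep envoy) must resolve to a single PID; if several Envoy instances
# are running, pass the PID of the one under test instead.
perf record -g -F 99 -p $(pgrep envoy) -- sleep 30
perf report
```

Native Filter Profile (top functions)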
```text
28.3% event loop (epoll_wait, connection handling)
18.1% HTTP parsing (http-parser, header processing)
12.4% TLS (BoringSSL encrypt/decrypt)
 8.2% upstream connection management
 5.1% tenant_check::decodeHeaders   ← our filter
 4.8% router filter
 3.2% access logging
19.9% other (stats, memory allocation, etc.)
```

WASM Filter Profile (top functions)
```text
24.2% event loop (epoll_wait, connection handling)
15.4% HTTP parsing
10.6% TLS
 7.0% upstream connection management
 4.1% router filter
 2.7% access logging
14.8% v8::internal::* (V8 VM operations)   ← WASM overhead
 6.4% wasm ABI host calls                   ← ABI crossing
 3.2% wasm filter execution                 ← actual filter logic
11.6% other
```

The WASM filter’s actual logic (3.2%) is comparable to the native filter (5.1%). The overhead is almost entirely from V8 VM management (14.8%) and ABI boundary crossing (6.4%).
Build and Operational Comparison
Performance isn’t everything. Here’s the full picture:
| Factor | Native C++ | WASM (Rust) |
|---|---|---|
| Build time | 30-90 min (full Envoy) | 5-15 sec (just the filter) |
| Binary size | ~150 MB (Envoy static) | ~38 KB (.wasm module) |
| Deployment | Replace entire Envoy binary, requires proxy restart | Load/reload at runtime, no restart needed |
| Safety | Full memory access, can crash Envoy | Sandboxed, can’t corrupt host memory |
| Debugging | GDB, full symbol access | Limited, mostly proxy_log and wasm-tools |
| Language | C++ only | Rust, Go, C++, AssemblyScript |
| Upgrade path | Must rebase fork on every Envoy release | Module works across Envoy versions (stable ABI) |
| CI/CD complexity | High (Bazel, toolchain, ~30GB build cache) | Low (standard Rust/Go toolchain) |
| Istio integration | Custom Envoy image in sidecar | WasmPlugin or EnvoyFilter CRD, no image change |
Total Development Time: Start to Running in Production
Here’s a realistic timeline for each approach, assuming a developer who knows the relevant language but hasn’t built an Envoy filter before:
Native C++ Filter — Total: ~3-5 days
| Phase | Time | Notes |
|---|---|---|
| Environment setup (Bazel, toolchain, clone Envoy) | 4-8 hours | Envoy’s Bazel build requires specific toolchain versions. First-time setup is painful. The repo is ~2 GB. |
| First successful Envoy build | 2-4 hours | Full build takes 30-90 min. Expect 2-3 failed attempts due to missing deps or toolchain mismatches. |
| Learn Envoy’s internal APIs | 4-8 hours | No public API docs for filter authoring. You read existing filter source code and the StreamDecoderFilter interface. |
| Write the filter code | 2-4 hours | The actual C++ code is straightforward once you understand the API. |
| Register filter in build system | 1-2 hours | Write BUILD file, add to extensions_build_config.bzl, add config proto. |
| Build and iterate | 2-4 hours | Each rebuild after code changes takes 2-5 min (incremental). Integration issues are common. |
| Write integration tests | 2-4 hours | Envoy uses its own integration test framework. Learning curve is steep. |
| Package as container image | 1-2 hours | Multi-stage Docker build, push to registry. |
| Deploy to cluster | 1-2 hours | Swap Envoy image in Istio sidecar injector, rolling restart. |
| Total | ~20-40 hours | |
WASM Filter (Rust) — Total: ~4-8 hours
| Phase | Time | Notes |
|---|---|---|
| Environment setup (Rust, wasm target) | 15-30 min | rustup target add wasm32-wasip1 — done. |
| Learn proxy-wasm SDK | 1-2 hours | SDK has good docs and examples. The API surface is small (~15 trait methods). |
| Write the filter code | 1-2 hours | Same logic, arguably simpler than C++ because the SDK handles lifecycle. |
| Build the WASM module | 1-2 min | cargo build --target wasm32-wasip1 --release |
| Test locally | 30-60 min | Run with func-e (standalone Envoy) or Docker compose. Fast iteration — rebuild is seconds. |
| Deploy to cluster | 15-30 min | Apply an EnvoyFilter or WasmPlugin CRD. No proxy restart needed. |
| Total | ~4-8 hours | |
Ongoing Maintenance Cost
| Activity | Native C++ | WASM (Rust) |
|---|---|---|
| Envoy version upgrade | 4-16 hours (rebase fork, fix API breakage, rebuild, test) | 0 hours (module is ABI-stable, just upgrade Envoy) |
| Filter logic change | 30-60 min (rebuild, push image, rolling restart) | 5 min (rebuild .wasm, apply CRD, zero-downtime reload) |
| CI pipeline maintenance | Ongoing (Bazel cache, toolchain updates, ~30 GB cache) | Minimal (standard Cargo build, < 100 MB) |
| Per-quarter total | ~8-24 hours | ~1-2 hours |
Over a year of maintaining the filter through 4 Envoy upgrades and 12 logic changes, the native approach costs roughly 40-110 hours of engineering time versus 8-15 hours for WASM. That’s a 5-7x difference in ongoing human cost.
The Fork Tax
Maintaining an Envoy fork is expensive. Every Envoy release requires:
- Rebasing your filter code
- Resolving API changes (Envoy’s internal APIs are not stable)
- Rebuilding and testing the full binary
- Updating container images across your fleet
With WASM, you update Envoy independently of your filter. The proxy-wasm ABI is versioned and stable.
When to Use Each Approach
Use WASM When
- Your latency budget can absorb some extra microseconds per request (the 13-30% overhead measured here is relative to proxy latency on a localhost setup; against a 10 ms end-to-end p99, an extra 20-75 microseconds is under 1%, and nobody will notice)
- You want runtime deployability — update filter logic without restarting proxies
- Your team doesn’t have C++ expertise
- You’re running in Istio — the `WasmPlugin` CRD makes deployment trivial
- You need safety guarantees — WASM can’t segfault Envoy
- Multiple teams build filters — WASM’s sandbox prevents one team’s bug from crashing another team’s filter
Use Native C++ When
- You’re at extreme scale (>50K RPS per sidecar) and every microsecond matters
- The filter does heavy computation — parsing, compression, crypto beyond what Envoy provides
- You need wide SIMD or hardware intrinsics — WASM offers only a portable 128-bit SIMD extension and no CPU-specific instructions such as AVX2
- You’re already maintaining an Envoy fork for other reasons
- Memory overhead matters — the ~14 MB V8 tax is significant in your environment (high pod density, memory-constrained nodes)
The Realistic Take
In this benchmark, WASM added about 20 microseconds per request over native at the median and 35-75 microseconds at p99 between 1K and 10K RPS. For most service mesh deployments, where the latency budget is measured in milliseconds (which it almost always is), that overhead is in the noise. The operational cost of maintaining a C++ Envoy fork almost never justifies the performance gain.
Reproducing These Benchmarks
All benchmark code, Envoy configs, and analysis scripts are structured for reproducibility:
```text
# Directory structure
benchmark/
  native/
    tenant_check.h
    tenant_check.cc
    BUILD              # Bazel build file
    envoy.yaml         # Envoy config with native filter
  wasm/
    Cargo.toml
    src/lib.rs
    envoy.yaml         # Envoy config with WASM filter
  baseline/
    envoy.yaml         # Envoy config without custom filter
  upstream/
    main.go            # Simple upstream server
  run_benchmark.sh     # Orchestration script
  analyze.py           # Results analysis and plotting
```

Key wrk2 Flags
```bash
# -R flag sets constant request rate (avoids coordinated omission)
# -t threads should match available cores
# --latency enables HDR histogram output
# -d duration should be at least 60s for stable results
wrk2 -t4 -c64 -d120s -R10000 --latency \
  -H "x-tenant-id: tenant-042" \
  http://localhost:10000/api/data
```

Avoiding Common Benchmarking Mistakes
- Use `wrk2`, not `wrk`. Plain `wrk` is closed-loop: each connection waits for a response before sending its next request, so during a stall it stops sending exactly the requests that would have recorded the delay. This coordinated omission can underreport tail latency by 10-100x.
- Pin CPU cores. Use `taskset` to pin Envoy, wrk2, and the upstream to specific cores to avoid scheduling jitter.
- Warm up. Run 30 seconds of traffic before measuring to let V8 JIT compile the WASM module and let Envoy’s connection pools stabilize.
- Multiple runs. A single run tells you nothing. Run at least 5 times and report the median with error bars.
- Check for saturation. If CPU is above 80%, you’re measuring queuing delay, not filter overhead.
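A toy simulation shows why the correction matters. It assumes a single connection scheduled to send one request every 1 ms to a server that normally answers in 0.1 ms but stalls once for 100 ms. Measuring each request from when it was actually sent (what a closed-loop client like wrk does) records the stall once; measuring from when it was supposed to be sent (the idea behind wrk2's correction) also charges the queueing delay to every request that was held back. The numbers are illustrative only, not taken from the benchmark above, and the function names are hypothetical.

```rust
/// Latency samples from one simulated run, in milliseconds.
pub struct Samples {
    /// Measured from the actual send time (closed-loop, uncorrected).
    pub from_actual_send: Vec<f64>,
    /// Measured from the intended send time on a constant-rate schedule.
    pub from_intended_send: Vec<f64>,
}

/// One connection on a fixed schedule: request `i` is intended at
/// `i * interval_ms` but can only be sent once the previous response arrives.
pub fn simulate(n: usize, interval_ms: f64, service_ms: f64, stall_at: usize, stall_ms: f64) -> Samples {
    let mut from_actual_send = Vec::with_capacity(n);
    let mut from_intended_send = Vec::with_capacity(n);
    let mut prev_response_at = 0.0_f64;
    for i in 0..n {
        let intended = i as f64 * interval_ms;
        // Closed loop: never send before the previous response is back.
        let sent = intended.max(prev_response_at);
        let service = if i == stall_at { stall_ms } else { service_ms };
        let response_at = sent + service;
        from_actual_send.push(response_at - sent);
        from_intended_send.push(response_at - intended);
        prev_response_at = response_at;
    }
    Samples { from_actual_send, from_intended_send }
}

/// Nearest-rank percentile (p in 0..=100) of a non-empty sample set.
pub fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.max(1) - 1]
}
```

With `simulate(1000, 1.0, 0.1, 500, 100.0)`, the uncorrected p99 stays at the 0.1 ms service time, while the corrected p99 is around 91 ms because 111 of the 1,000 requests were delayed by the stall.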
Conclusion
In this benchmark, the WASM filter added 13-30% to proxy latency at 10K RPS (p50 through p99.9) and ~14.5 MB of memory per Envoy instance (with 2 worker threads) compared to the native C++ filter. At 50K RPS, the p99 gap widens to ~36%.
But zoom out: the filter itself is a tiny fraction of total request processing time. In a typical request that traverses TLS, HTTP parsing, routing, the upstream connection, and the actual application, the difference between a native and a WASM filter is about 20 microseconds at the median (roughly 0.2% of a 10 ms end-to-end request) and about 75 microseconds at p99 at 10K RPS (roughly 0.75%).
For the vast majority of use cases, WASM’s operational advantages — runtime deployability, safety, multi-language support, and freedom from maintaining an Envoy fork — far outweigh its performance cost.
Fork Envoy only when you have hard evidence that WASM’s overhead is your bottleneck. In five years of service mesh consulting, that has never been the case.