
Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks

This paper takes the core SAST idea and pushes it toward the deployment boundary that matters for event-based hardware: hard spikes, low-precision arithmetic, fixed-point state, and energy-sensitive inference.

TLDR: The paper extends the earlier SAST preprint into an on-sensor setting. It keeps the surrogate-forward, exact-gradient training setup, applies sharpness-aware minimization, and then evaluates the same trained network under hard-spike and hardware-aware constraints. On N-MNIST, swap-only hard-spike accuracy rises from 65.7% to 94.7%, and under INT8 / Q8.8 hardware-aware inference it rises from 47.6% to 96.9%. Preprint: arXiv:2604.09696. Earlier method paper: arXiv:2603.18039.

N-MNIST swap-only hard-spike accuracy with SAST: 65.7% -> 94.7%

DVS Gesture swap-only hard-spike accuracy with SAST: 31.8% -> 63.3%

N-MNIST hardware-aware INT8/Q8.8 hard-spike accuracy: 47.6% -> 96.9%

DVS Gesture INT8 SynOps energy proxy under SAST: 86,221.3k -> 4,323.5k

Why on-sensor deployment is hard

Spiking neural networks are appealing for event-camera pipelines because the computation is sparse, binary, and naturally aligned with low-power neuromorphic hardware. The catch is that training usually happens with a smooth surrogate nonlinearity, while deployment has to use a hard threshold. That mismatch can look small in the loss function and still be painful at inference time, especially when many membrane potentials sit close to threshold.
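To make the mismatch concrete, here is a minimal sketch that compares a smooth surrogate with a hard threshold at two membrane potentials. The sigmoid form, the steepness `beta`, and the threshold of 1.0 are illustrative assumptions, not values taken from the paper.

```python
import math

def hard_spike(v, threshold=1.0):
    """Heaviside step: emit 1 when the membrane potential reaches threshold."""
    return 1.0 if v >= threshold else 0.0

def surrogate_spike(v, threshold=1.0, beta=5.0):
    """Smooth sigmoid stand-in for the step; beta is an assumed steepness."""
    return 1.0 / (1.0 + math.exp(-beta * (v - threshold)))

def mismatch(v, threshold=1.0, beta=5.0):
    """Absolute difference between surrogate and hard outputs at potential v."""
    return abs(surrogate_spike(v, threshold, beta) - hard_spike(v, threshold))

near_threshold = mismatch(0.98)  # potential just below threshold
far_below = mismatch(0.20)       # potential well below threshold
```

The disagreement is large for the near-threshold potential and small for the one far below it, which is why a network with many near-threshold potentials can lose accuracy when the nonlinearity is swapped.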

This paper reframes that gap as an on-sensor systems problem rather than just a learning problem. The deployment target is not only a hard-spike model, but one that also tolerates quantized weights, fixed-point membrane potentials, discrete leak factors, and tight energy budgets. The question is whether the same flat-minima intuition from SAST survives those constraints.

What this paper adds

The starting point is the original SAST formulation: train a surrogate-forward SNN so the objective is genuinely smooth, then apply sharpness-aware minimization to prefer flatter regions of that surrogate loss surface. At deployment, only the spike nonlinearity is swapped from the surrogate to the hard Heaviside step; weights, thresholds, leak, and reset behavior stay fixed.
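The sharpness-aware update itself follows the two-pass recipe of Foret et al. (2021): take the gradient at the current weights, step to a nearby perturbed point along the normalized gradient, and apply the gradient computed there. The sketch below shows that step on a toy quadratic with analytic gradients. The loss, learning rate, and radius `rho` are hypothetical stand-ins, and this is not the paper's SNN training loop.

```python
import math

def loss(w):
    # Toy smooth objective standing in for the surrogate loss (hypothetical).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware step: perturb along the gradient, then descend."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # First pass: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Second pass: gradient at the perturbed point, applied to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, grad)
```

Each step needs two gradient evaluations, which is the source of the extra training cost that sharpness-aware methods carry.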

Relative to the earlier preprint, this version makes the on-sensor story much more explicit. It adds hardware-aware inference simulations, reports SynOps as an energy proxy, includes corruption and compute-matched controls, and extends the analysis with state stability, input-Lipschitz, smoothness, and nonconvex convergence results under explicit contraction assumptions.

Results

The main pattern is simple: SAST keeps surrogate-forward accuracy high while making the eventual hard-spike model much less brittle. That shows up both in the direct surrogate-to-hard swap and again when the evaluation stack is pushed closer to realistic hardware.

Swap-only transfer

In the cleanest test, the paper just replaces the surrogate activation with hard spikes and leaves everything else unchanged. Under that setup, the transfer gap shrinks sharply on both benchmarks at the best tested perturbation radius.
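A swap-only evaluation can be sketched with a single leaky integrate-and-fire neuron whose spike function is passed in as a parameter, so that leak, threshold, and reset are shared between the two modes. The sigmoid surrogate, the soft reset, and the specific constants are assumptions for illustration; the paper's neuron model may differ.

```python
import math

def heaviside(x):
    """Hard spike used at deployment."""
    return 1.0 if x >= 0.0 else 0.0

def sigmoid_surrogate(x, beta=5.0):
    """Smooth spike used during training (assumed form)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def run_lif(inputs, spike_fn, leak=0.9, threshold=1.0):
    """Simulate one LIF neuron; only spike_fn differs between the two modes."""
    v, outputs = 0.0, []
    for x in inputs:
        v = leak * v + x               # leaky integration
        s = spike_fn(v - threshold)    # surrogate or hard nonlinearity
        v -= s * threshold             # soft reset (assumed convention)
        outputs.append(s)
    return outputs

inputs = [0.3, 0.4, 0.5, 0.1, 0.6, 0.2]
soft = run_lif(inputs, sigmoid_surrogate)
hard = run_lif(inputs, heaviside)
```

Because the surrogate emits fractional outputs that also drive the reset, the two membrane trajectories drift apart over time, and that drift is what the transfer gap measures at the network level.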

Transfer after only replacing the surrogate with hard spikes

Dataset	Baseline hard-spike	SAST hard-spike	Transfer gap (baseline -> SAST)
N-MNIST	65.7%	94.7%	30.3 pp -> 2.5 pp
DVS Gesture	31.8%	63.3%	43.2 pp -> 13.6 pp

Swap-only results from arXiv:2604.09696.
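If the transfer gap is read as surrogate-forward accuracy minus hard-spike accuracy (an assumption about the definition, which this summary does not spell out), the surrogate accuracies implied by the table can be recovered by simple addition. The sketch below does that arithmetic using only the figures in the table above.

```python
def implied_surrogate_accuracy(hard_acc_pct, gap_pp):
    """Surrogate accuracy implied by gap = surrogate - hard (assumed definition)."""
    return hard_acc_pct + gap_pp

# (hard-spike accuracy in %, transfer gap in percentage points) from the table.
rows = {
    "N-MNIST baseline": (65.7, 30.3),
    "N-MNIST SAST": (94.7, 2.5),
    "DVS Gesture baseline": (31.8, 43.2),
    "DVS Gesture SAST": (63.3, 13.6),
}

implied = {
    name: round(implied_surrogate_accuracy(hard, gap), 1)
    for name, (hard, gap) in rows.items()
}
```

Under that reading, the implied surrogate accuracy changes little between baseline and SAST on either dataset, so most of the improvement would come from the hard-spike side of the gap.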

The paper also reports compute-matched controls. Even when the baseline gets more epochs to compensate for SAM's extra cost, it still does not close the gap: on N-MNIST the compute-matched baseline reaches 65.7% hard-spike accuracy versus 93.9% for SAST, and on DVS Gesture it reaches 28.0% versus 57.8%.

Hardware-aware inference

The stronger result is that the benefit survives a more deployment-like evaluation with quantized weights, fixed-point membrane state, and discrete leak. Under the tested INT8 and INT4 profiles, SAST still wins by a large margin and reduces SynOps in all four profiles summarized below.

Hardware-aware hard-spike results

Profile	Baseline accuracy	SAST accuracy	SynOps (baseline -> SAST)
N-MNIST INT8 / Q8.8	47.6%	96.9%	1,734k -> 1,315k
N-MNIST INT4 / Q4.4	43.2%	81.0%	1,666k -> 1,346k
DVS Gesture INT8 / Q8.8	25.3%	47.6%	86,221.3k -> 4,323.5k
DVS Gesture INT4 / Q4.4	26.0%	43.8%	82,317k -> 4,145.6k

Hardware-aware INT8 and INT4 summaries from Table 4 in arXiv:2604.09696.
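For readers unfamiliar with the notation, the sketch below shows one common way to simulate an INT8 / Q8.8 profile: symmetric per-tensor INT8 weights and a signed 16-bit fixed-point membrane state with 8 fractional bits. Both the pairing (INT8 for weights, Q8.8 for state) and the rounding and saturation choices are assumptions; the paper's simulator may handle them differently.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: map the largest magnitude to 127 (assumed scheme)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def to_q8_8(x):
    """Round to signed Q8.8: 8 fractional bits, saturated to the 16-bit range."""
    raw = round(x * 256)
    raw = max(-32768, min(32767, raw))
    return raw / 256.0

weights = [0.5, -0.25, 0.125, 1.0]
q, scale = quantize_int8(weights)
dequant = [qi * scale for qi in q]   # values the hardware effectively uses

membrane = to_q8_8(0.7 + 0.3017)     # state snaps to a multiple of 1/256
```

With a resolution of 1/256, a membrane potential that sits very close to threshold can land on either side of it after rounding, which is one way quantization can interact with the near-threshold brittleness described earlier.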

The paper also reports better robustness under random event-drop corruption and notes the expected training-time cost: roughly 2.1x wall-clock on N-MNIST and 1.8x on DVS Gesture, with no increase in peak memory when the two SAM minibatches are loaded sequentially.
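Random event-drop corruption can be sketched as independent Bernoulli removal of events from a stream. The `(t, x, y, polarity)` tuple layout, the function name, and the seeding are hypothetical; the paper's exact corruption protocol is not described here.

```python
import random

def drop_events(events, drop_prob, seed=0):
    """Keep each event independently with probability 1 - drop_prob."""
    rng = random.Random(seed)  # seeded for reproducible corruption
    return [e for e in events if rng.random() >= drop_prob]

# Hypothetical event stream: (timestamp, x, y, polarity).
events = [(t, t % 34, (3 * t) % 34, t % 2) for t in range(1000)]
corrupted = drop_events(events, drop_prob=0.3)
```

Evaluating the same hard-spike network on `events` and on `corrupted` at several drop probabilities gives a simple robustness curve.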

Why it matters

A lot of SNN work can look good in surrogate mode and disappoint once the model is actually forced into binary hardware behavior. This paper is useful because it evaluates the method at that boundary directly. The headline is not just that flat minima help generalization in the abstract; it is that they seem to push membrane dynamics away from ambiguous threshold regions, making the eventual hard-spike implementation more reliable.

Just as importantly, the gain is not purely an accuracy story. The reduction in SynOps on some hardware-aware settings suggests that the same training signal can also produce sparser, cheaper spike activity, which is exactly the kind of tradeoff on-sensor systems care about.
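SynOps is commonly counted as the number of spikes emitted multiplied by the number of synapses each spike drives, summed over layers. The sketch below uses that definition, which is an assumption since the paper's exact accounting is not given here, and then computes the relative reduction for two of the INT8 rows in the hardware-aware table. The toy spike counts and fan-outs are made up for illustration.

```python
def synops(spike_counts, fan_out):
    """Sum over layers of presynaptic spikes times outgoing connections per neuron."""
    return sum(n * f for n, f in zip(spike_counts, fan_out))

def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1.0 - after / before

# Toy two-layer example with hypothetical spike counts and fan-outs.
toy_dense = synops([1200, 300], [128, 10])
toy_sparse = synops([400, 90], [128, 10])

# SynOps (in thousands) from the hardware-aware table, INT8 / Q8.8 rows.
nmnist_int8 = reduction(1734.0, 1315.0)
dvs_int8 = reduction(86221.3, 4323.5)
```

Since fan-out is fixed by the architecture, any SynOps reduction under this definition has to come from fewer spikes.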

Limits

The paper is careful not to oversell the result. The experiments are on two event-camera benchmarks and small SNN architectures, and the author frames SAST as one promising gap-reduction strategy under the tested settings rather than a universal answer for all neuromorphic deployments.

That is the right reading. The evidence here is strong for the specific swap-only and hardware-aware settings that were evaluated, but broader claims would still need more architectures, more sensors, and direct measurement on real hardware.

References

Nicholson, M. (2026). Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks. arXiv:2604.09696.

Nicholson, M. (2026). Sharpness Aware Surrogate Training for Spiking Neural Networks. arXiv:2603.18039.

Foret, P., Kleiner, A., Mobahi, H., and Neyshabur, B. (2021). Sharpness-Aware Minimization for Efficiently Improving Generalization. arXiv:2010.01412.