ROOT ▮ ENERGY-EFFICIENT AI
BRANCH_0 OF ONE SPINE
TRACK 01 — FOR AMD · ARM · NVIDIA FOLKS
I build
cores & accelerators
that ship.
Not just papers: silicon. I've taken a custom Cortex-M0+-compatible core, a RISC-V-controlled transformer accelerator, and an SNN NPU from RTL to tape-out. I care about the same things you do: IPC, power, area, and whether the dense-fallback path is deterministic.
WHY ME FOR THIS TRACK
I've designed a core, not just used one
ARMuP: a Cortex-M0+-compatible core extended with custom µSIMD instructions that I specified, implemented, and taped out on SK keyfoundry 130nm.
Accelerator microarchitecture end-to-end
A sparsity-aware transformer accelerator with zero-skip datapaths and a RISC-V controller that performs runtime clock gating (Samsung 28nm, 2025).
Power is my first-class metric
A spiking NPU with dynamic adaptive memory optimization, published in IEEE TVLSI. The whole thesis is performance per watt.
EVIDENCE
RISC-V Adaptive Clock Control + Sparsity-Aware Transformer Accelerator
Samsung 28nm tape-out — zero-skip paths, MMIO control, deterministic dense-fallback
ARMuP — Custom ISA + µSIMD on Cortex-M0+
SK keyfoundry tape-out — 4 elems/cycle MAC throughput on a tiny core
S³A-NPU: Spiking Self-Supervised Learning Accelerator
Journal paper — dynamic adaptive memory optimization, pipelined SNN datapath
Opti-SpiSSL: Hardware Generation Framework
Auto-generates optimized accelerator RTL for FPGA/ASIC targets
Need someone who speaks
both RTL and PyTorch?
OPEN TO RESEARCH INTERNSHIPS — ARCHITECTURE / ACCELERATOR / NPU TEAMS