Distributed and Dynamic Shared-Buffer Router for High-Performance Interconnect
Abstract
Most Network-on-Chip routers dedicate a set of buffers to the input and/or output ports. This design decision leads to buffer underutilization, especially when running applications with non-uniform traffic patterns. To maximize resource usage for performance and energy gains, we present a synchronous and elastic buffer implementation of a router architecture called Roundabout with intrinsic resource sharing. Roundabout is inspired by real-life traffic roundabouts and consists of lanes shared by multiple input and output ports. Roundabout offers a performance improvement of 61% for uniform traffic patterns and up to 88% for non-uniform traffic patterns over the Hermes router, a typical input-buffered router. In terms of power, it consumes 24% less than the Hermes router. Roundabout provides a highly parametric architecture that can produce different router configurations with varying topological trade-offs for performance gains without sacrificing area.