Open-Source Coflex Framework Makes AI Chip Design 9.5× Faster

Image Credit: Shubham Dhage | Unsplash

Researchers from the Singapore University of Technology and Design (SUTD) and the Agency for Science, Technology and Research (A*STAR) have developed Coflex, an open-source framework that uses artificial intelligence to automate the optimization of application-specific integrated circuits (ASICs) for deep neural networks. Compared with state-of-the-art methods, Coflex speeds up the optimization process by 1.9 to 9.5 times.

The framework integrates sparse Gaussian processes with multi-objective Bayesian optimization for hardware-aware neural architecture search (HW-NAS), co-optimizing neural network architectures and accelerator designs for edge devices while balancing metrics such as accuracy and energy-delay product (EDP).
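For readers unfamiliar with the metric, EDP is simply energy multiplied by delay, so a design is penalized for being either power-hungry or slow. Below is a minimal sketch of the two objective axes; the 46 µJ / 0.5 s split in the example is illustrative rather than a figure from the paper.

```python
# Minimal sketch of the two objectives the search balances; the paper's
# metric definitions may differ in detail.
def energy_delay_product(energy_uj: float, delay_s: float) -> float:
    """Energy-delay product (EDP) in microjoule-seconds: energy x delay."""
    return energy_uj * delay_s

def objectives(error_rate: float, energy_uj: float, delay_s: float):
    """Score a candidate design on the two axes the search minimizes."""
    return error_rate, energy_delay_product(energy_uj, delay_s)

# Illustrative split: 46 uJ at 0.5 s gives an EDP of 23.0 uJ*s.
print(objectives(0.537, 46.0, 0.5))  # (0.537, 23.0)
```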

Framework Details and Methodology

Coflex addresses the vast search spaces in HW-NAS by decomposing the joint configuration space into subspaces for error rate and energy efficiency. Optimization begins with Latin hypercube sampling, which ensures uniform initial coverage of the high-dimensional space.
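As a hedged illustration of this initialization step, SciPy's quasi-Monte Carlo module provides a Latin hypercube sampler; the dimension count and most of the ranges below are placeholders, not Coflex's actual configuration.

```python
# Latin hypercube initialization sketch using SciPy's QMC module.
# Dimensions and most ranges are illustrative, not Coflex's real settings.
import numpy as np
from scipy.stats import qmc

dim = 6                                   # one axis per tunable parameter
sampler = qmc.LatinHypercube(d=dim, seed=0)
unit_samples = sampler.random(n=16)       # 16 points spread over [0, 1)^6

# Scale each axis to its range, e.g. PE count 1-10 and memory 64-512 units
# (from the article); the remaining four axes are placeholders. Discrete
# parameters would be rounded to valid values afterward.
lower = np.array([1, 64, 1, 1, 1, 1])
upper = np.array([10, 512, 8, 8, 8, 8])
configs = qmc.scale(unit_samples, lower, upper)
print(configs.shape)                      # (16, 6)
```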

The core mechanism employs sparse Gaussian processes (SGP) to build surrogate models that predict outcomes without exhaustive testing, using inducing points selected through Pareto front filtering to reduce computational complexity from cubic to near-linear time. This allows handling search spaces exceeding 10^18 configurations. An acquisition function identifies promising candidates, evaluated using a training-free NAS assessor and the DeFiNES cycle-accurate simulator.
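To make the complexity claim concrete, below is a generic subset-of-regressors sparse GP predictor in NumPy: with m inducing points, building and solving the reduced system costs O(n m^2) rather than the O(n^3) of an exact GP. Coflex's actual surrogate, kernel settings, and Pareto-filtered inducing-point selection are not reproduced here, so treat this as a stand-in sketch.

```python
# Subset-of-regressors (SoR) sparse GP sketch: m inducing points stand in
# for the full n-point kernel, cutting cost from O(n^3) to O(n m^2).
# Generic stand-in; not Coflex's actual surrogate implementation.
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between point sets a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def sor_predict(X, y, Z, X_star, noise=1e-2):
    """Predictive mean under the SoR approximation with inducing points Z."""
    K_uu = rbf(Z, Z)                      # m x m
    K_uf = rbf(Z, X)                      # m x n
    K_su = rbf(X_star, Z)                 # s x m
    A = noise * K_uu + K_uf @ K_uf.T      # m x m system; O(n m^2) to build
    return K_su @ np.linalg.solve(A, K_uf @ y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))                 # n = 500 observations
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(500)
Z = X[rng.choice(500, size=20, replace=False)]        # m = 20 inducing points
X_star = np.linspace(-3, 3, 5)[:, None]
print(sor_predict(X, y, Z, X_star))                   # approximates sin(x*)
```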

The joint search space spans software parameters, including layer types, window sizes, activation functions, and normalization methods, alongside hardware parameters such as processing element counts (1-10), memory sizes (64-512 units), and bus bandwidths; exact ranges vary by experiment. This integrated approach ensures designs meet hardware constraints while maintaining neural network performance.
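As a rough sketch, such a joint space can be written as one dictionary of axes; the parameter names and bus-bandwidth values below are illustrative assumptions, with only the PE-count and memory ranges taken from the article.

```python
# Hypothetical encoding of a joint software/hardware search space; names
# and bus-bandwidth values are assumptions, not Coflex's exact axes.
from math import prod

search_space = {
    # Software (neural architecture) axes
    "layer_type": ["conv3x3", "conv1x1", "skip", "pool"],
    "window_size": [3, 5, 7],
    "activation": ["relu", "gelu", "swish"],
    "normalization": ["batchnorm", "layernorm", "none"],
    # Hardware (accelerator) axes; PE and memory ranges from the article
    "pe_count": list(range(1, 11)),        # processing elements, 1-10
    "memory_size": [64, 128, 256, 512],    # memory units, 64-512
    "bus_bandwidth": [8, 16, 32, 64],      # illustrative values
}

# One candidate = one choice per axis. This toy space has ~1.7e4 points;
# the full benchmarks multiply far more axes, exceeding 10**18.
print(prod(len(v) for v in search_space.values()))  # 17280
```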

The framework is available as open-source code on GitHub at https://github.com/Edge-AI-Acceleration-Lab/Coflex.

Performance Results

Coflex was tested on benchmarks including NATS-Bench-SSS for image classification, TransNAS-Bench-101 for semantic segmentation, and NAS-Bench-NLP for natural language processing. It achieved computational speed-ups of 1.9 to 9.5 times, for example completing certain workloads in 86.3 minutes versus 971.6 minutes for the qEHVI baseline.

Pareto-optimal outcomes included an error rate of 53.70% and an EDP of 23.00 microjoule-seconds on ImageNet tasks, surpassing baselines such as qNParEGO, with additional NSGA-II comparisons on CIFAR tasks. Convergence to key thresholds occurred in as few as two iterations, with speed-ups of up to 11.2 times in reaching predefined performance levels.
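Pareto-optimal here means no other design is at least as good on both objectives and better on one. The minimal non-dominated filter below, over (error rate, EDP) pairs, is a generic sketch of that screening, not the paper's routine; apart from the reported (0.537, 23.0) design, the example points are made up.

```python
# Generic non-dominated (Pareto) filter over (error_rate, EDP) pairs;
# lower is better on both axes. A sketch, not Coflex's exact routine.
def pareto_front(points):
    """Keep points that no other point dominates on both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# (0.537, 23.0) matches the reported ImageNet design; the rest are made up.
designs = [(0.537, 23.0), (0.55, 21.0), (0.60, 25.0), (0.52, 30.0)]
print(pareto_front(designs))  # [(0.537, 23.0), (0.55, 21.0), (0.52, 30.0)]
```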

Background and Development

Coflex emerges from the need to manage the complexity of AI hardware design as deep neural networks demand greater computational resources for inference on resource-constrained edge devices. Traditional HW-NAS methods, such as reinforcement learning and evolutionary algorithms, face limitations in scalability due to high evaluation costs.

The work, led by Yinhui Ma and Tomomasa Yamasaki from SUTD alongside Zhehui Wang and Tao Luo from A*STAR's Institute of High Performance Computing and Bo Wang from SUTD, builds on Bayesian optimization advancements by incorporating sparsity for efficiency. It was motivated by edge AI requirements, where power and latency constraints are paramount.

Published as a preprint on arXiv in late July 2025 and accepted to ICCAD 2025 (conference Oct 26–30, 2025, Munich), the framework aligns with industry efforts to automate chip design using AI tools. Its open-source nature supports community-driven enhancements.

Impact on AI Hardware

The framework reduces barriers to custom ASIC development for deep neural networks by accelerating the HW-NAS process, potentially cutting research and development time and costs. It enables energy-efficient designs suitable for applications in autonomous systems, healthcare, and IoT, where real-time processing and privacy are essential.

By optimizing EDP, Coflex contributes to lowering the energy consumption of AI inference on edge devices. In edge computing, it facilitates broader AI deployment on devices with limited resources.

Future Trends

Frameworks like Coflex indicate a progression toward AI-integrated design workflows, where machine learning directly shapes hardware architectures. Open-source collaborations could lead to refinements, possibly incorporating generative models for expanded automation.

Ongoing challenges include validating designs across varied hardware and ensuring security in AI-generated systems. As neural networks advance, such tools may adapt to new workloads, promoting innovation in hardware-software co-design.
