Citation: WEI Xueming, ZHOU Lixin, YIN Renchuan, et al. An 8T SRAM bitcell array-based compute-in-memory macro with high accuracy[J]. Journal of Guilin University of Electronic Technology, 2023, 43(6): 465-472. DOI: 10.3969/1673-808X.2022141

An 8T SRAM bitcell array-based compute-in-memory macro with high accuracy

Abstract: To address the power-wall bottleneck of the traditional von Neumann architecture and to improve the energy efficiency of multiply-accumulate (MAC) operations in artificial intelligence applications, a compute-in-memory circuit based on an 8T static random-access memory (SRAM) array is designed, which effectively mitigates the "memory wall" problem. The bias voltage of the storage cell is designed to stabilize the charging and discharging currents, which improves the linearity of the bitline discharge and increases computation accuracy. At the same time, with the discharge current kept the same, the analog-to-digital converter (ADC) threshold coding is reduced and the area of the memory array is significantly decreased. The circuit is designed in a 65 nm CMOS process and performs 64-byte binary multiply-accumulate computation through the parallel computing structure of an $8\times72$ memory array. Simulation results show that, in the mode with 3-bit ADC output and 8-bit comparison output, using core supply voltages of 0.8 V and 1.2 V and a clock frequency of 250 MHz, a computational energy efficiency of 1.69 GOPS/W per bit is achieved. Compared with the theoretical baseline, the average deviation of the computed output is at most 1.05%, which effectively improves computation accuracy while reducing circuit area.
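To make the "theoretical baseline" in the abstract concrete, the following is a minimal sketch of an ideal binary multiply-accumulate over an 8-row, 72-column array, followed by a 3-bit quantization step. It is not the authors' circuit, which computes in the analog domain through bitline discharge. The mapping of the 8 rows to input bits, the function names `binary_mac` and `ideal_adc`, and the saturating ADC model are all assumptions made for illustration only.

```python
import numpy as np

ROWS, COLS = 8, 72   # array shape quoted in the abstract
ADC_BITS = 3         # ADC resolution quoted in the abstract


def binary_mac(inputs, weights):
    """Ideal per-column sum of 1-bit products.

    A 1-bit multiplication is a logical AND, so each column result is the
    number of rows where both the input bit and the stored bit are 1.
    (Assumption: one input bit per row, one stored bit per cell.)
    """
    inputs = np.asarray(inputs, dtype=np.uint8).reshape(ROWS, 1)
    weights = np.asarray(weights, dtype=np.uint8).reshape(ROWS, COLS)
    return (inputs & weights).sum(axis=0)


def ideal_adc(counts, bits=ADC_BITS):
    """Assumed ADC model: saturate each count at the largest code, 2**bits - 1."""
    return np.minimum(counts, 2**bits - 1)


# Random binary inputs and stored weights, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=ROWS)
w = rng.integers(0, 2, size=(ROWS, COLS))
codes = ideal_adc(binary_mac(x, w))
```

In a model like this, the deviation of a simulated analog output from `binary_mac` would play the role of the calculation deviation reported in the abstract; the sketch itself makes no claim about how the paper measures it.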

     
