Predictive transient simulation analysis for the GPUs

Today’s graphics processing units (GPUs) feature tens of billions of transistors, and the count continues to grow with each new generation to improve processor performance. However, the growing number of transistors also drives a steep increase in power demand, which makes it more difficult to meet transient response specifications.

This article demonstrates how to use the SIMPLIS simulator from SIMPLIS Technologies to predict and optimize the behavior of power supplies for the next generation of GPUs, where high slew rate requirements and current levels exceeding 1,000 A demand faster transient response.

Constant-on-time (COT) control

The constant-on-time (COT) architecture of multi-phase buck converters replaces the error amplifier (EA) in the compensation network with a high-speed comparator. The output voltage (VOUT) is sensed via the feedback resistors and compared to a reference voltage (VREF). When VOUT drops below VREF, the high-side MOSFET (HS-FET) turns on. The MOSFET’s on time is fixed, meaning that the converter can achieve constant frequency in steady state. If there are load step transients, the converter can also significantly increase its pulse rate to minimize the output undershoot. In this scenario, however, the nonlinear loop control complicates loop tuning.

Figure 1 shows COT control for fast transient response.

Figure 1 COT control achieves fast transient response. Source: Monolithic Power Systems
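As a rough illustration of the control law described above, the following Python sketch steps a single-phase COT buck converter through a load step. It is a behavioral toy model, not the MP2891 model or a SIMPLIS schematic: every component value, the minimum off time, and the names simulate_cot and mean_interval are assumptions chosen for illustration, and multi-phase interleaving is left out.

```python
def simulate_cot(vin=12.0, vref=1.0, l_h=200e-9, c_f=1e-3, esr=1e-3,
                 t_on=167e-9, t_off_min=50e-9, i_lo=10.0, i_hi=40.0,
                 t_step=60e-6, t_end=80e-6, dt=1e-9):
    """Behavioral single-phase COT buck (all default values are assumed).

    Returns the HS-FET turn-on times and the lowest output voltage seen.
    """
    on_steps = round(t_on / dt)            # fixed on time, in time steps
    off_min_steps = round(t_off_min / dt)  # minimum off time, in time steps
    step_idx = round(t_step / dt)          # step index of the load step
    i_l, v_c = i_lo, vref                  # inductor current, capacitor voltage
    hs_on, timer = False, off_min_steps
    pulse_times, v_min = [], vref

    for k in range(round(t_end / dt)):
        i_load = i_hi if k >= step_idx else i_lo
        # The output capacitor ESR adds a ripple term to the sensed VOUT.
        v_out = v_c + esr * (i_l - i_load)

        if hs_on:
            if timer >= on_steps:          # on time elapsed: turn HS-FET off
                hs_on, timer = False, 0
        elif timer >= off_min_steps and v_out < vref:
            hs_on, timer = True, 0         # comparator trips: start a pulse
            pulse_times.append(k * dt)

        v_sw = vin if hs_on else 0.0       # ideal synchronous switch node
        i_l += (v_sw - v_out) / l_h * dt   # inductor: L di/dt = v_sw - v_out
        v_c += (i_l - i_load) / c_f * dt   # capacitor: C dv/dt = i_l - i_load
        timer += 1
        v_min = min(v_min, v_out)

    return pulse_times, v_min


def mean_interval(pulse_times, t_from, t_to):
    """Average spacing of consecutive pulses that both fall in [t_from, t_to)."""
    gaps = [b - a for a, b in zip(pulse_times, pulse_times[1:])
            if t_from <= a and b < t_to]
    return sum(gaps) / len(gaps)


pulses, v_min = simulate_cot()
print("steady-state pulse spacing (s):", mean_interval(pulses, 30e-6, 60e-6))
print("spacing right after the step (s):", mean_interval(pulses, 60e-6, 60.6e-6))
print("lowest VOUT (V):", v_min)
```

With these assumed values, the steady-state spacing should settle close to 2 µs, while the pulses immediately after the load step are separated only by the on time plus the minimum off time. That is the pulse-rate increase that limits the undershoot.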

The converter’s behavior and the power delivery network (PDN) must be accurately modeled to emulate the transient buck performance and validate various GPU-based systems without having to go through a long, costly iterative process.

Power delivery network (PDN)

The PDN consists of the components connected to the voltage and ground rails, including the power and ground plane layout, the decoupling capacitors used for power stability, and any other copper features that connect or couple to the main power rails. The primary objective of the PDN design is to minimize voltage fluctuations and ensure normal GPU operation.

Figure 2 shows the PDN architecture of a typical GPU.

Figure 2 The PDN architecture of a typical GPU comprises components connected to the voltage and ground rails. Source: Monolithic Power Systems

The components in the PDN exhibit parasitic behaviors, such as the equivalent series inductance (ESL) and equivalent series resistance (ESR) of the capacitors. These parasitic elements must also be considered when modeling the system response. Increasing the slew rate generates stronger high-frequency harmonics. The PDN’s resistor, inductor, and capacitor (RLC) components create resonant tanks that designers may not be aware of, and their resonant frequencies can amplify the high-frequency harmonics created by the converter’s switching, leading to unexpected converter behavior.

Table 1 shows the typical power rail requirements for artificial intelligence (AI) applications.

Table 1 The above numbers highlight the design specifications for the power rail. Source: Monolithic Power Systems

This analysis has been performed using an evaluation board that combines MP2891, a 16-phase digital controller, and MPC22163-130, a 130 A, two-phase, non-isolated, step-down power module. The evaluation board can reach up to 2,000 A (Figure 3).

Figure 3 The evaluation board combines a digital controller and a step-down power module. Source: Monolithic Power Systems

PCB modeling

The complexity of the power and ground polygon shapes and the multi-layer stack-up make it difficult to manually calculate the resistance and inductance from the layout. Instead, the PCB’s scattering parameters (S-parameters) can be extracted using Cadence Sigrity PowerSI, with a 0 MHz to 700 MHz frequency range. The ports are defined as follows (see Figure 4): Port 1 includes the vertical modules on the top side; Port 2 includes the vertical MPC22163-130 modules on the bottom side; Port 3 includes the capacitor connection; and Port 4 includes the connection to the load.

Figure 4 Extracting the PCB’s S-parameters requires specific port configurations. Source: Monolithic Power Systems

It is important to allocate dedicated ports for the capacitor connections, since the capacitors’ effectiveness in mitigating fast transients from the GPU depends on both their quantity and their placement. Different capacitor positions change the PCB’s S-parameters, and ineffective positioning can lead to poor transient mitigation and inefficient power delivery. Generally, it is recommended to place capacitors in a row to minimize differences in path length and to select the capacitance based on the resonant frequency required to meet the target impedance specification.
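The article does not define how the target impedance is derived, so the sketch below uses a common rule of thumb as an assumption: the allowed voltage deviation divided by the load current step. The rail voltage, tolerance, current step, and per-capacitor ESR are hypothetical placeholders, not values from Table 1, and the capacitor count is only a first-pass lower bound that ignores ESL, mounting inductance, and placement.

```python
import math


def target_impedance(v_rail, tolerance, i_step):
    """Rule-of-thumb target impedance: allowed deviation / current step (ohms)."""
    return v_rail * tolerance / i_step


def min_capacitor_count(esr_ohm, z_target):
    """Lower bound on identical parallel capacitors so that ESR / n <= z_target.

    Only meaningful near each capacitor's self-resonance, where the
    impedance of one capacitor is approximately its ESR.
    """
    return math.ceil(esr_ohm / z_target)


# Hypothetical example values (assumptions, not taken from the article).
z_t = target_impedance(v_rail=0.8, tolerance=0.03, i_step=500.0)
n_min = min_capacitor_count(esr_ohm=2e-3, z_target=z_t)
print(f"target impedance: {z_t * 1e6:.1f} micro-ohm, at least {n_min} capacitors")
```

A count obtained this way is only a starting point; the S-parameter extraction and transient simulation described in this article are what confirm whether a given quantity and placement actually meet the specification.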

Two different capacitor types are used in this PDN board design: bulk capacitors and multilayer ceramic capacitors (MLCCs). Parameters such as voltage, temperature rating, and construction materials affect the frequency range over which the capacitors filter effectively. Therefore, to optimize the design, designers must consider the capacitor’s impedance profile using a lumped-capacitance model in the simulations (see Figure 5).

Figure 5 The equivalent bulk capacitor model and frequency response evaluate the capacitor’s impedance profile. Source: Monolithic Power Systems

CBYPASS, ESL, and ESR in the lumped-capacitance model define the frequency response of the capacitor’s impedance. The resonance frequency (fO), or the minimum impedance point, can be determined with Equation (1):

$$f_O = \frac{1}{2\pi\sqrt{\mathrm{ESL} \times C_{\mathrm{BYPASS}}}} \qquad (1)$$
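Equation (1) and the lumped-capacitance model can be checked numerically with a few lines of Python. The sketch below treats the capacitor as a series ESR, ESL, and CBYPASS branch; the component values are hypothetical and only stand in for a generic bulk capacitor.

```python
import math


def resonant_frequency(esl_h, c_bypass_f):
    """Equation (1): frequency at which the capacitor's impedance is minimal."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_bypass_f))


def capacitor_impedance(f_hz, c_bypass_f, esl_h, esr_ohm):
    """Complex impedance of the series ESR-ESL-CBYPASS model at f_hz."""
    w = 2.0 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_bypass_f))


# Hypothetical bulk capacitor: 470 uF, 2 nH ESL, 5 mOhm ESR (assumed values).
C_BYPASS, ESL, ESR = 470e-6, 2e-9, 5e-3

f_o = resonant_frequency(ESL, C_BYPASS)
for f in (f_o / 10, f_o, f_o * 10):
    z = capacitor_impedance(f, C_BYPASS, ESL, ESR)
    kind = "capacitive" if z.imag < 0 else "inductive"
    print(f"{f / 1e3:9.1f} kHz  |Z| = {abs(z) * 1e3:7.3f} mOhm  ({kind} reactance)")
```

For these assumed values, the resonance lands at roughly 164 kHz. Below that frequency the capacitance dominates, above it the ESL dominates, and at resonance the impedance reduces to approximately the ESR.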

The primary objective of these capacitors is to maintain a low impedance at the high frequencies where the voltage regulator module (VRM) is ineffective. The VRM is ineffective there because its effective bandwidth and phase margin are limited to low frequencies (<1 MHz). Thus, the capacitors must filter out signals with frequencies outside of the VRM’s bandwidth, typically ranging between a few hundred kHz and a few MHz, which can otherwise affect the PDN’s operation.

Figure 6 shows a typical PDN impedance profile that can be divided into three regions: low frequency (0 MHz to 1 MHz), mid-frequency (1 MHz to 100 MHz), and high frequency (above 100 MHz). This correlation only considers the VRM and the motherboard, which are in the low- to mid-frequency range, and the transient load is applied on the ball grid array (BGA) connector.

Figure 6 The PDN impedance profile shows three different frequency ranges. Source: Monolithic Power Systems
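Part of the shape of such a profile comes from the resonant tanks mentioned earlier: when a bulk capacitor that already looks inductive is paralleled with an MLCC that still looks capacitive, the pair forms an anti-resonant peak between their individual resonances. The sketch below shows this with one hypothetical bulk capacitor and one hypothetical MLCC. All values are assumptions, and the VRM, PCB planes, and package are deliberately left out.

```python
import math


def series_rlc(f_hz, c_f, esl_h, esr_ohm):
    """Complex impedance of one lumped capacitor model (series ESR, ESL, C)."""
    w = 2.0 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def self_resonance(c_f, esl_h):
    """Frequency of minimum impedance for a single capacitor."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_f))


def bank_impedance(f_hz, bulk, mlcc):
    """Impedance of the two capacitor models connected in parallel."""
    z_bulk = series_rlc(f_hz, *bulk)
    z_mlcc = series_rlc(f_hz, *mlcc)
    return z_bulk * z_mlcc / (z_bulk + z_mlcc)


def antiresonance_peak(bulk, mlcc, points=2000):
    """Log-sweep between the two self-resonances; return (f, |Z|) at the peak."""
    f_lo = self_resonance(bulk[0], bulk[1])
    f_hi = self_resonance(mlcc[0], mlcc[1])
    best_f, best_z = f_lo, 0.0
    for n in range(points + 1):
        f = f_lo * (f_hi / f_lo) ** (n / points)
        z = abs(bank_impedance(f, bulk, mlcc))
        if z > best_z:
            best_f, best_z = f, z
    return best_f, best_z


# (capacitance in F, ESL in H, ESR in ohm); assumed illustrative values.
BULK = (470e-6, 2e-9, 5e-3)
MLCC = (22e-6, 0.5e-9, 2e-3)

f_peak, z_peak = antiresonance_peak(BULK, MLCC)
print(f"anti-resonance near {f_peak / 1e3:.0f} kHz, |Z| = {z_peak * 1e3:.2f} mOhm")
```

In this toy case the peak impedance exceeds the ESR of either capacitor, which is why a profile built only from individual capacitor resonances can look better than the assembled network really is.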

Time domain simulation and correlation

Transient simulation is conducted using the SIMPLIS simulator, circuit simulation software for switching power systems that supports nonlinear features such as COT control. The MP2891 digital controller’s SIMPLIS model is combined with the MPC22163-130 step-down module and the previously extracted S-parameters of the PCB. The S-parameters must be converted to an RLGC model using IdEM from Dassault Systèmes before being used in the SIMPLIS simulator for transient analysis.

Figure 7 shows the SIMPLIS model of the MP2891 and MPC22163-130, where the S-parameters are added to the schematic as series inductors (L9 and L3) and resistors (R1 and R2).

Figure 7 The SIMPLIS model conducts transient simulation of the MP2891 and MPC22163-130. Source: Monolithic Power Systems
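To give a feel for how S-parameter data relates to series resistors and inductors in a schematic, the sketch below converts a single reflection coefficient into an input impedance and reads off an equivalent series R and L at one frequency. This is only the textbook one-port relationship under an assumed 50 Ω reference impedance. It is not the procedure IdEM uses, which the article does not describe, and the function names and numeric values are hypothetical.

```python
import math


def s11_to_impedance(s11, z0=50.0):
    """One-port input impedance from the reflection coefficient S11."""
    return z0 * (1 + s11) / (1 - s11)


def impedance_to_s11(z, z0=50.0):
    """Reflection coefficient of an impedance z against reference z0."""
    return (z - z0) / (z + z0)


def series_rl_from_s11(s11, f_hz, z0=50.0):
    """Equivalent series resistance (ohm) and inductance (H) at f_hz.

    Valid only where the path behaves like a resistor in series with
    an inductor, i.e. where the reactance is positive.
    """
    z = s11_to_impedance(s11, z0)
    return z.real, z.imag / (2.0 * math.pi * f_hz)


# Hypothetical power path: 0.2 mOhm in series with 50 pH, observed at 1 MHz.
f = 1e6
z_path = complex(0.2e-3, 2.0 * math.pi * f * 50e-12)
r_est, l_est = series_rl_from_s11(impedance_to_s11(z_path), f)
print(f"R = {r_est * 1e3:.3f} mOhm, L = {l_est * 1e12:.1f} pH")
```

A single-frequency fit like this cannot capture the frequency dependence of a real multi-port PCB model, which is why a dedicated conversion tool is used in the flow described above.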

The SIMPLIS simulation combines the MP2891 digital controller’s nonlinearity with accurate power delivery modeling to enable accurate prediction of transient behavior on the motherboard. Figure 8 shows a comparison of the SIMPLIS simulation and lab measurement, where the difference is only 5 mV.

Figure 8 There is only a 5 mV difference between the SIMPLIS simulation and lab measurement. Source: Monolithic Power Systems

Why transient simulation?

This article demonstrated predictive transient simulation using a multi-phase controller and a two-phase, non-isolated, high-efficiency step-down power block on an evaluation board. Precise converter models and power delivery network parameters allow for accurate prediction of the multi-phase buck converter’s performance, transient droop, and overshoot.

As a result, it is possible to optimize the processor design in the early stages by reducing the number of output capacitors and determining their effective placement. Furthermore, if the design specifications change, accurate simulation enables making a quick assessment of the impact of these changes, as well as identifying any potential issues.

Marisol Cabrera is applications engineering supervisor at Monolithic Power Systems.

Tomas Hudson is applications engineer at Monolithic Power Systems.

Marlon Eguia is applications engineer at Monolithic Power Systems.

The post Predictive transient simulation analysis for the GPUs appeared first on EDN.
