==============NVSMI LOG==============

Timestamp : Sat Sep 27 11:31:51 2025
Driver Version : 535.161.08
CUDA Version : 12.2

Attached GPUs : 8
GPU 00000000:19:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655023063898
    GPU UUID : GPU-c58ae247-e8ab-7e82-2a70-985217ce0668
    Minor Number : 0
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0x1900
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 2
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:30:01.626
        Latest Duration : 86876 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x19
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:19:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 15
        Replay Number Rollovers : 0
        Tx Throughput : 12632 KB/s
        Rx Throughput : 26484 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 85 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 57 C
        GPU T.Limit Temp : 26 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 69 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 695.50 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1830 MHz
        SM : 1830 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 870.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:3B:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655223025023
    GPU UUID : GPU-21ac6c7d-7c7d-b092-82f8-2cbfa90bc9ea
    Minor Number : 1
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0x3b00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 4
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:31:24.223
        Latest Duration : 93762 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x3B
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:3B:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 23
        Replay Number Rollovers : 0
        Tx Throughput : 12433 KB/s
        Rx Throughput : 26253 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 88 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 53 C
        GPU T.Limit Temp : 34 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 61 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 698.45 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1860 MHz
        SM : 1860 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 885.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:4C:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655223024308
    GPU UUID : GPU-c9702494-4e51-248c-e543-b590f811a9b0
    Minor Number : 2
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0x4c00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 1
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:31:21.156
        Latest Duration : 87740 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x4C
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:4C:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 15
        Replay Number Rollovers : 0
        Tx Throughput : 11757 KB/s
        Rx Throughput : 24933 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 87 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 51 C
        GPU T.Limit Temp : 35 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 60 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 697.58 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1875 MHz
        SM : 1875 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 890.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:5D:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655223042615
    GPU UUID : GPU-90f8676b-993f-15cb-a068-8d1046c97acc
    Minor Number : 3
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0x5d00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 3
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:30:07.208
        Latest Duration : 137515 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x5D
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:5D:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 16
        Replay Number Rollovers : 0
        Tx Throughput : 12824 KB/s
        Rx Throughput : 26496 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 88 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 58 C
        GPU T.Limit Temp : 29 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 64 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 696.20 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1830 MHz
        SM : 1830 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 870.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:9B:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655223025329
    GPU UUID : GPU-cce068a9-09b9-e975-51d8-3a1c08598b93
    Minor Number : 4
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0x9b00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 7
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:31:06.429
        Latest Duration : 89809 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x9B
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:9B:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 18
        Replay Number Rollovers : 0
        Tx Throughput : 12046 KB/s
        Rx Throughput : 24957 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 86 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 58 C
        GPU T.Limit Temp : 28 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 67 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 694.75 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1815 MHz
        SM : 1815 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 865.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:BB:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655023064577
    GPU UUID : GPU-be46e941-f5eb-48a9-9e87-891afa1cb340
    Minor Number : 5
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0xbb00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 5
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:30:27.214
        Latest Duration : 90212 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0xBB
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:BB:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 12
        Replay Number Rollovers : 0
        Tx Throughput : 11457 KB/s
        Rx Throughput : 24054 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 86 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 52 C
        GPU T.Limit Temp : 34 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 61 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 699.75 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1890 MHz
        SM : 1890 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 900.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:CB:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655123024766
    GPU UUID : GPU-744ae3a0-5f1f-2c83-0e7f-434b9402e304
    Minor Number : 6
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0xcb00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 6
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:29:46.873
        Latest Duration : 106617 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0xCB
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:CB:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 19
        Replay Number Rollovers : 0
        Tx Throughput : 11906 KB/s
        Rx Throughput : 24621 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 86 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 0
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 58 C
        GPU T.Limit Temp : 27 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 68 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 697.81 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1815 MHz
        SM : 1815 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 865.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB

GPU 00000000:DB:00.0
    Product Name : NVIDIA H100 80GB HBM3
    Product Brand : NVIDIA
    Product Architecture : Hopper
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Addressing Mode : None
    MIG Mode
        Current : Disabled
        Pending : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 1655223025178
    GPU UUID : GPU-884506b7-1154-b58e-7249-b10995e8f1aa
    Minor Number : 7
    VBIOS Version : 96.00.89.00.01
    MultiGPU Board : No
    Board ID : 0xdb00
    Board Part Number : 692-2G520-0200-000
    GPU Part Number : 2330-885-A1
    FRU Part Number : N/A
    Module ID : 8
    Inforom Version
        Image Version : G520.0200.00.05
        OEM Object : 2.1
        ECC Object : 7.16
        Power Management Object : N/A
    Inforom BBX Object Flush
        Latest Timestamp : 2025/09/26 16:31:12.677
        Latest Duration : 93786 us
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : 535.161.08
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : No
        Drain and Reset Recommended : No
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0xDB
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x233010DE
        Bus Id : 00000000:DB:00.0
        Sub System Id : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max : 5
                Current : 5
                Device Current : 5
                Device Max : 5
                Host Max : 5
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 14
        Replay Number Rollovers : 0
        Tx Throughput : 11859 KB/s
        Rx Throughput : 25136 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed : N/A
    Performance State : P0
    Clocks Event Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    Sparse Operation Mode : Disabled
    FB Memory Usage
        Total : 81559 MiB
        Reserved : 551 MiB
        Used : 5619 MiB
        Free : 75388 MiB
    BAR1 Memory Usage
        Total : 131072 MiB
        Used : 4 MiB
        Free : 131068 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 86 %
        Encoder : 0 %
        Decoder : 0 %
        JPEG : 0 %
        OFA : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            SRAM Correctable : 1
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
        Aggregate
            SRAM Correctable : 6621
            SRAM Uncorrectable Parity : 0
            SRAM Uncorrectable SEC-DED : 0
            DRAM Correctable : 0
            DRAM Uncorrectable : 0
            SRAM Threshold Exceeded : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2 : 0
            SRAM SM : 0
            SRAM Microcontroller : 0
            SRAM PCIE : 0
            SRAM Other : 0
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 2560 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 50 C
        GPU T.Limit Temp : 36 C
        GPU Shutdown T.Limit Temp : -8 C
        GPU Slowdown T.Limit Temp : -2 C
        GPU Max Operating T.Limit Temp : 0 C
        GPU Target Temperature : N/A
        Memory Current Temp : 59 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw : 697.19 W
        Current Power Limit : 700.00 W
        Requested Power Limit : 700.00 W
        Default Power Limit : 700.00 W
        Min Power Limit : 200.00 W
        Max Power Limit : 700.00 W
    Module Power Readings
        Power Draw : N/A
        Current Power Limit : N/A
        Requested Power Limit : N/A
        Default Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 1845 MHz
        SM : 1845 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Default Applications Clocks
        Graphics : 1980 MHz
        Memory : 2619 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1980 MHz
        SM : 1980 MHz
        Memory : 2619 MHz
        Video : 1545 MHz
    Max Customer Boost Clocks
        Graphics : 1980 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 885.000 mV
    Fabric
        State : Completed
        Status : Success
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 3599569
            Type : C
            Name : /etc/system
            Used GPU Memory : 5610 MiB