==============NVSMI LOG============== Timestamp : Sat Sep 27 11:56:12 2025 Driver Version : 535.161.08 CUDA Version : 12.2 Attached GPUs : 8 GPU 00000000:19:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650624003662 GPU UUID : GPU-6098b7c5-a583-da67-2600-afe72578e2d1 Minor Number : 0 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0x1900 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 2 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:52.040 Latest Duration : 96107 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x19 Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:19:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 5 Replay Number Rollovers : 0 Tx Throughput : 12414 KB/s Rx Throughput : 26363 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 87 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 58 C GPU T.Limit Temp : 28 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 66 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 699.00 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1875 MHz SM : 1875 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 885.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:3B:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650524071424 GPU UUID : GPU-1c52e0b6-34e5-b2a8-402f-2fdea23ace77 Minor Number : 1 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0x3b00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 4 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:37.959 Latest Duration : 91113 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x3B Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:3B:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 12 Replay Number Rollovers : 0 Tx Throughput : 12441 KB/s Rx Throughput : 26234 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 87 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 50 C GPU T.Limit Temp : 35 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 60 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 697.00 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1860 MHz SM : 1860 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 875.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:4C:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650524071057 GPU UUID : GPU-2bbddc35-2796-4043-2870-eaedd44ad562 Minor Number : 2 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0x4c00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 1 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:37.371 Latest Duration : 93027 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x4C Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:4C:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 11 Replay Number Rollovers : 0 Tx Throughput : 16144 KB/s Rx Throughput : 27097 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 87 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 51 C GPU T.Limit Temp : 34 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 60 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 698.62 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1875 MHz SM : 1875 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 900.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:5D:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650524074392 GPU UUID : GPU-f57d1abe-b53b-671e-886c-0b227a7e913e Minor Number : 3 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0x5d00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 3 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:14.459 Latest Duration : 93967 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x5D Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:5D:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 10 Replay Number Rollovers : 0 Tx Throughput : 12730 KB/s Rx Throughput : 26789 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 87 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 57 C GPU T.Limit Temp : 28 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 67 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 700.06 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1860 MHz SM : 1860 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 890.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:9B:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650524015411 GPU UUID : GPU-8f8c2aa2-82d8-5630-6ed5-d62fd314b906 Minor Number : 4 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0x9b00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 7 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:20.736 Latest Duration : 89587 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x9B Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:9B:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 6 Replay Number Rollovers : 0 Tx Throughput : 11535 KB/s Rx Throughput : 24472 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 87 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 58 C GPU T.Limit Temp : 28 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 67 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 696.93 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1875 MHz SM : 1875 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 890.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:BB:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650624006135 GPU UUID : GPU-ae991ab0-acf2-0702-9797-7fcdac61bbe1 Minor Number : 5 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0xbb00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 5 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:45.326 Latest Duration : 95818 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0xBB Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:BB:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 17 Replay Number Rollovers : 0 Tx Throughput : 12320 KB/s Rx Throughput : 26101 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 88 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 52 C GPU T.Limit Temp : 34 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 61 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 698.74 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1875 MHz SM : 1875 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 895.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:CB:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650524070739 GPU UUID : GPU-9591cff7-113b-62ad-55d6-36b9a92132f3 Minor Number : 6 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0xcb00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 6 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:26.891 Latest Duration : 93497 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0xCB Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:CB:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 9 Replay Number Rollovers : 0 Tx Throughput : 11562 KB/s Rx Throughput : 24757 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 86 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 57 C GPU T.Limit Temp : 28 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 67 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 699.57 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1845 MHz SM : 1845 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 890.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB GPU 00000000:DB:00.0 Product Name : NVIDIA H100 80GB HBM3 Product Brand : NVIDIA Product Architecture : Hopper Display Mode : Enabled Display Active : Disabled Persistence Mode : Enabled Addressing Mode : None MIG Mode Current : Disabled Pending : Disabled Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 1650624005237 GPU UUID : GPU-c75ae2a9-04fb-89fc-a7a3-f20211acbc71 Minor Number : 7 VBIOS Version : 96.00.89.00.01 MultiGPU Board : No Board ID : 0xdb00 Board Part Number : 692-2G520-0200-000 GPU Part Number : 2330-885-A1 FRU Part Number : N/A Module ID : 8 Inforom Version Image Version : G520.0200.00.05 OEM Object : 2.1 ECC Object : 7.16 Power Management Object : N/A Inforom BBX Object Flush Latest Timestamp : 2025/09/26 16:29:56.603 Latest Duration : 103457 us GPU Operation Mode Current : N/A Pending : N/A GSP Firmware Version : 535.161.08 GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A GPU Reset Status Reset Required : No Drain and Reset Recommended : No IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0xDB Device : 0x00 Domain : 0x0000 Device Id : 0x233010DE Bus Id : 00000000:DB:00.0 Sub System Id : 0x16C110DE GPU Link Info PCIe Generation Max : 5 Current : 5 Device Current : 5 Device Max : 5 Host Max : 5 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 12 Replay Number Rollovers : 0 Tx Throughput : 15554 KB/s Rx Throughput : 25832 KB/s Atomic Caps Inbound : N/A Atomic Caps Outbound : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64 Fan Speed : N/A Performance State : P0 Clocks Event Reasons Idle : Not Active Applications Clocks Setting : Not Active SW Power Cap : Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active Sparse Operation Mode : Disabled FB Memory Usage Total : 81559 MiB Reserved : 551 MiB Used : 5619 MiB Free : 75388 MiB BAR1 Memory Usage Total : 131072 MiB Used : 4 MiB Free : 131068 MiB Conf Compute Protected Memory Usage Total : 0 MiB Used : 0 MiB Free : 0 MiB Compute Mode : Default Utilization Gpu : 100 % Memory : 86 % Encoder : 0 % Decoder : 0 % JPEG : 0 % OFA : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 ECC Mode Current : Enabled Pending : Enabled ECC Errors Volatile SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 Aggregate SRAM Correctable : 0 SRAM Uncorrectable Parity : 0 SRAM Uncorrectable SEC-DED : 0 DRAM Correctable : 0 DRAM Uncorrectable : 0 SRAM Threshold Exceeded : No Aggregate Uncorrectable SRAM Sources SRAM L2 : 0 SRAM SM : 0 SRAM Microcontroller : 0 SRAM PCIE : 0 SRAM Other : 0 Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows Correctable Error : 0 Uncorrectable Error : 0 Pending : No Remapping Failure Occurred : No Bank Remap Availability Histogram Max : 2560 bank(s) High : 0 bank(s) Partial : 0 bank(s) Low : 0 bank(s) None : 0 bank(s) Temperature GPU Current Temp : 50 C GPU T.Limit Temp : 36 C GPU Shutdown T.Limit Temp : -8 C GPU Slowdown T.Limit Temp : -2 C GPU Max Operating T.Limit Temp : 0 C GPU Target Temperature : N/A Memory Current Temp : 58 C Memory Max Operating T.Limit Temp : 0 C GPU Power Readings Power Draw : 698.85 W Current Power Limit : 700.00 W Requested Power Limit : 700.00 W Default Power Limit : 700.00 W Min Power Limit : 200.00 W Max Power Limit : 700.00 W Module Power Readings Power Draw : N/A Current Power Limit : N/A Requested Power Limit : N/A Default Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 1845 MHz SM : 1845 MHz Memory : 2619 MHz Video : 1545 MHz Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Default Applications Clocks Graphics : 1980 MHz Memory : 2619 MHz Deferred Clocks Memory : N/A Max Clocks Graphics : 1980 MHz SM : 1980 MHz Memory : 2619 MHz Video : 1545 MHz Max Customer Boost Clocks Graphics : 1980 MHz Clock Policy Auto Boost : N/A Auto Boost Default : N/A Voltage Graphics : 890.000 mV Fabric State : Completed Status : Success Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 790934 Type : C Name : /etc/system Used GPU Memory : 5610 MiB