AMD EPYC 7313 CPU temperature
Nov. 10th, 2021 03:37 pm
Yesterday we thought that the temperature shown by the "sensors" command was a "PCI adapter" temperature.
We also thought that the CentOS 8 drivers did not yet support CPU temperature measurement on AMD EPYC.
Today we learned that "Tdie" is actually the CPU temperature.
We also discovered that the AMD EPYC 7313 does not heat up much: 55°C maximum (compared with an Intel Xeon CPU that reaches 70°C).
More technical details
1) The "sensors" command shows the CPU temperature even when run as a regular user.
But to see energy consumption per core/socket, the "sensors" command needs superuser permissions [on AMD EPYC CPU].
[centos@esovh ~]$ sudo -s sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +25.0°C (high = +70.0°C)
Tctl: +25.0°C
amd_energy-isa-0000
Adapter: ISA adapter
Ecore000: 206.91 J
Ecore001: 61.11 J
Ecore002: 38.78 J
Ecore003: 40.49 J
Ecore004: 27.27 J
Ecore005: 24.25 J
Ecore006: 23.04 J
Ecore007: 23.77 J
Ecore008: 23.50 J
Ecore009: 22.67 J
Ecore010: 43.86 J
Ecore011: 22.50 J
Ecore012: 27.76 J
Ecore013: 24.55 J
Ecore014: 23.84 J
Ecore015: 24.82 J
Esocket0: 20.47 kJ
Esocket1: 20.47 kJ
Esocket2: 20.47 kJ
Esocket3: 20.47 kJ
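To pick just the CPU temperature out of this output, here is a minimal sketch. It assumes the k10temp label is "Tdie", as in the output above. The function name tdie_celsius is hypothetical, and it reads "sensors" output on standard input.

```shell
# Hypothetical helper: print the first Tdie reading (degrees Celsius, without
# sign or unit) found in `sensors` output supplied on standard input.
tdie_celsius() {
    awk '$1 == "Tdie:" && !seen++ {
        gsub(/[^0-9.]/, "", $2)   # strip "+" and the degree/unit suffix
        print $2
    }'
}

# Usage (assumes lm_sensors is installed):
#   sensors | tdie_celsius
```

Reading from standard input rather than calling "sensors" inside the function keeps the helper usable on saved output as well.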
2) "Tdie" is the CPU temperature, as explained in this Gentoo forum thread:
https://forums.gentoo.org/viewtopic-t-1098716-start-0.html
-CPU (Tctl): This is the T_control temperature available on AMD CPUs only. On several generations before Zen (Ryzen), this is not a reliable representation of the temperature. On AMD Zen series this is the temperature used to control cooling and is a fixed offset from the real CPU temperature. Offset is used mostly on X-series and some Threadripper CPUs; in such case two values are shown: Tctl and Tdie. If no offset is used, then only a single value is shown as Tctl/Tdie, which equals the real temperature.
-CPU (Tdie): This value is shown in case the CPU uses an offset from Tctl and represents the real temperature (Tdie = Tctl - Tctl_offset).
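The relation quoted above (Tdie = Tctl - Tctl_offset) can be written as a small worked example. This is only a sketch: the function name tdie_from_tctl is hypothetical, and the 10-degree offset in the second call is an illustrative value, not one measured on this machine.

```shell
# Hypothetical helper: apply Tdie = Tctl - Tctl_offset.
# $1 = Tctl in degrees Celsius, $2 = offset in degrees Celsius.
tdie_from_tctl() {
    awk -v tctl="$1" -v offset="$2" 'BEGIN { printf "%.1f\n", tctl - offset }'
}

# On this EPYC 7313 both values read the same, which corresponds to an offset of 0.
tdie_from_tctl 45.0 0

# Illustrative only: a CPU reporting Tctl with a 10-degree offset.
tdie_from_tctl 55.0 10
```

Both calls print 45.0.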
3) We ran a stress test on the AMD EPYC 7313 (host esovh).
stress --cpu 24 --timeout 20m
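The "Every 2.0s: sensors" headers in the readings that follow have the format printed by "watch sensors". To record the peak instead of watching for it, one option is to sample into a log and take the maximum afterwards. This is a minimal sketch: the function name max_reading and the file name tdie.log are hypothetical.

```shell
# Hypothetical helper: print the largest value from a list of readings,
# one number per line on standard input.
max_reading() {
    awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 }
         END { if (NR) printf "%.1f\n", max }'
}

# Sampling sketch (assumes lm_sensors; run in a second terminal during the
# stress test and stop it with Ctrl-C):
#   while sleep 2; do
#       sensors | awk '$1 == "Tdie:" { gsub(/[^0-9.]/, "", $2); print $2 }'
#   done > tdie.log
#   max_reading < tdie.log
```

The two-second sleep mirrors the default "watch" interval; a shorter interval would catch brief spikes at the cost of a larger log.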
4) During the 20-minute stress test, the CPU temperature reached a maximum of +45.0°C.
Every 2.0s: sensors esovh: Wed Nov 10 14:06:17 2021
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +45.0°C (high = +70.0°C)
Tctl: +45.0°C
amd_energy-isa-0000
Adapter: ISA adapter
Ecore000: 2.51 kJ
Ecore001: 2.44 kJ
Ecore002: 2.93 kJ
Ecore003: 2.73 kJ
Ecore004: 2.51 kJ
Ecore005: 2.78 kJ
Ecore006: 2.98 kJ
Ecore007: 2.32 kJ
Ecore008: 2.71 kJ
Ecore009: 2.23 kJ
Ecore010: 2.38 kJ
Ecore011: 2.86 kJ
Ecore012: 2.69 kJ
Ecore013: 2.36 kJ
Ecore014: 2.78 kJ
Ecore015: 2.74 kJ
Esocket0: 1.02 MJ
Esocket1: 1.02 MJ
Esocket2: 1.02 MJ
Esocket3: 1.02 MJ
5) After ~10 minutes under stress, the CPU temperature fell by 0.2°C (to +44.8°C):
Every 2.0s: sensors esovh: Wed Nov 10 14:15:49 2021
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +44.8°C (high = +70.0°C)
Tctl: +44.8°C
amd_energy-isa-0000
Adapter: ISA adapter
Ecore000: 4.83 kJ
Ecore001: 4.76 kJ
Ecore002: 5.99 kJ
Ecore003: 5.81 kJ
Ecore004: 4.87 kJ
Ecore005: 5.89 kJ
Ecore006: 6.11 kJ
Ecore007: 4.66 kJ
Ecore008: 5.61 kJ
Ecore009: 4.94 kJ
Ecore010: 4.75 kJ
Ecore011: 5.41 kJ
Ecore012: 5.13 kJ
Ecore013: 4.70 kJ
Ecore014: 5.63 kJ
Ecore015: 5.77 kJ
Esocket0: 1.09 MJ
Esocket1: 1.09 MJ
Esocket2: 1.09 MJ
Esocket3: 1.09 MJ
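The energy counters in the two snapshots taken under stress (14:06:17 and 14:15:49, i.e. 572 seconds apart) can be turned into an average power figure: energy delta divided by elapsed time. A minimal sketch follows; the function names to_joules and avg_power_watts are hypothetical, and the result is only as precise as the rounded values that "sensors" prints.

```shell
# Hypothetical helper: convert a value and unit as printed by sensors
# (J, kJ or MJ) to joules.
to_joules() {
    awk -v value="$1" -v unit="$2" 'BEGIN {
        scale["J"] = 1; scale["kJ"] = 1000; scale["MJ"] = 1000000
        if (!(unit in scale)) { print "unknown unit: " unit > "/dev/stderr"; exit 1 }
        printf "%.0f\n", value * scale[unit]
    }'
}

# Hypothetical helper: average power in watts.
# $1 = start energy (J), $2 = end energy (J), $3 = elapsed seconds.
avg_power_watts() {
    awk -v e0="$1" -v e1="$2" -v secs="$3" 'BEGIN { printf "%.1f\n", (e1 - e0) / secs }'
}

# Esocket0: 1.02 MJ at 14:06:17, 1.09 MJ at 14:15:49.
avg_power_watts "$(to_joules 1.02 MJ)" "$(to_joules 1.09 MJ)" 572

# Ecore000: 2.51 kJ at 14:06:17, 4.83 kJ at 14:15:49.
avg_power_watts "$(to_joules 2.51 kJ)" "$(to_joules 4.83 kJ)" 572
```

This prints 122.4 for the socket and 4.1 for core 0. Because the socket counter is shown with only three significant digits, the 0.07 MJ delta is coarse, so the socket figure is a rough estimate rather than a measurement.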
6) After the stress test ended, the CPU temperature fell [at approximately 1°C per second] down to +26.2°C:
Every 2.0s: sensors esovh: Wed Nov 10 14:29:50 2021
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +26.2°C (high = +70.0°C)
Tctl: +26.2°C
amd_energy-isa-0000
Adapter: ISA adapter
Ecore000: 5.92 kJ
Ecore001: 5.65 kJ
Ecore002: 6.97 kJ
Ecore003: 6.99 kJ
Ecore004: 5.93 kJ
Ecore005: 7.08 kJ
Ecore006: 7.15 kJ
Ecore007: 5.57 kJ
Ecore008: 6.49 kJ
Ecore009: 6.09 kJ
Ecore010: 5.89 kJ
Ecore011: 6.29 kJ
Ecore012: 6.03 kJ
Ecore013: 5.58 kJ
Ecore014: 6.78 kJ
Ecore015: 6.93 kJ
Esocket0: 1.15 MJ
Esocket1: 1.15 MJ
Esocket2: 1.15 MJ
Esocket3: 1.15 MJ
More temperature tests:
Intel Xeon E-2136 CPU Temperature
Ryzen 5900X CPU temperature