10 days ago we ordered an "Advance-4" server from OvhCloud.
(See Dedicated Server Hosting: CPU overheating)
OvhCloud promised to set up the "Advance-4" server within 72 hours (3 days).
10 days later, OvhCloud prepared this server for us.
Unfortunately, the temperature measurement utilities [that worked on Intel servers]:
sudo -s dnf install -y lm_sensors
sudo -s sensors-detect
sensors
did not work on AMD EPYC 7313. Not under CentOS 8 anyway.
It looks like the current version of the "sensors" command does not support machines with the "AMD EPYC 7313" CPU.
[centos@esovh ~]$ sudo -s sensors-detect
# sensors-detect version 3.4.0+git
# Board: TYAN S8030GM4NE-2T-HOV
# Kernel: 4.18.0-305.25.1.el8_4.x86_64 x86_64
# Processor: AMD EPYC 7313 16-Core Processor (25/1/1)
This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.
Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no):
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 16h thermal sensors... No
AMD Family 17h thermal sensors... No
AMD Family 15h power sensors... No
AMD Family 16h power sensors... No
AMD Family 19h thermal sensors... Success!
(driver `k10temp')
Intel digital thermal sensor... No
Intel AMB FB-DIMM thermal sensor... No
Intel 5500/5520/X58 thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No
Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no):
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No
Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no):
Found `IPMI BMC KCS' at 0xca2... Success!
(confidence 8, driver `to-be-written')
Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (YES/no):
Probing for `National Semiconductor LM78' at 0x290... No
Probing for `National Semiconductor LM79' at 0x290... No
Probing for `Winbond W83781D' at 0x290... No
Probing for `Winbond W83782D' at 0x290... No
Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no):
Using driver `i2c-piix4' for device 0000:00:14.0: AMD KERNCZ SMBus
Module i2c-dev loaded successfully.
Next adapter: SMBus PIIX4 adapter port 0 at ff00 (i2c-0)
Do you want to scan it? (YES/no/selectively):
Next adapter: SMBus PIIX4 adapter port 2 at ff00 (i2c-1)
Do you want to scan it? (YES/no/selectively):
Next adapter: SMBus PIIX4 adapter port 3 at ff00 (i2c-2)
Do you want to scan it? (YES/no/selectively):
Next adapter: SMBus PIIX4 adapter port 4 at ff00 (i2c-3)
Do you want to scan it? (YES/no/selectively):
Now follows a summary of the probes I have just done.
Just press ENTER to continue:
Driver `k10temp' (autoloaded):
* Chip `AMD Family 19h thermal sensors' (confidence: 9)
Driver `to-be-written':
* ISA bus, address 0xca2
Chip `IPMI BMC KCS' (confidence: 8)
Note: there is no driver for IPMI BMC KCS yet.
Check https://hwmon.wiki.kernel.org/device_support_status for updates.
No modules to load, skipping modules configuration.
Unloading i2c-dev... OK
[centos@esovh ~]$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +25.4°C (high = +70.0°C)
Tctl: +25.4°C
amd_energy-isa-0000
Adapter: ISA adapter
ERROR: Can't get value of subfeature energy1_input: Kernel interface error
energy1: N/A
ERROR: Can't get value of subfeature energy2_input: Kernel interface error
energy2: N/A
ERROR: Can't get value of subfeature energy3_input: Kernel interface error
energy3: N/A
ERROR: Can't get value of subfeature energy4_input: Kernel interface error
energy4: N/A
ERROR: Can't get value of subfeature energy5_input: Kernel interface error
energy5: N/A
ERROR: Can't get value of subfeature energy6_input: Kernel interface error
energy6: N/A
ERROR: Can't get value of subfeature energy7_input: Kernel interface error
energy7: N/A
ERROR: Can't get value of subfeature energy8_input: Kernel interface error
energy8: N/A
ERROR: Can't get value of subfeature energy9_input: Kernel interface error
energy9: N/A
ERROR: Can't get value of subfeature energy10_input: Kernel interface error
energy10: N/A
ERROR: Can't get value of subfeature energy11_input: Kernel interface error
energy11: N/A
ERROR: Can't get value of subfeature energy12_input: Kernel interface error
energy12: N/A
ERROR: Can't get value of subfeature energy13_input: Kernel interface error
energy13: N/A
ERROR: Can't get value of subfeature energy14_input: Kernel interface error
energy14: N/A
ERROR: Can't get value of subfeature energy15_input: Kernel interface error
energy15: N/A
ERROR: Can't get value of subfeature energy16_input: Kernel interface error
energy16: N/A
ERROR: Can't get value of subfeature energy17_input: Kernel interface error
energy17: N/A
ERROR: Can't get value of subfeature energy18_input: Kernel interface error
energy18: N/A
ERROR: Can't get value of subfeature energy19_input: Kernel interface error
energy19: N/A
ERROR: Can't get value of subfeature energy20_input: Kernel interface error
energy20: N/A
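For reference, the k10temp values in the output above are exposed by the kernel's hwmon interface, so they can also be read straight from sysfs without the amd_energy errors getting in the way. This is only a minimal sketch: the function name `read_hwmon_temps` and the `HWMON_ROOT` override are made up for illustration, and it assumes the k10temp driver is loaded as sensors-detect reported.

```shell
#!/bin/sh
# Print every temperature exposed by one hwmon driver (default: k10temp).
# HWMON_ROOT is a hypothetical override so the function can be pointed at a
# fake directory tree; on a real system the path is /sys/class/hwmon.
read_hwmon_temps() {
    driver="${1:-k10temp}"
    root="${HWMON_ROOT:-/sys/class/hwmon}"
    for dir in "$root"/hwmon*; do
        # Skip entries without a readable name or belonging to another driver.
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = "$driver" ] || continue
        for input in "$dir"/temp*_input; do
            [ -r "$input" ] || continue
            label_file="${input%_input}_label"
            if [ -r "$label_file" ]; then
                label="$(cat "$label_file")"
            else
                label="$(basename "${input%_input}")"
            fi
            # The kernel reports millidegrees Celsius; convert to degrees.
            LC_ALL=C awk -v l="$label" -v m="$(cat "$input")" \
                'BEGIN { printf "%s: %.1f C\n", l, m / 1000 }'
        done
    done
}

read_hwmon_temps k10temp
```

If this prints nothing, the driver name passed in probably does not match any `name` file under /sys/class/hwmon.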
Could you please recommend how to measure CPU temperature on this AMD EPYC 7313 server?
Should we install Windows to get the correct CPU drivers?