Hi, we have had some problems lately with ESXi servers freezing, and it seems to be related to the logs being redirected to the vSAN datastore (KB2147541).
There was no space left on the SD card, so the solution was to move the logs and scratch to another drive.
So I installed a USB key in each host, and I've been able to set everything up there.
Syslog.global.logDir is set to "[USB-Datastore] log", and ScratchConfig.ConfiguredScratchLocation to "/vmfs/volumes/5cc0c7e0-f10c3b07-2462-000af794fe74/.locker", which is on the same USB key.
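To double-check that neither setting still points at vSAN, here is a rough Python sketch of how the two values could be inspected. It is only an illustration: the helper names are mine, and it assumes the vSAN datastore is called "vsanDatastore" or shows up under /vmfs/volumes with a "vsan:" prefix, which may not match every environment.

```python
import re

# Assumptions (not from the KB): the vSAN datastore is named "vsanDatastore"
# and its entry under /vmfs/volumes starts with "vsan:".
ASSUMED_VSAN_NAMES = {"vsanDatastore"}

# Matches the "[datastore-name] relative/path" notation used for the log directory.
BRACKET_FORM = re.compile(r"^\[(?P<name>[^\]]+)\]\s*(?P<path>.*)$")


def volume_of(setting_value: str) -> str:
    """Return the datastore name or volume ID that a setting value points at."""
    value = setting_value.strip()
    match = BRACKET_FORM.match(value)
    if match:
        return match.group("name")
    prefix = "/vmfs/volumes/"
    if value.startswith(prefix):
        # The first path component after /vmfs/volumes/ is the volume name or UUID.
        return value[len(prefix):].split("/", 1)[0]
    raise ValueError(f"unrecognised location: {setting_value!r}")


def looks_like_vsan(setting_value: str) -> bool:
    """Heuristic check, based only on the naming assumptions stated above."""
    volume = volume_of(setting_value)
    return volume in ASSUMED_VSAN_NAMES or volume.startswith("vsan:")


if __name__ == "__main__":
    settings = {
        "Syslog.global.logDir": "[USB-Datastore] log",
        "ScratchConfig.ConfiguredScratchLocation":
            "/vmfs/volumes/5cc0c7e0-f10c3b07-2462-000af794fe74/.locker",
    }
    for name, value in settings.items():
        print(name, "->", volume_of(value), "| vSAN?", looks_like_vsan(value))
```

Note that the two settings use different notations (datastore name versus volume UUID), so this sketch cannot confirm by itself that both are on the same USB key; that mapping has to be checked on the host.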
Details
vSAN provides the following storage solutions for ESXi coredump and scratch partitions:
Assign one disk for each host. When you install vSAN on a local disk, a disk is automatically assigned to the host. Use this approach for hosts with more than 512GB of memory.
Use a USB or SD card and do not set scratch partitions as non-persistent. vSAN tracefiles take up space in a coredump, so a 4GB SD or USB card is sufficient to support coredumps for hosts with up to 512GB of memory rather than 1TB for hosts without vSAN.
Note: vSAN does not support having scratch log on the vSAN Datastore. For more information, see Redirecting system logs to a vSAN object causes an ESXi host lock up (2147541).
Solution
Using a USB or SD card for ESXi installations, coredump partitions, and non-persistent scratch partitions has the following drawbacks:
vSAN tracefiles are stored in a virtual RAM drive that is persisted only in case of host failure. All other log files are stored in a non-persistent virtual RAM drive.
Use a 4GB SD card or USB drive for coredumps on hosts with 512GB of memory and where vSAN is enabled.
vSAN cannot recover tracefiles or any other log files in case of a power loss.
The first part says not to set scratch as non-persistent (in other words, to make the scratch persistent) on a USB key for hosts with up to 512GB of memory.
My servers have 320GB of memory. And from what I understand, my setup uses persistent scratch now, right?
The second part talks about non-persistent scratch on a USB key, but for hosts with 512GB of memory.
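To make my reading concrete, here is a small Python sketch of how I understand the sizing rule in the KB text quoted above. The function name and the wording of the results are my own; only the 512GB figure and the two approaches come from the KB, and I may be misreading it.

```python
# The 512GB figure is the one quoted in the KB text; everything else here
# is my own interpretation, not VMware's wording.
KB_MEMORY_THRESHOLD_GB = 512


def kb_suggestion(host_memory_gb: int) -> str:
    """Return the approach the quoted KB seems to suggest for a given host size."""
    if host_memory_gb <= 0:
        raise ValueError("host memory must be positive")
    if host_memory_gb > KB_MEMORY_THRESHOLD_GB:
        # "Assign one disk for each host ... for hosts with more than 512GB of memory."
        return "dedicated local disk"
    # "Use a USB or SD card and do not set scratch partitions as non-persistent."
    return "4GB USB/SD card, scratch kept persistent"


if __name__ == "__main__":
    # My hosts have 320GB of memory.
    print(kb_suggestion(320))
```

If that reading is right, a 320GB host falls under the USB/SD approach with persistent scratch.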
So is my setup ok?
After the changes, I began receiving warnings from my Veeam ONE server about latency on these drives, and I want to be sure that it's "normal".
It's really bad when you start the day with 2 servers frozen in a 3-server vSAN cluster!
Thank you,
Eric.