Disk health forecast

The disk health control feature lets you monitor the current health of your disks and get a forecast of how it will change. Use this information to prevent data loss caused by disk failures. Both HDD and SSD disks are supported.

Limitations:
1. Disk health forecast is supported only for Windows machines.
2. Only disks of physical machines can be monitored. Disks of virtual machines are neither monitored nor shown in the widgets.

Disk health can be in one of the following statuses:

  • OK – disk health is 70–100%
  • Warning – disk health is 30% or higher, but below 70%
  • Critical – disk health is below 30%
  • Calculating disk data – the current disk status and forecast are being calculated
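The status thresholds can be expressed as a small classification function. The following is a minimal Python sketch, not the product's implementation. The names `classify_health` and `DiskStatus`, and the use of `None` for a disk whose data is still being calculated, are assumptions. Boundary handling follows the alert table, where Warning covers [30;70), so exactly 70% counts as OK and exactly 30% counts as Warning.

```python
from enum import Enum
from typing import Optional


class DiskStatus(Enum):
    OK = "OK"
    WARNING = "Warning"
    CRITICAL = "Critical"
    CALCULATING = "Calculating disk data"


def classify_health(health_percent: Optional[float]) -> DiskStatus:
    """Map a disk health percentage to a status (hypothetical helper).

    None is assumed to mean that no value has been calculated yet.
    """
    if health_percent is None:
        return DiskStatus.CALCULATING
    if not 0 <= health_percent <= 100:
        raise ValueError(f"health must be in 0..100, got {health_percent}")
    if health_percent >= 70:      # 70-100%
        return DiskStatus.OK
    if health_percent >= 30:      # [30;70)
        return DiskStatus.WARNING
    return DiskStatus.CRITICAL    # below 30%
```

For example, `classify_health(70)` returns `DiskStatus.OK`, while `classify_health(29.9)` returns `DiskStatus.CRITICAL`.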

How it works

The Disk Health Prediction Service uses an AI-based prediction model.

  1. The agent collects the following SMART parameters of the disks and passes this data to the Disk Health Prediction Service:

    • SMART 5 – reallocated sectors count
    • SMART 9 – power-on hours
    • SMART 187 – reported uncorrectable errors
    • SMART 188 – command timeout
    • SMART 197 – current pending sector count
    • SMART 198 – offline uncorrectable sector count
    • SMART 200 – write error rate
  2. The Disk Health Prediction Service processes the received SMART parameters, makes forecasts, and provides the following disk health characteristics:

    • Disk health current state: OK, Warning, Critical.
    • Disk health forecast: negative, stable, positive.
    • Disk health forecast probability, as a percentage.

    The prediction period is always one month.

  3. The Monitoring Service receives the disk health characteristics and uses this data in the disk health widgets shown in the console.
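The data exchanged in steps 1 to 3 can be modeled as two records: what the agent sends and what the prediction service returns. This Python sketch is illustrative only. The class and field names, the `build_report` helper, and the assumption that attributes a disk does not report are simply omitted are all hypothetical; the actual format used between the agent and the service is not described here.

```python
from dataclasses import dataclass
from typing import Dict

# SMART attribute IDs that the agent collects, as listed in step 1.
SMART_ATTRIBUTES = {
    5: "reallocated_sectors_count",
    9: "power_on_hours",
    187: "reported_uncorrectable_errors",
    188: "command_timeout",
    197: "current_pending_sector_count",
    198: "offline_uncorrectable_sector_count",
    200: "write_error_rate",
}

CURRENT_STATES = {"OK", "Warning", "Critical"}
FORECASTS = {"negative", "stable", "positive"}


@dataclass
class SmartReport:
    """Hypothetical agent-to-service payload for one disk (step 1)."""
    machine: str
    disk: str
    raw_values: Dict[int, int]  # SMART attribute ID -> raw value


@dataclass
class HealthPrediction:
    """Hypothetical result passed on to the Monitoring Service (steps 2 and 3)."""
    current_state: str          # "OK", "Warning" or "Critical"
    forecast: str               # "negative", "stable" or "positive"
    probability_percent: float  # forecast probability, 0-100
    period_months: int = 1      # the prediction period is always one month

    def __post_init__(self) -> None:
        if self.current_state not in CURRENT_STATES:
            raise ValueError(f"unknown state: {self.current_state!r}")
        if self.forecast not in FORECASTS:
            raise ValueError(f"unknown forecast: {self.forecast!r}")
        if not 0 <= self.probability_percent <= 100:
            raise ValueError("probability must be between 0 and 100")


def build_report(machine: str, disk: str, all_values: Dict[int, int]) -> SmartReport:
    """Keep only the SMART attributes listed above and ignore all others.

    Assumption: attributes that a disk does not report are left out.
    """
    selected = {k: v for k, v in all_values.items() if k in SMART_ATTRIBUTES}
    return SmartReport(machine, disk, selected)
```

Filtering on the agent side keeps the payload limited to the seven parameters the model is documented to use, even if the disk reports many more.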

Disk health widgets

You can find the results of disk health monitoring in the following dashboard widgets:

  • Disk health overview – a treemap widget with two levels of detail, which you can switch between by drilling down:

    • Machine level – shows summarized disk status information for the machines of the selected customer. Each machine block shows the most critical disk status found on that machine, and its color corresponds to that status. The other statuses are shown in a tooltip when you hover over the block. The block size depends on the total size of all disks of the machine.

    • Disk level – shows the current disk status of all disks for the selected machine. Each disk block shows a forecast of disk status change:

      • Will be degraded (disk health forecast probability in %)
      • Will stay stable (disk health forecast probability in %)
      • Will be improved (disk health forecast probability in %)

  • Disk health status – a pie chart widget showing the number of disks for each status.
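The widget logic described above can be sketched as a few aggregation functions. This is a hypothetical Python illustration, not the console's code. The `Disk` record, the function names, and the severity ranking (in particular, placing "Calculating disk data" below OK) are assumptions; the forecast labels are taken from the disk-level description.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Counter as CounterT, List, Tuple

# Assumed ranking used to find the "most critical" status on a machine.
SEVERITY = {"Calculating disk data": 0, "OK": 1, "Warning": 2, "Critical": 3}

# Forecast values mapped to the labels shown on disk-level blocks.
FORECAST_LABELS = {
    "negative": "Will be degraded",
    "stable": "Will stay stable",
    "positive": "Will be improved",
}


@dataclass
class Disk:
    name: str
    size_gb: float
    status: str
    forecast: str
    probability_percent: float


def machine_block(disks: List[Disk]) -> Tuple[float, str, CounterT[str]]:
    """Machine level: block size, block color (worst status), tooltip counts."""
    size = sum(d.size_gb for d in disks)  # total size of all disks
    worst = max((d.status for d in disks), key=lambda s: SEVERITY[s])
    tooltip = Counter(d.status for d in disks)
    return size, worst, tooltip


def disk_label(disk: Disk) -> str:
    """Disk level: forecast label with its probability, e.g. 'Will stay stable (90%)'."""
    return f"{FORECAST_LABELS[disk.forecast]} ({disk.probability_percent:g}%)"


def pie_counts(all_disks: List[Disk]) -> CounterT[str]:
    """Disk health status pie chart: number of disks per status."""
    return Counter(d.status for d in all_disks)
```

Note that `machine_block` expects at least one disk; `max` raises `ValueError` on an empty list.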

Disk health status alerts

The disk health check runs every 30 minutes, but the corresponding alert is generated only once a day. If the disk health changes from Warning to Critical, you receive the alert even if you have already received another alert that day.

The following alerts are generated, depending on the disk health status:

  • Disk failure is possible
    • Severity: Warning
    • Disk health status (%): [30;70)
    • Description: The [disk_name] disk on [machine_name] machine is likely to fail in the future. Please run a full image backup of this disk as soon as possible, replace it and then recover the image to the new disk.

  • Disk failure is imminent
    • Severity: Critical
    • Disk health status (%): (0;30)
    • Description: The [disk_name] disk on [machine_name] machine is in a critical state and will most likely fail very soon. An image backup of this disk is not recommended at this point as the added stress can cause the disk to fail. Please back up all the most important files on this disk right now and replace it.
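The alerting rules can be sketched as a small per-disk state machine that is called after every 30-minute check. This Python sketch is a hypothetical illustration: the `DiskAlertThrottle` class and its method are invented names, and "once a day" is interpreted here as at most once per 24 hours rather than once per calendar day. The alert names come from the list above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Alert names per status; OK and "Calculating disk data" raise no alert.
ALERTS = {
    "Warning": "Disk failure is possible",
    "Critical": "Disk failure is imminent",
}


class DiskAlertThrottle:
    """Hypothetical per-disk state deciding whether a health check raises an alert."""

    def __init__(self) -> None:
        self.last_alert_time: Optional[datetime] = None
        self.last_status: Optional[str] = None

    def on_check(self, status: str, now: datetime) -> Optional[str]:
        """Call after each health check; return the alert name to send, or None."""
        previous, self.last_status = self.last_status, status
        if status not in ALERTS:
            return None
        # A Warning -> Critical change always alerts, even within the same day.
        escalated = previous == "Warning" and status == "Critical"
        # Otherwise alert at most once per 24 hours (assumed interpretation).
        due = (self.last_alert_time is None
               or now - self.last_alert_time >= timedelta(days=1))
        if escalated or due:
            self.last_alert_time = now
            return ALERTS[status]
        return None
```

With checks every 30 minutes, a disk that stays in Warning produces one alert per day, while a drop from Warning to Critical produces an additional alert at the next check.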