title | ms.custom | ms.date | ms.reviewer | ms.suite | ms.tgt_pltfrm | ms.topic | ms.assetid | caps.latest.revision | author |
---|---|---|---|---|---|---|---|---|---|
Understanding Admin Console Alerts (Analytics Platform System) | na | 01/05/2017 | na | na | na | article | 0c4aa221-55c2-44cf-9eaa-3bf7bd55e51a | 10 | BarbKess |
Alerts appear in the appliance Admin Console and in System Center Operations Manager (SCOM). Use this list of alerts to help identify which alerts require additional investigation.
For information about connecting to the Admin Console by using Internet Explorer, see Monitor the Appliance by Using the Admin Console (Analytics Platform System). For information about SCOM, see Monitor the Appliance by Using System Center Operations Manager (Analytics Platform System).
For information about obtaining alert information by using Transact-SQL, see Monitor the Appliance by Using System Views (Analytics Platform System).
Alert names that indicate a NORMAL status do not usually require investigation. Alert names that contain the words NON-CRITICAL sometimes require action. Investigation is required for all other types of alerts.
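
The rule of thumb above can be sketched as a small helper. This is only an illustration, not part of the appliance tooling: the function name `triage` and the returned strings are hypothetical, and the check accepts both the `NON_CRITICAL` and `NON-CRITICAL` spellings.

```python
def triage(alert_name: str) -> str:
    """Return a rough triage hint for an alert name (hypothetical helper)."""
    # Normalize case and treat NON_CRITICAL and NON-CRITICAL as the same marker.
    name = alert_name.upper().replace("_", "-")
    if "NON-CRITICAL" in name:
        return "sometimes requires action"
    if "NORMAL" in name:
        return "usually no investigation"
    # Everything else (CRITICAL, UNKNOWN, NON-RECOVERABLE, and so on).
    return "investigate"


for example in (
    "Cluster resource has NORMAL status.",
    "Controller has NON-CRITICAL status.",
    "Fan device has UNKNOWN status.",
):
    print(f"{example} -> {triage(example)}")
```

A name-based check like this is only a first pass. Alerts whose names contain none of these keywords fall through to "investigate", so the Action Required? column in the table remains the reference.
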
Alerts are listed alphabetically by alert name. Not all possible alerts are in the list. The wording of some alerts varies slightly for different vendors.
Alert Name | Action Required? | State | Severity | Description | More Information |
---|---|---|---|---|---|
Ambari Agent has CRITICAL status. | Yes | Failed | Error | This Ambari Agent resource has failed (status: 4), is offline (status: 3), or has an offline pending (status: 130). Status is reported in the component's "hadoop_service_status" property. | Review the Cluster Resource on the Head and Data nodes. |
Ambari Agent has NON-CRITICAL status. | Yes | Degraded | Warning | This Ambari Agent resource is in a non-critical state due to one of the following reasons: - resource is in inherited state (status: 0) - resource is in pending state (status: 128) - resource is in online pending state (status: 129) - resource is performing initialization (status: 1) Status is reported in the component's "hadoop_service_status" property. | Review the Cluster Resource on the Head and Data nodes. |
Ambari Agent has NORMAL status. | No | Operational | Informational | The Ambari Agent is running normally (status: Running). Status is reported in the component's "hadoop_service_status" property. | |
Ambari Agent has UNKNOWN status. | Yes | Degraded | Warning | Status of this Ambari Agent resource could not be determined (status: -1). Status is reported in the component's "hadoop_service_status" property. | Review the Cluster Resource on the Head and Data nodes. |
Application Heartbeat has NORMAL status. | No | Operational | Informational | Successfully established communication with the application. | Indicates that the component previously reported a different status, but has since returned to normal. |
Application Heartbeat is throwing CRITICAL alert. | Yes | Non-operational | Error | Could not communicate with the application. Application might be in the process of restarting. | The application heartbeat is in an unexpected state. Troubleshooting required. Review the node's Windows event log for details. |
Cluster Failover event has occurred. | Yes | Operational | Error | The primary clustered node is no longer active, so the passive node has taken over as the primary node. Review the failed node's Windows event log for details and review the Failover Cluster Manager on the HST01 VM. | Failover has occurred. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM and the node's system event log. |
Cluster resource group has CRITICAL status. | Yes | Failed | Error | This cluster resource group has failed and may be in the process of attempting a restart, or is offline. | The resource group status has failed and requires troubleshooting. Review the Failover Cluster Manager on the HST01 VM. |
Cluster resource group has NON-CRITICAL status. | Yes | Degraded | Warning | This cluster resource group is online but in a non-critical state due to one of the following reasons: the resource group is partially online or the resource group is in a pending state. | The resource group is not completely in the expected state. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Cluster resource group has NORMAL status. | No | Operational | Informational | This cluster resource group is online. | Indicates that the component previously reported a different status, but has since returned to normal. |
Cluster resource group has UNKNOWN status. | Yes | Degraded | Warning | This cluster resource group is in an unknown state. | The system was not able to retrieve the health status of the cluster resource group. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Cluster resource has CRITICAL status. | Yes | Failed | Error | This clustered resource has failed and may be attempting a restart, or is in offline state. | Cluster resource is not in the expected state. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Cluster resource has NON-CRITICAL status. | Yes | Degraded | Warning | This clustered resource is in a non-critical state due to one of the following reasons: resource is in inherited state, resource is in pending state, resource is in online pending state, resource is in offline pending state, or resource is performing initialization. | Cluster resource is not in the expected state. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Cluster resource has NORMAL status. | No | Operational | Informational | This clustered resource is online. | Indicates that the component previously reported a different status, but has since returned to normal. |
Cluster resource has UNKNOWN status. | Yes | Degraded | Warning | Status of this clustered resource could not be determined. | The system was not able to retrieve the health state of the cluster resource. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Cluster Shared Volume has CRITICAL status. | Yes | Failed | Error | This clustered shared volume resource has failed (status: 4), is offline (status: 3), or has an offline pending (status: 130). Status is reported in the component's "csv_state" property. | Review the Failover Cluster Manager on the HST01 VM. |
Cluster Shared Volume has NON-CRITICAL status. | Yes | Degraded | Warning | This clustered shared volume resource is in a non-critical state due to one of the following reasons: - resource is in inherited state (status: 0) - resource is in pending state (status: 128) - resource is in online pending state (status: 129) - resource is performing initialization (status: 1) Status is reported in the component's "csv_state" property. | Review the Failover Cluster Manager on the HST01 VM. |
Cluster Shared Volume has NORMAL status. | No | Operational | Informational | This clustered shared volume resource is online (status: 2). Status is reported in the component's "csv_state" property. | |
Cluster Shared Volume has UNKNOWN status. | Yes | Degraded | Warning | Status of this clustered shared volume resource could not be determined (status: -1). Status is reported in the component's "csv_state" property. | Review the Failover Cluster Manager on the HST01 VM. |
Cluster Status Normal | No | Operational | Informational | Cluster has NORMAL status. | Indicates that the component previously reported a different status, but has since returned to normal. |
Controller has CRITICAL status. | Yes | Failed | Error | The PERC disk is indicating there is a critical error, or the controller has been powered off. | The local RAID controller has a critical error and may need to be replaced. Troubleshooting required. Review the node's Windows event log for details. |
Controller has NON-CRITICAL status. | Yes, if the problem persists for more than 7 hours or recurs multiple times on the same node and is not tied to expected reboots | Degraded | Warning | The PERC disk reported a non-critical problem, probably related to a cable malfunction. | This most commonly indicates a battery recharging cycle on the PowerEdge RAID Controller's battery-backed cache module. This could be the scheduled test cycle (duration up to ~7 hours), or it could be reported after reboots or power cycles when the battery must recharge. Important: This also usually indicates that the controller's write-cache policy has temporarily changed from write-back to write-through until charging is complete, which will have performance implications on the local storage (tempdb). Review the node's Windows event log for details. |
Controller has NON-RECOVERABLE status. | Yes | Failed | Error | The PERC disk status is non-recoverable. | The local RAID controller is not functional and has entered a non-recoverable state and may need to be replaced. Troubleshooting required. Review the node's Windows event log for details. |
Controller has NORMAL status. | No | Operational | Informational | The PERC disk is running normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Controller has UNKNOWN status. | Yes | Degraded | Warning | The status of the PERC disk could not be determined. | The system could not retrieve the health state of the local RAID controller. Troubleshooting required. Review the node's Windows event log for details. |
Cooling device has CRITICAL status. | Yes | Failed | Warning | The cooling device has reached a critical upper or lower threshold. | The cooling device may require replacement. Troubleshooting required. Review the node's Windows event log for details. |
Cooling device has NON-CRITICAL status. | Yes | Degraded | Warning | The cooling device has reached a non-critical upper or lower threshold. | The cooling device hasn't reached critical levels, but is outside the expected upper or lower range. Review the node's Windows event log for details. |
Cooling device has NON-RECOVERABLE status. | Yes | Failed | Warning | The cooling device has reached a non-recoverable upper or lower threshold. | The cooling device may require replacement. Troubleshooting required. Review the node's Windows event log for details. |
Cooling device has NORMAL status. | No | Operational | Informational | The cooling device is running normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Cooling device has UNKNOWN status. | Yes | Degraded | Warning | The status of the cooling device could not be determined. | The system could not retrieve the status of the cooling device. Troubleshooting required. Review the node's Windows event log for details. |
Disk array has CRITICAL overall status. | Yes | Failed | Error | The disk array overall status is critical. | Could indicate that the disk array is no longer active due to failed drives or a similar problem. Troubleshooting required. Review the node's Windows event log for details. |
Disk array has NON-CRITICAL overall status. | Yes | Degraded | Warning | The disk array overall status indicates a non-critical warning, but the system is still operational. | The disk array is still functional, but this could indicate a disk failure or a similar problem. Troubleshooting required. Review the node's Windows event log for details. |
Disk array has NON-RECOVERABLE overall status. | Yes | Failed | Error | The disk array overall status is non-recoverable. | The disk array is no longer functional. Troubleshooting required. Review the node's Windows event log for details. |
Disk array has NORMAL overall status. | No | Operational | Informational | The disk array overall status is normal. | Indicates that the component previously reported a different status, but has since returned to normal. |
Disk array has UNKNOWN overall status. | Yes | Degraded | Warning | Overall status of the disk array could not be determined. | The system cannot retrieve the health state of the local disk array. Troubleshooting required. Review the node's Windows event log for details. |
External Storage Array has CRITICAL status. | Yes | Failed | Error | The external storage array is indicating there is a failure (Vendor OperationalStatus: 6, 16). Vendor status is reported in the component's "storage_global_status" property. Values: 6-Error, 16-Supporting Entity Error. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage Array has NON-CRITICAL status. | Yes | Degraded | Warning | The external storage array reported a non-critical warning (Vendor OperationalStatus: 3, 4, 5, 11, 14, 15, 17). Vendor status is reported in the component's "storage_global_status" property. Values: 3-Degraded, 4-Stressed, 5-Predictive Failure, 11-In Service, 14-Aborted, 15-Dormant, 17-Completed Operation. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage Array has NON-RECOVERABLE status. | Yes | Failed | Error | The external storage array is indicating that the storage array is down and non-recoverable (Vendor OperationalStatus: 7). Vendor status is reported in the component's "storage_global_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage Array has NORMAL status. | No | Operational | Informational | The external storage array is working normally (vendor status: ok). Vendor status is reported in the component's "storage_global_status" property. | |
External Storage Array has UNKNOWN status. | Yes | Degraded | Warning | The status of the external storage array could not be determined based on the vendor status (Vendor OperationalStatus: 0, 1, 18). Vendor status is reported in the component's "storage_global_status" property. Values: 0-Unknown, 1-Other, 18-Power Mode. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage Array has UNREACHABLE status. | Yes | Failed | Error | The external storage array is indicating that the storage array is unreachable (Vendor OperationalStatus: 8,9,10,12,13). Vendor status is reported in the component's "storage_global_status" property. Values: 8-Starting, 9-Stopping, 10-Stopped, 12-No Contact, 13-Lost Communication. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage has CRITICAL status. | Yes | Failed | Error | The external storage is indicating there is a failure. | Troubleshooting required. Review Windows event log and the storage device's event log for details. |
External Storage has DEGRADED status. | Yes | Degraded | Warning | The storage system is degraded. You need to check the temperature status or power supply status of this storage system. Additionally, if the side panel for the storage system is removed, the air flow changes could result in improper cooling of the drives and affect the temperature status. Vendor status is reported in the component's "storage_global_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
External Storage has NON-CRITICAL status. | Yes, if the problem persists for more than 7 hours or recurs on the same device more often than every 90 days | Degraded | Warning | The external storage reported a non-critical warning. | This event typically indicates one of two issues: disk failures or transition events, or battery recharging cycles on the RAID controller's battery-backed cache module. The charging cycles are usually scheduled every 90 days and can take up to 7 hours. Important: During this time it is likely that the controller's write-cache policy has temporarily changed from write-back to write-through, which can affect performance. Review the Windows event log and the storage device's event log for details. |
External Storage has NORMAL status. | No | Operational | Informational | The external storage is working normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
External Storage has UNKNOWN status. | Yes | Degraded | Warning | The status of the external storage could not be determined. | The system was not able to retrieve the health state of the server's external storage. Troubleshooting required. Review the node's Windows event log and the storage device's event log for details. |
Fan device has CRITICAL status. | Yes | Failed | Warning | The fan device has reached a critical upper or lower threshold (vendor status: CriticalUpper or CriticalLower). Vendor status is reported in the component's "device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Fan device has NON-CRITICAL status. | Yes | Degraded | Warning | The fan device has reached a non-critical upper or lower threshold (vendor status: nonCriticalUpper or nonCriticalLower). Vendor status is reported in the component's "device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Fan device has NON-RECOVERABLE status. | Yes | Failed | Warning | The fan device has reached a non-recoverable upper or lower threshold (vendor status: failed, nonRecoverableUpper or nonRecoverableLower). Vendor status is reported in the component's "device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Fan device has NORMAL status. | No | Operational | Informational | The fan device is running normally (vendor status: ok). Vendor status is reported in the component's "device_status" property. | |
Fan device has UNKNOWN status. | Yes | Degraded | Warning | The status of the fan device could not be determined (vendor status: other or unknown). Vendor status is reported in the component's "device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Fibre Channel host controller has CRITICAL status. | Yes | Failed | Warning | The Fibre Channel host controller component detects one of the following conditions: - host controller has failed and should be replaced (vendor status: failed) - host controller has been shut down (vendor status: shutdown) - the Fibre Channel connection has failed (vendor status: loopFailed) Vendor status is reported in the component's "FC_device_rollup_status" property. | Review the node's Windows event log for details or contact device manufacturer. User Action: If the controller status is failed, replace the controller. |
Fibre Channel host controller has NON-CRITICAL status. | Yes | Degraded | Warning | The Fibre Channel host controller is reporting one of the following conditions: - fibre channel connection is degraded (vendor status: loopDegraded) - fibre channel port is not connected or the device to which it is connected is powered down (vendor status: notConnected) Vendor status is reported in the component's "FC_device_rollup_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Fibre Channel host controller has NORMAL status. | No | Operational | Informational | The Fibre Channel host controller is operating normally (vendor status: ok). Vendor status is reported in the component's "FC_device_rollup_status" property. | |
Fibre Channel host controller has UNKNOWN status. | Yes | Degraded | Warning | Fibre Channel host controller status could not be determined or controller is not present (vendor status: other). Vendor status is reported in the component's "FC_device_rollup_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Hadoop Service has CRITICAL status. | Yes | Non-operational | Error | This service is in a critical state and has stopped working (status: Installed or Stopped) or is transitioning to a stopped state (status: Stopping). Status is reported in the component's "hadoop_service_status" property. | Review the node's Windows and PDW Component event logs for details. |
Hadoop Service has NON-CRITICAL status. | Yes | Degraded | Warning | This service is in a non-critical state due to one of the following reasons: - service is starting (status: Starting) - service is upgrading (status: Upgrading) Status is reported in the component's "hadoop_service_status" property. | Review the node's Windows and PDW Component event logs for details. |
Hadoop Service has UNKNOWN status. | Yes | Degraded | Warning | This service is reporting that it is in an unknown state. Status is reported in the component's "hadoop_service_status" property. | Review the node's Hadoop logs, plus the Windows and PDW Component event logs for details. |
Memory device has CRITICAL status. | Yes | Failed | Warning | The memory is reporting a critical problem. | A DIMM may need to be replaced. Troubleshooting required. A server may still be active with some failed RAM, but performance may be affected. Review the node's Windows event log for details. |
Memory device has NON-CRITICAL status. | Yes | Degraded | Warning | The memory is reporting a non-critical situation. | Could point to imminent DIMM failure. Generally this means the DIMM has seen errors, but it is not yet past the threshold to make it a critical/failed status. A server may still be active with some failed RAM, but performance may be affected. The hardware log must be cleared to clear the error. Review the node's Windows event log for details. |
Memory device has NON-RECOVERABLE status. | Yes | Failed | Warning | The memory reported a non-recoverable problem. | A DIMM may need to be replaced. Troubleshooting required. A server may still be active with some failed RAM, but performance may be affected. Review the node's Windows event log for details. |
Memory device has NORMAL status. | No | Operational | Informational | The memory is working normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Memory device has UNKNOWN status. | Yes | Degraded | Warning | Status of the memory could not be determined. | The system cannot retrieve the health state of the system memory. A DIMM may need to be replaced. Troubleshooting required. A server may still be active with some failed RAM, but performance may be affected. Review the node's Windows event log for details. |
Network adapter has CRITICAL status. | Yes | Degraded | Warning | The network adapter is raising a critical alert due to one of the following reasons: adapter is offline, adapter has been powered off, or adapter is in off duty status. | The network adapter is in a failed state and may need replacement (which could mean motherboard replacement). Troubleshooting required. Review the node's Windows event log for details. |
Network adapter has NON-CRITICAL status. | Yes | Degraded | Warning | The network adapter is indicating a non-critical warning. It is still operational, but performance might be degraded. | The network adapter has some errors, but is not in a critical state. Because this could affect performance, troubleshooting is required. Review the node's Windows event log for details. |
Network adapter has NON-RECOVERABLE status. | Yes | Failed | Warning | The network adapter is in non-recoverable status due to potentially being installed in error. | The network adapter is in a failed state and may need replacement (which could mean motherboard replacement). Troubleshooting required. Review the node's Windows event log for details. |
Network adapter has NORMAL status. | No | Operational | Informational | The network adapter is online and running normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Network adapter has UNKNOWN status. | Yes | Degraded | Warning | The status of the network adapter could not be determined. This status could be due to one of the following reasons: the network adapter is in a power save mode (standby, low power, warning, unknown, or power cycle), the network adapter has not been installed, the network adapter device reported an unknown status, or the network adapter might be in a testing state. | The system was not able to retrieve the health state of the network adapter. Troubleshooting required. Review the node's Windows event log for details. |
Network connection has CRITICAL status. | Yes | Degraded | Warning | The network connectivity is raising a critical alert due to one of the following reasons: network is disconnected, hardware is not present, hardware has been disabled, media is disconnected, authentication has failed, invalid address was used, or credential is required but not supplied. | The network adapter is in a critical state. Review the node's Windows event log for details. |
Network connection has NON-CRITICAL status. | Yes | Degraded | Warning | The network is reporting a non-critical state. This status could be due to one of the following reasons: network is in connecting state, network is in disconnecting state, or network authentication is in process. | The network adapter is in an unexpected state. If this problem persists or happens multiple times, troubleshooting is required. Review the node's Windows event log for details. |
Network connection has NORMAL status. | No | Operational | Informational | The network is connected and working correctly. | Indicates that the component previously reported a different status, but has since returned to normal. |
Network connection profile is on a expected profile. | No | Operational | Informational | The network is connected and working on an expected profile. The profile is reported in the component's "profile_category" property. Domain profile is 2 and Private profile is 1. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Network connection profile is showing to be on the Public profile. | Yes | Degraded | Warning | The network is reporting that it is on the Public profile. The profile is reported in the component's "profile_category" property. Public profile is reported as 0. This could cause communication issues for this node. | Review the node's Windows event log for details or contact device manufacturer. |
Node in a cluster has CRITICAL status. | Yes | Failed | Error | The clustered node is down. | A server in the cluster is down. Review the Failover Cluster Manager on the HST01 VM. |
Node in a cluster has NON-CRITICAL status. | Yes | Degraded | Warning | The clustered node is throwing a non-critical alert. One of the following situations might have occurred: node is in paused state or node is in the process of joining the cluster. | The node is in an unexpected state. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Node in a cluster has NORMAL status. | No | Operational | Informational | The clustered node is up and running. | Indicates that the component previously reported a different status, but has since returned to normal. |
Node in a cluster has UNKNOWN status. | Yes | Degraded | Warning | The clustered node is in an unknown state. | The system was not able to retrieve the health state of the node. Troubleshooting required. Review the Failover Cluster Manager on the HST01 VM. |
Physical Disk has CRITICAL status. | Yes | Failed | Error | The disk status is critical (vendor status: 2-Unhealthy). The status is reported in the component's "phys_disk_status" property. The Operational Status, shown in property "phys_disk_oper_status", might provide more information about the problem. Operational Status Values: 0-Unknown, 2-OK, 3-Degraded, 4-Stressed, 5-Predictive Failure, 6-Error, 7-Non-Recoverable Error, 8-Starting, 9-Stopping, 10-Stopped, 11-In Service, 12-No Contact, 13-Lost Communication, 15-Dormant, 18-Power Mode, 0x8004-Failed Media, 0x8005-Split, 0x8006-Stale Metadata, 0x8007-IO Error, 0x8008-Corrupt Metadata. | |
Physical Disk has NON-CRITICAL status. | Yes | Degraded | Warning | The disk status indicates a non-critical warning, but the system is still operational. The status is reported in the component's "phys_disk_status" property. The Operational Status, shown in property "phys_disk_oper_status", might provide more information about the problem. Operational Status Values: 0-Unknown, 2-OK, 3-Degraded, 4-Stressed, 5-Predictive Failure, 6-Error, 7-Non-Recoverable Error, 8-Starting, 9-Stopping, 10-Stopped, 11-In Service, 12-No Contact, 13-Lost Communication, 15-Dormant, 18-Power Mode, 0x8004-Failed Media, 0x8005-Split, 0x8006-Stale Metadata, 0x8007-IO Error, 0x8008-Corrupt Metadata. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Physical Disk has NORMAL status. | No | Operational | Informational | The disk status is normal. The status is reported in the component's "phys_disk_status" property. | |
Physical Disk has UNKNOWN status. | Yes | Degraded | Warning | The disk status could not be determined (status: 5-Unknown). The status is reported in the component's "phys_disk_status" property. The Operational Status, shown in property "phys_disk_oper_status", might provide more information about the problem. Operational Status Values: 0-Unknown, 2-OK, 3-Degraded, 4-Stressed, 5-Predictive Failure, 6-Error, 7-Non-Recoverable Error, 8-Starting, 9-Stopping, 10-Stopped, 11-In Service, 12-No Contact, 13-Lost Communication, 15-Dormant, 18-Power Mode, 0x8004-Failed Media, 0x8005-Split, 0x8006-Stale Metadata, 0x8007-IO Error, 0x8008-Corrupt Metadata. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. |
Power Supply has CRITICAL status. | Yes | Failed | Warning | The power supply is indicating there is a critical error. | The power supply may require replacement. Troubleshooting required. Power supplies are redundant so the server may still be active. Review the node's Windows event log for details. |
Power Supply has NON-CRITICAL status. | Yes | Operational | Warning | The power supply reported a non-critical problem. | The power supply has reported a problem, but is not in a failed state. This might indicate imminent failure. Power supplies are redundant so a failure might not create a server outage. A hardware error probably needs to be cleared to clear the Admin Console error. Review the node's Windows event log for details. |
Power supply has NON-RECOVERABLE status. | Yes | Failed | Warning | The power supply is in non-recoverable status. | The power supply may require replacement. Troubleshooting required. Power supplies are redundant so the server may still be active. Review the node's Windows event log for details. |
Power Supply has NORMAL status. | No | Operational | Informational | The power supply is running normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Power supply has UNKNOWN status. | Yes | Degraded | Warning | The status of the power supply could not be determined. | The system was not able to retrieve the health state of the power supply. Power supplies are redundant so the server may still be active. Troubleshooting required. Review the node's Windows event log for details. |
Processor device has CRITICAL status. | Yes | Failed | Warning | The CPU is reporting a critical problem. | CPU may need to be replaced. Troubleshooting required. Review the node's Windows event log for details. |
Processor device has NON-CRITICAL status. | Yes | Degraded | Warning | The CPU is reporting a non-critical situation. | The CPU encountered an error, but is not yet in a failed state. This may indicate an imminent failure. Review the node's Windows event log for details. |
Processor device has NON-RECOVERABLE status. | Yes | Failed | Warning | The CPU reported a non-recoverable problem. | Similar to critical status. CPU may need to be replaced. Troubleshooting required. Review the node's Windows event log for details. |
Processor device has NORMAL status. | No | Operational | Informational | The CPU is working normally. | Indicates that the component previously reported a different status, but has since returned to normal. |
Processor device has UNKNOWN status. | Yes | Degraded | Warning | Status of the CPU could not be determined. | The system cannot retrieve the health state of a CPU and further investigation is required. Review the node's Windows event log for details. |
SAS Host Bus Adapter has DEGRADED condition. | Yes | Degraded | Warning | The SAS Host Bus Adapter is reporting that the overall condition of the HBA and all of the physical drives controlled by it is degraded (vendor status: degraded). Vendor status is reported in the component's "hba_device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
SAS Host Bus Adapter has FAILED condition. | Yes | Failed | Warning | The SAS Host Bus Adapter is reporting that the overall condition of the HBA is in a failed state, including all of the physical drives controlled by it. This will require a component to be replaced (vendor status: failed). Vendor status is reported in the component's "hba_device_rollup_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
SAS Host Bus Adapter has NORMAL status. | No | Operational | Informational | The SAS Host Bus Adapter is operating normally (vendor status: ok). Vendor status is reported in the component's "hba_device_rollup_status" property. | |
SAS Host Bus Adapter has UNKNOWN status. | Yes | Degraded | Warning | The SAS Host Bus Adapter status could not be determined (vendor status: other). Vendor status is reported in the component's "hba_device_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
SQL Server has CRITICAL status. | Yes | NonOperational | Error | This service is in a critical state and has stopped working (status: Stopped) or is in transitional state to be stopped (status: StopPending). Status is reported in the component's "sql_server_service_status" property. | Review the node's Windows event log for details. |
SQL Server has NORMAL status. | No | Operational | Informational | This service is running normally (status: Running). Status is reported in the component's "sql_server_service_status" property. | |
Storage Enclosure Fan has a DEGRADED status. | Yes | Degraded | Warning | The Storage Enclosure Fan is reporting that it is degraded (vendor status: 10,15). Vendor status is reported in the component's "storage_fan_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Fan has a FAILED status. | Yes | Failed | Warning | The Storage Enclosure Fan is reporting that it is in a failed state. This will require a component to be replaced (vendor status: 20,25). Vendor status is reported in the component's "storage_fan_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Fan has a NON-RECOVERABLE status. | Yes | Failed | Warning | The Storage Enclosure Fan is reporting that this fan is in a non-recoverable state. This will require a component to be replaced (vendor status: 30). Vendor status is reported in the component's "storage_fan_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Fan has a UNKNOWN status. | Yes | Degraded | Error | The Storage Enclosure Fan status could not be determined (vendor status: 0-unknown). Vendor status is reported in the component's "storage_fan_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Fan has a NORMAL status. | No | Operational | Informational | The Storage Enclosure Fan is operating normally (vendor status: 5). Vendor status is reported in the component's "storage_fan_status" property. | |
Storage Enclosure Power Supply has a DEGRADED status. | Yes | Degraded | Warning | The Storage Enclosure Power Supply is reporting that this power supply is degraded (vendor status: 10,15). Vendor status is reported in the component's "storage_power_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Power Supply has a FAILED status. | Yes | Failed | Error | The Storage Enclosure Power Supply is reporting that this power supply is in a failed state. This will require replacing a component or restoring power to the device (vendor status: 20,25). Vendor status is reported in the component's "storage_power_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Power Supply has a NON-RECOVERABLE status. | Yes | Failed | Error | The Storage Enclosure Power Supply is reporting that this power supply is in a non-recoverable state. This will require a component to be replaced (vendor status: 30). Vendor status is reported in the component's "storage_power_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Power Supply has a UNKNOWN status. | Yes | Degraded | Warning | The Storage Enclosure Power Supply status could not be determined (vendor status: 0). Vendor status is reported in the component's "storage_power_status" property. | Review the node's Windows event log for details or contact device manufacturer. |
Storage Enclosure Power Supply has a NORMAL status. | No | Operational | Informational | The Storage Enclosure Power Supply is operating normally (vendor status: 5). Vendor status is reported in the component's "storage_power_status" property. | |
Storage Pool has CRITICAL status. | Yes | Failed | | The storage pool status is critical (vendor status: 2-Unhealthy)! The status is reported in the component's "storage_pool_status" property. The Operational Status, shown in property "storage_pool_oper_status" might provide more information about the problem. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Storage Pool has NON-CRITICAL status. | Yes | Degraded | | The storage pool status is indicating there is a non-critical warning but the system is still operational (status: 1-Warning). The status is reported in the component's "storage_pool_status" property. The Operational Status, shown in property "storage_pool_oper_status" might provide more information about the problem. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Storage Pool has NORMAL status. | No | Operational | | The storage pool status is normal (status: 0-Healthy). The status is reported in the component's "storage_pool_status" property. | |
Storage Pool has UNKNOWN status. | Optional | Operational | | The storage pool status is in an unknown state on this node (status: 5-Unknown). The status is reported in the component's "storage_pool_status" property. The Operational Status, shown in property "storage_pool_oper_status" might provide more information about the problem. This commonly happens when the node querying the storage pool state is not the owner of the storage pool. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. |
Temperature status is CRITICAL. | Yes | Failed | Error | The temperature has reached a critical upper or lower threshold. | The temperature is too high or too low. Continuing at this state could damage or drastically shorten the life of the hardware. Troubleshooting required. Review the node's Windows event log for details. |
Temperature status is NON-CRITICAL. | Optional | Degraded | Warning | The temperature has reached a non-critical upper or lower threshold. | The temperature reported by the server is higher or lower than normal, but has not reached the threshold for critical status. Temperatures outside of threshold shorten hardware life. Factors that can affect temperature include workload, data center temperature and airflow, and cabling that restricts server exhaust. Review the node's Windows event log for details. |
Temperature status is NON-RECOVERABLE. | Yes | Failed | Warning | The temperature is in a non-recoverable state. | The temperature sensor has detected an error from which it cannot recover. This could be a problem with the temperature or with the temperature module itself. Review the node's Windows event log for details. |
Temperature status is NORMAL. | No | Operational | Informational | The temperature is normal. | Indicates that the component previously reported a different status, but has since returned to normal. |
Temperature status is UNKNOWN. | Yes | Degraded | Warning | The status of the temperature could not be determined. | The system was unable to retrieve the server temperature. Troubleshooting required. Review the node's Windows event log for details. |
Virtual Disk has CRITICAL status. | Yes | Failed | Error | The storage spaces virtual disk status is critical (vendor status: 2-Unhealthy)! The status is reported in the component's "virtual_disk_status" property. The Operational Status, shown in property "virtual_disk_oper_status" might provide more information about the problem. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Virtual Disk has NON-CRITICAL status. | Yes | Degraded | Warning | The storage spaces virtual disk status is indicating there is a non-critical warning but the system is still operational (status: 1-Warning). The status is reported in the component's "virtual_disk_status" property. The Operational Status, shown in property "virtual_disk_oper_status" might provide more information about the problem. If the Virtual Disk has moved to another node, then review the state of the cluster shared volume components and move the disks back to the expected owner, indicated by the number after the N in the name; for example, N01D01 belongs on HSA01. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. The health of the mirror could be impacted by the loss of a single disk so another alert might have occurred for the disk itself. |
Virtual Disk has NORMAL status. | No | Operational | Informational | The storage spaces virtual disk status is normal (status: 0-Healthy). The status is reported in the component's "virtual_disk_status" property. | |
Virtual Disk has UNKNOWN status. | Yes | Operational | Warning | The storage spaces virtual disk status could not be determined (status: 5-Unknown). The status is reported in the component's "virtual_disk_status" property. The Operational Status, shown in property "virtual_disk_oper_status" might provide more information about the problem. If the Virtual Disk has moved to another node, then review the state of the cluster shared volume components and move the disks back to the expected owner, indicated by the number after the N in the name; for example, N01D01 belongs on HSA01. | Review the node's events in log 'Application and service logs\Microsoft\Windows\StorageSpaces-Driver\Operational' for further details. |
Volume free space status is CRITICAL. | Yes | Degraded | Error | Volume free space is critically low! The current volume used disk space is beyond 90% of total capacity. Clean up unnecessary files/data to ensure normal appliance operation. | The Admin Console reports allocated space and not necessarily used space. You can use DBCC PDW_SHOWSPACEUSED to investigate used vs. allocated space. You can also use DBCC SHRINKLOG or ALTER DATABASE (Parallel Data Warehouse) to shrink databases. |
Volume free space status is NON-CRITICAL. | Optional | Operational | Warning | The current volume used disk space is between 70% and 90% full. Review disk space used on this volume and clean up unnecessary files/data to ensure normal appliance operation. | The Admin Console reports allocated space and not necessarily used space. You can use DBCC PDW_SHOWSPACEUSED to investigate used vs. allocated space. You can also use DBCC SHRINKLOG or ALTER DATABASE (Parallel Data Warehouse) to shrink databases. |
Volume free space status is NORMAL. | No | Operational | Informational | There is enough free disk space on this volume. The current volume used disk space is below 70%. | Indicates that the component previously reported a different status, but has since returned to normal. |
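
When reviewing alerts, it can help to translate the numeric codes and percentages in the table into readable labels. The following Python sketch is illustrative only and is not part of the appliance. The function and dictionary names are hypothetical. The operational status values are copied from the Physical Disk UNKNOWN row, and the 70% and 90% thresholds are copied from the Volume free space rows. How the appliance treats usage of exactly 70% or exactly 90% is not stated in the table, so the boundary handling here is an assumption.

```python
# Hypothetical helpers for reading alert details; names are illustrative only.

# Operational Status values as listed for the "phys_disk_oper_status" property.
PHYS_DISK_OPER_STATUS = {
    0: "Unknown",
    2: "OK",
    3: "Degraded",
    4: "Stressed",
    5: "Predictive Failure",
    6: "Error",
    7: "Non-Recoverable Error",
    8: "Starting",
    9: "Stopping",
    10: "Stopped",
    11: "In Service",
    12: "No Contact",
    13: "Lost Communication",
    15: "Dormant",
    18: "Power Mode",
    0x8004: "Failed Media",
    0x8005: "Split",
    0x8006: "Stale Metadata",
    0x8007: "IO Error",
    0x8008: "Corrupt Metadata",
}


def describe_oper_status(code: int) -> str:
    """Return the label for an operational status code, or flag it as unlisted."""
    return PHYS_DISK_OPER_STATUS.get(code, f"Unlisted value ({code:#x})")


def classify_volume_usage(used_pct: float) -> str:
    """Map used disk space (percent of capacity) to a volume free space status.

    Assumption: exactly 70% and exactly 90% are treated as NON-CRITICAL.
    """
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct > 90:
        return "CRITICAL"
    if used_pct >= 70:
        return "NON-CRITICAL"
    return "NORMAL"


if __name__ == "__main__":
    print(describe_oper_status(0x8006))
    print(classify_volume_usage(85.0))
```

Keep in mind that the Admin Console reports allocated space and not necessarily used space, so a percentage computed from used space might not match the alert that the console shows.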