Solarwinds - Monitoring switch stacks

Today’s post is on using SQL queries in Solarwinds to get more detailed information on the frontpage on stackswitches. Solarwinds tends to average the memory and cpu utilization among all switches in a stack, but this does not always reflect the true statistics on the switches. Sustained CPU above 80% can lead to data packets being dropped. The queries are all build in SWQL studio on a Solarwinds server to verify the information that is returned. In the queries below, also other network devices show up, which is not a problem in my opinion, but you might want to scope a bit more if that is more to your taste.

Please note that the queries below might not seem like valid or efficient queries. They’re not. Solarwinds decided to implement only a subset of SQL, which means that sometimes a lot more code is needed to achieve something simple.

Top 30 nodes

I want only a top 30 of nodes returned for both performance reasons, but also because I don’t want an endless list stretching the whole front page or even multiple pages. If that would be the case, you’re primary objective is to start fixing those switches!

Three tables are essential to making this query work:

  • Orion.CPUMultiLoadCurrent
  • Orion.Nodes
  • Orion.NPM.SwitchStackMember

SQL code to get the top 30 nodes:

SELECT TOP 30 Nodes.DisplayName, Nodes.IPAddress, ssm.MacAddress, ssm.Role, ssm.SwitchNumber, ssm.SwPriority, Nodes.MachineType, ssm.SerialNumber, AvgLoad AS CPU , (DAYDIFF(Nodes.LastBoot, Nodes.NextPoll) ) AS \UptimeDays
FROM Orion.CPUMultiLoadCurrent AS cpu
INNER JOIN Orion.Nodes AS Nodes ON cpu.NodeID = Nodes.NodeID
LEFT JOIN Orion.NPM.SwitchStackMember AS ssm ON ssm.NodeID = cpu.NodeID  AND  ssm.MemberID = cpu.CPUIndex

The list can be ordered by making it a descending list: ORDER BY AvgLoad DESC

Excluding devices

This query returns a list of other nodes such as servers as well. We can filter this by utilizing the Orion.NodesCustomProperties table as in this environment all nodes are classified on type. The following code is added to the previous code:

WHERE Nodes.NodeID IN (SELECT ncp.NodeID FROM Orion.NodesCustomProperties AS ncp WHERE ncp.DeviceCategory = 'Network')

NOTE: Be aware that this is a custom property that doesn’t come with the default Solarwinds installation. In your environment, you might have other custom properties. Otherwise, you might need to find another way to distinguish servers from network devices.

Next, in this environment two DWDM devices show up on top of which the cpu util is not relevant for us to monitor. I decided to filter these by ID’s, but if you have more of these devices, you might need to filter on MachineType for example.

AND NOT (Nodes.NodeID = '1867' OR Nodes.NodeID = '1868')

End result

Code:

SELECT TOP 30 Nodes.DisplayName, Nodes.IPAddress, ssm.MacAddress, ssm.Role, ssm.SwitchNumber, ssm.SwPriority, Nodes.MachineType, ssm.SerialNumber, AvgLoad AS CPU , (DAYDIFF(Nodes.LastBoot, Nodes.NextPoll) ) AS \UptimeDays
FROM Orion.CPUMultiLoadCurrent AS cpu
INNER JOIN Orion.Nodes AS Nodes ON cpu.NodeID = Nodes.NodeID
LEFT JOIN Orion.NPM.SwitchStackMember AS ssm ON ssm.NodeID = cpu.NodeID  AND  ssm.MemberID = cpu.CPUIndex
WHERE Nodes.NodeID IN (SELECT ncp.NodeID FROM Orion.NodesCustomProperties AS ncp WHERE ncp.DeviceCategory = 'Network')
AND NOT (Nodes.NodeID = '1867' OR Nodes.NodeID = '1868')
ORDER BY AvgLoad DESC

Widget:

High CPU

When you only want nodes above 80% to be displayed, you can add some extra code to the WHERE statement. However, I choose not to do that, because I’d prefer to see nodes doing that ahead of time. In case you do want to cap, use the following code:

AND AvgLoad > 80

Alerting

Now, in Solarwinds you can add this code to a custom widget. However, perhaps you want e-mail alerting on this as well. In that case, go to Alerts & Activity > Alerts > Manage Alerts (top right) > Add New Alert.

Create an alert

Under Trigger Conditions, the SQL is slightly different, also due to the limited editing capabilities:

INNER JOIN CPUMultiLoad ON Nodes.NodeID = CPUMultiLoad.NodeID
WHERE DATEDIFF(mi, CPUMultiLoad.TimeStampUTC, SYSUTCDATETIME()) < 5
AND (CPUMultiLoad.AvgLoad-1) > 80
AND Nodes.NodeID IN (SELECT NodesCustomProperties.NodeID FROM NodesCustomProperties WHERE NodesCustomProperties.DeviceCategory = 'Network')

Alert SQL editor

In the reset condition, I’ve configured the load to be lower then 75 (<75) for five minutes. The idea here is that this setting prevents flapping, but the values are up to you.

Alert Reset condition

In the Trigger E-mail, I’ve used some extensive SQL as well. Here is an example, which is not great for reading here, but it does report the IP address, node name, cpu, memory etc. in a readable format when you receive an e-mail:

<b>Automatically generated alert from Solarwinds Orion.</b>

${N=SwisEntity;M=Vendor} device <b> ${N=SwisEntity;M=SysName} </b> has <b> >80% CPU utilization </b> since ${N=Generic;M=DateTime;F=DateTime}.

<b>Details</b>
Devicename : ${N=SwisEntity;M=DisplayName}
IP adress : ${N=SwisEntity;M=IP_Address}
Time triggered : ${N=SwisEntity;M=LastSystemUpTimePollUtc}
Location: ${N=SwisEntity;M=Location}
Device last booted at: ${N=SwisEntity;M=LastBoot}
Polling engine: ${N=SwisEntity;M=Engine.ServerName}
Severity level: ${N=Alerting;M=Severity}
System CPU : ${N=SwisEntity;M=Stats.CPULoad}
SystemMem : ${N=SwisEntity;M=Stats.PercentMemoryUsed}

<b>Current Statistics</b>
Stack CPU Utilization	:	${N=SWQL;M=SELECT TOP 1 AvgLoad
FROM (
	SELECT NodeID , CPUIndex , ObservationTimestamp , CPUMultiLoad.AvgLoad , SSM.Role , SSM.SwitchNumber , SSM.Model , SSM.MacAddress , SSM.SerialNumber
	FROM Orion.CPUMultiLoad AS CPUMultiLoad
	INNER JOIN Orion.NPM.SwitchStackMember AS SSM ON CPUMultiLoad.NodeID = SSM.NodeID AND CPUMultiLoad.CPUIndex = SSM.MemberID
	WHERE CPUMultiLoad.NodeID = ${N=SwisEntity;M=NodeID}
	AND AvgLoad > '80'
	ORDER BY ObservationTimestamp DESC
)
} %
Role				:	${N=SWQL;M=SELECT TOP 1 Role
FROM (
	SELECT NodeID , CPUIndex , ObservationTimestamp , CPUMultiLoad.AvgLoad , SSM.Role , SSM.SwitchNumber , SSM.Model , SSM.MacAddress , SSM.SerialNumber
	FROM Orion.CPUMultiLoad AS CPUMultiLoad
	INNER JOIN Orion.NPM.SwitchStackMember AS SSM ON CPUMultiLoad.NodeID = SSM.NodeID AND CPUMultiLoad.CPUIndex = SSM.MemberID
	WHERE CPUMultiLoad.NodeID = ${N=SwisEntity;M=NodeID}
	AND AvgLoad > '80'
	ORDER BY ObservationTimestamp DESC
)
}
SwitchNumber	:	${N=SWQL;M=SELECT TOP 1 SwitchNumber
FROM (
	SELECT NodeID , CPUIndex , ObservationTimestamp , CPUMultiLoad.AvgLoad , SSM.Role , SSM.SwitchNumber , SSM.Model , SSM.MacAddress , SSM.SerialNumber
	FROM Orion.CPUMultiLoad AS CPUMultiLoad
	INNER JOIN Orion.NPM.SwitchStackMember AS SSM ON CPUMultiLoad.NodeID = SSM.NodeID AND CPUMultiLoad.CPUIndex = SSM.MemberID
	WHERE CPUMultiLoad.NodeID = ${N=SwisEntity;M=NodeID}
	AND AvgLoad > '80'
	ORDER BY ObservationTimestamp DESC
)
}

<b>Hardware & Software details</b>
IP address		:	${N=SwisEntity;M=IP_Address}
MAC Address		:	${N=SWQL;M=SELECT TOP 1 MacAddress
FROM (
	SELECT NodeID , CPUIndex , ObservationTimestamp , CPUMultiLoad.AvgLoad , SSM.Role , SSM.SwitchNumber , SSM.Model , SSM.MacAddress , SSM.SerialNumber
	FROM Orion.CPUMultiLoad AS CPUMultiLoad
	INNER JOIN Orion.NPM.SwitchStackMember AS SSM ON CPUMultiLoad.NodeID = SSM.NodeID AND CPUMultiLoad.CPUIndex = SSM.MemberID
	WHERE CPUMultiLoad.NodeID = ${N=SwisEntity;M=NodeID}
	AND AvgLoad > '80'
	ORDER BY ObservationTimestamp DESC
)
}
Model type: ${N=SwisEntity;M=HardwareHealthInfos.Model}
IOS Image: ${N=SwisEntity;M=IOSImage}
IOS Version: ${N=SwisEntity;M=IOSVersion}
Serial Number: ${N=SWQL;M=SELECT TOP 1 SerialNumber
FROM (
	SELECT NodeID , CPUIndex , ObservationTimestamp , CPUMultiLoad.AvgLoad , SSM.Role , SSM.SwitchNumber , SSM.Model , SSM.MacAddress , SSM.SerialNumber
	FROM Orion.CPUMultiLoad AS CPUMultiLoad
	INNER JOIN Orion.NPM.SwitchStackMember AS SSM ON CPUMultiLoad.NodeID = SSM.NodeID AND CPUMultiLoad.CPUIndex = SSM.MemberID
	WHERE CPUMultiLoad.NodeID = ${N=SwisEntity;M=NodeID}
	AND AvgLoad > '80'
	ORDER BY ObservationTimestamp DESC
)
}





View all node details: ${NodeDetailsURL}

View full alert details here: ${N=Alerting;M=AlertDetailsUrl}
Click here to acknowledge the alert: ${N=Alerting;M=AcknowledgeUrl}

<b>Automatically generated alert from Solarwinds Orion.</b>