Recently, we’ve seen an increase in the number of high CPU/high memory usage problems with WSUS, including WSUS in a System Center Configuration Manager environment – these have mostly corresponded with Update Tuesdays.
Microsoft is also aware of a known issue with KB4034658 that will cause Windows 10 1607 clients to run a full scan after install – Microsoft is investigating and the latest information is available here.
Symptom
The symptoms include:
- High CPU on the WSUS server – 70-100% CPU in the w3wp.exe process hosting the WsusPool
- High memory in the w3wp.exe process hosting the WsusPool – customers have reported memory usage approaching 24 GB
- Constant recycling of the w3wp.exe process hosting the WsusPool (identifiable by the PID changing)
- Clients failing to scan, with 0x8024401c (timeout) errors in the WindowsUpdate.log
- Mostly HTTP 500 errors for /ClientWebService/Client.asmx requests in the IIS logs
Cause
Microsoft support has determined that the issue is driven primarily by the Windows 10 1607 updates, for example KB4022723, KB4022715, KB4025339, etc. See here for the list of Windows 10 1607 updates.
These updates have large metadata payloads for the dependent (child) packages because they roll up a large number of binaries. Windows 10, versions 1507 (Windows 10 RTM) and 1511 updates can also cause this, though to a lesser extent. Windows 10, version 1703 is still recent enough that the metadata is not that large yet (but will continue to grow).
How to determine if the 1607 Updates are the cause
To determine whether WSUS is affected by this problem, decline the Windows 10 updates (including the latest cumulative update). If CPU and memory usage quickly drop back to normal, then the issue is likely the result of the metadata size of the Windows 10 updates. Once you have determined whether the updates are causing the issue, they can be reapproved, assuming you want to deploy them.
If declining the Windows 10 updates does not help, then the problem may be too many superseded updates on the WSUS server. Take the steps outlined in The Complete Guide to Microsoft WSUS and Configuration Manager SUP maintenance to decline the superseded updates. If, after doing this, you are still having problems, read on.
This blog post may help alleviate some of these problems, but it is not a magic bullet. After these changes are made, you will still see high CPU and memory usage until the system stabilizes, as I explain further down.
WSUS Caching
WSUS has a caching mechanism: the first time update metadata is requested by any client, WSUS stores it in memory, and further requests for the same update revision retrieve the metadata from memory instead of reading it from the database. Some of the metadata in the database is compressed, so it must not only be retrieved but also decompressed into memory, which is an expensive operation.
You can monitor the current number of updates stored in the cache in Performance Monitor with the counter WSUS: Client Web Service | Cache size and the instance spgetcorexml. Keep in mind that this counter reports the number of cached items, not the amount of memory consumed by the cached metadata; the memory of the w3wp.exe process can be used as a proxy for the space the metadata cache consumes.
The Problem
For large metadata packages and many simultaneous requests, it can take longer than ASP.NET’s default timeout of 110 seconds to retrieve all of the metadata the client needs. When the timeout is hit, ASP.NET disconnects the client and aborts the thread doing the metadata retrieval. If you look at Program Files\Update Services\LogFiles\SoftwareDistribution.log, the abort looks like this:
System.Threading.ThreadAbortException: Thread was being aborted.
   at System.Buffer.__Memcpy(Byte* dest, Byte* src, Int32 len)
   at System.Buffer._Memcpy(Byte* dest, Byte* src, Int32 len)
   at System.Buffer.Memcpy(Byte* dest, Byte* src, Int32 len)
   at System.String.CtorCharPtrStartLength(Char* ptr, Int32 startIndex, Int32 length)
   at Microsoft.UpdateServices.Internal.CabUtilities.ExpandMemoryCabToString(Byte[] src)
   at Microsoft.UpdateServices.Internal.DataAccess.ExecuteSpGetCoreUpdateXml(Int32[] revisionIds)
   at Microsoft.UpdateServices.Internal.DataAccessCache.GetCoreUpdateXml(Int32[] revisionIds, DataAccess da, Int64 maxXmlPerRequest)
   at Microsoft.UpdateServices.Internal.ClientImplementation.GetSyncInfo(Version clientProtocolVersion, DataAccess dataAccess, Hashtable stateTable, Hashtable deploymentTable, Boolean haveGroupsChanged, Boolean driverSyncNeeded, Boolean doChunking)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SoftwareSync(DataAccess dataAccess, UnencryptedCookieData cookieData, Int32[] installedNonLeafUpdateIds, Int32[] leafUpdateIds, Boolean haveGroupsChanged, Boolean expressQuery, Guid[] filterCategoryIds, Boolean needTwoGroupOutOfScopeUpdates)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SyncUpdates(Cookie cookie, SyncUpdateParameters parameters)
   at Microsoft.UpdateServices.Internal.ClientImplementation.SyncUpdates(Cookie cookie, SyncUpdateParameters parameters)
Note: What you are looking for is a ThreadAbortException with ExecuteSpGetCoreUpdateXml on the stack (ThreadAbortExceptions could happen for other reasons as well – we are concerned with this specific scenario).
When the thread abort happens, all of the metadata retrieved up to that point is discarded and is not cached. As a result, WSUS enters a continuous cycle: the data is never cached, so the clients can never complete the scan and continue to rescan.
Another issue that can occur is that the WSUS application pool keeps recycling because it exceeds its private memory threshold (which is very likely if the limit is still the default of 1,843,200 KB). Each recycle discards the cached updates and forces WSUS to go back to retrieving updates from the database and caching them all over again.
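If you want to check what the pool is currently configured to do before changing anything, the recycling settings can also be read from the command line. This is a minimal sketch assuming the default pool name (WsusPool) and the standard inetsrv path:
rem List every configured attribute of the WsusPool application pool,
rem including recycling.periodicRestart.privateMemory and processModel.pingingEnabled
%windir%\system32\inetsrv\appcmd.exe list apppool "WsusPool" /text:*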
Solution
Configure IIS to stop recycling the App Pool
The goal is to stop the app pool from recycling, since a recycle clears the cache. The default Private Memory Limit of 1,843,200 KB on the WsusPool will cause the pool to recycle constantly; we want to make sure it doesn’t recycle unless we intentionally restart the app pool. (A scripted equivalent using appcmd.exe is sketched after the steps below.)
- Open IIS Manager for the WSUS server
- Expand <Server name> and click Application Pools.
- Find WsusPool > right-click > Advanced Settings.
- Find the setting Private Memory Limit (KB) under Recycling and set it to 0.
- Check and verify that Virtual Memory Limit (KB) is set to 0.
- This will prevent IIS from recycling due to a memory limit.
- Find the setting Regular Time Interval (minutes) below the Private Memory limit and set to 0.
- Find the Ping Enabled setting and set it to False.
- This will prevent IIS from recycling the pool if it gets too busy and doesn’t respond to the ping.
- Click OK.
- From an elevated command prompt, run IISReset to restart IIS.
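If you prefer to script these changes rather than click through IIS Manager, the following is a sketch of the equivalent appcmd.exe commands. It assumes the default pool name (WsusPool) – adjust the name if your pool is named differently:
rem Remove the private and virtual memory recycling limits and the time-based recycle
%windir%\system32\inetsrv\appcmd.exe set apppool "WsusPool" /recycling.periodicRestart.privateMemory:0 /recycling.periodicRestart.memory:0 /recycling.periodicRestart.time:00:00:00
rem Disable worker process pinging so a busy pool is not restarted for failing to respond
%windir%\system32\inetsrv\appcmd.exe set apppool "WsusPool" /processModel.pingingEnabled:false
rem Restart IIS for the changes to take effect
iisreset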
Limit the number of inbound connections to WSUS
Reducing the number of allowed connections will cause some clients to receive 503 (Service Unavailable) errors, but they will retry. If the performance counter Web Service | Current Connections for the website on which WSUS is hosted shows more than 1000 connections, complete this step (a scripted equivalent follows the list):
- Open IIS Manager for the WSUS server.
- Expand <Server name> and then Sites.
- Select the site hosting WSUS.
- If you aren’t sure, expand each site and look for the ClientWebService directory underneath it – that is the WSUS site the clients use.
- With the site selected, click the Limits link in the toolbar on the right side.
- Check the option Limit number of connections and change the value to 1000 (or even lower).
- Click OK to save the changes.
- From an elevated command prompt, run IISReset to restart IIS.
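This change can also be scripted. The sketch below assumes the WSUS site is named WSUS Administration, which is the default when WSUS listens on port 8530 – substitute your site name if it differs:
rem Limit the WSUS site to 1000 concurrent connections (adjust the site name to match your environment)
%windir%\system32\inetsrv\appcmd.exe set site "WSUS Administration" /limits.maxConnections:1000
rem Restart IIS for the change to take effect
iisreset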
Increase the ASP.NET timeout
- Make a copy of Program Files\Update Services\WebServices\ClientWebService\Web.config (a command-prompt sketch of the backup follows these steps).
- Open Program Files\Update Services\WebServices\ClientWebService\Web.config.
- Find the <httpRuntime> element. In an unmodified web.config it will look like this:
<httpRuntime maxRequestLength="4096" />
- Modify the httpRuntime element by adding an executionTimeout attribute:
<httpRuntime maxRequestLength="4096" executionTimeout="3600" />
- Save the modified web.config to a different location, then copy it into the ClientWebService directory, replacing the original you backed up earlier.
- From an elevated command prompt, run IISReset to restart IIS.
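If you want to do the backup from an elevated command prompt, a minimal sketch (assuming the default WSUS installation path) looks like this:
rem Back up the ClientWebService web.config before editing (default WSUS install path assumed)
copy "%ProgramFiles%\Update Services\WebServices\ClientWebService\Web.config" "%ProgramFiles%\Update Services\WebServices\ClientWebService\Web.config.bak"
rem After adding executionTimeout and saving the file, restart IIS
iisreset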
Monitor
Open Windows Performance Monitor and add the following counters (a typeperf sketch follows the list):
- WSUS: Client Web Service | Cache Size counter for spgetcorexml instance.
- Process | Private Bytes counters for the w3wp.exe processes.
- If there is more than one w3wp.exe, add them all – the one with the highest memory usage is probably the WsusPool worker process, but you can also add Process | ID Process to determine which worker process should be monitored.
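If you prefer to watch these counters from a command prompt, typeperf can sample them as well. The counter paths below are a sketch – the exact object and instance names can vary, so verify them with typeperf -q on your server:
rem Sample the WSUS metadata cache size and the w3wp.exe private bytes / PIDs every 15 seconds
rem (counter paths are assumptions – confirm the exact names with "typeperf -q")
typeperf "\WSUS: Client Web Service(spgetcorexml)\Cache size" "\Process(w3wp*)\Private Bytes" "\Process(w3wp*)\ID Process" -si 15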
Monitor the cache size counter – it should increase and eventually reach a peak value that no longer changes. This indicates that all the metadata the clients need is cached. It can take several hours for this to stabilize, so be patient.
Monitor the IIS logs and filter on ClientWebService/Client.asmx. Initially the majority of responses will be 500s, but as the cache grows, the number of 200s will increase with it. Once the cache is fully built, you should see mostly 200s.
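For a rough count from the command prompt, you can filter the IIS log for Client.asmx requests. The sketch below assumes the default IIS log location and that the WSUS site is W3SVC1 – check which W3SVC folder belongs to your WSUS site, and note that the " 500 " match is approximate (it can match other numeric fields), so treat the result as an indicator only:
rem Rough count of HTTP 500 responses for Client.asmx in the IIS logs
rem (default log path and site ID are assumptions – adjust W3SVC1 to your WSUS site's ID)
findstr /i /c:"/ClientWebService/Client.asmx" %SystemDrive%\inetpub\logs\LogFiles\W3SVC1\u_ex*.log | find /c " 500 "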
If you see the cache size drop, then one of two things has happened:
- The App pool was recycled (or it crashed), or
- The cache was purged due to memory pressure
If the app pool process ID didn’t change and you didn’t make any changes to the IIS configuration that would cause the app domain to unload (such as changing the IIS connection limit), then you have most likely hit scenario #2. To work around it, you can raise the ASP.NET cache memory limit so that items are not trimmed from the cache until it reaches a certain size. You can also make this change beforehand if you wish.
- Make a copy of Program Files\Update Services\WebServices\ClientWebService\Web.config.
- Open Program Files\Update Services\WebServices\ClientWebService\Web.config.
- Find the element <system.web>.
- Immediately under it, add a new element:
<caching> <cache privateBytesLimit="8000000000" /> </caching>
- The privateBytesLimit value can be increased further if needed; 8,000,000,000 (roughly 8 GB) is usually enough.
- Save the modified web.config somewhere else, back up the old one, then copy the modified one into the directory.
- From an elevated command prompt, run IISReset to restart IIS.
Again, monitor the cache size – if it continues to bounce around while the PID isn’t changing and memory is high (> 8 GB), then you probably need to increase privateBytesLimit further.