Microsoft has been running services on the Internet for quite some time now, but things fundamentally changed a few years ago leading up to Steve Ballmer’s famous quote, "This is the bet for the company. For the cloud, we're all in." We have all the oars in the boat pulling in the same direction and learning from each other to deliver the best cloud products. We are taking the lessons from running cloud services and feeding them into our products to make them better. During our planning for Windows Server 2012, we spent a lot of time with our cloud services to understand what worked well and where their pain points were. When you run services at our scale, every little problem gets amplified and every improvement helps enormously. These lessons translated into dozens of features in the areas of performance, automating everything, supporting datacenter topologies, continuous availability, and minimizing mean-time-to-detection (MTTD) and mean-time-to-recovery (MTTR). In today’s blog, Mukul Sabharwal, a software development engineer on the Bing team, describes a few of the features of Windows Server 2012, their effect on the Bing service, and why Bing is adopting and deploying Windows Server 2012 as fast as it can. As you’ll see, Windows Server 2012 is truly a cloud-optimized operating system.
Cheers,
Jeffrey
With the recent announcement of the Windows Server 2012 Release Candidate (RC), we at Bing.com considered how we might benefit from some of the operating system’s new features. Bing.com is a cloud service that runs on thousands of computers spanning many datacenters across the globe. Performance is a key component in running a successful cloud service such as Bing. Bing serves thousands of user queries every second, and users demand both relevancy and speed in those results.
Our deployment of Windows Server 2012 leveraged four key new features, in particular:
- Built-in Microsoft .NET Framework 4.5, which ships with Windows Server 2012 and brings background garbage collection and the associated latency improvements
- Improved performance at startup, enabled by the multicore JIT functionality of .NET 4.5
- Ability to collect call stacks for 64-bit .NET JITted applications
- Evaluation of Hyper-V 3 (the version of Hyper-V in Windows Server 2012)
Pre-installed .NET Framework 4.5
Our rendering tier runs entirely on managed code, relying on the power of the .NET Framework and the accompanying web frameworks, ASP.NET and ASP.NET MVC. One of the primary reasons for writing managed code is the improved developer productivity and run-time safety afforded by the CLR. These benefits come with performance costs, however; garbage collection (GC) is one example.
Server background GC
.NET 4.5 introduces Background GC for Server applications. Background GC was released in .NET 4 for client-side applications, and it was a hit, which is why we were really excited when Windows Server 2012 brought it to the server. And of course, we were gleeful when we saw the results!
This graph of InternalRequestLatency shows the time our application spends doing non-I/O work. (Just so you know, we set goals around making our 99th percentile faster.)
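To make the 99th-percentile goal concrete, here is a minimal sketch of how such a figure could be computed from per-request latencies. It uses the nearest-rank definition of a percentile, which is one common choice and not necessarily the one used by Bing's monitoring; the function name and the sample latencies are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds (illustrative only).
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 14, 13]

print(percentile(latencies_ms, 50))  # 13
print(percentile(latencies_ms, 99))  # 250
```

The median hides the single slow request, while the 99th percentile exposes it, which is why tail percentiles are a natural target for GC-related latency work.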
In the following graph, note that the majority of these gains come from spending less time dealing with the managed memory we create (garbage collecting) as part of servicing a request. (Note that the dates are coordinated with all the graphs in this post.)
Another measure of optimal performance—and an indicator of server health for ASP.NET applications—is the number of requests that are queued in the ASP.NET pipeline. This is shown in the following graph. Note the sharp decline on 5/25!
Upgrading to Windows Server 2012 can have this same positive impact for your managed applications. Less time spent garbage collecting is more time spent serving user requests. The end result is less latency and better throughput for your services.
Multicore JIT
Another great feature that is available in Windows Server 2012 via .NET 4.5 is JIT-compiling by using multiple cores. The feature is a profile-guided optimization; a background thread compiles methods that are likely to be required by the executing thread, and in an ideal case, the application methods are already JIT compiled when they are needed to run.
This feature is already enabled for ASP.NET applications, which makes it trivial to upgrade—do nothing, and let your ASP.NET applications get multicore JIT automatically.
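The profile-guided idea can be illustrated with a toy simulation. This is only a conceptual sketch, not how the CLR implements multicore JIT: the `ToyJit` class, its methods, and the method names in the profile are all hypothetical stand-ins.

```python
import threading


class ToyJit:
    """Toy model of a JIT cache (not the CLR's actual implementation)."""

    def __init__(self):
        self._compiled = {}
        self._lock = threading.Lock()
        self.foreground_compiles = 0  # compiles paid for by the executing thread

    def _compile(self, name):
        # Stand-in for real JIT work: produce a callable for the method name.
        return lambda: f"ran {name}"

    def ensure_compiled(self, name, foreground):
        with self._lock:
            if name not in self._compiled:
                self._compiled[name] = self._compile(name)
                if foreground:
                    self.foreground_compiles += 1
            return self._compiled[name]

    def prejit(self, profile):
        """Compile the methods recorded in a previous run's profile
        on a background thread."""
        def work():
            for method in profile:
                self.ensure_compiled(method, foreground=False)

        worker = threading.Thread(target=work, daemon=True)
        worker.start()
        return worker

    def call(self, name):
        return self.ensure_compiled(name, foreground=True)()


jit = ToyJit()
worker = jit.prejit(["ParseQuery", "RenderPage"])  # hypothetical profile
worker.join()  # joined here only to make the demo deterministic

results = [jit.call(m) for m in ["ParseQuery", "RenderPage", "LogRequest"]]
print(jit.foreground_compiles)  # 1: only LogRequest was missing from the profile
```

In the ideal case described above, every method the executing thread needs is already in the cache, so the foreground compile count stays at zero.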
You may be wondering: Why do startup times matter for server-side applications? The hope is that you restart your services only when needed, and at a time of your choosing so that it can be done during off-peak hours. But what if your application crashes?
Startup time is critical for service availability. When a service goes down, recovering quickly can be the difference between a total system outage and a degraded (but functional) experience.
And the startup time improvements, you ask?
This 50 percent reduction in startup time reassures our operational staff that if a service goes down, it’ll start back up twice as fast!
Taking call stacks on your production servers
Sample-based profiling in production
Also new to Windows Server 2012 is the ability to collect call stacks for 64-bit .NET applications that are JIT compiled.
We use performance counters as our primary monitoring mechanism, and as the first line of notification to our operational staff. They range from the ones we’ve discussed here (time in GC, and ASP.NET requests queued) to other important ones such as %CPU time and number of exceptions thrown.
Performance counters are extremely valuable for detecting performance degradation because they give insight into operational health. However, they are usually not sufficient to diagnose the “root cause” of a performance problem. So let’s say a performance counter spiked. For example, our %CPU time doubled, which in turn impacted latency. What’s our next step? With Windows Server 2012, our next step is now to enable low-overhead sample-based profiling.
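As a rough illustration of the "counter spiked" trigger, the following sketch flags a sample that is at least double a recent baseline. The function name, the doubling threshold as a default, and the %CPU samples are assumptions made for the example; they do not describe Bing's actual alerting rules.

```python
from statistics import mean


def spiked(history, latest, factor=2.0):
    """Return True when `latest` is at least `factor` times the mean of `history`."""
    if not history:
        return False
    baseline = mean(history)
    return baseline > 0 and latest >= factor * baseline


cpu_history = [38.0, 41.0, 40.0, 39.0, 42.0]  # hypothetical %CPU samples

print(spiked(cpu_history, 81.0))  # True: more than double the 40.0 baseline
print(spiked(cpu_history, 45.0))  # False
```

A check like this only says that something changed; finding out why is where profiling comes in.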
Windows Server 2012 introduces the ability to perform sample-based profiling on 64-bit, JIT-compiled .NET applications via the Event Tracing for Windows (ETW) system. If you’re familiar with ETW, you’ll know that it’s a system-wide service that does not require process restarts; it’s non-invasive (in other words, no attaching to a process), and depending on the events you subscribe to, it also has low overhead: approximately 10 percent CPU cost when profiling is active.
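To show what sample-based profiling yields once call stacks are available, here is a minimal sketch that aggregates sampled stacks into exclusive counts (samples where a frame was on top) and inclusive counts (samples where a frame appeared anywhere). It does not call ETW; the stacks and frame names are hypothetical, and real tools perform this kind of aggregation on the collected trace.

```python
from collections import Counter


def summarize(samples):
    """Aggregate sampled call stacks.

    Each stack is a list of frame names ordered from outermost to innermost.
    Returns (exclusive, inclusive) counters keyed by frame name.
    """
    exclusive = Counter()
    inclusive = Counter()
    for stack in samples:
        if not stack:
            continue
        exclusive[stack[-1]] += 1  # the innermost frame was executing
        for frame in set(stack):   # count a recursive frame once per sample
            inclusive[frame] += 1
    return exclusive, inclusive


# Hypothetical samples, as if taken at a fixed interval.
samples = [
    ["Main", "HandleRequest", "RenderPage"],
    ["Main", "HandleRequest", "RenderPage"],
    ["Main", "HandleRequest", "ParseQuery"],
    ["Main", "GC"],
]

exclusive, inclusive = summarize(samples)
print(exclusive.most_common(1))                   # [('RenderPage', 2)]
print(inclusive["HandleRequest"] / len(samples))  # 0.75
```

Because each sample is taken at a regular interval, a frame's share of samples approximates its share of CPU time, which is what points to the code responsible for a spike.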
With profiling enabled, %CPU time rose only from 80 percent to 90 percent. This translates to a roughly equal percentage degradation in latencies, as shown in the following graph.
A 10 percent hit to performance can be acceptable at times when it is critical to diagnose the “root cause” of a performance problem.
We’ve spoken here about sample-based profiling, but this feature extends to many other types of ETW events, including context switch events and ReadyThread events. This means you can analyze not only CPU-bound problems, but also problems that stem from I/O or thread scheduling. (Note that enabling context-switch analysis is much more costly than profiling, with close to a 30 percent impact on CPU in our application’s case.)
Sometimes there are problems that are only reproducible in a production environment, at scale. The ability to turn on a low-overhead logging facility that can profile your system is an incredible advantage, and with Windows Server 2012, managed applications now have full parity with their native counterparts.
(Note that while Windows Server 2008 introduced the ability to take call stacks on sample profile ETW events, it did not work for 64-bit applications that were JITted.)
Hyper-V 3 and guest NUMA support
Before Windows Server 2012, virtualization implied overhead, even with all the accompanying benefits. Specifically, the cost of the software indirection imposed by virtualization was not acceptable for us. Moreover, the limit of four virtual cores proved to be a throughput bottleneck in our synthetic lab testing.
However, with Windows Server 2012 comes Hyper-V 3, and with it a host of “scale” features. We were particularly excited about guest NUMA support. The ability to detect the NUMA topology is critical for making intelligent decisions about memory allocation; minimizing cross-node memory accesses is sometimes the key ingredient to making applications faster.
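A back-of-the-envelope model shows why cross-node accesses matter. The sketch below computes an average memory access time as a weighted mix of local and remote accesses. The latency figures and local-access fractions are made-up illustrative values, not measurements from Bing or from any particular hardware.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Average memory access time when `local_fraction` of accesses hit the
    local NUMA node and the rest cross to a remote node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies; real values depend on the hardware.
LOCAL_NS = 100.0
REMOTE_NS = 150.0

numa_unaware = average_access_ns(0.5, LOCAL_NS, REMOTE_NS)  # half the accesses are remote
numa_aware = average_access_ns(0.75, LOCAL_NS, REMOTE_NS)   # allocation favors the local node

print(numa_unaware)  # 125.0
print(numa_aware)    # 112.5
```

A guest that can see the NUMA topology is able to push the local fraction up; a guest that cannot see it has no basis for making that choice.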
With the new Windows Server 2012 implementation, and with the accompanying NUMA efficiency from within a guest operating system, we are reevaluating using Hyper-V 3. We anticipate that Hyper-V will allow us to meet our performance targets, while also delivering large cost savings.
Cloud business and cloud-optimized operating system
Bing.com has already seen significant improvements by switching to Windows Server 2012 with the public RC. We roughly halved both the %CPU time spent on critical servicing of managed applications (including GC) and the startup time when restarting services. We achieved significant performance gains because process monitoring is now easier and built into the operating system. And we are exploring how we can use NUMA efficiency with Hyper-V 3 to enhance our service with virtualization. Bing.com, a recognizably successful, large-scale enterprise cloud service, represents how we built Windows Server 2012 to be a truly cloud-optimized operating system.
Bing.com is a cloud service. It’s a fast cloud service, and it runs on thousands of computers that span many datacenters across the globe. And now it’s optimized with Windows Server 2012.
Summary
The promise of the new Windows Server 2012 features intrigued the Bing.com team as we considered migrating to the latest operating system. What began as exploratory evaluations of the impact of a migration quickly led to a full-scale deployment, which benefited greatly from the built-in .NET 4.5 functionality, multicore JIT functionality, and potentially, the much-improved Hyper-V 3 functionality.
Now that you understand the inner workings of these Bing.com optimizations, one last reminder before you depart: All Bing.com search results worldwide are being served by Windows Server 2012!
- Built-in .NET 4.5, included with Windows Server 2012, including the background garbage collection and associated improved latencies
- Improved performance at startup, enabled by the multicore JIT functionality of .NET 4.5
- Ability to collect call stacks for 64-bit .NET JITted applications
- A clear path to Hyper-V 3 adoption