Cloud TCO

Total Cost of Ownership Benefits With Cloud HPC

A Cost Model for In-House Versus In-Cloud High Performance Computing

How cost-efficient is HPC in the cloud?

 

An engineer or scientist has three compute options for design and development:

  1. A workstation

  2. An in-house server

  3. Compute power in the cloud

Each option has benefits and drawbacks. This article discusses some of these issues, describes a model to calculate cloud TCO, and shows a simple way to compare costs across these options. Specifically, we'll look at two ways to add compute power from the cloud: purely on demand, or a hybrid model with a smaller in-house system for the average daily load and cloud cycles for peak demand.

 

The reality of utilization

 

Small and medium-sized enterprises (SMEs) can benefit greatly from using HPC technology for design and development. Reducing product failures early, during design, development, and production, brings big cost savings. More simulations lead to higher-quality products, and more computing power means shorter time to market. All this powers competitiveness and innovation.

Fewer than 10% of manufacturers today use HPC servers for computer simulations to design and develop their products. This data comes from two studies, ‘Reflect’ and ‘Reveal’, by the US Council on Competitiveness. Over 90% of companies do virtual prototyping and large-scale data modeling on desktop computers such as workstations or laptops. The same studies show that 57% of these companies have problems they can’t solve because their desktops are too slow: their geometry or physics is too complex and needs more memory than their desktops have. These companies need high performance computing.

 

There are two ways to get HPC computing power to supplement the desktop system. The traditional option is to buy an HPC server, which is many times faster and more capable than a desktop workstation. But for many SMEs, buying an HPC server is not viable because its Total Cost of Ownership (TCO) is high, as IDC's Typical Three-Year Server TCO shows: the hardware is only 7% of the total cost, and hardware and software together are only 14%.

Figure: IDC Typical Three-Year Server TCO

 

The bulk of the total cost is actually IT operations and training. The high cost of expertise (staffing), equipment, maintenance, middleware, and training is something even big companies don't want to take on. Then there are the long and painful approval and purchasing processes, and new skills and people are needed to run and fix such a system. So buying an HPC server is not a particularly attractive option. It is a possible option nonetheless, and until a few years ago it was the only one.

 

Cloud computing to the rescue

 

Cloud computing now gives SMEs a second viable option. They can get the benefits of HPC without buying and operating their own HPC system. HPC in the Cloud allows engineers to continue to use their own desktop system for daily design and development work, and submit large, complex, time-consuming jobs into the cloud. Other benefits of HPC Cloud are:

  • On-demand access to ‘infinite’ resources
  • Pay per use
  • Reduced capital expenditure (CAPEX)
  • Greater business agility
  • Higher-quality results
  • Lower risk
  • Lower product failure rate
  • Dynamic scaling of resources up and down as needed

 

So how much does this cost? The analysis below covers only the total cost of the HPC system itself. Of course, a real-life decision may require a more detailed analysis. For example, you need pay-per-use application software licenses, your applications and data must be suitable for running in the cloud, and you must be able to trust your service provider's security. You also need to be sure data transfers are not a bottleneck.


The real cost of an HPC system

 

To get a realistic estimate of the cost, let's assume a company needs to run a mix of simulation jobs with industry-leading software such as Ansys. Most of the time the Ansys Mechanical jobs run on 32 cores, while some larger jobs with finer-grained geometry and more sophisticated physics, such as Ansys Fluid Structure Interaction, need 256 cores. So the company buys a typical 16-node HPC system, each node with two CPUs and 16 cores in total, resulting in 256 cores. With this system the company can perform the 32-core runs as well as the 256-core runs. A reasonable price for such a system would be, say, $70K.

Now, according to IDC, the Total Cost of Ownership of such a system is:

  • $1M over three years (or)
  • $333K per year for 256 cores (or)
  • $1,302 per core per year (or)
  • $0.149 for one core per hour (core hour)
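As a minimal Python sketch, assuming (as in the IDC figure above) that hardware is 7% of the three-year TCO, the four figures follow from the $70K price by straight division:

```python
# Minimal sketch of the TCO arithmetic in the list above.
# Assumption: hardware is 7% of the three-year TCO (the IDC share quoted earlier),
# so the three-year TCO is the cluster price divided by 0.07.
CLUSTER_PRICE = 70_000      # USD, example 16-node, 256-core cluster
HARDWARE_SHARE = 0.07       # hardware as a fraction of three-year TCO
YEARS = 3
CORES = 256
HOURS_PER_YEAR = 365 * 24   # 8,760 hours

tco_3yr = CLUSTER_PRICE / HARDWARE_SHARE          # three-year TCO
tco_per_year = tco_3yr / YEARS                    # annual TCO
per_core_year = tco_per_year / CORES              # annual TCO per core
per_core_hour = per_core_year / HOURS_PER_YEAR    # cost per core hour at 100% utilization

print(f"Three-year TCO:    ${tco_3yr:,.0f}")
print(f"TCO per year:      ${tco_per_year:,.0f}")
print(f"Per core per year: ${per_core_year:,.0f}")
print(f"Per core hour:     ${per_core_hour:.3f}")
```

Changing `HARDWARE_SHARE` or `CLUSTER_PRICE` shows how sensitive the per-core-hour figure is to the IDC ratio.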

 

Of course, these figures assume the system is 100% utilized. A quick Google search for ‘average server utilization’ shows that actual utilization rates hover between 5% and 20%, because of a peak-and-valley pattern caused by varying workloads and time of day. On the other hand, you can also find reports of 90% utilization with a workload manager such as Grid Engine.

The following table shows the total cost of one core hour for a 16-node (256-core) in-house HPC cluster depending on % utilization, or ‘number of busy nodes’. The real cost per core hour is the cost at 100% utilization divided by the real utilization. For example, for a utilization of 20%: ($0.149 / 20) * 100 = $0.75.

 

Table: Cost per core hour for an in-house HPC server with 16 compute nodes (256 cores), as a function of utilization
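The table follows from the rule stated above: cost at 100% utilization divided by utilization. A minimal Python sketch that regenerates such a table, assuming utilization is counted as busy nodes out of 16 (the exact rows of the original table are not known):

```python
# Sketch: regenerate a utilization table for the 256-core in-house cluster.
# Rule from the text: real cost per core hour = cost at 100% utilization / utilization.
# Assumption: utilization is expressed as busy nodes out of 16.
FULL_UTILIZATION_COST = 0.149  # USD per core hour at 100% utilization
NODES = 16

def cost_per_core_hour(utilization: float) -> float:
    """Real cost of one core hour for a utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return FULL_UTILIZATION_COST / utilization

print("busy nodes  utilization  $/core-hour")
for busy_nodes in range(1, NODES + 1):
    u = busy_nodes / NODES
    print(f"{busy_nodes:10d}  {u:11.1%}  {cost_per_core_hour(u):11.2f}")
```

Calling `cost_per_core_hour(0.20)` gives about $0.745, five times the 100% figure, which is the $0.75 quoted in the text.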

In this example, with an average utilization of, say, 20%, the real cost of one core hour is $0.75, five times higher than for a 100% utilized cluster. At 40% utilization, which is not uncommon in HPC, one core hour still costs $0.37. A general formula for X% cluster utilization reads:

Cost per core/h ($) = { Cluster price ($) * TCO factor for 1 year (100/7/3) } / { # cores * 365 * 24 * utilization (%) / 100 }
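The general formula can be written as a small Python function. This is a sketch: the function name and parameter names are illustrative, with the 7% hardware share and three-year period as defaults.

```python
def cost_per_core_hour_from_price(cluster_price: float, cores: int,
                                  utilization_pct: float,
                                  hardware_share_pct: float = 7.0,
                                  years: int = 3) -> float:
    """Cost per core hour (USD) from the general formula above.

    cost = cluster_price * (100 / 7 / 3) / (cores * 365 * 24 * utilization_pct / 100)
    The 7% hardware share and 3-year period are the IDC-based defaults.
    """
    tco_factor_per_year = 100 / hardware_share_pct / years  # about 4.76 for 7% over 3 years
    annual_tco = cluster_price * tco_factor_per_year
    busy_core_hours = cores * 365 * 24 * utilization_pct / 100
    return annual_tco / busy_core_hours


# Example: the $70K, 256-core cluster at 20%, 40%, and 100% utilization.
for pct in (20, 40, 100):
    print(f"{pct:3d}%: ${cost_per_core_hour_from_price(70_000, 256, pct):.3f} per core hour")
```

At 20% this gives about $0.743 rather than $0.75, because the text rounds the 100% figure to $0.149 before dividing.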

Now let's assume that an in-cloud solution such as Ansys Cloud costs $0.20 per core hour. Note: you can get much lower prices today on a true enterprise HPC cloud such as Microsoft Azure, but this round figure makes the math easier.

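The in-cloud solution costs more than the in-house solution only if the in-house cluster's average utilization is above about 75%. That is the point where the in-house cost per core hour, $0.149 divided by the utilization, falls to the cloud price of $0.20 ($0.149 / $0.20 ≈ 74.5%).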

Such utilization levels occur only in academic or large industrial supercomputing centers serving hundreds or even thousands of users. Also, the average price for one core hour on a true cloud HPC system with low-latency, high-bandwidth communication and large RAM has dropped considerably, to $0.10 or even below, often with support included. The engineering application software adds roughly $0.20 to $0.50 per core hour for software with a $20K to $50K annual license fee.
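The 75% threshold is a break-even point: the in-house cost per core hour is $0.149 divided by utilization, so it equals the cloud price when utilization equals $0.149 divided by that price. A short Python sketch (the function name is illustrative, and software license costs are left out) evaluates this for the assumed $0.20 price and the $0.10 price mentioned above:

```python
# Sketch: utilization at which the in-house cost per core hour equals the cloud price.
# In-house cost at utilization u is full_cost / u; setting it equal to the cloud
# price and solving gives u = full_cost / cloud_price. Licenses are not included.
IN_HOUSE_FULL_COST = 0.149  # USD per core hour at 100% utilization (from the TCO list)

def break_even_utilization(full_cost: float, cloud_price: float) -> float:
    """Fraction of capacity above which the in-house cluster becomes cheaper."""
    return full_cost / cloud_price

for cloud_price in (0.20, 0.10):
    u = break_even_utilization(IN_HOUSE_FULL_COST, cloud_price)
    verdict = "reachable" if u <= 1 else "never reached (cloud always cheaper)"
    print(f"cloud ${cloud_price:.2f}/core-hour -> break-even {u:.1%}, {verdict}")
```

At $0.20 the break-even is about 74.5%; at $0.10 it exceeds 100%, so on compute cost alone even a fully utilized in-house cluster would cost more per core hour than the cloud.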

 

The cost of HPC in the cloud

 

Now the comparison: what would it cost to move the workload of an in-house HPC cluster that is 20% utilized to the cloud? A 20% utilized in-house cluster delivers 256 cores * 24 hours * 365 days * 20% = 448,512 core hours per year, which, recall, cost $333K/year in-house. At $0.20 per core hour for cloud HPC, the same core hours cost $89,702.

Compared with the in-house cluster's $333K per year, the cloud option offers a roughly 3.7X better TCO.
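The same comparison as a short Python sketch, using the 20% utilization, the $333K annual in-house TCO, and the assumed $0.20 cloud price:

```python
# Sketch of the in-house vs. cloud comparison for a 20%-utilized workload.
CORES = 256
HOURS_PER_YEAR = 365 * 24
UTILIZATION = 0.20
IN_HOUSE_TCO_PER_YEAR = 1_000_000 / 3  # USD, IDC-based estimate from above
CLOUD_PRICE = 0.20                     # USD per core hour, assumed cloud price

core_hours = CORES * HOURS_PER_YEAR * UTILIZATION   # core hours actually used per year
cloud_cost = core_hours * CLOUD_PRICE               # pay only for those core hours
tco_ratio = IN_HOUSE_TCO_PER_YEAR / cloud_cost      # how many times cheaper the cloud is

print(f"Core hours used: {core_hours:,.0f}")
print(f"Cloud cost:      ${cloud_cost:,.0f} per year")
print(f"TCO ratio:       {tco_ratio:.1f}x")
```

The in-house cost stays fixed regardless of utilization, while the cloud cost scales with the core hours used; that asymmetry is what drives the ratio.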

 

Figure: TCO Comparison


Now let’s go back to the scenario of a company that needs to run a mix of 32-core and 256-core simulation jobs, at an average utilization of 20%. Instead of buying for the peak usage of 256 cores, let's assume they buy for the average use of 32 cores: a small in-house HPC cluster with two 16-core nodes that handles all the 32-core jobs. This cluster is just 12.5% of the size of the big 256-core cluster with its $1M three-year TCO, i.e. roughly $42K per year (bearing in mind that IDC’s TCO model becomes less accurate for very small clusters). At an excellent utilization of, say, 92% for this small cluster, its core hour comes to about $0.16.

The big 256-core (16-node) jobs run in the cloud for about one month during the year. We choose one month so that the small cluster and the cloud together carry the same workload as the big 256-core in-house cluster at 20% utilization. At $0.20 per core hour in the cloud, that month costs about $37K. The in-house cost of $42K (for the small jobs) plus the in-cloud cost of $37K (for the big jobs) gives $79K per year for the hybrid solution, compared with $90K for the full HPC in the Cloud service and $333K per year for the large 256-core in-house cluster.
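A minimal Python sketch of the hybrid arithmetic, assuming "about one month" means 30 days and that the small cluster's TCO scales linearly with core count (the 12.5% above). It also checks that the two parts together carry roughly the 20% workload; with these inputs the small cluster's core hour works out to about $0.16.

```python
# Sketch of the hybrid scenario: small in-house cluster plus one month of cloud.
# Assumptions: "about one month" is 30 days; small-cluster TCO scales with core count.
HOURS_PER_YEAR = 365 * 24
BIG_CORES, SMALL_CORES = 256, 32
BIG_TCO_PER_YEAR = 1_000_000 / 3   # USD, 256-core in-house cluster
CLOUD_PRICE = 0.20                 # USD per core hour, assumed
WORKLOAD_UTILIZATION = 0.20        # utilization of the big cluster in the baseline
SMALL_UTILIZATION = 0.92           # utilization of the small cluster
CLOUD_DAYS = 30                    # days of 256-core jobs in the cloud

# Small cluster: TCO scaled by size (12.5% of the big one).
small_tco = BIG_TCO_PER_YEAR * SMALL_CORES / BIG_CORES
small_core_hours = SMALL_CORES * HOURS_PER_YEAR * SMALL_UTILIZATION
small_cost_per_core_hour = small_tco / small_core_hours

# Big jobs: 256 cores around the clock for CLOUD_DAYS days.
cloud_core_hours = BIG_CORES * 24 * CLOUD_DAYS
cloud_cost = cloud_core_hours * CLOUD_PRICE

hybrid_total = small_tco + cloud_cost
full_cloud = BIG_CORES * HOURS_PER_YEAR * WORKLOAD_UTILIZATION * CLOUD_PRICE

# Combined workload as a fraction of the big cluster's capacity (close to 20%).
equivalent_utilization = (small_core_hours + cloud_core_hours) / (BIG_CORES * HOURS_PER_YEAR)

print(f"Small cluster: ${small_tco:,.0f}/yr, ${small_cost_per_core_hour:.2f}/core-hour")
print(f"Cloud month:   ${cloud_cost:,.0f}")
print(f"Hybrid total:  ${hybrid_total:,.0f}/yr (workload ~{equivalent_utilization:.1%})")
print(f"Advantage vs. in-house: hybrid {BIG_TCO_PER_YEAR / hybrid_total:.1f}x, "
      f"full cloud {BIG_TCO_PER_YEAR / full_cloud:.1f}x")
```

The two ratios printed at the end correspond to the factors of 4.2 and 3.7 quoted in the conclusion.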

We need to think holistically about Total Cost of Ownership. That is why the very first word of TCO is TOTAL. Without a rigorous analysis that includes all costs, any comparison is meaningless. So if total cost of ownership is important to you, the choice is clear: the hybrid and the cloud solutions trounce the in-house HPC cluster by a factor of 4.2 and 3.7 respectively. 

 

If you're ready for a TCO study of your specific situation, get in touch.

 
