Table of Contents

 

How to use Azure HPC for Engineering Simulations

You need a scalable, secure, cost-effective way to run your engineering simulations - so you picked Microsoft's Cloud Platform: Azure. Azure HPC has become the de-facto choice for High Performance Computing in the cloud. But setting up your simulations on Azure requires more than simply signing an Enterprise Agreement. You need to carefully plan your approach, deploy the right technology components and resources, and put the correct measures in place so you can reap the huge benefits to come. 

 

Introduction

The world of IT has changed forever. We now live in a world where 80% of enterprises are adopting a Cloud 1st Policy marking an inflection point in the evolution of enterprise cloud. If you remember, it wasn't long ago that Microsoft's cloud-based email service was seen as unsuitable for large corporations. Then Microsoft started signing up companies such as Lowes to move their email to the cloud. Suddenly, the CIOs who still wanted to run their own exchange server looked woefully out of touch. Fast forward to today and Microsoft's cloud email Office365 serves over 120 million business users. The security and latency concerns have been addressed and it is business as usual.

And that was just the beginning. More applications and verticals followed. ERP, Databases, Analytics - and now High Performance Computing.

Everyone wants the benefit of not doing IT the old way.

As with any change of this magnitude, there will be winners and losers. And just like any change, the winners will be the ones who adapt to the new world. Because the rewards for adapting are huge. 

 

Cloud Computing for Engineering Organizations 

Cloud computing is a modern option that complements your workstation and traditional on-premises datacenter. The cloud vendor, in this case Microsoft, takes over the responsibility for hardware purchase and maintenance and provides a wide variety of platform services that you can use. You lease the  hardware and software services you need on an as-needed basis. This has the effect of converting your capital expenses into operational expenses. It also allows you to lease access to specialized hardware and software resources that would be too expensive to purchase. 

As an engineering organization, you need to decide exactly what you think you're moving to the cloud. If the answer is "my ANSYS simulations" then understand that this involves a lot more than just hardware and Azure VMs.

You need to get clear on what exactly you're moving to the Cloud

For example, you need to map out your department's engineering applications and understand the dependencies — how they work with each other and depend on each other. You need to have an idea of what kind of data will move between the various engineering applications you use. You want to understand if these engineer applications depend upon on-premise IT services . 

If you move your ANSYSTM workflow to Azure but keep your post processing systems in your data center, then you may end up increasing your network traffic, slowing your process - all while paying more!

The Basics of Microsoft Azure

Azure Cloud for Engineers

  

What is Microsoft Azure?

Microsoft Azure is a cloud platform you can use to build, deploy, and manage solutions for a wide range of uses.   

Azure is an open and flexible cloud platform that enables you to quickly build, deploy, and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool, or framework. And you can integrate your public cloud applications with your existing IT environment.

 

Azure HPC

Figure 1: Azure Offers a Dizzying Array of Services

 

Azure offers a dizzying array of services, and continues to add services at a blinding pace. In fact, this alone should be a good reason for enterprise IT teams to stop viewing Azure as competition and instead view it as another powerful tool in their arsenal to help their customers.

To illustrate the breadth of Azure for HPC, lets look at two specific services.

Azure IaaS

The first service is Azure Infrastructure-as-a-Service. This is the classic DIY or Do-It-Yourself approach where instead of buying hardware from a vendor, you lease VMs from Azure. This is called Infrastructure as a Service or IaaS. Customers can then deploy their own middleware and end-user software to run their HPC workloads on Azure. IaaS is an efficient way to get infrastructure on demand. The scalability and elasticity it provides are very desirable cloud features. 

There are some limitations with this do-it-yourself approach. First: running HPC workloads on Azure is not a simple, one-time activity. There are many moving parts and multiple stakeholders including the engineer, IT, software vendor, etc. In addition, you need the software itself in addition to infrastructure. 

 

Azure Marketplace

At the other end of the spectrum is the Azure Marketplace. This is a Software-as-a-Service solution that gives customers a complete, self-service user experience. Even non-HPC users can easily launch entry level HPC services. These services automatically deploy many popular engineering applications to Azure instances, leading to an easy to use, supported, zero maintenance, user experience for engineers.

Azure  offers a service for every need. The key is knowing exactly what your user's pain points are and applying the right tool.

 


 

Benefits of Cloud HPC for Engineering Organizations  

 

Now that we know a bit more about Azure, let's review why engineering organizations should use it at all. This includes a number of obvious and some not-so-obvious benefits. 

  1. Burst to the Azure cloud when you need additional compute capacity: This bursting is invaluable to engineers to avoid project delays when your internal HPC capacities have reached their limits. 
  2. Improved Total Cost of Ownership (TCO): Azure can help you transition your IT costs from a Capital Expenditure (CAPEX) to an Operational Expenditure (OPEX) model. Azure is a pay-per-use model whereas an in-house cluster is at fixed cost including CAPEX and OPEX. In addition, the Total Cost of Ownership for a comparable service will be much lower in Azure. The key is to look at this holistically. For example a comparison of  the unit cost of compute power ($/CPU core hours) on Azure vs your in-house system is meaningless unless it factors in the TOTAL cost of the in-house system because that's what is being replaced. Thereforce it should include personnel, middleware software, power, and floor space, as well as system utilization less than 100%. To get this correct you should require that your Azure partner helps you with a full TCO study and comparison
  3. Strategic use of HPC: Helping the organization shift from treating HPC resources as a lights-on activity handled by overloaded IT teams, to a strategic capability for the organization. For example, the R&D organization may want to take over direct management of the systems they need. Now this is possible since it is a vendor-management role, rather than a technical role.
  4. Green Agenda: Driving the customer’s green agenda for energy efficiency in the datacenter.
  5. Enhanced Security: Handling physical, systems/network, operational security.

Benefits of Azure for Engineering


Figure 2: Advantages of parallel computing

 

Azure enables you to take advantage of parallel computing - something that your engineering software such as ANSYS is already optimized for. Parallel computing lets you do more and more in less and less time.

In addition, Microsoft partners with specialized HPC service providers to expand the value proposition further:

  1. HPC Expertise: Microsoft partners install, tune and maintain the HPC environment for customers who need the extra help.
  2. PoC and ROI: Microsoft partners prove the solution in a proof of concept (PoC) phase, which helps the customer to calculate the return on investment (ROI) and the partners to accelerate the sales cycle.
  3. Production-ready: Some Microsoft partners even provide compute environments tailored specifically for customers’ applications (SaaS), based on a deep understanding of engineering software. Examples: ANSYS Fluent, CFX, Mechanical, LS-DYNA; COMSOL; STAR-CCM+; OpenFOAM, Abaqus, Numeca)
  4. Expert engineering, Cloud, HPC Support: Microsoft partners provide qualified support for HPC Tech support to assist the in-house HPC support team. 

 

 

Why is Azure the best cloud for HPC?

 

Microsoft has invested billions of dollars in a modern hyperscale infrastructure. The Azure platform is highly differentiated in two major infrastructure components that are required for HPC:

 

  • High Bandwidth - Low Latency Networking (InfiniBand / RDMA): As the complexity of mathematical modeling increases, the need for additional compute power grows. A low latency network can efficiently leverage more computers to solve a problem. Without a low latency network, adding computers may not shorten computation time, since network wait time increases. Azure has an HPC-grade Low Latency Network, not available on many other public clouds. 
  • Co-processors (GPUs): Co-processors (e.g. those from NVIDIA and Intel) are used in parallel computing to accelerate math calculations as well as increase the performance of graphics rendering for 3D engineering models. Azure offers VMs (N-series) with up-to-date GPU’s and provides more compute power on its GPU instances. Azure provides GPU nodes (currently from NVIDIA) coupled with low latency networking available for HPC workloads.

Azure HPC 

Figure 3: Evolution of HPC on Azure

 

Azure offers engineering workloads a distinct advantage and it is a good time for R&D organizations to exploit that advantage.

Microsoft also has a strong advantage in the context of enterprise engagements. Since HPC workloads carry strategic importance for enterprises, this type of workload requires the capabilities of a focused enterprise vendor team. If a Microsoft Enterprise Agreement is in place which includes Microsoft products, services and cloud, this is a great step towards a successful HPC implementation on Azure.  

 

How to setup your first simulation on
Cloud HPC

Now that you are convinced that Azure can deliver incredible benefits to your engineering workflow, it is time to set up your first project. After all, this is a new technology that needs to integrate into your own business processes. On the other hand, if you don't really need it to be integrated into your processes, you can simply use a self-service option that's available on the Azure Marketplace. But if you want to incorporate Azure as a tool in your engineering tool chain, then you need a methodical approach to your implementation.

Fortunately, there are some common best practices you can employ to make sure you have covered all the bases. Figure 4 shows a sequence of steps that an organization needs to take including the important considerations, both technical and organizational, that must be tackled for a successful implementation.

Azure for Engineers

 Figure 4 - Deployment Strategy for Simulations on Azure

 

Preparation

Before you begin, it's a good idea to document your in-house infrastructure so you know what you're dealing with. Establishing this as-is state can be very helpful down the road. Here are some example questions you should consider:

  1. What type of hardware resources are the engineers using today to run their simulations? If these are workstations, what is the sizing?
  2. If the engineers have access to an in-house HPC Cluster what are the specifications? What is its age?
  3. How do you manage your workloads? Do you use any HPC middleware software?
  4. What are the different engineering software being used? 

Getting these collated and documented gives you a good baseline to start. 

Meet with your Microsoft AE and Partner Team

The next step should be to meet with the team that will be responsible for delivering the project. Ensure that you bring a representative from the Engineering organization (a user of HPC) to attend the meeting.

  1. Services. The meeting should be your opportunity to ask about all the options available to your organizations. Ask for a description of the relevant Azure Services. For example, try to understand what Azure IaaS is and what the Marketplace is. They are 2 means of achieving the same end and place different levels of responsiblity on you and the partner. If your team is interested in scaling and automation, you'd want to ask about Azure Batch. If you use a Windows based HPC cluster then the technology you want to know more about is called HPC Pack. The bottom line is: there are a lot of options and it helps to make the right choice early on.
  2. Partner ecosystem. These are partners with specific Azure competencies to help the HPC support team during the introduction, testing and even production phases.
  3. Demo. Request to see an interactive demo of the entire solution so you can see what’s possible and ask more questions. The demo can be a part of the introduction and can be delivered by one of the Microsoft Azure Partners. Request to see the Azure basics such as creating a VM, how to run your CAE application, and various forms of automation.
  4. Performance Benchmarks and Case Studies. Ask for case studies of your CAE application on Azure so you can compare the performance characteristics to your HPC environment. Azure performance benchmarks and Azure case studies are already available for many popular software applications through Azure HPC product team as well as Microsoft Azure Partners. 

It is a good idea to run a PoC to test your own codes and models on Azure as you get ready to move into more widespread implementation.

How to integrate Azure into your
R&D workflow

 

Once you are successful with your first project on Azure, you will gain the confidence to start integrating Azure into your regular R&D workflow. 

As you start integrating Azure into your R&D workflow, more stakeholders from IT, Procurement, and Legal need to get involved. It is a good idea to document what the shared responsibilities will be between you, Microsoft and the Microsoft partner who is helping you. 

 

Your responsiblities

Here are some areas that you might want to consider as you take this next step:

  1. What level of access will be required to your organization's key personnel. This includes end-users, IT and operational staff.
  2. Are there any relevant operational performance standards in effect in your organization that may be related to the delivery of this project
  3. Do you have an escalation procedure in the event that timely responses are not provided to the project team to enable the Service to be delivered within the established time frames
  4. Will you be able to provide Partner staff with the level of access they need to the relevant section of your Azure subscription
  5. Is there a VPN in place that you need to include in your network architecture
  6. Who will be responsible for all Azure usage charges during this project.

 

Assumptions

At this stage it is critical that you get rigorous with your project planning. With so many stakeholders it is easy to get lost in who is doing what. One good way to document this is to agree on a set of assumptions. Here is an example of some common ones:

  1. Who will be the project manager for this engagement and coordinate project management activities with the project manager on your end.
  2. Who will have primary responsibility for coordinating all activities for this Service, including scheduling resources, confirming project activities and that Deliverables are within scope.
  3. Who will be the single point of contact for this Service and will explain the skill sets that are required on your end. For example, if trained Azure Administrators are needed that needs to be called out.
  4. Will the Partner perform the Service remotely or travel to your site if required. On-site visits will entail additional travel and labor costs  - what will be the Partner's Time and Materials rates.
  5. Who will own all build specifications, design specifications, run books, operations guides, solution design documents, custom scripts, test plans, and/or any other documentation.
  6. What software tools will be used and what are the licensince terms. What will be the corresponding fee for the number of copies and the number of users.

 

Security

Data security is critical to organizations, regardless of whether that data is in the cloud or not. You want to know, "Is my engineering data secure in the cloud?". You want to work with a partner that maintains the highest standards of security to protect your privacy and your data. You need to validate that the partner enforces strict internal product controls, and regularly audits its policies and procedures. 

The primary areas of security include: Physical Security, System Security, Operational Security, Application and Data Security. 


The security protocols you want to pay attention to are:

  • Security is designed “from the ground up“ in the application, network, hardware, and procedures.
  • Clear guidelines for Logical Security as well as Physical Security.
  • Use of enterprise data centers with world-class protection and monitoring.
  • Strong encryption options to secure data in motion and data at rest.
  • Authentication procedures that leverage best practices such as multi-factor authentication.
  • Mechanisms to ensure that only authorized individuals have access to data per security policies
  • Code development, testing, and operations that adhere to security best practices.
  • Regular review of the policies and procedures for security and operations. 

Engineering data is of strategic importance. Protection of this data from accidental deletion and unauthorized change or disclosure is a key part of any cloud implementation.

Microsoft's Leadership in Security

Microsoft has been leading the industry in establishing clear security and privacy requirements for cloud computing and then consistently meeting these requirements.

Microsoft datacenters are protected by layers of defense-in-depth security that include a variety of physical safeguards. Microsoft's Cyber Defense Operations Center (CDOC) are manned by security response experts to help protect, detect, and respond 24/7 to security threats in real time. 

Azure Storage offers a set of security features that help secure your storage account by using Role - Based Access Control (RBAC) and Microsoft Azure Active Directory (Azure AD). Azure also offers Storage Service Encryption, which will encrypt data written to the storage account.

Finally, Azure meets a broad set of international and industry-specific compliance standards, such as ISO 27001, HIPAA, FedRAMP, SOC 1 and SOC 2, as well as country-specific standards like Australia IRAP, UK G-Cloud, and Singapore MTCS. Rigorous third-party audits, such as by the British Standards Institute, verify Azure’s adherence to the strict security controls these standards mandate.

Azure Government Certifications

 Figure 5 - Azure has the Widest Certifications and Coverage

 

For more details please read the UberCloud Azure Security White Paper

How to scale and automate your simulations on Azure

 

 

Computer Aided Engineering applications can scale to thousands of compute cores in the cloud. They can also be used to extend on-premises clusters. This solution is implemented with a service from Microsoft called Azure Batch. This service includes all the HPC capabilities you need including job scheduling, auto-scaling of compute resources, and more.

 azure batch

 Figure 6 - Azure Batch gives you all the scalability and automation you need

 

Azure Batch

Azure Batch is a platform service for running large-scale parallel and high-performance computing (HPC) applications efficiently in the cloud. Azure Batch schedules compute-intensive work to run on a managed collection of virtual machines and can automatically scale compute resources to meet the needs of your jobs by creating virtual machines when needed and destroying the VMs when less jobs are queued.

With Azure Batch, you can easily define Azure compute resources to execute your applications in parallel, and at scale. There's no need to manually create, configure, and manage an HPC cluster, individual virtual machines, virtual networks, or complex job or task scheduling infrastructure. Azure Batch automates or simplifies these tasks for you.

 
Batch is a managed Azure service that is used for batch processing or batch computing--running a large volume of similar tasks for a desired result. Batch computing is most commonly used by organizations that regularly process, transform, and analyze large volumes of data.
Azure Batch also supports Docker containers. This means that you can use it to orchestrate containerized CAE applications with it. 
 

The following high-level workflow is typical of nearly all applications and services that use the Batch service for processing parallel workloads:

  1. Upload the data files that you want to process to an Azure Storage account. Batch includes built-in support for accessing Azure Blob storage, and your tasks can download these files to compute nodes when the tasks are run.
  2. Upload the application files that your tasks will run. These files can be binaries or scripts and their dependencies, and are executed by the tasks in your jobs. Your tasks can download these files from your Storage account, or you can use the application packages feature of Batch for application management and deployment.
  3. Create a pool of compute nodes. When you create a pool, you specify the number of compute nodes for the pool, their size, and the operating system. When each task in your job runs, it's assigned to execute on one of the nodes in your pool.
  4. Create a job. A job manages a collection of tasks. You associate each job to a specific pool where that job's tasks will run.
  5. Add tasks to the job. Each task runs the application or script that you uploaded to process the data files it downloads from your Storage account. As each task completes, it can upload its output to Azure Storage.
  6. Monitor job progress and retrieve the task output from Azure Storage. 

When you choose an RDMA-capable size such as H16r for the compute nodes in your Batch pool, your MPI application can take advantage of Azure's high-performance, low-latency remote direct memory access (RDMA) network.

Azure Batch is a free service; you aren't charged for the Batch account itself. You are charged for the underlying Azure compute resources that your Batch solutions consume, and for the resources consumed by other services when your workloads run. For example, you are charged for the compute nodes (VMs) in your pools and for the data you store in Azure Storage as input or output for your tasks. Similarly, if you use the application packages feature of Batch, you are charged for the Azure Storage resources used for storing your application packages. See Batch pricing for more information.

Low-priority VMs can significantly reduce the cost of Batch workloads. For information about pricing for low-priority VMs, see Batch Pricing. 

Note: Amazon has a container management system as well (ACS), which focusses on micro services. Azure Batch on the other hand was built for HPC and other batch processing needs.

 

Resource Managers

Resource managers tie HPC workloads together. This is how we abstract the complexity of HPC workloads. 

If you already have an on-premise cluster managed through your favorite resource manager, then you can integrate that to Azure. Examples of common resource managers include Univa GridEngine, IBM LSF, Altair PBS Pro. These all can be used to tie Cloud resources to on-premise resources to manage the Hybrid environment.

To take one example, Univa® UniCloud® is a perfect fit if your workload volume is increasing. You can define rules in UniCloud and it will dynamically adjust Cloud usage accordingly. The way it does this is by monitoring the workloads queuing up in your on-premise resource manager and sending eligible workloads to Azure.

Every engineer uses applications with a variety of compute requirements. Some applications require distributed memory via MPI, where others are single threaded. Workloads also differ in their hardware requirements. For example some workloads run faster with the aid of a co-processor. Managing hybrid infrastructures and assigning the most appropriate resources to each workload is what resource managers are very good at. A resource manager is also good at tying workloads together to form a workflow.

All the popular resource managers are adding support for Azure and the Cloud and making it possible to mix on-premise compute resources and Cloud resources to achieve a Hybrid infrastructure. 
 

Final Thoughts


“Getting productive on Azure doesn't have to be hard. You just need the right combination of skills, tools and processes.”

Getting started with a complex FEA or CFD application on Azure can be daunting. Fortunately, this has been done before and there are ways to make this process fast, low-risk and inexpensive. Here’s what you need to remember.

Working with a partner can be helpful as you start down this journey. But you should insist that the partner show you how they can help you with the following:

  1. Speed Up Your Time to Value: In the fast moving world of cloud computing, its essential to get early wins. Otherwise you can get paralyzed by the complexity and fall into the trap of endless planning. Insist that your vendors demonstrate how they will get your engineers the fastest time to value. For example, if your engineers need to change the way they work to use cloud, that should be a non-starter. After all, your engineers have spent years learning the nuances of their favorite simulation software. You don’t want them to learn a new interface or go through a complex on-boarding process just to get the benefits of cloud. The Cloud simulation solution you pick should look and feel exactly like your engineer's desktop workstation. 
  2. Lower Your Total Cost of Ownership: You want to make sure you do a simple TCO study so that you know roughly what to expect when you implement Azure. Your partner should be able to help with this.
  3. Lower Your Risk: Your partnership should have a track record of success. This lowers your risk and increases the chances your project will meet or exceed its goals.

 

Need help setting up your engineers on Azure?