Using Availability Domains and Fault Domains to Improve Application Resiliency

The unfortunate truth about technology is that hardware requires maintenance and hardware failures do occur. Cloud resources are affected by the same hardware-related maintenance as traditional on-premises resources. In August 2018, Oracle Cloud Infrastructure introduced fault domains for virtual machine and bare metal Compute instances. Fault domains help applications withstand hardware failures and planned hardware maintenance within an availability domain.

An Oracle Cloud Infrastructure availability domain is:

One or more data centers located within a region. Availability domains are isolated from each other, fault tolerant, and very unlikely to fail simultaneously. Because availability domains do not share infrastructure such as power or cooling, or the internal availability domain network, a failure at one availability domain is unlikely to impact the availability of the others. All the availability domains in a region are connected to each other by a low latency, high bandwidth network.

An Oracle Cloud Infrastructure fault domain is:

A grouping of hardware and infrastructure within an availability domain. Each availability domain contains three fault domains. Fault domains let you distribute your instances so that they are not on the same physical hardware within a single availability domain. A hardware failure or Compute hardware maintenance that affects one fault domain does not affect instances in other fault domains.
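The definition above can be illustrated with a minimal sketch that spreads instances round-robin across the three fault domains of one availability domain. This is plain Python, not the Oracle Cloud Infrastructure SDK: the helper `spread_across_fault_domains` and the instance names are hypothetical, and the fault domain names assume the FAULT-DOMAIN-n convention used later in this post.

```python
from itertools import cycle

# Assumed names for the three fault domains in one availability domain.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def spread_across_fault_domains(instance_names):
    """Assign each instance to a fault domain in round-robin order.

    Hypothetical helper for illustration only; it does not call any cloud API.
    """
    return dict(zip(instance_names, cycle(FAULT_DOMAINS)))


# Hypothetical instance names.
placement = spread_across_fault_domains(["web-1", "web-2", "web-3", "web-4"])
for name, fault_domain in placement.items():
    print(f"{name} -> {fault_domain}")
```

With four instances, the fourth wraps around to the first fault domain, so no single fault domain holds more than two of them.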

Sanjay Pillai posted an excellent overview of the theory behind Oracle Cloud Infrastructure fault domains and how you place Compute instances in them. When you are deploying an application to cloud infrastructure, the unique architecture and affinity or anti-affinity requirements of the application determine how instances should be distributed across fault domains. The Best Practices for Your Compute Instances topic in the Oracle Cloud Infrastructure documentation explains fault domains with a couple of application scenarios.

Deploying cloud services to multiple availability domains is the preferred method to ensure high availability. When your application architecture requires components to be in the same availability domain, choosing the proper fault domain for application components can help protect against resource failures. The sections below analyze the published architectures for JD Edwards EnterpriseOne (JDE) and Oracle E-Business Suite (EBS) and how availability domains and fault domains support the application. There are links at the end of this blog post to the Oracle solution documentation for EBS, JDE, and Siebel CRM.

Multiple Availability Domains

When you use multiple availability domains for an application stack, placing the instances within each availability domain in a single fault domain gives them the proper affinity to each other. In the following JD Edwards EnterpriseOne example, geographically separated availability domains provide primary application redundancy. Fault domains are assigned differently than in the Oracle E-Business Suite example later in this post.

Incoming connections to JD Edwards EnterpriseOne are routed to the load balancers in both availability domains via DNS that is external to Oracle Cloud Infrastructure. The load balancers route traffic to the application instances based on the configured distribution policy. All hosts in the presentation tiers and middle tiers belong to FAULT-DOMAIN-1. Because fault domains are specific to an availability domain, FAULT-DOMAIN-1 in Availability Domain 1 is a different set of hardware than FAULT-DOMAIN-1 in Availability Domain 2, despite having the same name. Because all of the hosts in each availability domain are in the same fault domain, a hardware failure or planned maintenance within the region only minimally affects each availability domain. Hardware events that affect hosts in Availability Domain 2 do not affect hosts in Availability Domain 1. Placing all the hosts in the same fault domain also ensures that required infrastructure maintenance activities minimally affect the application stack.
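The point about identically named fault domains can be made concrete with a small model. This is a sketch under stated assumptions rather than the published architecture: the host names and the "AD-1"/"AD-2" labels are hypothetical, and the helper `affected_hosts` is illustrative plain Python, not an Oracle Cloud Infrastructure API. The key idea is that a fault domain is identified by the pair (availability domain, fault domain name), not by the name alone.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    host: str
    availability_domain: str
    fault_domain: str


# Hypothetical hosts: each one sits in FAULT-DOMAIN-1 of its own availability domain.
PLACEMENTS = [
    Placement("jde-web-ad1", "AD-1", "FAULT-DOMAIN-1"),
    Placement("jde-logic-ad1", "AD-1", "FAULT-DOMAIN-1"),
    Placement("jde-web-ad2", "AD-2", "FAULT-DOMAIN-1"),
    Placement("jde-logic-ad2", "AD-2", "FAULT-DOMAIN-1"),
]


def affected_hosts(placements, availability_domain, fault_domain):
    """Return hosts hit by an event in one fault domain of one availability domain."""
    target = (availability_domain, fault_domain)
    return [
        p.host
        for p in placements
        if (p.availability_domain, p.fault_domain) == target
    ]


# A hardware event in FAULT-DOMAIN-1 of AD-2 leaves the AD-1 hosts untouched,
# even though their fault domain carries the same name.
hit = affected_hosts(PLACEMENTS, "AD-2", "FAULT-DOMAIN-1")
survivors = [p.host for p in PLACEMENTS if p.host not in hit]
print("affected:", hit)
print("still serving:", survivors)
```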

Diagram from Learn about Deploying JD Edwards EnterpriseOne on Oracle Cloud Infrastructure

One Availability Domain

When you deploy an application stack in a single availability domain, distributing instances across multiple fault domains gives them the proper anti-affinity to each other. In the following Oracle E-Business Suite example, fault domains ensure that end users have access during hardware failures or planned infrastructure maintenance.

Incoming connections to Oracle E-Business Suite are routed to the servers in the application pool via the load balancer. In the application tier, Host 1 is in FAULT-DOMAIN-1 and Host 2 is in FAULT-DOMAIN-2. If a hardware failure affects Host 1, Oracle E-Business Suite is accessible through Host 2. If Host 1 and Host 2 existed in the same fault domain, Oracle E-Business Suite would likely be inaccessible to end users. In the case of Oracle Cloud Infrastructure hardware maintenance, fault domain maintenance windows do not overlap, so Host 1 and Host 2 are not taken down for maintenance at the same time.
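The Host 1 and Host 2 reasoning above can be checked with a few lines of Python. This is an illustrative sketch only: `is_accessible` is a hypothetical helper, and it assumes the application stays reachable as long as at least one application-tier host sits outside the failed fault domain.

```python
def is_accessible(host_fault_domains, failed_fault_domain):
    """Return True if at least one host sits outside the failed fault domain.

    Hypothetical helper; assumes any single surviving host can serve users.
    """
    return any(
        fault_domain != failed_fault_domain
        for fault_domain in host_fault_domains.values()
    )


# Anti-affinity placement described in this section.
spread = {"Host 1": "FAULT-DOMAIN-1", "Host 2": "FAULT-DOMAIN-2"}
# Counterexample: both hosts in the same fault domain.
colocated = {"Host 1": "FAULT-DOMAIN-1", "Host 2": "FAULT-DOMAIN-1"}

print(is_accessible(spread, "FAULT-DOMAIN-1"))     # True: Host 2 keeps serving
print(is_accessible(colocated, "FAULT-DOMAIN-1"))  # False: both hosts are down
```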

Diagram from Learn About Deploying Oracle E-Business Suite on Oracle Cloud Infrastructure

If you want to learn more about using fault domains in your application, the following solution documentation links have additional scenarios and details. Carefully selecting the proper fault domain for Compute instances in Oracle Cloud Infrastructure ensures hardware failures and scheduled maintenance activities do not unexpectedly affect your end users. If you want another perspective, search the web for an article written by my colleague, Luke Feldman, about why fault domains are so crucial in Oracle Cloud Infrastructure.

Finally, if you haven't already signed up for a free trial of Oracle Cloud Infrastructure, you can do it now. Let us provide the cloud so that you can build the future.

Solutions Design Resources
