Apache Spark on Kubernetes: Maximizing Big Data Performance on Container Engine for Kubernetes


In this post, I demonstrate how you can quickly create a Kubernetes cluster on Oracle Cloud Infrastructure by using the Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) service. Then I showcase how you can achieve a significant performance boost by running applications on Container Engine for Kubernetes bare metal instances, and I perform a comparison by running the same workload on Container Engine for Kubernetes virtual machine (VM) instances.

Oracle Cloud Infrastructure is currently the only public cloud that offers bare metal Kubernetes clusters. Running Kubernetes and containers on bare metal machines yields a 25 to 30 percent performance improvement for both CPU and I/O operations compared to running them on VMs, which makes bare metal well suited for Big Data applications.

Deployment Architecture

At a high level, the deployment looks as follows:

  1. Deploy a highly available Kubernetes cluster across three availability domains.
  2. Deploy two node pools in this cluster, across three availability domains. One node pool consists of VMStandard1.4 shape nodes, and the other has BMStandard2.52 shape nodes.
  3. Deploy Apache Spark pods on each node pool.

Deployment Steps

Perform the following steps to set up and test the deployment.

Step 1: Deploy a Three-Node VMStandard1.4 Shape Kubernetes Cluster
  1. Create a Kubernetes cluster on Oracle Cloud Infrastructure using Container Engine for Kubernetes, using the steps outlined in this tutorial. This step involves creating the necessary virtual cloud network (VCN), subnets, security list rules, and Identity and Access Management (IAM) policies.
  2. After creating the cluster, deploy a three-node VMStandard1.4 shape node pool on the cluster, with one node per subnet in each availability domain. Download the kubeconfig file from the Oracle Cloud Infrastructure Console to your local environment and ensure that kubectl is installed; you need both to create the Spark pods and to work with your Kubernetes environment (see the commands after this list).
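
For example, a minimal sketch of fetching the kubeconfig and verifying the cluster, assuming the OCI CLI is configured and <cluster-ocid> is replaced with your cluster's OCID (flags can vary by CLI version):

# Download the kubeconfig for the new cluster
oci ce cluster create-kubeconfig --cluster-id <cluster-ocid> --file ~/.kube/config

# Verify that kubectl can reach the cluster and that all three nodes are Ready
kubectl get nodes -o wide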

The resulting cluster has three VMStandard1.4 nodes, one in each availability domain.

Step 2: Deploy Apache Spark and Apache Zeppelin Pods on the Node Pool

On the node pool that you just created, deploy one replica of the Spark master, one replica of the Spark UI-proxy controller, one replica of Apache Zeppelin, and three replicas of Spark worker pods. You will use Apache Zeppelin to run the Spark computation on the Spark pods. To create the Spark pods, follow the steps outlined in this GitHub repo.

The spark-master-controller.yaml and spark-worker-controller.yaml files are the Kubernetes manifests for deploying the Spark master and worker controllers, and the spark-master-service.yaml file exposes the master as a Kubernetes service. Similarly, the zeppelin-controller.yaml and zeppelin-service.yaml manifests deploy the Zeppelin pod and expose Zeppelin as a service.
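
Applying the manifests is a handful of kubectl commands; a sketch, assuming the filenames above plus a UI-proxy manifest named as in the upstream Kubernetes Spark example:

kubectl create -f spark-master-controller.yaml
kubectl create -f spark-master-service.yaml
kubectl create -f spark-ui-proxy-controller.yaml   # filename assumed
kubectl create -f spark-worker-controller.yaml
kubectl create -f zeppelin-controller.yaml
kubectl create -f zeppelin-service.yaml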

Note: The CPU and memory available for the entire cluster is 24 vCPUs (8 vCPUs per node) and 84 GB of memory (28 GB per node). If you look at the manifest files, you will observe that we are assigning 1 vCPU and 1000 MiB of memory per Spark worker pod, and 200m CPU and 100 MiB of memory per pod for each of the Spark master, UI proxy, and Zeppelin pods. We use the same allocation of memory and CPU per pod throughout this post.
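
For reference, the relevant portion of a worker controller manifest looks roughly like the following sketch; the image name and labels are placeholders, so use the values from the repo's actual manifest:

apiVersion: v1
kind: ReplicationController
metadata:
  name: spark-worker-controller
spec:
  replicas: 3
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: <spark-worker-image>   # placeholder: use the image from the repo
          resources:
            requests:
              cpu: 1          # 1 vCPU per worker pod
              memory: 1000Mi  # 1000 MiB per worker pod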

With the Spark and Zeppelin pods deployed, the process works as follows:

  1. Connect to Apache Zeppelin’s UI and trigger a Spark computation, which in turn interacts with the hosted Container Engine for Kubernetes master.
  2. The Container Engine for Kubernetes master sends the request to the node that contains the Spark master.
  3. The Spark master delegates the scheduling back to the Kubernetes master to run the Spark jobs on the Spark worker pods.
  4. The Kubernetes master schedules the Spark jobs on the Spark worker pods.
  5. The Spark worker and master pods interact with one another to perform the Spark computation (you can observe the pod placement with the commands after this list).
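
A quick way to see where the pods landed is to list them with their assigned nodes (the component label is assumed from the upstream manifests):

kubectl get pods -o wide
kubectl get pods -l component=spark-worker -o wide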

In the next step, you initiate the Spark computation by using Zeppelin.

Step 3: Initiate the Spark Computation to Measure the Performance of the Cluster

At the end of step 2, you port-forwarded the Zeppelin pod’s web UI port as follows:

kubectl port-forward zeppelin-controller-ja9s 8080:8080

After the Zeppelin UI loads, create a new notebook. In it, paste the Python code that runs the Spark computation, which counts the prime numbers among the first 100 million natural numbers. Add the %pyspark interpreter hint so that Zeppelin knows how to run the snippet.

%pyspark
from math import sqrt
from itertools import count, islice

def isprime(n):
    return n > 1 and all(n % i for i in islice(count(2), int(sqrt(n) - 1)))

nums = sc.parallelize(xrange(100000000))
print nums.filter(isprime).count()

After pasting the code, press Shift+Enter or click the play icon to the right of the snippet. The Spark job runs and the result is displayed. Observe that it takes about 387 seconds to complete this task (completion numbers may vary).

Step 4: Scale the Replicas of Spark Worker Pods and Measure the Performance Again

Use the following command to scale the replicas of Spark worker pods to six, using the same allocation of vCPU and memory per pod as described in step 2.

kubectl scale --replicas=6 rc/spark-worker-controller
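
You can confirm that the replication controller reached the desired count:

kubectl get rc spark-worker-controller
kubectl get pods -l component=spark-worker   # label assumed from the manifests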

You can check the CPU and memory allocation of the cluster by using kubectl describe nodes. CPU Requests and Memory Requests indicate the allocated values.

kubectl describe nodes | grep -A 2 -e "^\s*CPU Requests"

Now run the Spark computation again from the Zeppelin UI, this time on six Spark worker replicas, and measure the performance. Notice that the computation completes slightly faster because the Spark jobs are distributed across more worker pods.

Lastly, scale the worker pods to 20 replicas (using the same scale command, shown below) and test the performance of the cluster again. Notice that performance deteriorates relative to the six-replica run because of the excessive scaling of worker pods: this time the computation takes 371 seconds. The “Inferences” section explains why this pattern occurs.
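
As before, scaling is a single command:

kubectl scale --replicas=20 rc/spark-worker-controller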

Step 5: Deploy a Three-Node BMStandard2.52 Node Pool in the Same Cluster

Deploy a three-node BMStandard2.52 shape node pool on the same cluster by clicking the Add Node Pool button in the Node Pools section on the Oracle Cloud Infrastructure Console.
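
When the new node pool is active, its bare metal nodes join the same cluster. You can verify this and see the labels that identify each node pool (used in the next step):

kubectl get nodes --show-labels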

Step 6: Repeat Steps 2, 3, and 4

Repeat steps 2, 3, and 4 of deploying Apache Spark and Zeppelin, and running the performance tests, on the new node pool. In Container Engine for Kubernetes, each node pool is assigned a unique Kubernetes label. Use these labels with node affinity to schedule the Spark jobs on the BMStandard2.52 shape node pool rather than on the VMStandard1.4 shape node pool, as in the sketch below.
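
A minimal node-affinity stanza in the worker pod template might look like the following sketch; the label key and value are placeholders, so substitute the label that your bare metal node pool actually carries (visible in the kubectl get nodes --show-labels output from step 5):

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: name        # placeholder label key
                    operator: In
                    values:
                      - bm-pool      # placeholder label value for the BM node pool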

Note: Use the same vCPU and memory allocation for pods in the bare metal node pool, which keeps the performance comparison consistent across both node pools.

Spark computation on three replica Spark worker pods takes just 74 seconds to complete.

Spark computation on six replica Spark worker pods takes 39 seconds to complete.

Finally, the Spark computation on a 20-replica Spark worker pods cluster takes 30 seconds to complete.
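
Putting the measurements side by side (the six-replica time on the VM pool is described above only as slightly faster than the three-replica run):

Worker replicas    VMStandard1.4 pool    BMStandard2.52 pool
3                  387 s                 74 s
6                  slightly faster       39 s
20                 371 s                 30 s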

The Spark jobs on the BMStandard2.52 bare metal node pool finish roughly 5 to 12 times faster than the same workloads on the VMStandard1.4 node pool, even though both node pools use the same vCPU and memory allocation for the Spark and Zeppelin pods. The following section discusses the reasons for the performance differences.

Inferences

With no hypervisor overhead, containers on bare metal perform up to 30 percent better than those on VMs, which makes them well suited for running performance-intensive workloads like Big Data and HPC jobs.

Bare metal instances also offer higher packing density for containers, which improves resource utilization and minimizes network traversal for intercontainer communication. For a massively parallel, distributed computation like Spark or Hadoop, minimizing network traversal between containers yields a significant performance gain.

Lastly, bare metal instances come with two extremely fast NICs offering 25 Gbps of raw bandwidth each. The bandwidth on VM shapes scales with the size of the VM; a VMStandard1.4 shape offers 1.2 Gbps of network bandwidth. As a result, bare metal instances are better suited for applications that require high network throughput.

Conclusion

This post demonstrated how to quickly deploy a highly available, multiple-availability-domain Kubernetes cluster on Oracle Cloud Infrastructure, how to run VM and bare metal instances in the same cluster as separate node pools, and the significant performance benefits that you can achieve by running Big Data and HPC applications on bare metal Kubernetes clusters.

Container Engine for Kubernetes is the only managed Kubernetes offering in the public cloud provider space that lets you create a node pool of bare metal instance shapes in a Kubernetes cluster. As shown in this post, you can create independent node pools of VM shapes and bare metal shapes in the same Kubernetes cluster, and use Kubernetes labels to intelligently route high-performance computation workloads to bare metal node pools and the rest to VM node pools. Having this flexibility is extremely useful.


-Abhiram Annangi

