Deploying Cloud Foundation 3.9.1 on VxRail; Parts 1 & 2

Before I move on and dedicate the majority of my time to Cloud Foundation 4, I created a relatively short series of screencasts detailing the process to deploy Cloud Foundation 3.9.1 on VxRail. 

I say detailing, I really mean quite a high-level overview. It’s by no means a replacement for actually reading and understanding the documentation. I’ve split the whole show into six parts;

  1. Deploying Cloud Builder
  2. Performing the Cloud Foundation bringup
  3. Creating a workload domain
  4. Deploying vRealize Lifecycle Manager
  5. Deploying vRealize Automation
  6. Deploying vRealize Operations Manager

It’s my hope that each of the fairly brief videos will provide an overview of the deployment process and maybe even help someone that is in a “what the hell is this screen and what do I do next?” scenario.

My environment for this series is 7 E460F VxRail nodes. The nodes have had a RASR upgrade to 4.7.410 and four of them have already been built into a cluster for my Cloud Foundation management domain. It goes without saying that I’m following the VMware bill of materials for version 3.9.1.

Before we do anything, we need to get Cloud Builder running. That’s what I’ve done in part 1 below. For all the videos in this series, it’s better to view fullscreen. Unless you like squinting at microscopic text of course.

Prerequisites for this part are easy, you need the Cloud Builder OVA. Unfortunately, the prerequisites aren’t going to remain this easy to satisfy throughout the rest of the series.

In the video above, I’ve also included two of the prerequisites for the next part;

  1. Externalising the vCenter server. This was made much easier in later VxRail builds thankfully.
  2. Converting the management portgroup to ephemeral binding.

Because simply deploying an OVA isn’t exactly face meltingly exciting, I’m including the second part of the series in this post also.

That second part being the actual deployment/bringup of Cloud Foundation and establishing the management cluster.

The prerequisites for this part are slightly more demanding. In what could be a frustrating move, I’m going to insist that you go out and search for these yourself. Or just deploy Cloud Builder and check out the extensive list you get when you attempt a bringup. The three that concern me most are;

  1. Make sure you have end to end jumbo frames configured (MTU of 9000). VMware don’t specifically recommend this on all VLANs, but I usually go jumbo everywhere to save me time and potential troubleshooting headaches later.
  2. Enable and configure BGP on your top of rack switches. In 3.9.1, we’re going with BGP right from the start with something VMware is calling “Application Virtual Networks” (AVNs). Or to everyone else, NSX-V logical switches. Two of these will be configured from day 1, so we’ll need to set up BGP peers on the ToRs and make sure the network is set up to route to the subnets for the AVNs (in the case where you’re not running dynamic routing everywhere). 
  3. DHCP for VXLAN VTEPs. I don’t have DHCP readily available in the lab, so this has been a pain for me since the first VCF on VxRail deployment. I end up deploying pfsense onto the management cluster, configuring it and then shutting it down and removing it from inventory. Once the Cloud Foundation bringup validation is complete and bringup is running, I hop back into vCenter and add the VM to inventory and power it up. That’s shown in the video below. Reason being that if any unknown VMs are running while bringup validation is running, it seems to make it fail. 

Everything else is taken care of. I’ve configured all the DNS records and ensured the cluster nodes are healthy in vCenter.

A word of caution before continuing. Be sure, very sure, that your deployment parameter excel spreadsheet is correctly completed. Make sure all the IP addresses and FQDNs you’ve entered are correct and everything is set up in DNS and forward & reverse lookups are perfect. The bringup validation won’t necessarily catch all errors and if bringup kicks off or gets half way through and then fails due to an incorrect IP address, you’re going to be resetting your VxRail and starting from scratch. Ask me how I know…

Having a look at the Planning & Preparation guide is probably a wise choice before we go kicking off any bringups.

On with part 2 and getting the management domain up and running.

In the above video, you’ll see the bringup failed while validating BGP. When Cloud Builder deploys the NSX edge service gateways for the AVN subnets, it doesn’t specify default gateways. So no traffic can get out of the two AVN NSX segments. Digging through the planning & prep guide, I can’t see any specific requirement for what I’ve done. That being to enable default-originate within the BGP neighbor config for each of the four peerings to the ESGs. That way, a default route is advertised to the ESGs and everybody is happy. Maybe this is environment specific, maybe it’s an omission from the guide. Either way, works for me in my lab!

That’s it for now. Next up, I’ll be adding a workload domain.

Leave a Reply

Your email address will not be published. Required fields are marked *