A startup’s early learnings in Managed Kubernetes with Amazon EKS

By Deepak Bobbarjung & Vijay Malhotra

At Passage AI, we moved our production workloads to Kubernetes early this year. In this post, we'll share the next step of our journey: moving to a managed Kubernetes offering.

We had set up our initial Kubernetes clusters (including our production cluster) on Azure, using acs-engine to set up and manage them. While this worked well for our initial needs, we quickly realized there were several benefits to be gained by moving to one of the nascent but maturing managed Kubernetes offerings such as AWS EKS, Google GKE, and Azure AKS. Managed Kubernetes services offer the following key benefits that we found attractive:

  1. Availability of control plane nodes
  2. Autoscaling of the worker nodes
  3. Either on-demand or automatic upgrades of the cluster to the latest vetted version of Kubernetes with zero downtime.

While Kubernetes is super dope, we don’t find a lot of joy or glory in managing the master nodes of a Kubernetes cluster, and we recognized early on that having a managed service provide the above functions would give our engineering team fewer things to worry about.

We decided to explore all three managed Kubernetes options. We already had a footprint in AWS, and AWS EKS had recently gone GA, so we decided to explore this option first.

Creating the initial cluster in EKS is a breeze and can be done via the AWS console. There is an EKS dashboard with a ‘Create Cluster’ button; a few simple inputs were all that was needed, and within about 5 minutes we had an EKS cluster up and running.
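Creating the cluster can also be scripted with the AWS CLI. The following is a minimal sketch; the cluster name, role ARN, subnet IDs, and security group ID are placeholders, not values from our setup:

aws eks create-cluster \
  --name my-eks-cluster \
  --role-arn arn:aws:iam::111122223333:role/eks-service-role \
  --resources-vpc-config subnetIds=subnet-aaaa1111,subnet-bbbb2222,securityGroupIds=sg-cccc3333

# Poll until the cluster status changes from CREATING to ACTIVE
aws eks describe-cluster --name my-eks-cluster --query cluster.status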

In the rest of this blog, we will talk about two non-trivial issues that we had to get educated about as part of moving to EKS.

1. RBAC in EKS is tied to AWS IAM.

We set up RBAC in our Azure Kubernetes cluster to control access to resources. To replicate the same setup inside the EKS cluster, we had to understand EKS’s integration with AWS IAM. Here is what we learned.

AWS IAM users/roles are coupled with Kubernetes users/groups. The IAM entity (user or role) that creates the cluster is, by default, granted system:masters permissions in the cluster’s RBAC configuration.
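A quick sanity check we found useful (a sketch, assuming the AWS CLI and the Kubernetes authenticator are already configured against the cluster) is to confirm which IAM identity kubectl is authenticating as and whether it currently has full cluster access:

# Which IAM user/role do my local credentials resolve to?
aws sts get-caller-identity

# Does that identity have full access inside the cluster (i.e. system:masters)?
kubectl auth can-i '*' '*' --all-namespaces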

To grant additional IAM users access to the cluster, we had to edit a special ConfigMap called aws-auth, which you created as part of setting up the EKS cluster when adding worker nodes.

To edit (and apply) the aws-auth ConfigMap in the cluster:

kubectl edit -n kube-system configmap/aws-auth

Along with the mapRoles section, which adds the worker nodes using the rolearn of the node group that you created during cluster setup, you can add a mapUsers section with the mapping of IAM users to Kubernetes users. It supports the following parameters:
userarn: The ARN of the IAM user to add.
username: The user name within Kubernetes to map to the IAM user. By default, the user name is the ARN of the IAM user.
groups: A list of groups within Kubernetes to which the user is mapped.

The following is a snippet of how the aws-auth ConfigMap would look with two different IAM users mapped to Kubernetes users and groups.


apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: {ARN of instance role (not instance profile)}
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::018519209949:user/admin-user-1
      username: admin-user-1
      groups:
        - system:masters
    - userarn: arn:aws:iam::018519209949:user/readwrite-user-1
      username: readwrite-user-1
      groups:
        - readwrite-group

Note: If you do not have this ConfigMap in the kube-system namespace, it could be because you have not yet added worker nodes to the EKS cluster. In that case, you can follow the EKS getting started tutorial, or download the aws-auth ConfigMap template with the command below and edit it in your favorite editor:

curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml
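One more detail worth calling out: system:masters is a built-in group, but a custom group such as readwrite-group in the snippet above grants nothing until it is bound to RBAC rules inside the cluster. The following is an illustrative sketch (the ClusterRole name, resources, and verbs are examples, not our exact policy):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readwrite
rules:
  - apiGroups: ["", "apps", "extensions"]
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readwrite-binding
subjects:
  - kind: Group
    name: readwrite-group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: readwrite
  apiGroup: rbac.authorization.k8s.io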

 

2. Creating a cluster with two or more different EC2 instance types.

As an AI company, we needed GPU instances to be part of our cluster. However, we only wanted pods that perform machine learning tasks (either training or prediction) to run on GPUs, and we wanted other pods to run on regular EC2 instance types such as m4.xlarge or t2.large.
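Steering the ML pods onto the GPU nodes uses standard Kubernetes scheduling. As a minimal sketch (assuming the NVIDIA device plugin is running on the GPU nodes; the pod and image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  containers:
    - name: trainer
      image: example/trainer:latest    # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1            # lands the pod only on nodes that expose a GPU (p2.xlarge)

Conversely, tainting the GPU node group and adding a matching toleration only to the ML pods keeps general-purpose workloads off the more expensive GPU instances.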

When we created our first EKS cluster using the Getting Started with Amazon EKS guide, the cluster came up quickly, but the guide did not cover bringing up nodes of multiple EC2 instance types inside that cluster in a way that worked for us.

Problem faced:
As per the EKS setup guide, we created our initial cluster by adding one CloudFormation stack, which adds one node group of EC2 instances (t2.large in our case) to the cluster. We then added a second CloudFormation stack with GPU nodes (AWS EKS supports GPU nodes), p2.xlarge in our case. Although new nodes of type p2.xlarge were added to the cluster, any pod that came up on those nodes was unable to reach any of the other nodes or the Internet.

Solution:

The solution was to connect the two node groups with a SecurityGroup that allows all the nodes to communicate with each other.

If you followed the Getting Started with Amazon EKS guide, you might have used amazon-eks-vpc-sample.yaml to create the VPC for your EKS cluster and a CloudFormation stack for each worker node group. This VPC template had to be edited to add the following additional resources for cross-node-group connections.

– Create a SecurityGroup and a SecurityGroupIngress resource and output the SecurityGroup

========================================================================================================================
Resources:
[...]

  CrossNodeSecGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow nodes in separate Node Groups to communicate
      VpcId: !Ref VPC

  CrossNodeSecGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref CrossNodeSecGroup
      IpProtocol: -1
      SourceSecurityGroupId: !Ref CrossNodeSecGroup

[...]

Outputs:
[...]

  CrossNodeSecurityGroup:
    Description: Security Group that allows communication between nodes in different node groups.
    Value: !Ref CrossNodeSecGroup

========================================================================================================================

In the Node Group stack template (`amazon-eks-nodegroup.yaml`):

– Add a parameter to specify the `CrossNodeSecurityGroup` we created above.
– Add the Security Group to the LaunchConfiguration so that nodes are deployed in that security group.

========================================================================================================================
Parameters:
[...]

  CrossNodeSecGroup:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group allowing communication between nodes in separate node groups.

[...]

Resources:
[...]

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      [...]
      SecurityGroups:
        - !Ref NodeSecurityGroup
        - !Ref CrossNodeSecGroup
      BlockDeviceMappings:
      [...]


========================================================================================================================

Now when you update the VPC stack, the cross-node security group will be available in the stack outputs. This must be passed in the parameters of each node group when you create the node groups via their CloudFormation stacks.
These VPC and security group changes give you the ability to add multiple node groups to your EKS cluster.
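For reference, passing the security group to a node group stack from the CLI looks roughly like this (a sketch; the stack name and parameter values are placeholders, and the other required node group parameters such as the cluster name, key pair, VPC, and subnets are elided):

aws cloudformation create-stack \
  --stack-name eks-gpu-nodegroup \
  --template-body file://amazon-eks-nodegroup.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
      ParameterKey=CrossNodeSecGroup,ParameterValue=sg-0123456789abcdef0 \
      ParameterKey=NodeInstanceType,ParameterValue=p2.xlarge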

Conclusion

As an engineering organization that is committed to shipping and deploying our application stack on Kubernetes, we are very excited about the emergence of managed Kubernetes offerings from all the major cloud providers. We’re now able to leverage the benefits of Kubernetes without the tedious but crucial responsibility of managing availability and scale of the Kubernetes nodes, not to mention the burden of doing zero downtime upgrades.

Our experience with moving one of our clusters to EKS has been a positive one. In particular we appreciate the timely support we got from AWS support teams. Their depth of knowledge and promptness in responding to our support requests has been crucial in enabling us to maintain our timelines for our devops roadmap.

We have shared our experience with regards to two specific issues that we believe could be relevant to several other teams that are considering a similar move to EKS. We are happy to further share details of our Kubernetes journey with you.

Please don’t hesitate to reach out – @MalhotraaVijay/vijay@passage.ai, @bobbarjung/deepak@passage.ai

 
