Launching a distributed system is not an easy task, unlike running a simple command-line tool or desktop application. It takes time to install the software and prepare the server instances. Many automation tools make it easier to launch a complicated system on public cloud services like AWS. Terraform is a tool that lets us achieve infrastructure as code on the major cloud services. It speeds up not only the provisioning of production servers but also the validation of new features or bug fixes you have created.

I have used docker-presto-cluster for testing purposes, but sometimes it is necessary to launch a multi-node cluster in an environment closer to real-world use. I have found that Terraform can provision a Presto cluster in AWS quickly. This post introduces a new module I have created to provision an out-of-the-box Presto cluster in an AWS environment.

terraform-aws-presto

terraform-aws-presto is a module that creates all the resources needed to launch a Presto cluster in an AWS environment. It uses Docker images that I build and distribute myself, and AWS Fargate to start the cluster on ECS.

All resources created by the module are illustrated as follows.

[Figure: overview of the resources created by the module]

  • VPC
    • Public Subnet
    • Private Subnet
    • Application Load Balancer (ALB)
  • ECS Cluster
    • Task Definition
    • ECS Service

The module creates a public subnet and a private subnet inside the VPC. The default CIDR block of the VPC is 10.0.0.0/16. The ALB that connects to the coordinator sits in the public subnet so that anyone can submit a query to the cluster. All Presto services (coordinator and workers) run inside the private subnet, so no one can access the Presto instances directly.
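To make the address layout concrete, here is a minimal sketch of how a 10.0.0.0/16 block could be split into one public and one private subnet with Terraform's built-in cidrsubnet function. The /24 subnet sizes and the local and output names are assumptions for illustration; the module itself may divide the range differently.

```hcl
# Illustrative only: the subnet sizes and names below are assumptions,
# not values taken from terraform-aws-presto.
locals {
  vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to the /16 prefix yields /24 subnets.
  public_subnet_cidr  = cidrsubnet(local.vpc_cidr, 8, 0) # 10.0.0.0/24
  private_subnet_cidr = cidrsubnet(local.vpc_cidr, 8, 1) # 10.0.1.0/24
}

output "public_subnet_cidr" {
  value = local.public_subnet_cidr
}

output "private_subnet_cidr" {
  value = local.private_subnet_cidr
}
```

This configuration needs no provider, so running terraform apply on it only prints the two computed ranges.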

The ECS cluster has two task definitions: one for the coordinator and one for the workers. You can control the number of worker instances with a Terraform variable.
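As a rough sketch of how a worker count variable could drive the number of Fargate tasks, the configuration below wires a variable into the desired_count of an ECS service. It is not the module's actual source. The resource names, the CPU and memory sizes, the private_subnet_ids variable, and the assumption that the image tag matches the Presto version are all hypothetical.

```hcl
# Hypothetical sketch, not the module's implementation.
variable "cluster_capacity" {
  description = "Number of Presto worker tasks to run"
  type        = number
  default     = 2
}

variable "presto_version" {
  description = "Image tag to run (assumed to match the Presto version)"
  type        = string
}

variable "private_subnet_ids" {
  description = "IDs of the private subnets where the workers run"
  type        = list(string)
}

resource "aws_ecs_cluster" "presto" {
  name = "presto"
}

# Task definition for the workers; a coordinator definition would look similar.
resource "aws_ecs_task_definition" "worker" {
  family                   = "presto-worker"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024 # assumed size
  memory                   = 2048 # assumed size

  container_definitions = jsonencode([
    {
      name      = "presto-worker"
      image     = "lewuathe/presto-worker:${var.presto_version}"
      essential = true
    }
  ])
}

# The service keeps var.cluster_capacity worker tasks running.
resource "aws_ecs_service" "worker" {
  name            = "presto-worker"
  cluster         = aws_ecs_cluster.presto.id
  task_definition = aws_ecs_task_definition.worker.arn
  launch_type     = "FARGATE"
  desired_count   = var.cluster_capacity

  network_configuration {
    subnets = var.private_subnet_ids
  }
}
```

Changing the variable and running terraform apply again would scale the worker service up or down without touching the coordinator.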

Usage

For example, this is the minimum code to launch a Presto cluster with two worker instances in the default VPC.

module "presto" {
  source           = "github.com/Lewuathe/terraform-aws-presto"
  cluster_capacity = 2
}

output "alb_dns_name" {
  value = module.presto.alb_dns_name
}

Because the module outputs the DNS name of the ALB, you can reach the coordinator through it.

$ ./presto-cli --server http://presto-XXXX.us-east-1.elb.amazonaws.com \
    --catalog tpch \
    --schema tiny

The module pulls the Docker images published as lewuathe/presto-coordinator and lewuathe/presto-worker. You can control the version of Presto installed in the cluster by changing the presto_version input variable.
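For instance, pinning the version might look like the following sketch. The value shown is only a placeholder, and the string type is an assumption; use a version for which both images have actually been published.

```hcl
module "presto" {
  source           = "github.com/Lewuathe/terraform-aws-presto"
  cluster_capacity = 2

  # Placeholder value: replace with a version that exists for both
  # lewuathe/presto-coordinator and lewuathe/presto-worker.
  presto_version = "0.230"
}
```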

While creating the module, I found Terraform: Up and Running: Writing Infrastructure as Code a good reference for learning how Terraform works in general. Take a look if you are interested in writing Terraform modules.

In my experience, the module launches a Presto cluster in a few minutes because it is minimal and straightforward. You may find parts of it insufficient or not useful. Feedback and pull requests are welcome.

Thanks as usual.

Reference