The First Cry of Atom

Run queries in your local Presto cluster on Docker

A few years ago, I created a tool that launches a Presto cluster on your local machine using Docker.

This tool lets you launch a Presto cluster with multiple nodes (i.e. multiple Docker containers) so that you can easily test your own connector or improvements in an environment close to production. I described the details of the framework in previous posts.

Yesterday, I got a question about how to connect to the Presto cluster launched by the framework and submit queries. So in this post I’m going to describe how to access a Presto cluster running on your local machine with Docker.

Presto Command Line Tool

The Presto open source project provides a simple command line tool for submitting SQL to any Presto cluster. It implements all the protocols and interfaces necessary to run a query. You can get the tool from the official documentation site, “2.2. Command Line Interface”. After you download the CLI, you should be able to use it like this.

$ chmod +x presto-cli-301-executable.jar
$ ./presto-cli-301-executable.jar --server localhost:8080 --catalog tpch

Since the Docker container of docker-presto-cluster exposes port 8080 to the host machine, the CLI can connect to port 8080 just as it would to a normal Presto cluster. A console is launched, and you can now submit any query to the Presto cluster running in the Docker containers.
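If the CLI cannot connect, it can help to check first whether the coordinator answers on the exposed port. The following is only a minimal sketch: it assumes the coordinator serves its REST endpoint /v1/info over plain HTTP on localhost:8080 and that the JSON body carries "coordinator" and "starting" flags. The helper names presto_info_uri and presto_ready? are made up for this example.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the URI of the coordinator's info endpoint.
# Host and port default to the values used in this post.
def presto_info_uri(host = 'localhost', port = 8080)
  URI::HTTP.build(host: host, port: port, path: '/v1/info')
end

# Decide from the JSON body whether the node looks ready for queries.
# Assumption: the body has boolean "coordinator" and "starting" fields.
def presto_ready?(body)
  info = JSON.parse(body)
  info.is_a?(Hash) && info['coordinator'] == true && info['starting'] != true
rescue JSON::ParserError
  false
end

if __FILE__ == $PROGRAM_NAME
  begin
    response = Net::HTTP.get_response(presto_info_uri)
    if response.is_a?(Net::HTTPSuccess) && presto_ready?(response.body)
      puts 'Presto coordinator is ready on localhost:8080'
    else
      puts "Something answered on port 8080 but does not look ready (HTTP #{response.code})"
    end
  rescue SystemCallError, SocketError, Timeout::Error => e
    # Typically Errno::ECONNREFUSED when the port is not mapped to the host.
    puts "Cannot reach localhost:8080: #{e.class}"
  end
end
```

A "connection refused" result usually means the port is not published to the host, which is a Docker configuration issue rather than a Presto one.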

Presto Client Libraries

Of course, you can also use any of the Presto client libraries. Here is the list of client libraries I am aware of.

Language  Repository
Ruby      treasure-data/presto-client-ruby
Node      tagomoris/presto-client-node
Go        prestodb/presto-go-client
C         easydatawarehousing/prestoclient
Java      JDBC Driver
PHP       360d-io-labs/PhpPrestoClient
Python    easydatawarehousing/prestoclient
R         prestodb/RPresto

For example, you can use the Ruby client as follows without any modification to the library itself.

require 'presto-client'

# create a client object:
client = Presto::Client.new(
  server: "localhost:8080",   # Specify the exact port exposed by Docker container
  # No ssl option: passing one makes the client use HTTPS, while this port is assumed to serve plain HTTP
  catalog: "tpch",
  schema: "default",
  time_zone: "US/Pacific",
  language: "English",
  http_debug: true,
)

# run a query and get results as an array of arrays:
columns, rows = client.run("select * from system.runtime.nodes")
rows.each {|row|
  p row  # row is an array
}

So overall, you can connect to the Docker container running the coordinator process on port 8080 with any existing tool, without modification, as long as the container exposes port 8080 to the host machine. You need an EXPOSE directive in the Dockerfile and a port mapping in your docker-compose.yml. “Docker in Action” is a good book for learning the fundamental usage and interfaces of Docker.
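To double-check the port mapping part, here is a small sketch that reads a compose file and lists the host ports a service publishes. The embedded compose content, the service names and the image names are purely illustrative and are not the actual docker-presto-cluster configuration. The helper published_host_ports is a made-up name, and it only handles the short "HOST:CONTAINER" port syntax.

```ruby
require 'yaml'

# Hypothetical compose file for illustration only.
# The port entry is quoted because an unquoted 8080:8080 can be
# misread by YAML 1.1 parsers as a base-60 number.
COMPOSE_EXAMPLE = <<~COMPOSE
  version: "3"
  services:
    coordinator:
      image: example/presto-coordinator
      ports:
        - "8080:8080"
    worker0:
      image: example/presto-worker
COMPOSE

# Return the host-side ports published by a service.
# Accepts "HOST:CONTAINER" and "IP:HOST:CONTAINER" entries;
# entries without a host part are ignored.
def published_host_ports(compose_yaml, service)
  compose = YAML.safe_load(compose_yaml)
  ports = compose.dig('services', service, 'ports') || []
  ports.map { |mapping| mapping.to_s.split(':') }
       .select { |parts| parts.size >= 2 }
       .map { |parts| parts[-2].to_i }
end

# Report whether the coordinator publishes port 8080 to the host.
puts published_host_ports(COMPOSE_EXAMPLE, 'coordinator').include?(8080)
```

In this example only the coordinator publishes a port, since clients talk to the coordinator and never to the workers directly.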

Please let me know anytime if you find something wrong with docker-presto-cluster. Issues and pull requests are always welcome. Thanks!