The First Cry of Atom

Today is the first day of the rest of my life.

Read Python Machine Learning

I’ve finished reading this book. Although it was originally written in English, I read the translated version. As the reviews on Amazon say, the book explains not only the theoretical side of machine learning but also practical code. Moreover, the balance between these two sides was right for me. Most books about machine learning can only tell ...

Build Presto on OSX

As described here, the Presto package build does not run on OSX. This is mainly caused by a JNI issue and the machine architecture. I learned about the problem from Issue 3849: It looks like you’re running on ppc64. Presto only supports x86_64 on Linux (required for bundled Hadoop JNI libraries) and has many assumptions about the architecture being little e...

Multi-node Presto cluster on Docker

I recently started using Presto, a distributed SQL query engine like Hive. Until now I have worked with Hadoop and Hive, so I thought there would be a lot of similarities between Hive and Presto. That is mostly true in terms of user interface and SQL syntax, but Presto does not depend on the Hadoop distributed architecture. Resource scheduli...

Assemble and creating table in Hive UDF

histogram_numeric is a UDAF that calculates the distribution of the given records. At the same time, it should generate a table with one row per category. In this sense, we can regard this type of UDF as a combination of a UDAF and a UDTF. For example, the output of histogram_numeric looks like: hive> SELECT explode(histogram_nu...

Digdag syntax highlighter in Atom

Digdag was released by Treasure Data. It is a highly scalable distributed workflow engine, developed for both analysts and engineers to make their daily batch and ad hoc jobs easier. The important point here is that we can define a workflow in a single *.dig file, so you can put the file under a version control system ...