The First Cry of Atom

Today is the first day of the rest of my life.

About BlinkDB

Today I found an interesting commit in the Presto project: "Remove support for approximate queries". What are approximate queries? Why do we use them? The idea was originally developed in BlinkDB. According to the official BlinkDB website, it is called an approximate query engine. BlinkDB is a massively parallel, approximate query en... Read more

Tuning G1 GC algorithm on JVM

Recently I needed to tune garbage collection for our Java application. The application is Presto, a distributed query execution engine. It requires a lot of memory but needs to achieve high throughput and low latency, so I read a book about tuning Java applications. Since Presto uses the G1 GC algorithm, I want to summarize how to tune... Read more

Hivemall is now in the Apache Incubator!

Today I have big news: Hivemall has joined the Apache Incubator! Hivemall is a scalable machine learning library running on Hadoop. It was originally developed by Yui Makoto, a research engineer at Treasure Data Inc. From now on, we call it Apache Hivemall. The top page is now open. Apache Hivemall is developed as Hive UDFs. Therefor... Read more

Introduction to Airframe

Do you have any experience using a DI container framework in your Scala project? The most famous DI container framework is Google Guice. This library is widely adopted in the enterprise and open source communities. Since it is the de facto standard library, Guice is the best option for a DI framework in a Java project. How about Scala pro... Read more

Serialization in Hive UDAF

Java serialization is sometimes complex and difficult for me to understand. I’ve read Effective Java and the Javadoc in the JDK SE API docs, so I believed I understood the basic concept of serializing Java objects. But I ran into a problem when I wrote a Hive UDAF. This might be a problem everyone encounters when they try to write a Hive UDAF. So ... Read more