The First Cry of Atom

Hive UDAF mode

Posted on October 22, 2015

There is a big difference between a Hive UDF and a UDAF, as I found out while developing a UDAF. A normal UDF usually processes one row into one value. Hive jobs are executed as MapReduce jobs (of course, other engines such as Tez or Spark can run them too). So in the case of... [Read More]

Join TreasureData

Posted on October 15, 2015

I’ve just made a career change. I quit the company where I had worked for about 3 years. It was my first company after graduating from university. To advance my career, I decided to change my affiliation. [Read More]

Build SparkR

Posted on September 25, 2015

Apache Spark includes an R API. If you are a Spark developer, there will be times when you change the API or implementation of Spark core and MLlib. In that case, you also have to change the SparkR test code. (You may have to change the Java API test cases too.)... [Read More]

Hadoop build commands

Posted on September 20, 2015

You may have had the experience of not remembering which build command fits your purpose. How to skip tests? How to build a tar.gz package? How to build native packages? [Read More]

Multiple Hadoop Cluster on Docker

Posted on September 16, 2015

As a Hadoop developer, there are times when I want to create a multi-node Hadoop cluster more easily. First I came up with using VirtualBox and Vagrant. But it was very slow to launch one cluster. Besides, the more nodes we added, the slower the launch became. I cannot... [Read More]