The First Cry of Atom

Today is the first day of the rest of my life.

Hive UDAF mode

There is a big difference between a Hive UDF and a UDAF, which I noticed while developing a UDAF. A normal UDF usually turns one row into one value. Hive queries are executed as MapReduce jobs (other engines such as Tez or Spark can also run them, of course). So for an ordinary UDF, only the mapper needs to run. A UDAF, however, is different. A UDAF must...
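As a rough illustration, here is a plain-Python simulation of the evaluator lifecycle behind Hive's UDAF modes. It is not the post's own code and not the Hive Java API. The mode names PARTIAL1, PARTIAL2, FINAL and COMPLETE mirror Hive's `GenericUDAFEvaluator.Mode`. The `AverageEvaluator` class and the `evaluate` function are hypothetical names chosen for this sketch.

```python
from enum import Enum


class Mode(Enum):
    # Names mirror Hive's GenericUDAFEvaluator.Mode; the values say which
    # evaluator methods run in that mode.
    PARTIAL1 = "iterate + terminatePartial"   # raw rows -> partial (typically map side)
    PARTIAL2 = "merge + terminatePartial"     # partials -> partial (intermediate merge)
    FINAL = "merge + terminate"               # partials -> result (typically reduce side)
    COMPLETE = "iterate + terminate"          # raw rows -> result in a single task


class AverageEvaluator:
    """Hypothetical AVG-like aggregate whose partial state is (sum, count)."""

    def new_buffer(self):
        return [0.0, 0]

    def iterate(self, buf, value):
        # Consume one raw input row; NULLs (None) are skipped.
        if value is not None:
            buf[0] += value
            buf[1] += 1

    def terminate_partial(self, buf):
        # Intermediate state that would be shuffled from mapper to reducer.
        return (buf[0], buf[1])

    def merge(self, buf, partial):
        # Fold a partial state produced by another task into this buffer.
        buf[0] += partial[0]
        buf[1] += partial[1]

    def terminate(self, buf):
        # Final answer; None for an empty group.
        return buf[0] / buf[1] if buf[1] else None


def evaluate(ev, mode, inputs):
    """Run one task: inputs are raw rows or partials, depending on the mode."""
    buf = ev.new_buffer()
    if mode in (Mode.PARTIAL1, Mode.COMPLETE):
        for row in inputs:
            ev.iterate(buf, row)
    else:
        for partial in inputs:
            ev.merge(buf, partial)
    if mode in (Mode.PARTIAL1, Mode.PARTIAL2):
        return ev.terminate_partial(buf)
    return ev.terminate(buf)


if __name__ == "__main__":
    ev = AverageEvaluator()
    splits = [[1, 2, 3], [4, 5], [6]]                      # three mapper inputs
    partials = [evaluate(ev, Mode.PARTIAL1, s) for s in splits]
    combined = [evaluate(ev, Mode.PARTIAL2, partials[:2]), partials[2]]
    print(evaluate(ev, Mode.FINAL, combined))              # 3.5
    print(evaluate(ev, Mode.COMPLETE, [1, 2, 3, 4, 5, 6]))  # 3.5
```

A plain UDF never needs anything like `terminate_partial` and `merge`. A UDAF has to expose its intermediate state so that an aggregation started on mappers can be finished on reducers, and both paths above give the same answer.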

Join TreasureData

I’ve just changed jobs. I quit the company where I had worked for about 3 years, my first company after graduating from university. To advance my career, I decided to move elsewhere, and I joined TreasureData, which provides a cloud-based data analysis service. I was attracted by the high level of its technology and tha...

Getting Started

Lagrange is a minimalist Jekyll theme for running a personal blog or site for free through GitHub Pages, or on your own server. Everything you will ever need to know about this Jekyll theme is included in the README below, which you can also find on the demo site. Notable features: compatible with GitHub Pages. ...

Build SparkR

Apache Spark includes an R API. If you develop Spark, at some point you will change an API or the implementation of Spark Core or MLlib. In that case, you also have to update the SparkR test code (and you may have to update the Java API test cases too). But I had no experience using R on my Mac, so I wrote down the process I followed this time. Install...
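Here is a minimal sketch of the rebuild-and-test cycle for SparkR. It assumes the commands documented in Spark's `R/README.md`: building with the `-Psparkr` Maven profile and running `R/run-tests.sh`. It also assumes R and the packages the tests need (such as testthat) are already installed. The helper name `sparkr_build_and_test` is hypothetical, and this is not necessarily the exact process from the post.

```python
import subprocess
from pathlib import Path

# Assumption: commands as given in Spark's R/README.md; verify against your
# checkout, since profiles and scripts can change between Spark versions.
SPARKR_STEPS = [
    ["./build/mvn", "-DskipTests", "-Psparkr", "package"],  # build Spark and the SparkR package
    ["./R/run-tests.sh"],                                  # run the SparkR test suite
]


def sparkr_build_and_test(spark_home, dry_run=True):
    """Hypothetical helper: print (dry run) or run the SparkR build/test steps in order."""
    spark_home = Path(spark_home).expanduser()
    steps = []
    for cmd in SPARKR_STEPS:
        line = " ".join(cmd)
        if dry_run:
            print(f"[{spark_home}] $ {line}")
        else:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(cmd, cwd=spark_home, check=True)
        steps.append(line)
    return steps


if __name__ == "__main__":
    sparkr_build_and_test("~/spark")  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default lets you check the command order before starting a full Maven build.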

Hadoop build commands

You may have had the experience of forgetting which build command to use for a particular purpose. How do you skip tests? How do you build a tar.gz package? How do you build native packages? This information is in BUILDING.txt. So in this post, I’d like to collect the commands you may often use in your Hadoop projects as a cheat sheet. Definit...
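A small cheat-sheet sketch can answer those three questions. It assumes the Maven invocations listed in Hadoop's BUILDING.txt: `-DskipTests` skips tests, `-Pdist ... -Dtar` produces a binary tar.gz, `-Pnative` compiles native code, and `-Psrc` builds a source tarball. The purpose keys and the `hadoop_build_command` helper are hypothetical. Options differ between Hadoop versions, and `-Pnative` also needs the native toolchain that BUILDING.txt lists.

```python
import shlex

# Assumption: entries follow Hadoop's BUILDING.txt; check the copy in your
# own branch, because options change between Hadoop versions.
HADOOP_BUILD = {
    "install, skip tests": ["mvn", "install", "-DskipTests"],
    "binary tar.gz": ["mvn", "package", "-Pdist", "-DskipTests", "-Dtar"],
    "binary tar.gz + native + docs": ["mvn", "package", "-Pdist,native,docs", "-DskipTests", "-Dtar"],
    "source tar.gz": ["mvn", "package", "-Psrc", "-DskipTests"],
}


def hadoop_build_command(purpose):
    """Hypothetical lookup helper: the shell-quoted mvn command for a purpose."""
    return shlex.join(HADOOP_BUILD[purpose])


if __name__ == "__main__":
    # Print the cheat sheet as "purpose  command" rows.
    for purpose in HADOOP_BUILD:
        print(f"{purpose:32} {hadoop_build_command(purpose)}")
```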