• Hive UDAF mode

    There is a big difference between a Hive UDF and a UDAF, as I found out while developing a UDAF. A normal UDF usually processes one row into one value. Hive jobs are executed as MapReduce jobs (other engines such as Tez or Spark can run them too, of course). So in the case of... [Read More]
  • Join TreasureData

    I’ve just changed jobs. I quit the company where I had worked for about 3 years, my first company after graduating from university. To advance my career, I decided to change my affiliation. [Read More]
  • Build SparkR

    Apache Spark includes an R API. If you are a Spark developer, there will be times when you change the API or the implementation of Spark core or MLlib. In that case, you also have to change the SparkR test code. (You may have to change the Java API test cases too.)... [Read More]
  • Hadoop build commands

    You may have had the experience of not remembering the build command you need for a particular purpose. How do you skip tests? How do you build a tar.gz package? How do you build native packages? [Read More]
  • Multiple Hadoop Cluster on Docker

    As a Hadoop developer, I often want to create a multi-node Hadoop cluster more easily. At first I tried VirtualBox and Vagrant, but launching a single cluster was very slow. Besides, the more nodes we added, the slower the launch became. I cannot... [Read More]