The first cry of Atom

The people who are crazy enough to think they can change the world are the ones who do

about me

I’m a software engineer living in Tokyo, Japan. I have a passion for developing artificial intelligence, which has been a dream of humankind for a long time. I long for a world where technology supports the incompleteness we have carried since humanity was born. So I dedicate myself to realizing these ideals some day.

Vagrant in TravisCI

I received a pull request from Bill Warner. He wrote unit tests using test-kitchen and RSpec. This pull request is a great resource for me because I didn’t know how to write test code for a cookbook. There is a lot I can learn from it. I cannot thank him enough. And I am always glad to receive PRs from anyone, anytime.

written in TravisCI, Vagrant Read on →

Run Spark Local Machine

As of Spark 1.2.0, things seem to work differently from older versions when you want to run your Spark job on your local machine. Up to v0.9.2 you could run a standalone job with code like this.

/*** SimpleApp.scala ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "$YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
    List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

You can run this app with the sbt command.

$ sbt run
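
For reference, a minimal sbt build definition for this example might look like the sketch below; the project name and the Scala and Spark versions are assumptions on my part, chosen to match the 1.2.0 setup discussed in this post, so adjust them to your installation.

// simple.sbt (a minimal sketch; name and versions are assumptions)
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"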

That’s fine. But in the latest version, you can no longer find such documentation. I think Spark no longer anticipates the use case of running a standalone job with sbt: the sbt section of the docs has been totally changed, and you are now expected to submit a jar file to your local Spark. But I found a way to keep running Spark jobs with the sbt command after making some changes. There are two major changes.

  • You must add the master configuration
  • The SparkContext must be stopped explicitly

I added these changes to the original code. This is the working version.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
      .setMaster("local[2]") // Set master configuration for local
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop() // Stop SparkContext
  }
}

With these two changes, you can run the sbt command as before.

written in Spark

Clipboard on MacOSX With Tmux

With default tmux, you may have trouble copying text to the clipboard. Dragging with the Option key lets you copy selected text to the clipboard, but it increases the number of keys you have to press for a selection, which is completely annoying. By using reattach-to-user-namespace, you can copy any text from tmux copy mode to the Mac clipboard. Below is the process.

First you have to install reattach-to-user-namespace. If you have Homebrew, it is easy.

$ brew install reattach-to-user-namespace

Then add the following to your tmux.conf.

# Use vim keybindings in copy mode
setw -g mode-keys vi

# Setup 'v' to begin selection as in Vim
bind-key -t vi-copy v begin-selection
bind-key -t vi-copy y copy-pipe "reattach-to-user-namespace pbcopy"

# Update default binding of `Enter` to also use copy-pipe
unbind -t vi-copy Enter
bind-key -t vi-copy Enter copy-pipe "reattach-to-user-namespace pbcopy"

My key bindings are here. I use the Space key for begin-selection and the Enter key for copy-pipe. Of course, you can change these to any keys you like.
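
The referenced article also suggests wrapping tmux’s default shell with reattach-to-user-namespace so that every new pane runs inside the user namespace. Here is a sketch; zsh is an assumption, so substitute your own login shell.

# Wrap the default shell so new panes run inside the user namespace
# (zsh is an assumption; substitute your login shell)
set-option -g default-command "reattach-to-user-namespace -l zsh"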

Reference

tmux Copy & Paste on OS X: A Better Future

written in Mac

I Have a Dream

Today, I took a walk with one of my friends. He joined my office at the end of last year, and he is from Wuhan, China. He is a good man and is very kind to me whenever we meet. We walked around the Imperial Palace and visited Tokyo Station, the Imperial Palace East Garden, Yasukuni Shrine, and the Jimbocho book stores. I like walking around central Tokyo. I was pleased to see that he also enjoyed this short trip.

written in History, Life Read on →

Your Own Cluster With Storm-devenv

As written in this post, I developed a tool for constructing a Storm cluster more easily. When you want to add new features or investigate bugs reported by others, this tool will be useful. Usually this kind of tool can only construct a cluster from released packages; storm-devenv enables us to construct a Storm cluster from your own Storm code on your local machine. Vagrant and VirtualBox make this possible. I think the same process can be applied to AWS EC2 instances by using Vagrant, so I chose Vagrant as the construction tool. As the provisioning tool, I wrote Chef cookbooks, Chef being a de facto standard for configuration management. I’d like to introduce storm-devenv in more detail in this post.
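
Assuming storm-devenv follows the standard Vagrant workflow (an assumption on my part; see the repository for the actual steps), bringing the cluster up should be as simple as:

$ vagrant up          # boot and provision the VMs defined in the Vagrantfile
$ vagrant ssh nimbus  # log in to a node; the node name here is hypothetical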

written in Vagrant, storm Read on →

Making Storm Cluster for Development

When you develop a big data processing platform such as Hadoop, Spark, or Storm, you need to construct a cluster. You can create it with either virtual machines or real servers. Personally, I find it hard to obtain real servers, and setting up their networks and configurations is tough work. So you might use virtual machines instead, such as EC2 instances or VirtualBox. Today I’d like to introduce some options for creating your own Storm cluster for developing new features and investigating problems.

written in Development, Storm Read on →

Targets in 2015

This is my first post in 2015. A lot of things happened last year, and this year will be no different. At the beginning of the new year, I’d like to write down three targets I want to achieve.

written in Life, Targets Read on →

Training Conditional Random Field

This article is written as the 17th entry of the Qiita machine learning advent calendar.

A conditional random field (CRF) is a kind of discriminative model for sequential data. This model is widely used for labeling natural language sequences such as “I have a pen”. There is a motivation to attach tags to such a sequence: for example, “I have a pen” can be tagged as “I(Noun) have(Verb) a(Article) pen(Noun)”. You can train a CRF to predict these tags from given sequential data. CRF is not a cutting-edge algorithm, but the knowledge and notions it embodies are valuable for understanding many types of probability models. In this entry I’d like to explain the training and prediction processes used by CRF, especially the linear-chain CRF.
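
To make the model concrete, the conditional probability a linear-chain CRF assigns to a tag sequence $y$ given an input sequence $x$ is conventionally written as follows, with feature functions $f_k$, weights $\lambda_k$, and partition function $Z(x)$; this is the standard formulation, not something specific to this post.

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t)\Big), \qquad Z(x) = \sum_{y'} \exp\Big(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t)\Big)$$

Training maximizes the log of this probability over the labeled data, and prediction finds the $y$ that maximizes it, typically with the Viterbi algorithm.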

written in CRF, Machine Learning Read on →

Implement Random Feedback Neural Network

I’m very interested in neural network algorithms such as recurrent neural networks, sparse autoencoders, restricted Boltzmann machines, and so on. Most neural network learning algorithms are based on backpropagation. This algorithm was first developed in 1974 in the context of neural networks. Backpropagation is a simple and efficient learning algorithm, so it has become a de facto standard in the machine learning field.

written in Machine Learning, Neural Network Read on →