(Continued from the last post)

3. Meet-up

We (my wife and I) have been hosting study group meet-ups in New York, and thought it’d be nice to try some experimental meet-ups for Japanese learners of the Chinese language. We made this video and the event web page, and planned meet-ups in the three largest cities in Japan.

Unfortunately, there were few participants for Nagoya and Osaka, but we successfully held a 6-person study group in Tokyo and enjoyed practicing and sharing information.

Later I heard from some of my friends that they had wanted to participate but either didn’t know it was in Tokyo or weren’t available on that day. We should definitely hold meet-ups in Japan again!

4. Mitoh Conference at mixi’s office, Shibuya, Tokyo

I was invited to give a lightning talk at Mitoh Conference 2012, an annual conference for Mitoh Software participants and alumni.

It was fun to hear about many interesting projects and presentations (my favorites were those by Mr. Tanaka Taisei from Geisha Tokyo Entertainment and Mr. Goto Masaki from BestTeacher) and to get to know new people in the industry.

I talked about Unnatural Language Processing (here are the slides), which I think was well received. (Actually, I didn’t sleep at all on the airplane from NY to Tokyo, partly because I wanted to avoid jet lag by being awake when it was noon in Tokyo, but mainly because I wanted to finish my demos.)

5. NLP2012 at Hiroshima City University, Hiroshima

The main reason I went back to Japan was to attend NLP2012, the largest NLP conference in Japan, which was held in Hiroshima this year.

I was one of the program committee members (I spent two full days creating the whole program with my boss, which was a first for me), had to chair a session, and had two presentations to make, so it turned out to be the busiest conference I’ve ever attended in Japan.

My main duty during the conference as a PC member was to broadcast the tutorials and the invited talks on UStream without any problems. Despite minor issues such as low-quality audio and illegible characters on slides, overall it went all right. I hope everybody enjoyed the broadcast. The sessions where I presented, especially the one on morphological analysis, were really popular, with a roomful of people (or at least some of them) vigorously debating different perspectives.

All in all, it was a very fruitful event: I got to know many new researchers and students, and enjoyed the wonderful local specialties of Hiroshima!

 

It’s been almost one month since I came back from my once-a-year trip to Japan in March.
I had been too busy (i.e., too lazy to consider it a top priority) to write anything about it, but now let me briefly note down some of the events I participated in before my memory fades.

1. IPSJ Special Event at Nagoya Institute of Technology

I attended and gave a presentation at a special event, “Real-world Natural Language Processing.”

http://www.ipsj.or.jp/10jigyo/taikai/74kai/event_2-1.html

My talk was about the “Unnatural Language Processing”-related activities I’ve been involved in over the past few years, and I included catchy examples explaining how to “decode” Gal-Moji (a cryptic Japanese writing style based on letter substitution, used by high school girls) and “Cambridge” sentences (a kind of Typoglycemia).
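As a rough illustration of what a “Cambridge” sentence looks like, here is a minimal Clojure sketch that shuffles the inner letters of each word while keeping the first and last letters in place. It is not the decoding method from the talk, and the function names are made up for this example.

```clojure
(require '[clojure.string :as string])

(defn scramble-word
  "Shuffle the inner letters of a word, keeping the first and last in place."
  [word]
  (if (< (count word) 4)
    word                                  ; too short to scramble anything
    (let [inner (subs word 1 (dec (count word)))]
      (str (first word)
           (apply str (shuffle (vec inner)))
           (last word)))))

(defn scramble-sentence
  "Scramble every space-separated word of a sentence."
  [sentence]
  (string/join " " (map scramble-word (string/split sentence #" "))))

;; The output differs on every run because the shuffle is random.
(println (scramble-sentence "reading scrambled words is surprisingly easy"))
```

Decoding goes the other way: given a scrambled word, find a dictionary word with the same first letter, last letter, and set of inner letters.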

It was nice to listen to the other talks presented by top-tier researchers. They were so interesting that everybody talked too much and there was little time left for the panel discussion, so we ended up deciding not to have it this time.

I heard there were 89 participants at the event, and it was also refreshing to meet local people whom I hadn’t seen for a long time (including my advisors at Nagoya University). It was a pity I couldn’t linger there because I had to go to Tokyo that very day. (Thanks to Inui san @ Tsukuba Univ. for organizing the event!)

2. Rakuten Tokyo office business trip in Shinagawa, Tokyo

One of the main reasons I went back to Japan was, of course, to visit the Rakuten headquarters in Shinagawa. I was really glad that I could avoid the rush hour by staying at a hotel in Ohimachi and commuting to the office with a 20-minute walk every morning. So many things are going on in the company, and it was great to meet everybody after a one-year gap and refresh my memory.

I’ll write about the latter half of my trip later.

 

I have been working on a weekend project called “jufsisku” for the past few weeks. This project is to build a search engine where you can look up Lojban-English translations using queries in these two languages. You can try out the search here:

http://lojban.lilyx.net/jufsisku/

I showed the demo to a group of Japanese-speaking lojbanists at our Skype study group the other day, and announced the initial version on the English-speaking mailing list for lojbanists. Overall, it was well received, and I’m glad that several people said they liked it. I personally believe that a collection of good-quality translations (and a system to search them) is essential not only when you are translating documents but also when you are writing in a foreign language. Dictionaries don’t help very much because you have to know not only what words to use but also how to use them. This issue is more serious for languages with a small number of speakers and few learning materials, like Lojban, which is why I decided to start this project.

Let me explain the system architecture of lojbo jufsisku here, since it is built on exciting (and relatively recent) open source software, which is different from, well, the usual MySQL & PHP stack. Its search back-end is Apache Solr, which is a pretty nice full-text search server. The entire web application is written with Compojure, a Clojure-based web application framework. (By the way, I have tried many programming languages in the past, including Ruby, Python, Java, JavaScript, PHP, etc., but Clojure is by far the best language for me.) The framework and the Clojure language it is built on are so powerful that the whole application is only 300 lines of code, including EVERYTHING: logic, HTML generation, CSS, DB storage, and so on.

And the translation data is stored in MongoDB, a flexible “NoSQL” database system, to which the users can add new sentences.
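To give a feel for how little code the search side needs, here is a minimal sketch of building a Solr query URL in Clojure. It is not the actual jufsisku code: the host, port, and path are assumptions for a local Solr instance, and only the standard q and wt=json parameters of Solr’s select handler are used.

```clojure
(import '(java.net URLEncoder))

(def solr-select-url
  "Hypothetical endpoint of a local Solr instance; the real one is not given in this post."
  "http://localhost:8983/solr/select")

(defn solr-query-url
  "Build a select URL for a free-text query, asking Solr for a JSON response."
  [^String query]
  (str solr-select-url
       "?q=" (URLEncoder/encode query "UTF-8")   ; spaces are encoded as +
       "&wt=json"))

;; A sample Lojban query.
(println (solr-query-url "mi prami do"))
;; prints http://localhost:8983/solr/select?q=mi+prami+do&wt=json
```

A web handler would fetch this URL, parse the JSON response, and render the matching sentence pairs as HTML.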

lojban jufsisku is only the beginning of my long-term goal to provide the best learning environment for Lojban. Any feedback is appreciated.

 

The paper we submitted to IJCNLP2011 has been accepted, and will be presented at the conference, which will be held a few weeks from now.
The paper describes the #ANPI_NLP project, a volunteer relief project focusing on text and safety information mining in the wake of the East Japan Earthquake in March 2011.

Here’s the full paper PDF (which was kindly uploaded by the lead co-author, Mr. Graham Neubig).

In the paper, we not only describe how the project started and evolved and what kinds of tasks we dealt with, but also focus on the lessons we learned from the experience.
Even after the submission we have received useful feedback from colleagues and peer researchers. In retrospect, we could have done more during the relief effort and even BEFORE any disaster happens.
Please read the paper if you are interested, and send us any feedback. (The floods in Thailand still continue as I write this article; I hope the conference is held without any problems.)

 

My wife and I decided to offer a program named “Intensive Chinese Weekend Stay” in New York. In this program, we invite a learner of the Chinese language to our home for free and provide an intensive learning course.

Intensive Chinese Weekend Stay Program (New York City)

Part of the reason why we started this kind of program is that we recently signed up for CouchSurfing, which we found very interesting (we are actually hosting an American girl next weekend, just two weeks after signing up!). We decided to impose one particular condition when hosting somebody: guests should speak or at least be learning one of the CJK (i.e., Chinese, Japanese, Korean) languages, so that they can deepen their understanding of East Asian languages and cultures.

This “Intensive Chinese Weekend Stay” is an extension of the above concept. In this program we are going to provide a comprehensive pronunciation and grammar review, so if you are interested in the details, please go to the above page and apply!

 

On Labor Day weekend, my wife and I paid a visit to Penn State University, which is located in State College, in the middle of the state of Pennsylvania. It was a four-and-a-half-hour bus ride from New York City (Megabus on the way there and Gotobus for the return trip), which was not very comfortable.

Our purpose was to visit a professor there who specializes in Confucianism and East Asian history. This was especially useful for my wife, giving her a deeper glimpse of the field of East Asian philosophy and helping her set her future research direction. (We talked with the Hong Kong-born professor in Mandarin, which was interesting to me, too.)

Visiting researchers like this within the country reminds me of the blog post “Conferences: Costs and Benefits” on the natural language processing blog, where the author, Hal Daumé III, argues that inviting well-known researchers to one’s own university, visiting labs in the country, and having deep in-office conversations can compensate for the large amount of money we usually spend on domestic and/or international conferences every year.

I feel more positive about this idea the longer I work here at Rakuten Institute of Technology, New York. The value of working in a hub-like place that lots of top-tier researchers keep visiting cannot be overestimated. That’s one of the reasons why places like Google and Microsoft Research, where a lot of researchers and top engineers give “tech talks,” stay competitive all the time. That could be much more important than simply attending every conference, good and not-so-good alike. I would also like to create more of these opportunities personally, hopefully starting this year.

 

Just for my convenience, I’ve listed the best papers of major NLP conferences (ACL / COLING / NAACL / EMNLP / CoNLL) for the past 7 years or so. If you find anything wrong, please let me know. Thanks~

ACL

COLING

NAACL

EMNLP

CoNLL

 

The special issue “Unnatural Language Processing” of the Journal of Natural Language Processing, for which I’m a lead editorial member, issued its call for papers a few weeks ago.

This special issue, subtitled “Processing of Out-of-the-box Language Expressions,” is the sequel to the two “Unnatural Language Processing” events held last year. We welcome not only regular academic papers but also papers describing systems and data related to the theme.

Although we’ve prepared the CFP only in Japanese, which you can see here, this doesn’t mean that we are excluding any submissions in English.

By the way, I’ve seen some arguments on Twitter about the title, claiming that the word “unnatural” is not suitable for the theme because the language phenomena this special issue focuses on are precisely examples of human “natural” language activity.

Let me explain a little about this: we (and ANLP, too) have absolutely no intention of declaring or defining them as “unnatural.” You can see that we consistently use the word “out-of-the-box” in the CFP. We regard the title as just an alias for this field, which targets the processing of language phenomena that have not received much attention so far because they are irregular and/or new. Further academic discussion should follow in the near future.

Anyway, the submission deadline is March 23rd, 2012. We welcome your submissions!

 

The Japanese morphological analyzer MeCab can also be called directly from Clojure by using its Java binding. I did, however, come across some pitfalls related to JNI in the process, so I’ll describe below how I overcame them so that nobody else has to stumble over the same issues.

The first thing you have to do is install MeCab’s Java binding, which is rather straightforward. Download mecab-java-0.98pre3.tar.gz (the latest version at the time of writing) from here, then untar it and run make. (Be sure to set the INCLUDE variable to an appropriate path if you are using a non-typical environment, such as OpenJDK.)

One issue I encountered here was that the JVM died with SIGSEGV when I tried to run the sample Java program:

#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000003091c7b59b, pid=32167, tid=1106934080
#

I found one blog article which describes exactly the same problem. As the article suggests, adding the following two lines after line 710 of MeCab_wrap.cxx did the trick for me:

char work[128] ; // add this line
sprintf(work,"result:%0x\n",result); // add this line

Now you are ready to use MeCab from Clojure code. Make sure that MeCab.jar is on the CLASSPATH and libMeCab.so is loadable; both files are created by running “make.”

The thing is, even after importing MeCab, Tagger, and Node from org.chasen.mecab and running (System/loadLibrary "MeCab"), the Clojure code will complain with an “UnsatisfiedLinkError,” which basically means that the necessary native library could not be loaded properly.

The reason, as I found out after a full hour of struggling through trial and error, is that the library is not loaded properly because of Clojure’s class loader. The solution, provided here, is to call the “Runtime/loadLibrary0” method directly using wall-hack-method, so that the library is loaded by the class loader of the class that the caller specifies:

(use '[clojure.contrib.java-utils :only (wall-hack-method)])
(import '(org.chasen.mecab MeCab Tagger Node))

(defn load-lib [class lib]
  (wall-hack-method java.lang.Runtime "loadLibrary0" [Class String]
                    (Runtime/getRuntime) class lib))

(load-lib MeCab "MeCab")

You can now call MeCab via Clojure’s typical Java interop functions/macros:

(println (MeCab/VERSION))
(let [tagger (Tagger.)
      sent "太郎は二郎にこの本を渡した。"]
  ;; Print the whole analysis as a single string.
  (println (.parse tagger sent))

  ;; Walk the linked list of nodes until getNext returns nil.
  (loop [node (.parseToNode tagger sent)]
    (when node
      (println (str (.getSurface node) "\t" (.getFeature node)))
      (recur (.getNext node)))))

0.98pre3
太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
二郎 名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー
に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
この 連体詞,*,*,*,*,*,この,コノ,コノ
本 名詞,一般,*,*,*,*,本,ホン,ホン
を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
EOS

BOS/EOS,*,*,*,*,*,*,*,*
太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
二郎 名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー
に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
この 連体詞,*,*,*,*,*,この,コノ,コノ
本 名詞,一般,*,*,*,*,本,ホン,ホン
を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
BOS/EOS,*,*,*,*,*,*,*,*
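Once you have the feature strings, you will probably want them as data rather than as comma-separated text. Here is a minimal sketch that turns one feature string into a map, assuming the nine-column layout seen in the output above (the IPA dictionary format). The keyword names are my own labels, not part of MeCab’s API.

```clojure
(require '[clojure.string :as string])

(def ipadic-fields
  "Hypothetical labels for the nine comma-separated columns of a feature string."
  [:pos :pos-detail-1 :pos-detail-2 :pos-detail-3
   :conj-type :conj-form :base-form :reading :pronunciation])

(defn parse-feature
  "Turn a feature string into a map keyed by ipadic-fields.
  If the string has fewer columns, the missing keys are simply absent."
  [feature]
  (zipmap ipadic-fields (string/split feature #",")))

(def sample "動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ")

(println (:base-form (parse-feature sample)))
;; prints 渡す
```

Inside the loop above, (parse-feature (.getFeature node)) would give you the part of speech and base form of each node directly.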

 

We went on a short trip to Toronto over the weekend, visiting my wife’s old friends, one of whom is now spending a week in her hometown.

We visited Niagara Falls and downtown Toronto (the CN Tower, formerly the world’s tallest, was amazing, and the Casa Loma castle was fun), had a BBQ at their wonderful house, and even enjoyed Cantonese-style dim sum!

I didn’t know that the Chinese culture brought by the large number of immigrants had penetrated so deeply into the city; we found a lot of Chinese-style restaurants and supermarkets on the streets.

Both of our friends actually have Cantonese roots, which makes their cultural background very diverse. The conversation was linguistically diverse as well, ranging from Cantonese and English to Mandarin and even some Japanese, switching from one language to another even within a single sentence. It’s a pity that I’m all thumbs when it comes to Cantonese; I’m always motivated by linguistic diversity but have never had time to master it. Still, it’s pleasant to my ears just to listen to them speaking and enjoy the tonal language’s melodies and exotic vowels.

Anyway, we really thank Diana and Niki for their hospitality and the fun time. We’ll definitely be back soon!