I have been working on a weekend project called “jufsisku” for the past few weeks. The goal is to build a search engine where you can look up Lojban-English translations using queries in either language. You can try out the search here: http://lojban.lilyx.net/jufsisku/ I have shown the demo to a group of Japanese-speaking lojbanist [...]
The paper we submitted to IJCNLP 2011 has been accepted and will be presented at the conference a few weeks from now. The paper describes the #ANPI_NLP project, a volunteer relief project focusing on text and safety information mining in the wake of the East Japan Earthquake of March 2011. [...]
Over Labor Day weekend, my wife and I paid a visit to Penn State University, which is located in State College, in the middle of the state of Pennsylvania. It was a four-and-a-half-hour bus ride from New York City, taking Megabus on the way out and Gotobus for the return trip, which was not very comfortable. Our purpose [...]
Just for my own convenience, I’ve listed the best papers of major NLP conferences (ACL / COLING / NAACL / EMNLP / CoNLL) for the past 7 years or so. If you find anything wrong or missing, please let me know. Thanks~ ACL 2005: David Chiang, “A hierarchical phrase-based model for statistical machine translation”; 2006: Rion [...]
The special issue “Unnatural Language Processing” of the Journal of Natural Language Processing, for which I’m a lead editorial member, issued its call for papers a few weeks ago. This special issue, subtitled “Processing of Out-of-the-box Language Expressions,” is the sequel to last year’s two “Unnatural Language Processing” events. The topics include [...]
The Japanese morphological analyzer MeCab can also be called directly from Clojure by using its Java binding. I did, however, come across some JNI-related pitfalls in the process, so I’ll describe below how I overcame them so that others don’t have to stumble over the same issues. The first [...]
Since Clojure runs on the JVM, you can easily pick a publicly available Java library (machine learning, multimedia processing, or whatever) and call it. Calling Java libraries is normally straightforward thanks to Clojure’s Java interop features, but you could spend hours reading the library’s API documentation and tweaking your code accordingly, especially if you [...]
It’s already the final day of ACL-HLT 2011. Overall, I enjoyed the conference very much: listening to new ideas, algorithms, and tasks, catching up with old acquaintances and friends, and meeting new researchers from Japan, the U.S., and elsewhere. I especially liked the presidential speech by Kevin Knight at the banquet on the second day. What would [...]
The big difference at this year’s ACL is that the best papers were made public before the main conference started. Two of the best papers are graph-based, although I don’t think this is evidence that the recent research trend is toward graph-based models. An important thing here is that what is truly [...]
The first day of the ACL-HLT 2011 main conference is now over. I very much enjoyed listening to talks and catching up with researchers I had not seen in a long time. I especially enjoyed the session “NLP for Web2.0” (this name is kind of outdated now, I suppose), where several studies on Twitter information extraction and spelling correction [...]
About the Author
Masato Hagiwara currently works for Rakuten Institute of Technology in New York as a Senior Scientist. Has worked on search technologies at Google, Microsoft Research, and Baidu in the past. Expert in Natural Language Processing (NLP). Also a lead translator of the O'Reilly book "Natural Language Processing with Python." A native speaker of Japanese. Good command of English and Chinese (Mandarin). For more information, see About Me.
Pages
- 100 NLP Papers
- About Me
- iconlang – new ideographic writing system for better visibility and legibility
- iconlang – a new ideographic writing system for improved visibility and distinguishability (Japanese version)
- Music
- Music for Language Fans
- NLTK Japanese Corpora – Japanese corpora usable with NLTK
- Python/Romkan – a Python library for converting between romaji and hiragana
- TinySegmenter in Python
- The Complete Guide to Learning Chinese | Mastering Chinese Within One Year
- Rolled-R Clinic – a site for overcoming the rolled R together