I recently found out that tatoeba.org is a pretty nice resource for collecting parallel text in many languages. The main reason I love it is that the entire dataset is downloadable as a dump file, with all the sentences released under a Creative Commons license (although some of the sentences contain mistakes). Specifically, [...]
Last Wednesday, we held the first meetup of East Asian Language Learning through Interpretation Methods. The purpose is to brush up your language skills (we target East Asian languages, namely Chinese, Japanese, and Korean) through interpretation methods. Although this was the first time we had ever set up a meetup group, it was quite [...]
Since Clojure runs on the JVM, you can easily pick a publicly available Java library (machine learning, multimedia processing, or whatever) and call it. Calling Java libraries is normally straightforward thanks to Clojure’s interop features, but you could spend hours reading the library’s API documentation and tweaking your code accordingly, especially if you [...]
I think I’m a quick learner when it comes to languages. After learning several languages (some successfully, others not), I found that the effective ways to learn languages do not vary much from one language to another: reading aloud, repetition, and shadowing, which are actually used well-established training methods [...]
It’s already the final day of ACL-HLT 2011. Overall, I enjoyed the conference very much: listening to new ideas, algorithms, and tasks, catching up with old acquaintances and friends, and meeting new researchers, whether from Japan, the U.S., or elsewhere. I especially liked the presidential speech by Kevin Knight at the banquet on the second day. What would [...]
The big difference at this year’s ACL is that the best papers were made public before the main conference started. Two of the best papers are graph-based, although I don’t think this is evidence that the recent research trend is toward graph-based models. An important thing here is that what is truly [...]
The first day of the ACL-HLT 2011 main conference is now over. I very much enjoyed listening to talks and meeting researchers I had not seen in a long time. I especially enjoyed the session “NLP for Web 2.0” (this name is kind of outdated now, I suppose), where several studies on Twitter information extraction and spelling correction [...]
The first day of ACL-HLT 2011, the tutorial day, is now finished. I attended the following two sessions, and I liked Marius Pasca’s talk about query logs very much. I have simply pasted my notes below. I think I mainly wrote down what was NOT said in the slides (or handouts). I [...]
I am now getting ready to set off to Portland to participate in ACL-HLT 2011. I’ve learned that Portland is a little bit cooler than New York. I hope I can meet you all there at the venue. After the 2011 Great East Japan Earthquake, we had some opportunities to write / talk about the ANPI_NLP (ANPI stands for “safety [...]
ACL-HLT 2011, probably the most important international NLP conference, is coming up next week. I’ve prepared my presentation (after receiving my colleagues’ helpful comments), so I am uploading the camera-ready paper and the PPT slides here. We short paper presenters only have 10 minutes each, excluding questions and answers. This limitation forces me to remove [...]
About the Author
Masato Hagiwara currently works as a Senior Scientist at Rakuten Institute of Technology in New York. He has previously worked on search technologies at Google, Microsoft Research, and Baidu. He is an expert in Natural Language Processing (NLP) and a lead translator of the O'Reilly book "Natural Language Processing with Python." He is a native speaker of Japanese with a good command of English and Chinese (Mandarin). For more information, see About Me.
Pages
- 100 NLP Papers
- About Me
- iconlang – new ideographic writing system for better visibility and legibility
- iconlang – new ideographic writing system for better visibility and legibility (Japanese version)
- Music
- Music for Language Fans
- NLTK Japanese Corpora – Japanese corpora usable with NLTK
- Python/Romkan – a Python library for converting between romaji and hiragana
- TinySegmenter in Python
- The Complete Guide to Learning Chinese | Mastering Chinese Within One Year
- Rolled-R Clinic – a site for overcoming the rolled R together