What language technologies would a “World Government” need 30 years from now?
And why don’t we just start now?

To contact me, follow me on Twitter (@mhagiwara) or email me at hagisan [at] gmail.com.

My current CV (résumé) is available in both English and Japanese.

Research Interests

  1. “Un”natural Language Processing
     Unnatural Language Processing (UNLP) is a research field within NLP that deals with “real,” noisy language data that conventional “textbook” NLP techniques cannot handle. Targets of UNLP include, but are not limited to, Twitter posts, emoticons, noisy data, irregular named entities (NEs), unknown words, and informal language. The projects I’ve worked on so far are:
    • Emoticon processing for mobile search engines
    • The First Unnatural Language Processing Contest, hosted by Baidu Japan
    • The Second Unnatural Language Processing Thematic Session at NLP2011
    • ANPI_NLP (Safety Information Mining Project for the 2011 Tohoku Region Pacific Coast Earthquake in Japan)

  2. Lexical Knowledge Acquisition using Machine Learning and Graph-Theoretic Approaches
     Worked on the use of latent semantic models to acquire lexical knowledge from large corpora. Recently focusing on the use of graph kernels for knowledge extraction from unsegmented Japanese text.

  3. Japanese Transliteration and Query Alteration
     Focusing on multilingual latent semantic transliteration models and query alteration.

Work Experience

  • Oct. 2010 – Present: Senior Scientist – Rakuten Institute of Technology (in New York)
  • Apr. 2009 – Sep. 2010: Research and Development Engineer – Baidu Japan, Inc. (worked in Shanghai / Beijing / Tokyo)
    • Planned and served as lead developer for various projects, including the Unnatural Language Processing Contest, the Baidu Mobile Corpus, and the Timed Corpus.
    • Worked on ranking and page-analysis algorithms, including spam detection, for Baidu mobile search, as well as on mobile emoticon search using various NLP semantic analysis techniques.
    • Also worked on various other NLP topics, including word and sentence analysis, synonym mining and dictionary construction, proper noun detection, and the Japanese input method BaiduType.
  • Apr. 2008 – Jul. 2008: Research Intern – Microsoft Research, WA, USA (Mentor: Hisami Suzuki)
    • Proposed a state-of-the-art method for Japanese query alteration, which corrects misspellings and normalizes spelling and transliteration variants with higher accuracy than conventional systems.
    • Implemented the system using Visual C#, SQL Server, and Ruby, processing tens of gigabytes of query logs. The system is being integrated into Microsoft Live Search (http://www.live.com/).
    • Developed a method to automatically and efficiently generate query rewriting pairs from session logs.
    • Presented the project at the 3rd NLP Symposium for Young Researchers and received the Outstanding Presentation Award. Papers for international conferences are also being submitted.
  • Nov. 2006 – Aug. 2007: Developer – IPA, Japan: Exploratory Software Project (Project Manager: Prof. David J. Farber)
    • The project “Serendi: A Location-Aware Social Networking Platform” (http://serendi.org/), a location-aware meta social networking service for GPS-equipped mobile devices, was accepted into the Exploratory Software Project.
    • Developed the “compatibility” analysis module, which recommends compatible users in real time based on natural language processing and network analysis. Used PHP, JavaScript, Ruby, MySQL, and ActiveRecord.
    • Conducted an extensive user test with more than 50 users and confirmed the reliability of the system.
  • Aug. 2005 – Sep. 2005: Intern (Software Engineer), Google Inc., CA, USA (Mentors: Dekang Lin and Jun Wu)
    • Participated in the two-month internship program as one of the few interns selected from Japan, in only the second year of the program.
    • Worked on Japanese query suggestion, which currently serves as the basis for the query suggestions shown at the top and bottom of Google search results pages.
    • Made extensive use of Google’s parallel distributed computing frameworks, such as MapReduce, and its large-scale cluster infrastructure.
  • Apr. 2006 – Mar. 2007: Research Assistant, Nagoya University
    • Worked on research projects related to the 21st Century COE Program “Intelligent Media Integration for Social Information Infrastructure” at Nagoya University.
    • Proposed and implemented methods for extending and selecting context in lexical similarity computation, improving the construction of linguistic resources such as thesauri.
    • Published several papers at top-tier international conferences and in journals (see the “Publications” section).
  • Sep. 2005 – Mar. 2006, Sep. 2006 – Mar. 2007: Teaching Assistant, Nagoya University
    • Taught “Linear Algebra” and “Automata and Formal Language Theory” to undergraduate students.

Education

  • Apr. 2006 – Mar. 2009: Ph.D. Candidate, Department of Information Engineering,
         Graduate School of Information Science, Nagoya University, Japan
         Doctoral Thesis: “Modeling and Selection of Context for Better Synonym Acquisition”
  • Apr. 2004 – Mar. 2006: Master’s Program, Department of Information Engineering,
         Graduate School of Information Science, Nagoya University, Japan
         * Entered early through the grade-skipping system. Overall GPA: 3.8
         Master’s Thesis: “Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction”
  • Apr. 2001 – Mar. 2004: Information Engineering Course, School of Engineering,
         Nagoya University, Japan. Computer Science GPA: 3.9

Publications (Selected)

    Books and Articles

  • Steven Bird, Ewan Klein, Edward Loper. Japanese translation by 萩原正人 (Masato Hagiwara), 中山敬広 (Takahiro Nakayama), 水野貴明 (Takaaki Mizuno). 入門 自然言語処理 (Natural Language Processing with Python). O’Reilly Japan, 2010.

    Journal Papers

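  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. グラフカーネルを用いた非分かち書き文からの漸次的語彙知識獲得 (Incremental Lexical Knowledge Acquisition from Unsegmented Text Using Graph Kernels). 人工知能学会誌 (Journal of the Japanese Society for Artificial Intelligence), Vol. 26, No. 3, pp.-, Mar. 2011.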
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, Vol. 16, Num. 2, pp. 59-83, 2009.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 5, Num. 5, pp. 119-150, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected Papers, Lecture Notes in Computer Science, Vol. 5447, pp. 213-227, 2009.

    Conference Papers

  • Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami. Safety Information Mining — What can NLP do in a disaster —. Proc. of IJCNLP 2011.
  • Masato Hagiwara and Satoshi Sekine. Latent Class Transliteration based on Source Language Origins. Proc. of ACL-HLT 2011.
  • Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009, pp. 191-199, 2009.
  • Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric Learning for Synonym Acquisition. Proc. of COLING 2008, pp. 793-800, 2008.
  • Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. Proc. of JURISIN 2008, pp. 63-72, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Proximity Distance for Word-Based Context. Proc. of SNLP 2007, pp. 105-110, 2007.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effectiveness of Indirect Dependency for Automatic Synonym Acquisition. Proc. of CoSMo 2007, pp. 1-8, 2007.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353-360, 2006.
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334-345, 2005.

Software and Projects

Awards & Professional Activities

Computer Skills

  • Languages: C, C++, C#, Clojure, Python, Ruby, JavaScript, (D)HTML
  • Applications: Solr, MongoDB, MySQL, NLTK
  • Platforms: Windows, Linux
  • 5+ years of web application development experience, including the LAMP stack

Natural Language Skills

  • Japanese: Native
  • English: Fluent – TOEIC score 960 (2007)
  • Chinese (Mandarin): Advanced – New HSK (汉语水平考试) Grade 6 (Dec. 2010)
 
