Hello and welcome to Masato Hagiwara's home page. I'm
currently a PhD student at Nagoya University, Japan. My
research interests include statistical natural language
processing (NLP), especially lexical knowledge
acquisition.
To contact me, please send an email to hagisan [at] gmail.com.
You can find my current CV (resume) here. Japanese version.
Research Interests
- 1. Lexical Knowledge Acquisition from Large Corpora
- 2. Japanese Query Alteration
Acquiring lexical knowledge such as synonyms, hypernyms, and hyponyms from large corpora is an important issue in natural language processing, with a broad range of applications. Many acquisition methods are based on the distributional hypothesis, under which the relatedness of words is measured by the commonality of their contexts. I have been working in particular on the extraction, selection, and extension of useful contextual information for distributional similarity, and have proposed several extension and selection methods. I evaluated the results against existing thesauri such as WordNet and confirmed that these methods improve acquisition performance.
When acquiring lexical knowledge, modeling word-context co-occurrence is essential. I applied probabilistic latent semantic models such as PLSA to address the data sparseness problem and improve performance.
I have also been working on lexical knowledge acquisition using syntactic patterns and machine learning techniques. Recent results show that this approach achieves a performance improvement of more than 50% over conventional methods based on the vector space model and distributional similarity.
Query correction is an important problem for robust Web information retrieval. I proposed and implemented a unified approach to Japanese query alteration during an internship project at Microsoft Research. The approach builds on conventional spelling correction methods for English, but differs from them in that (1) it can handle re-writing to candidates that are semantically similar but distinct in spelling, and (2) it uses kernel-based lexical semantic similarity to avoid the data sparseness problem when computing query-candidate similarity.
I also proposed an efficient method for generating query-candidate re-writing pairs from search session logs. It showed better re-writing performance than conventional methods for English query correction.
Education
- Apr. 2006 - Mar. 2009 (Expected) : Ph.D. Candidate, Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan
Doctoral Thesis: "Modeling and Selection of Context for Better Synonym Acquisition"
- Apr. 2004 - Mar. 2006 : Master's Program in Department of Information Engineering, Graduate School of Information Science, Nagoya University, Japan
* Entered using the grade-skipping system. Overall GPA: 3.8
Master's Thesis: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"
- Apr. 2001 - Mar. 2004 : Information Engineering Course, School of Engineering, Nagoya University, Japan. Computer Science GPA: 3.9
Experience
- Apr. 2008 - Jul. 2008 : Research Intern - Microsoft Research, WA, USA. (Mentor: Hisami Suzuki)
- Proposed a state-of-the-art method for Japanese query alteration, which corrects misspellings and normalizes the spelling/transliteration variants, with higher accuracy than conventional systems.
- Implemented the system using Visual C#, SQL Server, and Ruby, processing tens of gigabytes of query logs. This system is being integrated into Microsoft Live Search (http://www.live.com/).
- Developed a method to automatically and efficiently generate query re-writing pairs from session logs.
- Presented the project at the 3rd NLP Symposium for Young Researchers and was awarded the outstanding presentation award. International conference papers are being submitted as well.
- Nov. 2006 - Aug. 2007 : Developer - IPA, JAPAN: Exploratory Software Project. (Project Manager: Prof. David J. Farber)
- The project "Serendi: A Location-Aware Social Networking Platform" (http://serendi.org/), a location-aware meta social networking service targeted at GPS-equipped mobile devices, was accepted as an Exploratory Software Project.
- Developed the "compatibility" analysis module, which recommends users in real time based on natural language processing and network analysis. Used PHP, JavaScript, Ruby, MySQL, and ActiveRecord.
- Conducted an extensive user test with more than 50 users and confirmed the reliability of the system.
- Aug. 2005 - Sep. 2005 : Intern (Software Engineer), Google Inc., CA, USA. (Mentors: Dekang Lin and Jun Wu)
- Participated in the two-month internship program as one of the few interns chosen from Japan; it was only the second year of the program.
- Worked on Japanese query suggestion, which is currently used as the basis for the query suggestions shown at the top and bottom of Google search result pages.
- Made full use of parallel distributed computation techniques such as MapReduce and of Google's large cluster infrastructure.
- Apr. 2006 - Mar. 2007 : Research Assistant, Nagoya University
- Worked on research projects related to the 21st Century COE Program "Intelligent Media Integration for Social Information Infrastructure" at Nagoya University.
- Proposed and implemented context extension and selection methods for lexical similarity computation, to improve the construction of linguistic resources such as thesauri.
- Published several papers at top-tier international conferences as well as in journals (see the "Publications" section).
- Sep. 2006 - Mar. 2006, Sep. 2006 - Mar. 2007 : Teaching Assistant, Nagoya University
Taught "Linear Algebra" and "Automata and Formal Language Theory" to undergraduate students.
Publications (Selected)
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Supervised Synonym Acquisition Using Distributional Features and Syntactic Patterns. Journal of Natural Language Processing, 2009 (to appear).
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. A Comparative Study on Effective Context Selection for Distributional Similarity. Journal of Natural Language Processing, Vol. 5, Num. 5, pp. 119-150, 2008.
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Use of Indirect Dependency for Distributional Similarity. Journal of Natural Language Processing, Vol. 15, Num. 4, pp. 19-42, 2008.
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. New Frontiers in Artificial Intelligence: JSAI 2008 Conference and Workshops, Revised Selected papers, Lecture Notes in Computer Science, 14 pages (to appear).
- Masato Hagiwara and Hisami Suzuki. Japanese Query Alteration Based on Lexical Semantic Similarity. Proc. of NAACL HLT 2009 (to appear).
- Nobuyuki Shimizu, Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama and Hiroshi Nakagawa. Metric learning for synonym acquisition. Proc. of COLING 2008, pp. 793-800, 2008.
- Masato Hagiwara. A Supervised Learning Approach to Automatic Synonym Identification based on Distributional Features. Proc. of ACL 2008 Student Research Workshop, pp. 1-6, 2008. [pdf] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Bootstrapping-based Extraction of Dictionary Terms from Unsegmented Legal Text. Proc. of JURISIN 2008, pp. 63-72, 2008. [ppt]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Context Feature Selection for Distributional Similarity. Proc. of IJCNLP 2008, pp. 553-560, 2008. [pdf] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effective Proximity Distance for Word-Based Context. Proc. of SNLP 2007, pp. 105-110, 2007. [ppt] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Effectiveness of Indirect Dependency for Automatic Synonym Acquisition. Proc. of CoSMo 2007, pp. 1 - 8, 2007. [pdf] [ppt]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. Selection of Effective Contextual Information for Automatic Synonym Acquisition. Proc. of COLING/ACL 2006, pp. 353 - 360, 2006. [pdf] [link]
- Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama. PLSI Utilization for Automatic Thesaurus Construction. Proc. of IJCNLP 2005, pp. 334 - 345, 2005. [link]
Other Projects
- frippa (http://www.frippa.com/)
- Developed the entire system of this community-based classified ads service, one of the most active peer-to-peer trading communities in Japan with more than 2,000 users.
- Runs on an original MVC framework based on Linux, MySQL, ActiveRecord, Ruby, etc.
- Implemented a feature that shows users related items using natural language processing.
- Provided the item database in the joint project with the Reuse Market for furniture and appliances at Nagoya University in 2007, as a social contribution activity.
Also worked on user interfaces utilizing Ajax and Flash as a temporary developer at a few IT start-up companies, including RINEN.inc (http://rinen.cc/) and Anchor (http://anchor.vc/).
Awards & Professional Activities
- Outstanding Presentation Award at the 3rd NLP Symposium for Young Researchers. Presentation: "A Unified Approach to Japanese Query Alteration based on Semantic Similarity"
- Outstanding Presentation Award at the 22nd IMI Seminar of the 21st Century COE Program. Presentation: "Utilization of Probabilistic Latent Semantics for Automatic Thesaurus Construction"
- Program Committee of the Student Research Workshop (SRW) at ACL-IJCNLP 2009 (Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing).
Computer Skills
- Languages : C, C++, C#, Java, Ruby, PHP, JavaScript, SQL, (D)HTML, ActionScript
- Platforms: Windows, Linux
3+ years of Web application development experience, including LAMP architecture
Natural Language Skills
- Japanese : Native
- English : Advanced - TOEIC score 960 (2007)
- French and Chinese (Mandarin) : Elementary