<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>UnNatural Language Processing Blog</title>
	<atom:link href="http://lilyx.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://lilyx.net</link>
	<description>UnNatural Language Processing (UNLP) Blog, presented by Masato Hagiwara</description>
	<lastBuildDate>Wed, 15 Feb 2012 21:56:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Lojban translation search &#8220;lojbo jufsisku&#8221;</title>
		<link>http://lilyx.net/2012/02/15/lojban-translation-search-lojbo-jufsisku/</link>
		<comments>http://lilyx.net/2012/02/15/lojban-translation-search-lojbo-jufsisku/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 21:54:40 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Clojure]]></category>
		<category><![CDATA[Lojban]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Search Technologies]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=419</guid>
		<description><![CDATA[I have been working on a weekend project called &#8220;jufsisku&#8221; for the past few weeks. The goal of the project is to build a search engine where you can look up Lojban-English translations using queries in either language. You can try out the search here: http://lojban.lilyx.net/jufsisku/ I showed the demo to a group of Japanese-speaking lojbanists [...]]]></description>
			<content:encoded><![CDATA[<p>I have been working on a weekend project called &#8220;jufsisku&#8221; for the past few weeks. The goal of the project is to build a search engine where you can look up Lojban-English translations using queries in either language. You can try out the search here:</p>
<p><a href="http://lojban.lilyx.net/jufsisku/">http://lojban.lilyx.net/jufsisku/</a></p>
<p>I showed the demo to a group of Japanese-speaking lojbanists at our Skype study group the other day, and announced the initial version on the English-speaking mailing list for lojbanists. Overall, it was well received, and I&#8217;m glad that several people said they liked it. I believe that a large collection of good-quality translations (and a system to search them) is essential not only when you translate documents but also when you write in a foreign language. Dictionaries don&#8217;t help very much, because you need to know not only which words to use but also how to use them. This issue is more serious for languages with few speakers and learning materials, such as Lojban, which is why I decided to start this project.</p>
<p>Let me explain the system architecture of lojbo jufsisku, since it is built on exciting (and relatively recent) open source software rather than, well, the usual MySQL &#038; PHP stack. Its search back-end is <a href="http://lucene.apache.org/solr/">Apache Solr</a>, a very capable full-text search server. The entire web application is written with <a href="https://github.com/weavejester/compojure">Compojure</a>, a <a href="http://clojure.org/">Clojure</a>-based web application framework. (By the way, I have tried many programming languages in the past, including Ruby, Python, Java, JavaScript, and PHP, but Clojure is by far the best language for me.) Compojure and Clojure, the language it is built on, are so powerful that the whole application is only 300 lines of code, including EVERYTHING: logic, HTML generation, CSS, database storage, and so on.</p>
<p>The translation data is stored in <a href="http://www.mongodb.org/">MongoDB</a>, a flexible &#8220;NoSQL&#8221; database system, to which users can add new sentences.</p>
<p>lojbo jufsisku is only the first step toward my long-term goal of providing the best learning environment for Lojban. Any feedback is appreciated.</p>
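<p>To make the search side of this architecture a little more concrete, here is a minimal Clojure sketch (not the actual jufsisku code) of how the web application could build a query URL for Solr. The endpoint http://localhost:8983/solr/select and the field names jbo and eng are hypothetical assumptions; q, rows, and wt are standard Solr request parameters.</p>

```clojure
(import '(java.net URLEncoder))
(require '[clojure.string :as string])

;; Hypothetical Solr endpoint: the real host, port, and core are not given in the post.
(def solr-select-url "http://localhost:8983/solr/select")

(defn- url-encode [x]
  (URLEncoder/encode (str x) "UTF-8"))

;; Build a Solr select URL that searches two hypothetical fields,
;; "jbo" (Lojban) and "eng" (English), so one query box covers both
;; languages. Parameters are kept in a vector to fix their order.
(defn search-url [query rows]
  (let [params [["q"    (str "jbo:(" query ") OR eng:(" query ")")]
                ["rows" rows]
                ["wt"   "json"]]]
    (str solr-select-url "?"
         (string/join "&" (for [[k v] params]
                            (str (url-encode k) "=" (url-encode v)))))))
```

<p>For example, (search-url "coi" 10) yields http://localhost:8983/solr/select?q=jbo%3A%28coi%29+OR+eng%3A%28coi%29&#038;rows=10&#038;wt=json, which could then be fetched with (slurp ...) from a running Solr instance. A real application would also need to escape characters that are special in Lucene&#8217;s query syntax (such as colons and parentheses) in the user&#8217;s input.</p>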
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2012/02/15/lojban-translation-search-lojbo-jufsisku/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What can NLP do in a Disaster?</title>
		<link>http://lilyx.net/2011/10/26/what-can-nlp-do-in-a-disaster/</link>
		<comments>http://lilyx.net/2011/10/26/what-can-nlp-do-in-a-disaster/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 22:40:11 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Unnatural Language Processing]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=398</guid>
		<description><![CDATA[The paper we submitted to IJCNLP2011 has been accepted and will be presented at the conference, which will be held a few weeks from now. The paper describes the #ANPI_NLP project, a volunteer relief project focusing on text and safety information mining in the wake of the Great East Japan Earthquake in March 2011. [...]]]></description>
			<content:encoded><![CDATA[<p>The paper we submitted to <a href="http://www.ijcnlp2011.org/">IJCNLP2011</a> has been accepted and will be presented at the conference, which will be held a few weeks from now.<br />
The paper describes the <a href="http://trans-aid.jp/ANPI_NLP/index.php/メインページ">#ANPI_NLP</a> project, a volunteer relief project focusing on text and safety information mining in the wake of the Great East Japan Earthquake in March 2011.</p>
<p>Here&#8217;s the <a href="http://www.phontron.com/paper/anpinlp11ijcnlp.pdf">full paper PDF</a> (kindly uploaded by the lead author, Mr. Graham Neubig).</p>
<p>In the paper, we not only describe how the project started and evolved and what kinds of tasks we dealt with, but also focus on the lessons we learned from the experience.<br />
Even after the submission, we received useful feedback from colleagues and fellow researchers. In retrospect, there was more we could have done during the relief effort, and even BEFORE any disaster happens.<br />
Please read the paper if you are interested, and send us any feedback. (The floods in Thailand are still continuing as I write this article; I hope the conference will be held without any problems.)</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/10/26/what-can-nlp-do-in-a-disaster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Providing a Free Intensive Chinese Weekend Stay Program in New York City</title>
		<link>http://lilyx.net/2011/10/02/providing-a-free-intensive-chinese-weekend-stay-program-in-new-york-city/</link>
		<comments>http://lilyx.net/2011/10/02/providing-a-free-intensive-chinese-weekend-stay-program-in-new-york-city/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 03:33:53 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Chinese]]></category>
		<category><![CDATA[Language Learning]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=387</guid>
		<description><![CDATA[My wife and I have decided to offer a program named &#8220;Intensive Chinese Weekend Stay&#8221; in New York. In this program, we invite a learner of Chinese to our home free of charge and provide an intensive learning course. Intensive Chinese Weekend Stay Program (New York City) Part of the reason we started this [...]]]></description>
			<content:encoded><![CDATA[<p>My wife and I have decided to offer a program named &#8220;Intensive Chinese Weekend Stay&#8221; in New York. In this program, we invite a learner of Chinese to our home free of charge and provide an intensive learning course.</p>
<p><a href="http://www.asianharmony.net/intensive-chinese-weekend-stay-program-new-york-city/">Intensive Chinese Weekend Stay Program (New York City)</a></p>
<p>Part of the reason we started this program is that we recently signed up for <a href="http://www.couchsurfing.org/">CouchSurfing</a>, which we found very interesting (we are actually hosting an American girl next weekend, just two weeks after signing up!). When hosting, we decided to impose one condition: guests should speak, or at least be learning, one of the CJK (i.e., Chinese, Japanese, and Korean) languages, so that they can deepen their understanding of East Asian languages and cultures.</p>
<p>The &#8220;Intensive Chinese Weekend Stay&#8221; is an extension of this concept. In this program, we are going to provide a comprehensive review of pronunciation and grammar. If you are interested, please see the page above for details and apply!</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/10/02/providing-a-free-intensive-chinese-weekend-stay-program-in-new-york-city/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Something More Important Than Just Attending Conferences</title>
		<link>http://lilyx.net/2011/09/10/something-more-important-than-just-attending-conferences/</link>
		<comments>http://lilyx.net/2011/09/10/something-more-important-than-just-attending-conferences/#comments</comments>
		<pubDate>Sat, 10 Sep 2011 16:28:54 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=367</guid>
		<description><![CDATA[Over the Labor Day weekend, my wife and I visited Penn State University, which is located in State College, in the middle of Pennsylvania. It was a four-and-a-half-hour bus ride from New York City (Megabus on the way there and Gotobus on the way back), which was not very comfortable. The purpose [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://lilyx.net/wp-content/uploads/2011/09/photo.jpg"><img src="http://lilyx.net/wp-content/uploads/2011/09/photo.jpg" alt="" title="photo" width="200" height="160" class="alignleft" /></a></p>
<p>Over the Labor Day weekend, my wife and I visited Penn State University, which is located in <a href="http://maps.google.com/maps?q=State+College,+Centre,+Pennsylvania&#038;hl=en&#038;ll=40.791849,-77.859077&#038;spn=0.078757,0.109692&#038;sll=37.0625,-95.677068&#038;sspn=23.875,57.630033&#038;geocode=FTN1bgIdX_Nb-w&#038;t=m&#038;z=13&#038;vpsrc=6">State College</a>, in the middle of Pennsylvania. It was a four-and-a-half-hour bus ride from New York City (Megabus on the way there and Gotobus on the way back), which was not very comfortable.</p>
<p>The purpose was to visit a professor there who specializes in Confucianism and East Asian history. The visit was especially valuable for my wife, giving her a deeper look into the field of East Asian philosophy and helping her set her future research direction. (The Hong Kong-born professor and we talked in Mandarin, which was interesting for me, too.)</p>
<p>Visiting researchers around the country like this reminds me of <a href="http://nlpers.blogspot.com/2007/08/conferences-costs-and-benefits.html">the post &#8220;Conferences: Costs and Benefits&#8221; on the natural language processing blog</a>, where the author, Hal Daumé III, argues that inviting well-known researchers to one&#8217;s own university, visiting labs around the country, and having in-depth conversations in their offices can compensate for the large amount of money we usually spend on domestic and international conferences every year.</p>
<p>The longer I work here at Rakuten Institute of Technology, New York, the more I agree with this idea. The value of working at a hub that top-tier researchers visit all the time cannot be overstated. That&#8217;s one of the reasons why places like Google and Microsoft Research stay so competitive: many researchers and top engineers come by to give &#8220;tech talks.&#8221; That could be much more important than simply attending every conference, good and not-so-good alike. Personally, I would also like to create more opportunities like this, hopefully starting this year.</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/09/10/something-more-important-than-just-attending-conferences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>List of Past NLP Conference Best Papers</title>
		<link>http://lilyx.net/2011/08/30/list-of-past-nlp-conference-best-papers/</link>
		<comments>http://lilyx.net/2011/08/30/list-of-past-nlp-conference-best-papers/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 01:38:14 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=341</guid>
		<description><![CDATA[Just for my convenience, I&#8217;ve listed the best papers of the major NLP conferences (ACL / COLING / NAACL / EMNLP / CoNLL) for the past 7 years or so. If you find any errors, please let me know. Thanks~ ACL 2005: David Chiang A hierarchical phrase-based model for statistical machine translation 2006: Rion [...]]]></description>
			<content:encoded><![CDATA[<p>Just for my convenience, I&#8217;ve listed the best papers of the major NLP conferences (ACL / COLING / NAACL / EMNLP / CoNLL) for the past 7 years or so. If you find any errors, please let me know. Thanks~</p>
<p><b>ACL</b></p>
<ul>
<li>2005: David Chiang  <a href="http://www.isi.edu/~chiang/papers/chiang-acl05.pdf">A hierarchical phrase-based model for statistical machine translation</a></li>
<li>2006: Rion Snow, Daniel Jurafsky, and Andrew Ng. <a href="http://nlp.stanford.edu/pubs/semtax_acl06.pdf">Semantic Taxonomy Induction from Heterogenous Evidence</a></li>
<li>2007: Y. W. Wong and R. J. Mooney <a href="http://www.cs.utexas.edu/users/ml/papers/wasp-logic-acl-07.pdf">Learning synchronous grammars for semantic parsing with lambda calculus</a></li>
<li>2008: Liang Huang. <a href="http://www.cis.upenn.edu/~lhuang3/forest-rerank.pdf">Forest Reranking: Discriminative Parsing with Non-Local Features</a><br />
     Libin Shen, Jinxi Xu and Ralph Weischedel  <a href="A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model">A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model</a></li>
<li>2009: Andre Martins, Noah Smith and Eric Xing. <a href="http://www.cs.cmu.edu/~afm/Home_files/acl2009.pdf">Concise Integer Linear Programming Formulations for Dependency Parsing</a><br />
    S.R.K. Branavan, Harr Chen, Luke Zettlemoyer and Regina Barzilay. <a href="http://aclweb.org/anthology/P/P09/P09-1010.pdf">Reinforcement Learning for Mapping Instructions to Actions</a><br />
   Adam Pauls and Dan Klein. <a href="http://www.aclweb.org/anthology-new/P/P09/P09-1108.pdf">K-Best A* Parsing</a></li>
<li>2010:<br />
Matthew Gerber and Joyce Chai. <a href="http://aclweb.org/anthology/P/P10/P10-1160.pdf">Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates</a></li>
<li>2011: Dipanjan Das and Slav Petrov. <a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/ja/us/pubs/archive/37071.pdf">Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections</a></li>
</ul>
<p><b>COLING</b></p>
<ul>
<li>2006: See ACL 2006</li>
<li>2008: Bill MacCartney and Christopher D. Manning. <a href="http://nlp.stanford.edu/pubs/natlog-coling08.pdf">Modeling semantic containment and exclusion in natural language inference</a></li>
<li>2010: Fan Bu, Xiaoyan Zhu and Ming Li, <a href="http://aclweb.org/anthology/C/C10/C10-1014.pdf">Measuring the Non-compositionality of Multiword Expressions</a></li>
</ul>
<p><b>NAACL</b></p>
<ul>
<li>2006: 	Mehryar Mohri and Brian Roark 	<a href="http://www.cs.nyu.edu/~mohri/pub/spcfg.pdf">Probabilistic Context-Free Grammar Induction Based on Structural Zeros</a></li>
<li>2006: 	Aria Haghighi and Dan Klein 	<a href="http://acl.ldc.upenn.edu/N/N06/N06-1041.pdf">Prototype-Driven Learning for Sequence Models</a></li>
<li>2007: 	Antti-Veikko Rosti, Bing Xiang, Spyros Matsoukas, Richard Schwartz, Necip Fazil Ayan and Bonnie Dorr 	<a href="http://acl.ldc.upenn.edu/N/N07/N07-1029.pdf">Combining Outputs from Multiple Machine Translation Systems</a></li>
<li>2009: 	Hoifung Poon, Colin Cherry and Kristina Toutanova 	<a href="http://www.cs.washington.edu/homes/hoifung/papers/naacl09.pdf"> Unsupervised Morphological Segmentation with Log-Linear Models</a></li>
<li>2009: 	David Chiang, Kevin Knight and Wei Wang 	<a href="http://www.isi.edu/~chiang/papers/11001.pdf">11,001 New Features for Statistical Machine Translation </a></li>
<li>2010: Aria Haghighi and Dan Klein  <a href="http://aclweb.org/anthology/N/N10/N10-1061.pdf">Coreference Resolution in a Modular, Entity-Centered Model</a></li>
</ul>
<p><b>EMNLP</b></p>
<ul>
<li>2005: 	Ryan McDonald, Fernando Pereira, Kiril Ribarov and Jan Hajic 	<a href="http://www.ryanmcd.com/papers/nonprojectiveHLT-EMNLP2005.pdf">Non-Projective Dependency Parsing using Spanning Tree Algorithms</a></li>
<li>2006: NO AWARD</li>
<li>2007: 	James Clarke and Mirella Lapata 	<a href="http://acl.ldc.upenn.edu/D/D07/D07-1001.pdf">Modelling Compression with Discourse Constraints</a></li>
<li>2008: 	NO AWARD</li>
<li>2009: 	Hoifung Poon and Pedro Domingos 	<a href="http://www.aclweb.org/anthology-new/D/D09/D09-1001.pdf">Unsupervised semantic parsing</a></li>
<li>2010: 	Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola and David Sontag 	<a href="http://www.cs.columbia.edu/~mcollins/papers/emnlp10-mst.pdf">Dual Decomposition for Parsing with Non-Projective Head Automata</a></li>
<li>2011:  Wei Lu and Hwee Tou Ng <a href="http://www.comp.nus.edu.sg/~nght/pubs/emnlp11_gen.pdf">A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions </a></li>
</ul>
<p><b>CoNLL</b></p>
<ul>
<li>2006:  Rie Kubota Ando  <a href="http://riejohnson.com/rie/wsd_nll_cr.pdf">Applying Alternating Structure Optimization to Word Sense Disambiguation</a></li>
<li>2007: James Clarke and Mirella Lapata. <a href="http://acl.ldc.upenn.edu/D/D07/D07-1001.pdf">Modelling Compression with Discourse Constraints</a></li>
<li>2008: Xavier Carreras, Michael Collins and Terry Koo. <a href="http://www.cs.columbia.edu/~mcollins/papers/conll.final.pdf">TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-Rich Parsing</a></li>
<li>2009: Roi Reichart and Ari Rappoport. <a href="http://aclweb.org/anthology/W/W09/W09-1103.pdf">Sample Selection for Statistical Parsers: Cognitively Driven Algorithms and Evaluation Measures</a></li>
<li>2010: Alexander Clark <a href="http://www.cs.rhul.ac.uk/home/alexc/papers/conll2010.pdf">Efficient, correct, unsupervised learning for context-sensitive languages</a></li>
<li>2011: Wen-tau Yih, Kristina Toutanova, John Platt, and Chris Meek. <a href="http://aclweb.org/anthology-new/W/W11/W11-0329.pdf">Learning Discriminative Projections for Text Similarity Measures</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/08/30/list-of-past-nlp-conference-best-papers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Call for Papers: Special Issue on &#8220;Unnatural Language Processing&#8221; (Journal of Natural Language Processing)</title>
		<link>http://lilyx.net/2011/08/21/call-for-papers-special-issue-on-unnatural-language-processing-journal-of-natural-language-processing/</link>
		<comments>http://lilyx.net/2011/08/21/call-for-papers-special-issue-on-unnatural-language-processing-journal-of-natural-language-processing/#comments</comments>
		<pubDate>Sun, 21 Aug 2011 16:35:21 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Unnatural Language Processing]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=333</guid>
		<description><![CDATA[The special issue &#8220;Unnatural Language Processing&#8221; of the Journal of Natural Language Processing, for which I&#8217;m a lead editorial member, issued its call for papers a few weeks ago. This special issue, subtitled &#8220;Processing of Out-of-the-box Language Expressions,&#8221; is a sequel to the two &#8220;Unnatural Language Processing&#8221; events held last year. Its scope includes [...]]]></description>
			<content:encoded><![CDATA[<p>The special issue &#8220;Unnatural Language Processing&#8221; of the Journal of Natural Language Processing, for which I&#8217;m a lead editorial member, issued its call for papers a few weeks ago.</p>
<p>This special issue, subtitled &#8220;Processing of Out-of-the-box Language Expressions,&#8221; is a sequel to the two &#8220;Unnatural Language Processing&#8221; events held last year. Its scope includes not only regular academic papers but also papers describing systems and data related to the theme.</p>
<p>Although we&#8217;ve prepared the CFP only in Japanese (<a href="http://www.anlp.jp/home/topic110812.html">you can see it here</a>), this doesn&#8217;t mean that we exclude submissions in English.</p>
<p>By the way, I&#8217;ve seen some arguments on Twitter about the title, claiming that the word &#8220;unnatural&#8221; is not suitable for the theme because the language phenomena this special issue focuses on are themselves examples of &#8220;natural&#8221; human language activity.</p>
<p>Let me explain a little about this: we (and ANLP, too) have absolutely no intention of declaring or defining them as &#8220;unnatural.&#8221; As you can see, we consistently use the term &#8220;out-of-the-box&#8221; in the CFP. We see the title simply as a name for this area, which targets language phenomena that have not received much attention so far because they are irregular and/or new. Further academic discussion should follow in the near future.</p>
<p>Anyway, the submission deadline is March 23rd, 2012. We look forward to your submissions!</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/08/21/call-for-papers-special-issue-on-unnatural-language-processing-journal-of-natural-language-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using MeCab the Japanese Morphological Analyzer from Clojure</title>
		<link>http://lilyx.net/2011/07/30/using-mecab-the-japanese-morphological-analyzer-from-clojure/</link>
		<comments>http://lilyx.net/2011/07/30/using-mecab-the-japanese-morphological-analyzer-from-clojure/#comments</comments>
		<pubDate>Sat, 30 Jul 2011 03:31:53 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Clojure]]></category>
		<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=322</guid>
		<description><![CDATA[The Japanese morphological analyzer MeCab can also be called directly from Clojure through its Java binding. However, I came across some JNI-related pitfalls in the process, so I&#8217;ll describe how I overcame them below so that nobody else has to stumble over the same issues. The first [...]]]></description>
			<content:encoded><![CDATA[<p>The Japanese morphological analyzer <a href="http://mecab.sourceforge.net/">MeCab</a> can also be called directly from Clojure through its Java binding. However, I came across some JNI-related pitfalls in the process, so I&#8217;ll describe how I overcame them below so that nobody else has to stumble over the same issues.</p>
<p>The first thing you have to do is install MeCab&#8217;s Java binding, which is fairly straightforward. Download mecab-java-0.98pre3.tar.gz (the latest version at the time of writing) from <a href="http://sourceforge.net/projects/mecab/files/">here</a>, untar it, and run make. (If you are using a non-standard environment such as OpenJDK, be sure to set the INCLUDE variable to the appropriate path.)</p>
<p>One issue I encountered here is that the JVM crashed with SIGSEGV when I tried to run the Java sample program:</p>
<blockquote><p>
#<br />
# An unexpected error has been detected by Java Runtime Environment:<br />
#<br />
#  SIGSEGV (0xb) at pc=0x0000003091c7b59b, pid=32167, tid=1106934080<br />
#
</p></blockquote>
<p>I found <a href="http://d.hatena.ne.jp/knaka20blue/20090907/1252305752">a blog article</a> that describes exactly the same problem. As the article suggests, adding the following two lines after line 710 of MeCab_wrap.cxx did the trick for me:</p>
<pre>
char work[128] ; // add this line
sprintf(work,"result:%0x\n",result); // add this line
</pre>
<p>Now you are ready to use MeCab from Clojure. Make sure that MeCab.jar is on the CLASSPATH and that libMeCab.so is loadable (for example, by putting its directory on java.library.path); both files are created by running &#8220;make.&#8221;</p>
<p>The catch is that even after importing MeCab, Tagger, and Node from org.chasen.mecab and running (System/loadLibrary &#8220;MeCab&#8221;), the Clojure code complains with an &#8220;UnsatisfiedLinkError,&#8221; which basically means that the native code those classes need could not be found.</p>
<p>As I found out after a full hour of trial and error, the cause is Clojure&#8217;s class loader: the library is not loaded by the class loader that MeCab&#8217;s classes belong to. The solution is to call the private method Runtime.loadLibrary0 directly using wall-hack-method, so that the library is loaded on behalf of the class you pass in, i.e., by that class&#8217;s loader:</p>
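<p>If loading fails, it helps to first check that the JVM can actually see the library file. The following small helper is my own illustration, not part of MeCab: it uses System/mapLibraryName to turn a library name into the platform-specific file name that System/loadLibrary looks for (libMeCab.so on Linux) and searches the directories on java.library.path for it.</p>

```clojure
(import '(java.io File)
        '(java.util.regex Pattern))
(require '[clojure.string :as string])

;; Map a library name to the file name System/loadLibrary looks for
;; ("MeCab" -> "libMeCab.so" on Linux) and return the full path of the
;; first directory on java.library.path that contains it, or nil.
(defn find-native-lib [lib-name]
  (let [file-name (System/mapLibraryName lib-name)
        path      (or (System/getProperty "java.library.path") "")
        dirs      (string/split path (re-pattern (Pattern/quote File/pathSeparator)))]
    (some (fn [dir]
            (let [f (File. ^String dir ^String file-name)]
              (when (.isFile f) (.getPath f))))
          dirs)))
```

<p>(find-native-lib "MeCab") returns the library&#8217;s path, or nil if it is not on java.library.path; in that case, the directory can be added with the -Djava.library.path=... JVM option.</p>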
<pre>
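(import '(org.chasen.mecab MeCab Tagger Node))
(use '[clojure.contrib.java-utils :only (wall-hack-method)])

;; Call the private method Runtime.loadLibrary0(Class, String) so that the
;; native library is associated with the class loader of the given class
;; (here, MeCab) instead of the loader of the calling Clojure code.
(defn load-lib [class lib]
  (wall-hack-method java.lang.Runtime "loadLibrary0" [Class String]
                    (Runtime/getRuntime) class lib))

(load-lib MeCab "MeCab")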
</pre>
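<p>Why does the class loader matter? Under JNI, a native library is associated with the class loader of the class that loads it, and a class&#8217;s native methods are linked only against libraries associated with that class&#8217;s own loader. The helper below is a small diagnostic sketch (not from the original fix) that lists a class&#8217;s loader chain, so you can compare the loader of MeCab with that of the code calling System/loadLibrary.</p>

```clojure
;; List the class loaders responsible for class c, nearest first.
;; Classes from the bootstrap loader (e.g. java.lang.String) report a nil
;; loader, so they produce an empty vector.
(defn loader-chain [^Class c]
  (loop [loader (.getClassLoader c)
         chain  []]
    (if loader
      (recur (.getParent ^ClassLoader loader) (conj chain loader))
      chain)))
```

<p>At the REPL, compare (loader-chain MeCab) with (loader-chain (class (fn []))): when MeCab.jar is on the CLASSPATH, the former typically starts with the application class loader, while functions compiled at the REPL are loaded by a clojure.lang.DynamicClassLoader.</p>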
<p>You can now call MeCab with Clojure&#8217;s usual Java interop forms:</p>
<pre>
(println (MeCab/VERSION))
(let [tagger (Tagger.)
      sent   "太郎は二郎にこの本を渡した。"]
  ;; parse returns the whole analysis as one string, ending with EOS
  (println (.parse tagger sent))
  ;; parseToNode returns a linked list of nodes, starting with the BOS node
  (loop [node (.parseToNode tagger sent)]
    (when node
      (println (str (.getSurface node) "\t" (.getFeature node)))
      (recur (.getNext node)))))
</pre>
<blockquote><p>
0.98pre3<br />
太郎    名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー<br />
は      助詞,係助詞,*,*,*,*,は,ハ,ワ<br />
二郎    名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー<br />
に      助詞,格助詞,一般,*,*,*,に,ニ,ニ<br />
この    連体詞,*,*,*,*,*,この,コノ,コノ<br />
本      名詞,一般,*,*,*,*,本,ホン,ホン<br />
を      助詞,格助詞,一般,*,*,*,を,ヲ,ヲ<br />
渡し    動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ<br />
た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ<br />
。      記号,句点,*,*,*,*,。,。,。<br />
EOS</p>
<p>        BOS/EOS,*,*,*,*,*,*,*,*<br />
太郎    名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー<br />
は      助詞,係助詞,*,*,*,*,は,ハ,ワ<br />
二郎    名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー<br />
に      助詞,格助詞,一般,*,*,*,に,ニ,ニ<br />
この    連体詞,*,*,*,*,*,この,コノ,コノ<br />
本      名詞,一般,*,*,*,*,本,ホン,ホン<br />
を      助詞,格助詞,一般,*,*,*,を,ヲ,ヲ<br />
渡し    動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ<br />
た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ<br />
。      記号,句点,*,*,*,*,。,。,。<br />
        BOS/EOS,*,*,*,*,*,*,*,*
</p></blockquote>
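<p>The loop above can also be written as a lazy sequence, and each feature string can be split into named fields. The sketch below assumes the IPADIC dictionary, whose feature strings have nine comma-separated columns (part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation); the keyword names are my own. The node-seq helper is generic, so it can be tried without MeCab installed.</p>

```clojure
(require '[clojure.string :as string])

;; Keyword names (my own) for IPADIC's nine feature columns.
(def ipadic-fields
  [:pos :pos-detail1 :pos-detail2 :pos-detail3
   :conj-type :conj-form :base-form :reading :pronunciation])

;; "名詞,一般,*,*,*,*,本,ホン,ホン" -> {:pos "名詞", :pos-detail1 "一般", ...}
;; A "*" column (no value) becomes nil; a feature string with fewer
;; columns (e.g. for unknown words) simply yields fewer keys.
(defn parse-feature [feature]
  (zipmap ipadic-fields
          (map #(when-not (= "*" %) %) (string/split feature #","))))

;; Walk a linked list lazily: start from node and follow next-fn until nil.
;; With MeCab: (node-seq (.parseToNode tagger sent) #(.getNext %))
(defn node-seq [node next-fn]
  (lazy-seq
    (when node
      (cons node (node-seq (next-fn node) next-fn)))))
```

<p>With MeCab loaded, (map #(parse-feature (.getFeature %)) (node-seq (.parseToNode tagger sent) #(.getNext %))) would give one map per node, including the BOS/EOS nodes.</p>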
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/07/30/using-mecab-the-japanese-morphological-analyzer-from-clojure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enjoying the diverse culture in Toronto</title>
		<link>http://lilyx.net/2011/07/28/enjoying-the-diverse-culture-in-toronto/</link>
		<comments>http://lilyx.net/2011/07/28/enjoying-the-diverse-culture-in-toronto/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 03:57:22 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Chinese]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=313</guid>
		<description><![CDATA[We took a short trip to Toronto over the weekend, visiting my wife&#8217;s old friends, one of whom is now spending a week in her hometown. We visited Niagara Falls and downtown Toronto (the CN Tower, formerly the world&#8217;s tallest, was amazing, and Casa Loma castle was fun), had a BBQ at their wonderful house, and even enjoyed [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://lilyx.net/wp-content/uploads/2011/07/P1040257.jpg"><img src="http://lilyx.net/wp-content/uploads/2011/07/P1040257.jpg" alt="" title="P1040257" width="256" class="alignleft wp-image-314" /></a></p>
<p>We took a short trip to Toronto over the weekend, visiting my wife&#8217;s old friends, one of whom is now spending a week in her hometown.</p>
<p>We visited Niagara Falls and downtown Toronto (the CN Tower, formerly the world&#8217;s tallest, was amazing, and Casa Loma castle was fun), had a BBQ at their wonderful house, and even enjoyed Cantonese-style dim sum!</p>
<p>I hadn&#8217;t known that the Chinese culture brought by so many immigrants had penetrated so deeply into the city; there were Chinese restaurants and supermarkets all over the streets.</p>
<p>Both of our friends actually have Cantonese roots, which gives them a very diverse cultural background. The conversation was linguistically diverse, too, ranging over Cantonese, English, Mandarin, and even some Japanese, often switching from one language to another within a single sentence. It&#8217;s a pity that I&#8217;m all thumbs when it comes to Cantonese: linguistic diversity has always motivated me, but I have never had time to master it. Still, it was pleasant just to listen to them speak and enjoy the melodies and exotic vowels of this tonal language.</p>
<p>Anyway, we really thank Diana and Niki for their hospitality and fun. We&#8217;ll definitely be back soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/07/28/enjoying-the-diverse-culture-in-toronto/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting Multilingual Parallel Sentences from tatoeba.org</title>
		<link>http://lilyx.net/2011/07/21/extracting-multilingual-parallel-senteces-from-tatoeba-com/</link>
		<comments>http://lilyx.net/2011/07/21/extracting-multilingual-parallel-senteces-from-tatoeba-com/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 03:32:47 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Chinese]]></category>
		<category><![CDATA[Clojure]]></category>
		<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Lojban]]></category>
		<category><![CDATA[Machine Translation]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=309</guid>
		<description><![CDATA[I recently found out that tatoeba.org is a pretty nice resource for collecting parallel text in many languages. The major reason why I love it is that the whole data is downloadable as a dump file, with all the sentences being under the creative commons license (although there are some mistakes in the sentences). Specifically, [...]]]></description>
			<content:encoded><![CDATA[<p>I recently found out that <a href="http://tatoeba.org/">tatoeba.org</a> is a pretty nice resource for collecting parallel text in many languages. The major reason why I love it is that the whole data is downloadable as a dump file, with all the sentences being under the creative commons license (although there are some mistakes in the sentences).</p>
<p>Specifically, you can just go to the <a href="http://tatoeba.org/jpn/download_tatoeba_example_sentences">Downloads Page</a> and find the dump files. All you need is basically the sentence file in which all the sentences are stored in a tab-separated format with their IDs and languages. If you want parallel sentences, you need to use the links file, which is just a list of tab-separated &#8220;source sentence id [tab] target sentence id&#8221; rows.</p>
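<p>As a rough illustration of the two formats, here is a minimal Clojure sketch that turns lines of the sentences file and the links file into Clojure data. It assumes exactly the column layout described above and does no error handling; the function names are hypothetical.</p>

```clojure
(require '[clojure.string :as string])

;; Sentences file: one "id [tab] lang [tab] text" row per sentence.
;; Returns a map from sentence id to {:lang ..., :text ...}.
;; The limit of 3 keeps any further tabs inside the text itself.
(defn parse-sentences [lines]
  (into {}
        (for [line lines
              :let [[id lang text] (string/split line #"\t" 3)]]
          [id {:lang lang :text text}])))

;; Links file: one "source id [tab] target id" row per translation link.
;; Returns a seq of [source-id target-id] pairs.
(defn parse-links [lines]
  (for [line lines
        :let [[src tgt] (string/split line #"\t")]]
    [src tgt]))
```

<p>The lines could come from, for example, (line-seq (clojure.java.io/reader "sentences.csv")); the file name is an assumption, so use whatever name the downloaded dump has.</p>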
<p>Once you have downloaded the files, you can &#8220;join&#8221; them in whatever way you like to get the parallel sentences. One of my favorite ways is to load everything into MongoDB and issue a few queries, which returns the result almost instantly.</p>
<p>This time, however, I used Clojure to join the files as a way of learning the language. The trickiest part is computing the one-level transitive relation of the link graph (that is, deriving an edge A -> C from A -> B and B -> C). This is needed because the links file in the dump contains only direct translations, while the search on tatoeba.org also shows indirect ones. The following Clojure snippet does this:</p>
<pre>
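(require '[clojure.set :refer [union]]
         '[clojure.string :as str])

;; One-step transitive edges: k -> w for every k -> v -> w. k itself is
;; dropped, because if both A -> B and B -> A are listed, A would link to A.
(defn get-transitive [g]
  (into {} (for [[k vs] g
                 :let [ws (disj (reduce union #{} (map g vs)) k)]
                 :when (seq ws)]
             [k ws])))

;; Read "source id [tab] target id" rows from stdin into {source #{targets}}.
(let [links  (reduce (fn [m line]
                       (let [[src tgt] (str/split line #"\t")]
                         (update-in m [src] (fnil conj #{}) tgt)))
                     {} (line-seq (java.io.BufferedReader. *in*)))
      tlinks (merge-with union links (get-transitive links))]
  (doseq [[k vs] tlinks, v vs]
    (println (str k "\t" v))))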
</pre>
<p>The output is the original link graph augmented with the one-step indirect (transitive) edges, printed as one &#8220;source id [tab] target id&#8221; row per edge.</p>
<p>I won&#8217;t show the latter part, i.e., extracting parallel sentences from the joined file, because it&#8217;s pretty straightforward. The result of extracting parallel sentences in English, Japanese, and Chinese looks like this:</p>
<blockquote><p>
How would you like your steak?		ステーキの焼き方はどうなさいますか。			您的牛排要几分熟？<br />
Even though he was tired, he went on with his work.		疲れていたけれど、彼は仕事を続けた。			他雖然很累，但是也繼續工作。<br />
That was all Greek to me.		私にはちんぷんかんぷんでした。			我完全看不懂。<br />
He can both speak and write Russian.		彼はロシア語が話せるし書くことができる。			他會說俄語，也會寫俄文。<br />
You must consider it before you answer.		答える前によく考えねばならない。			在回答之前必须要考虑清楚。<br />
&#8230;
</p></blockquote>
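<p>The original code for this extraction step is not part of the post. As a minimal sketch under stated assumptions, the join could look like the following. It assumes the sentences are already loaded into a map <code>{id [lang text]}</code> and that the joined links are &#8220;source id [tab] target id&#8221; rows. The function name <code>parallel-rows</code> and the names in the usage comment are hypothetical. Each matching link becomes one &#8220;id, language, text, id, language, text&#8221; row, the same layout as the Lojban-English output shown next.</p>

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: given a sentence map {id [lang text]} and
;; "source id [tab] target id" link rows, emit one tab-separated
;; "id lang text id lang text" row for each link whose source is in
;; src-lang and whose target is in tgt-lang. Links to IDs missing from
;; the map are dropped, because their language looks up as nil.
(defn parallel-rows [sentences link-lines src-lang tgt-lang]
  (for [line link-lines
        :let [[a b]   (str/split line #"\t")
              [la ta] (sentences a)
              [lb tb] (sentences b)]
        :when (and (= la src-lang) (= lb tgt-lang))]
    (str/join "\t" [a la ta b lb tb])))

;; Example, Lojban-English pairs (sentences and link-lines are assumed inputs):
;; (doseq [row (parallel-rows sentences link-lines "jbo" "eng")]
;;   (println row))
```

<p>Filtering on the language pair inside the join keeps the output to one direction per pair. This matters because the augmented links contain edges in both directions.</p>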
<p>You can extract Lojban-English parallel sentences as well:</p>
<blockquote><p>
599308  jbo     le karce cu bredi       46819   eng     The car is ready.<br />
599316  jbo     le mlatu cu nelci le nu sipna ne'a mi   44558   eng     The cat likes to sleep beside me.<br />
599317  jbo     ti du le mi karce       56205   eng     This is my car.<br />
599321  jbo     dei na jufra    547389  eng     This is not a sentence.<br />
599329  jbo     lo nanla pu smaji       47434   eng     The boy remained silent.<br />
&#8230;
</p></blockquote>
<p>which I think is really useful for my Lojban study!</p>
<p>Enjoy~</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/07/21/extracting-multilingual-parallel-senteces-from-tatoeba-com/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Launching Your First Meetup</title>
		<link>http://lilyx.net/2011/07/16/launching-your-first-meetup/</link>
		<comments>http://lilyx.net/2011/07/16/launching-your-first-meetup/#comments</comments>
		<pubDate>Sat, 16 Jul 2011 15:40:46 +0000</pubDate>
		<dc:creator>hagiwara</dc:creator>
				<category><![CDATA[Chinese]]></category>
		<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Korean]]></category>
		<category><![CDATA[Language Learning]]></category>

		<guid isPermaLink="false">http://lilyx.net/?p=303</guid>
		<description><![CDATA[Last Wednesday, we held the first meetup of East Asian Language Learning through Interpretation Methods. The group&#8217;s purpose is to help members brush up their language skills (we focus on East Asian languages, namely Chinese, Japanese, and Korean) through interpretation methods. Although this was the first time we had ever set up a meetup group, it was quite [...]]]></description>
			<content:encoded><![CDATA[<p>Last Wednesday, we held the <a href="http://www.meetup.com/East-Asian-Language-Learning-through-Interpretation-Methods/events/24528201/">first meetup</a> of <a href="http://www.meetup.com/East-Asian-Language-Learning-through-Interpretation-Methods/">East Asian Language Learning through Interpretation Methods</a>. The group&#8217;s purpose is to help members brush up their language skills (we focus on East Asian languages, namely Chinese, Japanese, and Korean) through interpretation methods.</p>
<p>Although this was the first time we had ever set up a meetup group, I believe it was quite a success, with 11 members attending the first meetup. After organizing it, we organizers found several things we should or shouldn&#8217;t have done. Here are some tips:</p>
<p><b>Read &#8220;Organizer Tips&#8221;</b></p>
<p>Meetup provides nice tools for organizers. One of my favorites is <a href="http://www.meetup.com/East-Asian-Language-Learning-through-Interpretation-Methods/checklist/">&#8220;organizer tips,&#8221;</a> where you can find DOs and DON&#8217;Ts for setting up a meetup group or meetings as an organizer. Read them through, and they will definitely help you make your meetup group a better place.</p>
<p><b>Get Prepared</b></p>
<p>Since we thought the first meetup was the best opportunity to tell participants about our goals, ideas, and purpose, we prepared <a href="http://lilyx.net/media/Kick-off-Presentation0713.ppt">a complete presentation</a>. We also rehearsed a demonstration of language practice through &#8220;interpretation methods&#8221; and collected materials to use in it.</p>
<p>The preparation itself was actually fun, and it helps organizers put their ideas into shape.</p>
<p><b>Details Matter</b></p>
<p>Details matter when it comes to making participants feel comfortable. Here are some small items we prepared:</p>
<ul>
<li> Two iPhones (or any portable music players) &#8212; to play audio language materials</li>
<li> Battery-powered portable speakers &#8212; to make the sound louder (though they actually weren&#8217;t loud enough)</li>
<li> Printed meet-up group logo &#8212; for participants to easily recognize where we are</li>
<li> Printed participant list &#8212; for organizers to take attendance</li>
<li> Printed name tags &#8212; the best of all (many participants said they liked them). Meetup can export a PDF file of name tags that you can simply print out. Since our focus is language learning, the tags were particularly useful for starting conversations about people&#8217;s names.</li>
</ul>
<p><b>Bring Your Camera</b></p>
<p>Remember to bring your camera to the meeting (we did, but forgot to take any pictures, what a pity). Photos not only serve as a record but are also a good way to show potential participants what the meeting was like.</p>
<p><b>Send Out Reminder and Thank-You Emails</b></p>
<p>This goes without saying. It is also a great opportunity to get feedback from the participants.</p>
<p>&#8212;</p>
<p>In the end, we decided to hold the meeting regularly. By the way, <a href="http://www.meetup.com/East-Asian-Language-Learning-through-Interpretation-Methods/messages/boards/thread/13653361">we are recruiting another co-organizer</a>. We appreciate your interest in our group!</p>
]]></content:encoded>
			<wfw:commentRss>http://lilyx.net/2011/07/16/launching-your-first-meetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

