The Japanese morphological analyzer MeCab can be called directly from Clojure through its Java binding. I did, however, run into some JNI-related pitfalls along the way, so I’ll describe how I got around them below so that nobody else has to stumble over the same issues.

The first thing you have to do is install MeCab’s Java binding, which is fairly straightforward. Download mecab-java-0.98pre3.tar.gz (the latest version at the time of writing) from here, untar it, and run make. (Be sure to set the INCLUDE variable to an appropriate path if you are using a non-typical environment, such as OpenJDK.)

One issue I encountered here is that the JVM died with SIGSEGV when I tried to run the sample Java program:

#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000003091c7b59b, pid=32167, tid=1106934080
#

I found a blog article that describes exactly the same problem. As the article suggests, adding the following two lines after line 710 of MeCab_wrap.cxx did the trick for me:

char work[128] ; // add this line
sprintf(work,"result:%0x\n",result); // add this line

Now you are ready to use MeCab from Clojure code. Make sure that MeCab.jar is on the classpath and that libMeCab.so is loadable (for example, by having its directory on java.library.path); both files are created by running “make.”
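
If you are not sure whether the native library is where the JVM will look for it, a small check like the following may help. This is only a sketch: find-native-lib is a name I made up for illustration, and it merely looks for a file with the platform's library name in the directories of java.library.path (or of a search path you pass in). It does not try to load anything.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import '(java.io File)
        '(java.util.regex Pattern))

(defn find-native-lib
  "Returns the paths of files named like the platform's native library for
  lib-name that exist in the directories of search-path
  (default: the java.library.path system property)."
  ([lib-name]
   (find-native-lib lib-name (System/getProperty "java.library.path")))
  ([lib-name search-path]
   ;; mapLibraryName turns e.g. "MeCab" into "libMeCab.so" on Linux
   (let [file-name (System/mapLibraryName lib-name)
         separator (re-pattern (Pattern/quote File/pathSeparator))]
     (->> (str/split (or search-path "") separator)
          (remove str/blank?)
          (map #(io/file % file-name))
          (filter #(.isFile ^File %))
          (map #(.getPath ^File %))))))

;; Lists every candidate file found; an empty result means the JVM
;; would not find the library on java.library.path.
(find-native-lib "MeCab")
```

An empty result suggests starting the JVM with -Djava.library.path pointing at the directory that contains the library.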

The catch is that even after importing MeCab, Tagger, and Node from org.chasen.mecab and running (System/loadLibrary "MeCab"), the Clojure code fails with an UnsatisfiedLinkError, which basically means that the necessary native library could not be loaded properly.

The reason, as I found out after a full hour of trial and error, is Clojure’s classloader: the library does not end up loaded in the classloader that needs it. The solution, provided here, is to call the non-public Runtime method loadLibrary0 directly using wall-hack-method, so that the library is loaded in the classloader of the class that the caller specifies:

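;; wall-hack-method invokes a method reflectively, bypassing access checks
(use '[clojure.contrib.java-utils :only (wall-hack-method)])

(import '(org.chasen.mecab MeCab Tagger Node))

;; Load native library `lib` on behalf of `class`, so that it is bound to
;; that class's classloader
(defn load-lib [class lib]
  (wall-hack-method java.lang.Runtime "loadLibrary0" [Class String]
                    (Runtime/getRuntime) class lib))

(load-lib MeCab "MeCab")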
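
For readers who do not have clojure.contrib at hand, here is a minimal sketch of what a wall-hack-method-style helper does, written with plain Java reflection. The name call-hidden-method is my own, not part of any library. The demonstration call uses a harmless public method just to show the mechanics; the loadLibrary0 call is left as a comment because it needs MeCab on the classpath, and, as far as I know, recent JDKs refuse this kind of access to java.lang internals unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED.

```clojure
(defn call-hidden-method
  "Looks up method-name with the given param-types on cls, asks the JVM to
  skip access checks for it, and invokes it on target with args."
  [^Class cls method-name param-types target & args]
  (let [^java.lang.reflect.Method method
        (.getDeclaredMethod cls (name method-name)
                            (into-array Class param-types))]
    (.setAccessible method true)
    (.invoke method target (into-array Object args))))

;; Mechanics only: String.length is public, so this works on any JDK.
(call-hidden-method String "length" [] "MeCab")

;; Same call shape as load-lib above (not run here; assumes MeCab is imported):
;; (call-hidden-method Runtime "loadLibrary0" [Class String]
;;                     (Runtime/getRuntime) MeCab "MeCab")
```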

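You can now call MeCab via Clojure’s usual Java interop forms. The code below prints the version, then the full parse result, and then the surface and feature string of each node; its output follows the code: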

(println (MeCab/VERSION))
(let [tagger (Tagger.)
      sent "太郎は二郎にこの本を渡した。"]
  ;; Print the whole analysis as a single string
  (println (.parse tagger sent))

  ;; Walk the linked list of nodes until getNext returns nil
  (loop [node (.parseToNode tagger sent)]
    (when node
      (println (str (.getSurface node) "\t" (.getFeature node)))
      (recur (.getNext node)))))

0.98pre3
太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
二郎 名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー
に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
この 連体詞,*,*,*,*,*,この,コノ,コノ
本 名詞,一般,*,*,*,*,本,ホン,ホン
を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
EOS

BOS/EOS,*,*,*,*,*,*,*,*
太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
二郎 名詞,固有名詞,人名,名,*,*,二郎,ジロウ,ジロー
に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
この 連体詞,*,*,*,*,*,この,コノ,コノ
本 名詞,一般,*,*,*,*,本,ホン,ホン
を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
BOS/EOS,*,*,*,*,*,*,*,*
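
The feature strings above are plain comma-separated values, so they are easy to turn into Clojure data. The following is a rough sketch under a few assumptions: the nine columns follow the IPADIC layout (part of speech, three sub-categories, conjugation type, conjugation form, base form, reading, pronunciation), the keyword names are labels I chose myself, and node-seq is a hypothetical helper that turns any chain of "next" links into a lazy sequence.

```clojure
(require '[clojure.string :as str])

;; Labels for the nine columns; these names are illustrative only.
(def feature-keys
  [:pos :pos-detail-1 :pos-detail-2 :pos-detail-3
   :conjugation-type :conjugation-form :base-form :reading :pronunciation])

(defn parse-feature
  "Splits a comma-separated feature string into a map keyed by feature-keys."
  [feature]
  (zipmap feature-keys (str/split feature #",")))

(defn node-seq
  "Lazy sequence of first-node, (next-fn first-node), ... up to the first nil."
  [first-node next-fn]
  (take-while some? (iterate next-fn first-node)))

;; Base form of the verb from the output above
(:base-form (parse-feature "動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ"))

;; With MeCab loaded, the loop in the post could be written as (not run here):
;; (doseq [node (node-seq (.parseToNode tagger sent) #(.getNext %))]
;;   (println (.getSurface node) (parse-feature (.getFeature node))))
```

Keeping node-seq independent of the Node class means it can be tried out on ordinary maps before wiring it to the tagger.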

 
