Since Clojure runs on the JVM, you can easily pick up a publicly available Java library (for machine learning, multimedia processing, or anything else) and call it. Calling Java libraries is normally straightforward thanks to Clojure’s Java interop, but you can still spend hours reading the library’s API documentation and tweaking your code accordingly, especially if, like me, you are not very familiar with the Java ecosystem.
In the following, I’ll briefly show how to use the Java-based SVM library JLIBSVM from Clojure.
First off, we’ll use the following helper macro and function: set-all! sets multiple fields of a Java object at once, and into-sparsevec converts a map into a SparseVector, which JLIBSVM uses to represent vectors. And do not forget to “import” all the classes you need.
(import '(java.util HashSet Vector))
(import '(edu.berkeley.compbio.jlibsvm.kernel LinearKernel))
(import '(edu.berkeley.compbio.jlibsvm ImmutableSvmParameterGrid))
(import '(edu.berkeley.compbio.jlibsvm.binary C_SVC MutableBinaryClassificationProblemImpl))
(import '(edu.berkeley.compbio.jlibsvm.util SparseVector))
(defmacro set-all! [obj m]
  `(do ~@(map (fn [e] `(set! (. ~obj ~(key e)) ~(val e))) m) ~obj))
(defn into-sparsevec [m]
  (let [sv (new SparseVector (count m))
        sm (sort-by first m)]
    (set-all! sv {indexes (int-array (map first sm))
                  values (float-array (map second sm))})
    sv))
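To see what the two helpers do without pulling in JLIBSVM, here is a minimal sketch. It repeats the set-all! macro so that it runs on its own, and it uses java.awt.Point purely as a stand-in class with public mutable fields (x and y); that choice is an assumption made for illustration and has nothing to do with JLIBSVM. The second half reproduces only the data preparation inside into-sparsevec (sorting the map by index and building the two primitive arrays), not the SparseVector itself.

```clojure
(import '(java.awt Point))

;; Same macro as in the listing above, repeated so this snippet is self-contained.
(defmacro set-all! [obj m]
  `(do ~@(map (fn [e] `(set! (. ~obj ~(key e)) ~(val e))) m) ~obj))

;; java.awt.Point is only a stand-in: it has public mutable int fields x and y.
(def p (Point. 0 0))
(set-all! p {x 3 y 4})
(println (.-x p) (.-y p))

;; The macro expands into one set! form per map entry, followed by the object.
(println (macroexpand-1 '(set-all! p {x 3 y 4})))

;; The data preparation done by into-sparsevec, minus the SparseVector:
;; entries are sorted by index, then split into an int array and a float array.
(def m {3 0.5, 1 1.0, 2 -2.0})
(def sm (sort-by first m))
(def indexes (vec (int-array (map first sm))))   ; ascending indexes: [1 2 3]
(def values (vec (float-array (map second sm)))) ; values in the same order
(println indexes values)
```

Note that the macro splices the obj form into every set! it generates, so it is safest to pass a symbol that is already bound to the object (as the listing above does with sv) rather than a constructor call, which would be evaluated once per field.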
The rest of the process is easy:
1. Create an SVM (this time we solve a binary classification problem) and parameters for training.
(def svm (new C_SVC))
(def builder (ImmutableSvmParameterGrid/builder))
(set-all! builder {eps 1.0e-3
                   Cset (doto (new HashSet) (.add (float 1.0)))
                   kernelSet (doto (new HashSet) (.add (new LinearKernel)))})
(def param (. builder build))
2. Create a problem, which consists of training examples and their class labels.
(def x1 (into-sparsevec {1 1.0}))
(def x2 (into-sparsevec {1 -1.0}))
(def vx (new Vector [x1 x2]))
(def vy (new Vector [1 -1]))
(def prob (new MutableBinaryClassificationProblemImpl String (count vy)))
(doseq [x (map list vx vy)]
  (. prob (addExample (first x) (second x))))
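The pairing step above can be tried in isolation. The following sketch uses plain strings as placeholders for the SparseVector examples (an assumption, so that it runs without JLIBSVM) and shows what (map list vx vy) produces, along with a destructuring form that names the example and its label directly instead of calling first and second.

```clojure
(import '(java.util Vector))

;; Placeholder data: strings stand in for SparseVector instances.
(def vx (Vector. ["example-1" "example-2"]))
(def vy (Vector. [1 -1]))

;; (map list vx vy) walks both collections in step and yields
;; one (example label) pair per position.
(def pairs (map list vx vy))
(println pairs)

;; Destructuring binds the two halves of each pair to x and y.
(doseq [[x y] pairs]
  (println x "has label" y))
```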
3. Train the SVM. This returns a trained model.
(def model (. svm (train prob param)))
4. Then you are ready to classify new test examples using the model.
(println (. model (predictLabel x1)))
(println (. model (predictLabel x2)))
(This prints 1 and -1, respectively.)
And that’s it. You are now ready to use SVMs for other problems (the overall process is the same for other SVM types, e.g., regression). One drawback is that saving and loading learned models is not implemented in JLIBSVM yet. To do that, you have to write Clojure code that directly writes and reads the SVM parameters (which is actually not so difficult), or write a Java patch to implement it.
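As a rough idea of what such Clojure code could look like, here is a minimal sketch for the linear-kernel case. It assumes the learned parameters have already been reduced to a plain map of per-feature weights plus a bias; how to pull those numbers out of a trained JLIBSVM model is not shown here and depends on the library’s internals. The names linear-params, save-params!, load-params, and predict-label are hypothetical, and the weight and bias values are hand-picked to match the toy problem above rather than read from a real model.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical parameters of a linear model: index->weight map and a bias.
;; Hand-picked for the toy problem above, NOT extracted from JLIBSVM.
(def linear-params {:weights {1 1.0} :bias 0.0})

(defn save-params!
  "Writes the parameter map to path as EDN text."
  [path params]
  (spit path (pr-str params)))

(defn load-params
  "Reads a parameter map back from an EDN file."
  [path]
  (edn/read-string (slurp path)))

(defn predict-label
  "Returns 1 or -1 from the sign of w.x + b, where x is an index->value map."
  [{:keys [weights bias]} x]
  (let [score (+ bias
                 (reduce + (map (fn [[i v]] (* v (get weights i 0.0))) x)))]
    (if (neg? score) -1 1)))

;; Round trip through a temporary file, then classify the two toy examples.
(def path (str (java.io.File/createTempFile "svm-params" ".edn")))
(save-params! path linear-params)
(def restored (load-params path))
(println (predict-label restored {1 1.0}))
(println (predict-label restored {1 -1.0}))
```

EDN is used here only because it round-trips plain Clojure maps with pr-str and clojure.edn/read-string; a non-linear kernel would need the support vectors and their coefficients stored as well.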
About the Author
Masato Hagiwara currently works for Rakuten Institute of Technology in New York, as a Senior Scientist. Have worked on search technologies at Google, Microsoft Research, and Baidu in the past. Expert in Natural Language Processing (NLP). Also a lead translator of the O'Reilly book "Natural Language Processing in Python." A native speaker of Japanese. Good command of English and Chinese (Mandarin). For more information, see About Me.