Multithreading with KoNLPy
Sometimes it gets boring to wait for tagging jobs to end. How about using some concurrency tricks? Python supports multithreading and multiprocessing out-of-the-box, and you can use them with KoNLPy as well. Here’s an example using multithreading.
#! /usr/bin/python2.7
# -*- coding: utf-8 -*-

from konlpy.tag import Kkma
from konlpy.corpus import kolaw
from threading import Thread
import jpype
import time


def do_concurrent_tagging(start, end, lines, result):
    # Attach this worker thread to the JVM before calling into Kkma.
    jpype.attachThreadToJVM()
    # k is the module-level Kkma instance created in the main block below.
    l = [k.pos(lines[i]) for i in range(start, end)]
    result.append(l)
    return


if __name__ == "__main__":
    print('Number of lines in document:')
    k = Kkma()
    lines = kolaw.open('constitution.txt').read().splitlines()
    nlines = len(lines)
    print(nlines)

    print('Batch tagging:')
    # time.time() works on Python 2 and 3; time.clock() was removed in Python 3.8.
    s = time.time()
    result = []
    l = [k.pos(line) for line in lines]
    result.append(l)
    t = time.time()
    print(t - s)

    print('Concurrent tagging:')
    result = []
    half = nlines // 2  # Split the document into two halves, one per thread
    t1 = Thread(target=do_concurrent_tagging, args=(0, half, lines, result))
    t2 = Thread(target=do_concurrent_tagging, args=(half, nlines, lines, result))
    t1.start(); t2.start()
    t1.join(); t2.join()
    m = sum(result, [])  # Merge results (chunks are in thread-completion order)
    print(time.time() - t)
Console:
Number of lines in document:
356
Batch tagging:
37.758173
Concurrent tagging:
8.037602
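Check out how much faster it gets! In this run, the timer reports about 8 seconds for concurrent tagging with two threads, against about 38 seconds for batch tagging.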
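The two-thread split above can be generalized to any number of threads while keeping the tagged lines in their original order. The following is only a minimal sketch under stated assumptions: it targets Python 3 (for `concurrent.futures`), the helper names `split_ranges` and `tag_concurrently` are made up for illustration, and `fake_pos` is a hypothetical stand-in for `Kkma().pos` so that the snippet runs without a JVM. With a real KoNLPy tagger, each worker would presumably still need the JVM attach call used in the script above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_items, n_chunks):
    """Return (start, end) pairs that cover range(n_items) in n_chunks near-equal parts."""
    bounds = [n_items * i // n_chunks for i in range(n_chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def tag_concurrently(lines, tag, n_threads=2):
    """Apply tag() to every line with n_threads workers, keeping the input order."""
    def work(span):
        start, end = span
        return [tag(line) for line in lines[start:end]]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns chunk results in submission order, not completion order.
        chunks = list(pool.map(work, split_ranges(len(lines), n_threads)))
    # Flatten the per-chunk lists back into one list with one entry per line.
    return [tagged for chunk in chunks for tagged in chunk]


def fake_pos(line):
    """Hypothetical stand-in for Kkma().pos: tags every whitespace token as 'X'."""
    return [(token, 'X') for token in line.split()]


sample = ['a b', 'c', 'd e f', 'g', 'h i']
tagged = tag_concurrently(sample, fake_pos, n_threads=3)
print(len(tagged) == len(sample))
```

Because the chunks are collected in submission order, the merged list lines up with the input lines, which the `result.append` approach does not guarantee when one thread finishes before the other.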
Note
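Some useful references on concurrency with Python:
- Hye-Shik Chang (장혜식), "Python 2.6's Answer to 'Is Python Useless Even If You Give It Multiple Cores?'" (“파이썬은 멀티코어 줘도 쓰잘데기가 없나요?”에 대한 파이썬 2.6의 대답), 2008.
- Yongho Ha (하용호), "I Want to Do Cloud Computing with Python" (파이썬으로 클라우드 하고 싶어요), 2011.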