Sometimes it gets boring to wait for tagging jobs to finish. How about using some concurrency tricks? Python supports multithreading and multiprocessing out of the box, and you can use them with KoNLPy as well. Here’s an example using multithreading: the document is split in half, each half is tagged in its own thread, and the two results are merged.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
from threading import Thread

import jpype
from konlpy.corpus import kolaw
from konlpy.tag import Kkma


def do_concurrent_tagging(start, end, lines, result, index):
    # Attach this Python-created thread to the JVM before calling Java code
    jpype.attachThreadToJVM()
    # Store this chunk in its own slot so the merged result keeps document order,
    # no matter which thread finishes first
    result[index] = [k.pos(lines[i]) for i in range(start, end)]


if __name__ == "__main__":
    print('Number of lines in document:')
    k = Kkma()
    lines = kolaw.open('constitution.txt').read().splitlines()
    nlines = len(lines)
    print(nlines)

    print('Batch tagging:')
    s = time.time()  # time.clock() was removed in Python 3.8
    result = [k.pos(line) for line in lines]
    print(time.time() - s)

    print('Concurrent tagging:')
    s = time.time()  # restart the timer for the concurrent run
    result = [None, None]  # one slot per thread
    half = nlines // 2
    t1 = Thread(target=do_concurrent_tagging, args=(0, half, lines, result, 0))
    t2 = Thread(target=do_concurrent_tagging, args=(half, nlines, lines, result, 1))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    m = sum(result, [])  # Merge results in document order
    print(time.time() - s)
Console:
Number of lines in document:
356
Batch tagging:
37.758173
Concurrent tagging:
8.037602
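Check out how much faster it gets! In this run, concurrent tagging finished in about 8 seconds, compared with about 38 seconds for batch tagging.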
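The same idea extends to any number of workers. Below is a minimal sketch, assuming only the standard library's concurrent.futures; split_into_chunks and tag_concurrently are hypothetical helper names, not part of KoNLPy, and `tag` stands for any callable that tags a single line (with the Kkma instance above, that would be k.pos). Executor.map returns results in the order of its inputs, so the merged output stays in document order without any bookkeeping.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover item
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def tag_concurrently(lines, tag, n_workers=2):
    """Tag every line with `tag` on a pool of worker threads, preserving line order."""

    def tag_chunk(chunk):
        # Hypothetical hook: with KoNLPy you would attach the worker thread
        # to the JVM here (e.g. jpype.attachThreadToJVM()) before tagging.
        return [tag(line) for line in chunk]

    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields per-chunk results in input order
        tagged_chunks = list(pool.map(tag_chunk, chunks))
    # Flatten the per-chunk lists back into one list
    return [tagged for chunk in tagged_chunks for tagged in chunk]


if __name__ == "__main__":
    # str.split stands in for a real tagger such as k.pos
    print(tag_concurrently(["a b", "c d e", "f"], str.split, n_workers=2))
    # → [['a', 'b'], ['c', 'd', 'e'], ['f']]
```

Whether more workers actually shorten the run depends on the tagger and on how much work happens outside Python's global interpreter lock, so it is worth timing a few values of n_workers on your own data.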
Note