Morphological analysis and POS tagging

Morphological analysis is the identification of the structure of morphemes and other linguistic units, such as root words, affixes, or parts of speech.

POS (part-of-speech) tagging is the process of marking up the morphemes in a phrase based on their definitions and contexts. For example:

가방에 들어가신다 -> 가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN

POS tagging with KoNLPy

In KoNLPy, there are several different options you can choose for POS tagging. All have the same input-output structure; the input is a phrase, and the output is a list of tagged morphemes.

For detailed usage instructions see the tag Package.
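The shared input-output structure can be sketched without installing any analyzer. In the snippet below, `format_tagged` and `StubTagger` are hypothetical names introduced for illustration and are not part of KoNLPy; the only assumption is that a tagger exposes a `pos` method returning a list of (morpheme, tag) pairs. The sample pairs follow the 가방에 들어가신다 example, using the Kkma tags that appear in the comparison tables on this page.

```python
def format_tagged(pairs):
    """Render (morpheme, tag) pairs as 'morpheme/TAG + morpheme/TAG'."""
    return " + ".join("{}/{}".format(morph, tag) for morph, tag in pairs)


class StubTagger(object):
    """Hypothetical stand-in for a KoNLPy tagger class (illustration only)."""

    def pos(self, phrase):
        # A real tagger would analyze `phrase`; this stub returns a fixed answer.
        return [("가방", "NNG"), ("에", "JKM"), ("들어가", "VV"),
                ("시", "EPH"), ("ㄴ다", "EFN")]


tagger = StubTagger()
# With KoNLPy and a JVM installed, one could instead try:
#   from konlpy.tag import Kkma; tagger = Kkma()
tagged = tagger.pos("가방에 들어가신다")
print(format_tagged(tagged))
```

Because every class follows the same structure, swapping the stub for a real tagger class should not require any change to the formatting helper.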

See also

Korean POS tags comparison chart

Compare POS tags between several Korean analytic projects. (In Korean)

Comparison between POS tagging classes

Next, we analyze the time and performance of the pos method for each of the classes in the tag Package. The experiments were carried out on an Intel i7 CPU with 4 cores, Python 2.7, and KoNLPy 0.4.1.

Time analysis [1]

  1. Loading time: Class loading time, including dictionary loads.

  2. Execution time: Time for executing the pos method for each class, with 100K characters.

    When we test with varying numbers of characters, the execution times of all classes increase in an exponential manner.

    [Figure: execution time of each class by number of characters (time.png)]
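The two measurements above can be sketched as follows. This is a minimal illustration rather than the script used for the reported experiments: `time_tagger` and `StubTagger` are hypothetical names, the sample text is a placeholder instead of the 100K-character input, and the sketch assumes that constructing a class is where its dictionaries are loaded, which may not hold for every class.

```python
import time


def time_tagger(tagger_class, text):
    """Return (loading_seconds, execution_seconds) for one tagger class."""
    start = time.perf_counter()
    tagger = tagger_class()   # loading: class construction
    loaded = time.perf_counter()
    tagger.pos(text)          # execution: a single call to pos
    done = time.perf_counter()
    return loaded - start, done - loaded


class StubTagger(object):
    """Hypothetical stand-in so the sketch runs without KoNLPy installed."""

    def pos(self, phrase):
        # Tags every whitespace-separated token as a noun; not a real analysis.
        return [(token, "N") for token in phrase.split()]


sample = "아버지가 방에 들어가신다 " * 10  # placeholder text, not the real corpus
for n_chars in (10, 100):
    loading, execution = time_tagger(StubTagger, sample[:n_chars])
    print("{:>4} chars: load {:.6f}s, pos {:.6f}s".format(n_chars, loading, execution))
```

Repeating the loop over larger inputs and over each class in the tag Package would give data points comparable in kind to the figure, though absolute numbers depend on the machine.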

Performance analysis

In place of a formal performance evaluation, we compare the results of each class on several sample sentences.

  1. “아버지가방에들어가신다”

    This example tests the spacing algorithm. Ideally, an analyzer would parse this sentence as 아버지가 + 방에 + 들어가신다 (My father enters the room) rather than 아버지 + 가방에 + 들어가신다 (My father goes into the bag). Hannanum and Komoran are cautious about spacing uncertain terms and default to tagging nearly the whole phrase as a noun. Kkma is more confident but produces the undesirable parse. For this example, Mecab gives the best result.

| Hannanum | Kkma | Komoran | Mecab | Twitter |
| --- | --- | --- | --- | --- |
| 아버지가방에들어가 / N | 아버지 / NNG | 아버지가방에들어가신다 / NNP | 아버지 / NNG | 아버지 / Noun |
| 이 / J | 가방 / NNG | | 가 / JKS | 가방 / Noun |
| 시ㄴ다 / E | 에 / JKM | | 방 / NNG | 에 / Josa |
| | 들어가 / VV | | 에 / JKB | 들어가신 / Verb |
| | 시 / EPH | | 들어가 / VV | 다 / Eomi |
| | ㄴ다 / EFN | | 신다 / EP+EC | |

  2. “나는 밥을 먹는다” vs “하늘을 나는 자동차”

    Focusing on “나는” in both sentences shows whether an analyzer considers the context of words. “나는” in the first sentence should be 나/N + 는/J, and in the second sentence 나(-ㄹ다)/V + 는/E. Kkma correctly analyzes the latter “나는” as a verb, whereas the rest treat it as a noun followed by a particle.

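“나는 밥을 먹는다”:

| Hannanum | Kkma | Komoran | Mecab | Twitter |
| --- | --- | --- | --- | --- |
| 나 / N | 나 / NP | 나 / NP | 나 / NP | 나 / Noun |
| 는 / J | 는 / JX | 는 / JX | 는 / JX | 는 / Josa |
| 밥 / N | 밥 / NNG | 밥 / NNG | 밥 / NNG | 밥 / Noun |
| 을 / J | 을 / JKO | 을 / JKO | 을 / JKO | 을 / Josa |
| 먹 / P | 먹 / VV | 먹 / VV | 먹 / VV | 먹는 / Verb |
| 는다 / E | 는 / EPT | 는다 / EC | 는다 / EC | 다 / Eomi |
| | 다 / EFN | | | |
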
“하늘을 나는 자동차”:

| Hannanum | Kkma | Komoran | Mecab | Twitter |
| --- | --- | --- | --- | --- |
| 하늘 / N | 하늘 / NNG | 하늘 / NNG | 하늘 / NNG | 하늘 / Noun |
| 을 / J | 을 / JKO | 을 / JKO | 을 / JKO | 을 / Josa |
| 나 / N | 날 / VV | 나 / NP | 나 / NP | 나 / Noun |
| 는 / J | 는 / ETD | 는 / JX | 는 / JX | 는 / Josa |
| 자동차 / N | 자동차 / NNG | 자동차 / NNG | 자동차 / NNG | 자동차 / Noun |

  3. “아이폰 기다리다 지쳐 애플공홈에서 언락폰질러버렸다 6+ 128기가실버ㅋ”

    How does each analyzer deal with slang or with terms that are not included in its dictionary?

| Hannanum | Kkma | Komoran | Mecab | Twitter |
| --- | --- | --- | --- | --- |
| 아이폰 / N | 아이 / NNG | 아이폰 / NNP | 아이폰 / NNP | 아이폰 / Noun |
| 기다리 / P | 폰 / NNG | 기다리 / VV | 기다리 / VV | 기다리 / Verb |
| 다 / E | 기다리 / VV | 다 / EC | 다 / EC | 다 / Eomi |
| 지치 / P | 다 / ECS | 지치 / VV | 지쳐 / VV+EC | 지쳐 / Verb |
| 어 / E | 지치 / VV | 어 / EC | 애플 / NNP | 애플 / Noun |
| 애플공홈 / N | 어 / ECS | 애플 / NNP | 공 / NNG | 공홈 / Noun |
| 에서 / J | 애플 / NNP | 공 / NNG | 홈 / NNG | 에서 / Josa |
| 언락폰질러버렸다 / N | 공 / NNG | 홈 / NNG | 에서 / JKB | 언락폰 / Noun |
| 6+ / N | 홈 / NNG | 에서 / JKB | 언락 / NNG | 질 / Verb |
| 128기가실벜 / N | 에서 / JKM | 언 / NNG | 폰 / NNG | 러 / Eomi |
| | 언락 / NNG | 락 / NNG | 질러버렸 / VV+EC+VX+EP | 버렸 / Verb |
| | 폰 / NNG | 폰 / NNG | 다 / EC | 다 / Eomi |
| | 질르 / VV | 지르 / VV | 6 / SN | 6 / Number |
| | 어 / ECS | 어 / EC | + / SY | + / Punctuation |
| | 버리 / VXV | 버리 / VX | 128 / SN | 128 / Number |
| | 었 / EPT | 었 / EP | 기 / NNG | 기 / Noun |
| | 다 / ECS | 다 / EC | 가 / JKS | 가 / Josa |
| | 6 / NR | 6 / SN | 실버 / NNP | 실버 / Noun |
| | + / SW | + / SW | ㅋ / UNKNOWN | ㅋ / KoreanParticle |
| | 128 / NR | 128기가실벜 / NA | | |
| | 기가 / NNG | | | |
| | 실버 / NNG | | | |
| | ㅋ / UN | | | |
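
Side-by-side comparisons like the ones above can be assembled from the lists that each class's pos method returns. The sketch below is one possible way to do it, not the code used to build this page: `side_by_side` is a hypothetical helper, and the two hard-coded result lists are copied from the Hannanum and Kkma columns of the first example rather than produced by running the analyzers.

```python
from itertools import zip_longest


def side_by_side(results):
    """Yield table rows of 'morpheme / TAG' cells, one column per analyzer.

    `results` maps an analyzer name to its list of (morpheme, tag) pairs.
    Shorter columns are padded with empty cells.
    """
    names = list(results)
    yield names
    columns = [["{} / {}".format(morph, tag) for morph, tag in results[name]]
               for name in names]
    for row in zip_longest(*columns, fillvalue=""):
        yield list(row)


# Outputs for "아버지가방에들어가신다", copied from the first comparison table.
results = {
    "Hannanum": [("아버지가방에들어가", "N"), ("이", "J"), ("시ㄴ다", "E")],
    "Kkma": [("아버지", "NNG"), ("가방", "NNG"), ("에", "JKM"),
             ("들어가", "VV"), ("시", "EPH"), ("ㄴ다", "EFN")],
}
for row in side_by_side(results):
    print(" | ".join(row))
```

Padding with `zip_longest` keeps each analyzer in its own column even when the analyzers split a sentence into different numbers of morphemes.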

Note

If you would like to run the experiments yourself, run this code from your local machine.

[1] Please note that these are comparisons among KoNLPy classes, and not the original distributions.