Morphological analysis and POS tagging

Morphological analysis is the identification of the structure of morphemes and other linguistic units, such as root words, affixes, or parts of speech.

POS (part-of-speech) tagging is the process of marking up morphemes in a phrase, based on their definitions and contexts. For example:

가방에 들어가신다 -> 가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN
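
The `morpheme/TAG + morpheme/TAG` notation above corresponds to a list of `(morpheme, tag)` tuples, which is the shape the taggers' pos methods return. A minimal sketch of that conversion follows; the name `parse_tagged` is hypothetical and not a KoNLPy function.

```python
from typing import List, Tuple


def parse_tagged(analysis: str) -> List[Tuple[str, str]]:
    """Split 'morpheme/TAG + morpheme/TAG + ...' into (morpheme, tag) pairs."""
    pairs = []
    for token in analysis.split(" + "):
        # Split on the last '/', so a morpheme that is itself '/' stays intact.
        morpheme, tag = token.strip().rsplit("/", 1)
        pairs.append((morpheme, tag))
    return pairs


print(parse_tagged("가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN"))
# [('가방', 'NNG'), ('에', 'JKM'), ('들어가', 'VV'), ('시', 'EPH'), ('ㄴ다', 'EFN')]
```

Splitting on " + " (with spaces) rather than "+" keeps compound tags such as `EP+EC` and a literal `+` morpheme (`+/SW`) in one piece.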

POS tagging with KoNLPy

In KoNLPy, there are several different options you can choose for POS tagging. All have the same input-output structure; the input is a phrase, and the output is a list of tagged morphemes.
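
Because every class shares this phrase-in, tagged-morphemes-out interface, the same phrase can be run through several analyzers in one loop. This is a sketch, not part of KoNLPy: `tag_with_all` and `main` are hypothetical helpers, and the demo assumes KoNLPy and its Java dependency are installed (it skips itself otherwise). `Hannanum`, `Kkma` and `Komoran` are classes from `konlpy.tag`; Mecab is left out here because it additionally needs a MeCab installation.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # what each pos method returns: (morpheme, tag) pairs


def tag_with_all(phrase: str, taggers: Dict[str, Callable[[str], Tagged]]) -> Dict[str, Tagged]:
    """Run one phrase through several taggers that share the same interface."""
    return {name: pos(phrase) for name, pos in taggers.items()}


def main() -> None:
    try:
        from konlpy.tag import Hannanum, Kkma, Komoran
    except ImportError:
        print("KoNLPy is not installed; skipping the demo.")
        return
    # Construct each class once and pass its bound pos method as the tagging function.
    taggers = {cls.__name__: cls().pos for cls in (Hannanum, Kkma, Komoran)}
    for name, morphemes in tag_with_all("가방에 들어가신다", taggers).items():
        print(name, morphemes)


if __name__ == "__main__":
    main()
```

Passing plain callables rather than class objects means any function with the same signature (for example, a stub in a unit test) can stand in for a real analyzer.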

For detailed usage instructions, see the tag Package.

See also

Korean POS tags comparison chart

A comparison of the POS tags used by several Korean analysis projects (in Korean).

Comparison between POS tagging classes

We now analyze the time and performance of the pos method of each class in the tag Package. The experiments were run on an Intel i7 CPU with 4 cores, using Python 2.7 and KoNLPy 0.4.0.

Time analysis

  1. Loading time: Class loading time, including dictionary loads.

  2. Execution time: Time for executing the pos method for each class, with 100K characters.

    When the number of input characters is varied, the execution time of every class increases exponentially.

    [Figure: execution time of the pos method of each class against the number of input characters]
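
The two measurements above can be reproduced with a small harness along these lines. It is a sketch under our own assumptions rather than the script behind the figure: `measure` is a hypothetical helper, the clock is `time.perf_counter`, and execution time is taken as the fastest of a few repeated calls.

```python
import time
from typing import Any, Callable, Tuple


def measure(tagger_cls: Callable[[], Any], text: str, repeats: int = 3) -> Tuple[float, float]:
    """Return (loading time, execution time) in seconds for one tagger class.

    Loading time covers constructing the class, which is where dictionaries are
    loaded; execution time is the fastest of `repeats` calls to its pos method.
    """
    start = time.perf_counter()
    tagger = tagger_cls()
    loading = time.perf_counter() - start

    execution = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tagger.pos(text)
        # Keep the minimum: the least-disturbed run is the best estimate of the cost.
        execution = min(execution, time.perf_counter() - start)
    return loading, execution
```

With KoNLPy installed, `measure(Kkma, text[:100000])` would time the Kkma class on 100K characters, where `text` is a Korean corpus you supply; repeating it for several prefix lengths gives a scaling curve. Because the Java-based classes may start the JVM the first time one of them is constructed, it is worth constructing one class before timing the others so that startup is not charged to whichever class happens to come first.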

Performance analysis

Instead of a quantitative evaluation, we compare the output of each class on several sample sentences.
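
Comparisons like these put each analyzer's output in its own column. A minimal sketch of that layout, assuming the outputs are already available as lists of `(morpheme, tag)` pairs (for example, from each class's pos method); `side_by_side` is a hypothetical helper, and the sample data is the Komoran and Mecab output for the first sentence.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def side_by_side(results: Dict[str, List[Tuple[str, str]]]) -> List[List[str]]:
    """Lay out {analyzer: [(morpheme, tag), ...]} as rows with one column per analyzer."""
    header = list(results)
    columns = [[f"{morpheme} / {tag}" for morpheme, tag in pairs] for pairs in results.values()]
    # Analyzers split a sentence into different numbers of morphemes, so the shorter
    # columns are padded with empty cells instead of shifting later cells leftwards.
    body = [list(row) for row in zip_longest(*columns, fillvalue="")]
    return [header] + body


rows = side_by_side({
    "Komoran": [("아버지가방에들어가신다", "NNP")],
    "Mecab": [("아버지", "NNG"), ("가", "JKS"), ("방", "NNG"),
              ("에", "JKB"), ("들어가", "VV"), ("신다", "EP+EC")],
})
for row in rows:
    print(" | ".join(cell or "-" for cell in row))
```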

  1. “아버지가방에들어가신다”

    This example tests how each analyzer handles missing spaces. Ideally, an analyzer would parse the sentence as “아버지가 방에 들어가신다” (My father enters the room) rather than “아버지 가방에 들어가신다” (My father goes into the bag). Hannanum and Komoran are cautious about segmenting uncertain terms and tag (nearly) the whole phrase as a single noun. Kkma segments more confidently but picks the undesirable reading (가방, “bag”). Mecab gives the best result on this example.

    Hannanum: 아버지가방에들어가/N + 이/J + 시ㄴ다/E
    Kkma:     아버지/NNG + 가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN
    Komoran:  아버지가방에들어가신다/NNP
    Mecab:    아버지/NNG + 가/JKS + 방/NNG + 에/JKB + 들어가/VV + 신다/EP+EC

  2. “나는 밥을 먹는다” vs “하늘을 나는 자동차”

    If we focus on “나는” in both sentences, we can see whether an analyzer takes the context of a word into account. In the first sentence, “나는” should be analyzed as “나/N + 는/J” (the pronoun 나, “I”, plus a particle), and in the second as “나(-ㄹ다)/V + 는/E” (the verb 날다, “to fly”, whose ㄹ drops before 는, plus an ending). Kkma correctly analyzes the second “나는” as a verb, whereas the others tag it as a noun or pronoun.

    “나는 밥을 먹는다”

    Hannanum: 나/N + 는/J + 밥/N + 을/J + 먹/P + 는다/E
    Kkma:     나/NP + 는/JX + 밥/NNG + 을/JKO + 먹/VV + 는/EPT + 다/EFN
    Komoran:  나/NP + 는/JX + 밥/NNG + 을/JKO + 먹/VV + 는다/EC
    Mecab:    나/NP + 는/JX + 밥/NNG + 을/JKO + 먹/VV + 는다/EC

    “하늘을 나는 자동차”

    Hannanum: 하늘/N + 을/J + 나/N + 는/J + 자동차/N
    Kkma:     하늘/NNG + 을/JKO + 날/VV + 는/ETD + 자동차/NNG
    Komoran:  하늘/NNG + 을/JKO + 나/NP + 는/JX + 자동차/NNG
    Mecab:    하늘/NNG + 을/JKO + 나/NP + 는/JX + 자동차/NNG

  3. “아이폰 기다리다 지쳐 애플공홈에서 언락폰질러버렸다 6+ 128기가실버ㅋ”

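    The sentence (roughly: “Tired of waiting for the iPhone, I splurged on an unlocked phone from Apple's official online store: a 6+, 128 GB, silver, lol”) is full of slang and abbreviations. How does each analyzer deal with slang, or with terms that are not in its dictionary?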

    Hannanum: 아이폰/N + 기다리/P + 다/E + 지치/P + 어/E + 애플공홈/N + 에서/J + 언락폰질러버렸다/N + 6+/N + 128기가실벜/N
    Kkma:     아이/NNG + 폰/NNG + 기다리/VV + 다/ECS + 지치/VV + 어/ECS + 애플/NNP + 공/NNG + 홈/NNG + 에서/JKM + 언락/NNG + 폰/NNG + 질르/VV + 어/ECS + 버리/VXV + 었/EPT + 다/ECS + 6/NR + +/SW + 128/NR + 기가/NNG + 실버/NNG + ㅋ/UN
    Komoran:  아이폰/NNP + 기다리/VV + 다/EC + 지치/VV + 어/EC + 애플/NNP + 공/NNG + 홈/NNG + 에서/JKB + 언/NNG + 락/NNG + 폰/NNG + 지르/VV + 어/EC + 버리/VX + 었/EP + 다/EC + 6/SN + +/SW + 128기가실벜/NA
    Mecab:    아이폰/NNP + 기다리/VV + 다/EC + 지쳐/VV+EC + 애플/NNP + 공/NNG + 홈/NNG + 에서/JKB + 언락/NNG + 폰/NNG + 질러버렸/VV+EC+VX+EP + 다/EC + 6/SN + +/SY + 128/SN + 기/NNG + 가/JKS + 실버/NNP + ㅋ/UNKNOWN
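
The three observations above can also be checked mechanically once the outputs are stored as `(morpheme, tag)` lists. In the sketch below the outputs are copied by hand from the comparisons rather than produced by KoNLPy, and every helper (`parse`, `coarse`, `nouns`, `unknowns`, `longest`) is hypothetical. In particular, `coarse` rests on an assumed rule that maps a tag to a broad class by its first letter (Hannanum's `P` and the Sejong-style `V` both become predicates), which is only a rough stand-in for the Korean POS tags comparison chart.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]

# Assumed rules: the unknown-word tags seen in the third example (Kkma UN, Komoran NA,
# Mecab UNKNOWN), then a first-letter mapping; Hannanum marks predicates with P.
UNKNOWN_TAGS = {"UN", "NA", "UNKNOWN"}
FIRST_LETTER = {"N": "nominal", "V": "predicate", "P": "predicate",
                "J": "particle", "E": "ending", "S": "symbol"}


def parse(analysis: str) -> Pairs:
    """'m1/T1 + m2/T2' -> [('m1', 'T1'), ('m2', 'T2')]."""
    return [tuple(token.rsplit("/", 1)) for token in analysis.split(" + ")]


def coarse(tag: str) -> str:
    """Collapse a tag from any of the four tag sets into a broad class."""
    head = tag.split("+")[0]  # Mecab compounds such as 'VV+EC' keep their first part
    if head in UNKNOWN_TAGS:
        return "unknown"
    return FIRST_LETTER.get(head[0], "other")


def nouns(pairs: Pairs) -> List[str]:
    return [m for m, t in pairs if coarse(t) == "nominal"]


def unknowns(pairs: Pairs) -> List[str]:
    return [m for m, t in pairs if coarse(t) == "unknown"]


def longest(pairs: Pairs) -> str:
    """An unusually long morpheme suggests an unfamiliar span was lumped together."""
    return max((m for m, _ in pairs), key=len)


# Example 1: which reading of 아버지가방에들어가신다 was chosen?
ex1: Dict[str, Pairs] = {
    "Hannanum": parse("아버지가방에들어가/N + 이/J + 시ㄴ다/E"),
    "Kkma": parse("아버지/NNG + 가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN"),
    "Komoran": parse("아버지가방에들어가신다/NNP"),
    "Mecab": parse("아버지/NNG + 가/JKS + 방/NNG + 에/JKB + 들어가/VV + 신다/EP+EC"),
}
for name, pairs in ex1.items():
    print(name, nouns(pairs))

# Example 2: is 나 in 하늘을 나는 자동차 (the third morpheme) treated as a predicate?
ex2: Dict[str, Pairs] = {
    "Hannanum": parse("하늘/N + 을/J + 나/N + 는/J + 자동차/N"),
    "Kkma": parse("하늘/NNG + 을/JKO + 날/VV + 는/ETD + 자동차/NNG"),
    "Komoran": parse("하늘/NNG + 을/JKO + 나/NP + 는/JX + 자동차/NNG"),
    "Mecab": parse("하늘/NNG + 을/JKO + 나/NP + 는/JX + 자동차/NNG"),
}
for name, pairs in ex2.items():
    print(name, coarse(pairs[2][1]))

# Example 3 (two analyzers only): explicit unknown tags versus lumping
hannanum3 = parse("아이폰/N + 기다리/P + 다/E + 지치/P + 어/E + 애플공홈/N + 에서/J"
                  " + 언락폰질러버렸다/N + 6+/N + 128기가실벜/N")
komoran3 = parse("아이폰/NNP + 기다리/VV + 다/EC + 지치/VV + 어/EC + 애플/NNP + 공/NNG"
                 " + 홈/NNG + 에서/JKB + 언/NNG + 락/NNG + 폰/NNG + 지르/VV + 어/EC"
                 " + 버리/VX + 었/EP + 다/EC + 6/SN + +/SW + 128기가실벜/NA")
print("Hannanum", unknowns(hannanum3), longest(hannanum3))
print("Komoran", unknowns(komoran3), longest(komoran3))
```

Under these assumptions, the noun lists show Kkma choosing 가방 (bag) where Mecab chooses 방 (room), only Kkma's 날/VV is classed as a predicate in the second sentence, and Hannanum, which emits no unknown tag, instead lumps 언락폰질러버렸다 into a single noun.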