We can find collocations with the help of NLTK, applying its collocation finders to the output of a KoNLPy tagger.
To find trigram collocations instead, replace BigramAssocMeasures with TrigramAssocMeasures, and BigramCollocationFinder with TrigramCollocationFinder.
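A minimal sketch of that substitution, run on a toy token list so that it works without KoNLPy installed. The `tokens` list and the frequency threshold of 2 are illustrative assumptions; with real data you would pass a list of words or (word, tag) pairs such as the ones built in the script below.

```python
from nltk import collocations

# Toy token list (an assumption for illustration, not corpus data).
tokens = "a b c a b c a b d x y z".split()

# Same pattern as the bigram case, with the Trigram* classes swapped in.
measures = collocations.TrigramAssocMeasures()
finder = collocations.TrigramCollocationFinder.from_words(tokens)

# Keep only trigrams that appear at least twice.
finder.apply_freq_filter(2)

# Up to 5 trigrams, ranked by PMI.
top = finder.nbest(measures.pmi, 5)
print(top)
```

On this toy input only the three trigrams that occur twice survive the frequency filter, so `top` holds three 3-tuples even though five were requested.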
#! /usr/bin/python2.7
# -*- coding: utf-8 -*-
from konlpy.tag import Kkma
from konlpy.corpus import kolaw
from konlpy.utils import pprint
from nltk import collocations
measures = collocations.BigramAssocMeasures()
doc = kolaw.open('constitution.txt').read()
print('\nCollocations among tagged words:')
tagged_words = Kkma().pos(doc)
finder = collocations.BigramCollocationFinder.from_words(tagged_words)
pprint(finder.nbest(measures.pmi, 10)) # top 10 bigrams with highest PMI
print('\nCollocations among words:')
words = [w for w, t in tagged_words]
ignored_words = [u'안녕']
finder = collocations.BigramCollocationFinder.from_words(words)
finder.apply_word_filter(lambda w: len(w) < 2 or w in ignored_words)
finder.apply_freq_filter(3) # only bigrams that appear 3+ times
pprint(finder.nbest(measures.pmi, 10))
print('\nCollocations among tags:')
tags = [t for w, t in tagged_words]
finder = collocations.BigramCollocationFinder.from_words(tags)
pprint(finder.nbest(measures.pmi, 5))
Console:
Collocations among tagged words:
[((가부, NNG), (동수, NNG)),
((강제, NNG), (노역, NNG)),
((경자, NNG), (유전, NNG)),
((고, ECS), (채취, NNG)),
((공무, NNG), (담임, NNG)),
((공중, NNG), (도덕, NNG)),
((과반, NNG), (수가, NNG)),
((교전, NNG), (상태, NNG)),
((그러, VV), (나, ECE)),
((기본적, NNG), (인권, NNG))]
Collocations among words:
[(현행, 범인),
(형의, 선고),
(내부, 규율),
(정치적, 중립성),
(누구, 든지),
(회계, 연도),
(지체, 없이),
(평화적, 통일),
(형사, 피고인),
(지방, 자치)]
Collocations among tags:
[(XR, XSA),
(JKC, VCN),
(VCN, ECD),
(ECD, VX),
(ECD, VXV)]
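The rankings above are ordered by pointwise mutual information (PMI). The sketch below illustrates why the script applies a frequency filter before ranking words: PMI rewards pairs whose members rarely occur apart, so a pair seen only once can outrank a pair seen many times. It assumes the standard count-based definition, log2(P(x, y) / (P(x) P(y))); `pmi_from_counts` is a hypothetical helper written for this example, not part of NLTK, and the token stream is a toy input rather than corpus data.

```python
import math
from collections import Counter


def pmi_from_counts(pair_count, left_count, right_count, total):
    """log2(P(x, y) / (P(x) * P(y))), with each probability estimated as count / total."""
    return math.log2(pair_count * total / (left_count * right_count))


# Toy token stream (illustrative only).
tokens = ["지방", "자치", "지방", "자치", "지방", "자치", "현행", "범인"]

word_counts = Counter(tokens)
pair_counts = Counter(zip(tokens, tokens[1:]))  # adjacent pairs
total = len(tokens)

scores = {
    pair: pmi_from_counts(n, word_counts[pair[0]], word_counts[pair[1]], total)
    for pair, n in pair_counts.items()
}

# Print pairs from highest to lowest PMI, with their raw frequency.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, pair_counts[pair], round(score, 2))
```

Here the pair that occurs once scores higher than the pair that occurs three times, because each of its words appears nowhere else. A call such as `finder.apply_freq_filter(3)` removes such one-off pairs before the ranking is taken.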