
Introducing the Python langdetect (language detection) module

category Language/Python 2020. 7. 13. 18:32
 

langdetect

Language detection library ported from Google's language-detection.

pypi.org

Installing the module

pip install langdetect

(py36) PS C:\Users\Desktop\langdetect_> pip install langdetect
Requirement already satisfied: langdetect in c:\users\d-wook\.conda\envs\py36\lib\site-packages (1.0.8)
Requirement already satisfied: six in c:\users\d-wook\.conda\envs\py36\lib\site-packages (from langdetect) (1.12.0)

 

Supported versions and languages

Supported Python versions 2.7, 3.4+.

langdetect supports 55 languages out of the box (ISO 639-1 codes):
af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw

 

Examples

 

>>> from langdetect import detect
>>>
>>> detect("안녕하세요. 한국인입니다.")
'ko'
>>> detect("お早うございます")
'ja'
>>> detect('Why are you doing it?')
'en'
>>> detect("Ein, zwei, drei, vier")
'de'

 

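It looks like a classification model under the hood: detect_langs returns each candidate language together with its probability.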

>>> from langdetect import detect_langs
>>> detect_langs("Otec matka syn.")
[fi:0.8571406958398866, pl:0.14285796186836378]
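The probabilities can be used to reject low-confidence guesses. The following is a minimal sketch, assuming the objects returned by detect_langs expose `.lang` and `.prob` attributes. The helper names `pick_language` and `detect_confident`, the `Candidate` stand-in type, and the 0.9 default threshold are all hypothetical choices for illustration, not part of langdetect.

```python
from collections import namedtuple

# Stand-in for the objects detect_langs() returns; used only for the demo below.
Candidate = namedtuple("Candidate", "lang prob")


def pick_language(candidates, min_prob=0.9):
    """Return the most probable language code, or None if it is below min_prob.

    `candidates` is any iterable of objects with `.lang` and `.prob`
    attributes, such as the list returned by langdetect's detect_langs().
    """
    best = max(candidates, key=lambda c: c.prob, default=None)
    if best is None or best.prob < min_prob:
        return None
    return best.lang


def detect_confident(text, min_prob=0.9):
    """Hypothetical wrapper: run detect_langs() and apply the threshold."""
    # Imported here so pick_language() can be used without langdetect installed.
    from langdetect import detect_langs
    return pick_language(detect_langs(text), min_prob)


# Demo with the (rounded) probabilities from the output above.
sample = [Candidate("fi", 0.857), Candidate("pl", 0.143)]
print(pick_language(sample))                # None: 0.857 is below the 0.9 default
print(pick_language(sample, min_prob=0.8))  # fi
```

Returning None instead of a weak guess lets the caller decide what to do with ambiguous text, for example falling back to a default language.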

 

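After more testing, I found that a sentence is occasionally detected as the wrong language.

This is probably because so many languages share the Latin alphabet ...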

>>> detect('Why are you doing it?')
'en'
>>> detect('I like listening to music.')
'no'
>>> detect('Im a boy')
'en'
>>> detect('How much is it?')
'en'
>>> detect('Did you hear that?')
'en'
>>> detect("What's your hobby?")
'en'
>>> detect("I'd like to get something to eat.")
'en'
>>> detect("I'm just not hungry.")
'sv'

 

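According to the project page, the detection algorithm is non-deterministic, so short or ambiguous input can give a different result on each run.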

https://pypi.org/project/langdetect/
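The project page linked above suggests setting `DetectorFactory.seed` before detection to get reproducible results. Here is a rough sketch of that, along with a simple majority vote as an alternative when the seed is left unset. The helper names `seeded_detect` and `majority_vote` and the default of 5 runs are my own assumptions for illustration, not part of langdetect.

```python
from collections import Counter


def majority_vote(detect_fn, text, runs=5):
    """Call detect_fn(text) `runs` times and return the most frequent answer."""
    votes = Counter(detect_fn(text) for _ in range(runs))
    return votes.most_common(1)[0][0]


def seeded_detect(text, seed=0):
    """Detect the language of `text` with langdetect's random seed fixed."""
    # Imported lazily so majority_vote() stays usable on its own.
    from langdetect import DetectorFactory, detect

    # Documented switch: a fixed seed makes repeated calls return the same result.
    DetectorFactory.seed = seed
    return detect(text)
```

With langdetect installed, `seeded_detect("I'm just not hungry.")` should return the same code on every call, though not necessarily the correct one. `majority_vote(detect, text)` only makes sense while the seed is unset, since a seeded detector returns the same answer every time.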