Copyright © 2016 JoungKyun.Kim All rights reserved.
Notice
This project has moved to GitHub. This page redirects to the GitHub project page after 10 seconds.
Abstract
This module was created to replace python-chardet on PyPI. Please do not confuse it with python-chardet on PyPI.
This module is written in C to improve on the performance of python-chardet, which is written in pure Python. It uses the same algorithm (Mozilla Universal Charset Detection) as python-chardet, and for this reason it depends on the libchardet library.
The Mozilla Universal Charset Detection algorithm is used to detect the character set in browsers such as Netscape, Mozilla, and Firefox.
This module uses the same namespace as python-chardet so that it can act as a replacement. This means that it cannot be used together with python-chardet.
In addition, from 2.0.0 it supports the same API as python-chardet, so you can get better performance just by swapping the module, without changing your code.
Repository http://svn.oops.org/wsvn/Python.chardet/trunk/
Required
libchardet library
Python 2.6 and later
Python 3 support: 1.0.2 and later
Download
chardet-2.0.0.tar.bz2 Python 3 enabled - 2016.05.11
mod_chardet-1.0.2.tar.bz2 Python 3 enabled - 2016.05.07
mod_chardet-1.0.1.tar.bz2 - 2011.04.27
mod_chardet-1.0.0.tar.bz2 - 2010.08.09
If you want to download with wget, do not use wget's default user-agent. (Use the -U option.)
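The same precaution may apply when fetching a tarball from a script. The following is a minimal Python sketch, assuming the server only objects to a client's default identification. The names build_request and download are hypothetical helpers, and the URL and User-Agent string are placeholders, not the project's real download address.

```python
try:
    from urllib.request import Request, urlopen   # Python 3
except ImportError:
    from urllib2 import Request, urlopen           # Python 2


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent="Mozilla/5.0"):
    """Fetch url with a custom User-Agent and write the body to dest."""
    response = urlopen(build_request(url, user_agent))
    try:
        with open(dest, "wb") as out:
            out.write(response.read())
    finally:
        response.close()


# Placeholder URL (hypothetical); substitute the actual download link.
# download("http://example.com/chardet-2.0.0.tar.bz2", "chardet-2.0.0.tar.bz2")
```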
Samples
See also the tests directory of the source code.
* python-chardet compatible usage: http://chardet.readthedocs.io/en/latest/usage.html#basic-usage (from 2.0.0)
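From 2.0.0 the module is meant to be called like python-chardet, whose detect() takes bytes and returns a dict with 'encoding' and 'confidence' keys. The following is a minimal sketch under that assumption. decode_with_detection is a hypothetical helper name, and the detector is passed in as a parameter so the same code works with either implementation.

```python
def decode_with_detection(raw, detect, fallback="utf-8"):
    """Decode raw bytes with the encoding reported by a chardet-style detect().

    Returns a (text, encoding_used) tuple. Falls back to `fallback` when no
    encoding is reported or Python does not know the reported name.
    """
    result = detect(raw)  # python-chardet style: {'encoding': ..., 'confidence': ...}
    encoding = result.get("encoding") or fallback
    try:
        return raw.decode(encoding, "replace"), encoding
    except LookupError:  # name not present in Python's codec registry
        return raw.decode(fallback, "replace"), fallback


try:
    import chardet  # python-chardet or this C binding (same namespace)
except ImportError:
    chardet = None

if chardet is not None:
    # EUC-KR encoded sample text built from escaped code points
    sample = (u"\ud55c\uae00 " * 8).encode("euc-kr")
    text, encoding = decode_with_detection(sample, chardet.detect)
    print("decoded with: %s" % encoding)
```

Because only the imported module changes, code written this way should not need edits when switching between the pure Python package and the C binding.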
* 1.x old-style API
When using the old-style API, be careful to import chardet_c instead of chardet.
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # print() as a function on Python 2 as well
    from __future__ import print_function
    import os

    # urlopen and HTTPError live in different modules on Python 2 and 3
    try:
        from urllib.request import urlopen
        from urllib.error import HTTPError
    except ImportError:
        from urllib2 import urlopen, HTTPError

    import chardet_c

    # Only chardet_c is imported here, so the version is read from it.
    # getattr() covers builds that do not expose __version__ (assumption).
    print("Python chardet c binding module version: %s"
          % getattr(chardet_c, "__version__", "unknown"))
    print()

    url = r'https://raw.githubusercontent.com/BYVoid/uchardet/master/test/ar/windows-1256.txt'

    print('** %s => ' % os.path.basename(url), end='')
    try:
        rawdata = urlopen(url).read()
        # One-shot detection without an explicit handle
        r = chardet_c.detector(rawdata)
        print(r)
    except HTTPError as e:
        print(e)
* 1.x old-style API with handle
When using the old-style API, be careful to import chardet_c instead of chardet.
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # print() as a function on Python 2 as well
    from __future__ import print_function
    import os

    # urlopen and HTTPError live in different modules on Python 2 and 3
    try:
        from urllib.request import urlopen
        from urllib.error import HTTPError
    except ImportError:
        from urllib2 import urlopen, HTTPError

    import chardet_c

    urlread = lambda url: urlopen(url).read()
    url = r'https://raw.githubusercontent.com/BYVoid/uchardet/master/test/ar/windows-1256.txt'

    print('** %s => ' % os.path.basename(url), end='')

    # Assumption: err is a list that detect() fills with an error message
    # on failure.
    err = []
    ch = chardet_c.init()
    try:
        det = chardet_c.detect(ch, urlread(url), err)
        if det is None:
            print("Error: %s" % err)
        else:
            print("encoding: %-15s, confidence: %.2f" % (det.encoding, det.confidence))
    except HTTPError as e:
        print(e)
    finally:
        # Release the detector handle even if the request fails
        chardet_c.destroy(ch)
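Since the handle API pairs every init() with a destroy(), the pairing can be wrapped in a context manager so that the handle is released on every exit path. The following is a minimal sketch, not part of the module. detector_handle is a hypothetical name, and the module object is passed in as a parameter on the assumption that it offers init() and destroy() as used in the sample above.

```python
from contextlib import contextmanager


@contextmanager
def detector_handle(module):
    """Yield a detector handle from module.init() and always destroy it."""
    handle = module.init()
    try:
        yield handle
    finally:
        # Runs on normal exit and when the with-body raises
        module.destroy(handle)


# Usage, assuming chardet_c is installed and err is a list as in the sample:
#   import chardet_c
#   with detector_handle(chardet_c) as ch:
#       det = chardet_c.detect(ch, data, err)
```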
Copyright & License
Copyright (c) 2016 JoungKyun.Kim <http://oops.org> All rights reserved. This program is licensed under MPL 1.1 or LGPL 2.1.