Copyright © 2016 JoungKyun.Kim All rights reserved.

Notice

This project has moved to GitHub. This page redirects to the GitHub project page after 10 seconds.

Abstract

This module was created to replace the python-chardet package on PyPI. Please do not confuse it with the python-chardet package on PyPI.

This module is written in C to improve on the performance of python-chardet, which is written in pure Python. It uses the same algorithm as python-chardet (Mozilla Universal Charset Detection), and for that reason it is built on the libchardet library.

The Mozilla Universal Charset Detection algorithm is used to detect the character set in browsers such as Netscape, Mozilla and Firefox.

To serve as an alternative to python-chardet, this module uses the same namespace as python-chardet. This means that it cannot be used together with python-chardet.

In addition, from 2.0.0 it supports the same API as python-chardet, so you can get better performance just by swapping the module, without changing your code.
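
As a rough sketch of what this compatibility means in practice, the snippet below uses the python-chardet style `chardet.detect()` call, which takes a byte string and returns a dictionary with `encoding` and `confidence` keys. The helper name `guess_encoding` and its `detector` parameter are illustrative assumptions, not part of either module; the parameter exists only so the helper can be exercised without the module installed.

```python
# -*- coding: utf-8 -*-
# Sketch of python-chardet style usage (expected to work unchanged from 2.0.0).


def guess_encoding(rawdata, detector=None):
    """Return (encoding, confidence) for a byte string.

    `detector` is a hypothetical hook for illustration; by default the
    detect() function of the installed `chardet` module is used.
    """
    if detector is None:
        import chardet  # python-chardet, or this C binding from 2.0.0
        detector = chardet.detect
    result = detector(rawdata)  # dict with 'encoding' and 'confidence' keys
    return result['encoding'], result['confidence']


if __name__ == '__main__':
    sample = b'caf\xe9 cr\xe8me br\xfbl\xe9e'  # Latin-1 encoded bytes
    try:
        print('encoding: %s, confidence: %.2f' % guess_encoding(sample))
    except ImportError:
        print('chardet module is not installed')
```

Because only the import resolves differently, code written this way should not need to change when python-chardet is replaced by this module.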

Repository http://svn.oops.org/wsvn/Python.chardet/trunk/

Required

libchardet library
Python 2.6 or later
Python 3 support: 1.0.2 and later

Download

chardet-2.0.0.tar.bz2 Python 3 enabled - 2016.05.11
mod_chardet-1.0.2.tar.bz2 Python 3 enabled - 2016.05.07
mod_chardet-1.0.1.tar.bz2 - 2011.04.27
mod_chardet-1.0.0.tar.bz2 - 2010.08.09

If you download with wget, do not use wget's default user agent! Set a different one with the -U option.
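
For scripted downloads from Python instead of wget, the same idea (sending an explicit, non-default User-Agent) might look like the sketch below. The helper names `make_request` and `download`, the user-agent string and the example URL are placeholders rather than values from this project; substitute the real download link. Whether the server also rejects Python's default agent is not stated here, so the sketch simply always sets one.

```python
from urllib.request import Request, urlopen


def make_request(url, user_agent='Mozilla/5.0 (compatible; example-downloader)'):
    """Build a request that sends an explicit User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def download(url, dest):
    """Fetch `url` with the custom User-Agent and save the body to `dest`."""
    with urlopen(make_request(url)) as resp, open(dest, 'wb') as out:
        out.write(resp.read())


if __name__ == '__main__':
    # Hypothetical URL, for illustration only; no network access happens here.
    req = make_request('https://example.org/chardet-2.0.0.tar.bz2')
    print(req.get_header('User-agent'))
```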

Samples

See also the tests directory of the source code.

* python-chardet compatible usage (from 2.0.0): http://chardet.readthedocs.io/en/latest/usage.html#basic-usage

* 1.x old-style API
  When using the old-style API, be careful to import chardet_c instead of chardet.

        #!/usr/bin/python
        # -*- coding: utf-8 -*-

        # for Python 3 compatibility
        from __future__ import print_function
        import os

        # try the Python 3 module layout first, then fall back to Python 2
        try:
            from urllib.request import urlopen
            from urllib.error import HTTPError
        except ImportError:
            from urllib2 import urlopen, HTTPError

        import chardet_c

        print ("Python chardet c binding module version: %s" % (chardet_c.__version__))
        print ()
        url = r'https://raw.githubusercontent.com/BYVoid/uchardet/master/test/ar/windows-1256.txt'
        print ('** %s => ' % os.path.basename (url), end='')

        try :
            # fetch the sample file and run one-shot detection on its raw bytes
            rawdata = urlopen (url).read ()
            r = chardet_c.detector (rawdata)
            print (r)
        except HTTPError as e :
            print (e)

* 1.x old-style API with handle
  When using the old-style API, be careful to import chardet_c instead of chardet.

        #!/usr/bin/python
        # -*- coding: utf-8 -*-

        # for Python 3 compatibility
        from __future__ import print_function
        import os

        # try the Python 3 module layout first, then fall back to Python 2
        try:
            from urllib.request import urlopen
            from urllib.error import HTTPError
        except ImportError:
            from urllib2 import urlopen, HTTPError

        import chardet_c

        urlread = lambda url: urlopen (url).read ()

        url = r'https://raw.githubusercontent.com/BYVoid/uchardet/master/test/ar/windows-1256.txt'
        print ('** %s => ' % os.path.basename (url), end='')

        # assumption: detect() reports failures through this list argument
        err = []
        ch = chardet_c.init ()

        try :
            det = chardet_c.detect (ch, urlread (url), err)
            if det is None :
                print ("Error: %s" % err)
            else :
                print ("encoding: %-15s, confidence: %.2f" % (det.encoding, det.confidence))
        except HTTPError as e :
            print (e)
        finally :
            # always release the detector handle
            chardet_c.destroy (ch)

Copyright & License

        Copyright (c) 2016 JoungKyun.Kim <http://oops.org>
        All rights reserved.
        
        This program is under MPL 1.1 or LGPL 2.1