Pygments
Unicode Support
Since Pygments 0.6, the lexers use unicode strings internally. Because of that, you might occasionally encounter a UnicodeDecodeError if you pass strings with the wrong encoding.
By default, all lexers have their encoding set to latin1. If you pass a lexer a string object (not unicode), it tries to decode the data using this encoding. You can override the encoding using the encoding lexer option. If you have the chardet library installed and set the encoding to chardet, it will analyse the text and pick the best encoding automatically:
from pygments.lexers import PythonLexer
lexer = PythonLexer(encoding='chardet')
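As a minimal sketch of the encoding lexer option, the snippet below passes an explicit encoding instead of chardet, so it does not depend on the optional library. It assumes Python 3, where a bytes object plays the role of the non-unicode string described above; the sample source and variable names are illustrative only.

```python
from pygments.lexers import PythonLexer

# Illustrative input: Python source stored as UTF-8 encoded bytes.
source_bytes = "greeting = 'Grüße'\n".encode("utf-8")

# Tell the lexer how to decode byte input.
lexer = PythonLexer(encoding="utf-8")

# get_tokens() yields (token_type, value) pairs; the values are unicode strings.
tokens = list(lexer.get_tokens(source_bytes))

# Joining the values gives back the decoded source text.
lexed_text = "".join(value for _, value in tokens)
print(lexed_text)
```

If the bytes were not valid in the given encoding, the decode step is where a UnicodeDecodeError would be expected to surface.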
The best approach is to pass unicode objects to Pygments. In that case no decoding takes place inside the lexer, so you can't get unexpected output.
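One way to follow this advice is to decode the data yourself, with an encoding you know to be correct, before handing it to a lexer. The sketch below assumes Python 3 (where str is the unicode type) and uses made-up sample data; it also shows how a wrong guess such as latin-1 silently garbles UTF-8 input rather than raising an error.

```python
from pygments.lexers import PythonLexer

# Illustrative input: bytes that are known to be UTF-8.
raw = "name = 'café'\n".encode("utf-8")

# Decoding with the wrong encoding does not fail, it just produces garbage.
wrong = raw.decode("latin-1")   # "name = 'cafÃ©'\n"

# Decoding with the right encoding yields the intended text.
right = raw.decode("utf-8")     # "name = 'café'\n"

# Pass the already-decoded unicode string; the lexer has nothing to guess.
lexer = PythonLexer()
lexed_text = "".join(value for _, value in lexer.get_tokens(right))
print(lexed_text)
```

Doing the decoding at the point where the data enters your program keeps encoding decisions in one place.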
The formatters now send unicode objects to the stream if you don't set the encoding. To have them write encoded byte strings instead, pass the formatters an encoding option:
from pygments.formatters import HtmlFormatter
f = HtmlFormatter(encoding='utf-8')
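The effect of the formatter encoding option can be seen by comparing the two cases side by side. This is a hedged sketch assuming Python 3 and the pygments.highlight() helper; the sample code string and variable names are illustrative.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = "name = 'café'\n"  # illustrative unicode input

# Without an encoding, the formatter produces a unicode string.
text_html = highlight(code, PythonLexer(), HtmlFormatter())

# With an encoding, the formatter produces encoded bytes.
encoded_html = highlight(code, PythonLexer(), HtmlFormatter(encoding="utf-8"))

print(type(text_html).__name__)     # str
print(type(encoded_html).__name__)  # bytes
```

Which form is preferable depends on the destination: a text-mode file or template usually wants the unicode string, while a binary stream or socket wants bytes.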