Downloads codecov.io Code Coverage MIT License

Introduction

This library is mainly for CLI programs that carefully produce output for Terminals, or make pretend to be an emulator.

Problem Statement: The printable length of most strings are equal to the number of cells they occupy on the screen 1 charater : 1 cell. However, there are categories of characters that occupy 2 cells (full-wide), and others that occupy 0 cells (zero-width).

Solution: POSIX.1-2001 and POSIX.1-2008 conforming systems provide wcwidth(3) and wcswidth(3) C functions of which this python module’s functions precisely copy. These functions return the number of cells a unicode string is expected to occupy.

Installation

The stable version of this package is maintained on pypi, install using pip:

pip install wcwidth

Example

Problem: given the following phrase (Japanese),

>>>  text = u'コンニチハ'

Python incorrectly uses the string length of 5 codepoints rather than the printible length of 10 cells, so that when using the rjust function, the output length is wrong:

>>> print(len('コンニチハ'))
5

>>> print('コンニチハ'.rjust(20, '_'))
_______________コンニチハ

By defining our own “rjust” function that uses wcwidth, we can correct this:

>>> def wc_rjust(text, length, padding=' '):
...    from wcwidth import wcswidth
...    return padding * max(0, (length - wcswidth(text))) + text
...

Our Solution uses wcswidth to determine the string length correctly:

>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10

>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ

Choosing a Version

Export an environment variable, UNICODE_VERSION. This should be done by terminal emulators or those developers experimenting with authoring one of their own, from shell:

$ export UNICODE_VERSION=13.0

If unspecified, the latest version is used. If your Terminal Emulator does not export this variable, you can use the jquast/ucs-detect utility to automatically detect and export it to your shell.

wcwidth, wcswidth

Use function wcwidth() to determine the length of a single unicode character, and wcswidth() to determine the length of many, a string of unicode characters.

Briefly, return values of function wcwidth() are:

-1
Indeterminate (not printable).
0
Does not advance the cursor, such as NULL or Combining.
2
Characters of category East Asian Wide (W) or East Asian Full-width (F) which are displayed using two terminal cells.
1
All others.

Function wcswidth() simply returns the sum of all values for each character along a string, or -1 when it occurs anywhere along a string.

Full API Documentation at http://wcwidth.readthedocs.org

Developing

Install wcwidth in editable mode:

pip install -e.

Execute unit tests using tox:

tox

Regenerate python code tables from latest Unicode Specification data files:

tox -eupdate

Supplementary tools for browsing and testing terminals for wide unicode characters are found in the bin/ of this project’s source code. Just ensure to first pip install -erequirements-develop.txt from this projects main folder. For example, an interactive browser for testing:

./bin/wcwidth-browser.py

Uses

This library is used in:

Other Languages

History

0.2.0 2020-06-01
  • Enhancement: Unicode version may be selected by exporting the Environment variable UNICODE_VERSION, such as 13.0, or 6.3.0. See the jquast/ucs-detect CLI utility for automatic detection.
  • Enhancement: API Documentation is published to readthedocs.org.
  • Updated tables for all Unicode Specifications with files published in a programmatically consumable format, versions 4.1.0 through 13.0 that are published , versions
0.1.9 2020-03-22
  • Performance optimization by Avram Lubkin, PR #35.
  • Updated tables to Unicode Specification 13.0.0.
0.1.8 2020-01-01
  • Updated tables to Unicode Specification 12.0.0. (PR #30).
0.1.7 2016-07-01
  • Updated tables to Unicode Specification 9.0.0. (PR #18).
0.1.6 2016-01-08 Production/Stable
  • LICENSE file now included with distribution.
0.1.5 2015-09-13 Alpha
  • Bugfix: Resolution of “combining character width” issue, most especially those that previously returned -1 now often (correctly) return 0. resolved by Philip Craig via PR #11.
  • Deprecated: The module path wcwidth.table_comb is no longer available, it has been superseded by module path wcwidth.table_zero.
0.1.4 2014-11-20 Pre-Alpha
0.1.3 2014-10-29 Pre-Alpha
0.1.2 2014-10-28 Pre-Alpha
0.1.1 2014-05-14 Pre-Alpha
  • Initial release to pypi, Based on Unicode Specification 6.3.0

This code was originally derived directly from C code of the same name, whose latest version is available at http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c:

* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.