using xlrd and formatting Excel numbers

The number (and dates) in Excel are float numbers. How these numbers are displyed to an user — as an integer, or with two digits after a point, etc — are defined by the cell format. Unfortunately, xlrd does not support number formatting. It is your task to interpret the format and display the number as expected. My code can probably help. Download xlrd-format-excel-number

The problem for an user

This is how you format a table in Excel:

small excel table

Let’s dump the the table (see the attached code “dump-sheet.py”):


wb = xlrd.open_workbook('invoice.xls', formatting_info=1)
sh = wb.sheets()[0]
for i_row in range(sh.nrows):
  for i_col in range(sh.ncols):
    ce = sh.cell(i_row,i_col)
    s = as_display_string(ce)
    smart_print(s)

A naive approach to get a value from a cell is:


def as_display_string(cell):
  return str(cell.value)

The result is:


what           | item price     | N              | total price    |
foo            | 23.25          | 4.0            | 93.0           |
bar            | 5.0            | 2.0            | 10.0           |
               |                |                |                |
Total (no tax) |                |                | 103.0          |
Tax 19%        |                |                | 19.57          |
               |                |                |                |

“N” can’t be “4.0” or “2.0”, it should be “4” or “2”

Price can’t be “5.0”, it should be “5.00”, or more precisely (in Germany) “5,00”.

The problem for a programmer

The format specification is too hard for a quick implementation:

Number format codes – Excel – Office.com

Nobody has enough time to implement everything.

And there are also border and hidden cases, like this format for currency:

#,##0.00\ [$€-407];[RED]\-#,##0.00\ [$€-407]

I just have no slightest idea how to get the euro-sign from the format chunk “[$€-407]“. And there are also conditions. And locale-aware formatting. No, I can’t.

The idea

* Find a substring, which consists of a mix of zeros, hashes, dots and commas.
* Then to use only this substring for further formatting.

It works at least for integer numbers and numbers with two digits after a comma. For my needs, it is 100% of all the use cases. It should work also for many other formats.

If your formats are more complex than the code can handle, you are welcome to extend the code in “numfmt.py”. Don’t forget to add tests into “numfmt_test.py”.

Usage

The example is “dump-sheet.py”. First you have to decide which separators for number part you want to use. A good idea is to get them from the locale user settings.


locale.setlocale(locale.LC_ALL, '')
l = locale.localeconv()
div1    = l['decimal_point'] or '.'
div1000 = l['thousands_sep'] or ','

The sample function format_number takes three parameters:

* f is the float number to format,
* cell and wb are the cell and workbook objects, they are used to find the format string.

The code to get s_fmt (the format string) is found somewhere in the internet, now I can’t remember the source.


def format_number(f, cell, wb):
  xf = wb.xf_list[cell.xf_index]
  fmt_key = xf.format_key
  fmt = wb.format_map[fmt_key]
  s_fmt = fmt.format_str
  a_fmt = numfmt.extract_number_format(s_fmt)
  if a_fmt:
    s_f = numfmt.format_number(f, a_fmt, div1000, div1)
  else:
    s_f = str(f)
  return s_f

The last remark: the call to xlrd.open_workbook should contain the parameter formatting_info=1, otherwise you can’t access the format strings.

Now the python code prints the table correct:


what           | item price     | N              | total price    |
foo            | 23,25          | 4              | 93,00          |
bar            | 5,00           | 2              | 10,00          |
-              | -              | -              | -              |
Total (no tax) | -              | -              | 103,00         |
Tax 19%        | -              | -              | 19,57          |
-              | -              | -              | -              |
Total          | -              | -              | 122,57         |

Download xlrd-format-excel-number

One Response to “using xlrd and formatting Excel numbers”

  1. Oleg Says:

    The code should be improved. There is a border case: one uses the format “General”. In this case, the “a_fmt” is None, and the string representation is again bad. Solution: after

    s_f = str(f)

    add

    tail = s_f[-2:]
    if (tail == ‘,0’) or (tail == ‘.0’):
    s_f = s_f[:-2]

Leave a Reply