The number (and dates) in Excel are float numbers. How these numbers are displyed to an user — as an integer, or with two digits after a point, etc — are defined by the cell format. Unfortunately, xlrd does not support number formatting. It is your task to interpret the format and display the number as expected. My code can probably help. Download xlrd-format-excel-number
The problem for an user
This is how you format a table in Excel:
Let’s dump the the table (see the attached code “dump-sheet.py”):
wb = xlrd.open_workbook('invoice.xls', formatting_info=1) sh = wb.sheets() for i_row in range(sh.nrows): for i_col in range(sh.ncols): ce = sh.cell(i_row,i_col) s = as_display_string(ce) smart_print(s)
A naive approach to get a value from a cell is:
def as_display_string(cell): return str(cell.value)
The result is:
what | item price | N | total price | foo | 23.25 | 4.0 | 93.0 | bar | 5.0 | 2.0 | 10.0 | | | | | Total (no tax) | | | 103.0 | Tax 19% | | | 19.57 | | | | |
“N” can’t be “4.0” or “2.0”, it should be “4” or “2”
Price can’t be “5.0”, it should be “5.00”, or more precisely (in Germany) “5,00”.
The problem for a programmer
The format specification is too hard for a quick implementation:
Nobody has enough time to implement everything.
And there are also border and hidden cases, like this format for currency:
#,##0.00\ [$€-407];[RED]\-#,##0.00\ [$€-407]
I just have no slightest idea how to get the euro-sign from the format chunk “
[$€-407]“. And there are also conditions. And locale-aware formatting. No, I can’t.
* Find a substring, which consists of a mix of zeros, hashes, dots and commas.
* Then to use only this substring for further formatting.
It works at least for integer numbers and numbers with two digits after a comma. For my needs, it is 100% of all the use cases. It should work also for many other formats.
If your formats are more complex than the code can handle, you are welcome to extend the code in “numfmt.py”. Don’t forget to add tests into “numfmt_test.py”.
The example is “dump-sheet.py”. First you have to decide which separators for number part you want to use. A good idea is to get them from the locale user settings.
locale.setlocale(locale.LC_ALL, '') l = locale.localeconv() div1 = l['decimal_point'] or '.' div1000 = l['thousands_sep'] or ','
The sample function format_number takes three parameters:
* f is the float number to format,
* cell and wb are the cell and workbook objects, they are used to find the format string.
The code to get s_fmt (the format string) is found somewhere in the internet, now I can’t remember the source.
def format_number(f, cell, wb): xf = wb.xf_list[cell.xf_index] fmt_key = xf.format_key fmt = wb.format_map[fmt_key] s_fmt = fmt.format_str a_fmt = numfmt.extract_number_format(s_fmt) if a_fmt: s_f = numfmt.format_number(f, a_fmt, div1000, div1) else: s_f = str(f) return s_f
The last remark: the call to xlrd.open_workbook should contain the parameter formatting_info=1, otherwise you can’t access the format strings.
Now the python code prints the table correct:
what | item price | N | total price | foo | 23,25 | 4 | 93,00 | bar | 5,00 | 2 | 10,00 | - | - | - | - | Total (no tax) | - | - | 103,00 | Tax 19% | - | - | 19,57 | - | - | - | - | Total | - | - | 122,57 |