Python practice: Using Excel data

Page 2: Writing to an Excel worksheet with pandas

Contents

For this purpose, the pandas library provides the DataFrame method to_excel(). The following instruction creates a worksheet in the Demo1.xlsx workbook with the name Turnover 2nd quarter 2023 and writes the data stored in the previously created DataFrame (see previous listing) to it.

data_frame.to_excel('Demo1.xlsx', sheet_name='Umsätze 2. Quartal 2023', index=False)

Please note that the target workbook must already exist and that to_excel() overwrites any existing content there. The specification index=False prevents the data line indices stored in the DataFrame from being written; the default setting is True.

Two further parameters of the to_excel() method are columns and startcol. columns is the counterpart to the read_excel() parameter usecols. If developers have not already made a preselection when importing the Excel data into the DataFrame, they can assign a list with the desired columns to usecols. If the columns are next to each other, the startcol parameter can be used as an alternative. The integer value defines the column from which to_excel() writes to the Excel table. The startrow parameter behaves in the same way and specifies the first row of data. To redefine the column names, a list with the desired new names is passed to the header parameter. With the specification header=False, to_excel() writes the data without column headers to the Excel table.

If developers want to write calculated data to a workbook, they must first create their own DataFrame. There are several ways to do this, for example by passing the desired data to the __init__() method of the DataFrame class. The columns parameter defines the column names. The following instruction saves the sales of districts A and B of the first half of the year (first quarter and second quarter) in the DataFrame.

data_frame = pandas.DataFrame([['Bezirk A', 19500, 13500], ['Bezirk B', 18400, 25100]], columns=['', 'I. Quartal', 'II. Quartal'])

Alternatively, developers first save the data in a dictionary and then transfer this to the __init__() method. In this case, __init__()- obtains the column names for the DataFrame from the keys and the data from the values of the Dictionary elements. The following instructions create a DataFrame with the total sales per quarter and then write them to a worksheet Total sales 2023 of the workbook Demo1.xlsx.

import pandas

data_frame = pandas.read_excel('Demo.xlsx')

summe_quartal1_2023 = data_frame['I. Quartal'].sum()
summe_quartal2_2023 = data_frame['II. Quartal'].sum()
summe_quartal3_2023 = data_frame['III. Quartal'].sum()
summe_quartal4_2023 = data_frame['IV. Quartal'].sum()

umsaetze_dic = {
    'Umsatz 1. Quartal': [summe_quartal1_2023],
    'Umsatz 2. Quartal': [summe_quartal2_2023],
    'Umsatz 3. Quartal': [summe_quartal3_2023],
    'Umsatz 4. Quartal': [summe_quartal4_2023]
}

data_frame_umsaetze = pandas.DataFrame(umsaetze_dic)
data_frame_umsaetze.to_excel('Demo1.xlsx', sheet_name='Gesamtumsätze 2023', index=False)

The values in the dictionary umsaetze_dic are specified as a list, even though they have one element, as the length of the values in the dictionary must match when a dictionary is passed to the __init__() method. This is not usually the case for simple values such as the sales here, as the totals result in different amounts. there is another way to create the DataFrame in order to also note simple values in the dictionary: from_dict() converts a Dictionary directly into a DataFrame. The parameter orient specifies that the keys of the dictionary are used as indexes instead of column headings, i.e. as row headings of the DataFrame to be created. The mandatory parameter columns contains the desired column headings:

…
umsaetze_dic = {
    'Umsatz 1. Quartal': summe_quartal1_2023,
    'Umsatz 2. Quartal': summe_quartal2_2023,
    'Umsatz 3. Quartal': summe_quartal3_2023,
    'Umsatz 4. Quartal': summe_quartal4_2023
}

data_frame_umsaetze = pandas.DataFrame.from_dict(umsaetze_dic, orient='index', columns=['Summe'])
data_frame_umsaetze.to_excel('Demo1.xlsx', sheet_name='Gesamtumsätze 2023')

The data is then written to Excel in the following order:

                    Summe
Umsatz 1. Quartal   63400
Umsatz 2. Quartal   73500
Umsatz 3. Quartal   77500
Umsatz 4. Quartal   84000

The specification index=False is omitted in this variant when calling to_excel(), as the indices, which here consist of the quarterly data, should be written explicitly.