How to read and write to a CSV File using Pandas

5 min readFeb 15, 2021

Reading a CSV from a file with Python and Pandas is super simple and something that you are likely to have to do many times as a data scientist.

As an example let's read an image dataset I gathered of all the paintings I could find about the Nativity.

I will not go into detail on how I was able to gather this dataset, but if you are curious I have previously created a video showing how I gathered images using Panda data frames and BeautifulSoup.

You can find it here:

Let’s first import the pandas library so we can read our CSV:

import pandas

To understand how to read CSV using pandas.read_csv, let’s use Python’s help function:

help(pandas.read_csv)Help on function read_csv in module pandas.io.parsers:

read_csv(filepath_or_buffer:Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal:str='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
    Read a comma-separated values (csv) file into DataFrame.

    Also supports optionally iterating or breaking of the file
    into chunks.

    Additional help can be found in the online docs for
    `IO Tools <https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_.

    Parameters
    ----------
    filepath_or_buffer : str, path object or file-like object
        Any valid string path is acceptable. The string could be a URL. Valid
        URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is
        expected. A local file could be: file://localhost/path/to/table.csv.

        If you want to pass in a path object, pandas accepts any ``os.PathLike``.

        By file-like object, we refer to objects with a ``read()`` method, such as
        a file handler (e.g. via builtin ``open`` function) or ``StringIO``.
    sep : str, default ','
        Delimiter to use. If sep is None, the C engine cannot automatically detect
        the separator, but the Python parsing engine can, meaning the latter will
        be used and automatically detect the separator by Python's builtin sniffer
        tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
        different from ``'\s+'`` will be interpreted as regular expressions and
        will also force the use of the Python parsing engine. Note that regex
        delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
    delimiter : str, default ``None``
        Alias for sep.
    header : int, list of int, default 'infer'
        Row number(s) to use as the column names, and the start of the
        data.  Default behavior is to infer the column names: if no names
        are passed the behavior is identical to ``header=0`` and column
        names are inferred from the first line of the file, if column
        names are passed explicitly then the behavior is identical to
        ``header=None``. Explicitly pass ``header=0`` to be able to
        replace existing names. The header can be a list of integers that
        specify row locations for a multi-index on the columns
        e.g. [0,1,3]. Intervening rows that are not specified will be
        skipped (e.g. 2 in this example is skipped). Note that this
        parameter ignores commented lines and empty lines if
        ``skip_blank_lines=True``, so ``header=0`` denotes the first line of
        data rather than the first line of the file.
  ...

There are quite a lot of parameters that pandas.read_csv will accept, for many use-cases that you might encounter.

For now, let’s keep it simple and just read our CSV:

pd = pandas.read_csv("nativity_dataset.csv") 
display(pd)

It was that easy! You will notice that I used the display function to show the panda data frame with a nice look and feel. You can use the print() function instead, but it will not look so pretty:

print(pd)Image URL ... Labels 0 https://d3d00swyhr67nd.cloudfront.net/w1200h12... ... NaN 1 https://d3d00swyhr67nd.cloudfront.net/w1200h12... ... NaN 2 https://d3d00swyhr67nd.cloudfront.net/w1200h12... ... NaN 3 https://d3d00swyhr67nd.cloudfront.net/w1200h12... ... NaN 4 https://d3d00swyhr67nd.cloudfront.net/w944h944... ... NaN .. ... ... ... 215 https://upload.wikimedia.org/wikipedia/commons... ... NaN 216 https://preview.redd.it/4tc6qepayw521.jpg?widt... ... NaN 217 https://upload.wikimedia.org/wikipedia/commons... ... NaN 218 https://upload.wikimedia.org/wikipedia/commons... ... NaN 219 https://upload.wikimedia.org/wikipedia/commons... ... NaN [220 rows x 3 columns]

You don’t always have to use the display() function. In certain conditions, you can omit it.

For instance:

pd

What happened there?¶

The python notebook called the display function for us because it was the last value in the python block.

If you don’t like the column headers you can easily change them when reading the CSV:

pd = pandas.read_csv("nativity_dataset.csv", names=["Precise Image URL", "Precise Source URL", "Precise Labels"]) 
pd

It was almost what we wanted. But seems like the previous column names are still there. No worries, it is easy to fix, using header parameter.

pd = pandas.read_csv("nativity_dataset.csv", header=0, names=["Precise Image URL", "Precise Source URL", "Precise Labels"]) 
pd

You will notice that we have a column without any data: (“Precise Labels”). Let’s delete it.

del pd["Precise Labels"] 
pd

The column is gone. Let’s save the panda data frame to a CSV

pd.to_csv("nativity_dataset_updated.csv")

That was easy, right? There is a lot more that we can do with pandas and CSVs but I am sure this will help you get started.

That will be all for now. Happy Coding!

RESOURCES

Google Colaboratory

Edit description

colab.research.google.com

armindocachada/working-with-pandas-series

This is the repository for all the python notebooks that I created for my Youtube series, working with Panda Dataframes…

github.com

pandas.read_csv - pandas 1.2.2 documentation

pandas. read_csv( filepath_or_buffer, sep= , delimiter=None, header='infer', names=None, index_col=None, usecols=None…

pandas.pydata.org

Originally published at https://spltech.co.uk.

How to read and write to a CSV File using Pandas

What happened there?¶

RESOURCES

Google Colaboratory

Edit description

armindocachada/working-with-pandas-series

This is the repository for all the python notebooks that I created for my Youtube series, working with Panda Dataframes…

pandas.read_csv - pandas 1.2.2 documentation

pandas. read_csv( filepath_or_buffer, sep= , delimiter=None, header='infer', names=None, index_col=None, usecols=None…

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Armindo Cachada

No responses yet

More from Armindo Cachada

Creating a Wildlife Camera With a Raspberry PI, Python OpenCV and Tensorflow

I am a great believer that for you to learn, you need to create. And to create you need to have fun!

How to reduce the size of KVM QCOW2 images that grow out of hand

This is a very short and practical article useful to anyone who uses KVM in production and somehow needs to reduce the space used by a…

Python Automation — Smart Home — Using my smart lights to tell me about the weather

In this article I will show you how I improved my life, using Smart Lights, Python and a Raspberry pi. I am going to make a Smart Light…

How to create a 3D model of a photo using Python, NumPy and Google Colab — Part I

I recently recorded a four-part video series for my Youtube channel on how I attempted to create a 3D model out of a photo, using Python…

Recommended from Medium

Overcoming Notion’s 2k Character Limit with N8N Workflows

As you know , there is a limit of 2000 characters when using Notion API calls to create content blocks.

5 Powerful F-String Tricks Every Python Developer Should Know!

Learn five powerful f-string techniques to write cleaner, faster, and more readable Python code.

Lists

Coding & Development

Predictive Modeling w/ Python

General Coding Knowledge

Practical Guides to Machine Learning

How I Learned to Love `__init__.py`: A Simple Guide😊

💡 Heads Up! Click here to unlock this article for free if you’re not a Medium member!

How I Am Using a Lifetime 100% Free Server

Get a server with 24 GB RAM + 4 CPU + 200 GB Storage + Always Free

A Comprehensive Guide to Log Searching: Tools, Techniques, and Python Implementation

Log searching is an essential part of system monitoring, debugging, and troubleshooting. With logs, we can uncover critical insights into…

The 5 paid subscriptions I actually use in 2025 as a Staff Software Engineer

Tools I use that are cheaper than Netflix

How I Learned to Love `init.py`: A Simple Guide😊