How to read and write to a CSV File using Pandas

Reading a CSV from a file with Python and Pandas is super simple and something that you are likely to have to do many times as a data scientist.

As an example, let's read an image dataset I gathered of all the paintings I could find about the Nativity.

I will not go into detail on how I gathered this dataset, but if you are curious, I previously made a video showing how I collected the images using pandas data frames and BeautifulSoup.

You can find it here:

Let’s first import the pandas library so we can read our CSV:

import pandas

To understand how to read a CSV using pandas.read_csv, let’s use Python’s help function:

help(pandas.read_csv)

There are quite a lot of parameters that pandas.read_csv will accept, for many use-cases that you might encounter.
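
To give a feel for a few of those parameters, here is a minimal sketch. It reads from an in-memory string rather than a file so that it runs anywhere. The column names and rows are made up for illustration and are not taken from the Nativity dataset.

```python
import io

import pandas

# Made-up sample data standing in for a CSV file on disk (assumption, not the real dataset).
csv_text = """Image URL;Source URL;Labels
http://example.com/a.jpg;http://example.com/a;nativity
http://example.com/b.jpg;http://example.com/b;painting
http://example.com/c.jpg;http://example.com/c;fresco
"""

sample = pandas.read_csv(
    io.StringIO(csv_text),
    sep=";",                          # the delimiter; a comma by default
    usecols=["Image URL", "Labels"],  # keep only these columns
    nrows=2,                          # read only the first two data rows
)
print(sample)
```

The same parameters work when the first argument is a file name instead of a StringIO object.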

For now, let’s keep it simple and just read our CSV:

pd = pandas.read_csv("nativity_dataset.csv") 
display(pd)

It was that easy! You will notice that I used the display function, which is available in Jupyter notebooks, to show the pandas data frame with a nice look and feel. You can use the print() function instead, but it will not look as pretty:

print(pd)

You don’t always have to use the display() function. In certain conditions, you can omit it.

For instance:

pd

What happened there?

The notebook called the display function for us because pd was the last expression in the cell.

If you don’t like the column headers you can easily change them when reading the CSV:

pd = pandas.read_csv("nativity_dataset.csv", names=["Precise Image URL", "Precise Source URL", "Precise Labels"]) 
pd

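That was almost what we wanted, but the previous column names are still there, now as the first row of data. When names is passed on its own, pandas assumes the file has no header row. No worries, it is easy to fix with the header parameter: header=0 tells pandas that the first row is a header, which the new names then replace.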

pd = pandas.read_csv("nativity_dataset.csv", header=0, names=["Precise Image URL", "Precise Source URL", "Precise Labels"]) 
pd
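
The difference between the two calls can be reproduced on a tiny in-memory CSV. This is a sketch with made-up rows and only two columns, not the real dataset.

```python
import io

import pandas

# Made-up CSV with a header line and two data rows (assumption, for illustration only).
csv_text = (
    "Image URL,Source URL\n"
    "http://example.com/a.jpg,http://example.com/a\n"
    "http://example.com/b.jpg,http://example.com/b\n"
)

new_names = ["Precise Image URL", "Precise Source URL"]

# names alone: pandas assumes the file has no header, so the old header row becomes data.
with_old_header = pandas.read_csv(io.StringIO(csv_text), names=new_names)

# header=0 plus names: row 0 is treated as the header and replaced by the new names.
replaced_header = pandas.read_csv(io.StringIO(csv_text), header=0, names=new_names)

print(with_old_header)
print(replaced_header)
```

The first data frame has three rows, with the old header as its first row, while the second has only the two data rows.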

You will notice that one column, “Precise Labels”, has no data. Let’s delete it.

del pd["Precise Labels"] 
pd
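
The del statement changes the data frame in place. As an alternative sketch on made-up data, drop returns a new data frame and leaves the original untouched, and dropna(axis=1, how="all") removes every column that contains no data without having to name it.

```python
import pandas

# Made-up frame with one completely empty column (assumption, for illustration only).
frame = pandas.DataFrame(
    {
        "Precise Image URL": ["http://example.com/a.jpg", "http://example.com/b.jpg"],
        "Precise Source URL": ["http://example.com/a", "http://example.com/b"],
        "Precise Labels": [None, None],
    }
)

# Drop one column by name; frame itself is not modified.
without_labels = frame.drop(columns=["Precise Labels"])

# Drop every column whose values are all missing.
without_empty = frame.dropna(axis=1, how="all")

print(list(without_labels.columns))
print(list(without_empty.columns))
```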

The column is gone. Let’s save the pandas data frame to a CSV file:

pd.to_csv("nativity_dataset_updated.csv")
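
One thing to be aware of: by default, to_csv also writes the data frame's index as an extra first column. The following minimal sketch uses made-up data and returns the CSV as a string instead of writing a file, to show the difference that index=False makes.

```python
import io

import pandas

# Made-up one-row frame (assumption, for illustration only).
frame = pandas.DataFrame(
    {
        "Precise Image URL": ["http://example.com/a.jpg"],
        "Precise Source URL": ["http://example.com/a"],
    }
)

# With no file name, to_csv returns the CSV text as a string.
# Default: the index is written as an unnamed first column.
with_index = frame.to_csv()

# index=False: only the actual columns are written.
without_index = frame.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the second version back gives the same columns we started with.
round_trip = pandas.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))
```

If you plan to read the file back with read_csv, passing index=False avoids ending up with an extra unnamed column.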

That was easy, right? There is a lot more that we can do with pandas and CSVs but I am sure this will help you get started.

That will be all for now. Happy Coding!

RESOURCES

Founding Director at Spltech
