MS DATA PROGRAM: DATA AND DATABASES DAY 3

CSV to SQL

We will download two CSV files from http://data.un.org/Explorer.aspx

In the Energy Statistics Database download Solar Electricity and Wind Electricity by clicking on "View Data" and then on that page click "download", and under "value separated" choose "comma".

But first, we need to create a new database in Postgres:

create database undata;

Then we can add some tables, one for solar and one for wind.

Here is the process for "solar" (please follow the same process for "wind")

The file you download will be called something like: UNdata_Export_20211104_043926118.zip

Unzip the file, and rename it something readable, like "UN_solar.csv"

Put that file in a folder on your computer (not in a remote/cloud drive, but in the computer itself). Make sure there are no spaces in the name of the folders it is in.

Maybe open the file in Excel in order to read the column names and see what kind of data is in each column. (Make sure you never save the file while in Excel--even if you made no changes, Excel may still add invisible and additional characters that will make the file unreadable in the terminal.)

Based on the columns write a CREATE TABLE statement that contains columns and datatypes for each column. For example:

CREATE TABLE solar (
  country varchar(80),
  type varchar(80),
  year int,
  unit varchar(80),
  usage double precision,
  notes varchar(80)
);

Once you successfully created the table in PostgreSQL, you need to prep this CSV for processing.

Again, make sure that you have not saved the document in Excel. Open the document in a text editor that does not add invisible characters. These include visual studio, sublimetext, bbedit, atom and others -- don't use the text editor that is preinstalled on your computer. Make sure it is made for development/programming.

Once you have found the proper text editor, open up the CSV file and delete the header, then scroll down to the bottom and delete any footers (lines that come after the rows). So your entire document should consistently look like this:

"Afghanistan","Electricity - total solar production","2021","Kilowatt-hours, million","32","1"
"Afghanistan","Electricity - total solar production","2020","Kilowatt-hours, million","31.93",""
"Afghanistan","Electricity - total solar production","2019","Kilowatt-hours, million","20.77",""
"Albania","Electricity - total solar production","2021","Kilowatt-hours, million","40.756",""
"Albania","Electricity - total solar production","2020","Kilowatt-hours, million","32.26",""
"Albania","Electricity - total solar production","2019","Kilowatt-hours, million","22.196",""
...
...
"Zambia","Electricity - total solar production","2021","Kilowatt-hours, million","147.661",""
"Zambia","Electricity - total solar production","2020","Kilowatt-hours, million","150.156",""
"Zambia","Electricity - total solar production","2019","Kilowatt-hours, million","117.51",""
"Zambia","Electricity - total solar production","2018","Kilowatt-hours, million","1.32",""
"Zambia","Electricity - total solar production","2017","Kilowatt-hours, million","0.163",""
"Zambia","Electricity - total solar production","2016","Kilowatt-hours, million","0.163",""
"Zimbabwe","Electricity - total solar production","2020","Kilowatt-hours, million","24.4",""

Make sure there are no empty lines at the end or at the beginning. Once that is prepped, save the file and close it.

Now you can run the copy command:

\copy [tablename] from [pathname] delimiter ',' csv

or, more specifically:

\copy solar from /Users/Jon/Documents/columbia_syllabus/undata-class/testing.csv delimiter ',' csv

Make sure that the full path you use has no spaces in it, and that it ends with .csv

If this worked you will get a message like this

COPY 2541

In which 2541 is the number of rows that were copied into the table!