R Lecture 7: Reading External Data Part II

10 minutes is a shockingly short period of time!

Today we look at some common errors you WILL see when you try to read in data. The code below has many deliberate typos.

Watch for Windows mistakenly saving your file as “advertising.csv.txt”. Windows might append that “.txt” without your being aware. Open the folder properties in “C:/myR” and unhide the file extensions.
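You can also ask R itself what is in the folder. The sketch below is only an illustration: it builds a throwaway folder with tempdir() to stand in for “C:/myR” and fakes the stray “.txt” file, so the folder and file contents are assumptions, not your real data.

```r
# Hypothetical setup: a temporary folder stands in for C:/myR
folder <- file.path(tempdir(), "myR")
dir.create(folder, showWarnings = FALSE)

# Simulate Windows having saved the file with a hidden ".txt" suffix
writeLines(c("sales,spend", "10,2"),
           file.path(folder, "advertising.csv.txt"))

# List what is really in the folder, extensions included
files_found <- list.files(folder)
print(files_found)

# Check whether the file we think we saved actually exists
wanted <- file.path(folder, "advertising.csv")
csv_exists <- file.exists(wanted)
print(csv_exists)
```

On your own machine, replace folder with the real path, for example list.files("C:/myR"). If the listing shows “advertising.csv.txt”, rename the file or read it under that name.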

Sometimes missing values in your spreadsheets have hidden characters, or R gets confused about the data type. Remember, R tries to guess whether each column in your data is a number or a factor (a categorical variable). To avoid difficulties, use the argument na.strings="", where “NA” is how R marks missing data.

Notice there is no space between the quotation marks. You can also specify a value that means missing. Perhaps (NEVER DO THIS) you coded missing as “99” or something equally foolish. Then you could use na.strings="99". We’ll talk about data coding another time.
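Here is a small sketch of what na.strings does. The data are made up for illustration and are passed in through read.csv's text argument, so no file is needed; the column names region and sales are assumptions.

```r
# Made-up data: an empty cell in 'region' and a 99 standing in
# for a missing 'sales' value
csv_text <- "region,sales
north,12
,99
south,7"

# Empty fields become NA; the 99 is still read as an ordinary number
x1 <- read.csv(text = csv_text, na.strings = "")
print(x1)

# Treat both empty fields and the code 99 as missing
x2 <- read.csv(text = csv_text, na.strings = c("", "99"))
print(x2)

# Which sales values are now missing?
print(is.na(x2$sales))
```

na.strings accepts a vector, so you can list several codes for missing at once. The same argument works when reading from a file.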

Cut and paste or type the following NEW block of text into your myRcode.R file and SAVE it.


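# Common errors (every line in this block is deliberately wrong)
# Windows
x = read.csv("C:\myR\advertising.csv")   # backslashes instead of forward slashes
x = read.csv("C:/myr/advertising.csv")   # "myr" instead of "myR"
x = read.csv("C:/myR/advertisng.csv")    # file name misspelled
# Mac
x = read.csv("~/Desktop/myr/advertising.csv")   # "myr" instead of "myR"
x = read.csv("Desktop/myR/advertising.csv")     # missing the leading "~/"
x = read.csv("~/Desktop/myr/advertisng.csv")    # wrong case and misspelled file name
# Linux
x = read.csv("myR/advertising.csv")             # relative path; depends on the working directory
x = read.csv("/home/matt/myr/advertising.csv")  # "myr" instead of "myR"
x = read.csv("/home/matt/myR/advertisng.csv")   # file name misspelled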
#
# Good code
?read.csv
# Windows
x = read.csv("C:/myR/advertising.csv", na.strings="")
# Mac
x = read.csv("~/Desktop/myR/advertising.csv", na.strings="")
# Linux
x = read.csv("/home/matt/myR/advertising.csv", na.strings="")

We will cut & paste this code from the file myRcode.R into the R command window, EACH TIME REMEMBERING TO HIT THE ENTER KEY (inside R).

R can be downloaded here: R-project.org. A direct link to the CRAN package archive is here.

All videos are on YouTube under the username “mattstat” (wmbriggs was taken). That service imposes a ten-minute limit on videos. Accordingly, lectures are short.

All questions to matt@wmbriggs.com.

3 Comments

  1. Frederick Davies

    I have just tried this lecture on a Windows XP installation of R 2.10.1, and I have to tell you that

    x = read.csv("C:/myr/advertising.csv")

    did actually work fine. As Windows is a case-insensitive but case-retentive operating system, it will allow you to use “myR” or “myr” as the name of a file or folder, but will see both as the same name. Hence a path of “C:\myr\” will reach a folder named C:\myR the same way as the path “C:\myR\”.

    Linux (and UNIX) will see “myR” and “myr” as completely different folders, but not Windows.

    Otherwise, great lectures!

  2. Briggs

    Frederick Davies,

    That darn Windows! It’s even more bizarre than you hint. Take a look at this Microsoft help file, particularly the hacks to fix case.

  3. Mart J.

    Dear Mr Briggs, An excellent and informative blog. I’m a regular visitor but part of the silent majority. I started to play around with R late last year and found your tutorials very useful. Thanks for taking the time out to post and I hope to see more lessons in the future. Kind regards.
