Importing Data with the RStudio IDE

Introduction

Importing data into R is a necessary step that can, at times, become time-intensive. To ease this task, the RStudio IDE includes new features to import data from CSV, XLS, XLSX, SAV, DTA, POR, SAS and Stata files.

data-import-rstudio-overview.gif

Importing data

The data import features can be accessed from the "Environment" pane or from the "File" menu. The importers are grouped into three categories: text data, Excel data and statistical data. To access this feature, use the "Import Dataset" dropdown in the "Environment" pane:

Screen_Shot_2018-10-31_at_9.24.22_PM.png

Or through the "File" menu, followed by the "Import Dataset" submenu:

Screen_Shot_2018-10-31_at_9.28.55_PM.png

Importing data from Text and CSV files

The "From Text (readr)" option imports CSV files and, more generally, character-delimited files using the readr package. This text importer lets you:

  • Import from the file system or a URL
  • Change column data types
  • Skip or include only selected columns
  • Rename the data set
  • Skip the first N rows
  • Use the header row for column names
  • Trim spaces in names
  • Change the column delimiter
  • Select the encoding
  • Select quote, escape, comment and NA identifiers

For example, you can easily import a CSV file from data.gov by pasting this URL https://data.montgomerycountymd.gov/api/views/2qd6-mr43/rows.csv?accessType=DOWNLOAD and selecting "Import".

Screen_Shot_2018-10-31_at_9.39.02_PM.png
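
The options in the dialog map fairly directly onto arguments of readr's read_csv() function. The following is a minimal sketch of a roughly equivalent call, not the exact code the dialog generates. The sample file, its column names and the factor levels are hypothetical and exist only so the example runs offline; a URL can be passed in place of the local path.

```r
library(readr)

# Hypothetical sample data written to a temporary file (an assumption for
# illustration, not the data set from the article).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "exported sample file",
  "id,group,score",
  "1,a,10.5",
  "2,b,NA",
  "3,a,7.25"
), csv_path)

dataset <- read_csv(
  csv_path,
  skip = 1,                              # skip the first N rows
  col_names = TRUE,                      # use the header row for column names
  trim_ws = TRUE,                        # trim surrounding whitespace
  na = c("", "NA"),                      # NA identifiers
  locale = locale(encoding = "UTF-8"),   # encoding selection
  col_types = cols(
    id = col_integer(),
    group = col_factor(levels = c("a", "b")),  # factor levels given explicitly
    score = col_double()
  )
)

# readr returns a tibble rather than a plain data frame.
class(dataset)

# Convert if legacy code expects a plain data frame.
plain <- as.data.frame(dataset)
```

Note that the result is a tibble, so single-column indexing such as dataset[, 1] returns another tibble rather than a vector. Converting with as.data.frame(), as in the last line, restores the classic behavior.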

Importing data from Text files

The "From Text (base)" option imports text files using the base package. This is helpful to preserve compatibility with previous versions of RStudio.

Screen_Shot_2018-10-31_at_9.33.14_PM.png
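
As a rough illustration of what a base import produces, the sketch below reads a small file with read.csv(). It assumes the base importer ends up calling a function from the read.table() family; the sample file and its columns are hypothetical.

```r
# Hypothetical sample file so the sketch runs offline.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,value", "x,1", "y,2"), csv_path)

dataset <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

class(dataset)              # "data.frame"

# Single-column indexing drops to a plain vector, as legacy code expects.
first_col <- dataset[, 1]
is.character(first_col)     # TRUE
```

This is the main practical difference from the readr importer, which returns a tibble.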

Importing data from Excel files

The Excel importer provides support to:

  • Import from the file system or a URL
  • Change column data types
  • Skip columns
  • Rename the data set
  • Select a specific Excel sheet
  • Skip the first N rows
  • Select NA identifiers

For example, you can easily import an XLS file from data.gov by pasting this URL http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls and selecting "Update".

Notice that this file contains two tables and therefore requires the first few rows to be removed.

Screen_Shot_2018-10-31_at_9.41.09_PM.png

We can clean this up by skipping 6 rows from this file and unchecking the "First Row as Names" checkbox.

Screen_Shot_2018-10-31_at_9.41.28_PM.png 

The file is looking better but some columns are being displayed as strings when they are clearly numerical data. We can fix this by selecting "numeric" from the column dropdown.

Screen_Shot_2018-10-31_at_9.42.13_PM.png

The final step is to click "Import" to run the code under "Code Preview" and import the data into RStudio. The final result should look as follows:

Screen_Shot_2018-10-31_at_9.44.21_PM.png
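
The same steps (choosing a sheet, skipping rows, ignoring the header row and forcing numeric columns) can be sketched with the readxl package. This is an illustration under stated assumptions rather than the exact code from "Code Preview": it uses the datasets.xlsx workbook bundled with readxl instead of the slsummar.xls file, and skips one row instead of six. Also note that read_excel() expects a local path, so a file from a URL would need to be downloaded first.

```r
library(readxl)

# Example workbook shipped with readxl, used so the sketch runs offline.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets available in the workbook.
sheets <- excel_sheets(xlsx_path)

dataset <- read_excel(
  xlsx_path,
  sheet = "mtcars",        # select a specific sheet
  skip = 1,                # skip the first N rows (here, the header row)
  col_names = FALSE,       # equivalent of unchecking "First Row as Names"
  col_types = "numeric"    # a single type is recycled across all columns
)
```

Because col_names is FALSE, readxl assigns placeholder column names, which you would normally replace afterwards.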

Importing data from SPSS, SAS and Stata files

The SPSS, SAS and Stata importer provides support to:

  • Import from the file system or a URL
  • Rename the data set
  • Specify a model file

We can import https://github.com/rstudio/webinars/raw/master/23-Importing-Data-into-R/data/Child_Data.sav by pasting the address under File/Url and clicking "Update" followed by clicking "Import".

Screen_Shot_2018-10-31_at_9.48.39_PM.png
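
A minimal sketch of the equivalent call, assuming the importer relies on the haven package (the article does not name the package). To stay offline, it reads the iris.sav example file that ships with haven rather than Child_Data.sav.

```r
library(haven)

# Example SPSS file bundled with haven.
sav_path <- system.file("examples", "iris.sav", package = "haven")

dataset <- read_sav(sav_path)

# read_dta() and read_sas() are the haven counterparts for Stata and SAS files.
dim(dataset)
```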

Need Help?

RStudio Pro customers may open a discussion with RStudio Support at any time.

You may also ask for help from R and RStudio users on community.rstudio.com. Be sure to include a reproducible example of your issue.

 

Comments

  • Diby Konan

    I'm using RStudio version 0.99.903, but I haven't seen the new features to import data from csv, xls, xlsx, sav, dta, por, sas and stata files. I only see two options (local file and web URL).
    My OS is Windows 10 32-bit.
    Can someone please help me?

  • Javier Luraschi

    0.99.903 does not yet contain this functionality; try installing a newer preview from here instead: https://www.rstudio.com/products/rstudio/download/preview/

  • mspinola10

    I am working with RStudio preview 1.0.12 (Windows 10, 64-bit).

    I am trying to read a txt file, but when I want to change one of my columns from character to factor, it asks me "Please enter the format string".
    What is that, and why is it asking me that?

  • Javier Luraschi

    Thanks for the feedback, we are planning to improve this by asking for a comma-separated list of factors.

    In the meantime, you can specify the factors as follows: c("factor1", "factor2", "factor3")

  • matjung

    I believe this function is available in rstudio-1.0.136 on CentOS 7 64-bit.
    But I get this message:
    Preparing data import requires an updated version of the readr package.

    Updating the readr package fails.
    Based on the error messages, readr depends on curl.
    For whatever reason, RStudio does not find libcurl:
    No package 'libcurl' found
    Package libcurl was not found in the pkg-config search path.
    Perhaps you should add the directory containing `libcurl.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'libcurl' found
    How can I fix that?

  • Camilla L. Nesbo

    I recently upgraded my RStudio and am now having issues with setNames.
    I used to use
    FileT = setNames(data.frame(t(File[,-1])), File[,1])
    To put the column names in the File to be the row names in the transposed FileT.

    Now it just puts all the names into the first cell of the data frame....
    Anyone know what I can do to fix this?

  • Javier Luraschi

    @Matjung: See https://github.com/jeroen/curl. You probably want to install libcurl with `sudo yum install libcurl-devel` on CentOS 7.

  • Javier Luraschi

    @Camilla: I'm not aware of any changes in setNames. I would suggest opening a new question in our support forum to have some of my colleagues help you out.

  • Robert Scott

    The Import Dataset dropdown is a potentially very convenient feature, but would be much more useful if it gave the option to read csv files etc. as proper data frames. Currently it imports files as one of these *@!^* "tibble" things, which screws up a lot of legacy code and even some base R functions, often creating a debugging nightmare. It is particularly insidious since tibbles appear the same as data frames in the environment pane, and this support article does not even mention that the data is imported as a tibble. I am sure that Camilla is not the first, and will not be the last to be tripped up by this. Of course it is always possible to convert the tibble to a data frame after import, but that rather destroys the convenience of this feature. Would it be possible at least to give an option to import data as a data frame? You could still make tibbles the default, but at least people would be aware what class it is.

    @Camilla: This is the reason for your problem. Like most people, you were probably not aware that when imported using this feature, your "File" is not a data frame, hence [, 1] indexing does not work properly i.e. it returns another "tibble" instead of a vector.

  • Javier Luraschi

    @Robert, we've added back the option to import from CSV using base functions. It is currently available on the daily builds under the "From Text (base)..." drop-down option. Would this help?

  • Robert Scott

    I have had a quick try with this, and it works fine. N.B. I have not extensively tested all the options, but if this is simply re-implementing what existed before, that should not be necessary. Many thanks for the quick response.

  • Javier Luraschi

    @Robert Yes, the entry "From Text (base)..." launches exactly the same components from previous versions, so we are confident it works the same way it used to.

  • vidyasagar

    How do we perform descriptive statistics and all other statistical analysis on data imported from Excel?

  • Hassan Alamdari

    I have set up the Excel importer and changed the vector to import as numeric, but it keeps importing as character. Any ideas on what to try to fix this problem?

  • Javier Luraschi

    @vidyasagar the question seems too generic to be answered in a comment. Is there a more specific question/issue you can share?

    @Hassan Alamdari, I can't reproduce this issue. Could you share which version of readxl you are using and a few rows/cols of data to reproduce this issue?

  • Jesse Spencer-Smith

    It would be tremendously helpful to be able to choose the pipe character "|" as a delimiter when importing .csv files.

  • Joe King

    In a previous post you mentioned the ability in previous versions to use the base import, but I don't see that in the most recent build (7/14), and I had the build just before that. Is it still possible to do a base import, or otherwise not use the readr package? I am having the same problem that factors are treated as characters, which is not a problem with read.csv. I just modify the code when importing, changing read_csv to read.csv. It's only a minor issue, but it would be better to import without having to modify the code.

  • Brad Cannell

    Are there any plans to add support for renaming columns in the data viewer?

  • Javier Luraschi

    @Jesse one can select a custom delimiter directly from the dialog using: Delimiter -> Other -> Inserting '|' into the text prompt.

  • Javier Luraschi

    @Joe yes, it's still possible, there is a "From Text (base)..." option under the "Import Dataset" dropdown.

  • Javier Luraschi

    @Brad no plans to support renaming columns, to my knowledge. Mostly, the import dialogs follow closely what packages provide. In the case of readr it does not provide support to rename columns; probably since other packages, like dplyr, provide a wider set of data manipulation operations.

  • Joe King

    @Javier, I don't see that. I just see CSV, Excel, SAS, STATA and SPSS, no text.

  • Javier Luraschi

    @Joe Right, try the RStudio 1.1 preview release instead by downloading from https://www.rstudio.com/products/rstudio/download/preview/

  • David Maislin

    It would be useful to provide the option to import .RData in the drop-down menu with the other options. I work with several SAS users whose go-to is 'File -> Import Data -> Big drop down list with everything.' It would save me time if there were an equivalent 'one-stop shop' button for data importing that I could teach only once.

  • hashem nijim

    Hello Javier,
    I have the latest version of RStudio installed on my PC, but there is no option to import csv files in the "Import Dataset" dropdown.
    Can you help me, please?

  • Javier Luraschi

    Hi Hashem, which version of RStudio is installed? You can get this from the "Help" menu, then "About RStudio".

  • Scott Overmyer

    I'm having trouble with importing Excel files. I have a drop down, but there are only 3 options, one of which is Excel. However, I get an error that I am missing readxl, which I try to install, but the installation always aborts. I'm running RStudio Server 1.1.383 on Ubuntu.

  • Javier Luraschi

    @Scott, could you share the error you are getting when RStudio tries to install readxl? It would be worth opening a GitHub issue for this error in the readxl repo as well: github.com/tidyverse/readxl

  • Javier Luraschi

    @Scott, I would also try to manually install readxl by running install.packages("readxl") and share any errors here and in the readxl repo mentioned above.

  • hashem nijim

    Javier, I'm using version 1.1.383! Can you help me with this issue, please?