Introduction
Importing data into R is a necessary step that, at times, can become time intensive. To ease this task, the RStudio IDE includes new features to import data from: csv, xls, xlsx, sav, dta, por, sas and stata files.
Importing data
The data import features can be accessed from the Environment pane or from the File menu. The importers are grouped into three categories: text data, Excel data and statistical data. To access this feature, use the "Import Dataset" dropdown from the "Environment" pane:
Or through the "File" menu, followed by the "Import Dataset" submenu:
Importing data from Text and CSV files
Importing "From Text (readr)" files allows you to import CSV files and in general, character delimited files using the readr package. This Text importer provides support to:
- Import from the file system or a url
- Change column data types
- Skip or include-only columns
- Rename the data set
- Skip the first N rows
- Use the header row for column names
- Trim spaces in names
- Change the column delimiter
- Select the encoding
- Select quote, escape, comment and NA identifiers
For example, one can easily import a CSV file from data.gov by pasting this url https://data.montgomerycountymd.gov/api/views/2qd6-mr43/rows.csv?accessType=DOWNLOAD and selecting "Import".
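The dialog options listed above map roughly onto arguments of readr's read_delim(). The following is a minimal sketch, not the exact code the dialog generates; the sample file, its column names, and the "n/a" marker are made up for illustration so that the example runs without a network connection.

```r
library(readr)

# Hypothetical sample file written to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "exported sample",   # a junk first line that we want to skip
  "id;name;score",
  "1; Ann ;3.5",
  "2; Bob ;n/a"
), path)

scores <- read_delim(
  path,
  delim = ";",                      # change the column delimiter
  skip = 1,                         # skip the first N rows
  col_names = TRUE,                 # use the header row for column names
  trim_ws = TRUE,                   # trim surrounding spaces
  na = c("", "NA", "n/a"),          # NA identifiers
  locale = locale(encoding = "UTF-8"),
  col_types = cols(                 # change column data types
    id = col_integer(),
    name = col_character(),
    score = col_double()
  )
)

scores
```

The same call accepts a url in place of path, which is how the data.gov example above can be reproduced from the console.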
Importing data from Text files
Importing using "From Text (base)" enables importing text files using the base package. This is helpful to preserve compatibility with previous versions of RStudio.
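As a rough console equivalent, a base import boils down to a read.csv() or read.table() call. The sketch below assumes a small made-up file; the exact code the dialog produces may differ.

```r
# Hypothetical sample file written to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("name,group", "Ann,a", "Bob,b"), path)

# stringsAsFactors is set explicitly because its default changed in R 4.0.
people <- read.csv(path, header = TRUE, sep = ",", stringsAsFactors = TRUE)

class(people)        # "data.frame"
class(people[, 1])   # "factor": single-column indexing returns a vector
```

The result is a plain data frame, so code written against earlier RStudio imports should keep working unchanged.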
Importing data from Excel files
The Excel importer provides support to:
- Import from the file system or a url
- Change column data types
- Skip columns
- Rename the data set
- Select a specific Excel sheet
- Skip the first N rows
- Select NA identifiers
For example, one can import with ease an xls file from data.gov by pasting this url http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls and selecting "Update".
Notice that this file contains two tables and therefore requires the first few rows to be removed.
We can clean this up by skipping 6 rows from this file and unchecking the "First Row as Names" checkbox.
The file is looking better but some columns are being displayed as strings when they are clearly numerical data. We can fix this by selecting "numeric" from the column dropdown.
The final step is to click "Import" to run the code under "Code Preview" and import the data into RStudio. The final result should look as follows:
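The steps above (picking a sheet, skipping rows, ignoring the header row, and forcing column types) correspond to arguments of readxl's read_excel(). The sketch below is only an illustration: it uses the datasets.xlsx example workbook bundled with readxl rather than the USDA file, and it skips 1 row instead of 6.

```r
library(readxl)

# Example workbook shipped with the readxl package (not the file from the article).
path <- readxl_example("datasets.xlsx")

iris_raw <- read_excel(
  path,
  sheet = "iris",        # select a specific sheet
  skip = 1,              # skip the first N rows (here: the header row)
  col_names = FALSE,     # same as unchecking "First Row as Names"
  col_types = c("numeric", "numeric", "numeric", "numeric", "text"),
  na = "NA"              # NA identifiers
)

str(iris_raw)
```

Setting col_types explicitly is the console counterpart of choosing "numeric" from the column dropdown.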
Importing data from SPSS, SAS and Stata files
The SPSS, SAS and Stata importer provides support to:
- Import from the file system or a url
- Rename the data set
- Specify a model file
We can import https://github.com/rstudio/webinars/raw/master/23-Importing-Data-into-R/data/Child_Data.sav by pasting the address under File/Url and clicking "Update" followed by clicking "Import".
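From the console, a comparable import can be sketched with the haven package, which we assume here is what the dialog relies on. The example reads the iris.sav file bundled with haven rather than Child_Data.sav, so it runs offline.

```r
library(haven)

# Example SPSS file shipped with the haven package.
path <- system.file("examples", "iris.sav", package = "haven")

iris_spss <- read_sav(path)

head(iris_spss)
```

haven also provides read_dta() for Stata files and read_sas() for SAS files, which follow the same pattern.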
Need Help?
RStudio Pro customers may open a discussion with RStudio Support at any time.
You may also ask for help from R and RStudio users on community.rstudio.com. Be sure to include a reproducible example of your issue.
I'm using RStudio Version 0.99.903, but I haven't seen the new features to import data from: csv, xls, xlsx, sav, dta, por, sas and stata files. I see only two options (local File and Web url).
My OS is Windows 10 32bit.
Can someone please help me?
0.99.903 does not yet contain this functionality. Try installing a newer preview from here instead: https://www.rstudio.com/products/rstudio/download/preview/
I am working with RStudio preview 1.0.12 (Windows 10, 64-bit).
I am trying to read a txt file, but when I want to change one of my columns from character to factor, it asks me "Please enter the format string".
What is that, and why is it asking me that?
Thanks for the feedback. We are planning to improve this by asking for a comma-separated list of factor levels.
In the meantime, you can specify the factors as follows: c("factor1", "factor2", "factor3")
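For reference, here is a minimal sketch of how such a list of levels ends up in a readr call. The file and the column names (id, group) are hypothetical.

```r
library(readr)

# Hypothetical tab-delimited sample file.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup", "1\tcontrol", "2\ttreated", "3\tcontrol"), path)

trial <- read_tsv(
  path,
  col_types = cols(
    id = col_integer(),
    # The levels vector plays the role of the "format string" in the dialog.
    group = col_factor(levels = c("control", "treated"))
  )
)

levels(trial$group)
```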
I believe this function is available at rstudio-1.0.136 - Centos7 64 bit
But: I get this message:
Preparing data import requires an updated version of the readr package.
Updating the readr package fails.
Based on the error messages, readr depends on curl
For whatever reason, RStudio does not find libcurl
No package 'libcurl' found
Package libcurl was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcurl.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcurl' found
How can I fix that?
I recently upgraded RStudio and am now having issues with setNames.
I used to use
FileT = setNames(data.frame(t(File[,-1])), File[,1])
To put the column names in the File to be the row names in the transposed FileT.
Now it just puts all the names into the first cell of the data frame....
Anyone know what I can do to fix this?
@Matjung: See https://github.com/jeroen/curl. You probably want to install curl with `sudo yum install libcurl-devel` on CentOS 7.
@Camilla: I'm not aware of any changes in setNames. I would suggest opening a new question in our support forum to have some of my colleagues help you out.
The Import Dataset dropdown is a potentially very convenient feature, but would be much more useful if it gave the option to read csv files etc. as proper data frames. Currently it imports files as one of these *@!^* "tibble" things, which screws up a lot of legacy code and even some base R functions, often creating a debugging nightmare. It is particularly insidious since tibbles appear the same as data frames in the environment pane, and this support article does not even mention that the data is imported as a tibble. I am sure that Camilla is not the first, and will not be the last to be tripped up by this. Of course it is always possible to convert the tibble to a data frame after import, but that rather destroys the convenience of this feature. Would it be possible at least to give an option to import data as a data frame? You could still make tibbles the default, but at least people would be aware what class it is.
@Camilla: This is the reason for your problem. Like most people, you were probably not aware that when imported using this feature, your "File" is not a data frame, hence [, 1] indexing does not work properly i.e. it returns another "tibble" instead of a vector.
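A small sketch of the difference, using a made-up tibble named File to mirror the code in the question. The column names are hypothetical.

```r
library(tibble)

# Hypothetical stand-in for the imported data.
File <- tibble(gene = c("a", "b"), s1 = c(1, 2), s2 = c(3, 4))

class(File[, 1])   # still a tibble, not a character vector
class(File[[1]])   # "character"

# Option 1: extract the first column as a vector with [[ ]].
FileT <- setNames(data.frame(t(File[, -1])), File[[1]])

# Option 2: convert to a plain data frame first, then use the original code.
File_df <- as.data.frame(File)
FileT2 <- setNames(data.frame(t(File_df[, -1])), File_df[, 1])

FileT
```

Either option gives a transposed data frame whose column names come from the first column of File.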
@Robert, we've added back the option to import from CSV using base functions. It is currently available on the daily builds under the "From Text (base)..." drop-down option. Would this help?
I have had a quick try with this, and it works fine. N.B. I have not extensively tested all the options, but if this is simply re-implementing what existed before, that should not be necessary. Many thanks for the quick response.
@Robert Yes, the entry "From Text (base)..." launches exactly the same components from previous versions, so we are confident it works the same way it used to.
How do we perform descriptive statistics and other statistical analyses on data imported from Excel?
I have set up the Excel importer and changed the column to import as numeric, but it keeps importing as character. Any ideas on how to fix this problem?
@vidyasagar the question seems too generic to be answered in a comment. Is there a more specific question or issue you can share?
@Hassan Alamdari, I can't reproduce this issue, could you share which version of readxl you are using and a few rows/cols of data to reproduce this issue?
It would be tremendously helpful to be able to choose the pipe character "|" as a delimiter when importing .csv files.
In a previous post you mentioned that earlier versions could use the base import, but I don't see that in the most recent build (7/14), nor in the build just before it. Is it still possible to do a base import, i.e. without the readr package? I am having the same problem: factors are read as characters, which does not happen with read.csv. For now I just change read_csv to read.csv in the generated code. It is only a minor issue, but it would be better to import without having to modify the code.
Are there any plans to add support for renaming columns in the data viewer?
@Jesse one can select a custom delimiter directly from the dialog using: Delimiter -> Other -> Inserting '|' into the text prompt.
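From the console, the same choice can be sketched with read_delim() and delim = "|". The sample file and its columns are hypothetical.

```r
library(readr)

# Hypothetical pipe-delimited sample file.
path <- tempfile(fileext = ".csv")
writeLines(c("city|state", "Rockville|MD", "Austin|TX"), path)

places <- read_delim(
  path,
  delim = "|",
  col_types = cols(.default = col_character())
)

places
```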
@Joe yes, it's still possible, there is a "From Text (base)..." option under the "Import Dataset" dropdown.
@Brad no plans to support renaming columns, to my knowledge. Mostly, the import dialogs follow closely what packages provide. In the case of readr it does not provide support to rename columns; probably since other packages, like dplyr, provide a wider set of data manipulation operations.
@Javier, I don't see that. I just see CSV, Excel, SAS, Stata and SPSS, no text.
@Joe Right, try the RStudio 1.1 preview release instead by downloading from https://www.rstudio.com/products/rstudio/download/preview/
It would be useful to provide the option to import .RData in the drop-down menu with the other options. I work with several SAS users whose go to is 'File -> Import Data -> Big drop down list with everything.' It would save me time if there were an equivalent 'one-stop shop' button for data importing I could teach only once.
Hello Javier,
I have the latest version of RStudio installed on my PC but there is no option to import csv files in the "Import Dataset" dropdown.
Can you help me, please?
Hi Hashem, which version of RStudio is installed? You can get this from the help menu, then "About RStudio"
I'm having trouble with importing Excel files. I have a dropdown, but there are only 3 options, one of which is Excel. However, I get an error that I am missing readxl, which I try to install, but the installation always aborts. I'm running RStudio Server 1.1.383 on Ubuntu.
@Scott, could you share the error you get when RStudio tries to install readxl? It would be worth opening a GitHub issue for this error in the readxl repo as well: github.com/tidyverse/readxl
@Scott, I would also try to manually install readxl by running install.packages("readxl") and share any errors here and the readxl repo mentioned above.
Javier, I'm using version 1.1.383. Can you help me with this issue, please?