Best Practices for Using Python with RStudio Connect

Posit/RStudio Connect allows you to deploy Shiny applications, R Markdown reports and Plumber APIs that use Python via the reticulate package. This allows data science teams to create content that combines the best features and libraries of both R and Python.

The concepts and best practices described on this page will help you work with Python in RStudio and make the deployment of Python content to RStudio Connect less prone to errors and frustration.

Reproducible Python and R Environments

Adding a new language to a data science project increases the complexity of development and deployment of any applications and notebooks since you need to manage R code and its dependencies as well as Python code and its dependencies.

This increased complexity and the differences between the development environment on your local machine and the RStudio Connect environment can make the deployment of these applications difficult. For more information on environments and considerations, refer to the best practices on reproducible environments for data science projects.

Use Python native tools for environments and package management

reticulate includes some convenient functions to install Python packages and manage environments such as: py_install(), conda_create(), virtualenv_create(), use_python().

These functions are an easy way for R users to get started with reticulate and Python. However, they are likely to cause errors and inconsistencies between development on a local machine and deployment to RStudio Connect. We do not recommend using them inside a project that you deploy to RStudio Connect, because they are executed at deployment time and will likely cause the deployment to fail.

In general, we recommend migrating to standard Python tooling such as virtualenv and pip when you are more comfortable with Python and are ready to deploy a project to RStudio Connect as this tooling is more likely to result in a successful deployment. 

Note that each Python installation on the RStudio Connect server is required to have the pip and virtualenv Python packages installed. virtualenv is used to create content-specific environments and pip is used to install Python packages. 
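As a rough way to confirm this requirement, the sketch below reports which of the two packages a given interpreter cannot find. The helper name missing_modules is hypothetical, and the script only checks the interpreter that runs it, so it would need to be run once per Python installation on the server.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pip and virtualenv are the two packages the article says Connect relies on.
    missing = missing_modules(["pip", "virtualenv"])
    if missing:
        print("Missing from this interpreter:", ", ".join(missing))
    else:
        print("pip and virtualenv are both available")
```

Saved to a file (for example, a hypothetical check_tools.py), it could be run with each installed interpreter in turn, such as /opt/python/3.11.1/bin/python3 check_tools.py.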

Which version of Python to use

In keeping with the recommendation to use standard Python tooling, you should not use the system Python that ships with operating systems such as macOS or Linux. Installing and upgrading libraries within the system/framework installation of Python can corrupt core system functionality.

In general, we recommend installing a standalone version of Python. Posit packages Python for easy installation into /opt/python/{PYTHON_VERSION}. If you would like to build Python yourself, the instructions for installing Python from source will give you the same results.

This gives you other advantages such as managing multiple versions of Python on the same system without package and version conflicts.
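To see which kind of interpreter you are currently running, a heuristic such as the following can help. It is only a sketch: the list of system prefixes is an assumption chosen for illustration, it does not resolve symlinks, and the function name looks_like_system_python is hypothetical.

```python
import sys

# Assumed locations of OS-managed interpreters (illustrative, not exhaustive).
SYSTEM_PREFIXES = ("/usr/bin/", "/bin/", "/System/Library/")


def looks_like_system_python(executable):
    """Return True if the interpreter path sits under an assumed system prefix."""
    return str(executable).startswith(SYSTEM_PREFIXES)


if __name__ == "__main__":
    if looks_like_system_python(sys.executable):
        print(f"{sys.executable} looks like a system Python")
    else:
        print(f"{sys.executable} looks like a standalone or virtual environment Python")
```

An interpreter under /opt/python/{PYTHON_VERSION} would not match any of these prefixes, while /usr/bin/python3 would.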

While Conda and Anaconda are excellent software packages, we find they introduce complexities that can make troubleshooting difficult, which is why Posit provides pre-built packages.

Use a virtualenv in every project

Like all software projects, data science projects should be reproducible and portable, which will make them easier to deploy with RStudio Connect. In R, you should be using packrat and in Python you should use virtualenv, pipenv, or poetry in every project.

virtualenv, pipenv, and poetry allow you to control the version of Python and the version of each dependency for consistent results.

We recommend having the virtualenv directory in the root of your R project because it's easier to track. For example, to create a virtual environment in the project directory you can do:

# Go to the project directory
cd <PROJECT DIR>

# Create a virtual environment with a specific version of Python
/opt/python/3.11.1/bin/python3 -m venv .venv

You can then add .venv to the ignore file of your version control system, such as .gitignore.

We also recommend capturing the packages in your Python environment in a dependency file, the most common format being requirements.txt.
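One common way to produce a requirements.txt is pip freeze, run with the interpreter inside the virtual environment. The sketch below wraps that call. The helper names freeze_command and write_requirements are hypothetical, and the .venv/bin/python path assumes the project layout described above.

```python
import subprocess
from pathlib import Path


def freeze_command(python_path):
    """Build the `pip freeze` command for a specific interpreter."""
    return [str(python_path), "-m", "pip", "freeze"]


def write_requirements(python_path, output="requirements.txt"):
    """Run `pip freeze` with `python_path` and save the pinned list to `output`."""
    result = subprocess.run(
        freeze_command(python_path),
        check=True,           # raise if pip exits with an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    Path(output).write_text(result.stdout)
    return result.stdout.splitlines()
```

From the project directory, calling write_requirements(".venv/bin/python") would write one pinned entry per installed package to requirements.txt.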

Note that you should always have the numpy Python package installed in your environments because this is a requirement for reticulate to move data between R and Python.
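A quick check that numpy is listed in a requirements.txt might look like the sketch below. It uses a deliberately simple parser that ignores comments and option lines. The function names and the package versions in the usage note are illustrative assumptions.

```python
import re


def requirement_names(requirements_text):
    """Return the lower-cased package names listed in requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def lists_numpy(requirements_text):
    """Return True if numpy appears among the listed requirements."""
    return "numpy" in requirement_names(requirements_text)
```

For example, lists_numpy("pandas==2.0.0\nnumpy==1.26.4\n") is true, while a file that mentions numpy only in a comment is reported as missing it.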

The RETICULATE_PYTHON environment variable

Once you have created a virtual environment, you need to point reticulate to the correct version of Python. The recommended way is to use the RETICULATE_PYTHON environment variable.

This environment variable is used by the rsconnect package when deploying to RStudio Connect to discover the dependencies of a Python project. The easiest way to set this is on a per-project basis, for example, in the .Rprofile of a project:

Sys.setenv(RETICULATE_PYTHON = ".venv/bin/python")
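Because the path above is relative to the project directory, it can be worth confirming that it points at a real, executable interpreter before deploying. The following is a minimal sketch of such a check. The function name is_usable_interpreter is hypothetical.

```python
import os
import sys


def is_usable_interpreter(path, project_dir="."):
    """Return True if `path` (relative to `project_dir`) is an executable file."""
    # os.path.join keeps `path` unchanged when it is already absolute.
    full_path = os.path.join(project_dir, path)
    return os.path.isfile(full_path) and os.access(full_path, os.X_OK)


if __name__ == "__main__":
    candidate = ".venv/bin/python"  # the value used for RETICULATE_PYTHON above
    status = "found" if is_usable_interpreter(candidate) else "not found"
    print(f"{candidate}: {status} (checked from {os.getcwd()})")
    print(f"This script itself is running under {sys.executable}")
```

Run it from the root of the project so that the relative path resolves the same way it does in .Rprofile.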

When deploying the app using the publish wizard in RStudio, do not add .Rprofile to the bundle, because RStudio Connect recreates the environment and manages this for you on the deployment server.

Python versions

Using virtualenv and the RETICULATE_PYTHON environment variable allows you to pin the Python version that RStudio Connect uses to recreate the environment. After that, the administrator only needs to make sure that the correct versions of Python are installed on the server. Refer to the support article on Configuring Python with RStudio Server Pro and RStudio Connect for more information.
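To find out which version a project is pinned to, you can ask the virtual environment's interpreter directly. The sketch below does this with a subprocess call. The helper name interpreter_version is hypothetical, and how closely the server's installed version must match is something to confirm with your administrator.

```python
import platform
import subprocess
import sys


def interpreter_version(python_path):
    """Return the version string (e.g. major.minor.patch) of another interpreter."""
    result = subprocess.run(
        [str(python_path), "-c", "import platform; print(platform.python_version())"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # sys.executable stands in here for the project's .venv/bin/python.
    print("Interpreter version:", interpreter_version(sys.executable))
    print("Same as this process:", interpreter_version(sys.executable) == platform.python_version())
```

Passing ".venv/bin/python" from the project directory reports the version that the virtual environment was created with.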

See also

  1. FAQ on Using Python with RStudio Connect
  2. Troubleshooting Python with RStudio Connect
