This article addresses potential issues with installing the tidyverse in air-gapped (offline) environments. The tidyverse package imports the stringr package, which in turn imports the stringi package.
Building the stringi package from source requires the ICU libraries. Specific requirements are listed in stringi's installation documentation. At the time of writing, compiling stringi from source required ICU4C >= 55. If this requirement is not met, stringi attempts to statically link a bundled copy of ICU, downloading the ICU data it needs from the internet. Without direct internet access (for example, in an air-gapped or semi-offline environment, or behind an outbound proxy), this download may be rejected and the installation fails. The failure can occur in both RStudio Server Pro and RStudio Connect.
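Before choosing between a source build and a binary, you can check which ICU version the system provides. The following POSIX shell sketch assumes that the ICU development package is installed (it ships the icu-i18n pkg-config file), that GNU sort -V is available, and that the minimum is the ICU4C 55 quoted above. MIN_ICU, version_ge, system_icu_version, and check_icu are hypothetical names, not part of stringi.

```shell
# Sketch: does the system ICU meet stringi's minimum version?
# MIN_ICU defaults to 55, the value quoted above; confirm it in stringi's INSTALL.
MIN_ICU="${MIN_ICU:-55}"

# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# system_icu_version: print the installed ICU version, using the icu-i18n
# pkg-config file shipped with libicu-dev / libicu-devel.
system_icu_version() {
  pkg-config --modversion icu-i18n 2>/dev/null
}

# check_icu: report whether a source build of stringi can use the system ICU.
check_icu() {
  local ver
  ver="$(system_icu_version)" || ver=""
  if [ -z "$ver" ]; then
    echo "No ICU development files found: a source build would try to download ICU data."
    return 1
  fi
  if version_ge "$ver" "$MIN_ICU"; then
    echo "System ICU $ver >= $MIN_ICU: a source build can use it offline."
  else
    echo "System ICU $ver < $MIN_ICU: a source build would try to download ICU data."
    return 1
  fi
}
```

After sourcing these functions, run check_icu; a non-zero exit status means a source build would try to reach the internet.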
Installing from binaries
The recommended solution is to install the required system dependencies for your operating system and install the stringi package binary from an RStudio Package Manager instance that you host behind your firewall. RStudio Package Manager is a repository management server to organize and centralize R packages across your team, department, or entire organization. Two noteworthy features in this context are that it:
- Provides a list of system dependencies for individual packages and entire repositories
- Provides binary (pre-built) R packages for Linux operating systems
As a result, RStudio Package Manager simplifies package management and installation for air-gapped Linux infrastructure by providing pre-built R packages and the list of system dependencies needed to run them.
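As an illustration of pulling binaries from an internal Package Manager, the sketch below builds a Linux binary repository URL and the matching R profile lines. It assumes Package Manager's <server>/<repo>/__linux__/<codename>/latest URL layout and the HTTPUserAgent setting from Package Manager's setup instructions. The server address rspm.example.internal, the repository name cran, and the helper names are hypothetical; copy the exact URL from your own Package Manager's setup page.

```shell
# Sketch: point R at an internal RStudio Package Manager for Linux binaries.
# The server URL, repository name, and helper names are hypothetical.

# rspm_binary_url SERVER REPO CODENAME: Linux binary repository URL, using the
# <server>/<repo>/__linux__/<codename>/latest layout.
rspm_binary_url() {
  printf '%s/%s/__linux__/%s/latest\n' "${1%/}" "$2" "$3"
}

# rprofile_snippet URL: R code that makes URL the default repository and sends
# the User-Agent header Package Manager uses to serve binaries for this R.
rprofile_snippet() {
  cat <<EOF
options(repos = c(RSPM = "$1"))
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(),
  paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])))
EOF
}

# Example (as root, Ubuntu 20.04 "focal"):
#   url="$(rspm_binary_url https://rspm.example.internal cran focal)"
#   rprofile_snippet "$url" >> "$(R RHOME)/etc/Rprofile.site"
```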
System dependencies - Ubuntu & SUSE / SLES
- Ubuntu: libicu-dev
- SUSE / SLES: libicu-devel
The system-provided version of libicu satisfies stringi's minimum ICU version requirement. As a result, you can install stringi either from a binary or from source: a source build picks up the system libicu and compiles correctly, even in an air-gapped environment.
System dependencies - CentOS / RHEL
- CentOS / RHEL: libicu-devel
On these systems, the system-provided version of libicu does not satisfy stringi's version requirement, so a source build cannot use it and falls back to downloading ICU data. As a result, you cannot build stringi from source in an offline environment. We recommend installing the stringi binary, which bundles its ICU dependency and works out of the box.
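The package names above can be turned into an install command for each distribution. This sketch reads ID and ID_LIKE from /etc/os-release (an assumption about how you want to detect the distribution) and only prints the command, so you can review it before running it as root. icu_install_command is a hypothetical helper.

```shell
# Sketch: print the command that installs the ICU development package for the
# current distribution, detected from ID / ID_LIKE in an os-release file.
icu_install_command() {
  local file="${1:-/etc/os-release}" ids
  # Source the file in a subshell so its variables do not leak out.
  ids="$( . "$file" && printf ' %s %s ' "${ID:-}" "${ID_LIKE:-}" )"
  case "$ids" in
    *" ubuntu "*|*" debian "*)
      echo "apt-get install -y libicu-dev" ;;
    *" sles "*|*" suse "*|*" opensuse"*)
      echo "zypper --non-interactive install libicu-devel" ;;
    *" rhel "*|*" centos "*|*" fedora "*)
      # Listed dependency, but this ICU is too old for a stringi source build;
      # install the stringi binary as well.
      echo "yum install -y libicu-devel" ;;
    *)
      echo "unrecognised distribution:$ids" >&2
      return 1 ;;
  esac
}

# Example: sudo sh -c "$(icu_install_command)"
```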
Installing from source
If you cannot use RStudio Package Manager, or you need to install stringi from source, follow these instructions.
Without binaries from RStudio Package Manager, installing stringi means building the package from source. If the libicu version provided by your operating system satisfies stringi's requirements, you can build from source even in an air-gapped or semi-offline environment.
At the time of writing, the system-provided libicu satisfies stringi's requirements on the operating systems listed above (Ubuntu and SUSE / SLES). If your operating system's libicu does not satisfy stringi's requirements, you must either open internet access to the locations from which stringi downloads its sources, or install stringi manually following the directions for air-gapped installation.
Alternatively, you can configure stringi to use a locally staged copy of the ICU data instead of downloading it:
- As an admin, download the ICU data archive (the icudtXX.zip file that stringi would otherwise fetch from the internet) onto the server and put it in a shared folder. Make it read-only.
- As an admin, set ICUDT_DIR to this location using the configure.vars option in an R startup file, such as the site-wide $R_HOME/etc/Rprofile.site or a user's .Rprofile:
options(configure.vars = list(stringi = "ICUDT_DIR=<icudt_dir>"))
- As a user, install stringi:
install.packages("stringi")
Once the admin has configured the system, users should be able to install the source version without any issues.
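The two admin steps can be scripted roughly as follows. This sketch assumes the archive has already been copied onto the server; the shared folder path, the file name icudt61l.zip in the example, and the helper names are hypothetical. Setting configure.vars through options() replaces any other value of that option, and it only affects source installs.

```shell
# Sketch of the two admin steps; paths and helper names are hypothetical.

# stage_icudt ZIP SHARED_DIR: copy the ICU data archive into a shared folder
# and make both the archive and the folder read-only for everyone.
stage_icudt() {
  local zip="$1" shared_dir="$2"
  mkdir -p "$shared_dir"
  cp "$zip" "$shared_dir/"
  chmod 0444 "$shared_dir/$(basename "$zip")"
  chmod 0555 "$shared_dir"
}

# add_configure_vars RPROFILE DIR: append the configure.vars option pointing
# stringi's configure script at DIR, unless that line is already present.
add_configure_vars() {
  local rprofile="$1" dir="$2" line
  line="options(configure.vars = list(stringi = \"ICUDT_DIR=$dir\"))"
  touch "$rprofile"
  grep -qF "$line" "$rprofile" || printf '%s\n' "$line" >> "$rprofile"
}

# Example (as root):
#   stage_icudt /tmp/icudt61l.zip /opt/shared/icu
#   add_configure_vars "$(R RHOME)/etc/Rprofile.site" /opt/shared/icu
```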
Options for installing stringi in RStudio Connect
1. Use RStudio Package Manager
In non-interactive environments like RStudio Connect, the user or admin does not have control over the package installation process. In this scenario, we recommend using RStudio Package Manager along with the required system dependencies. If RStudio Package Manager is not available then you will need to install stringi as an external package.
2. Install as an external package
RStudio Connect supports manually installed packages through external packages: packages that are installed and managed manually and provided to RStudio Connect through the system library. Note: Using external packages is strongly discouraged except for packages such as stringi, rJava, and ROracle that have very particular system dependency requirements.
- Install stringi manually into the system library of each R installation used by RStudio Connect (this is most easily done by installing the package as root, which should default to the system library)
- Add the following section to the RStudio Connect configuration file at /etc/rstudio-connect/rstudio-connect.gcfg:
[Packages]
External = stringi
- Restart RStudio Connect
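These steps can be sketched in shell as follows. The sketch assumes each R installation lives under /opt/R/<version> with an Rscript binary (a common layout, not a requirement) and that Connect accepts one External line per package. install_stringi_all and add_external_package are hypothetical helpers, and the repository URL you pass must be one the server can reach.

```shell
# Sketch of the external-package steps. Assumes R installations under
# <r_root>/<version>/bin/Rscript; helper names are hypothetical.

# install_stringi_all R_ROOT REPO_URL: install stringi into the system library
# (.Library) of every R installation found under R_ROOT. Run as root.
install_stringi_all() {
  local r_root="$1" repo="$2" rscript
  for rscript in "$r_root"/*/bin/Rscript; do
    [ -x "$rscript" ] || continue
    echo "Installing stringi with $rscript"
    STRINGI_REPO="$repo" "$rscript" -e \
      'install.packages("stringi", lib = .Library, repos = Sys.getenv("STRINGI_REPO"))' ||
      return 1
  done
}

# add_external_package GCFG PKG: make sure "External = PKG" appears under
# [Packages], creating the section if needed. Safe to run more than once.
add_external_package() {
  local cfg="$1" pkg="$2" tmp
  touch "$cfg"
  if grep -qE "^[[:space:]]*External[[:space:]]*=[[:space:]]*$pkg[[:space:]]*\$" "$cfg"; then
    return 0
  fi
  tmp="$(mktemp)"
  if grep -q '^\[Packages\]' "$cfg"; then
    # Insert the new line directly after the existing section header.
    awk -v line="External = $pkg" '{ print } /^\[Packages\]/ { print line }' "$cfg" > "$tmp"
  else
    { cat "$cfg"; printf '\n[Packages]\nExternal = %s\n' "$pkg"; } > "$tmp"
  fi
  # Rewrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Example (as root; the repository URL is hypothetical):
#   install_stringi_all /opt/R https://rspm.example.internal/cran/__linux__/focal/latest
#   add_external_package /etc/rstudio-connect/rstudio-connect.gcfg stringi
#   systemctl restart rstudio-connect
```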
Once this is complete, the following behavior can be expected:
- RStudio Connect will bypass installation of stringi when deploying R content and trust that it is available in the system library
- If stringi is not available in the system library for all R installations, RStudio Connect will fail to start
- All content using a given version of R will utilize the stringi installed into the system library for that R version
- This means that updating, removing, or otherwise changing stringi in the system library will simultaneously affect all RStudio Connect content using the given R version.
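Because RStudio Connect fails to start if stringi is missing from any R installation's system library, a quick check before restarting can catch the problem early. This sketch assumes the /opt/R/<version>/lib/R layout, where R's system library (.Library) is lib/R/library, and treats a stringi/DESCRIPTION file as evidence that the package is installed. missing_stringi is a hypothetical helper.

```shell
# Sketch: list R installations whose system library lacks stringi.
# Assumes <r_root>/<version>/lib/R/library is each R's .Library.
missing_stringi() {
  local r_root="${1:-/opt/R}" lib missing=0
  for lib in "$r_root"/*/lib/R/library; do
    [ -d "$lib" ] || continue
    if [ ! -f "$lib/stringi/DESCRIPTION" ]; then
      echo "$lib"
      missing=1
    fi
  done
  # Non-zero exit status if any installation is missing stringi.
  return "$missing"
}

# Example: missing_stringi /opt/R && systemctl restart rstudio-connect
```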
There is more information about this process in the RStudio Connect Admin Guide. If you have questions or run into difficulties getting stringi to work with RStudio Connect, please reach out to your Customer Success representative or the RStudio support team.
Why installing stringi is hard
stringi and the packages that depend on it are commonly used in R for string manipulation, regular expression matching, and the like. However, stringi's system dependency (libicu) can make installation difficult in an air-gapped environment.
In an online environment, stringi's installation process automatically downloads the system dependencies it needs at build time, which makes installation simple for most users. In an air-gapped or semi-offline environment (for example, one protected by an outbound proxy), however, this download request may be rejected and the installation fails.
Note: There are several options to get stringi installed successfully on your system. For more information on the installation process and system dependencies that the package uses, see the excellent writeup maintained by the package author.
Another alternative for installing stringi in an offline Connect environment is to download the ICU data archive (e.g. from https://raw.githubusercontent.com/gagolews/stringi/master/src/icu61/data/icudt61l.zip) on a machine with internet access and move it onto your Connect server via SCP or another method. Then set the ICUDT_DIR environment variable to the directory where you placed the archive. I put mine in $R_HOME/share/icu and added the following to $R_HOME/etc/Renviron:
ICUDT_DIR=${R_HOME}/share/icu
Make sure that this directory is world-readable (0755) and that the archive is world-readable too (0644).
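The staging and Renviron steps above can be sketched as follows. The sketch assumes GNU install (coreutils), an R_HOME you pass in (for example from R RHOME), and an archive already copied to the server; the host name in the scp example and the helper name are hypothetical.

```shell
# Sketch: stage the ICU data archive under $R_HOME/share/icu and point
# ICUDT_DIR at it in $R_HOME/etc/Renviron. The helper name is hypothetical.
stage_icudt_renviron() {
  local r_home="$1" zip="$2" dir
  dir="$r_home/share/icu"
  # Directory world-readable and traversable, archive world-readable.
  install -d -m 0755 "$dir"
  install -m 0644 "$zip" "$dir/"
  # Written literally; R expands ${R_HOME} when it reads Renviron.
  grep -q '^ICUDT_DIR=' "$r_home/etc/Renviron" 2>/dev/null ||
    printf '%s\n' 'ICUDT_DIR=${R_HOME}/share/icu' >> "$r_home/etc/Renviron"
}

# Example: copy the archive across first, e.g.
#   scp icudt61l.zip admin@connect.example.internal:/tmp/
# then on the server, as root:
#   stage_icudt_renviron "$(R RHOME)" /tmp/icudt61l.zip
```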
Also be aware that modifying the installation directory of a package installed with the system package manager (yum in my case) can constitute an "audit finding", depending on the regulatory standards you are subject to. You may therefore want to put the .zip file somewhere else (e.g. under /var or in Connect's Server.DataDir), and to set the ICUDT_DIR environment variable somewhere other than $R_HOME/etc/Renviron (e.g. in a script passed to Connect's Server.Supervisor option).
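For the supervisor-script option, a minimal sketch follows. It assumes, as the paragraph above suggests, that Connect runs the relevant R processes through the script named in its Supervisor setting and passes the command to run as arguments. The script path, the data directory under /var/lib/rstudio-connect (Connect's default Server.DataDir), and write_supervisor are hypothetical.

```shell
# Sketch: write a supervisor script that exports ICUDT_DIR and then runs the
# command it was given. The helper name and paths are hypothetical.
write_supervisor() {
  local path="$1" icu_dir="$2"
  cat > "$path" <<EOF
#!/bin/sh
# Make the ICU data location visible to any stringi source build, then hand
# off to the original command unchanged.
ICUDT_DIR='$icu_dir'
export ICUDT_DIR
exec "\$@"
EOF
  chmod 0755 "$path"
}

# Example (as root):
#   write_supervisor /usr/local/bin/connect-supervisor.sh /var/lib/rstudio-connect/icu
# then reference that script from Connect's Supervisor setting.
```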
You may want version-specific directories for icudtXX.zip, so that you can pin each R version to its own ICU data version instead of sharing one server-wide ICUDT_DIR across all versions of R. However, at the time of writing, the ICU data version that stringi tries to fetch had not changed in a couple of years, so a single server-wide ICUDT_DIR may be enough.
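If you do pin ICU data per R version, one way to wire it up is sketched below. It assumes R installations under <r_root>/<version>/lib/R, writes a single ICUDT_DIR line into each installation's etc/Renviron pointing at <icu_root>/<version>, and creates that directory for you to place the matching icudtXX.zip in. The paths and pin_icudt_per_version are hypothetical, and editing Renviron is subject to the audit caveat above.

```shell
# Sketch: give each R installation its own ICUDT_DIR. Assumes the layout
# <r_root>/<version>/lib/R/etc; the helper name is hypothetical.
pin_icudt_per_version() {
  local r_root="$1" icu_root="$2" etc ver
  for etc in "$r_root"/*/lib/R/etc; do
    [ -d "$etc" ] || continue
    # Extract <version> from <r_root>/<version>/lib/R/etc.
    ver="${etc#"$r_root"/}"
    ver="${ver%%/*}"
    mkdir -p "$icu_root/$ver"
    touch "$etc/Renviron"
    # Drop any earlier ICUDT_DIR line, then append the version-specific one.
    grep -v '^ICUDT_DIR=' "$etc/Renviron" > "$etc/Renviron.tmp" || true
    printf 'ICUDT_DIR=%s\n' "$icu_root/$ver" >> "$etc/Renviron.tmp"
    # Rewrite in place so Renviron keeps its owner and permissions.
    cat "$etc/Renviron.tmp" > "$etc/Renviron"
    rm -f "$etc/Renviron.tmp"
  done
}

# Example (as root): pin_icudt_per_version /opt/R /opt/icu
```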
stringi's installation documentation describes this option: https://github.com/gagolews/stringi/blob/master/INSTALL#L130