R Package installation problems on older enterprise operating systems (e.g. RHEL7 and CentOS 7)

Problem statement

Major versions of enterprise-grade operating systems have a very long lifetime. For RHEL this is 12+ years. The core value of an enterprise OS is that the ABI and API of core packages remain stable and unchanged. Bug fixes, security errata and similar changes that arise in later upstream versions and affect the version shipped in the enterprise OS are backported by the vendor to that version as patch releases. This API and ABI stability ensures that code built on RHEL 7 eight years ago will still run today on a system with all bug fixes and errata applied.

R (and Python), in contrast, are rapidly evolving programming languages whose package authors frequently depend on the latest upstream versions of open-source software. Those versions, however, are either not available in the OS at all or only in a much older version. As a consequence, the package installation is likely to fail.

Below we show how typical package installation problems on a RHEL 7 / CentOS 7 system can be solved.

 

R Packages that depend on gdal, geos and proj

See a list of R packages in the Appendix below. 

Versions available in RHEL 7 / CentOS 7 and upstream

Software package   RHEL 7 / CentOS 7   Latest upstream (*)
gdal               1.11.4              3.6.0
proj               4.8.0               9.1.1
geos               3.4.2               3.11.1

 

Solution summary

The general strategy is to interfere with the OS as little as possible and to make any modification visible only to the R installation(s).

We install the new software versions from a third-party repository. We choose the pgdg-common repo from the PostgreSQL community repository. It contains not only PostgreSQL software but also its dependencies, which include gdal, proj, and geos in fairly recent versions.

Finally, we configure the system so that those libraries can be found and linked against at compile time. We achieve this by extending the PATH and PKG_CONFIG_PATH environment variables.

The solution modifies only the R installation and is therefore independent of any Posit product.

Installing software

First we install the RPM containing the repository information. For RHEL 7 / CentOS 7 you can choose this RPM:

sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Once this is done, we install the packages. We can first explore which versions are available by running

yum --disablerepo='*' --enablerepo=pgdg-common search gdal
yum --disablerepo='*' --enablerepo=pgdg-common search geos
yum --disablerepo='*' --enablerepo=pgdg-common search proj

to figure out the latest version available for the respective software. At the time of writing this article, gdal was available in version 3.4, geos in version 3.11 and proj in version 8.1. To keep the number of packages to be installed small, we install only `gdal` 3.4, `geos` 3.10 and `proj` 7.2, which respects the inter-dependencies between those three packages.

sudo yum install gdal34-devel geos310-devel proj72-devel

Note the versioning scheme: gdal34 corresponds to GDAL 3.4 and so on. Once GDAL 3.5 is available, you will find a package named gdal35 in the repo that you can use. The installation of the -devel packages will ensure the header files as well as the dynamically linkable libraries are installed. 
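
The naming scheme can be sketched as a small shell function. This is only an illustration of the convention described above; the function name pgdg_devel_pkg is hypothetical, and the sketch assumes that the repository keeps naming packages by dropping the dot from the major.minor version.

```shell
# Hypothetical helper: derive the -devel package name from a library name
# and a major.minor version, following the scheme gdal 3.4 -> gdal34-devel.
pgdg_devel_pkg() {
  name="$1"
  version="$2"
  # Remove the dot from the version: 3.4 -> 34, 3.10 -> 310
  printf '%s%s-devel\n' "$name" "$(printf '%s' "$version" | tr -d '.')"
}

pgdg_devel_pkg gdal 3.4
pgdg_devel_pkg geos 3.10
pgdg_devel_pkg proj 7.2
```

Always confirm with the yum search commands shown earlier that a package of the resulting name actually exists before trying to install it.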

Configuring software 

When compiling a piece of software, various mechanisms can be employed to find out how to link against libraries, where to find include files and so on. For this approach we use pkg-config. Software that supports it ships a text file with the extension .pc that contains all the relevant metadata. By setting or extending the environment variable PKG_CONFIG_PATH we tell the build scripts where to find those files. Our approach is to set this variable only for the existing R installations, in their respective Rprofile.site files.

In parallel, we also add the folders containing binaries to the PATH environment variable. This is important for geos-config for example.  
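
To try the effect in a single shell session before changing any R installation, the same prefixing can be sketched in shell. This is a minimal sketch, not part of the solution itself: the helper name prepend_path is hypothetical, and the directories assume the gdal34, geos310 and proj72 packages installed above. The new directories are placed first so that they are searched before the older OS versions, and the empty case is handled separately so that no stray ":" ends up in the variable.

```shell
# Hypothetical helper: put NEW in front of OLD without leaving a trailing ":"
# when OLD is empty (an empty PATH entry would mean "current directory").
prepend_path() {
  new="$1"
  old="$2"
  if [ -n "$old" ]; then
    printf '%s:%s\n' "$new" "$old"
  else
    printf '%s\n' "$new"
  fi
}

# Directories assume the gdal34, geos310 and proj72 packages from this article.
PKG_CONFIG_PATH="$(prepend_path "/usr/gdal34/lib/pkgconfig:/usr/geos310/lib64/pkgconfig:/usr/proj72/lib/pkgconfig" "${PKG_CONFIG_PATH:-}")"
PATH="$(prepend_path "/usr/gdal34/bin:/usr/geos310/bin:/usr/proj72/bin" "$PATH")"
export PKG_CONFIG_PATH PATH
```

These settings last only for the current shell session, so the rest of the system stays untouched.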

Add the code snippet below to the Rprofile.site of each R installation:

# For gdal, geos and proj to work we prefix PKG_CONFIG_PATH and PATH

# Sys.getenv() returns "" for an unset variable, so we test with nzchar()
temp_pkg_path <- Sys.getenv("PKG_CONFIG_PATH")
new_pkg_path <- "/usr/gdal34/lib/pkgconfig:/usr/geos310/lib64/pkgconfig:/usr/proj72/lib/pkgconfig"

if (nzchar(temp_pkg_path)) {
  # Keep the existing entries, but search the new directories first
  Sys.setenv(PKG_CONFIG_PATH = paste0(new_pkg_path, ":", temp_pkg_path))
} else {
  Sys.setenv(PKG_CONFIG_PATH = new_pkg_path)
}

temp_path <- Sys.getenv("PATH")
new_path <- "/usr/gdal34/bin:/usr/geos310/bin:/usr/proj72/bin"

if (nzchar(temp_path)) {
  # Keep the existing entries, but search the new directories first
  Sys.setenv(PATH = paste0(new_path, ":", temp_path))
} else {
  Sys.setenv(PATH = new_path)
}

Note the reference to the versions again. If you install different versions, the lines defining new_pkg_path and new_path in the script above must be adjusted accordingly.
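
Because the directory names depend on the installed versions, it can help to check that they exist and contain .pc files before relying on them. The following is a minimal sketch of such a check; the function name list_pc_files is hypothetical, and the path passed to it assumes the versions used in this article.

```shell
# Hypothetical helper: for each directory in a colon-separated list, report
# the .pc files it contains, or report the directory as missing.
list_pc_files() {
  old_ifs="$IFS"
  IFS=':'
  for dir in $1; do
    IFS="$old_ifs"
    if [ -d "$dir" ]; then
      for pc in "$dir"/*.pc; do
        # Skip the unexpanded pattern when the directory has no .pc files
        [ -e "$pc" ] && printf 'found   %s\n' "$pc"
      done
    else
      printf 'missing %s\n' "$dir"
    fi
  done
  IFS="$old_ifs"
}

# Paths assume the gdal34, geos310 and proj72 packages from this article.
list_pc_files "/usr/gdal34/lib/pkgconfig:/usr/geos310/lib64/pkgconfig:/usr/proj72/lib/pkgconfig"
```

A directory reported as missing usually means that the version in the path does not match the installed package, or that the library lives under lib64 instead of lib (or vice versa).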

Appendix

Packages that directly depend on gdal, geos and proj(*)

gdal

CoordinateCleaner, FedData, GWnnegPCA, GWpcor, IceSat2R, MODIStsp, PlanetNICFI, RCzechia, concaveman, deforestable, ebvcube, extRatum, foieGras, gdalcubes, ggseg, happign, mapme.biodiversity, mlr, pRecipe, rgdal, sen2r, sf, smile, terra, tiler, vapour

geos

GWnnegPCA, GWpcor, RCzechia, apcf, concaveman, deforestable, exactextractr, extRatum, foieGras, geos, geostan, ggseg, happign, lwgeom, mlr, rgeos, sen2r, sf, smile, spatsoc, terra

proj

GWnnegPCA, GWpcor, MODIStsp, ProjectionBasedClustering, R2admb, RCzechia, SGP, concaveman, deforestable, extRatum, foieGras, gdalcubes, ggseg, happign, lwgeom, mapme.biodiversity, mlr, proj4, reproj, rgdal, sen2r, sf, smile, terra, vapour

Any other package depending on one of the listed packages indirectly depends on the existence of the respective OS package as well. 

(*) as of December 2022

Comments

  • James Braithwaite

    Hi Michael,

    Thank you for this article! It has saved me quite a bit of trouble this week alone. Wanted to let you know there's a minor typo on line 7 of the .RProfile configuration you shared. tmp_pkg_path instead of temp_pkg_path

  • Michael Mayer

    Hi James, glad to hear this material is actually useful! Thanks for making me aware of the typo - I have fixed it now accordingly!