Problem statement
Major versions of enterprise-grade operating systems have very long lifetimes. For RHEL this is 12+ years. The core value of an enterprise OS is that the ABI and API of core packages remain stable and unchanged. Any bug fixes, security errata etc. that arise in later upstream versions and affect the version shipped in the enterprise OS are backported by the vendor to that version as patch releases. This API and ABI stability ensures that code built on RHEL 7 eight years ago will still run today on a system with bug fixes and errata applied.
R (and Python), by contrast, are rapidly evolving programming languages, and package authors frequently opt to use the latest upstream version of open-source software as dependencies for their R packages. Those versions, however, are either not available in the OS at all or available only in a much older version. As a consequence, package installation is likely to fail.
Below we show how typical package installation problems on a RHEL 7 / CentOS 7 system can be solved.
R packages that depend on gdal, geos and proj
See a list of R packages in the Appendix below.
Versions available in RHEL 7 / CentOS 7 and upstream
Software package | RHEL 7 / CentOS 7 | Latest upstream (*) |
gdal | 1.11.4 | 3.6.0 |
proj | 4.8.0 | 9.1.1 |
geos | 3.4.2 | 3.11.1 |
Solution summary
The general strategy is to interfere with the OS as little as possible and to make any modification visible only to the R installation(s).
We install the newer software versions from a third-party repository. We choose the pgdg-common repo from the PostgreSQL community repository. It includes not only PostgreSQL software but also its dependencies, among them gdal, proj, and geos in fairly recent versions.
Finally, we configure the system so that those libraries can be found and linked against at compile time. We achieve this by extending the PATH and PKG_CONFIG_PATH environment variables.
The solution modifies the R installation and hence is independent of any Posit product.
Installing software
First we install the RPM containing the repository information. For RHEL 7 / CentOS 7 you can choose this RPM:
sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
Once this is done, we install the packages. First we can explore which versions are available by running
yum --disablerepo=* --enablerepo=pgdg-common search gdal
yum --disablerepo=* --enablerepo=pgdg-common search geos
yum --disablerepo=* --enablerepo=pgdg-common search proj
to figure out the latest version available for the respective software. At the time of writing this article, gdal was available in version 3.4, geos in version 3.11 and proj in version 8.1. To keep the number of packages to be installed small, we install only gdal 3.4, geos 3.10 and proj 7.2, which respects the inter-dependencies between those three packages.
sudo yum install gdal34-devel geos310-devel proj72-devel
Note the versioning scheme: gdal34 corresponds to GDAL 3.4 and so on. Once GDAL 3.5 is available, you will find a package named gdal35 in the repo that you can use. Installing the -devel packages ensures that the header files as well as the dynamically linkable libraries are installed.
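The naming scheme described above can be sketched as a small shell helper. This is only an illustration: the function name pgdg_devel_pkg is hypothetical, and it assumes the version is given as major.minor (optionally followed by a patch level) and that the repo keeps the naming pattern shown for gdal34, geos310 and proj72.

```shell
# Hypothetical helper: derive the -devel package name from a library name
# and a version, following the pattern gdal 3.4 -> gdal34-devel.
# Assumes the version contains at least major.minor.
pgdg_devel_pkg() {
  lib="$1"
  version="$2"
  major="${version%%.*}"   # text before the first dot, e.g. 3
  rest="${version#*.}"     # text after the first dot, e.g. 10 or 10.2
  minor="${rest%%.*}"      # drop any patch level, e.g. 10
  printf '%s%s%s-devel\n' "$lib" "$major" "$minor"
}

pgdg_devel_pkg gdal 3.4
pgdg_devel_pkg geos 3.10
pgdg_devel_pkg proj 7.2
```

The three calls print the package names used in the yum install command above. Whether a given name actually exists in the repo still has to be checked with yum search.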
Configuring software
When compiling a piece of software, various mechanisms can be employed to find out how to link against libraries, where to find include files etc. For this approach we use pkg-config. Software that uses pkg-config provides a text file with the extension .pc containing all the relevant metadata. By setting or extending the environment variable PKG_CONFIG_PATH we tell the build scripts where to find those files. Our approach is to set this only for the existing R installations, in their respective Rprofile.site files.
In parallel, we also add the folders containing binaries to the PATH environment variable. This is important for geos-config, for example.
The code snippet below should be added to every R installation's Rprofile.site.
# For gdal, geos and proj to work we prefix PKG_CONFIG_PATH and PATH
temp_pkg_path <- Sys.getenv("PKG_CONFIG_PATH")
new_pkg_path <- "/usr/gdal34/lib/pkgconfig:/usr/geos310/lib64/pkgconfig:/usr/proj72/lib/pkgconfig"
if (nzchar(temp_pkg_path)) {
  Sys.setenv(PKG_CONFIG_PATH = paste0(new_pkg_path, ":", temp_pkg_path))
} else {
  Sys.setenv(PKG_CONFIG_PATH = new_pkg_path)
}

# Same for PATH. Sys.getenv() returns "" for an unset variable, so
# nzchar() is enough to tell whether there is an existing value to keep.
temp_path <- Sys.getenv("PATH")
new_path <- "/usr/gdal34/bin:/usr/geos310/bin:/usr/proj72/bin"
if (nzchar(temp_path)) {
  Sys.setenv(PATH = paste0(new_path, ":", temp_path))
} else {
  Sys.setenv(PATH = new_path)
}
Note the reference to the versions again. If you are installing different versions, the script above will differ in the lines defining new_pkg_path and new_path.
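Before editing Rprofile.site it can help to confirm that the directories named in new_pkg_path and new_path actually exist on the system, since a mistyped version suffix is easy to miss. The following is a minimal sketch of such a check. The function name check_search_path is hypothetical, and the two paths are simply the values used in the snippet above.

```shell
# Hypothetical helper: report which entries of a colon-separated search
# path exist as directories. The body runs in a subshell so that the
# change to IFS does not leak into the calling shell.
check_search_path() (
  IFS=':'
  for dir in $1; do
    if [ -d "$dir" ]; then
      printf 'found:   %s\n' "$dir"
    else
      printf 'missing: %s\n' "$dir"
    fi
  done
)

check_search_path "/usr/gdal34/lib/pkgconfig:/usr/geos310/lib64/pkgconfig:/usr/proj72/lib/pkgconfig"
check_search_path "/usr/gdal34/bin:/usr/geos310/bin:/usr/proj72/bin"
```

Any line starting with "missing:" points to a package that is not installed or to a version suffix that does not match the installed package.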
Appendix
Packages that directly depend on gdal, geos and proj (*)
gdal |
|
geos |
|
proj |
|
Any other package depending on one of the listed packages indirectly depends on the existence of the respective OS package as well.
(*) as of December 2022
Hi Michael,
Thank you for this article! It has saved me quite a bit of trouble this week alone. Wanted to let you know there's a minor typo on line 7 of the .RProfile configuration you shared. tmp_pkg_path instead of temp_pkg_path
Hi James, glad to hear this material is actually useful ! Thanks for making me aware of the typo - I have fixed it now accordingly !