Performance has many dimensions. R sometimes gets a bum rap for speed, usually because of a poor choice of package or a programming style unsuited to vectorized computation.
There are a variety of ways to make R run faster. Most R users already know to write vectorized code and to avoid explicit loops and unnecessary copies of data. In our experience, your choice of package(s) can also have a material impact on performance. For example, dplyr can be 100 or even 1,000 times faster than plyr. Similarly, stringr can be dramatically faster than base R's string functions. If your work involves a lot of matrix computation, you can either compile R against Intel's Math Kernel Library (MKL) or use Revolution Analytics' Revolution R Open. Google is also doing promising work on speeding up the R runtime, starting from CxxR (a C++ reimplementation of the R interpreter) and making sure it stays fully backward compatible with the current open-source R.
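To make the point about vectorization and unnecessary copies concrete, here is a minimal base R sketch that computes the squares of 1 to n three ways. The function names and the value of `n` are arbitrary choices for illustration, and timings depend on your machine, so none are shown.

```r
n <- 1e4

# 1. Loop that grows its result: every c() call copies everything
#    accumulated so far, so total work grows quadratically with n.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# 2. Loop with a preallocated result: no repeated copying,
#    but each iteration still goes through the interpreter.
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# 3. Vectorized: a single expression; the loop runs in R's compiled internals.
squares_vec <- function(n) seq_len(n)^2

# All three produce the same numeric vector.
stopifnot(identical(squares_grow(1000), squares_vec(1000)),
          identical(squares_prealloc(1000), squares_vec(1000)))

system.time(squares_grow(n))
system.time(squares_prealloc(n))
system.time(squares_vec(n))
```

On typical hardware the growing loop should be by far the slowest and the vectorized version the fastest, which is why preallocation and vectorization are the first things to check.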
If there are specific areas of R you would like to learn how to optimize, let us know: we may be working on those areas ourselves, or we may know of people who are solving that problem. Ultimately, for the best possible performance you can't beat dropping down to C++, and with Rcpp a capable C++ developer can dramatically improve the performance of their package and, by extension, users' experience with R.
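As a minimal sketch of the Rcpp route, assuming the Rcpp package and a working C++ compiler are installed: `Rcpp::cppFunction()` compiles a snippet of C++ and binds it as an R function. The example function `ema_cpp` (an exponential moving average) is hypothetical and chosen because each output depends on the previous one, so it cannot be written as a single vectorized R expression; that kind of loop is where C++ tends to pay off.

```r
# Assumes the Rcpp package and a C++ toolchain are available.
library(Rcpp)

# Hypothetical example: exponential moving average compiled from C++.
cppFunction('
NumericVector ema_cpp(NumericVector x, double alpha) {
  int n = x.size();
  NumericVector out(n);
  if (n == 0) return out;
  out[0] = x[0];
  // Each step depends on the previous output, so this is a true loop.
  for (int i = 1; i < n; ++i) {
    out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
  }
  return out;
}')

# The same recurrence as an interpreted R loop, for comparison.
ema_r <- function(x, alpha) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

set.seed(1)
x <- runif(1e6)
stopifnot(isTRUE(all.equal(ema_cpp(x, 0.1), ema_r(x, 0.1))))

system.time(ema_r(x, 0.1))
system.time(ema_cpp(x, 0.1))
```

Both versions compute the same values; the C++ version avoids the interpreter's per-iteration overhead, which is where the speedup comes from.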