R 3.5: A Major Upgrade

                              

R 3.5.0, a major upgrade of R, was released in April 2018. Most of its changes are invisible to the user apart from the performance improvements. The biggest change that R 3.5 has brought with it is ALTREP, an alternate representation for R objects. It allows many vectors to be represented more compactly, which takes up less storage space and thereby aids faster computation. For example, the sequence vector 1:1000000, which would allocate a million elements in prior versions of R, is now represented by just its starting and ending values. So, while R 3.4.3 takes about 1.5 seconds to create a large sequence vector of this kind, doing so is instantaneous in R 3.5.0.
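A rough way to see the compact representation at work is to time the creation of a sequence against an operation that forces an ordinary vector to be allocated. This is only a sketch: the length n is an arbitrary choice made for illustration, and the timings will vary with the machine and the R version.

```r
# Length chosen arbitrarily for illustration; larger values widen the gap.
n <- 1e7

# In R >= 3.5.0 this creates a compact ALTREP sequence rather than
# storing all n elements.
t_seq <- system.time(x <- 1:n)

# Adding 0L produces an ordinary integer vector with every element stored.
t_full <- system.time(y <- x + 0L)

print(t_seq["elapsed"])
print(t_full["elapsed"])

# Both objects hold the same values even though they are stored differently.
same_values <- identical(x, y)
print(same_values)
```

On versions before 3.5.0 the first timing should be closer to the second, since the sequence is allocated in full there as well.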

Another notable ALTREP-related improvement concerns the output of the sort function. The sorted vector now carries a flag indicating that it is already sorted, so sorting it again is instantaneous. Running x <- sort(x) is therefore essentially free the second and subsequent times, unlike in earlier versions of R. Separately, the default symbol table size has been increased from 4119 to 49157, which may improve the performance of symbol resolution when many packages are loaded.
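The effect of the sorted flag can be checked informally by timing two consecutive sorts of the same vector. The vector size and the seed below are arbitrary choices for illustration, and the exact timings depend on the machine.

```r
set.seed(42)        # arbitrary seed so the run is repeatable
x <- runif(1e6)     # unsorted input; the size is an arbitrary choice

# The first call does the real sorting work.
t_first <- system.time(x <- sort(x))

# In R >= 3.5.0 the input is already marked as sorted, so this call
# should return almost immediately.
t_second <- system.time(x <- sort(x))

print(t_first["elapsed"])
print(t_second["elapsed"])
```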

Another notable change is the conversion of a numeric vector to a character vector: as.character(x) is now also instantaneous, because the coercion to character is deferred until the character representation is actually needed. This has a significant impact on R’s statistical modelling functions, which carry a long character vector that usually contains just numbers (the row names) along with the design matrix.
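A small sketch of the deferred conversion, assuming R 3.5.0 or later: the call to as.character() itself should return quickly, and the strings are produced when they are actually requested. The vector length is an arbitrary choice for illustration.

```r
x <- 1:1e6

# In R >= 3.5.0 the conversion is deferred, so this call should be fast.
t_conv <- system.time(s <- as.character(x))
print(t_conv["elapsed"])

# The character representation is produced once elements are needed.
first_three <- s[1:3]
print(first_three)
```

The result behaves like any other character vector, so existing code does not need to change to benefit from the deferral.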

Therefore, the calculation

d <- data.frame(y = rnorm(1e7), x = 1:1e7)

lm(y ~ x, data=d)

runs about 4x faster in R 3.5.0. (It also uses a lot less memory: running the equivalent command with 10x more rows failed in R 3.4.3 but succeeded in 3.5.0.)

The ALTREP system has been specifically designed to be extensible, but in R 3.5.0 it is used exclusively for R's internal operations. In addition, all packages are now byte-compiled on installation. In previous versions of R, the base and recommended packages and the packages on CRAN were already byte-compiled, so this change mainly improves the performance of packages installed from GitHub and from private sources. R should also perform better when a lot of packages are loaded at the same time.

Support for long vectors has been improved in functions including object.size, approx and spline. Reading in text data with readLines and scan should be faster, thanks to buffering on text connections. R should also handle some international data files better, as several bugs related to character encodings have been resolved.

Apart from that, the character methods for as.Date() and as.POSIXlt() are now more flexible via the new arguments tryFormats and optional. There are two basic classes of date/times. Class “POSIXct” represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. Class “POSIXlt” is a named list of vectors representing sec (seconds), min (minutes), hour (hours), mday (day of the month), mon (months after the first of the year), year (years since 1900), wday (day of the week), yday (day of the year), isdst (daylight saving time flag), zone (time zone abbreviation) and gmtoff (offset from GMT in seconds). POSIXt objects can now also be rounded or truncated to month or year.
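The two date/time classes and the new arguments can be illustrated with a short sketch. The dates, formats and variable names below are made up for the example, and it assumes R 3.5.0 or later.

```r
# POSIXct: seconds since 1970-01-01 00:00:00 UTC, stored as a number.
ct <- as.POSIXct("2018-04-23 10:30:00", tz = "UTC")
secs <- as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))  # one day after the epoch
print(secs)

# POSIXlt: a list of components; mon counts from 0 and year from 1900.
lt <- as.POSIXlt(ct)
print(c(mday = lt$mday, mon = lt$mon, year = lt$year))

# tryFormats: the formats are tried in order until one parses the input.
d <- as.Date("23/04/2018", tryFormats = c("%Y-%m-%d", "%d/%m/%Y"))
print(d)

# optional = TRUE returns NA instead of raising an error when no format matches.
bad <- as.Date("not a date", optional = TRUE)
print(bad)

# Truncating and rounding to month or year.
month_start <- trunc(ct, "months")
year_start  <- trunc(ct, "years")
rounded     <- round(ct, "months")
print(month_start)
print(year_start)
print(rounded)
```

Note that the order of tryFormats matters: the first format that successfully parses the first non-missing element is applied to the whole vector.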

Other notable changes include:

 

  1. factor() now uses order() to sort its levels, rather than sort.list().
  2. The performance of functions like readLines(), scan() and read.table() has been improved.
  3. Previously, only a single argument could be passed on the ‘#!’ line on Linux. Rscript can now accept more than one argument given on the ‘#!’ line of a script.
  4. printCoefmat() now also works without column names.

Other user-visible changes follow from the fact that all packages are now byte-compiled on installation by default. This makes installed packages larger (usually marginally so) and may affect the format of messages and tracebacks (which exclude .Call and similar). Speaking of packages, after installing R 3.5 the previously installed packages will probably no longer be available and will have to be reinstalled; a helper package such as installr can help with this.

After reading the release notes, I can say that there are no major backward-incompatible changes, so old scripts should continue to work. Nonetheless, given the significant changes behind the scenes, it might be best to wait for a maintenance release before using R 3.5.0 for production applications. For development and data science work, however, I recommend moving to R 3.5.0 right away, as the benefits are significant.
