且构网


Why is the diag function so slow? [in R 3.2.0 or earlier]

Updated: 2023-12-04 15:55:28

Summary

As of R version 3.2.1 (World-Famous Astronaut), diag() has received an update. The discussion moved to r-devel, where it was noted that c() strips non-name attributes, which may be why the call was placed there. While some people worried that removing c() would cause unknown issues on matrix-like objects, Peter Dalgaard found that, "The only case where the c() inside diag() has an effect is where M[i,j] != M[(i-1)*m+j] AND c(M) will stringize M in column-major order, so that M[i,j] == c(M)[(i-1)*m+j]."
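To make the point about c() concrete, here is a minimal sketch in base R. The matrix M and the index vector idx are illustrative names only; the index expression follows the usual column-major layout of an ordinary matrix and is not copied from R's source.

```r
# A small matrix carrying "dim" and "dimnames" attributes.
M <- matrix(1:9, nrow = 3, dimnames = list(letters[1:3], LETTERS[1:3]))

names(attributes(M))   # the attributes present on the matrix
attributes(c(M))       # c() returns a plain vector with these attributes dropped

# Linear (column-major) positions of the diagonal of an m x m matrix.
m <- nrow(M)
idx <- 1 + 0:(m - 1) * (m + 1)

# For an ordinary matrix, indexing through c(M) or M directly
# selects the same elements.
same <- identical(c(M)[idx], M[idx])
same
```

For a plain matrix like this one the two forms agree, which is consistent with the quoted observation that c() only matters for unusual matrix-like objects.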

Luke Tierney tested @Frank's removal of c() and found that it did not affect anything on CRAN or BIOC, so the change was implemented, replacing c(x)[...] with x[...] on line 27. This leads to relatively large speedups in diag(). Below is a speed test showing the improvement with R 3.2.1's version of diag().

    library(microbenchmark)
    nc <- 1e4
    set.seed(1)
    m <- matrix(sample(letters, nc^2, replace = TRUE), ncol = nc)

    # diagOld() is the diag() from R 3.2.0 or earlier (its definition is not shown here)
    microbenchmark(diagOld(m), diag(m))
    Unit: microseconds
           expr        min          lq        mean      median         uq        max neval
     diagOld(m) 451189.242 526622.2775 545116.5668 531905.5635 540008.704 682223.733   100
        diag(m)    222.563    646.8675    644.7444    714.4575    740.701   1015.459   100
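The difference can be reproduced in isolation with a rough sketch like the following. The helpers diag_with_c() and diag_without_c() are hypothetical names introduced here for illustration; they are not R's own source and only mimic the indexing step described above, assuming a non-empty ordinary matrix. A smaller matrix is used so the example runs quickly.

```r
# Hypothetical helpers (not R's source): they isolate the indexing step only.
diag_with_c <- function(x) {
  m <- min(dim(x))
  # c(x) builds a full attribute-free copy of the matrix before indexing
  c(x)[1 + 0:(m - 1) * (dim(x)[1] + 1)]
}

diag_without_c <- function(x) {
  m <- min(dim(x))
  # index the matrix directly, with no intermediate copy
  x[1 + 0:(m - 1) * (dim(x)[1] + 1)]
}

set.seed(1)
n <- 500
m_small <- matrix(sample(letters, n^2, replace = TRUE), ncol = n)

# Both forms return the same diagonal as diag() for an ordinary matrix.
same_result <- identical(diag_with_c(m_small), diag_without_c(m_small))
matches_diag <- identical(diag_without_c(m_small), diag(m_small))

# Rough timing with base R; absolute numbers depend on machine and R version.
system.time(for (i in 1:200) diag_with_c(m_small))
system.time(for (i in 1:200) diag_without_c(m_small))
```

The version with c() has to allocate and fill a vector as large as the whole matrix on every call, so its cost grows with the number of elements, while direct indexing only touches the diagonal positions.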