且构网

R: Specifying variable names in function parameters for a general-purpose function

Updated: 2022-11-20 21:28:35

Let's investigate your original function and call (see the comments I added), assuming you mean to pass the names of your columns of interest to the function:

myfun <- function (dataframe, varA, varB) {
  # On the next line you use A and B, but these should be whatever is
  # passed in as varA and varB, no?
  daf2 <- data.frame(A = dataframe$A * dataframe$B, B = dataframe$C * dataframe$D)
  # So, as a correction, we need:
  colnames(daf2) <- c(varA, varB)
  # The first argument to lm is a formula. Used like this, it refers to
  # columns literally _named_ varA and varB, not to the columns named by
  # the _contents_ of varA and varB!
  anv1 <- lm(varA ~ varB, daf2)
  # What we really want is a formula built from the contents of varA and
  # varB. We have to do this by building up a character string:
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), daf2)
  print(anova(anv1))
}
# Here you pass A and B unquoted, because you are used to being able to do that
# in a formula (as in lm). But a great deal of work is done behind the scenes to
# make that possible in a formula, and it does not carry over to most of the
# rest of R. So you need to pass the names as character strings:
myfun(dataframe = dataf, varA = A, varB = B)
# becomes:
myfun(dataframe = dataf, varA = "A", varB = "B")

Note: in the above, I left the original code in place, so you may have to remove some of it to avoid the errors you were originally getting. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R leads people into bad habits and misunderstandings...
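
To make the point about character strings concrete, here is a minimal sketch. It uses the built-in mtcars data set and a helper called fit_by_name, both of which are my own choices for illustration and not part of the original question. It builds the same formula in two ways: with paste() plus as.formula(), and with reformulate() from the stats package.

```r
# Hypothetical helper: fit a linear model from column names given as strings.
fit_by_name <- function(data, response, predictor) {
  # reformulate() builds `response ~ predictor` from character strings
  frm <- reformulate(predictor, response = response)
  lm(frm, data = data)
}

# The same formula built by pasting a character string together
frm_paste <- as.formula(paste("mpg", "wt", sep = "~"))

fit1 <- fit_by_name(mtcars, "mpg", "wt")
fit2 <- lm(frm_paste, data = mtcars)

# Both approaches describe the same model, so the coefficients agree
same_fit <- isTRUE(all.equal(coef(fit1), coef(fit2)))
same_fit
```

Either construction works; reformulate() mainly saves you from assembling the string by hand when there are several predictors.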

Now, as an alternative: the only place the variable names are actually used is in the formula. So, if you don't mind some slight cosmetic differences in the results, which you can clean up later, you can simplify matters further: there is no need to pass along the column names at all!

myfun <- function (dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B, B = dataframe$C * dataframe$D)
  # Now we know that columns A and B simply exist in data.frame daf2!
  anv1 <- lm(A ~ B, daf2)
  print(anova(anv1))
}
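
As a sketch of what cleaning up the cosmetic differences could look like: the ANOVA table is a data frame, so the fixed term name can be relabeled after the fact. The data values and the label "C x D product" below are invented for illustration, and this variant returns the table instead of printing it so that it can be modified.

```r
# Illustrative data (invented); column names match what the function expects
dataf <- data.frame(A = c(2, 4, 5, 7, 9, 12),
                    B = c(1, 3, 2, 5, 4, 6),
                    C = c(3, 1, 4, 1, 5, 9),
                    D = c(2, 7, 1, 8, 2, 8))

myfun <- function(dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  # Columns A and B are known to exist in daf2, so the formula can be fixed
  anova(lm(A ~ B, daf2))
}

tab <- myfun(dataf)
# The term is reported under the fixed name "B"; relabel it for presentation
rownames(tab)[rownames(tab) == "B"] <- "C x D product"
tab
```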

As a final piece of advice: I would refrain from calling print on the last statement. If you leave it out and call the function directly from the R command line, the result is printed for you anyway. As an added advantage, you can do further work with the object the function returns.
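
A short sketch of that added advantage, assuming the built-in mtcars data set and the columns mpg and wt (my choices, not from the original question). Because the function returns the ANOVA table rather than printing it, individual values can be pulled out of the result.

```r
# Same shape as the function in this answer, without the print() call
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anova(lm(formula(frm), dataframe))  # the last value is returned
}

# Assigning the result prints nothing; the table is kept for later use
res <- myfun(mtcars, "mpg", "wt")

# The ANOVA table is a data frame, so it can be indexed by row and column name
f_value <- res["wt", "F value"]
p_value <- res["wt", "Pr(>F)"]
p_value < 0.05
```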

Cleaned function with a trial run:

dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)
}
myfun(dataframe = dataf, varA = "A", varB = "B")
myfun(dataframe = dataf, varA = "A", varB = "D")
myfun(dataframe = dataf, varA = "B", varB = "C")
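
Since the column names are now ordinary character strings, repeated calls like the ones above can also be driven from vectors of names. The following is only a sketch: it uses the built-in mtcars data set, and the column pairs were picked arbitrarily for illustration.

```r
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anova(lm(formula(frm), dataframe))
}

# Arbitrary example pairs of response and predictor columns
responses  <- c("mpg", "mpg", "qsec")
predictors <- c("wt", "hp", "hp")

# One ANOVA table per (response, predictor) pair
results <- Map(function(y, x) myfun(mtcars, y, x), responses, predictors)
names(results) <- paste(responses, predictors, sep = " ~ ")

results[["mpg ~ hp"]]
```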