Count the number of files in each folder of a complex folder structure?

Updated: 2022-06-13 01:11:02

list.dirs() returns a vector of every subdirectory reachable from a starting folder (the starting folder itself included), which gives you the first column of your data frame. Very convenient.

# Get a vector of all the directories and subdirectories from this folder
dir <- "."
xs <- list.dirs(dir, recursive = TRUE)
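To see what list.dirs() returns without touching a real project, here is a minimal sketch on a throwaway tree under tempdir(). The folder names (list_dirs_demo, a, b, c) are made up for illustration.

```r
# Build a small throwaway tree: root/a/b and root/c (hypothetical names)
root <- file.path(tempdir(), "list_dirs_demo")
unlink(root, recursive = TRUE)                      # start from a clean slate
dir.create(file.path(root, "a", "b"), recursive = TRUE)
dir.create(file.path(root, "c"))

# recursive = TRUE walks the whole tree; the starting folder is part of the result
all_dirs <- list.dirs(root, recursive = TRUE)
print(basename(all_dirs))
```

The result has four entries: the root plus a, a/b and c. That is why the first row of the final table is the starting folder itself.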

list.files() can tell us the contents of each of those folders, but the listing includes both files and folders. We just want the files. To get the count of files, we need to filter the output of list.files() with a predicate. file.info() reports whether a given path is a directory, so we build our predicate from that.

# Helper to check if something is folder or file
is_dir <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)

Now we work out how to count the files in a single folder. Summing logical values returns the number of TRUE cases.

# Count the files in a single folder
count_files_in_one_dir <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  sum(is_file(files))
}
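The counting step can be checked by hand on a tiny folder. This is only a sketch: the folder count_demo and its contents are invented for the example, and the predicate is inlined rather than reusing the helpers above so that the block runs on its own.

```r
# A folder holding two files and one subfolder (all names are hypothetical)
demo_dir <- file.path(tempdir(), "count_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(file.path(demo_dir, "sub"), recursive = TRUE)
file.create(file.path(demo_dir, c("a.txt", "b.txt")))

# list.files() returns files and folders alike; file.info() tells them apart
entries <- list.files(demo_dir, full.names = TRUE)
entry_is_dir <- file.info(entries)[["isdir"]]

# TRUE counts as 1 and FALSE as 0, so the sum is the number of non-directories
n_files <- sum(!entry_is_dir)
print(data.frame(entry = basename(entries), is_dir = entry_is_dir))
print(n_files)
```

Note that list.files() skips hidden files by default (all.files = FALSE), so dotfiles are not counted unless you ask for them.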

For convenience, we wrap that function to make it work on many folders.

# Vectorized version of the above
count_files_in_dir <- function(dir) {
  vapply(dir, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}

Now we can count the files.

# tibble::tibble() replaces the deprecated tibble::data_frame()
df <- tibble::tibble(
  dir = xs,
  nfiles = count_files_in_dir(xs))

df
#> # A tibble: 688 x 2
#>                                                  dir nfiles
#>                                                <chr>  <dbl>
#>  1                                                 .     11
#>  2                                         ./.github      3
#>  3                                     ./actioncable      7
#>  4                                 ./actioncable/app      0
#>  5                          ./actioncable/app/assets      0
#>  6              ./actioncable/app/assets/javascripts      1
#>  7 ./actioncable/app/assets/javascripts/action_cable      5
#>  8                                 ./actioncable/bin      1
#>  9                                 ./actioncable/lib      1
#> 10                    ./actioncable/lib/action_cable      8
#> # ... with 678 more rows
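If you want to verify the whole approach on a tree whose counts you already know, something like the following could work. It is a sketch under assumptions: the tree_demo layout is invented, the helpers are redefined in condensed form so the block is self-contained, and a base data.frame stands in for the tibble to avoid the extra dependency.

```r
# Condensed versions of the helpers from this answer
is_file <- function(x) !file.info(x)[["isdir"]]
count_files_in_one_dir <- function(dir) {
  sum(is_file(list.files(dir, full.names = TRUE)))
}
count_files_in_dir <- function(dirs) {
  vapply(dirs, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}

# Hypothetical tree: 1 file at the top, none in x, 3 in x/y
root <- file.path(tempdir(), "tree_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "x", "y"), recursive = TRUE)
file.create(file.path(root, "top.txt"))
file.create(file.path(root, "x", "y", c("1.txt", "2.txt", "3.txt")))

dirs <- list.dirs(root, recursive = TRUE)
counts <- data.frame(dir = basename(dirs), nfiles = count_files_in_dir(dirs))
print(counts)
```

Folder x contains only the subfolder y, so its count is 0: subfolders are never counted as files, and files in y are attributed to y alone, not to its parents.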