且构网

How to split a file into equal parts without breaking individual lines?

Updated: 2022-04-25 21:39:09

If you mean an equal number of lines, split has an option for this:

split --lines=75

If you need to know what that 75 should really be for N equal parts, it's:

lines_per_part = int((total_lines + N - 1) / N)

where total_lines can be obtained with wc -l.
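Adding N - 1 before the integer division is what turns "divide and round down" into "divide and round up". As a worked check using the numbers from the example run (total_lines = 70, N = 6):

```latex
\text{lines\_per\_part}
  = \left\lfloor \frac{\text{total\_lines} + N - 1}{N} \right\rfloor
  = \left\lceil \frac{\text{total\_lines}}{N} \right\rceil

\left\lfloor \frac{70 + 6 - 1}{6} \right\rfloor
  = \left\lfloor \frac{75}{6} \right\rfloor = 12,
\qquad 5 \times 12 + 10 = 70
```

Rounding down instead would give 11 lines per part, and since 6 × 11 = 66 < 70, the remaining 4 lines would spill into a seventh file.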

See the following example script:

#!/usr/bin/env bash

# Configuration stuff

fspec=qq.c
num_files=6

# Work out lines per file (rounding up, so at most num_files pieces are made).

total_lines=$(wc -l <"${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file, maintaining lines.

split --lines="${lines_per_file}" "${fspec}" xyzzy.

# Debug information

echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"
wc -l xyzzy.*

Output:

Total lines     = 70
Lines  per file = 12
  12 xyzzy.aa
  12 xyzzy.ab
  12 xyzzy.ac
  12 xyzzy.ad
  12 xyzzy.ae
  10 xyzzy.af
  70 total
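Because split names its pieces with suffixes that sort in order (aa, ab, ...), concatenating them with a shell glob should reproduce the original file exactly. The following is a minimal self-contained sketch, assuming bash and GNU coreutils; the sample file sample.txt (generated with seq) and the part. prefix are hypothetical stand-ins for qq.c and xyzzy. It splits the sample with the same round-up formula, then checks that the pieces reassemble byte-for-byte and that every piece ends with a newline, i.e. no line was cut in half.

```shell
#!/usr/bin/env bash
# Sketch: split a sample file into N line-aligned pieces, then verify the result.
# sample.txt and the "part." prefix are hypothetical stand-ins for qq.c and xyzzy.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 70 > sample.txt          # 70 short lines of varying length (1 or 2 digits)
num_files=6

total_lines=$(wc -l < sample.txt)
lines_per_file=$(( (total_lines + num_files - 1) / num_files ))
split --lines="$lines_per_file" sample.txt part.

# 1. The pieces, taken in glob (lexical) order, must reassemble to the original.
if cat part.* | cmp -s - sample.txt; then
  echo "pieces reassemble to the original"
else
  echo "MISMATCH: pieces differ from the original" >&2
fi

# 2. Every piece must end with a newline: $(...) strips one trailing newline,
#    so the substitution is empty exactly when the last byte is "\n".
for f in part.*; do
  if [ -n "$(tail -c 1 "$f")" ]; then
    echo "$f ends mid-line" >&2
  fi
done

wc -l part.*
```

The same two checks work unchanged on the xyzzy.* pieces from the script above by swapping in those names.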

More recent versions of split allow you to specify a number of CHUNKS with the -n/--number option. You can therefore use something like:

split --number=l/6 ${fspec} xyzzy.

(that's ell-slash-six, meaning lines, not one-slash-six).

That will give you roughly equal files in terms of size, with no mid-line splits.

I mention that last point because it doesn't give you roughly the same number of lines in each file, but rather roughly the same number of characters (bytes).

So, if you have one 20-character line and nineteen 1-character lines (twenty lines in total) and split it into five files, you most likely won't get four lines in every file.
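That example can be reproduced directly. This is a minimal sketch, assuming bash and a GNU coreutils split that supports --number=l/N; the file name demo.txt and the chunk. prefix are hypothetical. It builds one 20-character line followed by nineteen 1-character lines (59 bytes in total), splits it into five l/ chunks, and prints the line and byte counts of each piece, so you can see that the pieces are balanced by bytes rather than by lines.

```shell
#!/usr/bin/env bash
# Sketch: show that split --number=l/N balances bytes, not lines.
# demo.txt and the "chunk." prefix are hypothetical names.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# One 20-character line, then nineteen 1-character lines: 20 lines, 59 bytes.
{
  printf '%s\n' "12345678901234567890"
  for _ in $(seq 19); do printf 'b\n'; done
} > demo.txt

split --number=l/5 demo.txt chunk.

# Lines and bytes per piece; compare with the naive "4 lines each".
wc -l -c chunk.*
```

With five chunks the target size is about 59 / 5 ≈ 11 bytes, so the first boundary falls inside the long line; because split finishes the current line before starting a new piece, the first piece should hold only that one line, and the short lines are spread over the remaining pieces.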