且构网


Split a string on arbitrary-length substrings (PowerShell)

Updated: 2023-02-21 14:06:54


I have formatted text files from other sources; I can't control those sources or ask them to generate a more sensible-for-my-purposes format like CSV. I can look at the header lines of the files to determine the column widths (and names, but they're not at issue here). Once I've done that, I'll have an array of widths. I'd like to be able to split subsequent lines in that file based on the widths I've determined from the header.

Obviously, I can loop through the array of widths, and bite off the initial substring of the appropriate length, but I'm hoping there's a more efficient way - for example, if I wanted to use fixed-width columns, I could just use -split "(\w{$foo})", where $foo is the variable that contains the width of the column.
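For reference, the equal-width case mentioned above might look like the sketch below. The sample line and the value of $foo are made up for illustration, and the pattern uses . rather than \w so that padding spaces are captured as well. Note that -split also returns the empty strings that sit between the captured delimiters, so they are filtered out here.

```powershell
# Illustrative data (not from the file in question): three 4-character columns
$line = 'ab  cd  ef  '
$foo  = 4

# Capturing parentheses make -split return the delimiters themselves;
# the empty strings between them are dropped with -ne ''
$fields = @(($line -split "(.{$foo})") -ne '')

# $fields now holds 'ab  ', 'cd  ', 'ef  '
```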

Is there, in fact, a more efficient way of doing this?

Example data:

Junction      0122 D150441-4    Ni Po De           210 Na

Column widths: $cols = @(14, 5, 11, 2, 16, 3, 4, 2)

(Note: I don't care about trailing spaces in the chopped-up data; I can manage those later. I'm simply looking to chop the data at this point.)

(At iRon's request to be able to demonstrate his ConvertFrom-SourceTable, this is a full file that might need to be parsed)

@SUB-SECTOR: sec_C   SECTOR: reft
#
# Trade routes within the subsector
#
#--------1---------2---------3---------4---------5---------6---
#PlanetName   Loc. UPP Code   B   Notes         Z  PBG Al LRX *
#----------   ---- ---------  - --------------- -  --- -- --- -
Lemente       1907 B897563-B    Ag Ni              824 Na
Zamoran       2108 B674675-A  Q Ag Ni              904 Dr

Is there, in fact, a more efficient way of doing this?

If by "more efficient", you mean "something that takes fewer CPU cycles", then yes:

$string = 'Junction      0122 D150441-4    Ni Po De           210 Na'
$cols = @(14, 5, 11, 2, 16, 3, 4, 2)
$substrings = @(
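  # every width except the last (-SkipLast requires PowerShell 5.0 or later)
  $cols | Select-Object -SkipLast 1 | ForEach-Object {
    $string.Remove($_)                # emit the first $_ characters
    $string = $string.Substring($_)   # keep the rest for the next pass
  }
  $string                             # whatever remains is the last column
)

# $substrings now contains the individual column values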

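The code above collects the first n-1 substrings one at a time. On each pass, $string.Remove($_) returns the leading $_ characters (it removes everything from index $_ onward), and $string.Substring($_) keeps the remainder for the next pass. Whatever is left after the loop is the last column. Note that $string is consumed in the process.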
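If the input string should stay untouched, the same idea can be sketched with a running offset instead of repeatedly shortening the string. Split-FixedWidth is a hypothetical helper name, not something from the answer, and the clamping of short lines is an added assumption about how ragged input should be handled.

```powershell
# Hypothetical helper, not part of the original answer: returns one
# substring per width, reading left to right from a running offset.
function Split-FixedWidth {
    param(
        [string] $Line,
        [int[]]  $Widths
    )
    $offset = 0
    foreach ($width in $Widths) {
        # Clamp the length so a line shorter than the total width
        # yields shortened or empty fields instead of an exception
        $available = [Math]::Max(0, $Line.Length - $offset)
        $take      = [Math]::Min($width, $available)
        if ($take -gt 0) { $Line.Substring($offset, $take) } else { '' }
        $offset += $width
    }
}

$string = 'Junction      0122 D150441-4    Ni Po De           210 Na'
$cols   = @(14, 5, 11, 2, 16, 3, 4, 2)
$substrings = @(Split-FixedWidth -Line $string -Widths $cols)
```

Because the function only reads from $Line, it can be called once per line of the file with the same width array.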


If by "more efficient" you mean "less code", you can concatenate your constructed regex patterns and grab all capture groups in one go:

$string = 'Junction      0122 D150441-4    Ni Po De           210 Na'
$cols = @(14, 5, 11, 2, 16, 3, 4, 2)

# generate the regex pattern
# in this case '(.{14})(.{5})(.{11})(.{2})(.{16})(.{3})(.{4})(.{2})'
$pattern = $cols.ForEach({"(.{$_})"}) -join ''

# use `-match` and $Matches to grab the individual groups
# (groups are numbered 1 through $cols.Length; $Matches[0] is the whole match)
$substrings = if($string -match $pattern){
  $Matches[1..$cols.Length]
}

# $substrings again holds all our substrings
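Applied to a whole file, the regex approach might look like the following sketch. It uses the sample lines from the question in place of a real file. Three things here are assumptions rather than part of the answer: lines starting with # or @ are treated as headers to skip, the pattern is anchored with ^, and short lines are padded to the total width so the pattern can still match.

```powershell
# Sample lines from the question, standing in for the file contents
$lines = @(
    '@SUB-SECTOR: sec_C   SECTOR: reft'
    '#'
    '#PlanetName   Loc. UPP Code   B   Notes         Z  PBG Al LRX *'
    'Lemente       1907 B897563-B    Ag Ni              824 Na'
    'Zamoran       2108 B674675-A  Q Ag Ni              904 Dr'
)
$cols = @(14, 5, 11, 2, 16, 3, 4, 2)

# Anchor the pattern at the start of the line (an addition to the original pattern)
$pattern = '^' + ($cols.ForEach({ "(.{$_})" }) -join '')
$total   = [int]($cols | Measure-Object -Sum).Sum

$rows = @(
    foreach ($line in $lines) {
        # Assumed rule: header and comment lines start with '#' or '@'
        if ($line -match '^[#@]') { continue }
        # Pad short lines so the fixed-width pattern can still match
        if ($line.PadRight($total) -match $pattern) {
            , @($Matches[1..$cols.Length])   # emit the fields as one row
        }
    }
)

# $rows[0] and $rows[1] each hold the eight column values of one data line
```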