且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Perl:使用正则表达式从文本中提取数据

更新时间:2022-11-14 20:47:02

不要尝试使用一个正则表达式 一组正则表达式和拆分更容易理解:

#!/usr/bin/perl使用严格;使用警告;而(){接下来除非我的 ($data) =/\("Data" (.*)\)/;打印在 $. 行,我看到:\n";对于我的 $item ($data =~/\((.*?)\)/g) {我的 ($type, $var, $num) = split " ", $item;打印 "\ttype $type var $var num $num\n";}}__数据__("数据" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "D" 34896)(Int "E" 38046))("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C"" 6)(国际"D" 34896)(国际"E" 38046))("数据" (Int "A" 22)(Int "B" 22)(Int "C" 59)(Int "B" 1143)(Int "C" 1210)(Int "B" 1232)(Int "C"" 34896)(国际"D" 34896)(国际"E" 38046))

如果您的数据可以跨行,我建议使用解析器而不是正则表达式.

I am using Perl to do text processing with regex. I have no control over the input. I have shown some examples of the input below.

As you can see the items B and C can be in the string n times with different values. I need to get all the values as back reference. Or if you know of a different way i am all ears.

I am trying to use branch reset pattern (as outlined at perldoc: "Extended Patterns") I am not having much luck matching the string.

("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 22)(Int "C" 59)(Int "B" 1143)(Int "C" 1210)(Int "B" 1232)(Int "C" 34896)(Int "D" 34896)(Int "E" 38046))

My Perl is below, any help would be great. Thanks for any help you can give.

if($inputString =~/\("Data" \(Int "A" ([0-9]+)\)(?:\(Int "B" ([0-9]+)\)\(Int "C" ([0-9]+)\))+\(Int "D" ([0-9]+)\)\(Int "E" ([0-9]+)\)\)/) {

    print "\n\nmatched\n";

    print "1: $1\n";
    print "2: $2\n";
    print "3: $3\n";
    print "4: $4\n";
    print "5: $5\n";
    print "6: $6\n";
    print "7: $7\n";
    print "8: $8\n";
    print "9: $9\n";

}

Don't try to use one regex a set of regexes and splits are easier to understand:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    next unless my ($data) = /\("Data" (.*)\)/;
    print "on line $., I saw:\n";
    for my $item ($data =~ /\((.*?)\)/g) {
        my ($type, $var, $num) = split " ", $item;
        print "\ttype $type var $var num $num\n";
    }
}

__DATA__
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 22)(Int "C" 59)(Int "B" 1143)(Int "C" 1210)(Int "B" 1232)(Int "C" 34896)(Int "D" 34896)(Int "E" 38046))

If your data can stretch across lines, I would suggest using a parser instead of a regex.