
Linux text file manipulation

Updated: 1970-01-01 07:57:00


Assuming there can be one or more spaces after <a, and zero or more spaces around the = signs, the following should work:

$ cat in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">
#
# The command to do the substitution
#
$ sed -e 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#' in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>
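Below is a self-contained sketch of the session above, assuming a POSIX shell. It swaps [ \t] for [[:blank:]] (space or tab). In a POSIX bracket expression a backslash is literal, and recognizing \t there is a GNU sed extension, so [[:blank:]] is the safer choice on other seds such as BSD/macOS sed. It also writes the result to out.txt, a file name chosen here only for illustration, instead of printing it.

```shell
#!/bin/sh
# Recreate the sample input from the session above.
cat > in.txt <<'EOF'
<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">
EOF

# Same substitution, with the POSIX class [[:blank:]] in place of [ \t].
sed -e 's#<a[[:blank:]][[:blank:]]*href[[:blank:]]*=[[:blank:]]*".*search[[:blank:]]*=[[:blank:]]*\([^"]*\)">#&\1</a>#' in.txt > out.txt

# out.txt now holds the four rewritten lines.
cat out.txt
```

On GNU sed the original [ \t] command gives the same four lines for this input.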


If you're sure you don't have the extra spaces, the pattern simplifies to:

s#<a href=".*search=\([^"]*\)">#&\1</a>#
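To see what the simplified pattern gives up, this sketch runs it on a hypothetical two-line input: one line in the expected form and one with spaces around =. Only the first line is rewritten; the second passes through unchanged because it does not contain the literal text <a href=".

```shell
#!/bin/sh
# Hypothetical input: one line as expected, one with spaces around "=".
input='<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href = "http://www.wowhead.com/?search=Tabard of Brute Force">'

# Simplified pattern: exactly one space after <a and none around "=".
simple=$(printf '%s\n' "$input" | sed -e 's#<a href=".*search=\([^"]*\)">#&\1</a>#')
printf '%s\n' "$simple"
```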


In sed, s followed by any character (# in this case) starts a substitution. The pattern to be replaced runs up to the next occurrence of that same character. So, in our second example, the pattern to be replaced is: <a href=".*search=\([^"]*\)">. I used \([^"]*\) to mean any sequence of non-" characters, and captured it so that it can be referred to later as the backreference \1 (a \( \) pair marks a capture group). Finally, the next token delimited by # is the replacement. In the replacement, & stands for "whatever matched", which in this case is the whole line, and \1 is just the captured link text.
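A small sketch of the pieces just described, on a single sample line. It uses | as the delimiter to show that any character works, and writes & and \1 side by side in brackets. Here the pattern matches only part of the line, so & is just that part rather than the whole line.

```shell
#!/bin/sh
line='<a href="http://www.wowhead.com/?search=Superior Mana Oil">'

# "|" as the delimiter; & is the whole match, \1 is the captured group.
shown=$(printf '%s\n' "$line" | sed -e 's|search=\([^"]*\)|[&] [\1]|')
printf '%s\n' "$shown"
```

This prints <a href="http://www.wowhead.com/?[search=Superior Mana Oil] [Superior Mana Oil]">: the text outside the match is left alone.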

Here is the pattern again:

's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#'

And here is what each part means:

'                       quote, so that the shell does not interpret the characters
s                       substitute
#                       delimiter
<a[ \t][ \t]*           <a followed by one or more spaces or tabs
href[ \t]*=[ \t]*       href followed by optional space, = followed by optional space
".*search[ \t]*=[ \t]*  " followed by as many characters as needed, followed by
                        search, optional space, =, and optional space
\([^"]*\)               a sequence of non-" characters, saved in \1
">                      followed by ">
#                       delimiter; the replacement starts here
&\1                     the whole matched text, followed by backreference \1
</a>                    the closing </a> tag
#                       end delimiter
'                       end quote
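The table can also be read as a recipe. This sketch builds the same sed script one row at a time in shell variables (the variable names are hypothetical), with [[:blank:]] standing in for [ \t], and applies it to a line with irregular spacing around href and =.

```shell
#!/bin/sh
ws='[[:blank:]]'                   # one space or tab, like [ \t]
open="<a${ws}${ws}*"               # <a plus one or more blanks
href="href${ws}*=${ws}*"           # href, optional blanks, =, optional blanks
url="\".*search${ws}*=${ws}*"      # quote, any text, then search followed by =
text='\([^"]*\)'                   # non-" characters, saved in \1
close='">'                         # closing quote and >
script="s#${open}${href}${url}${text}${close}#&\\1</a>#"

# A line with extra spaces after <a and around the first "=".
line='<a   href = "http://www.wowhead.com/?search=Tabard of Brute Force">'
result=$(printf '%s\n' "$line" | sed -e "$script")
printf '%s\n' "$result"
```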


If you're really sure that there will always be search= followed by the text you want, you can do:

$ sed -e 's#.*search=\(.*\)">#&\1</a>#' in.txt
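One caveat with the shorter command, shown on a hypothetical line that has a second tag after the link: \(.*\) is greedy, so it runs to the last "> on the line, whereas \([^"]*\) stops at the first double quote. For input shaped like the sample above, both give the same result.

```shell
#!/bin/sh
# Hypothetical line with another tag after the link.
line='<a href="http://www.wowhead.com/?search=Superior Mana Oil"><img src="x.png">'

# Greedy capture: \(.*\) extends to the last "> on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's#.*search=\(.*\)">#&\1</a>#')
# Non-quote capture: \([^"]*\) stops at the first double quote.
strict=$(printf '%s\n' "$line" | sed -e 's#.*search=\([^"]*\)">#&\1</a>#')
printf '%s\n%s\n' "$greedy" "$strict"
```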

Hope that helps.