且构网

Shell script: find, search, and replace an array of strings in a file

Updated: 2023-12-05 14:48:16

This is linked to another question (a code golf) I asked at http://***.com/questions/3171552/code-golf-color-highlighting-of-repeated-text

I've got a file 'sample1.txt' with the following content:

LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.

I've got a script that generates the following array of strings, all of which occur in the file (only a few are shown for illustration):

LoremIpsum
LoremIpsu
dummytext
oremIpsum
LoremIps
dummytex
industry
oremIpsu
remIpsum
ummytext
LoremIp
dummyte
emIpsum
industr
mmytext

Working from the top of the list, I need to check whether 'LoremIpsum' occurs in sample1.txt. If it does, I want to replace all occurrences of LoremIpsum with <T1>LoremIpsum</T1>. When the program then moves on to the next entry, 'LoremIpsu', it should NOT match the text inside <T1>LoremIpsum</T1>. The same procedure should be repeated for every element of this 'array'. The next 'valid' entry would be 'dummytext', which should be tagged as <T2>dummytext</T2>.
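
To illustrate why the "should NOT match" rule matters, here is a minimal sketch of what happens when every entry is replaced unconditionally. The shortened sample text and the variable names (entries, naive, replacement) are assumptions made for this illustration only:

```shell
#!/usr/bin/env bash
# Naive approach: tag every entry without checking for earlier tags.
sample="LoremIpsumissimplydummytext"   # shortened sample, for illustration
entries=(LoremIpsum LoremIpsu)

naive=$sample
tag=0
for entry in "${entries[@]}"; do
  tag=$((tag + 1))
  replacement="<T${tag}>${entry}</T${tag}>"
  # Quoting the pattern makes Bash treat the entry as literal text.
  naive=${naive//"$entry"/"$replacement"}
done

# 'LoremIpsu' is found inside the text already wrapped in <T1>...</T1>,
# so a second tag ends up nested inside the first one.
printf '%s\n' "$naive"
```

The nested tag in the result is exactly what the requirement above is meant to rule out.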

I do think it should be possible to create a bash shell script solution for this rather than relying on perl/python/ruby programs.

Pure Bash (no externals)

At the Bash command line:

$ sample="LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook."
$ # or: sample=$(<sample1.txt)
$ array=(
LoremIpsum
LoremIpsu
dummytext
...
)
$ tag=0; for entry in "${array[@]}"; do test="<[^>/]*>[^>]*${entry}[^<]*</"; if [[ ! $sample =~ $test ]]; then ((tag++)); sample=${sample//"$entry"/<T$tag>$entry</T$tag>}; fi; done; echo "Output:"; echo "$sample"
Output:
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>industry</T3>.<T1>LoremIpsum</T1>hasbeenthe<T3>industry</T3>'sstandard<T2>dummytext</T2>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.
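
For use in a script rather than at the prompt, the same loop could be wrapped in a function along these lines. This is only a sketch: the function name tag_entries, its argument order, and the shortened sample text are assumptions, not part of the answer above.

```shell
#!/usr/bin/env bash
# tag_entries TEXT ENTRY...
# Hypothetical helper: same loop as the one-liner, spread over several lines.
tag_entries() {
  local text=$1 entry guard replacement tag=0
  shift
  for entry in "$@"; do
    # Guard regex: an opening tag, then text without '>' up to the entry,
    # then text without '<' up to a closing tag. A match means the entry
    # already sits inside a tagged span, so it is skipped.
    guard="<[^>/]*>[^>]*${entry}[^<]*</"
    if [[ ! $text =~ $guard ]]; then
      tag=$((tag + 1))
      replacement="<T${tag}>${entry}</T${tag}>"
      # Quoted pattern: the entry is replaced as literal text, not as a glob.
      text=${text//"$entry"/"$replacement"}
    fi
  done
  printf '%s\n' "$text"
}

# Shortened sample text, for illustration only.
sample="LoremIpsumissimplydummytextoftheprintingindustry.LoremIpsumhasbeentheindustry'sstandarddummytext"
result=$(tag_entries "$sample" LoremIpsum LoremIpsu dummytext oremIpsum industry)
printf '%s\n' "$result"
```

Two caveats apply to this sketch as well as to the one-liner: the entry is interpolated into the guard as a regular expression, so entries containing characters such as '.', '*' or '[' would need escaping first, and an entry that overlaps any tagged span is skipped entirely, even if it also occurs untagged elsewhere in the text.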