


Updated: 2023-02-08 20:32:46


This seems to work correctly, assuming that the data file is ordered so that all the lines whose names share the same first two dot-separated components (for example, gene.100080) are grouped together. The order of the lines within a group doesn't matter.
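
If the input is not already grouped, one possible way to meet that assumption is to sort it in plain byte order first, so that every line sharing a prefix such as gene.100080. ends up adjacent. This is only a sketch: it assumes every name has at least three dot-separated components, as in the sample output further down, and the rows are borrowed from that output and deliberately interleaved for illustration.

```shell
# Rows borrowed from the sample output in this answer, interleaved on purpose
# so that the two gene.100080 lines are not adjacent.
ungrouped=$(cat <<'EOF'
gene.100080.0.3.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
chr11_pilon3.g3568.t2   transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   76.7
gene.100080.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
gene.100079.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   86.7
EOF
)

# LC_ALL=C forces a byte-by-byte comparison, so all lines that start with the
# same "name1.name2." prefix come out next to each other.
grouped=$(printf '%s\n' "$ungrouped" | LC_ALL=C sort)

# Show only the first column to make the grouping easy to see.
printf '%s\n' "$grouped" | awk '{ print $1 }'
```

The grouped data can then be fed to the script below in place of the original file.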


awk '
    # Print the lines saved for the current group (those with the highest value).
    function dump_memo()
    {
        if (memo_num > 0)
        {
            for (i = 0; i < memo_num; i++)
                print memo_line[i]
        }
    }
    {
        split($1, a, ".")
        key = a[1] "." a[2]
        val = $NF
        # print "# " key " = " val " (memo_key = " memo_key ", memo_val = " memo_val ")"
        if (memo_key == key)
        {
            if (memo_val == val)
                memo_line[memo_num++] = $0
            else if (memo_val < val)
            {
                memo_val = val
                memo_num = 0
                memo_line[memo_num++] = $0
            }
        }
        else
        {
            # New group: flush the previous one, then start over with this line.
            dump_memo()
            memo_num = 0
            memo_line[memo_num++] = $0
            memo_key = key
            memo_val = val
        }
    }
    END { dump_memo() }' "$@"
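
To make the grouping key concrete, here is a small sketch of just the split step, run on two rows taken from the sample output shown below. It only illustrates how key and val are derived for each line; it is not part of the script itself.

```shell
# Print the grouping key (first two dot-separated name components) and the
# value being compared (the last column) for each input row.
keys=$(awk '{ split($1, a, "."); print a[1] "." a[2], $NF }' <<'EOF'
gene.100080.0.3.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
chr11_pilon3.g3568.t2   transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   76.7
EOF
)
printf '%s\n' "$keys"
# gene.100080 99.9
# chr11_pilon3.g3568 76.7
```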


When run on the data file shown in the question, the output is:

gene.100079.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   86.7
gene.100080.0.3.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
gene.100080.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
chr11_pilon3.g3568.t2   transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   76.7


The main difference between this and the output you requested is the order of the lines. If you need the data in sorted order, pipe the output of the script through sort.
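
As a sketch of that last step, the snippet below feeds the four output lines shown above through a plain sort. The here-document merely stands in for the script's output, and LC_ALL=C is an assumption made here to get a predictable byte-order result; the ordering wanted in the question may call for different sort options.

```shell
# Stand-in for the script output: the four lines shown above, in the same order.
sorted=$(LC_ALL=C sort <<'EOF'
gene.100079.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   86.7
gene.100080.0.3.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
gene.100080.0.0.p1  transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   99.9
chr11_pilon3.g3568.t2   transcript:OIS96097 82.2    169 30  0   1   169 4   172 1.3e-75 283.1   76.7
EOF
)
printf '%s\n' "$sorted"
```

In practice the here-document would be replaced by the script itself, along the lines of sh best.sh data.txt | LC_ALL=C sort, where both file names are hypothetical.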