且构网

How do I compare elements in a list and compare keys in a Python list?

Updated: 2023-02-05 14:28:39

I have the following sequence:

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Here is a dictionary that maps each codon (a base triplet such as ATG or GCT) to the amino acid it codes for.

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}

As one can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine, G). A site whose codons all code for the same amino acid is synonymous (PSyn); a site whose codons code for different amino acids is non-synonymous (PNonsyn).

In this code, I need to do the following:

  1. For each sublist in the list of lists: if the codons differ AND they all code for the same amino acid, increment the PSyn count by 1; if they code for different amino acids, increment the PNonsyn count by 1.

    Here,

    ATG all code for M #However, all are ATG, so there is no change in bases and no increment in either count
    
    GAC, GAT for D; GAA for E; and CCT for P #Codes for three different amino acids, increment CountPNonsyn by 1
    
    GCC, GCG, GCA, GCT for A #Different bases but all code for the same amino acid, increment CountPsyn by 1
    

    Output: CountPsyn = 1 CountPNonsyn = 1

  2. Generate a list that corresponds to the above seq, such that:

    Output: ['ATG','nonsyn','A'] #For sites with different amino acids, the list should say nonsyn; for sites with identical bases, it should list the codon; for synonymous sites, it should list the amino acid

I need help modifying the following code to get the program to work. I am not sure how to look up values in the dictionary and check them against all the elements of each sublist. Code attempted:

countPsyn = 0
countPnonsyn = 0
listofaa =[]

for i in seq:
    for base, value in enumerate(i):        
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG'] 
            listofaa.append(value)

        if value[i] != value[i+1]: 
            if aminoacid[value][i] ==  aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')

The output can be found here: https://eval.in/669107

Here is my stab at the solution.

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Psyn = 0
PNonsyn = 0
output = []

# loop through each sublist in the list of lists
for sublist in seq:
    acids = [aminoacid[base] for base in sublist]
    if len(set(acids)) != 1:  # different amino acids, so non-synonymous
        output.append('nonsyn')
        PNonsyn += 1
    else:  # same amino acid
        if len(set(sublist)) == 1:  # identical codons, no change in bases
            output.append(sublist[0])
        else:  # different codons, same amino acid, so synonymous
            output.append(acids[0])
            Psyn += 1

# print(...) with a single argument works in both Python 2 and Python 3
print("Psyn = " + str(Psyn))
print("PNonsyn = " + str(PNonsyn))
print(output)

Admittedly, it's not a modification of your code, but there is a neat trick here to avoid the double for loop. Given a list mylist, you can find all the unique elements in it by calling set(mylist). E.g.

>>> a = ['AGT','AGT','ACG']
>>> set(a)
set(['AGT', 'ACG'])
>>> len(set(a))
2
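
For reuse, the same set-based check could be wrapped in a function. The following is only a sketch for Python 3: the names classify_sites, CODON_TABLE, BASES and AMINO are made up for illustration and are not part of the original code. It also assumes the standard genetic code and builds the table programmatically instead of typing all 64 entries by hand, which avoids duplicate or mistyped keys.

```python
from itertools import product

# Standard genetic code listed in TCAG order; '*' marks a stop codon.
# BASES, AMINO and CODON_TABLE are illustrative names, not from the question.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): ("STOP" if aa == "*" else aa)
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_sites(sites, table=CODON_TABLE):
    """Return (labels, psyn, pnonsyn) for a list of codon lists."""
    labels, psyn, pnonsyn = [], 0, 0
    for codons in sites:
        acids = {table[c] for c in codons}   # unique amino acids at this site
        if len(acids) > 1:                   # different amino acids
            labels.append("nonsyn")
            pnonsyn += 1
        elif len(set(codons)) == 1:          # identical codons, nothing to count
            labels.append(codons[0])
        else:                                # different codons, same amino acid
            labels.append(acids.pop())
            psyn += 1
    return labels, psyn, pnonsyn


seq = [['ATG', 'ATG', 'ATG', 'ATG'],
       ['GAC', 'GAT', 'GAA', 'CCT'],
       ['GCC', 'GCG', 'GCA', 'GCT']]

labels, psyn, pnonsyn = classify_sites(seq)
print(labels, psyn, pnonsyn)  # ['ATG', 'nonsyn', 'A'] 1 1
```

A codon that is missing from the table raises a KeyError here, which is probably what you want for malformed input; use table.get(c) instead if unknown codons should be tolerated.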