且构网


Reading .fasta sequences to extract nucleotide data, then writing them to a tab-delimited file

Updated: 2022-06-17 16:00:07


You have an extra closing brace right near the end of the script. This should work:

#!/usr/bin/perl
# This script reads several sequences from a .fasta file and writes the sequence lines to an output file.
# (The eventual goal is the relative G+C content of each sequence; $GC is not used yet.)

use strict;
use warnings;

my $infile = "Lab1_seq.fasta";                                  # Input file path
open(my $in, '<', $infile) or die "Can't open $infile: $!";      # Open the input file, or stop with an error if it can't be read
my $outfile = "Lab1_SeqOutput.txt";                             # Output file path
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";  # Open the output file, or stop with an error if it can't be created

my $sequence = '';  # Stores the current sequence line from the .fasta file
my $GC = 0;         # Intended for the G + C count (not used yet)

while (my $line = <$in>) {          # Read the input file one line at a time
    chomp $line;                    # Remove the trailing "\n" from each line

    if ($line =~ /^\s*$/) {         # Skip blank lines (empty or whitespace only)
        next;
    } elsif ($line =~ /^\s*#/) {    # Skip comment lines ('#' after optional leading whitespace)
        next;
    } elsif ($line =~ /^>/) {       # Skip FASTA header lines, which start with '>'
        next;
    } else {
        $sequence = $line;
    }

    $sequence =~ s/\s//g;           # Remove any whitespace left inside the sequence
    print {$out} $sequence, "\n";   # Print the sequence line to the output file, one per line
}

close($in);
close($out) or die "Cannot close $outfile: $!";
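
The header comment mentions the relative G+C content, which the loop above does not compute yet. A minimal sketch of that calculation, written as a hypothetical helper named gc_content (not part of the original answer), might look like this. It assumes whitespace has already been stripped from the sequence, and it uses the full string length as the denominator, so any N or gap characters lower the ratio.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fraction of G and C bases in a nucleotide string.
# Counts upper- and lower-case G/C; returns 0 for an empty sequence.
sub gc_content {
    my ($seq) = @_;
    my $len = length($seq);
    return 0 if $len == 0;
    my $gc = ($seq =~ tr/GCgc//);   # tr/// with an empty replacement just counts matches
    return $gc / $len;
}

# Example: 4 of the 8 bases are G or C.
printf "%.3f\n", gc_content("ATGCGCAT");   # prints 0.500
```

Calling it on $sequence inside the loop would give a ratio per line, not per FASTA record; for a per-record value the lines of each record need to be joined first.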


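I also edited your return line. A return stops processing at that point instead of letting the loop continue (and outside a subroutine, Perl rejects it with "Can't return outside a subroutine"). I suspect what you want is to print each sequence to a file, so I have done that. You may need some further transformation to get the output into a tab-separated format.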
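
For tab-separated output, the main extra step is grouping lines into records: a '>' header starts a new record, and the lines after it are joined into one sequence. The sketch below is one way to do that under stated assumptions: the helper name fasta_to_tsv and the three-column layout (ID, sequence length, sequence) are hypothetical choices, not something from the question. It reads from an in-memory string so it runs without Lab1_seq.fasta.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from $in and write one
# tab-separated line per record ("id<TAB>length<TAB>sequence") to $out.
# Multi-line sequences are joined; blank lines and '#' comment lines are skipped.
sub fasta_to_tsv {
    my ($in, $out) = @_;
    my ($id, $seq);

    # Write the record collected so far (if any); the closure sees the current $id/$seq.
    my $flush = sub {
        print {$out} join("\t", $id, length($seq), $seq), "\n" if defined $id;
    };

    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^\s*#/;
        if ($line =~ /^>(\S*)/) {           # header: keep the first word as the ID
            $flush->();
            ($id, $seq) = ($1, '');
        } else {
            $line =~ s/\s//g;
            $seq .= $line if defined $id;   # ignore sequence data before the first header
        }
    }
    $flush->();                             # write the last record
}

# Demo with an in-memory file so the sketch runs on its own.
my $fasta = ">seq1 first\nATGC\nGGCC\n\n>seq2\nAATT\n";
open(my $in, '<', \$fasta) or die "Can't open in-memory input: $!";
fasta_to_tsv($in, \*STDOUT);
# prints: seq1<TAB>8<TAB>ATGCGGCC
#         seq2<TAB>4<TAB>AATT
```

For the real files, open Lab1_seq.fasta for reading and Lab1_SeqOutput.txt for writing as in the script above, and pass those handles instead of the demo ones.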