How to use Python to improve the efficiency of a permutation algorithm

Updated: 2023-11-24 09:01:34

I have a file that contains just over 100,000 words. What I have to do is run through every 5 letter combination of the alphabet to work out the 5 letters that are used by the least number of words.

I've worked out a Python-based program that will eventually get the answer, but at the speed it is going it would probably take around 48 hours, if not longer. Part of the problem is the sheer number of calculations. I haven't worked out how to limit permutations so that only distinct strings are compared - so that is 26^5 calculations just on the combinations alone, and then each of these is compared against 100,000 words, which works out at around 10*10^11 calculations at a bare minimum.

Is there any way to speed this process up substantially, be that through more efficient algorithms, or multi-threading or something like that?

Any recommendations for books/articles on algorithm efficiency would also be greatly appreciated.


My current program is as follows:

Imports permutation function from itertools module:

from itertools import permutations

Checks whether the word contains any of the forbidden letters:

def avoids(word, letters):
    for characters in letters:
        for character in word:
            if character == characters:
                return False
    return True
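As an aside (not from the original post), the nested loops above can be collapsed into a single set intersection; a sketch of an equivalent check under the same name:

```python
def avoids(word, letters):
    # True when no character of word appears among the forbidden letters
    return not set(word) & set(letters)

print(avoids("hello", "xyz"))  # no shared letters
print(avoids("hello", "le"))   # shares 'l' and 'e'
```

Membership tests on sets are constant time on average, so this also short-circuits the quadratic character-by-character comparison.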

Calculates the number of words from the file that do not contain the forbidden characters:

def number_of_words(file, letters):  

    open_file = open(file)

    x = 0 #counter
    for line in open_file:
        word = line.strip()
        if avoids(word, letters) == True:
            x += 1
    return x
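One immediate cost here is that this function reopens and rescans words.txt for every one of the millions of letter combinations. A sketch (helper name is mine, not from the post) of reading the file a single time into a list, so every later check works from memory:

```python
def load_words(path):
    # read the word list once, stripping trailing newlines
    with open(path) as f:
        return [line.strip() for line in f]

# words = load_words("words.txt")
# ...then pass 'words' around instead of re-reading the file path
```

With the list in memory, each 5-letter combination costs only an in-memory scan rather than 100,000 lines of file I/O.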

Runs through every variation of five letters that exists in the alphabet and calculates the combination that excludes the fewest words:

def find_smallest():

    y = 0

    #every combination of letters
    for letters in permutations("abcdefghijklmnopqrstuvwxyz", 5): 
        x = number_of_words("words.txt", letters)
        #sets y to the initial value of x
        if y == 0:
            y = x
            combination = letters
            print "Start point, combination: %s, amount: %d" % (letters, y)

        #If current combination is greater than all previous combinations set y to x
        elif x > y:
            y = x
            combination = letters
            duplication = 0
            print "New highest, combination: %s, amount: %d" % (letters, y)

    print "%s excludes the smallest number of words (%d)" % (combination, y)

Runs the program:

find_smallest()
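On the distinct-strings point raised in the question (an aside, not from the original post): itertools.combinations yields each unordered selection exactly once, so the search space shrinks from 26*25*24*23*22 = 7,893,600 ordered permutations to C(26,5) = 65,780 combinations, a factor of 5! = 120:

```python
from itertools import combinations, permutations

alphabet = "abcdefghijklmnopqrstuvwxyz"

# count both search spaces without materialising them as lists
n_perms = sum(1 for _ in permutations(alphabet, 5))
n_combs = sum(1 for _ in combinations(alphabet, 5))
print(n_perms, n_combs, n_perms // n_combs)  # 7893600 65780 120
```

Since word membership does not depend on the order of the five forbidden letters, the 120 permutations of each combination are redundant work.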

This is not a question about increasing permutation efficiency. This is really a question about how to make a smarter algorithm; it is a data structure question.

I have a file that contains just over 100,000 words. What I have to do is run through every 5 letter combination of the alphabet to work out the 5 letters that are used by the least number of words.

Loop through the 26 letters of the alphabet and count the number of words in your list which use each letter:

import string

alphabet = string.ascii_lowercase
# 'words' is assumed to be the list of words loaded from your file
counts = {k: sum(k in word.lower() for word in words) for k in alphabet}

This should be pretty fast, and should give you enough information to trivially pick out the five least popular letters.
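For instance, the five least popular letters can be read straight off the counts dict; a sketch with a stand-in word list (the real counts would come from the 100,000-word file):

```python
import string

words = ["jazz", "quiz", "fox", "cat", "dog"]  # stand-in for the real word list
alphabet = string.ascii_lowercase
counts = {k: sum(k in word.lower() for word in words) for k in alphabet}

# sort letters by how many words use them; take the five rarest
least_used = sorted(alphabet, key=counts.get)[:5]
```

This is a single O(26 log 26) sort over the counts, versus millions of combination checks.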

Equivalent approach, probably more efficient but maybe less clear than the above.

import string
from collections import Counter

counter = Counter({k: 0 for k in string.ascii_lowercase})
counter.update(Counter(c for w in words for c in set(w.lower())))
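Counter sorts most_common() in descending order, so the five rarest letters fall out of a reversed slice; a usage sketch with a stand-in word list:

```python
import string
from collections import Counter

words = ["jazz", "quiz", "fox"]  # stand-in for the real word list

# seed every letter with 0 so unused letters still appear in the counts
counter = Counter({k: 0 for k in string.ascii_lowercase})
# set(w.lower()) counts each letter at most once per word
counter.update(Counter(c for w in words for c in set(w.lower())))

# most_common() is sorted descending; the [:-6:-1] slice walks the tail,
# giving the five letters used by the fewest words
five_rarest = [letter for letter, _ in counter.most_common()[:-6:-1]]
```

Seeding with zeros matters: without it, letters used by no word at all would be missing from the counter instead of ranking as the rarest.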