How do I merge two dictionaries in a single expression (merging dictionaries)?

Updated: 2022-11-17 19:37:24

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 5 October 2020): PEP 584 was implemented and provides the simplest method:

    z = x | y          # NOTE: 3.9+ ONLY

  • In Python 3.5 or greater:

    z = {**x, **y}
    

  • In Python 2 (or 3.4 or lower), write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with x's keys and values
        z.update(y)    # modifies z with y's keys and values & returns None
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    

    Say you have two dictionaries and you want to merge them into a new dict without altering the original dictionaries:

    x = {'a': 1, 'b': 2}
    y = {'b': 3, 'c': 4}
    

    The desired result is a new dictionary (z) with the values merged, and the second dictionary's values overwriting those from the first.

    >>> z
    {'a': 1, 'b': 3, 'c': 4}
    

    A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

    z = {**x, **y}
    

    And it is indeed a single expression.

    Note that we can merge in with literal notation as well:

    z = {**x, 'foo': 1, 'bar': 2, **y}
    

    and now:

    >>> z
    {'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}
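
    Where a literal sits relative to the unpacked dictionaries decides which value wins. The following is a minimal sketch of that rule; the key 'b' and the value 99 are just illustrative choices, not part of the original example:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# A literal placed before **y is overridden by y's value for the same key.
early = {**x, 'b': 99, **y}

# A literal placed after **y wins, because later entries replace earlier ones.
late = {**x, **y, 'b': 99}

print(early)  # {'a': 1, 'b': 3, 'c': 4}
print(late)   # {'a': 1, 'b': 99, 'c': 4}
```

    In both cases the key keeps the position where it first appeared; only its value is replaced.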
    

    PEP 448 is now showing as implemented in the release schedule for 3.5, PEP 478, and it has made its way into the What's New in Python 3.5 document.

    However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

    z = x.copy()
    z.update(y) # which returns None since it mutates z
    

    In both approaches, y will come second and its values will replace x's values, thus 'b' will point to 3 in our final result.
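
    As a quick sanity check of that claim, here is a small sketch that runs the approaches side by side. The names results, expected, all_agree and inputs_unchanged are mine, and the | operator is only attempted when the interpreter is 3.9 or newer:

```python
import sys

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# Python 3.5+: dictionary unpacking inside a dict display.
unpacked = {**x, **y}

# Any version: copy, then update the copy in place.
copied = x.copy()
copied.update(y)

results = [unpacked, copied]

# Python 3.9+ only: the union operator from PEP 584.
if sys.version_info >= (3, 9):
    results.append(x | y)

expected = {'a': 1, 'b': 3, 'c': 4}
all_agree = all(r == expected for r in results)

# None of the approaches modifies its inputs.
inputs_unchanged = x == {'a': 1, 'b': 2} and y == {'b': 3, 'c': 4}

print(all_agree, inputs_unchanged)  # True True
```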

    If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant correct approach is to put it in a function:

    def merge_two_dicts(x, y):
        """Given two dictionaries, merge them into a new dict as a shallow copy."""
        z = x.copy()
        z.update(y)
        return z
    

    and then you have a single expression:

    z = merge_two_dicts(x, y)
    

    You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

    def merge_dicts(*dict_args):
        """
        Given any number of dictionaries, shallow copy and merge into a new dict,
        precedence goes to key-value pairs in latter dictionaries.
        """
        result = {}
        for dictionary in dict_args:
            result.update(dictionary)
        return result
    

    This function will work in Python 2 and 3 for all dictionaries. For example, given dictionaries a to g:

    z = merge_dicts(a, b, c, d, e, f, g) 
    

    and key-value pairs in g will take precedence over dictionaries a to f, and so on.
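
    To make the precedence concrete, here is a sketch with three small dictionaries. The function is the same merge_dicts as above; the settings-style dictionaries (defaults, site, local) are made up for illustration:

```python
def merge_dicts(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

# Hypothetical layered settings, from least to most specific.
defaults = {'host': 'localhost', 'port': 8000, 'debug': False}
site = {'port': 8080}
local = {'debug': True}

settings = merge_dicts(defaults, site, local)
print(settings)       # {'host': 'localhost', 'port': 8080, 'debug': True}
print(merge_dicts())  # {} (zero dictionaries is fine too)
```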

    Don't use what you see in the formerly accepted answer:

    z = dict(x.items() + y.items())
    

    In Python 2, you create two lists in memory, one for each dict, then create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items objects together, not two lists:

    >>> c = dict(a.items() + b.items())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
    

    and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

    Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined with regard to precedence. So don't do this:

    >>> c = dict(a.items() | b.items())
    

    This example demonstrates what happens when values are unhashable:

    >>> x = {'a': []}
    >>> y = {'b': []}
    >>> dict(x.items() | y.items())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    

    Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

    >>> x = {'a': 2}
    >>> y = {'a': 1}
    >>> dict(x.items() | y.items())
    {'a': 2}
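
    If it helps to see why the result is arbitrary, this sketch looks at the intermediate set. The variable names pairs and merged are mine; the point is only that both (key, value) pairs survive the union, so dict() simply keeps whichever one it iterates over last:

```python
x = {'a': 2}
y = {'a': 1}

# The union is a set of (key, value) pairs, so both pairs for 'a' survive.
pairs = x.items() | y.items()
print(len(pairs))  # 2

# dict() keeps whichever pair it happens to see last, and set iteration
# order is arbitrary, so the value for 'a' may be either 1 or 2.
merged = dict(pairs)
print(merged['a'] in (1, 2))  # True

# Unpacking has a defined order: the right-hand dict wins.
print({**x, **y})  # {'a': 1}
```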
    

    Another hack you should not use:

    z = dict(x, **y)
    

    This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process), but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.

    Here's an example of the usage being remediated in Django.

    Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

    >>> c = dict(a, **b)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: keyword arguments must be strings
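
    Here is a self-contained version of that failure, as a sketch. The dictionaries a and b and the flag keyword_form_worked are illustrative; the tuple key stands in for any hashable non-string key:

```python
a = {'one': 1}
b = {('x', 'y'): 2}  # tuple key: hashable, but not a string

try:
    dict(a, **b)
    keyword_form_worked = True
except TypeError:
    # Python 3 rejects keyword names that are not strings.
    keyword_form_worked = False

# Unpacking inside a dict display places no such restriction on keys.
merged = {**a, **b}

print(keyword_form_worked)  # False on Python 3
print(merged)               # {'one': 1, ('x', 'y'): 2}
```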
    

    From the mailing list, Guido van Rossum, the creator of the language, wrote:

    I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

    Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

    It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

    dict(a=1, b=10, c=11)
    

    instead of

    {'a': 1, 'b': 10, 'c': 11}
    

    Response to comments

    Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

    Again, it doesn't work in Python 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

    >>> foo(**{('a', 'b'): None})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: foo() keywords must be strings
    >>> dict(**{('a', 'b'): None})
    {('a', 'b'): None}
    

    This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

    I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

    More comments:

    dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

    My response: merge_two_dicts(x, y) actually seems much clearer to me, if we're actually concerned about readability. And dict(x.items() + y.items()) is not forward compatible, as Python 2 is increasingly deprecated.

    {**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

    Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's, in a single expression.

    Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

    from copy import deepcopy
    
    def dict_of_dicts_merge(x, y):
        z = {}
        overlapping_keys = x.keys() & y.keys()
        for key in overlapping_keys:
            z[key] = dict_of_dicts_merge(x[key], y[key])
        for key in x.keys() - overlapping_keys:
            z[key] = deepcopy(x[key])
        for key in y.keys() - overlapping_keys:
            z[key] = deepcopy(y[key])
        return z
    

    Usage:

    >>> x = {'a':{1:{}}, 'b': {2:{}}}
    >>> y = {'b':{10:{}}, 'c': {11:{}}}
    >>> dict_of_dicts_merge(x, y)
    {'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}
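
    The reason for the deepcopy calls can be checked directly. The sketch below repeats the function so it runs on its own, then mutates the merged result and confirms the sources are untouched. The 'new' and 99 keys are arbitrary, and the comparison with a shallow merge is my addition:

```python
from copy import deepcopy

def dict_of_dicts_merge(x, y):
    """Recursively merge two dicts whose values are all dicts."""
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

x = {'a': {1: {}}, 'b': {2: {}}}
y = {'b': {10: {}}, 'c': {11: {}}}
z = dict_of_dicts_merge(x, y)

# Mutate the result at several depths.
z['a'][1]['new'] = {}
z['b'][99] = {}

# The sources still hold their original contents.
print(x)  # {'a': {1: {}}, 'b': {2: {}}}
print(y)  # {'b': {10: {}}, 'c': {11: {}}}

# A shallow merge, by contrast, shares the nested dicts with its sources.
shallow = {**x, **y}
print(shallow['a'] is x['a'])  # True
```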
    

    Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

    The following approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (later dictionaries have precedence).

    You can also chain the dictionaries manually inside a dict comprehension:

    {k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7
    

    or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

    dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2
    

    itertools.chain will chain the iterators over the key-value pairs in the correct order:

    from itertools import chain
    z = dict(chain(x.items(), y.items())) # iteritems in Python 2
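
    The same idea extends to any number of dictionaries with chain.from_iterable. This is only a sketch; merge_with_chain is a hypothetical helper name, not something from the standard library:

```python
from itertools import chain

def merge_with_chain(*dicts):
    """Hypothetical helper: shallow-merge dicts; later ones win."""
    # chain.from_iterable yields every (key, value) pair in argument order,
    # so dict() sees pairs from later dictionaries last.
    return dict(chain.from_iterable(d.items() for d in dicts))

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
w = {'c': 5}

print(merge_with_chain(x, y, w))  # {'a': 1, 'b': 3, 'c': 5}
```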
    

    Performance Analysis

    I'm only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

    from timeit import repeat
    from itertools import chain
    
    x = dict.fromkeys('abcdefg')
    y = dict.fromkeys('efghijk')
    
    def merge_two_dicts(x, y):
        z = x.copy()
        z.update(y)
        return z
    
    min(repeat(lambda: {**x, **y}))
    min(repeat(lambda: merge_two_dicts(x, y)))
    min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
    min(repeat(lambda: dict(chain(x.items(), y.items()))))
    min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
    

    In Python 3.8.1, NixOS:

    >>> min(repeat(lambda: {**x, **y}))
    1.0804965235292912
    >>> min(repeat(lambda: merge_two_dicts(x, y)))
    1.636518670246005
    >>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
    3.1779992282390594
    >>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
    2.740647904574871
    >>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
    4.266070580109954
    

    $ uname -a
    Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux
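
    The calls above use timeit's default of one million iterations per round, which can take a while. If you just want a rough comparison on your own machine, a shorter run along these lines may be enough. This is a sketch: best_time is a hypothetical helper, and the number and rounds values are arbitrary, so absolute figures will not match the ones above:

```python
from timeit import repeat

x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

def best_time(func, number=10_000, rounds=3):
    """Hypothetical helper: minimum total seconds over several rounds."""
    return min(repeat(func, number=number, repeat=rounds))

unpack_time = best_time(lambda: {**x, **y})
copy_update_time = best_time(lambda: merge_two_dicts(x, y))

print('unpacking:   ', unpack_time)
print('copy/update: ', copy_update_time)
```

    Taking the minimum over rounds, as the original measurements do, reduces the influence of unrelated system activity on the result.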
    

    Resources on Dictionaries