且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

深度复制类实例并执行其他操作的最有效,最Pythonic的方法

更新时间:2023-12-02 20:46:46

  1. 使用模式非常重要.如果我只是开始考虑项目中大型对象的深层副本的效率,那么我会喜欢 lenses.这个想法是,与其直接与 Book 进行交互,不如与 Book 进行交互,并在修改时在内部跟踪不可变的增量和变更集.

  2. 这有待讨论,但是标准库中的 copy.deepcopy 在Python代码方面很难被击败.

  3. 绝对没问题.

3a.作为一个有点病态的例子,如果您的类仅存储几个字节,那么直接复制整个类并不比您要对其进行的任何修改都要昂贵.您的深层复制策略将是仅复制它并做您需要做的事情-pythonic解决方案也将是***"解决方案.大多数指标都可以解决问题.

3b.如果您每个类存储 10 ** 10 个字节并且仅修改5个字节,那么与您所做的工作相比,完整副本的成本可能会高得惊人.镜头看起来真的很吸引人(因为它们只是保留了对原始对象的引用和5个字节的更改).

3c.您提到数据访问时间是一个潜在因素.我在这里以一个长链表为例-完整的Deepcopy和lens在这里的表现都将很差,因为它们需要遍历整个表才能进行任何幼稚的实现.如果您知道需要对链接列表进行深度复制,尽管您可以提出一种将列表一分为二的不可变树结构,并使用该结构采用准镜头策略,则只交换要更改的树的叶子/分支./p>

3 *.再详细说明这些示例,您正在为代码中的对象集合选择一种表示形式,可以将其概念化为一系列深层副本和修改.有一些经典的方法(复制所有数据并对副本进行更改,让数据保留下来并记录所有更改)适用于具有某些性能特征的任意对象,但是如果您有关于该问题的其他信息(例如,这些都是大数组,并且只有一个坐标发生变化),您可以使用它来设计最适合您的问题的表示形式,更重要的是您希望达到的性能折衷.

Say I have a Python class instance, here's a really simple example:

class Book:
    def __init__(self, genre):
        self.genre = genre
        self.author = self.title = None
        return

    def assign_author(self, author_name):
        self.author = author_name
        return

    def assign_title(self, title_name):
        self.title = title_name
        return

western_book_template = Book("Western")

For various reasons, I want each class instance to be able to generate a deep copy of itself, and then perform some additional manipulation on the new object.

Here's what I'd like to know:

  1. What would be the most efficient way to do that?
  2. What about the most Pythonic one (best practice)?
  3. Does the amount of data stored in the instance, as well as the amount of computations performed to get to that data, influence the previous answers? If so, how?

I know I can use western_book_1 = copy.deepcopy(western_book_template), and then perform all additional manipulation I want on western_book_1 directly, but wouldn't it be better to do something like:

class Book:
    def __init__(self, genre):
        self.genre = genre
        self.author = self.title = None
        return

    def assign_author(self, author_name):
        self.author = author_name
        return

    def assign_title(self, title_name):
        self.title = title_name
        return

    def generate_specific_book(self, author_name, title_name):
        specific_book = Book(self.genre)
        specific_book.assign_author(author_name)
        specific_book.assign_title(title_name)
        return specific_book

western_book_template = Book("Western")
western_book_1 = western_book_template.generate_specific_book("John E. Williams", "Butcher's Crossing")

This would give me more control over what gets copied and what doesn't, as well as allowing me to perform all additional manipulation in one place. I am assuming the above is inefficient in cases where a lot of computation was needed to get to the data stored in the instance.

  1. Use patterns matter a lot. If I'm just starting to think about efficiency for deep copies of large objects in a project, I rather enjoy lenses. The idea is that instead of interacting with a Book directly you'll interact with something that's acts like a Book and internally keeps track of immutable deltas and changesets when you modify it.

  2. This is up for debate, but copy.deepcopy from the standard library is hard to beat in terms of pythonic code.

  3. Absolutely, no questions.

3a. As a somewhat pathological example, if your class is only storing a few bytes then straight up copying the entire class isn't any more expensive than any modification you were about to do to it. Your deepcopy strategy would be to just copy it and do what you need to do -- the pythonic solution would also be the "best" solution by most metrics.

3b. If you're storing 10**10 bytes per class and only modifying 5 then a full copy is probably incredibly expensive compared to the work you're doing. Lenses can look really attractive (since they'll just hold a reference to the original object and the 5 byte change).

3c. You mentioned data access times as a potential factor. I'll use a long linked list as an example here -- both a full deepcopy and lenses will perform poorly here since they need to walk the entire list for any naive implementation. If you knew you needed to deepcopy linked lists though you could come up with some kind of immutable tree structure bisecting the list and use that to adopt a quasi-lenses strategy, only swapping out the leaves/branches of the tree that you change.

3*. Elaborating on the examples a little more, you're choosing a representation for a collection of objects in your code that can be conceptualized as a chain of deep copies and modifications. There are some classic approaches (copy all the data and make changes to the copies, leave the data alone and record all the changes) that work on arbitrary objects that have certain performance characteristics, but if you have additional information about the problem (e.g., these are all big arrays and only one coordinate changes) you can use that to design a representation that best fits your problem, and more importantly your desired performance trade-offs.