XML files are not parsed and added to a list

Updated: 2023-12-02 23:09:58

Here is the cause of the problem:

path = os.listdir(directory)
for filename in path:
    tree = ET.parse(filename)

os.listdir() returns a list of names, not full paths. So ET.parse() tries to open a file of that name in the current working directory, not in directory.

You want:

filenames = os.listdir(directory)
for filename in filenames:
    filepath = os.path.join(directory, filename) 
    tree = ET.parse(filepath)
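
To see the difference for yourself, here is a minimal sketch. The temporary directory and the file name demo_listdir_example.xml are made up for the demonstration and are not part of the original question.

```python
import os
import tempfile

# Hypothetical demo directory holding a single file.
with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "demo_listdir_example.xml"), "w") as f:
        f.write("<doc/>")

    # os.listdir() returns bare names, without the directory part.
    names = os.listdir(directory)

    # A bare name is resolved against the current working directory, so this
    # is normally False unless the cwd happens to contain such a file.
    found_bare = [os.path.exists(name) for name in names]

    # A joined path points inside `directory`, so the file is found.
    found_joined = [os.path.exists(os.path.join(directory, name)) for name in names]
```

Printing names, found_bare and found_joined shows that only the joined paths reliably resolve to the files in directory.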

Also, this:

    try:
        tree = ET.parse(filename)
        root = tree.getroot()
        doc_parser(root)
    except:
        print("ERROR ON FILE: {}".format(filename))

is the worst thing you could do. It hides what went wrong and where, so you cannot debug your code at all.

Guidelines for proper exception handling:

1/ NEVER EVER use a "bare" except clause; always specify the exact exception(s) you expect at this point. For a top-level "catch all" handler, at least restrict your except clause to Exception, so you don't catch SystemExit or KeyboardInterrupt.

2/ Keep the try block as narrow as possible (put as little code as possible in it). This ensures you know where the exception you are handling was actually raised, so if two statements raise the same exception type for unrelated reasons, you only catch the one you expected.

3/ Only catch exceptions you can actually and effectively handle at this point in the code. If you cannot handle the exception here, just let it propagate (or report it with additional information and re-raise it).

4/ Never assume anything about what really happened. Use the exception message and the traceback when reporting the exception. The stdlib's logging module makes this a breeze (well, once you've learned to properly configure your logger, which can be a bit of a PITA xD).
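
As a rough illustration of guidelines 1 to 4 together, here is a sketch using logging.exception(). The helper name load_root and the basicConfig() setup are assumptions made for this example; a real project would configure its logger to suit its own needs.

```python
import logging
import xml.etree.ElementTree as ET

# Simplest possible configuration, for demonstration only.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_root(filepath):
    """Return the root element of `filepath`, or None if it is not valid XML."""
    try:
        # Narrow try block: only the one call expected to raise ParseError.
        tree = ET.parse(filepath)
    except ET.ParseError:
        # logging.exception() logs the message plus the full traceback.
        logger.exception("%s is not valid XML", filepath)
        return None
    # Anything else (e.g. a missing file) is not handled here and propagates.
    return tree.getroot()
```

Because only ET.ParseError is caught, an unreadable or missing file still raises its own exception with a full traceback instead of being silently reported as "bad XML".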

What you want is something like this:

    try:
        tree = ET.parse(filepath)
    except ET.ParseError as e:
        # using `logging.exception()` would be better,
        # but we don't really need the whole traceback here
        # as the error is specific enough and we already
        # know where it happens
        print("{} is not valid XML: {}".format(filepath, e))
        continue 

    root = tree.getroot()
    doc_parser(root)
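
Putting the pieces together, the whole loop might look like the sketch below. The original doc_parser() is not shown in the question, so the version here is a hypothetical stand-in that collects the text of every item element; parse_directory and the skipped list are likewise names invented for this example.

```python
import os
import xml.etree.ElementTree as ET


def doc_parser(root):
    # Hypothetical stand-in for the question's doc_parser():
    # it just collects the text of every <item> element.
    return [item.text for item in root.iter("item")]


def parse_directory(directory):
    """Parse each file in `directory`; return (results, skipped)."""
    results, skipped = [], []
    for filename in sorted(os.listdir(directory)):
        # os.listdir() yields bare names, so build the full path first.
        filepath = os.path.join(directory, filename)
        try:
            tree = ET.parse(filepath)
        except ET.ParseError as e:
            print("{} is not valid XML: {}".format(filepath, e))
            skipped.append(filename)
            continue
        # Outside the try block, so errors in doc_parser() are not masked.
        results.append(doc_parser(tree.getroot()))
    return results, skipped
```

Calling parse_directory(directory) returns the parsed results along with the names of the files that were skipped, so invalid files are reported rather than silently lost.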