TensorFlow: running model evaluation on multiple checkpoints

Updated: 2023-12-02 11:44:58

Fastest solution:

tensor2tensor has a module utils with a script avg_checkpoints.py that saves the averaged weights in a new checkpoint. Let us say you have a list of checkpoints that you want to average. You have 2 options for usage:

  1. From command line

# Keep the trailing slash on TRAIN_DIR: the output path below is built by plain concatenation
TRAIN_DIR=path_to_your_model_folder/
FNC_PATH=path_to_tensor2tensor/utils/avg_checkpoints.py
CKPTS=model.ckpt-10000,model.ckpt-20000,model.ckpt-100000

python3 "$FNC_PATH" --prefix="$TRAIN_DIR" --checkpoints="$CKPTS" \
    --output_path="${TRAIN_DIR}averaged.ckpt"

  2. From your own code (using os.system):

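    import os

    # TRAIN_DIR, FNC_PATH and CKPTS are Python strings holding the same values as in the shell example
    os.system(
        "python3 " + FNC_PATH + " --prefix=" + TRAIN_DIR + " --checkpoints=" + CKPTS +
        " --output_path=" + TRAIN_DIR + "averaged.ckpt"
    )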
    

  • As an alternative to specifying a list of checkpoints with the --checkpoints argument, you can pass --num_last_checkpoints=10 (the flag name defined in avg_checkpoints.py) to average the last 10 checkpoints.
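The os.system call above builds a single shell string by concatenation, which breaks as soon as a path contains a space. A minimal sketch of the same invocation with subprocess and an argument list follows. The helper names build_avg_command and run_averaging are hypothetical, the flags are the ones used above, and sys.executable stands in for python3.

```python
import subprocess
import sys


def build_avg_command(script_path, train_dir, checkpoints, output_name="averaged.ckpt"):
    """Build the argument list for avg_checkpoints.py (hypothetical helper).

    The output path is built by plain concatenation, so train_dir
    should end with '/'.
    """
    return [
        sys.executable,
        script_path,
        "--prefix=" + train_dir,
        "--checkpoints=" + ",".join(checkpoints),
        "--output_path=" + train_dir + output_name,
    ]


def run_averaging(script_path, train_dir, checkpoints):
    """Run the script; raises CalledProcessError if it exits non-zero."""
    cmd = build_avg_command(script_path, train_dir, checkpoints)
    subprocess.run(cmd, check=True)
```

Passing a list means no shell is involved, so no quoting of paths is needed, and check=True surfaces a failed averaging run instead of silently returning an exit code.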

    Here is a code snippet that does not rely on tensor2tensor but can still average a variable number of checkpoints (unlike ted's answer). Assume steps is a list of the checkpoint step numbers to merge (e.g. [10000, 20000, 30000, 40000]) and model_path is the directory that contains the checkpoints.

    Then:

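    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x graph-mode API

    # Restore each checkpoint in a fresh graph and collect its variable values
    values = []
    for step in steps:
        tf.reset_default_graph()
        path = model_path + '/model.ckpt-' + str(step)
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph(path + '.meta')
            saver.restore(sess, path)
            values.append(sess.run(tf.global_variables()))

    # Average weights. The graph of the last restored checkpoint is still the
    # default graph, so its variables are the ones that get the averaged values.
    variables = tf.global_variables()
    all_assign = []
    for ind, var in enumerate(variables):
        # Stack the ind-th variable of every checkpoint along a new first axis
        weights = np.stack([w[ind] for w in values], axis=0)
        all_assign.append(tf.assign(var, np.mean(weights, axis=0)))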
    

    Then you can proceed however you prefer, e.g. by saving the averaged checkpoint:

    # Now save the new values into a separate checkpoint
    num_checkpoints = len(steps)  # assumption: the number of checkpoints that were averaged
    with tf.Session() as sess_test:
        sess_test.run(all_assign)
        saver = tf.train.Saver()
        saver.save(sess_test, model_path + '/average_' + str(num_checkpoints))
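The restore loop above keeps the values of every checkpoint in memory before averaging. For large models, a running sum may be preferable. The NumPy-only sketch below illustrates that idea on plain arrays; the function name average_weight_lists is illustrative and not part of the original answer.

```python
import numpy as np


def average_weight_lists(weight_lists):
    """Average several checkpoints' weights, given as lists of arrays.

    weight_lists yields one list of arrays per checkpoint, all in the same
    variable order. Only a running sum is held in memory.
    """
    totals = None
    count = 0
    for weights in weight_lists:
        if totals is None:
            # Copy into float64 accumulators so the inputs are not modified
            totals = [np.array(w, dtype=np.float64) for w in weights]
        else:
            for total, w in zip(totals, weights):
                total += w
        count += 1
    if count == 0:
        raise ValueError("no checkpoints to average")
    return [total / count for total in totals]
```

Because weight_lists can be a generator, each checkpoint could be restored, added to the sum, and discarded before the next one is loaded.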