且构网

Sharing the things programmers deal with in development...

Can one only implement gradient-descent-like optimizers with the code example for processing gradients in TensorFlow?

Updated: 2022-12-14 15:10:03


I was looking at the example code for processing gradients that TensorFlow has:

# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)

However, I noticed that the apply_gradients function comes from the GradientDescentOptimizer. Does that mean that, using the example code above, one can only implement gradient-descent-like rules (notice we could set opt to GradientDescentOptimizer, Adam, or any of the other optimizers)? In particular, what does apply_gradients do? I did check the code on the TF GitHub page, but it was a bunch of Python that had nothing to do with mathematical expressions, so it was hard to tell what it was doing and how it changed from optimizer to optimizer.

For example, if I wanted to implement my own custom optimizer that might use gradients (or might not, e.g. it might just change the weights directly with some rule, maybe a more biologically plausible one), is that not possible with the above example code?
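As a rough mental model, for plain gradient descent the compute, process, apply pattern boils down to `var <- var - learning_rate * grad` for each (gradient, variable) pair. The following is a plain-Python sketch of that pattern, not TensorFlow's implementation; the function names, the dict of floats standing in for variables, and the toy loss sum(v**2) are all assumptions made for illustration.

```python
# Plain-Python stand-in for the compute -> process -> apply pattern.
# All names below are hypothetical; this is not TensorFlow code.

def compute_gradients(variables):
    """Return (gradient, name) pairs for the toy loss sum(v**2)."""
    return [(2.0 * value, name) for name, value in variables.items()]

def clip(gradient, limit=1.0):
    """Example processing step: cap the gradient magnitude at `limit`."""
    return max(-limit, min(limit, gradient))

def apply_gradients(variables, grads_and_vars, learning_rate):
    """Gradient-descent rule: var <- var - learning_rate * grad."""
    for gradient, name in grads_and_vars:
        variables[name] -= learning_rate * gradient

variables = {"w0": 3.0, "w1": -0.25}
grads_and_vars = compute_gradients(variables)
capped = [(clip(g), name) for g, name in grads_and_vars]
apply_gradients(variables, capped, learning_rate=0.5)
print(variables)
```

Seen this way, only the last function encodes the update rule, so a different rule means replacing that step rather than the gradient processing.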


Specifically, I wanted to implement a version of gradient descent that is artificially restricted to a compact domain, using the following update:

w := (w - mu*grad + eps) mod B

in TensorFlow. I realized that the following is true:

w := (w mod B - mu*grad mod B + eps mod B) mod B

so I thought that I could just implement it by doing:

def Process_grads(g,mu_noise,stddev_noise,B):
    return (g+tf.random_normal(tf.shape(g),mean=mu_noise,stddev=stddev_noise) ) % B

and then just having:

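# Process_grads takes the noise parameters and B in addition to the gradient.
processed_grads_and_vars = [(Process_grads(gv[0], mu_noise, stddev_noise, B), gv[1]) for gv in grads_and_vars]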
# Ask the optimizer to apply the processed gradients.
opt.apply_gradients(processed_grads_and_vars)

However, I realized that this wasn't good enough, because I don't actually have access to w, so I can't implement:

w mod B

at least not the way I tried. Is there a way to do this, i.e. to directly change the update rule?

I know it's sort of a hacky update rule, but my point is more about changing the update equation than about that particular rule (so don't get hung up on it if it's a bit weird).
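A quick integer check of the modular arithmetic involved, as a plain-Python sketch with arbitrary illustrative numbers (`step` stands for mu*grad): reducing each term separately matches the direct update only once an outer mod is applied to the sum, and that outer mod needs the value of w itself.

```python
# Integer check of the modular identity behind w := (w - mu*grad + eps) mod B.
# The numbers are arbitrary illustrative values, not taken from any model.
B = 20
w, step, eps = 17, 5, 9  # "step" stands for mu*grad

direct = (w - step + eps) % B                 # the update as written
termwise = (w % B) - (step % B) + (eps % B)   # each term reduced, no outer mod
termwise_wrapped = termwise % B               # outer mod restored

print(direct, termwise, termwise_wrapped)
```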


I came up with a super hacky solution:

def manual_update_GDL(arg,learning_rate,g,mu_noise,stddev_noise):
    with tf.variable_scope(arg.mdl_scope_name,reuse=True):
        W_var = tf.get_variable(name='W')
        eps = tf.random_normal(tf.shape(g),mean=mu_noise,stddev=stddev_noise)
        #
        W_new = tf.mod( W_var - learning_rate*g + eps , 20)
        sess.run( W_var.assign(W_new) )

def manual_GDL(arg,loss,learning_rate,mu_noise,stddev_noise,compact,B):
    # Compute the gradients for a list of variables.
    grads_and_vars = opt.compute_gradients(loss)
    # process gradients
    processed_grads_and_vars = [(manual_update_GDL(arg,learning_rate,g,mu_noise,stddev_noise), v) for g,v in grads_and_vars]

I'm not sure if it works, but something like that should work in general. The idea is to just write down the equation one wants to use (in TensorFlow) for the learning rate and then update the weights manually using a session.

Unfortunately, such a solution means we have to take care of the annealing (decaying the learning rate) manually, which seems annoying. This solution probably has many other problems; feel free to point them out (and give solutions if you can).
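When the decay is written by hand, it helps to check the schedule over the planned number of steps. The plain-Python sketch below compares a fixed decrement with a multiplicative decay, both starting from 0.1. The function names, the decrement of 0.001, and the factor 0.999 are assumptions chosen for illustration. With these numbers the linear schedule reaches zero after 100 steps and is negative afterwards, which would turn descent into ascent.

```python
# Two hand-written decay schedules, both starting from 0.1.
# Function names and constants are illustrative assumptions.

def linear_decay(initial, decrement, step):
    """Subtract a fixed amount per step; this can cross zero."""
    return initial - decrement * step

def exponential_decay(initial, factor, step):
    """Multiply by a fixed factor per step; this stays positive."""
    return initial * factor ** step

for step in (0, 50, 100, 500, 1000):
    print(step,
          round(linear_decay(0.1, 0.001, step), 4),
          round(exponential_decay(0.1, 0.999, step), 4))
```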


For this very simple problem I realized one can just do the normal optimizer update rule and then just take the mod of the weights and re-assign them to their value:

sess.run(fetches=train_step)
if arg.compact:
    # apply w := ( w - mu*g + eps ) mod B
    W_val = W_var.eval()
    W_new = tf.mod(W_var,arg.B).eval()
    W_var.assign(W_new).eval()

but in this case it's a coincidence that such a simple solution exists (unfortunately, it bypasses the whole point of my question).

Actually, this solution slows down the code a lot. For the moment it is the best that I've got.


For reference, I have seen the question "How to create an optimizer in Tensorflow", but I didn't find that it answered my question directly.

Your solution slows down the code because you call sess.run and .eval() while creating your "train_step". Instead, you should create the train_step graph using only internal TensorFlow functions (without sess.run or .eval()). After that, you only evaluate train_step in a loop.

If you don't want to use any standard optimizer, you can write your own "apply gradient" graph. Here is one possible solution for that:

learning_rate = tf.Variable(tf.constant(0.1))
mu_noise = 0.
stddev_noise = 0.01

#add all your W variables here when you have more than one:
train_w_vars_list = [W]
grad = tf.gradients(some_loss, train_w_vars_list)

assign_list = []
for g, v in zip(grad, train_w_vars_list):
  eps = tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)
  assign_list.append(v.assign(tf.mod(v - learning_rate*g + eps, 20)))

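# Also update the learning rate here if you want to. The decrement is kept
# small and clamped at zero so the rate never becomes negative.
assign_list.append(learning_rate.assign(tf.maximum(learning_rate - 0.0001, 0.0)))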

train_step = tf.group(*assign_list)
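To see what this update does numerically, here is a plain-Python sketch of w := (w - mu*g + eps) mod B on a toy problem. The quadratic loss, the targets, and all constants are assumptions for illustration. Python's `%` with a positive modulus follows the same flooring convention as `tf.mod`, so negative values wrap back into the range.

```python
import random

# Plain-Python sketch of w := (w - mu*g + eps) mod B on a toy loss.
# The loss sum((w_i - target_i)**2) and all constants are assumptions.
random.seed(0)
B = 20.0
mu = 0.1
stddev_noise = 0.01
target = [3.0, 12.5, 19.0]
w = [0.0, 0.0, 0.0]

for _ in range(200):
    # Gradient of the toy quadratic loss with respect to each weight.
    grad = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
    eps = [random.gauss(0.0, stddev_noise) for _ in w]
    # Noisy descent step followed by wrapping into the interval [0, B].
    w = [(wi - mu * gi + ei) % B for wi, gi, ei in zip(w, grad, eps)]

print([round(wi, 2) for wi in w])
```

The weights settle close to the targets while staying inside the compact domain, which is the behaviour the graph version above is meant to reproduce.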

You can also use one of the standard optimizers to create the grads_and_vars list, and then iterate over it instead of zip(grad, train_w_vars_list).
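A minimal sketch of that variant, assuming a TensorFlow installation that provides the `tf.compat.v1` graph-mode API. The toy variable, the toy loss, and the constants are assumptions for illustration; the optimizer is used only to build the (gradient, variable) pairs, while the update itself is still the custom assign.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, matching the TF1-style code here

# Toy variable and loss (assumptions for illustration only).
W = tf.Variable(tf.ones([3]) * 5.0)
loss = tf.reduce_sum(tf.square(W))

learning_rate = 0.1
opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# The optimizer only builds the list of (gradient, variable) pairs.
grads_and_vars = opt.compute_gradients(loss, var_list=[W])

assign_list = []
for g, v in grads_and_vars:
    eps = tf.random_normal(tf.shape(g), mean=0.0, stddev=0.01)
    # Custom update rule: noisy descent step wrapped by mod 20.
    assign_list.append(v.assign(tf.mod(v - learning_rate * g + eps, 20.0)))
train_step = tf.group(*assign_list)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(50):
        sess.run(train_step)  # the graph is built once and only evaluated here
    final_w = sess.run(W)

print(final_w)
```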

Here is a simple example for MNIST with your loss:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('PATH TO MNIST_data', one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y = tf.matmul(x, W)


# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

learning_rate = tf.Variable(tf.constant(0.1))
mu_noise = 0.
stddev_noise = 0.01

#add all your W variables here when you have more than one:
train_w_vars_list = [W]
grad = tf.gradients(cross_entropy, train_w_vars_list)

assign_list = []
for g, v in zip(grad, train_w_vars_list):
  eps = tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)
  assign_list.append(v.assign(tf.mod(v - learning_rate*g + eps, 20)))

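# Also update the learning rate here if you want to. A decrement of 0.001
# would push the rate below zero after 100 of the 1000 training steps, so a
# smaller decrement is used and the result is clamped at zero.
assign_list.append(learning_rate.assign(tf.maximum(learning_rate - 0.0001, 0.0)))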

train_step = tf.group(*assign_list)


sess = tf.InteractiveSession()
tf.global_variables_initializer().run()


# Train
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})


# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))