ValueError: Argument must be a dense tensor

Problem

The error:

ValueError: Argument must be a dense tensor: [array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)] - got shape [1, 10000, 10], but wanted [1].

Original code

...
final_result = sess.run(fetches=[y], feed_dict={x: mnist.test.images})
...
y = tf.Variable(initial_value=final_result, name=TENSOR_NAME)
...

Roughly speaking, the output final_result of one network is passed as an argument to the next method, which uses it as the initial value when creating a tf.Variable object.

Solution

...
final_result = sess.run(fetches=y, feed_dict={x: mnist.test.images})
...
y = tf.Variable(initial_value=final_result, name=TENSOR_NAME)
...

Removing the square brackets around y in sess.run fixes the problem.

Thinking about it, the cause is probably that sess.run inspects the type of its fetches argument: if fetches is a list, the result is likewise a list of values (here a one-element list wrapping the 10000 x 10 output array, which tf.Variable then fails to convert); if fetches is a single op or tensor, the value itself is returned directly.
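
A quick check of this behavior (tf.ones here is just a stand-in tensor):

import tensorflow as tf

t = tf.ones(shape=(2, 3))
with tf.Session() as sess:
    single = sess.run(t)     # single fetch -> the ndarray itself
    wrapped = sess.run([t])  # list fetch   -> a list of ndarrays
    print(single.shape)                    # (2, 3)
    print(len(wrapped), wrapped[0].shape)  # 1 (2, 3)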

Is there any difference between + and tf.add in TensorFlow?

    a = tf.get_variable(name='a', shape=(1,), initializer=tf.constant_initializer(value=0))
    b = tf.get_variable(name='b', shape=(1,), initializer=tf.constant_initializer(value=1))
    c = a + b
    d = tf.add(x=a, y=b, name='d')
    e = tf.get_variable(name='e', initializer=tf.truncated_normal(shape=(2, 3), stddev=1, mean=0))
    f = tf.get_variable(name='f', initializer=tf.truncated_normal(shape=(2, 3), stddev=1, mean=0))
    g = e + f
    h = tf.add(x=e, y=f, name='h')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(a.name)  # a:0
        print(b.name)  # b:0
        print(c.name)  # add:0
        print(d.name)  # d:0
        print(e.name)  # e:0
        print(f.name)  # f:0
        print(sess.run(a))  # [0.]
        print(sess.run(b))  # [1.]
        print(sess.run(c))  # [1.]
        print(sess.run(d))  # [1.]
        print(sess.run(e))
        # [[-0.9891971   0.5030212  -0.48883826]
        #  [-1.4214745  -0.6322079  -0.4219446 ]]
        print(sess.run(f))
        # [[-0.10764185  0.6467568   0.03688372]
        #  [ 1.1124796  -0.89837056  0.67874044]]
        print(sess.run(g))
        # [[-1.096839    1.149778   -0.45195454]
        #  [-0.3089949  -1.5305784   0.25679585]]
        print(sess.run(h))
        # [[-1.096839    1.149778   -0.45195454]
        #  [-0.3089949  -1.5305784   0.25679585]]
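
Judging from the output, a + b and tf.add(a, b) produce identical values for both the vector and the matrix operands; the + operator on tensors is overloaded to create the same underlying add op, so the practical difference is only that tf.add lets you give the op an explicit name in the graph (compare c.name == 'add:0' with d.name == 'd:0').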

An unusually high number of `Iterator.get_next()` calls was detected

Problem description

./EXAMPLE.py:407: UserWarning: An unusually high number of `Iterator.get_next()` calls was detected.
    This often indicates that `Iterator.get_next()` is being called inside a training loop, which will cause gradual slowdown and eventual resource exhaustion. 
    If this is the case, restructure your code to call `next_element = iterator.get_next()` once outside the loop, and use `next_element` as the input to some computation that is invoked inside the loop.
  warnings.warn(GET_NEXT_CALL_WARNING_MESSAGE)
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: thread constructor failed: Resource temporarily unavailable

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Inspecting the code confirms that get_next was indeed called inside the loop.

iterator = dataset.make_one_shot_iterator()
while True:
    try:
        # Calling get_next() here adds a new op to the graph on every
        # iteration, which is what triggers the warning.
        features, labels = iterator.get_next()
        features_val = sess.run(features)
        labels_val = sess.run(labels)
        features_val = np.array(features_val).reshape((32, 28 * 28))
        labels_val = np.array(labels_val).reshape((32, 1))
        sess.run(fetches=self.train_op, feed_dict={self.x: features_val, self.y: labels_val})
    except tf.errors.OutOfRangeError as _e:
        break

Moving get_next out of the loop fixes it. In the corrected version below, the two sess.run calls are also merged into one, so that features and labels come from the same batch instead of advancing the iterator twice:

iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()  # created once, outside the loop
while True:
    try:
        # Fetch both tensors in a single run call; two separate calls would
        # advance the iterator twice and pair features with the wrong labels.
        features_val, labels_val = sess.run([features, labels])
        features_val = np.array(features_val).reshape((32, 28 * 28))
        labels_val = np.array(labels_val).reshape((32, 1))
        sess.run(fetches=self.train_op, feed_dict={self.x: features_val, self.y: labels_val})
    except tf.errors.OutOfRangeError as _e:
        break

InvalidArgumentError : You must feed a value for placeholder tensor

Problem description

At runtime, the following exception is thrown:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input/y-input' with dtype float and shape [?,10]

The original code:

import tensorflow as tf
from tensorflow.contrib.layers import l2_regularizer
from tensorflow.examples.tutorials.mnist import input_data

REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
INPUT_NODE = 28 * 28
OUTPUT_NODE = 10
LAYER1_NODE = 500


def get_weight_variable(shape, regularizer):
    weights = tf.get_variable('weights', shape=shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer is not None:
        tf.add_to_collection('losses', regularizer(weights))
    return weights


# Forward pass of the network.
def inference(input_tensor, regularizer=None):
    # Declare the first layer and propagate forward.
    with tf.variable_scope('layer1'):
        weights = get_weight_variable(shape=[INPUT_NODE, LAYER1_NODE], regularizer=regularizer)
        biases = tf.get_variable(name='biases', shape=[LAYER1_NODE], initializer=tf.constant_initializer(value=0.0))
        layer1 = tf.nn.relu(tf.matmul(a=input_tensor, b=weights) + biases)
    # Declare the second layer and propagate forward.
    with tf.variable_scope('layer2'):
        weights = get_weight_variable(shape=[LAYER1_NODE, OUTPUT_NODE], regularizer=regularizer)
        biases = tf.get_variable(name='biases', shape=[OUTPUT_NODE], initializer=tf.constant_initializer(value=0.0))
        layer2 = tf.nn.relu(tf.matmul(a=layer1, b=weights) + biases)
    # Return the result of the forward pass.
    return layer2


def train(mnist):
    # Put the input-handling computation inside the name scope 'input'.
    with tf.name_scope('input'):
        x = tf.placeholder(dtype=tf.float32, shape=(None, INPUT_NODE), name='x-input')
        y_ = tf.placeholder(dtype=tf.float32, shape=(None, OUTPUT_NODE), name='y-input')

    regularizer = l2_regularizer(REGULARIZATION_RATE)
    y = inference(x, regularizer)
    global_step = tf.Variable(initial_value=0, trainable=False)

    # Put the moving-average computations inside the name scope 'moving_average'.
    with tf.name_scope('moving_average'):
        variable_averages = tf.train.ExponentialMovingAverage(decay=MOVING_AVERAGE_DECAY, num_updates=global_step)
        variables_average_op = variable_averages.apply(tf.trainable_variables())

    # Put the loss computations inside the name scope 'loss_function'.
    with tf.name_scope('loss_function'):
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        loss = cross_entropy_mean + tf.add_n(inputs=tf.get_collection('losses'))

    # Put the learning rate, the optimizer and the per-step training op inside the name scope 'train_step'.
    with tf.name_scope('train_step'):
        learning_rate = tf.train.exponential_decay(learning_rate=LEARNING_RATE_BASE,
                                                   global_step=global_step,
                                                   decay_steps=mnist.train.num_examples / BATCH_SIZE,
                                                   decay_rate=LEARNING_RATE_DECAY,
                                                   staircase=True)
        train_step = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss=loss, global_step=global_step)
        with tf.control_dependencies(control_inputs=[train_step, variables_average_op]):
            train_op = tf.no_op(name='train')

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            # BUG: ys is fed to y (the inference output) instead of the placeholder y_.
            _, loss_value, step = sess.run(fetches=[train_op, loss, global_step], feed_dict={x: xs, y: ys})
            if i % 1000 == 0:
                print('After %d training steps, loss on training batch is %d.' % (step, loss_value))

    # Export the current graph to a TensorBoard log file.
    writer = tf.summary.FileWriter(logdir='./log/', graph=tf.get_default_graph())
    writer.close()


def main(argv=None):
    mnist = input_data.read_data_sets('./mnist_data/', one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()

Tracing through the code, the offending line is:

_, loss_value, step = sess.run(fetches=[train_op, loss, global_step], feed_dict={x: xs, y: ys})

In feed_dict, ys should be fed to the placeholder y_, but it was mistakenly fed to y (the inference output), so the placeholder 'input/y-input' never receives a value and the error above is raised. After the fix it runs normally.
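
The corrected call:

_, loss_value, step = sess.run(fetches=[train_op, loss, global_step], feed_dict={x: xs, y_: ys})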

TensorFlow Debug

Debug in TensorFlow

The official guide, Guide – Debugging – TensorFlow Debugger, already covers this in detail, but a few omissions are worth filling in.

  1. The section on debugging model training with tfdbg mentions that relevant tensors can be selected with a custom filter.
def my_filter_callable(datum, tensor):
  # A filter that detects zero-valued scalars.
  return len(tensor.shape) == 0 and tensor == 0.0

sess.add_tensor_filter('my_filter', my_filter_callable)


This filter takes effect by being added to the Session. An Estimator, however, manages its Session internally, so a filter cannot be added explicitly this way.

# First, let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
#  install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug

# Create a LocalCLIDebugHook and use it as a monitor when calling fit().
hooks = [tf_debug.LocalCLIDebugHook()]

# To debug `train`:
classifier.train(input_fn, steps=1000,  hooks=hooks)

For now, the correct approach appears to be to add the filter via the add_tensor_filter method of a LocalCLIDebugHook instance.
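
A minimal sketch, reusing my_filter_callable from point 1 and the classifier / input_fn from the snippet above:

from tensorflow.python import debug as tf_debug

hook = tf_debug.LocalCLIDebugHook()
# The hook forwards the filter to the wrapped session once it exists
# (or queues it until then); see the source below.
hook.add_tensor_filter('my_filter', my_filter_callable)

classifier.train(input_fn, steps=1000, hooks=[hook])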

After sess is wrapped with tf_debug.LocalCLIDebugWrapperSession, it gains an add_tensor_filter method; its source is:

  def add_tensor_filter(self, filter_name, tensor_filter):
    """Add a tensor filter.

    Args:
      filter_name: (`str`) name of the filter.
      tensor_filter: (`callable`) the filter callable. See the doc string of
        `DebugDumpDir.find()` for more details about its signature.
    """

    self._tensor_filters[filter_name] = tensor_filter

LocalCLIDebugHook provides an add_tensor_filter method with the same interface; as the source below shows, it forwards the filter to the underlying wrapper session, or queues it if that session has not been created yet.

  def add_tensor_filter(self, filter_name, tensor_filter):
    """Add a tensor filter.

    See doc of `LocalCLIDebugWrapperSession.add_tensor_filter()` for details.
    Override default behavior to accommodate the possibility of this method being
    called prior to the initialization of the underlying
    `LocalCLIDebugWrapperSession` object.

    Args:
      filter_name: See doc of `LocalCLIDebugWrapperSession.add_tensor_filter()`
        for details.
      tensor_filter: See doc of
        `LocalCLIDebugWrapperSession.add_tensor_filter()` for details.
    """

    if self._session_wrapper:
      self._session_wrapper.add_tensor_filter(filter_name, tensor_filter)
    else:
      self._pending_tensor_filters[filter_name] = tensor_filter

basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

The following error is raised: tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

It occurred while running the wide part of a wide_deep model.
Inspecting with tfdbg shows that the node linear/linear_model/item_id_embedding/Reshape holds a tensor whose values are all nan.

 2 input(s):
      [Reshape] linear/linear_model/item_id_embedding/Reshape
      [Identity]
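
For this situation tfdbg ships with a predefined has_inf_or_nan filter. A minimal sketch of registering it on the debug hook (the estimator and input_fn names here are placeholders for the actual wide_deep model):

from tensorflow.python import debug as tf_debug

hook = tf_debug.LocalCLIDebugHook()
# With the filter registered, running `run -f has_inf_or_nan` in the tfdbg
# CLI stops at the first tensor that contains an inf or nan value.
hook.add_tensor_filter('has_inf_or_nan', tf_debug.has_inf_or_nan)
estimator.train(input_fn=input_fn, hooks=[hook])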

Tensorflow estimator model diverges with loss = NaN?
Deep-Learning Nan loss reasons
Tensorflow NaN loss during training: trying to reshape logits and labels

_curses.error: setupterm: could not find terminal

Following the steps described in the guide on debugging TensorFlow Estimators raises the error _curses.error: setupterm: could not find terminal.

Searching Stack Overflow turns up the question tensorflow _curses.error: setupterm: could not find terminal,
where one answer notes that the error does not occur when the script is run from a terminal instead of from inside PyCharm.

Trying this indeed works.

The exact root cause has not been pinned down; presumably the PyCharm run console is not a real terminal, so curses cannot look up a terminal description, but this still needs verification.

TensorFlow Tutorial

Introduction

TensorFlow involves a number of concepts, organized into the chapters below.

  1. Basic
    This chapter introduces some basic concepts in TensorFlow.
    Graph and Session in TensorFlow

  2. Dataset
    TensorFlow Dataset

  3. Estimator
    http://www.blackpoint.tech/?p=1363&preview=true

  4. TensorBoard

  5. Keras

  6. Debug
    TensorFlow Debug

  7. TensorFlow Serving
    TensorFlow Serving – Architecture
    Tensorflow Serving Brief

  8. Save & restore with SavedModel

  9. Interoperation between Python and C/C++

Supplementary:

  1. How to perform efficient numerical computation with tensorflow / numpy?

  2. A collection of common TensorFlow problems

Graph and Session in TensorFlow

Graph

A computational graph is a series of TensorFlow operations arranged into a graph. The graph is composed of two types of objects.

tf.Operation (or “ops”): The nodes of the graph. Operations describe calculations that consume and produce tensors.
tf.Tensor: The edges in the graph. These represent the values that will flow through the graph. Most TensorFlow functions return tf.Tensors.

Quoting from TensorFlow – Guide – Low Level APIs – Introduction

As the quoted text shows, a graph is composed of two kinds of objects: Operation and Tensor.
Operations form the nodes of the graph and represent the computations that consume and produce tensors.
Tensors form the edges and represent the values flowing through the graph; most TensorFlow functions return tf.Tensors.
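
A small example of the node/edge distinction:

import tensorflow as tf

a = tf.constant(3.0, name='a')
b = tf.constant(4.0, name='b')
total = a + b          # creates an add Operation (node) in the default graph

print(total.op.name)   # 'add'   -- the tf.Operation, i.e. the node
print(total.name)      # 'add:0' -- the tf.Tensor, i.e. output 0 of that node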

What is a tf.Graph?
A tf.Graph contains two relevant kinds of information:

Graph structure. The nodes and edges of the graph, indicating how individual operations are composed together, but not prescribing how they should be used. The graph structure is like assembly code: inspecting it can convey some useful information, but it does not contain all of the useful context that source code conveys.

Graph collections. TensorFlow provides a general mechanism for storing collections of metadata in a tf.Graph. The tf.add_to_collection function enables you to associate a list of objects with a key (where tf.GraphKeys defines some of the standard keys), and tf.get_collection enables you to look up all objects associated with a key. Many parts of the TensorFlow library use this facility: for example, when you create a tf.Variable, it is added by default to collections representing “global variables” and “trainable variables”. When you later come to create a tf.train.Saver or tf.train.Optimizer, the variables in these collections are used as the default arguments.

Quoting from TensorFlow – Guide – Low Level APIs – Graph and Sessions
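
A small illustration of the collections mechanism (the collection key 'my_losses' is arbitrary):

import tensorflow as tf

v = tf.Variable(1.0, name='v')   # added to the global/trainable variable
                                 # collections by default
tf.add_to_collection('my_losses', v * 2.0)

print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))  # [<tf.Variable 'v:0' ...>]
print(tf.get_collection('my_losses'))                       # [<tf.Tensor 'mul:0' ...>]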

Session

Graph && Session

The graph defines operations and variables. Once defined, it is handed to a session, which loads and runs it; the session also manages the resources involved.
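
As a minimal illustration:

import tensorflow as tf

# Building the graph only defines the computation; nothing runs here.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

# The session loads the graph, owns the resources and executes the ops.
with tf.Session() as sess:
    print(sess.run(c))  # 6.0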

Introduction to the MVC Pattern

Overview

The Separation of Concerns (SoC) principle is one of the design principles of software engineering. The essence of SoC is to split an application into distinct parts, each of which addresses a separate concern.

The MVC pattern stands for Model-View-Controller. It comprises three roles, the model, the view and the controller, and is the SoC principle applied to object-oriented programming.

The model is the core part: it is the application's source of truth and contains and manages the (business) logic, data, state and rules of the application.

The view is a visual representation of the model, for example a program's graphical interface, text output in a terminal, a smartphone app's UI, or charts of various kinds (bar charts, pie charts and so on). The view only displays data; it does not process it.

The controller is the link between the model and the view: all communication between the two goes through the controller, as the sketch below illustrates.
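
A minimal Python sketch of the three roles (all class and method names are made up for illustration):

class QuoteModel:
    """Model: owns the data and the rules for accessing it."""

    def __init__(self):
        self._quotes = {1: 'Talk is cheap. Show me the code.'}

    def get_quote(self, key):
        return self._quotes.get(key, 'Not found!')


class ConsoleView:
    """View: only displays data; it never processes it."""

    def show(self, quote):
        print('Quote: %s' % quote)

    def error(self, message):
        print('Error: %s' % message)


class QuoteController:
    """Controller: all model <-> view communication goes through here."""

    def __init__(self, model, view):
        self._model = model
        self._view = view

    def show_quote(self, key):
        try:
            self._view.show(self._model.get_quote(int(key)))
        except ValueError:
            self._view.error('key must be an integer')


if __name__ == '__main__':
    controller = QuoteController(QuoteModel(), ConsoleView())
    controller.show_quote(1)       # Quote: Talk is cheap. Show me the code.
    controller.show_quote('oops')  # Error: key must be an integer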

Reference: Mastering Python Design Patterns, Chapter 8: The Model-View-Controller Pattern

References

MVC Pattern | 菜鸟教程
Mastering Python Design Patterns, Chapter 8: The Model-View-Controller Pattern
Introduction to MVC
MVC Framework – Baidu Baike