TensorFlow Dataset Brief

As a newcomer, you may still be feeding data into your TensorFlow graph with feed_dict. However, TensorFlow already provides a higher-level API for feeding data: tf.data.

Basic Concepts

tf.data.Dataset represents a source of data.
tf.data.Iterator consumes data from a Dataset.

Dataset.make_one_shot_iterator() requires no explicit initialization.


According to the definition of Iterator,

Note: Most users will not call this initializer directly, and will instead use Dataset.make_initializable_iterator() or Dataset.make_one_shot_iterator().

However, if you know the type and structure of the dataset, you can create an Iterator instance by calling the static method from_structure.
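A minimal sketch of the two concepts above, assuming the TensorFlow 1.x API (where make_one_shot_iterator, from_structure and tf.Session exist); the sample values are made up for illustration:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Dataset represents the data; Iterator consumes it.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])

# A one-shot iterator needs no explicit initialization,
# but can only be iterated through once.
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

# Alternatively, when the type and structure are known, build a
# reusable Iterator with the static method from_structure and
# initialize it against a concrete dataset.
reusable = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)
init_op = reusable.make_initializer(dataset)

with tf.Session() as sess:
    for _ in range(4):
        print(sess.run(next_element))  # 1 2 3 4
```

The from_structure iterator can later be re-initialized against any other dataset with the same types and shapes, which is handy for switching between training and validation data.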

Shell Tips

  1. find prints unwanted ‘Permission denied’ warnings?
    A : find [path] -name "pattern" 2>/dev/null sends those annoying warnings to /dev/null.


Wonderful answers on Stack Overflow: What are metaclasses in Python?.
This answer explains the relations between the concepts of metaclass, class, and object in Python.
I strongly recommend reading this article first. If the link is unavailable, please check the blog's inner reprint What are metaclasses in Python?.


Classes create objects, the class statement creates classes, and a metaclass can also create classes.

Metaclass() = class
class() = object  # object ==> instance
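The two lines above can be made concrete with type, the default metaclass: calling the metaclass creates a class, and calling that class creates an object. (A sketch; the class name and attribute here are made up for illustration.)

```python
# Calling the metaclass `type` creates a class ...
MyClass = type('MyClass', (), {'greet': lambda self: 'hello'})

# ... and calling the class creates an object (an instance).
obj = MyClass()

print(type(MyClass))  # the class of a class is its metaclass: <class 'type'>
print(type(obj))      # the class of the object: MyClass
print(obj.greet())    # hello
```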






What are metaclasses in Python?

Reprinted from Thomas Wouters' answer to the question What are metaclasses in Python?, in case the original answer becomes unavailable.

A metaclass is the class of a class. A class defines how an instance of the class (i.e. an object) behaves while a metaclass defines how a class behaves. A class is an instance of a metaclass.

While in Python you can use arbitrary callables for metaclasses (like Jerub shows), the better approach is to make it an actual class itself. type is the usual metaclass in Python. type is itself a class, and it is its own type. You won’t be able to recreate something like type purely in Python, but Python cheats a little. To create your own metaclass in Python you really just want to subclass type.

A metaclass is most commonly used as a class-factory. When you create an object by calling the class, Python creates a new class (when it executes the ‘class’ statement) by calling the metaclass. Combined with the normal __init__ and __new__ methods, metaclasses therefore allow you to do ‘extra things’ when creating a class, like registering the new class with some registry or replace the class with something else entirely.

When the class statement is executed, Python first executes the body of the class statement as a normal block of code. The resulting namespace (a dict) holds the attributes of the class-to-be. The metaclass is determined by looking at the baseclasses of the class-to-be (metaclasses are inherited), at the __metaclass__ attribute of the class-to-be (if any) or the __metaclass__ global variable. The metaclass is then called with the name, bases and attributes of the class to instantiate it.

However, metaclasses actually define the type of a class, not just a factory for it, so you can do much more with them. You can, for instance, define normal methods on the metaclass. These metaclass-methods are like classmethods in that they can be called on the class without an instance, but they are also not like classmethods in that they cannot be called on an instance of the class. type.__subclasses__() is an example of a method on the type metaclass. You can also define the normal ‘magic’ methods, like __add__, __iter__ and __getattr__, to implement or change how the class behaves.

Here’s an aggregated example of the bits and pieces:

def make_hook(f):
    """Decorator to turn 'foo' method into '__foo__'"""
    f.is_hook = 1
    return f

class MyType(type):
    def __new__(mcls, name, bases, attrs):

        if name.startswith('None'):
            return None

        # Go over attributes and see if they should be renamed.
        newattrs = {}
        for attrname, attrvalue in attrs.iteritems():
            if getattr(attrvalue, 'is_hook', 0):
                newattrs['__%s__' % attrname] = attrvalue
            else:
                newattrs[attrname] = attrvalue

        return super(MyType, mcls).__new__(mcls, name, bases, newattrs)

    def __init__(self, name, bases, attrs):
        super(MyType, self).__init__(name, bases, attrs)

        # classregistry.register(self, self.interfaces)
        print "Would register class %s now." % self

    def __add__(self, other):
        class AutoClass(self, other):
            pass
        return AutoClass
        # Alternatively, to autogenerate the classname as well as the class:
        # return type(self.__name__ + other.__name__, (self, other), {})

    def unregister(self):
        # classregistry.unregister(self)
        print "Would unregister class %s now." % self

class MyObject:
    __metaclass__ = MyType

class NoneSample(MyObject):
    pass

# Will print "NoneType None"
print type(NoneSample), repr(NoneSample)

class Example(MyObject):
    def __init__(self, value):
        self.value = value
    @make_hook
    def add(self, other):
        return self.__class__(self.value + other.value)

# Will unregister the class
Example.unregister()

inst = Example(10)
# Will fail with an AttributeError
#inst.unregister()

print inst + inst
class Sibling(MyObject):
    pass

ExampleSibling = Example + Sibling
# ExampleSibling is now a subclass of both Example and Sibling (with no
# content of its own) although it will believe it's called 'AutoClass'
print ExampleSibling
print ExampleSibling.__mro__

Wide & Deep Learning for Recommender Systems


A recommendation model proposed by Google: a Wide part and a Deep part each capture different aspects of the features during recommendation and are trained jointly. Deployed in practice on the Google app store, it improved the app acquisition rate.


Memorization ; Generalization ; Cross-product transformation

Memorization refers to the ability to learn and memorize rules from historical data, while Generalization refers to the ability to generalize beyond the observed rules.
Cross-product transformation refers to combining different features into new composite features.
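A toy sketch of a cross-product transformation: a binary feature that fires only when all of its component categorical features take the specified values. The feature names follow the AND(gender=female, language=en) example from the paper; the helper function itself is made up for illustration.

```python
def cross_product(features, spec):
    """Binary cross-product feature: 1 only when every (name, value)
    pair in `spec` matches the sample's features."""
    return int(all(features.get(name) == value for name, value in spec))

sample = {'gender': 'female', 'language': 'en'}

# AND(gender=female, language=en)
hit = cross_product(sample, [('gender', 'female'), ('language', 'en')])   # 1
miss = cross_product(sample, [('gender', 'male'), ('language', 'en')])    # 0
print(hit, miss)
```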

TensorFlow's Models repository ships ready-made implementations, with experiments on the Census and MovieLens datasets.

The experiments do show gains, but the improvement does not feel large. The comparison is also narrow: only Wide-only, Deep-only, and Wide & Deep are compared, which makes the results less convincing.


Paper Link : Wide & Deep Learning for Recommender Systems
Google AI Blog Wide & Deep Learning: Better Together with TensorFlow
TensorFlow Linear Model Tutorial
TensorFlow Wide & Deep Learning Tutorial
CTR预估专栏 | 详解Wide&Deep理论与实践

Collection of algorithms used in machine learning with papers


AdaGrad : Adaptive gradient algorithm

Title : Adaptive subgradient methods for online learning and stochastic optimization
Author : J. Duchi, E. Hazan, and Y. Singer
Description : One of the most widely used optimization algorithms in machine learning.
Publication : Journal of Machine Learning Research, 12:2121–2159, July 2011.
Used in : Wide & Deep Learning for Recommender Systems
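A minimal sketch of the AdaGrad idea: per-coordinate step sizes scaled down by the accumulated squared gradients. The function name and hyperparameter values are illustrative, not from the paper.

```python
import math

def adagrad_step(w, g, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: each coordinate's effective learning rate
    shrinks as its squared gradients accumulate."""
    new_w, new_accum = [], []
    for wi, gi, ai in zip(w, g, accum):
        ai += gi * gi
        new_w.append(wi - lr * gi / (math.sqrt(ai) + eps))
        new_accum.append(ai)
    return new_w, new_accum

w, accum = [0.0, 0.0], [0.0, 0.0]
w, accum = adagrad_step(w, [1.0, 0.5], accum)
# After the first step both coordinates move by about lr, regardless of
# the gradient scale: w is approximately [-0.1, -0.1].
```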

FTRL : Follow the regularized leader

Title : Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization
Author : H. B. McMahan
Description : Shows the equivalence between follow-the-regularized-leader and mirror-descent algorithms in online learning, with L1 regularization (FTRL).
Publication : In Proc. AISTATS, 2011.
Used in : Wide & Deep Learning for Recommender Systems

Linear Algebra – Lesson 12. Graphs and Networks


  • Graphs & Networks
  • Incidence Matrices
  • Kirchhoff’s Laws

Graph

Graph : Nodes, Edges. A graph consists of nodes and edges.
Electrical-network vocabulary: potential, potential difference, current.
The node and edge structure of a graph is represented by its incidence matrix.
A=\begin{bmatrix}-1&1&0&0\\0&-1&1&0\\-1&0&1&0\\-1&0&0&1\\0&0&-1&1 \end{bmatrix}
From A we can see that rows 1, 2, 3 are dependent; they correspond to edges 1, 2, 3, so loops correspond to dependent rows.
Since each edge touches only two nodes, every row of the incidence matrix has exactly two nonzero entries; the matrix is a sparse matrix.

Ax=\begin{bmatrix}-1&1&0&0\\0&-1&1&0\\-1&0&1&0\\-1&0&0&1\\0&0&-1&1 \end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}x_2-x_1\\x_3-x_2\\x_3-x_1\\x_4-x_1\\ x_4-x_3\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\\0 \end{bmatrix}

x = x_1,x_2,x_3,x_4 \text{(potentials at nodes)}\\
\rightarrow x_2-x_1,etc. \text{(potential differences)}\\
\rightarrow y_1,y_2,y_3,y_4,y_5\text{ current on edges (Ohm’s Law) }\\
\rightarrow A^Ty=0\text{ (Kirchhoff's Current Law)}
One solution is x=\begin{bmatrix}1\\1\\1\\1\end{bmatrix}.
This solution is a basis for the nullspace. The nullspace is one-dimensional because the rank is r = 3 = n - 1, so dim N(A) = n - r = 1; multiplying x by any constant c sweeps out the whole nullspace, which is a line in four-dimensional space.
If all node potentials are equal, no current flows. Setting the potential x_4 = 0 (grounding that node) determines the other potentials, and the rank of the matrix works out to 3.
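The nullspace claim can be checked numerically with the incidence matrix above (plain Python, no libraries; the helper function is for illustration):

```python
# Incidence matrix A of the 4-node, 5-edge graph: row i is edge i,
# with -1 at the edge's start node and +1 at its end node.
A = [[-1,  1,  0,  0],
     [ 0, -1,  1,  0],
     [-1,  0,  1,  0],
     [-1,  0,  0,  1],
     [ 0,  0, -1,  1]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Equal potentials at every node give zero potential differences,
# so x = (1, 1, 1, 1) lies in the nullspace of A.
print(matvec(A, [1, 1, 1, 1]))  # [0, 0, 0, 0, 0]
```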

From KCL we get -y_1-y_3-y_4=0, meaning the net current at node 1 is zero.

For the loop formed by nodes 1, 2, 3, 4, the solution vector is a linear combination of the two loop vectors above.
Taking edges y_1, y_2, y_4 from A gives a structure with no loops, i.e. a tree (Tree).
dim N(A^T) = m - r \rightarrow # loops = # edges - (# nodes - 1) (since rank = n - 1),
which yields Euler's formula: # nodes - # edges + # loops = 1.

Writing the potential differences as e, we have e = Ax; y = Ce (potential differences drive currents, Ohm's law); and A^Ty = 0 (the currents satisfy KCL). These are the governing equations in the source-free case.

Remaining question: what can be obtained from A^TA and A^TCA?


Recommender system : wikipedia link
Collaborative Filtering : wikipedia link


Recommendation based on the Latent Factor Model


RandomWalk (random walk)

FM (Factorization Machine)

Gremlin-Python Connection Pool Exhaustion

When gremlin-python connects to a remote JanusGraph and submits a complex query, the remote JanusGraph raises an exception. The Connection object drawn from the client's pool catches that exception, but in version 3.2.6 the Connection is never returned to the pool, so after enough such complex queries the client's connection pool is exhausted.
This issue was fixed in https://github.com/apache/tinkerpop/commit/6b51c54f67419039dc114406c1d61918b2ccf39f and released in version 3.4.

AttributeError: ‘SVC’ object has no attribute ‘_impl’

[Sklearn error: ‘SVR’ object has no attribute ‘_impl’](https://stackoverflow.com/questions/19089386/sklearn-error-svr-object-has-no-attribute-impl)
0.14.1 SVC issue

Mostly, this error is caused by a scikit-learn version mismatch.
For example, you trained a model with scikit-learn version 0.1, but upgraded your packages to 0.2 after training.
When you then call joblib.load on the model file, it throws this error because the pickled internals no longer match the installed version.
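One defensive pattern (the function names and metadata format here are made up for illustration) is to record the scikit-learn version next to the saved model and compare it before loading, so a mismatch fails loudly instead of with a cryptic AttributeError:

```python
import json

def save_meta(path, version):
    """Record the library version used at training time, next to the model."""
    with open(path + '.meta', 'w') as f:
        json.dump({'sklearn_version': version}, f)

def check_meta(path, current_version):
    """Refuse to load a model pickled under a different library version."""
    with open(path + '.meta') as f:
        trained = json.load(f)['sklearn_version']
    if trained != current_version:
        raise RuntimeError('model trained with scikit-learn %s, '
                           'but %s is installed' % (trained, current_version))

# At training time: save_meta('model.pkl', sklearn.__version__)
# At load time:     check_meta('model.pkl', sklearn.__version__)
```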