# Introduction

Random Walk
Random Walk with Restart
Fast Random Walk

# What’s Random Walk ?

Quoting from 重启随机游走算法(RWR)

# What’s Random Walk with Restart(RWR) ?

Let’s have some brief about RWR at first.
Quoting from What is Random Walk with Restart (RWR)?, it says :

RWR is a algorithm originally proposed for image segmentation. It iteratively explores the global structure of the network to estimate the proximity (affinity score) between two nodes. Starting at a node, the walker faces two choices at each step: either moving to a randomly chosen neighbor, or jumping back to the starting node. This algorithm only contains one fixed parameter r called ‘the restarting probability’ (with 1 − r for the probability of moving to a neighbor). After iteratively reaching stability, the steady probability vector contains the affinity score of all nodes in the network to the starting node. This steady probability vector can be viewed as the ‘influential impact’ over the network imposed by the starting node.

In addition to a single starting node, RWR can be extended to a set of starting nodes. In this context, the resulting steady probability vector can also be considered as the ‘influential impact’ over the network, but collectively imposed by a set of starting nodes.

The steady probability vector for a set of starting nodes can be calculated according to the steady probability vector for single starting nodes. For this reason, it is useful to pre-compute affinity matrix for nodes in the graph with respect to each starting node as a seed (by looping over every node in the graph). The motivation behind this is that this pre-computed affinity matrix can be extensively reused for obtaining affinity scores between any combinations of nodes. For instance, such a strategy can dramatically relieve the computational burden involved in sampling a large number of random node combinations. Therefore, this pre-computation does permit some flexibility in the downstream use, in particular for statistical testing.

Some features can be extracted from this quoting.
At first , RWR was first proposed for image sementation. Automatic Multimedia Cross-modal Correlation Discovery. It was proposed to solve the problem about proximity of two node in graph. Second, only one parmeter r ( restarting probability) and two choices (1.jumping batch to the starting node 2. move to a randomly chosen neighbor). After iteratively reaching stablility, the steady probability vector contains affinity score of all nodes in the graph to the starting node. That means in intuition, the more far away node from the starting node ,the less influence it has to the starting node.

# Words

affinity n. 类似, 近似, 密切关系.
burden n. 负担, 包袱, 责任 vt. 使烦恼, 劳累.
flexibility n. 柔度, 灵活性.

# 1. Two Sum

Difficulty : Easy
Description:

Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].

Solution :

class Solution:
def twoSum(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: List[int]
"""
placeHolder = dict()
for i, x in enumerate(nums):
if target - x in placeHolder:
return placeHolder[target - x], i
else:
placeHolder[x] = i

return false


Difficulty : Medium
Description :

You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order and each of their nodes contain a single digit. Add the two numbers and return it as a linked list.

You may assume the two numbers do not contain any leading zero, except the number 0 itself.

Example:
Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8
Explanation: 342 + 465 = 807.

Solution :

# 4. Median of Two Sorted Arrays

Difficulty : Hard
Description :

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

You may assume nums1 and nums2 cannot be both empty.

Example 1:
nums1 = [1, 3]
nums2 = [2]
The median is 2.0

Example 2:
nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5There are two sorted arrays nums1 and nums2 of size m and n respectively.

Solution:

class Solution:
def findMedianSortedArrays(self, nums1, nums2):
"""
:type nums1: List[int]
:type nums2: List[int]
:rtype: float
"""
len1 = len(nums1)
len2 = len(nums2)

index_l = []

total_len = len1 + len2
if total_len % 2 == 0:
index_l.append(total_len / 2 - 1)
index_l.append(total_len / 2)
else:
index_l.append((total_len -1 ) / 2)

result = 0
index_1 = 0
index_2 = 0
for i in range(total_len):
current = 0
if index_1 >= len1 and index_2 < len2:
current = nums2[index_2]
index_2 += 1
elif index_2 >= len2 and index_1 < len1:
current = nums1[index_1]
index_1 +=1
elif nums1[index_1] < nums2[index_2]:
current = nums1[index_1]
index_1 += 1
else:
current = nums2[index_2]
index_2 += 1
if index_l :
if i in index_l:
result += current
index_l.remove(i)
else :
break

if total_len % 2 == 1:
return result
else:
return result / 2


# 5. Longest

Difficulty : Medium
Description :

Given a string s, find the longest palindromic substring in s. You may assume that the maximum length of s is 1000.

Example 1:
Note: “aba” is also a valid answer.

Example 2:
Input: “cbbd” Output: “bb”

Solution
python3

P1 – 24 正文
P24 – 29 习题

# 1 随机试验

1. 可以在相同的条件下重复地进行;
2. 每次试验的可能结果不止一个, 并且能实现明确试验的所有可能结果.
3. 进行一次试验之前不能确定哪一个结果会出现.

# Q&A

Q1. 在一批灯泡中任意抽取一只, 测试它的寿命. 该试验可以被看做随机试验么?
A1. 可以. 该试验的样本空间为 S:{t | t>= 0 } 也就是说, 随机试验的样本空间未必是可以一一穷举的.

# 集中趋势的量度

1. 均值和中位数的区别?
在数据存在偏斜的情况下, 中位数更能很好的表现数据分布.
大多数情况系, 均值远优于中位数, 尤其对抽样数据来说.

P62

P78值得记录.

P83 第三章

# Chapter 5. 离散概率分布的运用

\$\sigma\$

## WishList

1. CS231N 2017 李飞飞视觉课程
2. GAN实现
3. Transformer实现
4. Encoder-Decoder 实现
5. Attention机制了解
6. TransE 等系列实现
7. PRA 等系列实现
8. 遗传,退火算法实现
9. 线性代数复习 + 习题册练习
10. 统计复习 + 习题册联系
11. 传统机器学习相关知识整理
12. Coursera上的deeplearning.ai课程 在网易云课堂上有 可以先上这个 然后去Coursera上拿个证书. 不过这个需要尽快 节省费用.
12. 和阿木组队参加的DC比赛, 目标Top10
13. CSDN 皮果提的博客写的很好 可以浏览一遍
14. 选择一个数据集, 尝试使用各种模型比较一遍 CIF10.
15. 整理下各个数据集 FashionMNIST.
16. 复习老算法 例如红黑树 B+树等 图的遍历等 树的遍历等
17. 强化学习实践
19. 对话管理

Viterbi
Gibbs Sampling
RNN
CNN
CRF
Trie

# nan is not nan ?

param_dict = {'col1':1, 'col2':nan,'col3':3}


# 为什么在IDE中可以正常run起来的项目, 从命令行中启动就会报Import Error?

Python中，每个py文件被称之为模块(module)，每个具有init.py文件的目录被称为包(package)。只要模块或者包所在的目录在Python的sys.path中，就可以使用import 模块或import 包来使用.

import b


import sys
sys.path.append('c:\xxxx\b.py')# 这个例子针对 windows 用户来说的


sys.path.append('c:\\xxx\\b.py')



sys.path是Python的搜索模块的路径集，是一个list, 可以在Python 环境下使用sys.path.append(path)添加相关的路径，但在退出Python环境后, 自己添加的路径就会自动消失了！

IDE :
['/home/bp/PycharmProjects/untitled1/dir2', '/home/bp/PycharmProjects/untitled1/dir2', '/home/bp/PycharmProjects/untitled1', '/home/bp/PycharmProjects/untitled1/dir1', '/home/bp/VirtualEnvs/tensorflow0625/lib/Python35.zip', '/home/bp/VirtualEnvs/tensorflow0625/lib/Python3.5', '/home/bp/VirtualEnvs/tensorflow0625/lib/Python3.5/plat-x86_64-linux-gnu', '/home/bp/VirtualEnvs/tensorflow0625/lib/Python3.5/lib-dynload', '/usr/lib/Python3.5', '/usr/lib/Python3.5/plat-x86_64-linux-gnu', '/home/bp/VirtualEnvs/tensorflow0625/lib/Python3.5/site-packages', '/usr/local/lib/Python3.5/dist-packages', '/usr/lib/Python3/dist-packages']

Terminal :
['', '/usr/lib/Python35.zip', '/usr/lib/Python3.5', '/usr/lib/Python3.5/plat-x86_64-linux-gnu', '/usr/lib/Python3.5/lib-dynload', '/usr/local/lib/Python3.5/dist-packages', '/usr/lib/Python3/dist-packages']



# Solutions

1. import

1. sys.append

2. *.pth
在site-package下添加.pth文件, 在文件内用 绝对路径 标明引用路径.

python -c “from distutils.sysconfig import get_python_lib; print get_python_lib()”

# .pth file for the  my project(这行是注释)
E:\DjangoWord
E:\DjangoWord\mysite
E:\DjangoWord\mysite\polls


# Boob!

def append_sys_path(root_path=None):
"""
递归添加项目下所有的Package路径.
:return:
"""
if root_path is None:
pro_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
else:
pro_dir = root_path

result_dir = [pro_dir]
while result_dir:
single_dir = result_dir.pop()
init_flag = False
subdir_list = []
for single_path in os.listdir(single_dir):
if single_path == '__init__.py':
init_flag = True  # 根据当前目录是否存在__init__.py文件判断是否为package.
single_path = os.path.join(single_dir, single_path)
if os.path.isdir(single_path):
subdir_list.append(single_path)
if init_flag:
sys.path.append(single_dir)
result_dir.extend(subdir_list)


# 其他介绍ImportError的文章

1. Python 101: All about imports
2. Python导入模块的几种姿势 – 链接1的中文译文

# 哪个是我该用的Queue?

Queue是个在多进程中是个让人困惑的概念. 可以用多个方式导入Queue, 分别是 multiprocessing.queues.Queue, 和 multiprocessing中Manager中的Queue, 还有queue中的Queue

## 从Manager中导入的Queue

from multiprocessing import Manager
m = Manager()
q_from_manager = m.Queue()


Manager默认是multiprocessing中SyncManager类, SyncManager类中的Queue()方法则返回queue.Queue(). 但是并不是说由Manager生成的Queue 和 queue.Queue是一样的. 相反, 在Pool,Process和父进程通信的过程中, Manager.Queue可以很好的实现父子进程间通信, 子进程间通信, 而queue.Queue()并不能, 说明Manager通过继承的multiprocessing.managers.BaseManager中的机制来保障进程间通信, 而queue.Queue()只是实现这一机制的底层.

## 通过multiprecessing导入Queue

from multiprocessing import Queue
q = Queue(1)
print(type(q))
<class 'multiprocessing.queues.Queue'>


mp.Queue 可以用于spawn出的子进程间通信, 也可以用于 父进程和 spawn出的子进程通信,
multiprocessing.Queue只能用于显示地用Process创建的子进程间或与子进程与父进程之间通信;
multiprocessing.Manager.Queue 能用于Pool创建的子进程之间通信, 也能用于父进程和子进程通信, 也能用于Process进程与Pool创建的子进程之间通信. 这个还是比较强大的功能.

## 选择合适的Queue

from queue import Queue

from multiprocessing import Queue

# 问题

Terminate multi process/thread in Python correctly and gracefully中有很好的如何解决Process结束的问题. 但仍有一些点需要注意.

import multiprocessing
import time

def hang():
while True:
print('hanging..')
time.sleep(10)
print('after 10 , hanging...')

def main():
p = multiprocessing.Process(target=hang)
p.start()
time.sleep(1)
p.terminate()
p.join()
print('main process exiting..')

if __name__ == '__main__':
main()


Pool可以提供指定数量的进程，供用户调用，当有新的请求提交到pool中时，如果池还没有满，那么就会创建一个新的进程用来执行该请求；但如果池中的进程数已经达到规定最大值，那么该请求就会等待，直到池中有进程结束，才会创建新的进程来它。这里有一个简单的例子

#!/usr/bin/env python
#coding=utf-8
from multiprocessing import Pool
from time import sleep

def f(x):
for i in range(10):
print '%s --- %s ' % (i, x)
sleep(1)

def main():
pool = Pool(processes=3)    # set the processes max number 3
for i in range(11,20):
result = pool.apply_async(f, (i,))
pool.close()
pool.join()
if result.successful():
print 'successful'

if __name__ == "__main__":
main()


multiprocessing往往是python用来实现真正并行的方式. 引用回答中的话,

multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among this family of processes.

# Process类继承

from multiprocessing import Queue, Process, current_process
import os
from time import sleep
from random import randint

class Producer(Process):
def __init__(self, queue): # Override
Process.__init__(self) # 加入父类init
self.queue = queue

def run(self): # 调用start()时会调用run(run为单进程).
while True:
self.queue.put('one product')
print(current_process().name + str(
os.getpid()) + ' produced one product, the no of queue now is: %d' % self.queue.qsize())
sleep(randint(1, 3))

class Consumer(Process):
def __init__(self, queue):
Process.__init__(self)
self.queue = queue

def run(self):
while True:
d = self.queue.get(1)
if d != None:
print(current_process().name + str(os.getpid()) + ' consumed  %s, the no of queue now is: %d' % (
d, self.queue.qsize()))
sleep(randint(1, 4))
continue
else:
break

if __name__ == "__main__":
queue = Queue(40)
# Init
processed = []
for i in range(3):
processed.append(Producer(queue))
processed.append(Consumer(queue))

# Start
for i in range(len(processed)):
processed[i].start()

# Join
for i in range(len(processed)):
processed[i].join()



## Mac OS系统下Python多进程Debug卡死状态

TEMP.py用于启动多进程读取文件,并行处理并写入文件中.

(virtualenv_3.6.5) localhost:other_requriment root$ps aux | grep TEMP.py root 10859 0.0 0.0 4286184 972 s000 S+ 8:07PM 0:00.01 grep TEMP.py root 10797 0.0 0.0 7846872 1300 ?? S 7:50PM 0:16.63 bin/python3.6 /TEMP.py root 10796 0.0 0.1 7875000 4360 ?? S 7:50PM 0:30.04 bin/python3.6 /TEMP.py root 10795 0.0 0.1 7875000 4256 ?? S 7:50PM 0:30.23 bin/python3.6 /TEMP.py root 10794 0.0 0.1 7875256 4236 ?? S 7:50PM 0:29.77 bin/python3.6 /TEMP.py root 10793 0.0 0.1 7875256 4236 ?? S 7:50PM 0:29.26 bin/python3.6 /TEMP.py root 10792 0.0 0.0 7847908 1824 ?? S 7:50PM 0:00.10 bin/python3.6 /TEMP.py root 10791 0.0 0.0 7853028 1336 ?? S 7:50PM 0:08.25 bin/python3.6 /TEMP.py root 10782 0.0 0.3 7846080 24124 ?? S 7:50PM 0:01.77 bin/python3.6 /TEMP.py  (virtualenv_3.6.5) localhost:$ pstree 10782
-+- 10782 root bin/python3.6 PROJECT/TEMP.py
|--- 10791 root bin/python3.6 PROJECT/TEMP.py
|--- 10792 root bin/python3.6 PROJECT/TEMP.py
|--- 10793 root bin/python3.6 PROJECT/TEMP.py
|--- 10794 root bin/python3.6 PROJECT/TEMP.py
|--- 10795 root bin/python3.6 PROJECT/TEMP.py
|--- 10796 root bin/python3.6 PROJECT/TEMP.py
\--- 10797 root bin/python3.6 PROJECT/TEMP.py


(virtualenv_3.6.5) localhost:$sudo dtruss -p 10782 dtrace: system integrity protection is on, some features will not be available SYSCALL(args) = return gettimeofday(0x7000046A9C68, 0x0, 0x0) = 0 0 gettimeofday(0x7000047AC538, 0x0, 0x0) = 0 0 psynch_cvwait(0x7FE75A178168, 0x100000100, 0x0) = -1 Err#316 gettimeofday(0x7000046A9C68, 0x0, 0x0) = 0 0 psynch_cvwait(0x7FE75A178168, 0x100000100, 0x0) = -1 Err#316  查询子进程10791 – 10797 发现除了10792报错和主进程一样之外, 其他子进程均和10793一样, 没有报错. (virtualenv_3.6.5) localhost:other_requriment didi$ sudo dtruss -p 10792
dtrace: system integrity protection is on, some features will not be available

SYSCALL(args)            = return
gettimeofday(0x7FFEE9E28BA8, 0x0, 0x0)           = 0 0
psynch_cvwait(0x7FE75A236A88, 0x100000100, 0x0)          = -1 Err#316

(virtualenv_3.6.5) localhost:other_requriment didi$sudo dtruss -p 10793 dtrace: system integrity protection is on, some features will not be available SYSCALL(args) = return  Linux系统下可以进入/proc/$PID/ 下 cat status 或者进入fd进行查看, 但是Mac OS系统下没有/proc目录, 所以根据Mac OS X equivalent of VirtualQuery or /proc/pid/maps?的推荐, 使用**vmmap**进行查看(需要root权限). 命令输出的信息较多(看不太懂这里的内容输出).

Stack Overflow上有类似的问题
Interpret dtruss output like “psynch_cvwait(…) = -1 Err#316”

/*
*  psynch_cvwait: This system call is used for psynch cvar waiters to block in kernel.
*/
int
psynch_cvwait(__unused proc_t p,
struct psynch_cvwait_args * uap,
uint32_t * retval)


Linux上调试类似问题

## Mac系统中配置字体

1、/System/Library/Fonts路径下。
2、/Library/Fonts路径下。

2、直接利用ATM字体管理软件进行字体导入管理即可。

1、文字的外在形式特征。就是文字的风格，是文字的外衣。 字体的艺术性体现在其完美的外在形式与丰富的内涵之中。 字体是文化的载体，是社会的缩影。
2、微机系统的字体font。这类字体是电脑必用字体，存在于“fonts”文件夹里。

1、Heiti SC(黑体-简,黑体-简的英文名称为Heiti SC.Heiti为黑体的拼音,SC代表简体中文(Simplified Chinese)),是Mac OS X Snow Leopard(版本10.6)包含的简体中文字型,也是iPhone OS

2、3.0(版本4.0后改名为iOS)及iPod nano第五代以来的预设简体中文字型.