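diff --git a/README.md b/README.md
index a61a1e0..f1df7ca 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@
     - [25. Find all odd numbers in a list and build a new list](#25求出列表所有奇数并构造新列表)
     - [26. Compute 1+2+3+10248 in one line of Python](#26用一行python代码写出12310248)
     - [27. Variable scope in Python (name lookup order)](#27python中变量的作用域变量查找顺序)
-    - [28. Convert the string ”123″ to 123 without using built-in APIs such as int()](#28字符串123″转换成123不使用内置api例如int)
+    - [28. Convert the string `"123"` to `123` without using built-in APIs such as `int()`](#28字符串-123-转换成-123不使用内置api例如-int)
     - [29.Given an array of integers](#29given-an-array-of-integers)
     - [30. Remove duplicate elements from a list in Python](#30python代码实现删除一个list里面的重复元素)
     - [31. Find the 10 most frequent words in a text](#31统计一个文本中单词频次最高的10个单词)
@@ -56,7 +56,7 @@
     - [47. How to get and set object attributes dynamically in Python?](#47python中如何动态获取和设置对象的属性)
   - [Memory management and garbage collection](#内存管理与垃圾回收机制)
     - [48. Which operations can cause memory overflow in Python, and how to handle them?](#48哪些操作会导致python内存溢出怎么处理)
-    - [49. Which of the following statements about Python memory management is wrong](#49关于python内存管理下列说法错误的是)
+    - [49. Which of the following statements about Python memory management is wrong: B](#49关于python内存管理下列说法错误的是--b)
     - [50. Python's memory management mechanism and tuning techniques](#50python的内存管理机制及调优手段)
     - [51. What is a memory leak, and how can it be avoided?](#51内存泄露是什么如何避免)
   - [Functions](#函数)
@@ -271,7 +271,10 @@
     - [244. How to find the most frequently repeated item in massive data?](#244怎么在海量数据中找出重复次数最多的一个)
     - [245. Determine whether an item exists in a large dataset](#245判断数据是否在大量数据中)
 
+
+
+
 # Python Basics
 ## File Operations
 ### 1. There is a jsonline-format file file.txt of about 10K
@@ -291,6 +294,15 @@ def get_lines():
         for i in f:
             yield i
 ```
+Personally, I think it is better to read a batch of lines at a time. Otherwise there are too many read calls. Each item yielded is then a list of lines, so `process()` must handle a whole batch.
+```python
+def get_lines():
+    with open('file.txt','rb') as f:
+        while True:
+            data = f.readlines(60000)  # lines totalling roughly 60000 bytes
+            if not data:  # an empty list means end of file
+                break
+            yield data
+```
 Method provided by Pandaaaa906
 ```python
 from mmap import mmap
@@ -352,6 +364,7 @@ print(alist)
 ```python
 sorted(d.items(),key=lambda x:x[1])
 ```
+x[0] sorts by key, and x[1] sorts by value.
 ### 6. Dictionary comprehension
 ```python
 d = {key:value for (key,value) in iterable}
@@ -440,10 +453,14 @@ c. In Python 2, classes that inherit from object are new-style classes, and classes without a declared parent are classic classes
 
 d. Classic classes are basically no longer used in Python
 
+e. Class and type are unified. For an instance `a` of a new-style class, `a.__class__` and `type(a)` give the same result. For old-style classes they differ.
+
+f. Attribute lookup under multiple inheritance differs. New-style classes use the C3 linearization (the order shown by `__mro__`), which resembles a breadth-first search in diamond-shaped hierarchies. Old-style classes use a depth-first, left-to-right search.
+
 ### 16. How many built-in data structures does Python have?
 a. Integer int, long integer long, float float, complex complex
 
-b. String str, list list, 元祖 tuple
+b. String str, list list, 元组 tuple
 
 c. Dictionary dict, set set
 
@@ -459,45 +476,53 @@ def singleton(cls):
             instances[cls] = cls(*args, **kwargs)
         return instances[cls]
     return wrapper
+
+
 @singleton
 class Foo(object):
     pass
 foo1 = Foo()
 foo2 = Foo()
-print foo1 is foo2 #True
+print(foo1 is foo2)  # True
 ```
 Method 2: use a base class
 `__new__` is the method that actually creates the instance object. Overriding the base class's `__new__` therefore ensures that only one instance is ever created.
 ```python
 class Singleton(object):
-    def __new__(cls,*args,**kwargs):
-        if not hasattr(cls,'_instance'):
-            cls._instance = super(Singleton,cls).__new__(cls,*args,**kwargs)
+    def __new__(cls, *args, **kwargs):
+        if not hasattr(cls, '_instance'):
+            # object.__new__ accepts only cls in Python 3, so the arguments are not forwarded
+            cls._instance = super(Singleton, cls).__new__(cls)
         return cls._instance
 
+
 class Foo(Singleton):
     pass
 foo1 = Foo()
 foo2 = Foo()
-print foo1 is foo2 #True
+print(foo1 is foo2)  # True
 ```
 Method 3: use a metaclass. A metaclass is the class that creates class objects. When a class object creates an instance, the metaclass's `__call__` is always invoked, so it is enough to make `__call__` create only one instance. `type` is Python's built-in metaclass.
 ```python
 class Singleton(type):
-    def __call__(cls,*args,**kwargs):
-        if not hasattr(cls,'_instance'):
-            cls._instance = super(Singleton,cls).__call__(*args,**kwargs)
+    def __call__(cls, *args, **kwargs):
+        if not hasattr(cls, '_instance'):
+            cls._instance = super(Singleton, cls).__call__(*args, **kwargs)
         return cls._instance
-```
-```python
+
+
+# Python2 (keep only the definition that matches your Python version)
 class Foo(object):
     __metaclass__ = Singleton
 
+# Python3
+class Foo(metaclass=Singleton):
+    pass
+
 foo1 = Foo()
 foo2 = Foo()
-print foo1 is foo2 #True
+print(foo1 is foo2)  # True
 ```
 
 ### 18. Reverse an integer, e.g. -123 --> -321
@@ -542,14 +567,14 @@ get_files("./",'.pyc')
 import os
 
 def pick(obj):
-    if ob.endswith(".pyc"):
+    if obj.endswith(".pyc"):
         print(obj)
 
 def scan_path(ph):
     file_list = os.listdir(ph)
-    for obj in file_list:
-        if os.path.isfile(obj):
-                pick(obj)
-        elif os.path.isdir(obj):
+    for name in file_list:
+        obj = os.path.join(ph, name)  # listdir returns bare names, so join them with the parent path
+        if os.path.isfile(obj):
+            pick(obj)
+        elif os.path.isdir(obj):
             scan_path(obj)
 
@@ -576,7 +601,7 @@ count = sum(range(0,101))
 print(count)
 ```
 ### 21. Python: the correct way to delete elements while iterating over a list
-Iterate over the 新在 list, delete from the original list
+Iterate over a new (copied) list and delete from the original list
 ```python
 a = [1,2,3,4,5,6,7,8]
 print(id(a))
@@ -641,15 +666,23 @@ print(a)
 ```python
 def get_missing_letter(a):
     s1 = set("abcdefghijklmnopqrstuvwxyz")
-    s2 = set(a)
+    s2 = set(a.lower())
     ret = "".join(sorted(s1-s2))
     return ret
 
 print(get_missing_letter("python"))
+
+# Other ways to generate the letters
+# Method 1:
+import string
+letters = string.ascii_lowercase
+# Method 2:
+letters = "".join(map(chr, range(ord('a'), ord('z') + 1)))
 ```
 ### 23. Mutable and immutable types
-1. Mutable types are list and dict. Immutable types are string, number, and tuple.
+1. Mutable types include list, dict, and set. Immutable types include str, numbers, and tuple.
 
 2. When a mutable object is modified, the value at its memory address is changed directly, and no new memory is allocated.
 
@@ -662,7 +695,7 @@ is: compares whether the id values of two objects are equal, i.e. whether the two objects are
 ### 25. Find all odd numbers in a list and build a new list
 ```python
 a = [1,2,3,4,5,6,7,8,9,10]
-res = [ i for i in a if i%2==1]
+res = [i for i in a if i%2==1]
 print(res)
 ```
 ### 26. Compute 1+2+3+10248 in one line of Python
@@ -680,13 +713,13 @@ print(num1)
 
 1. What is LEGB?
 
-L: local: the scope inside a function 
+L: local: the scope inside a function
 
 E: enclosing: the scope between an outer function and its nested function
 
 G: global: the global scope
 
-B: build-in: the built-in scope 
+B: built-in: the built-in scope
 
 Inside a function, Python looks names up in these four scopes, called LEGB, in exactly this order
 ### 28. Convert the string `"123"` to `123` without using built-in APIs such as `int()`
@@ -740,15 +773,26 @@ class Solution:
             if target-nums[size] in d:
                 if d[target-nums[size]] 0:
+            tmp.append(l1[0])
+            del l1[0]
+        while len(l2)>0:
+            tmp.append(l2[0])
+            del l2[0]
+        return tmp
 ```
 ### 37. Given an array of any length, write a function that puts all odd numbers before the even numbers, with the odd numbers in ascending order and the even numbers in descending order. For example, the string '1982376455' becomes '1355798642'
@@ -1018,8 +1068,32 @@ class Array:
 ### 45. Introduce Cython, PyPy, CPython, and Numba: what are their drawbacks?
 Cython
 ### 46. Describe the differences and connections between abstract classes and interfaces
+
+1. Abstract class: it specifies a set of methods and declares which of them must be implemented by subclasses. Because it contains abstract methods, an abstract class cannot be instantiated. Think of an abstract class as an unfinished house whose doors, windows, and wall finishes you choose yourself. Compared with an ordinary base class, an abstract class imposes stronger constraints.
+
+2. Interface: it is very similar to an abstract class in that the methods it defines must be implemented by the classes that use it. The fundamental difference is its purpose: an interface is a set of rules for communication between different parties. To enter your dorm you need a key. That key is your interface to the dorm, and because your roommate has the same interface, they can get in too. When you make a phone call, the phone is your interface for communicating with others.
+
+3. Differences and connections:
+
+1. An interface is a variant of an abstract class in which every method is abstract, whereas an abstract class can also have concrete methods. An abstract class declares that methods exist without implementing them.
+
+2. In single-inheritance languages such as Java, a class can implement multiple interfaces but extend only one abstract class. Python supports multiple inheritance, so both are ordinary classes (e.g. built with `abc.ABC`).
+
+3. An interface defines methods without implementation code, whereas an abstract class can implement some of its methods.
+
+4. In Java, fields declared in an interface are implicitly `static final` constants, whereas an abstract class can have ordinary instance fields.
+
 ### 47. How to get and set object attributes dynamically in Python?
+```python
+class Parent:
+    x = 1
+
+if hasattr(Parent, 'x'):
+    print(getattr(Parent, 'x'))  # 1
+    setattr(Parent, 'x', 3)
+print(getattr(Parent, 'x'))  # 3
+```
+
+
+
 ## Memory Management and Garbage Collection
 ### 48. Which operations can cause memory overflow in Python, and how to handle them?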
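 ### 49. Which of the following statements about Python memory management is wrong: B
@@ -1133,27 +1207,268 @@ if __name__ == "__main__":
 list(filter(lambda x: x % 2 == 0, range(10)))
 ```
 ### 59. Four principles for writing functions
+
+1. Keep functions as short as possible.
+
+2. Make function declarations reasonable, simple, and easy to use.
+
+3. Design function parameters with backward compatibility in mind.
+
+4. A function should do one thing only, and its statements should keep a consistent level of granularity.
+
 ### 60. Is argument passing in function calls pass-by-value or pass-by-reference?
+
+Python parameters can be positional, default, variadic (`*args`), or keyword (`**kwargs`) parameters.
+
+Strictly speaking, Python is neither. It always passes a reference to the object (often called "pass by object reference"), and the observable behavior depends on whether the object is mutable:
+
+Immutable arguments behave like pass-by-value. Objects such as integers and strings cannot be changed in place, so rebinding the parameter inside the function never affects the caller's object.
+
+Mutable arguments behave like pass-by-reference. Objects such as lists and dicts can be modified in place inside the function, and the caller sees the change, much like passing an array through a pointer in C.
+
 ### 61. How to set a global variable inside a function
+
+```python
+counter = 0
+def increment():
+    global counter  # make counter refer to the module-level variable
+    counter += 1
+
+globals()  # returns the dict of global variables in the current module
+```
+
 ### 62. What are default parameters?
+
+A default parameter takes its default value when the caller does not pass an argument for it. When the caller does pass one, the passed value replaces the default.
+
+`*args` is a variable-length parameter that collects any number of extra positional arguments.
+
+`**kwargs` collects keyword arguments passed as key=value pairs, and any number of them can be given.
+
+Use these two parameters when you do not know, at definition time, how many arguments will be passed.
+
 ### 63. How to restrict MySQL access by IP?
+
+
+
 ### 64. Decorators with arguments?
+
+A decorator whose wrapper takes a fixed number of arguments
+
+```python
+def new_func(func):
+    def wrappedfun(username, passwd):
+        if username == 'root' and passwd == '123456789':
+            print('Authentication passed')
+            print('Running the extra functionality')
+            return func()
+        else:
+            print('Wrong username or password')
+            return
+    return wrappedfun
+
+@new_func
+def origin():
+    print('Running the function')
+origin('root','123456789')
+```
+
+A decorator whose wrapper takes a variable number of arguments
+
+```python
+def new_func(func):
+    def wrappedfun(*parts):
+        if parts:
+            counts = len(parts)
+            print('This system consists of ', end='')
+            for part in parts:
+                print(part, ' ', end='')
+            print('and more:', counts, 'parts in total')
+            return func()
+        else:
+            print('No parts were given')
+            return func()
+    return wrappedfun
+
+```
+
 ### 65. Why can a function name be used as an argument?
+
+Everything in Python is an object. A function name is simply a name bound to the function object in memory, so it can be passed around like any other object.
+
 ### 66. What does the pass statement do in Python?
+
+When you are only sketching the structure of the code and have not written the implementation yet, `pass` serves as a placeholder so the program does not raise an error. It performs no operation.
+
 ### 67. What does `print c` output for the following code, and why?
+
+```python
+a = 10
+b = 20
+c = [a]
+a = 15
+```
+
+Answer: `[10]`. `c = [a]` stores a reference to the integer object 10 that `a` referred to at that moment. `a = 15` then rebinds the name `a` to a different object, which does not affect the list.
+
+
+
 ### 68. How to swap the values of two variables?
+
+```python
+a, b = b, a
+```
+
+
+
 ### 69. The map and reduce functions?
+
+```python
+from functools import reduce  # in Python 3, reduce lives in functools
+list(map(lambda x: x * x, [1, 2, 3, 4]))  # map returns an iterator in Python 3
+# [1, 4, 9, 16]
+reduce(lambda x, y: x * y, [1, 2, 3, 4])  # equivalent to ((1 * 2) * 3) * 4
+# 24
+```
+
+
+
 ### 70. Callback functions: how do they communicate?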
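A minimal Python sketch of a callback. The names `download` and `on_done` are hypothetical and are used only for illustration. The caller hands a function object to `download`, and `download` calls that function back with its result. This is how a callback passes information back to the caller:

```python
from typing import Callable


def download(url: str, callback: Callable[[str], None]) -> None:
    # Hypothetical worker: pretend to fetch `url`, then report back
    content = f"contents of {url}"
    callback(content)  # communicate the result by calling the caller's function


received = []


def on_done(content: str) -> None:
    # Callback supplied by the caller; here it only records the result
    received.append(content)


download("example.com", on_done)
print(received)  # ['contents of example.com']
```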
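+
+A callback is a function passed as an argument to another function (in C, a function pointer). The function is treated as an object, and the receiving function calls it at the appropriate time. That call is how results are communicated back to the caller.
+
 ### 71. What are Python's main built-in data types? What does `print dir('a')` output?
-### 72. What is the output of map(lambda x:xx,[y for y in range(3)])?
+
+Built-in types: bool, numbers, str, list, tuple, dict, set
+
+It prints the built-in methods and attributes of the string 'a'
+
+### 72. What is the output of map(lambda x:x*x,[y for y in range(3)])?
+
+```
+[0, 1, 4]
+```
+
+That is the result in Python 2. In Python 3, `map` returns an iterator, so wrap the call in `list()` to get this list.
+
 ### 73. hasattr(), getattr(), setattr() explained
+
+hasattr(object, name):
+
+Checks whether an object has an attribute or method called name and returns a bool: True if it does, False otherwise.
+
+```python
+class function_demo(object):
+    name = 'demo'
+    def run(self):
+        return "hello function"
+functiondemo = function_demo()
+res = hasattr(functiondemo, "name")  # does the object have a name attribute? True
+res = hasattr(functiondemo, "run")  # does the object have a run method? True
+res = hasattr(functiondemo, "age")  # does the object have an age attribute? False
+print(res)
+```
+
+getattr(object, name[, default]):
+
+Gets an attribute or method of object. If it exists, it is returned. If it does not exist, default is returned when given, and otherwise AttributeError is raised. Note: if the result is a method, you get a bound method object (printing it shows its memory address). Add parentheses () to call it.
+
+```python
+functiondemo = function_demo()
+getattr(functiondemo, "name")  # get the name attribute: 'demo'
+getattr(functiondemo, "run")  # get the run method: a bound method object
+getattr(functiondemo, "age")  # nonexistent attribute: raises AttributeError
+getattr(functiondemo, "age", 18)  # nonexistent attribute with a default: returns 18
+```
+
+setattr(object, name, value):
+
+Assigns a value to an attribute of the object. If the attribute does not exist, it is created first and then assigned.
+
+```python
+class function_demo(object):
+    name = "demo"
+    def run(self):
+        return "hello function"
+functiondemo = function_demo()
+res = hasattr(functiondemo, "age")  # does the age attribute exist? False
+print(res)
+setattr(functiondemo, "age", 18)  # assign the age attribute; returns None
+res1 = hasattr(functiondemo, "age")  # check again: True
+```
+
+Putting it together
+
+```python
+class function_demo(object):
+    name = "demo"
+    def run(self):
+        return "hello function"
+functiondemo = function_demo()
+res = hasattr(functiondemo, "addr")  # check first whether it exists
+if res:
+    addr = getattr(functiondemo, "addr")
+    print(addr)
+else:
+    # setattr runs first, while the arguments are evaluated, and creates addr,
+    # so getattr then finds it and returns "Beijing"
+    addr = getattr(functiondemo, "addr", setattr(functiondemo, "addr", "Beijing"))
+    print(addr)
+```
+
+
+
 ### 74. Write a factorial function in one line?
+
+```python
+from functools import reduce
+reduce(lambda x, y: x * y, range(1, n + 1), 1)  # n!; the initial value 1 also covers n == 0
+```
+
+
+
 ### 75. What is a lambda function? What are its benefits?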
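The sketch below shows typical lambda uses: a throwaway sort key, and arguments to `map` and `filter`. The data is made up for illustration:

```python
pairs = [("b", 2), ("a", 3), ("c", 1)]

# Throwaway key function: sort the pairs by their second element
by_value = sorted(pairs, key=lambda p: p[1])

# Anonymous functions passed to map and filter
squares = list(map(lambda x: x * x, range(5)))
evens = list(filter(lambda x: x % 2 == 0, range(10)))

print(by_value)  # [('c', 1), ('b', 2), ('a', 3)]
print(squares)   # [0, 1, 4, 9, 16]
print(evens)     # [0, 2, 4, 6, 8]
```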
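+
+A lambda function is a function that can take any number of arguments (including optional ones) and returns the value of a single expression.
+
+1. Lambdas are lightweight and disposable. They suit a small piece of functionality that is used in only one place and does not deserve a real name.
+
+2. As anonymous functions, they are commonly used with functional-style helpers such as filter and map.
+
+3. They can be passed as callbacks to other code, e.g. for message handling.
+
 ### 76. When does a recursive function stop?
+
+The termination condition (base case) is usually defined inside the recursive function. Before the recursive call, the function checks a condition and, depending on the result, either calls itself again or returns to end the recursion.
+
+Typical termination conditions:
+
+1. The number of recursive calls has reached a set limit.
+
+2. The computed result has reached a certain range. Choose the condition according to the purpose of the design.
+
 ### 77. What will the following code output? Explain.
-### 78. What is a lambda function? What are its benefits? Write an anonymous function that returns the sum of two numbers
+```python
+def multipliers():
+    return [lambda x: i *x for i in range(4)]
+print([m(2) for m in multipliers()])
+```
+
+The output is [6, 6, 6, 6], not the [0, 2, 4, 6] one might expect.
+
+How would you change the definition of multipliers to produce the intended result?
+
+The cause is late binding in Python closures: the value of a variable used in a closure is looked up when the inner function is called, not when it is defined. Whenever any function returned by multipliers() is called, `i` is looked up in the enclosing scope. By then the for loop has finished, and `i` holds its final value, 3.
+
+```python
+def multipliers():
+    for i in range(4):
+        yield lambda x: i *x
+```
+
+The generator version works only because each lambda is called before the generator advances `i`. Collecting them first with `list(multipliers())` gives [6, 6, 6, 6] again. Binding `i` as a default argument avoids the problem entirely:
+
+```python
+def multipliers():
+    return [lambda x,i = i: i*x for i in range(4)]
+
+```
+
+
+
+
+
+### 78. What is a lambda function? What are its benefits? Write an anonymous function that returns the sum of two numbers
+
+A lambda function is an anonymous function. lambda creates small anonymous functions, so called because they skip the standard step of declaring a function with def. For example, `add = lambda a, b: a + b`, and then `add(1, 2)` returns 3.
 
 ## Design Patterns
 
@@ -1230,23 +1545,192 @@ print ([[x for x in range(1,100)] [i:i+3] for i in range(0,100,3)])
 yield saves the current execution state of the program. When you use a for loop, each element is computed only when it is fetched. A function that uses yield is called a generator. Like an iterator, its advantage is that it does not compute all elements at once but one at a time, which can save a lot of space. Each step of a generator needs the result of the previous step, which is why yield is used. With return, the previous result would be lost.
 ## Object-Oriented Programming
 ### 90. Mutable and immutable objects in Python?
+
+Immutable objects: the value in memory that the object refers to cannot be changed. When you "change" such a variable, the value cannot be modified, so a new object holding the new value is created at a new address and the variable is rebound to it.
+
+Mutable objects: the value in memory that the object refers to can be changed. When the variable (more precisely, the reference) is modified, the value it points to changes directly. No copy is made and no new address is allocated, i.e. the change happens in place.
+
+In Python, numeric types (int and float), str, and tuple are immutable, while list, dict, and set are mutable.
+
 ### 91. Python's magic methods
+
+Magic methods are special methods that add "magic" to your classes. If your object implements (overloads) one of them, Python calls it automatically in the corresponding situation, so you can define whatever behavior you want. Their names are surrounded by double underscores (e.g. `__init__`, `__len__`). Python's magic methods are very powerful, so knowing how to use them is important.
+
+`__init__` is the initializer. It initializes an instance after the instance has been created, but it is not the first method called during instantiation.
+
+`__new__` is the first method called during instantiation. It receives the class `cls` plus the constructor arguments and returns the new instance. Python then calls `__init__` on that instance with the same arguments.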
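A small sketch that records the call order shows that `__new__` runs first and returns the instance that `__init__` then initializes. The class `Traced` and the `calls` list are hypothetical and exist only for this illustration:

```python
calls = []


class Traced:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ takes only the class; the constructor arguments go to __init__
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


t = Traced(42)
print(calls)    # ['__new__', '__init__']
print(t.value)  # 42
```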
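+
+`__new__` is rarely used, but it has its use cases, especially when subclassing immutable types such as tuple or str.
+
+`__call__` lets an instance of a class be called like a function.
+
+`__getitem__` defines the behavior of getting an item from a container, i.e. `self[key]`.
+
+`__getattr__` defines the behavior when the user tries to access an attribute that does not exist.
+
+`__setattr__` defines the behavior when an attribute is set.
+
+`__getattribute__` defines the behavior whenever an attribute is accessed.
+
 ### 92. How to implement a read-only attribute in OOP?
+
+Make the attribute private and provide a public method as the interface for reading it.
+
+```python
+class person:
+    def __init__(self, x):
+        self.__age = x  # name-mangled to _person__age
+    def age(self):
+        return self.__age
+t = person(22)
+# t.__age =100
+print(t.age())  # 22
+```
+
+The best approach is a property without a setter:
+
+```python
+class MyCls(object):
+    __weight = 50
+
+    @property
+    def weight(self):
+        return self.__weight  # MyCls().weight = 1 raises AttributeError: no setter
+
+```
+
 ### 93. Describe your understanding of object orientation
+Object orientation is defined in contrast to procedural programming. Procedural languages follow a design method based on functional analysis and centered on algorithms. Object orientation is a design philosophy based on structural analysis and centered on data. A key concept in object-oriented languages is the class. Object orientation has three main features: encapsulation, inheritance, and polymorphism.
+
 ## Regular Expressions
 ### 94. Write code that uses a regular expression to match an IP address
+
 ### 95. Given a = "abbbccc", use a regex to turn it into "abccc", keeping a single b however many there are
+    Idea: replace any run of b's with a single b
+
+    re.sub(r'b+', 'b', a)
 ### 96. String search and replace in Python?
-### 97. When matching HTML g tags in Python, what is the difference between <.> and <.*?>
+    a. str.find(): forward substring search
+    Prototype:
+    str.find(substr [,pos_start [,pos_end ] ] )
+    Returns the index of the first character of the first occurrence of substr in str, or -1 if str does not contain substr. In other words, it is the position of the first occurrence counting from the left.
+
+    Parameters:
+    str: the original string
+    substr: the substring to search for
+    pos_start: the position to start searching from; defaults to index 0
+    pos_end: the position to stop searching at
+
+    Example:
+    'aabbcc'.find('bb')  # 2
+
+    b. str.index(): forward substring search
+    index() is similar to find(). It also returns the position of the first occurrence of the substring, but unlike find() it raises ValueError when the substring is not found.
+
+    Prototype:
+    str.index(substr [, pos_start, [ pos_end ] ] )
+
+    Parameters:
+    str: the original string
+    substr: the substring to search for
+    pos_start: the position to start searching from; defaults to index 0
+    pos_end: the position to stop searching at
+
+    Example:
+    'acdd l1 23'.index(' ')  # 4
+
+    c. str.rfind(): reverse substring search
+
+    Prototype:
+    str.rfind( substr [, pos_start [,pos_end ] ])
+    Returns the index of the first character of the last occurrence of substr in str, or -1 if str does not contain substr. In other words, it is the first occurrence counting from the right.
+
+    Parameters:
+    str: the original string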
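+    substr: the substring to search for
+    pos_start: the position to start searching from; defaults to index 0
+    pos_end: the position to stop searching at
+
+    Example:
+    'adsfddf'.rfind('d')  # 5
+
+    d. str.rindex(): reverse substring search
+    rindex() is similar to rfind(). It returns the position of the last occurrence of the substring, but unlike rfind() it raises ValueError when the substring is not found.
+
+    Prototype:
+    str.rindex(substr [, pos_start, [ pos_end ] ] )
+
+    Parameters:
+    str: the original string
+    substr: the substring to search for
+    pos_start: the position to start searching from; defaults to index 0
+    pos_end: the position to stop searching at
+
+    Example:
+    'adsfddf'.rindex('d')  # 5
+
+    e. Search and replace with the re module:
+Function | Description
+---|---
+re.match(pat, s) | Matches only at the beginning of s. For example, ('123', '12345') matches but ('123', '01234') does not. Returns None on failure and a match object on success
+re.search(pat, s) | Matches anywhere in s. For example, ('123', '01234') matches, because s only needs to contain a substring that matches pat. Returns None on failure and a match object on success
+re.sub(pat, repl, s) | Replaces every substring of s that matches pat. If repl is a str, each match is replaced with repl. If repl is a function, each match is replaced with the function's return value. sub also takes two optional parameters: count, the maximum number of replacements (default 0, meaning replace all), and flags (default 0)
+
+    f. Replacing with str.replace():
+    Usage: str.replace(old, new[, count])
+
+    old and new are required, and count is optional.
+    old is the substring to replace. It is a plain string, not a regular expression.
+    new is the replacement string.
+    count is a number.
+    Every occurrence of old is replaced with new, from left to right, at most count times (all occurrences if count is omitted).
+
+    s1='hello world'
+    s1.replace('world','liming')
+
+### 97. When matching HTML tags in Python, what is the difference between <.*> and <.*?>
+    The first is greedy and the second is non-greedy.
+    In regular expressions, ? normally means "match the preceding character or expression zero or one time", which is equivalent to {0,1}.
+    When ? follows *, +, ?, {n}, {n,}, or {n,m}, it switches to non-greedy mode. Non-greedy mode matches as little of the preceding character or expression as possible, here as few . (any character) as possible.
+
+    So the first form matches as much as possible (the longest string), while the second matches as little as possible (the shortest string).
+    For example, for tag>tag>end the first matches tag>tag>, and the second matches .
 ### 98. What is the difference between greedy and non-greedy regex matching?
+    Greedy mode:
+    Definition: the regex matches as much content as possible
+    Quantifiers: +, ?, *, {n}, {n,}, {n,m}
+    These quantifiers are greedy and consume as much input as possible
+
+    Non-greedy mode:
+    Definition: the regex matches as little content as possible. As soon as the pattern is satisfied, the match succeeds and does not go any further (in JavaScript, the g flag then continues with the next match)
+    Quantifiers: +?, ??, *?, {n}?, {n,}?, {n,m}?
+    Each non-greedy quantifier is simply the greedy one followed by ?
+
+    Reference: https://dailc.github.io/2017/07/06/regularExpressionGreedyAndLazy.html
+
 ### 99. Write a regex that matches strings that start with a letter or underscore and end with a digit
+    import re
+    s1='_aai0efe00'
+    res=re.findall(r'^[a-zA-Z_][a-zA-Z0-9_]*\d$',s1)  # the first character is required, not optional
+    print(res)  # ['_aai0efe00']
+
 ### 100. Regular expression operations
 ### 101. Match the JSON string in variable A.
 ### 102. How to filter emoji out of comments?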
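+    Idea: match the range of code points that emoji occupy and replace them with an empty string
+```python
+import re
+# In Python 3 an emoji is a single code point, and most emoji lie above U+FFFF,
+# so matching surrogate pairs finds nothing. Match the supplementary planes instead
+pattern = re.compile(u'[\U00010000-\U0010ffff]')
+text = "nice post \U0001F600"  # sample comment
+text = pattern.sub('', text)
+```
 ### 103. Briefly describe the difference between search and match in Python
+    match() only checks whether the pattern matches at the beginning of the string. It returns a match object on success and None otherwise.
+    search() scans the whole string until it finds the first match, then returns a match object with the match information. Call its group() method to get the matched string. If nothing matches, it returns None.
+
 ### 104. Write a Python regex that matches an IP address
 ### 105. What is the difference between match and search in Python?
+    See question 103
 ## System Programming
 ### 106. Summary of processes
@@ -1380,7 +1864,7 @@ def reader(q):
 
 def writer(q):
-    print("writer started (%s), parent process (%s)"%(os.getpid(),os.getpid()))
-    for i ini "itcast":
+    print("writer started (%s), parent process (%s)"%(os.getpid(),os.getppid()))
+    for i in "itcast":
         q.put(i)
 if __name__ == "__main__":
     print("(%s)start"%os.getpid())
@@ -1401,7 +1885,7 @@ if __name__ == "__main__":
 
 Coroutine: a lightweight user-space thread whose scheduling is entirely controlled by the user. A coroutine has its own register context and stack. When a coroutine is switched out, its register context and stack are saved elsewhere, and they are restored when it is switched back in. Because the stack is manipulated directly, there is essentially no kernel-switching overhead, global variables can be accessed without locks, and context switches are very fast.
 
-### 108. What are the use cases for exceptions in Python?
+### 108. What are the use cases for asynchronous programming in Python?
 Use cases for asynchronous programming:
 
 1. No shared resources are involved, or shared resources are only read, i.e. there are no mutually exclusive operations
@@ -1526,7 +2010,7 @@ if __name__=='__main___':
 ### 113. What is a deadlock?
 When several threads compete for system resources and each waits for another to release a resource it holds, none of them unlocks first and they all wait on each other forever. The program cannot proceed. This is a deadlock.
 
-GIL: Global Interpreter Lock (only exists in cython)
+GIL: Global Interpreter Lock (in CPython, the reference interpreter)
 
-Purpose: it prevents multiple threads from executing at the same time and ensures that only one thread runs at a time, so multithreading in cython is actually pseudo-multithreading!
+Purpose: it prevents multiple threads from executing Python bytecode at the same time and ensures that only one thread runs at a time. CPython threads therefore cannot run Python code in parallel, although they still help with I/O-bound work because the GIL is released while waiting on I/O.
 
@@ -1576,12 +2060,12 @@ GIL: Global Interpreter Lock (only exists in cython)
 ### 119. Are threads concurrent or parallel? Are processes concurrent or parallel?
 Threads are concurrent, and processes are parallel.
-Processes are independent of each other and are the smallest unit of resource allocation in the system. All threads in the same thread share resources.
+Processes are independent of each other and are the smallest unit of resource allocation in the system. All threads in the same process share resources.
 
 ### 120. Parallelism and concurrency?
 Parallelism: multiple tasks are running at the same instant
 
-Tasks do not run at the same instant; their execution is interleaved.
+Concurrency: tasks do not run at the same instant; their execution is interleaved.
 
 Libraries for parallelism: multiprocessing
@@ -1601,6 +2085,8 @@ The asyncio library uses Python's yield, which can interrupt and save the current function
 ### 123. How to forcibly close the connection between a client and a server?
 ### 124. Describe the differences between TCP and UDP and their pros and cons
 ### 125. Describe how a browser requests dynamic resources through WSGI
+Nginx receives the request sent by the browser. Based on the URL path or file extension, it serves static-resource requests from the static directory and forwards other requests to the configured port.
+The program that implements WSGI listens on that port. When it receives a request forwarded by Nginx (usually by reading the raw HTTP message with the socket's recv), it wraps the request into an `environ` dict and provides a `start_response` callable. These two objects are passed as arguments to the application, for example a function `wsgi_app(environ, start_response)` or an instance that implements `__call__(self, environ, start_response)`. The application calls `start_response` with the status and headers and returns the response body to the WSGI server (middleware), which passes it back to Nginx.
 ### 126. Describe what happens when a browser visits www.baidu.com
 ### 127. What are the differences between POST and GET requests?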
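 ### 128. What is the difference between cookie and session?
@@ -1705,7 +2191,7 @@ ioloop: a wrapper around I/O multiplexing; it implements a singleton
 What is CORS?
 CORS is a W3C standard whose full name is "Cross-origin resource sharing".
-It allows the browser to send XMLHttpRequest requests to cross-origin servers, thereby 客服 the restriction that AJAX can only be used within the same origin.
+It allows the browser to send XMLHttpRequest requests to cross-origin servers, thereby overcoming (克服) the restriction that AJAX can only be used within the same origin.
 
 What is CSRF?
 
@@ -1902,8 +2388,31 @@ Session uses a scheme that keeps state on the server side, while Cookie uses
 ## Web Crawlers
 ### 159. List at least three popular large-scale databases
 ### 160. Which Python packages have you used for web crawling?
+
+requests, urllib, urllib2, httplib2
+
 ### 161. Which database did you use to store crawled data, and why?
+
 ### 162. Which crawler frameworks or modules have you used? What are their pros and cons?
+
+Built into Python: urllib, urllib2 (Python 2; in Python 3 they were merged into urllib.request and urllib.parse)
+
+Third-party: requests
+
+Framework: Scrapy
+
+Both urllib and urllib2 handle requests to URLs, but they provide different functionality.
+
+urllib2: urllib2.urlopen accepts either a Request object or a URL (passing a Request object lets you set the headers for the URL), whereas urllib.urlopen accepts only a URL.
+
+urllib provides urlencode and urllib2 does not, which is why urllib and urllib2 are often used together.
+
+Scrapy is a packaged framework that includes a downloader, parsers, logging, and exception handling, and it processes requests asynchronously with Twisted (an event-driven engine) rather than with multiple threads. It works well for crawling a single fixed site, but for crawling many sites (say 100) its concurrency and distributed processing are not flexible enough and are hard to adjust and extend.
+
+requests is an HTTP library used only for making requests. It is powerful, but downloading and parsing are entirely up to you, which makes it very flexible.
+
+Scrapy advantages: asynchronous, XPath support, powerful statistics and logging, support for different URLs, a shell that is convenient for standalone debugging, easy-to-write middleware for filtering, and pipelines for storing data in a database.
+
 ### 163. For writing crawlers, is multiprocessing or multithreading better?
 ### 164. What are common anti-crawling measures, and how do you deal with them?
 ### 165. Which web page parsers are used most?
@@ -1943,9 +2452,49 @@ Session uses a scheme that keeps state on the server side, while Cookie uses
 # Databases
 ## MySQL
 ### 198. Primary key, super key, candidate key, foreign key
+
+Primary key: a column or combination of columns (attributes) that uniquely and completely identifies each data object stored in a table. A table can have only one primary key, and primary key values cannot be missing, i.e. cannot be NULL.
+
+Super key: a set of attributes that uniquely identifies a tuple in a relation. A single attribute can be a super key, and so can a combination of attributes. Super keys include candidate keys and primary keys.
+
+Candidate key: a minimal super key, i.e. a super key with no redundant attributes.
+
+Foreign key: a column in one table that holds the primary key of another table is a foreign key of this table.
+
 ### 199. What does a view do? Can a view be modified?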
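A minimal sketch using Python's built-in `sqlite3` module shows that a view stores only a query and reflects later changes to its base table. SQLite is used here only because it ships with Python. It is not MySQL, and SQLite views are read-only, so this sketch does not demonstrate updatable views. The table and view names are made up for illustration:

```python
import sqlite3

# In-memory SQLite database; the table and view names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                 [("alice", 30), ("bob", 17)])

# The view stores only the query, not a copy of the rows
conn.execute("CREATE VIEW adults AS SELECT name FROM users WHERE age >= 18")
print(conn.execute("SELECT name FROM adults").fetchall())  # [('alice',)]

# A later change to the base table shows up through the view automatically
conn.execute("INSERT INTO users (name, age) VALUES ('carol', 40)")
print(conn.execute("SELECT name FROM adults ORDER BY name").fetchall())
# [('alice',), ('carol',)]
```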
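+
+A view is a virtual table. Unlike a table that holds data, a view contains only a query that retrieves data dynamically when the view is used, and it holds no columns or data of its own. Views can simplify complex SQL operations, hide implementation details, and protect data. Once created, they can be used the same way as tables.
+
+Views cannot be indexed and cannot have associated triggers or default values. If the view definition contains ORDER BY, an ORDER BY applied when querying the view overrides it.
+
+Create a view: create view xxx as xxxxxx
+
+Some views can be updated, for example views that do not use joins, subqueries, grouping, aggregate functions, DISTINCT, UNION, and so on. Updating such a view updates the base table. However, views are mainly used to simplify retrieval and protect data rather than for updates, and most views are not updatable.
+
 ### 200. Differences among drop, delete, and truncate
+
+drop removes the table itself. truncate removes the table's data, after which the auto-increment id starts again from 1. delete removes data and can take a where clause.
+
+1. delete removes rows one at a time and records each deletion in the transaction log so it can be rolled back. truncate table removes all the data at once without logging individual row deletions, so the removed rows cannot be recovered. It does not fire the table's delete triggers, and it runs fast.
+
+2. Space used by the table and indexes: after truncate, the space used by the table and its indexes returns to its initial size, while delete does not reduce it. drop releases all the space used by the table.
+
+3. In general, in terms of speed: drop > truncate > delete.
+
+4. Scope: truncate can only be applied to tables, while delete can be applied to tables and views.
+
+5. truncate and delete remove only data, whereas drop removes the whole table (structure and data).
+
+6. truncate and delete without where remove only the data, not the table's structure (definition). drop removes the table's structure together with the constraints, triggers, and indexes that depend on it. Stored procedures and functions that depend on the table are kept, but (in Oracle) their status becomes INVALID.
+
 ### 201. How indexes work and their types
+
+A database index is a sorted data structure in a database management system that helps locate data in tables quickly for queries and updates. Indexes are usually implemented with B-trees and their variant, B+ trees.
+
+Besides the data itself, the database system maintains data structures that support particular search algorithms and reference (point to) the data in some way, so that advanced search algorithms can run on them. These data structures are indexes.
+
+Indexes come at a cost. First, they increase the database's storage space. Second, inserting and modifying data takes longer, because the indexes must be updated as well.
+
 ### 202. Types of joins
 ### 203. Approaches to database optimization
 ### 204. Difference between stored procedures and triggers
@@ -1954,10 +2503,67 @@ Session uses a scheme that keeps state on the server side, while Cookie uses
 ## Redis
 ### 207. How do you deal with Redis going down?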
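+
+Going down: the server stops providing service.
+
+With only one Redis instance, data will inevitably be lost and cannot be recovered.
+
+With multiple Redis instances or a Redis cluster, how to handle downtime depends on the master/slave role:
+
+1. A slave goes down. Slaves are configured when master-slave replication is set up, and each slave reads the master's operation log. After a slave restarts, it automatically rejoins the master-slave topology and resynchronizes its data.
+
+2. The master goes down. If persistence is enabled, do not restart the master right away, or data may be lost. The correct procedure is: on a slave, run SLAVEOF NO ONE to break the replication relationship and promote it to master. Then restart the old master and run SLAVEOF on it to make it a slave of the new master, so replication backs up the data automatically.
+
+This procedure is easy to misconfigure. Redis Sentinel automates it, so the simple approach is to use Sentinel.
+
 ### 208. Differences between Redis and Memcached, and their use cases
+
+Differences
+
+1. Both Redis and Memcached keep data in memory, so both are in-memory databases. Memcached can also be used to cache other things, such as images and videos.
+
+2. Redis supports not only simple key/value data but also data structures such as list, set, and hash.
+
+3. Virtual memory: when Redis runs out of physical memory, it can swap values that have not been used for a long time to disk (a feature of early Redis versions that was later removed).
+
+4. Expiration: in Memcached the expiry is specified at set time, e.g. `set key1 0 0 8` never expires. In Redis it can be set with commands such as expire, e.g. `expire name 10`.
+
+5. Distribution: a Memcached cluster can use magent to set up one master with multiple slaves, and Redis supports one master with multiple slaves. Both support one master with one slave.
+
+6. Data safety: when Memcached goes down, the data is gone, while Redis can periodically save data to disk (persistence).
+
+7. Disaster recovery: Memcached data cannot be recovered after a crash, while lost Redis data can be recovered from the AOF.
+
+8. Redis supports data backup, i.e. master-slave backup.
+
+9. Different use cases: besides serving as a NoSQL database, Redis can be used as a message queue, a data stack, and a data cache. Memcached is suited to caching SQL statements, data sets, temporary user data, delayed query data, sessions, and so on.
+
+Use cases
+
+1. If you need persistence or have requirements on data types and processing, choose Redis.
+
+2. For simple key/value storage, choose Memcached.
+
 ### 209. How should a Redis cluster be built? What options are there?
+
+1. Codis
+
+A widely used cluster solution. It achieves basically the same effect as twemproxy, but it can migrate data from old nodes to new hash nodes when the number of nodes changes.
+
+2. Redis Cluster, which ships with Redis 3.0. Its distribution algorithm is not consistent hashing but hash slots, and it natively supports configuring replica nodes. See the official documentation for details.
+
+3. Implement it in the business code: run several unrelated Redis instances, hash each key in the code, and operate on the corresponding instance. This approach places high demands on the hashing code. You must handle a replacement algorithm when a node fails, script-based recovery after data reshuffling, instance monitoring, and so on.
+
 ### 210. How does the Redis eviction process work?
+A client runs a new command that adds new data.
+
+Redis checks memory usage. If it exceeds the maxmemory limit, keys are evicted according to the configured policy.
+
+Then another new command runs, and so on. Memory usage keeps crossing the limit and is brought back below it by eviction.
+
+If a command's result uses a lot of memory (for example, storing the intersection of very large sets in a new key), this usage soon pushes memory past the limit.
+
 ## MongoDB
 ### 211. What command updates multiple records in MongoDB?
 ### 212. How does MongoDB scale out to multiple shards?
@@ -1985,9 +2591,191 @@ Session uses a scheme that keeps state on the server side, while Cookie uses
 ### 230. How do you determine whether a singly linked list has a cycle?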
 ### 231. Which sorting algorithms do you know? (usually tested through problems)
 ### 232. Fibonacci sequence
+
+**Definition of the sequence:**
+
+f(0) = f(1) = 1
+
+f(n) = f(n-1) + f(n-2)
+
+#### By definition
+
+Very slow, and beware of stack overflow ⚠️. The running time is `O(fibonacci n)`
+
+```python
+def fibonacci(n):
+    if n == 0 or n == 1:
+        return 1
+    return fibonacci(n - 1) + fibonacci(n - 2)
+```
+
+#### Linear time
+
+**State / loop**
+
+```python
+def fibonacci(n):
+    a, b = 1, 1
+    for _ in range(n):
+        a, b = b, a + b
+    return a
+```
+
+**Recursion**
+
+```python
+def fibonacci(n):
+    def fib(n_, s):
+        if n_ == 0:
+            return s[0]
+        a, b = s
+        return fib(n_ - 1, (b, a + b))
+    return fib(n, (1, 1))
+```
+
+**map (zipWith)**
+
+Unlike Haskell's shared lazy list, `fibs_` and `fibs__` are independent generators that each recompute the sequence, so in Python this version actually takes exponential time.
+
+```python
+def fibs():
+    yield 1
+    fibs_ = fibs()
+    yield next(fibs_)
+    fibs__ = fibs()
+    for fib in map(lambda a, b: a + b, fibs_, fibs__):
+        yield fib
+
+
+def fibonacci(n):
+    fibs_ = fibs()
+    for _ in range(n):
+        next(fibs_)
+    return next(fibs_)
+```
+
+**Caching**
+
+```python
+def cache(fn):
+    cached = {}
+    def wrapper(*args):
+        if args not in cached:
+            cached[args] = fn(*args)
+        return cached[args]
+    wrapper.__name__ = fn.__name__
+    return wrapper
+
+@cache
+def fib(n):
+    if n < 2:
+        return 1
+    return fib(n-1) + fib(n-2)
+```
+
+**Caching with functools.lru_cache**
+
+```python
+from functools import lru_cache
+
+@lru_cache(maxsize=32)
+def fib(n):
+    if n < 2:
+        return 1
+    return fib(n-1) + fib(n-2)
+```
+
+#### Logarithmic
+
+**Matrix**
+
+```python
+import numpy as np
+def fibonacci(n):
+    # the int64 entries overflow for large n (roughly n > 90)
+    return (np.matrix([[0, 1], [1, 1]]) ** n)[1, 1]
+```
+
+**Without a matrix**
+
+```python
+def fibonacci(n):
+    def fib(n):
+        if n == 0:
+            return (1, 1)
+        elif n == 1:
+            return (1, 2)
+        a, b = fib(n // 2 - 1)
+        c = a + b
+        if n % 2 == 0:
+            return (a * a + b * b, c * c - a * a)
+        return (c * c - a * a, b * b + c * c)
+    return fib(n)[0]
+```
+
 ### 233. How do you reverse a singly linked list?
+
+```python
+class Node:
+    def __init__(self, data=None, next=None):
+        self.data = data
+        self.next = next
+
+def rev(link):
+    pre = None
+    cur = link
+    while cur:
+        temp = cur.next  # remember the rest of the list
+        cur.next = pre   # point the current node backwards
+        pre = cur
+        cur = temp
+    return pre  # the old tail is the new head (None for an empty list)
+
+if __name__ == '__main__':
+    link = Node(1, Node(2, Node(3, Node(4, Node(5, Node(6, Node(7, Node(8, Node(9)))))))))
+    root = rev(link)
+    while root:
+        print(root.data)
+        root = root.next
+```
+
+
+
 ### 234. The frog-jumping-stairs problem
+
+A frog wants to jump up a staircase of n steps. It can jump one step or two steps at a time. In how many ways can it reach the top?
+
+Method 1: recursion
+
+Let f(n) be the number of ways to reach step n, and split them into two groups. In the first group the last jump is one step, which gives f(n-1) ways. In the second group the last jump is two steps, which gives f(n-2) ways. This gives the recurrence f(n) = f(n-1) + f(n-2), with f(1) = 1 and f(2) = 2. The code is simple but takes exponential time and will exceed time limits.
+
+```python
+class Solution:
+    def climbStairs(self,n):
+        if n ==1:
+            return 1
+        elif n==2:
+            return 2
+        else:
+            return self.climbStairs(n-1) + self.climbStairs(n-2)
+```
+
+Method 2: replace the recursion with a loop
+
+```python
+class Solution:
+    def climbStairs(self,n):
+        if n==1 or n==2:
+            return n
+        a,b,c = 1,2,3
+        for i in range(3,n+1):
+            c = a+b
+            a = b
+            b = c
+        return c
+```
+
 ### 235. Two Sum
+
+
+
 ### 236. Search in Rotated Sorted Array
 ### 237. Implement a Stack data structure in Python
 ### 238. Write a binary search
@@ -1999,3 +2787,9 @@ Session uses a scheme that keeps state on the server side, while Cookie uses
 ### 243. Count the most frequent words in a text file of about 10,000 lines
 ### 244. How do you find the most frequently repeated item in massive data?
 ### 245. Determine whether an item exists in a large dataset
+
+## Architecture
+
+### [Python backend architecture evolution]()
+
+This article covers almost every architecture used with Python. In an interview you can sketch the architecture diagram by hand and, based on your own projects, discuss your technology choices, their pros and cons, and the pitfalls you ran into. It will definitely earn you extra points.
diff --git a/README_EN.md b/README_EN.md
new file mode 100644
index 0000000..5e40424
--- /dev/null
+++ b/README_EN.md
@@ -0,0 +1,2528 @@
+
+
+# Python Basics
+## File operations
+### 1. There is a jsonline-format file file.txt of about 10K
+```python
+def get_lines():
+    with open('file.txt','rb') as f:
+        return f.readlines()
+
+if __name__ =='__main__':
+    for e in get_lines():
+        process(e) # Process each line of data
+```
+Now we have to process a 10G file, but there is only 4G of memory.
+If only the get_lines function is modified and the rest of the code stays unchanged, how can this be done? What issues need to be considered?
+```python
+def get_lines():
+    with open('file.txt','rb') as f:
+        for i in f:
+            yield i
+```
+Personally, I think it is better to read a batch of lines at a time. Otherwise there are too many read calls. Each item yielded is then a list of lines, so `process()` must handle a whole batch.
+```python
+def get_lines():
+    with open('file.txt','rb') as f:
+        while True:
+            data = f.readlines(60000)  # lines totalling roughly 60000 bytes
+            if not data:  # an empty list means end of file
+                break
+            yield data
+```
+Method provided by Pandaaaa906
+```python
+from mmap import mmap
+
+
+def get_lines(fp):
+    with open(fp,"r+") as f:
+        m = mmap(f.fileno(), 0)
+        tmp = 0
+        for i, char in enumerate(m):
+            if char==b"\n":
+                yield m[tmp:i+1].decode()
+                tmp = i+1
+
+if __name__=="__main__":
+    for i in get_lines("fp_some_huge_file"):
+        print(i)
+```
+The issues to consider: with only 4G of memory, the 10G file cannot be read in one go, so the data must be read in batches while keeping track of where each read ended. If the batches are too small, too much time is spent on read operations.
+https://stackoverflow.com/questions/30294146/python-fastest-way-to-process-large-file
+
+### 2. Fill in the missing code
+```python
+def print_directory_contents(s_path):
+    """
+    This function takes the name of a folder as input
+    and prints the paths of the files in that folder,
+    recursing into its subfolders
+    """
+    import os
+    for s_child in os.listdir(s_path):
+        s_child_path = os.path.join(s_path, s_child)
+        if os.path.isdir(s_child_path):
+            print_directory_contents(s_child_path)
+        else:
+            print(s_child_path)
+```
+## Modules and packages
+### 3. Given a date as input, determine which day of the year it is
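This is a shorter alternative, offered as a sketch: `datetime.date.timetuple()` already exposes the day of the year as `tm_yday`. The function name `day_of_year` and the use of plain arguments instead of `input()` are assumptions made for illustration:

```python
import datetime


def day_of_year(year: int, month: int, day: int) -> int:
    # timetuple().tm_yday is 1 for January 1st and accounts for leap years
    return datetime.date(year, month, day).timetuple().tm_yday


print(day_of_year(2024, 3, 1))  # 61 (2024 is a leap year)
print(day_of_year(2023, 3, 1))  # 60
```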
+```python
+import datetime
+def dayofyear():
+    year = input("Please enter the year: ")
+    month = input("Please enter the month: ")
+    day = input("Please enter the day: ")
+    date1 = datetime.date(year=int(year),month=int(month),day=int(day))
+    date2 = datetime.date(year=int(year),month=1,day=1)
+    return (date1-date2).days+1
+```
+### 4. Shuffle a sorted list object alist
+```python
+import random
+alist = [1,2,3,4,5]
+random.shuffle(alist)
+print(alist)
+```
+## Data types
+### 5. Given the dictionary d = {'a':24,'g':52,'i':12,'k':33}, sort it by value
+```python
+sorted(d.items(),key=lambda x:x[1])
+```
+x[0] sorts by key, and x[1] sorts by value.
+### 6. Dictionary comprehension
+```python
+d = {key:value for (key,value) in iterable}
+```
+### 7. Reverse the string "aStr"
+```python
+print("aStr"[::-1])
+```
+### 8. Turn the string "k:1 |k1:2|k2:3|k3:4" into the dictionary {k:1,k1:2,...}
+```python
+str1 = "k:1|k1:2|k2:3|k3:4"
+def str2dict(str1):
+    dict1 = {}
+    for items in str1.split('|'):
+        key,value = items.split(':')
+        dict1[key] = int(value)
+    return dict1
+# Dictionary comprehension
+d = {k:int(v) for t in str1.split("|") for k, v in (t.split(":"), )}
+```
+### 9. Sort alist by the age of its elements, from oldest to youngest
+```python
+alist = [{'name':'a','age':20},{'name':'b','age':30},{'name':'c','age':25}]
+def sort_by_age(list1):
+    return sorted(list1,key=lambda x:x['age'],reverse=True)
+```
+### 10. What will the following code output?
+```python
+list = ['a','b','c','d','e']
+print(list[10:])
+```
+The code outputs [] and does not raise an IndexError. As expected, indexing a list with a position beyond its length, e.g. list[10], raises an IndexError. However, taking a slice whose start index exceeds the length of the list does not raise an IndexError. It simply returns an empty list.
+This can become a particularly nasty bug: nothing fails at runtime, so it is hard to track down.
+### 11. Write a list comprehension that generates an arithmetic sequence with a common difference of 11
+```python
+print([x*11 for x in range(10)])
+```
+### 12. Given two lists, how do you find the common elements and the differing elements?
+```python
+list1 = [1,2,3]
+list2 = [3,4,5]
+set1 = set(list1)
+set2 = set(list2)
+print(set1 & set2)
+print(set1 ^ set2)
+```
+### 13. Write Python code that removes duplicate elements from a list
+```python
+l1 = ['b','c','d','c','a','a']
+l2 = list(set(l1))
+print(l2)
+```
+To keep the original order, use the list's sort method:
+```python
+l1 = ['b','c','d','c','a','a']
+l2 = list(set(l1))
+l2.sort(key=l1.index)
+print(l2)
+```
+It can also be written like this:
+```python
+l1 = ['b','c','d','c','a','a']
+l2 = sorted(set(l1),key=l1.index)
+print(l2)
+```
+You can also loop over the list:
+```python
+l1 = ['b','c','d','c','a','a']
+l2 = []
+for i in l1:
+    if not i in l2:
+        l2.append(i)
+print(l2)
+```
+### 14. Given two lists A and B, find the elements they have in common and the elements that differ
+```python
+print(set(A) & set(B))  # elements in both A and B
+print(set(A) ^ set(B))  # elements in exactly one of A and B
+```
+## Company Interview Questions
+### 15. What is the difference between new-style and classic classes in Python?
+a. In Python, all classes that inherit from object are new-style classes
+
+b. Python 3 has only new-style classes
+
+c. In Python 2, classes that inherit from object are new-style classes, and classes without a parent class are classic classes
+
+d. Classic classes are basically no longer used in Python
+
+e. Class and type are unified. For an instance a of a new-style class, a.__class__ and type(a) give the same result. For old-style classes they differ.
+
+f. The attribute lookup order under multiple inheritance differs.
+New-style classes use the C3 linearization (the order shown by `__mro__`), which resembles a breadth-first search in diamond-shaped hierarchies. Old-style classes use a depth-first, left-to-right search.
+
+### 16. How many built-in data structures does Python have?
+a. Integer int, long integer long, float float, complex complex
+
+b. String str, list list, tuple tuple
+
+c. Dictionary dict, set set
+
+d. Python 3 has no long, only an int with arbitrary precision
+
+### 17. How does Python implement the singleton pattern? Write two implementations
+Method 1: use a decorator
+```python
+def singleton(cls):
+    instances = {}
+    def wrapper(*args, **kwargs):
+        if cls not in instances:
+            instances[cls] = cls(*args, **kwargs)
+        return instances[cls]
+    return wrapper
+
+
+@singleton
+class Foo(object):
+    pass
+foo1 = Foo()
+foo2 = Foo()
+print(foo1 is foo2) # True
+```
+Method 2: use a base class
+`__new__` is the method that actually creates the instance object. Overriding the base class's `__new__` therefore ensures that only one instance is ever created.
+```python
+class Singleton(object):
+    def __new__(cls, *args, **kwargs):
+        if not hasattr(cls,'_instance'):
+            # object.__new__ accepts only cls in Python 3, so the arguments are not forwarded
+            cls._instance = super(Singleton, cls).__new__(cls)
+        return cls._instance
+
+
+class Foo(Singleton):
+    pass
+
+foo1 = Foo()
+foo2 = Foo()
+
+print(foo1 is foo2) # True
+```
+Method 3: use a metaclass. A metaclass is the class used to create class objects. When a class object creates an instance, the metaclass's `__call__` is always invoked, so it is enough to make `__call__` create only one instance. `type` is Python's built-in metaclass.
+```python
+class Singleton(type):
+    def __call__(cls, *args, **kwargs):
+        if not hasattr(cls,'_instance'):
+            cls._instance = super(Singleton, cls).__call__(*args, **kwargs)
+        return cls._instance
+
+
+# Python2 (keep only the definition that matches your Python version)
+class Foo(object):
+    __metaclass__ = Singleton
+
+# Python3
+class Foo(metaclass=Singleton):
+    pass
+
+foo1 = Foo()
+foo2 = Foo()
+print(foo1 is foo2) # True
+
+```
+
+### 18. Reverse an integer, for example -123 --> -321
+```python
+class Solution(object):
+    def reverse(self,x):
+        if -105:
+            pass
+        else:
+            a.remove(i)
+    print(a)
+print('-----------')
+print(id(a))
+
+```
+```python
+#filter
+a=[1,2,3,4,5,6,7,8]
+b = filter(lambda x: x>5,a)
+print(list(b))
+```
+List comprehension
+```python
+a=[1,2,3,4,5,6,7,8]
+b = [i for i in a if i>5]
+print(b)
+```
+Delete in reverse order
+A list is normally traversed forwards. If you traverse it in reverse instead, deleting an element does not shift the elements that have not been visited yet, so their indices stay valid.
+```python
+a=[1,2,3,4,5,6,7,8]
+print(id(a))
+for i in range(len(a)-1,-1,-1):
+    if a[i]>5:
+        pass
+    else:
+        del a[i]  # delete by index; remove() would delete the first equal value instead
+print(id(a))
+print('-----------')
+print(a)
+```
+### 22. String manipulation problem
+A pangram is a sentence that contains every letter of the English alphabet, for example: A QUICK BROWN FOX JUMPS OVER THE LAZY DOG. Define and implement a function get_missing_letter that takes a string and returns the letters it is missing to be a pangram. Ignore case in the input string. The result should be all lowercase and sorted alphabetically (ignore all non-ASCII characters).
+
+**The following examples are for illustration; the double quotes are not part of the input:**
+
+(0) Input: "A quick brown fox jumps over the lazy dog"
+
+Returns: ""
+
+(1) Input: "A slow yellow fox crawls under the proactive dog"
+
+Returns: "bjkmqz"
+
+(2) Input: "Lions, and tigers, and bears, oh my!"
+
+Returns: "cfjkpquvwxz"
+
+(3) Input: ""
+
+Returns: "abcdefghijklmnopqrstuvwxyz"
+
+```python
+def get_missing_letter(a):
+    s1 = set("abcdefghijklmnopqrstuvwxyz")
+    s2 = set(a.lower())
+    ret = "".join(sorted(s1-s2))
+    return ret
+
+print(get_missing_letter("python"))
+
+# Other ways to generate the letters
+# range("a", "z") does not work: range() only accepts integers
+# Method one:
+import string
+letters = string.ascii_lowercase
+# Method two:
+letters = "".join(map(chr, range(ord('a'), ord('z') + 1)))
+```
+
+### 23. Mutable and Immutable Types
+1. Mutable types include list and dict. Immutable types include string, number, and tuple.
+
+2. When a mutable object is modified, the change is made in place at its existing memory address: the value in memory is modified directly and no new memory is allocated.
+
+3. When an immutable object is "changed", the value at the original memory address is left untouched: new memory is allocated, the value is copied into it, and the operation is performed on the new copy.
+
+### 24. What is the difference between is and ==?
+is: compares the id values of the two objects, i.e. whether they are the same object instance pointing to the same memory address
+
+==: compares whether the contents/values of the two objects are equal; by default it calls the object's `__eq__()` method
+### 25. Find all odd numbers in the list and construct a new list
+```python
+a = [1,2,3,4,5,6,7,8,9,10]
+res = [i for i in a if i%2==1]
+print(res)
+```
+### 26. Write 1+2+3+10248 with one line of python code
+```python
+from functools import reduce
+# 1. Use the built-in sum() function
+num = sum([1,2,3,10248])
+print(num)
+# 2. Use reduce()
+num1 = reduce(lambda x,y :x+y,[1,2,3,10248])
+print(num1)
+```
+### 27. What is the scope of variables in Python? (Variable search order)
+Function scopes are searched in LEGB order
+
+1. What is LEGB?
+
+L: local, the scope inside the current function
+
+E: enclosing, the scope of any enclosing function, between a nested function and the global scope
+
+G: global, the module-level scope
+
+B: built-in, the scope of Python's built-in names
+
+When Python looks up a name inside a function, it searches these four scopes, called LEGB, in exactly this order
+### 28. The string `"123"` is converted to `123` without using built-in api, such as `int()`
+Method 1: Use the `str` function
+```python
+def atoi(s):
+    num = 0
+    for v in s:
+        for j in range(10):
+            if v == str(j):
+                num = num * 10 + j
+    return num
+```
+Method 2: Use the `ord` function
+```python
+def atoi(s):
+    num = 0
+    for v in s:
+        num = num * 10 + ord(v)-ord('0')
+    return num
+```
+Method 3: Use the `eval` function
+```python
+def atoi(s):
+    num = 0
+    for v in s:
+        t = "%s * 1"% v
+        n = eval(t)
+        num = num * 10 + n
+    return num
+```
+Method 4: Combine method 2 with `reduce` for a one-line solution
+```python
+from functools import reduce
+def atoi(s):
+    return reduce(lambda num, v: num * 10 + ord(v)-ord('0'), s, 0)
+```
+### 29.Given an array of integers
+Given an integer array and a target value, find the two numbers in the array whose sum equals the target and return their indices. You may assume that each input has exactly one solution, and you may not use the same element twice. Example: given nums = [2,7,11,15] and target = 9, because nums[0] + nums[1] = 2 + 7 = 9, return [0,1]
+```python
+class Solution:
+    def twoSum(self,nums,target):
+        """
+        :type nums: List[int]
+        :type target: int
+        :rtype: List[int]
+        """
+        d = {}
+        size = 0
+        while size 0 and len(l2)>0:
+            if l1[0] 0:
+        tmp.append(l1[0])
+        del l1[0]
+    while len(l2)>0:
+        tmp.append(l2[0])
+        del l2[0]
+    return tmp
+```
+### 37. Given an arbitrary length array, implement a function
+that places all odd numbers before all even numbers, sorting the odd numbers in ascending order and the even numbers in descending order. For example, the string '1982376455' becomes '1355798642'
+```python
+# Method one
+def func1(l):
+    if isinstance(l, str):
+        l = [int(i) for i in l]
+    l.sort(reverse=True)
+    # Move each odd number to the front; because the list is sorted in descending order, the odd numbers end up ascending
+    for i in range(len(l)):
+        if l[i] % 2 > 0:
+            l.insert(0, l.pop(i))
+    print(''.join(str(e) for e in l))
+
+# Method Two
+def func2(l):
+    # Odd digits keep their value (1-9) as the key; even digits map to 20-x (12-20), so they sort after the odds in descending order
+    print("".join(sorted(l, key=lambda x: int(x)% 2 == 0 and 20-int(x) or int(x))))
+```
+### 38. Write a function to find the second largest number in an integer array
+```python
+def find_second_large_num(num_list):
+    """
+    Find the second largest number in the array
+    """
+    # Method one
+    # Sort directly, then take the second-to-last number
+    tmp_list = sorted(num_list)
+    print("Method One\nSecond_large_num is :", tmp_list[-2])
+
+    # Method Two
+    # Keep two variables: `one` holds the largest value seen so far and `two` the second largest; traverse the array once.
+    # If an element is greater than `one`, move `one` into `two` and store the element in `one`;
+    # otherwise, if it is greater than `two`, store it in `two`; otherwise skip it
+    one = num_list[0]
+    two = float('-inf')  # starting at num_list[0] would give a wrong answer when the first element is the maximum
+    for i in range(1, len(num_list)):
+        if num_list[i] > one:
+            two = one
+            one = num_list[i]
+        elif num_list[i] > two:
+            two = num_list[i]
+    print("Method Two\nSecond_large_num is :", two)
+
+    # Method Three
+    # Use reduce and the logical operators (and, or)
+    # The basic idea is the same as Method Two, but no if statements are needed.
+    from functools import reduce
+    num = reduce(lambda ot, x: ot[1] and <.*?>
+ The first is greedy matching, and the second is non-greedy matching;
+ In ordinary regular-expression syntax, ? means "match the preceding character or expression zero times or once", which is equivalent to {0,1}
+ When ? is used as a suffix after *, +, ?, {n}, {n,} or {n,m}, it switches to non-greedy mode: the preceding character or expression is matched as few times as possible. Here, that means matching as few arbitrary characters (.) as possible.
+
+ So the first form matches as much as possible, i.e. the matched string is as long as possible, and the second form matches as little as possible, i.e. the matched string is as short as possible.
+ For example, tag>tag>end, the first will match tag>tag>, and the second will match .
+### 98. What is the difference between regular expression greedy and non-greedy mode?
+ Greedy mode:
+ Definition: when matching, the regular expression tries to match as much content as possible
+ Identifiers: +, ?, *, {n}, {n,}, {n,m}
+ When one of these identifiers is encountered, the match is greedy and consumes as much content as possible
+
+ Non-greedy mode:
+ Definition: the regular expression matches as little content as possible while still satisfying the pattern; as soon as a match that meets the requirements is found, it succeeds immediately and does not continue (unless the g flag is set, which starts the next match)
+ Identifiers: +?, ??, *?, {n}?, {n,}?, {n,m}?
+ As you can see, the non-greedy identifiers follow a simple rule: each is a greedy identifier followed by a ?
+
+ Reference article: https://dailc.github.io/2017/07/06/regularExpressionGreedyAndLazy.html
+
+### 99. Write a regular expression that matches letters and underscores at the beginning and numbers at the end?
+    import re
+    s1='_aai0efe00'
+    # The first character must be a letter or underscore (not optional), and the last must be a digit
+    res=re.findall(r'^[a-zA-Z_][a-zA-Z0-9_]*\d$',s1)
+    print(res)
+
+### 100. Regular expression operations
+### 101. Please match the json string in variable A.
+### 102. How to filter emoji out of comments?
+ Idea: match the character range that emoji occupy and replace every match with an empty string
+```python
+import re
+# Python 3 strings hold code points, so most emoji are single characters above U+FFFF;
+# the surrogate-pair pattern u'[\uD800-\uDBFF][\uDC00-\uDFFF]' only worked on narrow Python 2 builds
+pattern = re.compile('[\U00010000-\U0010ffff]')
+text = 'great 👍'  # sample comment text
+print(pattern.sub('', text))
+```
+### 103. Briefly describe the difference between search and match in Python
+ The match() function only checks whether the pattern matches at the beginning of the string; it returns a match object if it does, and None otherwise;
+ The search() function scans the entire string until it finds the first location where the pattern matches, then returns an object containing the match information; the matched string can be obtained by calling the object's group() method. If nothing matches, it returns None.
+
+### 104. Please write a Python regular expression that matches ip
+### 105. What is the difference between match and search in Python?
+ See question 103
+
+## System Programming
+### 106. Process summary
+Process: a running instance of a program on the operating system is called a process. A process needs corresponding system resources: memory, CPU time slices, and a pid.
+Creating a process:
+First, import Process from multiprocessing;
+Create a Process object;
+Arguments for the task can be passed when the Process object is created:
+```python
+p = Process(target=XXX, args=(tuple,), kwargs={key: value})
+# target: the task function XXX (pass the function itself, without adding ())
+# args / kwargs: positional arguments (a tuple) and keyword arguments (a dict) passed to the task function
+```
+Call start() to start the process, and terminate() to end it
+Demo: passing arguments to the child process's task function
+```python
+import os
+from multiprocessing import Process
+import time
+
+def pro_func(name,age,**kwargs):
+    for i in range(5):
+        print("The child process is running, name=%s,age=%d,pid=%d"%(name,age,os.getpid()))
+        print(kwargs)
+        time.sleep(0.2)
+if __name__ == "__main__":
+    # Create the Process object
+    p = Process(target=pro_func,args=('小明',18),kwargs={'m':20})
+    # Start the process
+    p.start()
+    time.sleep(1)
+    # After 1 second, terminate the child process immediately
+    p.terminate()
+    p.join()
+```
+Note: global variables are not shared between processes
+
+Inter-process communication: Queue
+
+When initializing a Queue object (for example, q = Queue()), if no maximum number of messages is given in the parentheses, or the given number is zero or negative, the number of messages the queue accepts has no upper limit (until memory runs out)
+
+Queue.qsize(): returns the number of messages currently in the queue
+
+Queue.empty(): returns True if the queue is empty, otherwise False
+
+Queue.full(): returns True if the queue is full, otherwise False
+
+Queue.get([block[,timeout]]): takes a message from the queue and removes it from the queue;
+
+block defaults to True.
+
+If block keeps its default value and no timeout (in seconds) is set, then when the queue is empty the program blocks (waits in the reading state) until a message can be read from the queue. If timeout is set, it waits up to timeout seconds and raises a "queue.Empty" exception if no message has been read by then. If block is False and the queue is empty, "queue.Empty" is raised immediately;
+
+Queue.get_nowait(): equivalent to Queue.get(False)
+
+Queue.put(item,[block[,timeout]]): writes the message item to the queue; block defaults to True;
+If block keeps its default value and no timeout (in seconds) is set, then when the queue has no free space the program blocks (waits in the writing state) until space becomes available. If timeout is set, it waits up to timeout seconds and raises a "queue.Full" exception if there is still no space;
+If block is False and the queue has no free space, a "queue.Full" exception is raised immediately;
+Queue.put_nowait(item): equivalent to Queue.put(item, False)
+
+Demo of inter-process communication:
+```python
+from multiprocessing import Process, Queue
+import os,time,random
+# Code executed by the writer process:
+def write(q):
+    for value in ['A','B','C']:
+        print("Put %s to queue..." % value)
+        q.put(value)
+        time.sleep(random.random())
+# Code executed by the reader process:
+def read(q):
+    while True:
+        if not q.empty():
+            value = q.get(True)
+            print("Get %s from queue." % value)
+            time.sleep(random.random())
+        else:
+            break
+if __name__=='__main__':
+    # The parent process creates a Queue and passes it to each child process
+    q = Queue()
+    pw = Process(target=write,args=(q,))
+    pr = Process(target=read,args=(q,))
+    # Start the writer child process pw:
+    pw.start()
+    # Wait for pw to finish
+    pw.join()
+    # Start the reader child process pr:
+    pr.start()
+    # pr leaves its loop once the queue is empty, so join() returns
+    pr.join()
+    print('')
+    print('All data are written and read')
+```
+Process pool: Pool
+```python
+#coding:utf-8
+from multiprocessing import Pool
+import os,time,random
+
+def worker(msg):
+    t_start = time.time()
+    print("%s starts to execute, the process number is %d"%(msg,os.getpid()))
+    # random.random() generates a random float between 0 and 1
+    time.sleep(random.random()*2)
+    t_stop = time.time()
+    print(msg,"Execution completed, time-consuming %0.2f"%(t_stop-t_start))
+
+if __name__ == "__main__":
+    po = Pool(3)  # Define a process pool with at most 3 processes
+    for i in range(0,10):
+        po.apply_async(worker,(i,))
+    print("---start----")
+    po.close()  # stop accepting new tasks
+    po.join()   # wait for all tasks in the pool to finish
+    print("----end----")
+```
+Using a Queue in a process pool
+
+To share a Queue between processes created with Pool, use the Queue() from multiprocessing.Manager() instead of multiprocessing.Queue(); otherwise you will get the following error message:
+
+RuntimeError: Queue objects should only be shared between processes through inheritance
+```python
+from multiprocessing import Manager,Pool
+import os,time,random
+def reader(q):
+    print("reader start (%s), parent process is (%s)"%(os.getpid(),os.getppid()))
+    for i in range(q.qsize()):
+        print("reader gets the message from Queue:%s"%q.get(True))
+
+def writer(q):
+    print("writer started (%s), parent process is (%s)"%(os.getpid(),os.getppid()))
+    for i in "itcast":
+        q.put(i)
+if __name__ == "__main__":
+    print("(%s)start"%os.getpid())
+    q = Manager().Queue()  # Use the Queue provided by Manager
+    po = Pool()
+    po.apply_async(writer,(q,))
+    time.sleep(1)
+    po.apply_async(reader,(q,))
+    po.close()
+    po.join()
+    print("(%s)End"%os.getpid())
+```
+### 107. Talk about your understanding of multi-processes, multi-threads, and coroutines. Does the project use it?
+This question covers a lot of ground.
+Process: a running program (code) is a process; code that is not running is called a program. The process is the smallest unit of system resource allocation and has its own independent memory space, so data is not shared between processes, and the overhead is high.
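To illustrate the claim that data is not shared between processes, here is a minimal sketch. It assumes a POSIX system where the `fork` start method is available, and the names `counter`, `increment` and `run_demo` are hypothetical. The child process changes its own copy of a global variable, and the parent's copy stays unchanged.

```python
from multiprocessing import get_context

counter = 0  # module-level global variable


def increment():
    # Runs in the child process and changes only the child's copy of `counter`
    global counter
    counter += 100


def run_demo():
    # Assumption: POSIX system; the "fork" start method is not available on Windows
    ctx = get_context("fork")
    p = ctx.Process(target=increment)
    p.start()
    p.join()
    return counter  # the parent's copy is untouched


if __name__ == "__main__":
    print(run_demo())
```

If `ctx.Process` were replaced by `threading.Thread`, the same function would return 100, because threads share their process's memory.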
+
+Thread: the smallest unit of CPU scheduling, also called an execution path. A thread cannot exist on its own; it depends on a process, and every process has at least one thread, the main thread. Multiple threads share memory (shared data and shared global variables), which greatly improves the program's efficiency.
+
+Coroutine: a lightweight thread in user mode whose scheduling is entirely controlled by the user. A coroutine has its own register context and stack. When a coroutine is switched out, its register context and stack are saved elsewhere; when it is switched back, the saved register context and stack are restored. Because the stack is manipulated directly, there is essentially no kernel switching overhead, and global variables can be accessed without locking, so context switches are very fast.
+
+### 108. What are the asynchronous usage scenarios of Python?
+Asynchronous usage scenarios:
+
+1. No shared resources are involved, or the shared resources are read-only, i.e. the operations are not mutually exclusive
+
+2. There are no strict ordering requirements
+
+3. No atomic operations are required, or atomicity can be ensured by other means
+
+4. Time-consuming operations such as IO, which would otherwise hurt user experience and performance
+
+5. The work does not affect the logic of the main thread
+
+### 109. How do multiple threads use a mutex to synchronize access to the same data?
+```python
+import threading
+import time
+
+class MyThread(threading.Thread):
+    def run(self):
+        global num
+        time.sleep(1)
+        # Only the thread holding the mutex can update num
+        with mutex:
+            num += 1
+            msg = self.name + ' set num to ' + str(num)
+            print(msg)
+
+num = 0
+mutex = threading.Lock()
+
+def test():
+    for i in range(5):
+        t = MyThread()
+        t.start()
+
+if __name__ == "__main__":
+    test()
+```
+### 110. What is multi-threaded competition?
+Threads are not independent: threads in the same process share data. When several threads access the same data at almost the same time, they compete for it and the data can become corrupted; this is what is meant by thread-unsafe code.
+
+So how do you solve the multi-threaded competition problem? With a lock.
+
+Benefit of locks: a critical section of code (one that touches shared data) is executed from start to finish by only one thread at a time, which solves the atomicity problem when threads compete for resources.
+
+Drawback of locks: they prevent multiple threads from running concurrently. Any code guarded by a lock effectively runs single-threaded, so efficiency drops considerably.
+
+The fatal problem with locks: deadlocks
+### 111. Please tell me about thread synchronization in Python?
+ One, setDaemon(False)
+When a process starts, it creates a main thread by default, because the thread is the smallest unit of program execution. With multithreading, the main thread creates several child threads. In Python the default is setDaemon(False): after the main thread finishes its own task it exits, while the child threads keep running until their own tasks are done.
+
+example
+```python
+import threading
+import time
+
+def thread():
+    time.sleep(2)
+    print('---End of child thread---')
+
+def main():
+    t1 = threading.Thread(target=thread)
+    t1.start()
+    print('---Main thread--End')
+
+if __name__ =='__main__':
+    main()
+# Output:
+# ---Main thread--End
+# ---End of child thread---
+```
+Two, setDaemon(True)
+With setDaemon(True), the child thread becomes a daemon thread: as soon as the main thread finishes, all daemon child threads are terminated forcibly
+
+example
+```python
+import threading
+import time
+def thread():
+    time.sleep(2)
+    print('---End of child thread---')
+def main():
+    t1 = threading.Thread(target=thread)
+    t1.daemon = True  # make the child a daemon thread (same as the deprecated t1.setDaemon(True))
+    t1.start()
+    print('---End of main thread---')
+
+if __name__ =='__main__':
+    main()
+# Output:
+# ---End of main thread---
+# Only the main thread's message is printed; the daemon child thread is killed before it finishes
+```
+Three, join (thread synchronization)
+join() performs thread synchronization: after the main thread finishes its own task, it blocks and waits for the child threads to finish before it terminates.
+
+If the child threads are daemon threads, the main thread waits up to timeout seconds for each child it joins, then ends and the program exits, killing the children. With 10 child threads joined one after another, the total waiting time is the sum of the individual timeouts. Put simply, each child thread gets timeout seconds to run; when that time is up, it is killed whether or not its task has finished.
+
+If the child threads are not daemon threads, the main thread likewise waits for the accumulated timeouts and then ends, but the child threads are not killed: they keep running until all of them have finished, and only then does the program exit.
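The non-daemon case described above can be checked with a small sketch. The function names `slow_task` and `run_demo` and the timings are illustrative assumptions. After `join(timeout=...)` returns, the child thread is still alive and keeps running, and only a later `join()` without a timeout waits for it to finish.

```python
import threading
import time

finished = []  # records that the child thread ran to completion


def slow_task():
    time.sleep(0.5)  # simulate 0.5 s of work
    finished.append(True)


def run_demo():
    t = threading.Thread(target=slow_task)  # daemon defaults to False
    t.start()
    t.join(timeout=0.1)  # the main thread waits at most 0.1 s
    alive_after_timeout = t.is_alive()  # the child was not killed and is still running
    t.join()  # now wait until the child really finishes
    return alive_after_timeout, bool(finished)


if __name__ == "__main__":
    print(run_demo())  # (True, True)
```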
+
+example
+```python
+import threading
+import time
+
+def thread():
+    time.sleep(2)
+    print('---End of child thread---')
+
+def main():
+    t1 = threading.Thread(target=thread)
+    t1.daemon = True  # same as the deprecated t1.setDaemon(True)
+    t1.start()
+    t1.join(timeout=1)
+    # 1. Without daemon=True: the main thread blocks for 1s, then ends, and the child thread keeps running
+    # 2. Without the timeout argument: the main thread waits until the child thread ends
+    # 3. With daemon=True and timeout=1 (as here): the child thread is killed when the main thread ends after the 1s wait
+    print('---End of main thread---')
+
+if __name__=='__main__':
+    main()
+```
+### 112. Explain what is a lock, and what kinds of locks are there?
+A lock (Lock) is an object Python provides for controlling threads. The common kinds are the mutex lock (threading.Lock) and the reentrant lock (threading.RLock). A deadlock is not a kind of lock but a failure that locking can cause (see the next question).
+
+### 113. What is a deadlock?
+When several threads compete for system resources and each one waits for another to release a resource it holds, none of them will unlock first. They wait for each other forever and the program cannot continue. This is a deadlock.
+
+GIL: the global interpreter lock
+
+Purpose: it prevents multiple threads from executing Python bytecode simultaneously and ensures that only one thread runs at a time, so multithreading in CPython is effectively pseudo-multithreading!
+
+That is why Python code often uses coroutines instead of multithreading; a coroutine is an even more lightweight thread.
+
+Switching between processes and between threads is decided by the operating system, whereas switching between coroutines is decided by the programmer; with the gevent module, a switch happens only when a time-consuming operation is encountered.
+
+The relationship between the three: processes contain threads, and threads contain coroutines.
+### 114. When multiple threads access data, how do you ensure that data already accessed is not accessed again?
+That is, how do you avoid reading the same data twice?
+
+Create a list of visited data to store the data that has already been accessed, and protect it with a mutex. When a thread accesses the data, it first checks whether the data is in the visited list and skips it if it is already there.
+
+### 115. What is thread safety and what is a mutex?
+Each object has a tag that can be called a "mutual exclusion lock" (mutex). This tag ensures that at any moment only one thread can access the object.
+
+Threads in the same process share system resources. When several threads operate on the same object at once, one thread may not have finished its operation when another thread starts operating on the object, and the final result is wrong. Adding a mutex to the object ensures that every thread's operation on it produces the correct result.
+
+### 116. Tell me about the following concepts: synchronous, asynchronous, blocking, non-blocking?
+Synchronous: multiple tasks are executed in sequence; the next one starts only after the previous one has finished.
+
+Asynchronous: the tasks have no fixed order and can run at the same time. When one task needs the result of another task running at the same time, it is notified through a callback!
+
+Blocking: the caller is stuck waiting and cannot continue executing; that is, the caller is blocked.
+
+Non-blocking: the caller is not stuck and can continue executing; that is, it is non-blocking.
+
+Synchronous and asynchronous describe how multiple tasks relate to each other, while blocking and non-blocking describe how a single piece of code executes.
+
+### 117. What are zombie processes and orphan processes? How to avoid zombie processes?
+Orphan process: if a parent process exits while its child processes are still running, those children become orphan processes. Orphans are adopted by the init process (pid 1), which collects their exit status for them.
+
+Zombie process: a process creates a child process with fork. If the child exits and the parent does not call wait or waitpid to collect the child's status information, the child's process descriptor remains in the system. Such processes are zombie processes.
+
+Ways to avoid zombie processes:
+
+1. Fork twice and let the grandchild process do the child's work: the intermediate child exits immediately, so the grandchild becomes an orphan that init adopts and reaps
+
+2. Call wait() in the parent process, which blocks the parent
+
+3. Use a signal: call waitpid in the SIGCHLD signal handler, so the parent does not have to block
+### 118. What are the usage scenarios of processes and threads in python?
+Multiprocessing suits CPU-intensive work (many CPU instructions, such as floating-point calculations with many digits).
+
+Multithreading suits IO-intensive work (mostly reading and writing data, such as web crawlers)
+
+### 119. Are threads concurrent or parallel, and are processes concurrent or parallel?
+Threads are concurrent and processes are parallel;
+
+Processes are independent of each other, and a process is the smallest unit of resource allocation. All threads in the same process share its resources.
+
+### 120. Parallel (parallel) and concurrency (concurrency)?
+Parallelism: multiple tasks run at the same instant
+
+Concurrency: tasks do not run at the same instant; they are executed alternately.
+
+The library that implements parallelism: multiprocessing
+
+The library that implements concurrency: threading
+
+Programs that do a lot of reading and writing, or many request/response exchanges, perform a lot of IO; for such IO-intensive work, concurrency is the better fit.
+
+For programs with heavy CPU computation, parallelism is the better fit
+### 121. What is the difference between IO-intensive and CPU-intensive?
+IO-intensive: while the system is running, the CPU spends most of its time waiting for I/O (disk/memory) reads and writes
+
+CPU-intensive: most of the time is spent on computation, logical decisions, and other CPU work.
+### 122. How does python asyncio work?
+The asyncio library builds on Python's yield, a mechanism that can suspend a function while saving its context. It wraps the selector (IO multiplexing) and frees you from complicated chains of callbacks
+
+## Network Programming
+### 123. How to forcibly close the connection between the client and the server?
+### 124. Briefly describe the difference, advantages and disadvantages of TCP and UDP?
+### 125. Briefly describe the process of the browser requesting dynamic resources through WSGI?
+Nginx listens for the requests sent by the browser. Based on the path or suffix of the requested URL, Nginx serves static resources from the static resource directory and forwards all other requests to the port given in its configuration.
+A program that implements WSGI listens on a port. After receiving a request forwarded by Nginx (usually reading the HTTP message with socket recv), it packs the request message into a dictionary object, `environ`, and provides a `start_response` callable. Both are passed as arguments to a callable such as `wsgi_app(environ, start_response)`, or to an instance that implements `__call__(self, environ, start_response)`. That callable calls `start_response` and returns the response to the WSGI server, which then returns it to Nginx.
+### 126. Describe the process of visiting www.baidu.com with a browser
+### 127. The difference between Post and Get requests?
+### 128. The difference between cookie and session?
+### 129. List the status codes of the HTTP protocol you know, and what do they mean?
+### 130. Please briefly describe the TCP three-way handshake and four-way wave?
+### 131. Tell me what is 2MSL of tcp?
+### 132. Why must the client wait for 2MSL in the TIME-WAIT state?
+### 133. Tell me about the difference between HTTP and HTTPS?
+### 134. Talk about the HTTP protocol and the fields that indicate the data type in the protocol header?
+### 135. What are the HTTP request methods?
+### 136. What parameters need to be passed in to use Socket?
+### 137. Common HTTP request headers?
+### 138. Seven-layer model?
+### 139. The form of the url?
+
+# Web
+## Flask
+### 140. Understanding of Flask Blueprint?
+Definition of a blueprint
+
+A Blueprint is Flask's way of splitting an application into components. Blueprints can be shared within one application or across multiple projects. Using blueprints greatly simplifies the development of large applications, and it also gives Flask extensions a central mechanism for registering services on an application.
+
+Application scenarios for blueprints:
+
+Decompose an application into a collection of blueprints. This is ideal for large applications: a project can instantiate one application object, initialize several extensions, and register a collection of blueprints.
+
+Register a blueprint on the app with a URL prefix and/or subdomain. Parameters in the URL prefix/subdomain become common view arguments for all view functions under this blueprint (by default)
+Register a blueprint multiple times within one application, with different URL rules.
+
+Provide template filters, static files, templates, and other utilities through a blueprint. A blueprint does not have to implement an application or view functions.
+
+Register a blueprint when initializing a Flask extension, in any of the situations above.
+
+Disadvantages of blueprints:
+
+Once the application has been created, you cannot unregister a blueprint without destroying the whole application object.
+
+Three steps to use blueprints
+
+1. Create a blueprint object
+```python
+from flask import Blueprint
+
+blue = Blueprint("blue",__name__)
+```
+2. Perform operations on this blueprint object, such as registering routes, specifying static folders, registering template filters...
+```python
+@blue.route('/')
+def blue_index():
+    return "Welcome to my blueprint"
+```
+3. Register the blueprint object on the application object
+```python
+app.register_blueprint(blue,url_prefix="/blue")
+```
+
+### 141. The difference between Flask and Django routing mapping?
+ In Django, when the browser accesses the server, the request is first matched against the project-level urls, which then lead to the urls of each application. These URL patterns are kept in a list and matched in order from first to last. In Flask, a route is attached to each view function with a decorator, and the same URL can be handled by different functions depending on the request method.
+
+## Django
+### 142. What is wsgi, uwsgi, uWSGI?
+WSGI:
+
+The Web Server Gateway Interface, a protocol specification. It receives user requests, does the first round of request encapsulation, and then hands the request to the web framework.
+
+Modules that implement the WSGI protocol: wsgiref, which is essentially a socket server that receives user requests (used by Django)
+
+werkzeug, which is essentially a socket server that receives user requests (used by Flask)
+
+uwsgi:
+
+Like WSGI, it is a communication protocol. It is the uWSGI server's own protocol and defines the type of information being transmitted.
+uWSGI:
+
+A web server that implements the WSGI protocol, the uwsgi protocol, and the HTTP protocol
+
+### 143. Comparison of Django, Flask, Tornado?
+1. Django aims to be comprehensive and has high development efficiency. Its MTV framework, built-in ORM, admin back-end, built-in SQLite database, and development/test server give developers very high productivity.

A heavyweight, full-featured web framework offering a one-stop solution, so developers do not need to spend much time choosing components.

Comes with an ORM and a template engine, and also supports third-party template engines such as Jinja2.

The built-in ORM tightly couples Django to relational databases. To use a non-relational database, you need a third-party library.

Built-in database admin app.

Mature, stable, and highly productive. Compared with Flask, Django is more self-contained as a whole and is suitable for enterprise-level website development. It is a pioneer among Python web frameworks and has a rich ecosystem of third-party libraries.

2. Flask is a lightweight framework: free, flexible, and highly extensible. Its core is based on the Werkzeug WSGI toolkit and the Jinja2 template engine.

It is suitable for small websites and web service APIs, and can handle large websites too, but you have to design the architecture yourself.

Its integration with relational databases is no weaker than Django's, and its integration with non-relational databases is far better than Django's.

3. Tornado aims to be small but precise, with excellent performance; it is best known for its asynchronous, non-blocking design.

Two core modules of Tornado:

iostream: a simple wrapper around non-blocking sockets

ioloop: a wrapper around I/O multiplexing, implemented as a singleton

### 144. The difference between CORS and CSRF?
What is CORS?

CORS is a W3C standard whose full name is "Cross-Origin Resource Sharing".
It allows browsers to send XMLHttpRequest requests to servers of a different origin, overcoming the restriction that AJAX requests can only go to the same origin.

What is CSRF?

CSRF (Cross-Site Request Forgery) is an attack in which a malicious page makes the victim's browser send a request to a site where the victim is already logged in. Because the browser automatically attaches that site's cookies, the server treats the forged request as legitimate.

The mainstream CSRF defense is for the backend to generate a random token when it renders a form, embed the token in the form as a (hidden) field, and also store it in the session. Each time the form is submitted, the backend checks whether the two values match to decide whether the submission is trustworthy. After a submission, the token is cleared if the page does not generate a new one; when a new form is needed, the token is refreshed.
An attacker can forge a POST form submission, but does not have the token embedded in the form generated by the backend and cannot obtain the token stored in the session either, so the forged request fails the check.

### 145.Session, Cookie, JWT Understanding
Why use session management

As we all know, HTTP is a stateless protocol: each request is independent, and there is no relationship between one request and the next. In real applications, however, this does not meet our needs. Take the usual example of adding a product to a shopping cart: looking at that request alone, the server does not know who is adding the product, or whose cart it should go into. The request therefore needs to carry user-related context. Each time the user makes a request, this small amount of extra information is included as part of the request, so that the server can act on behalf of the specific user identified by that context. These technologies supplement HTTP, letting us combine HTTP with state management to build user-oriented web applications.

The difference between Session and Cookie

Here I want to talk about sessions and cookies first, because they are the two most common technologies in development. So what is the difference between sessions and cookies?
I personally think the core difference between sessions and cookies is who maintains the extra information. When cookies are used for session management, user-related information (or any other information we want to keep across requests) is placed in cookies, which are stored by the client. Whenever the client sends a new request, it sends the cookies along, and the server acts on the information in them.
When sessions are used for session management, the client only stores a session_id issued by the server, and from this session_id the server can restore all the state information it needs. In other words, this information is maintained by the server.

In addition, sessions and cookies each have their own drawbacks:

Cookies are not very secure: an attacker can obtain local cookies to impersonate the user, or exploit cookies to carry out CSRF attacks. When cookies are used across multiple domain names, there are also cross-domain issues.

Sessions must be stored on the server for a period of time, so with a large number of users, server performance can degrade significantly. With multiple machines, sharing sessions also becomes a problem (commonly solved with a Redis cluster): if the user's first request goes to server A and the second is forwarded to server B, how does server B know the user's state? Sessions and cookies are in fact related; for example, the session_id can be stored in a cookie.

How does JWT work

First, the user sends a login request, and the server checks the credentials. If they match, the server puts the relevant information into the payload and signs it with the chosen algorithm and the server's secret key to generate the token. Note that the secret_key is critical here.
If it leaks, anyone holding it can tamper with the payload and still produce a valid signature; the key is what guarantees the integrity of the information. After the token is generated, the server returns it to the client, which sends it back to the server with subsequent requests. It is usually placed in the Authorization header, which also avoids cross-domain problems.

### 146. Briefly describe the Django request life cycle
Generally, the user sends a request to our server from the browser, and the request reaches a view function. If no data access is involved, the view function simply returns a template (that is, a web page) to the user.
Otherwise, the view function calls the model to query the database, and the data is passed back up step by step. The view function fills the returned data into the placeholders in the template, and finally returns the web page to the user.

1. WSGI: the request is encapsulated and handed over to the web framework (Flask, Django)

2. Middleware: validates the request or adds related data to the request object, for example CSRF checking and request.session

3. URL routing: the URL sent by the browser is matched to the corresponding view function

4. View function: handles the business logic, possibly involving the ORM and templates

5. Middleware: processes the response

6. WSGI: sends the response content to the browser

### 147. Use restframework to complete the api sending time and time zone
The task is to use Django REST framework to build an API that returns the current time and time zone in response to a GET request:
```python
import time

from django.conf import settings
from rest_framework.response import Response
from rest_framework.views import APIView


class GetCurrentTime(APIView):
    def get(self, request):
        # Format the struct_time as a string so it serializes cleanly to JSON
        local_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
        time_zone = settings.TIME_ZONE  # the TIME_ZONE value from settings.py
        return Response({'localtime': local_time, 'timezone': time_zone})
```
### 148. What are nginx, tomcat and apache?
Nginx (engine x) is a high-performance HTTP and reverse proxy server.
It is also an IMAP/POP3/SMTP proxy server. It works at layer 7 (the application layer) of the OSI model. Load-balancing methods: round robin, ip_hash, fair, and sticky sessions.

Apache HTTP Server is a modular server derived from the NCSA httpd server.

Tomcat is a free, open-source web application server. It is a lightweight application server and the first choice for developing and debugging JSP programs.

### 149. What are the normal forms of relational databases you are familiar with, and what are their functions?
When designing a database, following the design rules lets you build a structure free of data redundancy and update anomalies.

There are many rules for database design. In practice, when setting up a database we only need to satisfy some of them; these are known as the normal forms. The first three are the ones commonly required (higher normal forms exist as well), and meeting them is enough to build a well-structured database. We cannot always follow every normal form strictly, though: actual business usage must also be considered, so sometimes we deliberately violate a normal form.
1. First normal form (1NF), the most basic one. Virtually every relational table satisfies it. A table in 1NF has the following characteristics:

Every field holds only a single, atomic value. Such columns consist of basic data types (integer, floating point, character, etc.), and the resulting table is a simple two-dimensional table.

2. Second normal form (2NF), built on 1NF: each table has a single business primary key, and no non-key column may depend on only part of the primary key (no partial dependency).

3.
Third normal form (3NF), built on 2NF: every non-key attribute depends on the business primary key neither partially nor transitively, which eliminates transitive dependencies of non-key attributes on the primary key.

### 150. Briefly describe the QQ login process
In our project, QQ login is split into three interfaces:

The first interface asks the QQ server to return the QQ login page.

The second interface handles verification via QR-code scan or account login. The QQ server returns a code and a state to the browser; our server uses this code to obtain an access_token from the QQ server, and then uses the access_token to fetch the user's openid (the user's unique identifier) from the QQ server.

The third interface determines whether this is the user's first QQ login. If not, the user is logged in directly and a JWT token is returned. For users who have not yet bound an account on this site, the openid is encrypted into a token used for binding.

### 151. What is the difference between post and get?
1. GET retrieves data from the server; POST sends data to the server.

2. On the client side, GET submits data through the URL, so the data is visible in the URL; POST places the data in the request body.

3. On the server side (in ASP.NET), GET data is read with Request.QueryString and POST data with Request.Form. In Django, the equivalents are request.GET and request.POST.

### 152. The role of the log in the project
1. Log-related concepts

1. Logging is a way to track events that occur while software is running.

2. Developers add logging calls to their code to indicate that certain events have occurred.

3. An event is described by a message that can optionally contain variable data.

4.
In addition, events have a level of importance, also called the severity level.

2. The role of logs

1. Log analysis helps users understand how the system, software, and application are running.

2. If an application's logs are rich enough, they can be used to analyze past user behavior, preferences, geographic distribution, and more.

3. If an application's logs are split into multiple levels, its health can be analyzed easily, and problems can be discovered in time and then quickly located, solved, and remedied.

4. Simply put, recording and analyzing logs tells us whether a system or program is running normally, and helps locate problems quickly when an application fails. Logs are important not only in development but also in operations and maintenance. Their role can be summarized as follows:

1. Program debugging

2. Checking whether the software is running normally

3. Failure analysis and problem location

4. If the application's logs are detailed and rich enough, user behavior analysis

### 153. How to use django middleware?
Django middleware can define six methods. They are executed at different stages and act on the input or the output:

1. Initialization: takes no arguments other than self, and is called once, when the server handles its first request, to decide whether this middleware is enabled
```python
def __init__(self):
    pass
```
2. Before processing the request: called on each request, before URL resolution; returns None or an HttpResponse object.
```python
def process_request(self, request):
    pass
```
3. Before processing the view: called on each request, just before the view is called; returns None or an HttpResponse object.

```python
def process_view(self, request, view_func, view_args, view_kwargs):
    pass
```
4. Before processing the template response: called only when the response returned by the view has a render() method (e.g., a TemplateResponse); must return a response object that implements render.
```python
def process_template_response(self, request, response):
    pass
```
5. After processing the response: called on every response before it is returned to the browser; must return an HttpResponse object.
```python
def process_response(self, request, response):
    pass
```
6. Exception handling: called only when the view raises an exception; returns None or an HttpResponse object.
```python
def process_exception(self, request, exception):
    pass
```

These are the hooks of old-style (`MIDDLEWARE_CLASSES`) middleware. Since Django 1.10, new-style middleware receives `get_response` in `__init__` and implements `__call__`, while `process_view`, `process_exception` and `process_template_response` are still supported.

### 154. Tell me about your understanding of uWSGI and nginx?
1. uWSGI is a web server that implements the WSGI protocol, uwsgi, HTTP, and other protocols. The role of HttpUwsgiModule in nginx is to communicate with the uWSGI server. WSGI (Web Server Gateway Interface) is a specification for communication between a web server (such as uWSGI) and a web application (such as a program written with Flask).

Note the distinction between the three concepts WSGI, uwsgi, and uWSGI:

WSGI is an interface specification (PEP 3333) between web servers and Python applications.

uwsgi is a binary wire protocol, used mainly for communication between the uWSGI server and other network servers (such as nginx).

uWSGI is a web server that implements both the uwsgi and WSGI protocols.

nginx is an open-source, high-performance HTTP server and reverse proxy:

1. As a web server, it serves static files and index files very efficiently

2. Its design emphasizes efficiency: it supports up to 50,000 concurrent connections while using very little memory

3. It is highly stable and simple to configure

4. Powerful reverse proxy and load balancing that spreads the load across the servers in a cluster

### 155.
What are the application scenarios of the three major frameworks in Python?
Django: mainly used for rapid development; its strengths are development speed and lower cost. To reach high concurrency, Django needs substantial secondary development: for example, stripping out the bulky framework, writing your own sockets for HTTP communication, writing the low level in pure C/C++ for efficiency, and replacing the ORM with your own database-access layer. Although the ORM lets you work with the database in an object-oriented way, it is relatively inefficient, for example when tables are joined through foreign keys in queries.

Flask: lightweight, mainly used to write API services, enabling front-end/back-end separation and improving development efficiency. Flask itself is essentially just a core; almost every other feature comes from extensions (mail via Flask-Mail, user authentication via Flask-Login, and so on). For example, Flask extensions can add an ORM, file uploads, authentication, etc. Flask has no default database; you can choose MySQL or a NoSQL store.

Its WSGI toolkit is Werkzeug (which includes routing), and its template engine is Jinja2. These two form the core of Flask.

Tornado: an open-source web server and framework. It differs noticeably from most mainstream web frameworks (including most Python frameworks): it is a non-blocking server, and it is quite fast. Thanks to its non-blocking design and its use of epoll, Tornado can handle thousands of connections per second, which makes it an ideal framework for real-time web services.

### 156. Where are threads used in Django? Where is the coroutine used? Where is the process used?
1.
Time-consuming tasks in Django, such as sending emails, are executed in a separate process or thread, for example with Celery.

2. When deploying a Django project, the number of processes and coroutines is set in the deployment configuration file.

### 157. Have you ever used Django REST framework?
Django REST framework is a powerful and flexible toolkit for building Web APIs. Reasons to use it:

The Web browsable API is a huge usability win for developers

Authentication policies including OAuth1a and OAuth2

Serialization that supports both ORM and non-ORM data sources

Customizable all the way down: if you don't need the more powerful features, you can just use regular function-based views

Extensive documentation and strong community support

### 158. Know about cookies and session? Can they be used alone?
Sessions keep state on the server side, while cookies keep state on the client side. However, if cookies are disabled, the session cannot be obtained either: the session ID identifies the server-side session belonging to the current user, and the session ID is passed via a cookie, so disabling cookies means losing the session ID and therefore the session (unless the session ID is passed another way, such as URL rewriting).

## Crawler
### 159. Try to list at least three currently popular large databases
### 160. List the network packages used by the Python web crawler you have used?

requests, urllib, urllib2, httplib2

### 161. Which database is used to store the data after crawling the data, and why?

### 162. What crawler frameworks or modules have you used? Pros and cons?

Built into Python: urllib, urllib2 (Python 2; in Python 3 they were merged into urllib.request and urllib.parse)

Third party: requests

Framework: Scrapy

Both urllib and urllib2 handle requesting URLs, but they provide different features.

urllib2: urllib2.urlopen accepts either a Request object or a URL (passing a Request object lets you set request headers), whereas urllib.urlopen only accepts a URL.
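
In Python 3, urllib and urllib2 were merged: `urllib.request.Request` takes over urllib2's ability to carry request headers, and `urllib.parse.urlencode` takes over urllib's urlencode. A minimal sketch, assuming a hypothetical URL, query parameters, and User-Agent string, that builds such a request without sending it (`build_request` is an illustrative helper, not a library function):

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, user_agent):
    """Build (but do not send) a GET request with a query string and a custom header."""
    # urlencode turns a dict into "key=value&key2=value2", escaping special characters
    query = urlencode(params)
    # Request can carry headers, which a bare URL string cannot
    return Request(f"{base_url}?{query}", headers={"User-Agent": user_agent})


# Hypothetical example values
req = build_request("https://example.com/search", {"q": "python crawler", "page": 2}, "my-crawler/0.1")
print(req.full_url)                  # https://example.com/search?q=python+crawler&page=2
print(req.get_header("User-agent"))  # my-crawler/0.1 (Request stores header names capitalized)
```

Passing `req` to `urllib.request.urlopen` would actually send it; requests offers the same in one call, which is part of why it is popular for crawlers.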

urllib provides urlencode and urllib2 does not, which is why urllib and urllib2 are often used together.

Scrapy is a complete, packaged framework that includes a downloader, parser, logging, and exception handling. It is built on Twisted, an event-driven asynchronous networking engine. It works well for crawling a single, fixed website, but when crawling many websites (say, 100 sites) its concurrency and distributed processing are not flexible enough, and it is inconvenient to adjust and extend.

requests is an HTTP library used only for sending requests. It is powerful, but downloading and parsing are left to you, which gives high flexibility.

Scrapy advantages: asynchronous, XPath support, powerful statistics and logging, and support for crawling different URLs. The Scrapy shell is convenient for standalone debugging. Middleware can be written to filter requests. Data is stored in the database through pipelines.

### 163. Is it better to use multiple processes to write crawlers? Is multithreading better?
### 164. Common anti-crawler measures and countermeasures?
### 165. Which are the most used parsers for parsing web pages?
### 166. How to solve the problem of restricting ip, cookie, session at the same time for web pages that need to log in
### 167. How to solve the verification code?
### 168. What do you understand about the most used databases?
### 169. Which crawler middleware have you written?
### 170. How to crack the "JiYi" sliding verification code?
### 171. How often does the crawler crawl, and how is the data stored?
### 172. How to deal with cookie expiration?
### 173. How to deal with dynamic loading and high requirements for timeliness?
### 174. What are the advantages and disadvantages of HTTPS?
### 175. How does HTTPS realize secure data transmission?
### 176. What are TTL, MSL and RTT?
### 177. Talk about your understanding of Selenium and PhantomJS
### 178. How do you usually use a proxy?
### 179. Stored in the database (redis, mysql, etc.).
### 180.
How to monitor the status of crawlers?
### 181. Describe the mechanism of scrapy framework operation?
### 182. Talk about your understanding of Scrapy?
### 183. How to make the scrapy framework send a post request (write it out)
### 184. How to monitor the status of crawlers?
### 185. How to judge whether the website is updated?
### 186. How to bypass anti-hotlinking protection when crawling pictures and videos
### 187. How much data have you crawled? How often do you crawl?
### 188. Which database do you use to store the crawled data? Did you do the deployment? How to deploy?
### 189. Incremental crawling
### 190. How to de-duplicate the crawled data, and what algorithm is Scrapy's de-duplication based on?
### 191. What are the advantages and disadvantages of Scrapy?
### 192. How to set the crawl depth?
### 193. What is the difference between scrapy and scrapy-redis? Why choose redis database?
### 194. What problem does distributed crawler mainly solve?
### 195. What is distributed storage?
### 196. What distributed crawler solutions do you know?
### 197.scrapy-redis, have you done other distributed crawlers?

# Database
## MySQL
### 198. Primary key Super key Candidate key Foreign key

Primary key: a column or combination of columns (attributes) in a table that uniquely and completely identifies each stored row. A table can have only one primary key, and primary key values cannot be missing, i.e., they cannot be NULL.

Super key: a set of attributes that uniquely identifies each tuple in a relation is called a super key of the relation schema. A single attribute can be a super key, and so can a combination of attributes. Super keys include candidate keys and primary keys.

Candidate key: a minimal super key, i.e., a super key with no redundant attributes.
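
Super keys and candidate keys are properties of the relation schema (every valid instance), so data alone cannot prove them. The sketch below, with hypothetical helpers `is_super_key` and `candidate_keys` and a made-up sample table, only checks one instance: an attribute set counts as a super key when its projection has no duplicate rows, and as a candidate key when no smaller key is contained in it. On real data it can rule keys out, but not prove them.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows share the same values on attrs (checked on this instance only)."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, all_attrs):
    """Return minimal super keys: super keys that contain no smaller super key."""
    keys = []
    # Try smaller attribute sets first, so any superset of a found key can be skipped
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # a super key, but not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical student table
students = [
    {"id": 1, "id_card": "A001", "name": "Tom", "class": 1},
    {"id": 2, "id_card": "A002", "name": "Tom", "class": 2},
    {"id": 3, "id_card": "A003", "name": "Amy", "class": 1},
    {"id": 4, "id_card": "A004", "name": "Amy", "class": 1},
]
attrs = ("id", "id_card", "name", "class")
print(is_super_key(students, ("id", "name")))  # True
print(candidate_keys(students, attrs))         # [('id',), ('id_card',)]
```

Here `("id", "name")` passes the super key test but is not a candidate key, because `id` alone already identifies each row; either candidate key could then be chosen as the primary key.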

Foreign key: a column in one table that refers to the primary key of another table is called a foreign key of the first table.

### 199. The role of the view, can the view be changed?

Views are virtual tables. Unlike tables, they do not contain any columns or data of their own; they only contain a query that retrieves data dynamically when the view is used. Views can simplify complex SQL operations, hide details, and protect data; once created, they can be queried in the same way as tables.

A view cannot be indexed, and cannot have associated triggers or default values. If the view's definition contains an ORDER BY, it is overridden by the ORDER BY of a query that selects from the view.

Create a view: `CREATE VIEW view_name AS SELECT ...;`

Some views can be updated: those that do not use joins, subqueries, grouping, aggregate functions, DISTINCT, or UNION. Updating such a view updates the base table. However, views are mainly meant for simplifying retrieval and protecting data, not for updates, and most views cannot be updated.

### 200. The difference between drop, delete and truncate

DROP deletes the table itself; TRUNCATE deletes all the data in the table and restarts the auto-increment ID from 1; DELETE deletes rows from the table and can take a WHERE clause.

1. DELETE removes rows one at a time and records each deletion in the log as part of a transaction, so it can be rolled back. TRUNCATE TABLE deletes all data at once without logging individual row deletions, so the deleted rows cannot be recovered. It also does not fire the table's delete triggers, and it executes quickly.

2. The space occupied by tables and indexes.
After a TRUNCATE, the space used by the table and its indexes is reset to the initial size, whereas DELETE does not reduce the space used by the table or its indexes. DROP releases all the space used by the table.

3. In general, in terms of speed: drop > truncate > delete

4. Scope: TRUNCATE applies only to tables; DELETE applies to tables and views

5. TRUNCATE and DELETE delete only data, while DROP deletes the entire table (structure and data)

6. TRUNCATE, and DELETE without WHERE, delete only the data and keep the table structure (definition). DROP also removes the constraints, triggers, and indexes that depend on the table; stored procedures/functions that depend on the table are kept, but their status becomes invalid.

### 201. The working principle and types of indexes

A database index is a sorted data structure in a database management system that helps query and update data in database tables quickly. Indexes are usually implemented with B-trees and their variant, the B+ tree.

Besides the data itself, the database system maintains data structures that support specific search algorithms. These structures reference (point to) the data in some way, so that efficient search algorithms can run on them. These data structures are the indexes.

Creating indexes on a table has a cost: they take extra storage space, and inserting and modifying data takes longer (because the indexes must be updated as well).

### 202. Join types
### 203. Thoughts on Database Optimization
### 204. The difference between stored procedures and triggers
### 205. What are pessimistic locks and optimistic locks?
### 206. What are your commonly used mysql engines? What are the differences between the engines?

## Redis
### 207.
How to solve Redis downtime?

Downtime: the server stops providing service.

If there is only one Redis instance, downtime will definitely cause data loss that cannot be recovered.

For multiple Redis instances or a Redis cluster, downtime must be handled according to the master-slave mode:

1. A slave goes down: slaves are configured when master-slave replication is set up, and each slave reads the master's operation log. After the slave restarts, it automatically rejoins the master-slave architecture and resynchronizes its data.

2. The master goes down: if the master has no persistence (only the slave holds the data), do not restart the master immediately, otherwise data may be lost (the master would come back empty and replicate that empty dataset to the slave). The correct procedure is: run `SLAVEOF NO ONE` on the slave to break the master-slave relationship and promote the slave to master; then restart the old master, run `SLAVEOF` to make it a slave of the new master, and let master-slave replication back up the data automatically.

The procedure above is easy to misconfigure. Redis Sentinel can automate it, handling monitoring and failover.

### 208. The difference between redis and memcached, and usage scenarios

Differences

1. Both Redis and Memcached store data in memory; both are in-memory databases. Memcached can also be used to cache other things, such as images and videos.

2. Redis supports not only simple k/v data but also data structures such as list, set, and hash.

3. Virtual memory: when physical memory is used up, Redis can swap values that have not been used for a long time to disk (this VM feature was deprecated in Redis 2.4 and later removed).

4. Expiration policy: in Memcached the expiry is specified at set time; for example, `set key1 0 0 8` means the key never expires. In Redis it can be set with EXPIRE, for example `expire name 10`.

5.
Distributed deployment: for Memcached, build a cluster and use magent to get one master with multiple slaves; Redis supports one master with multiple slaves, or one master with a cluster.

6. Data safety: when Memcached crashes, the data is gone; Redis can save data to disk periodically (persistence).

7. Disaster recovery: Memcached data cannot be recovered after a crash; Redis data can be recovered from the AOF after data loss.

8. Redis supports data backup, i.e., backup through master-slave replication.

9. Different use cases: besides serving as a NoSQL database, Redis can also be used as a message queue, a stack, and a data cache; Memcached suits caching SQL query results, datasets, temporary user data, deferred query data, sessions, etc.

Usage scenarios

1. If you need persistence, or have requirements on data types and processing, choose Redis.

2. For simple key/value storage, choose Memcached.

### 209. How to do the Redis cluster solution? What are the solutions?

1. Codis

Currently one of the most commonly used cluster solutions. It works much like twemproxy, but also supports migrating data from old nodes to new hash nodes when the number of nodes changes.

2. Redis Cluster, built in since Redis 3.0. Its distribution algorithm is not consistent hashing but hash slots, and it natively supports slave nodes. See the official documentation for details.

3. Sharding in the application code: run several independent Redis instances, hash the key in code, and operate on the data in the corresponding instance. This places fairly high demands on the hashing code; considerations include fallback algorithms after a node fails, script-based data recovery after data churn, and instance monitoring.

### 210.
How does the Redis recycling process work?

1. A client runs a new command that adds new data.

2. Redis checks memory usage; if it is above the maxmemory limit, keys are evicted according to the configured policy.

3. A new command is executed, and so on.

So memory usage keeps crossing the limit and is then brought back below it through eviction.

If a command results in a lot of memory being used (for example, storing the intersection of large sets in a new key), the memory limit can be exceeded by a noticeable amount for some time.

## MongoDB
### 211. What is the command to update multiple records in MongoDB?
### 212. How does MongoDB expand to multiple shards?

## Test
### 213. The purpose of writing a test plan is
### 214. Test the keyword trigger module
### 215. Summary of other commonly used written exam URLs
### 216. What are the tasks of testers in the software development process
### 217. What is included in a software bug record?
### 218. Briefly describe the advantages and disadvantages of black box testing and white box testing
### 219. Please list the types of software testing you know, at least 5 items
### 220. What is the difference between Alpha test and Beta test?
### 221. Give examples to illustrate what is a bug? What keywords should a bug report contain?

## Data Structure
### 222. Numbers that appear more than half the number of times in the array-Python version
### 223. Find prime numbers within 100
### 224. The longest substring without repeated characters-Python implementation
### 225. Get 3 liters of water from the pond with two kettles of 5 and 6 liters
### 226. What is MD5 encryption and what are its characteristics?
### 227. What is symmetric encryption and asymmetric encryption
### 228. The idea of bubble sorting?
### 229. The idea of quick sort?
### 230.
How to judge whether there is a ring in a singly linked list? +### 231. Which sorting algorithm do you know (usually through the question test algorithm) +### 232. Fibonacci Sequence + +**Sequence definition: ** + +f 0 = f 1 = 1 +f n = f (n-1) + f (n-2) + +#### By definition + +The speed is very slow, in addition (Attention to the violent stack! ⚠️️) `O(fibonacci n)` + +```python +def fibonacci(n): + if n == 0 or n == 1: + return 1 + return fibonacci(n-1) + fibonacci(n-2) +``` + +#### Linear time + +**Status/Circulation** + +```python +def fibonacci(n): + a, b = 1, 1 + for _ in range(n): + a, b = b, a + b + return a +``` + +**Recursion** + +```python +def fibonacci(n): + def fib(n_, s): + if n_ == 0: + return s[0] + a, b = s + return fib(n_-1, (b, a + b)) + return fib(n, (1, 1)) +``` + +**map(zipwith)** + +```python +def fibs(): + yield 1 + fibs_ = fibs() + yield next(fibs_) + fibs__ = fibs() + for fib in map(lambad a, b: a + b, fibs_, fibs__): + yield fib + + +def fibonacci(n): + fibs_ = fibs() + for _ in range(n): + next(fibs_) + return next(fibs) +``` + +**Do caching** + +```python +def cache(fn): + cached = {} + def wrapper(*args): + if args not in cached: + cached[args] = fn(*args) + return cached[args] + wrapper.__name__ = fn.__name__ + return wrapper + +@cache +def fib(n): + if n <2: + return 1 + return fib(n-1) + fib(n-2) +``` + +**Use funtools.lru_cache for caching** + +```python +from functools import lru_cache + +@lru_cache(maxsize=32) +def fib(n): + if n <2: + return 1 + return fib(n-1) + fib(n-2) +``` + +#### Logarithmic + +**matrix** + +```python +import numpy as np +def fibonacci(n): + return (np.matrix([[0, 1], [1, 1]]) ** n)[1, 1] +``` + +**Not a matrix** + +```python +def fibonacci(n): + def fib(n): + if n == 0: + return (1, 1) + elif n == 1: + return (1, 2) + a, b = fib(n // 2-1) + c = a + b + if n% 2 == 0: + return (a * a + b * b, c * c-a * a) + return (c * c-a * a, b * b + c * c) + return fib(n)[0] +``` + +### 233. 
How to flip a singly linked list? + +```python +class Node: + def __init__(self,data=None,next=None): + self.data = data + self.next = next + +def rev(link): + pre = link + cur = link.next + pre.next = None + while cur: + temp = cur.next + cur.next = pre + pre = cur + cur = tmp + return pre + +if __name__ =='__main__': + link = Node(1,Node(2,Node(3,Node(4,Node(5,Node(6,Node7,Node(8.Node(9)))))))) + root = rev(link) + while root: + print(roo.data) + root = root.next +``` + + + +### 234. The problem of frog jumping + +A frog wants to jump up n-level steps. It can jump one level or two at a time. How many ways does this frog have to jump up this n-level step? + +Method 1: Recursion + +Suppose there are f(n) ways for a frog to jump on n steps. These n methods are divided into two categories. The first one jumps one step last time. There are f(n-1) kinds of this kind, and the second This method jumped two steps at the last time. There are f(n-2) kinds of this method, and the recursive formula f(n)=f(n-1) + f(n-2) is obtained. Obviously f(1 )=1, f(2)=2. Although this method is simple in code, it is inefficient and will exceed the time limit + +```python +class Solution: + def climbStairs(self,n): + if n == 1: + return 1 + elif n==2: + return 2 + else: + return self.climbStairs(n-1) + self.climbStairs(n-2) +``` + +Method 2: Use loops instead of recursion + +```python +class Solution: + def climbStairs(self,n): + if n==1 or n==2: + return n + a,b,c = 1,2,3 + for i in range(3,n+1): + c = a+b + a = b + b = c + return c +``` + +### 235. Two Sum Two Sum + + + +### 236. Search in Rotated Sorted Array Search in Rotated Sorted Array +### 237. Python implements a Stack data structure +### 238. Write a binary search +### 239. What is the time complexity of using in for set and why? +### 240. There are n positive integers in the range of [0, 1000] in the list, sorted; +### 241. 
There are methods of composition and inheritance in object-oriented programming to implement new classes +## Big Data +### 242. Find out high-frequency words in 1G files +### 243. Count high-frequency words in a text file of about ten thousand lines +### 244. How to find the most repeated one among the massive data? +### 245. Determine whether the data is in a large amount of data + +## Architecture + +### [Python back-end architecture evolution]() + +This article almost covers the architecture that python will use. In the interview, you can draw the architecture diagram by hand, and talk about the technical selection and pros and cons according to your own project, and the pits you encounter. Absolute bonus. + +## CREDITS + +Original Credits: [kenwoodjw](https://github.com/kenwoodjw) + +English Credits: [jishanshaikh4](https://github.com/jishanshaikh4) + +