How to read the contents of a txt file in Python and split them into a list

  The txt file looks like this:

[Image: contents of the txt file]


  We want to read out every word and store the words in a list. This takes the following steps:

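  1. Read the file line by line, strip the surrounding whitespace, and split each line on commas (blank lines are dealt with in the next step)

    cab = []
    # The with block closes the file automatically once reading is done
    with open(r'E:\Program Files\PyCharm 2019.2\machinelearning\homework\Emails\Training\spam\3.txt') as data:
        for line in data.readlines():
            # strip() removes the trailing newline; split(',') cuts the line at commas
            cab.append(line.strip().split(','))
    print(cab)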

Output of cab:

    [['You Have Everything To Gain!'], [''], ['Incredib1e gains in length of 3-4 inches to yourPenis', ' PERMANANTLY'], [''], ['Amazing increase in thickness of yourPenis', ' up to 30%'], ['BetterEjacu1ation control'], ['Experience Rock-HardErecetions'], ['Explosive', ' intenseOrgasns'], ['Increase volume ofEjacu1ate'], ['Doctor designed and endorsed'], ['100% herbal', ' 100% Natural', ' 100% Safe'], ['The proven NaturalPenisEnhancement that works!'], ['100% MoneyBack Guaranteeed']]

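Notice that cab[1] is an anomaly: it is [''], an empty entry produced by a blank line in the file (cab[3] is the same).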
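To see where that empty entry comes from, here is a minimal sketch using two made-up lines (they are illustrative, not read from the file). `strip()` reduces a blank line to an empty string, and splitting an empty string on a comma still returns a list with one element.

```python
# Hypothetical sample lines, for illustration only
blank_line = "\n"
text_line = "100% herbal, 100% Natural\n"

# A blank line becomes '' after strip(), and ''.split(',') is ['']
blank_parts = blank_line.strip().split(',')

# A normal line is cut at each comma; the space after the comma is kept
text_parts = text_line.strip().split(',')

print(blank_parts)  # ['']
print(text_parts)   # ['100% herbal', ' 100% Natural']
```

This is also why the pieces after a comma start with a space, which the next step trims with `strip()`.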

  2. Remove anomalous entries such as cab[1]

    cab_f = []
    for i in range(len(cab)):
        for j in range(len(cab[i])):
            # Keep only non-empty pieces, trimming leftover spaces
            if cab[i][j] != '':
                cab_f.append(cab[i][j].strip())

Output of cab_f:

    ['You Have Everything To Gain!', 'Incredib1e gains in length of 3-4 inches to yourPenis', 'PERMANANTLY', 'Amazing increase in thickness of yourPenis', 'up to 30%', 'BetterEjacu1ation control', 'Experience Rock-HardErecetions', 'Explosive', 'intenseOrgasns', 'Increase volume ofEjacu1ate', 'Doctor designed and endorsed', '100% herbal', '100% Natural', '100% Safe', 'The proven NaturalPenisEnhancement that works!', '100% MoneyBack Guaranteeed']

The list has now been flattened to one dimension, and the empty entries are gone.
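The same flattening can also be written as a single list comprehension. The sketch below is one possible rewrite, not the original code; the short `cab` here is a hypothetical stand-in for the nested list built from the file.

```python
# Hypothetical stand-in for the nested list produced by the first step
cab = [['You Have Everything To Gain!'], [''], ['100% herbal', ' 100% Natural']]

# Flatten the nested list, skip empty strings, and trim each piece in one pass
cab_f = [part.strip() for parts in cab for part in parts if part != '']

print(cab_f)  # ['You Have Everything To Gain!', '100% herbal', '100% Natural']
```

The two `for` clauses read left to right in the same order as the nested loops, so the result matches the loop version.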

  3. Split into words

    cab_final = []
    for i in cab_f:
        # Cut each phrase at single spaces and collect the words
        for j in i.split(' '):
            cab_final.append(j)

Output of cab_final:

    ['You', 'Have', 'Everything', 'To', 'Gain!', 'Incredib1e', 'gains', 'in', 'length', 'of', '3-4', 'inches', 'to', 'yourPenis', 'PERMANANTLY', 'Amazing', 'increase', 'in', 'thickness', 'of', 'yourPenis', 'up', 'to', '30%', 'BetterEjacu1ation', 'control', 'Experience', 'Rock-HardErecetions', 'Explosive', 'intenseOrgasns', 'Increase', 'volume', 'ofEjacu1ate', 'Doctor', 'designed', 'and', 'endorsed', '100%', 'herbal', '100%', 'Natural', '100%', 'Safe', 'The', 'proven', 'NaturalPenisEnhancement', 'that', 'works!', '100%', 'MoneyBack', 'Guaranteeed']

As you can see, this is exactly the result we wanted!
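One caveat worth keeping in mind: `split(' ')` cuts at every single space, so two spaces in a row produce an empty string in the result. If the file may contain such runs, calling `split()` with no argument is probably the safer choice, because it splits on any run of whitespace. A small sketch with a made-up phrase:

```python
# Hypothetical phrase containing two consecutive spaces
phrase = "Explosive  intense"

# Splitting on a single space leaves an empty string between the two spaces
by_single_space = phrase.split(' ')

# Splitting with no argument treats the whole run of spaces as one separator
by_whitespace = phrase.split()

print(by_single_space)  # ['Explosive', '', 'intense']
print(by_whitespace)    # ['Explosive', 'intense']
```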

Complete code:

    def read_txt():
        # Step 1: read each line, strip whitespace, and split on commas
        cab = []
        with open(r'E:\Program Files\PyCharm 2019.2\machinelearning\homework\Emails\Training\spam\3.txt') as data:
            for line in data.readlines():
                cab.append(line.strip().split(','))

        # Step 2: flatten the nested list and drop empty strings
        cab_f = []
        for i in range(len(cab)):
            for j in range(len(cab[i])):
                if cab[i][j] != '':
                    cab_f.append(cab[i][j].strip())

        # Step 3: split each phrase into words
        cab_final = []
        for i in cab_f:
            for j in i.split(' '):
                cab_final.append(j)
        return cab_final


    if __name__ == '__main__':
        print(read_txt())
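If the function needs to work on more than one file, a variant that takes the path as an argument may be handier. The sketch below is one way to do that, not the original code: the name `read_words` and its `sep` and `encoding` parameters are made up for illustration, UTF-8 is an assumed encoding, and it uses `split()` rather than `split(' ')`, so empty pieces are skipped without a separate filtering step. The usage part writes a small temporary file so the example can run on its own.

```python
import os
import tempfile


def read_words(path, sep=',', encoding='utf-8'):
    """Return every word in the file at `path` as a flat list.

    `read_words`, `sep`, and `encoding` are illustrative names that do not
    appear in the original code; UTF-8 is an assumption about the file.
    """
    words = []
    with open(path, encoding=encoding) as f:
        for line in f:
            for part in line.strip().split(sep):
                # split() with no argument returns [] for empty or blank
                # pieces, so blank lines add nothing to the result
                words.extend(part.split())
    return words


# Usage: write a small sample file, read it back, then clean up
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("You Have Everything To Gain!\n\n100% herbal, 100% Natural, 100% Safe\n")
    tmp_path = tmp.name

result = read_words(tmp_path)
os.remove(tmp_path)
print(result)
```

For the sample file this prints `['You', 'Have', 'Everything', 'To', 'Gain!', '100%', 'herbal', '100%', 'Natural', '100%', 'Safe']`.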