I. Reading data from files
pandas can read many file types, including CSV, general delimited text files, Excel files, JSON, HTML tables, HDF5 and Stata.
1. Comma-separated value (CSV) files can be read using read_csv:
>>> from pandas import read_csv
>>> csv_data = read_csv('FTSE_1984_2012.csv')
>>> csv_data = csv_data.values
>>> csv_data[:4]
array([['2012-02-15', 5899.9, 5923.8, 5880.6, 5892.2, 801550000L, 5892.2],
       ['2012-02-14', 5905.7, 5920.6, 5877.2, 5899.9, 832567200L, 5899.9],
       ['2012-02-13', 5852.4, 5920.1, 5852.4, 5905.7, 643543000L, 5905.7],
       ['2012-02-10', 5895.5, 5895.5, 5839.9, 5852.4, 948790200L, 5852.4]], dtype=object)
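As a small self-contained sketch of the same call, the snippet below reads CSV text from an in-memory buffer instead of FTSE_1984_2012.csv, which is not reproduced here. It is written in Python 3 syntax, and the column names and the two data rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = (
    "Date,Open,Close\n"
    "2012-02-15,5899.9,5892.2\n"
    "2012-02-14,5905.7,5899.9\n"
)

# read_csv accepts a path or any file-like object.
csv_frame = pd.read_csv(io.StringIO(csv_text))

# The header row becomes the column labels, not a data row.
csv_columns = list(csv_frame.columns)

# .values returns the data as a NumPy array; mixing text and numeric
# columns gives an array with dtype=object.
csv_array = csv_frame.values

print(csv_columns)
print(csv_array.shape)
```

Because the first line is consumed as the header, the resulting array has one row fewer than the text has lines.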
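2. Excel files
Use the read_excel function, called here with two arguments: the file name and the sheet name. By default the first row is treated as the header (column names), so it does not appear in the returned data.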
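from pandas import read_excel
# Read the sheet named 'Sheet1'; the first row is used as the column headers
exceldate = read_excel('score.xlsx', 'Sheet1')
# Convert the DataFrame to a NumPy array
exceldate = exceldate.values
print type(exceldate)
print exceldate.shape
# First data row
exceldate[0,:]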
3. Stata files
>>> from pandas import read_stata
>>> stata_data = read_stata('FTSE_1984_2012.dta')
>>> stata_data = stata_data.values
>>> stata_data[:4,:2]
array([[ 0.00000000e+00, 4.09540000e+04],
       [ 1.00000000e+00, 4.09530000e+04],
       [ 2.00000000e+00, 4.09520000e+04],
       [ 3.00000000e+00, 4.09490000e+04]])
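4. Reading file contents without pandas
Excel files can be read with xlrd: the xlrd module reads Excel files, and the xlwt module writes them. Note that xlrd 2.0 and later read only the legacy .xls format, so opening an .xlsx file as in this example requires an older xlrd release.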
import numpy as np
import xlrd
wb = xlrd.open_workbook('score.xlsx')
sheetnames = wb.sheet_names()
# Open the first sheet by name
sheet = wb.sheet_by_name(sheetnames[0])
# Collect every row as a list of cell values
exceldate = []
for i in xrange(sheet.nrows):
    exceldate.append(sheet.row_values(i))
print '%d rows,'%len(exceldate),'%d columns'%len(exceldate[0])
# Copy the first column into a NumPy array
adate = np.empty(len(exceldate))
for i in xrange(len(exceldate)):
    adate[i] = exceldate[i][0]
print adate.shape
print adate
Output:
5 rows, 7 columns
(5L,)
[ 12. 15. 51. 65. 45.]
II. Saving data
1. Saving data in NumPy's own npz format
savez_compressed compresses the data as it is saved.
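import numpy as np
x = np.arange(10)
y = np.zeros((100,100))
# Positional arrays are stored under the names arr_0, arr_1, ...
np.savez_compressed('date1', x, y)
date = np.load('date1.npz')
print date['arr_0']
# Keyword arguments are stored under the keyword name
np.savez_compressed('date2', x=x, ontherDate=y)
date2 = np.load('date2.npz')
print date2['x']
Output:
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]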
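The naming rule for stored arrays can be checked with a short round trip. This is a minimal sketch in Python 3 syntax; the temporary directory, the file name arrays.npz and the keyword name other are illustrative choices, not part of the example above.

```python
import os
import tempfile

import numpy as np

x = np.arange(10)
y = np.zeros((100, 100))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    # x is passed positionally, so it is stored as "arr_0";
    # y is passed by keyword, so it is stored as "other".
    np.savez_compressed(path, x, other=y)
    # np.load returns an archive object; indexing it reads one array.
    with np.load(path) as archive:
        stored_names = sorted(archive.files)
        x_loaded = archive["arr_0"]
        other_loaded = archive["other"]

print(stored_names)
print(x_loaded)
```

Opening the archive in a with block closes the underlying file once the arrays have been read.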
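2. Saving as a CSV file with the np.savetxt function
Note: pandas' read_csv and read_excel both treat the first row as the header by default, so it is not part of the returned data. Pass header=None to keep the first row as data.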
import numpy as np
from pandas import read_csv
x = np.random.randn(10,10)
np.savetxt('date1.csv', x, delimiter=',')
# The first row of the file is consumed as the header
date = read_csv('date1.csv')
date = date.values
print x.shape
print date.shape
print x
print date[0]
Output:
(10L, 10L)
(9L, 10L)
[[ 1.77015084 -1.80554159 1.28403537 0.2009891 0.26291606 0.08448012 1.66140115 0.17728159 0.88959083 0.56291309]
 [ 0.58518743 1.44373927 0.54993558 0.01054313 0.59017053 -0.35133822 -0.42014888 -0.3079049 0.94373013 1.35954942]
 [-0.54426668 0.04622141 -0.66634713 0.45793767 -0.63685413 0.99976971 -0.39326027 -0.93163258 -0.79656236 0.72966639]
 [-0.39963295 -1.79753906 0.32433359 0.82947734 1.54987769 2.77115954 0.22080235 -0.60776182 2.57004264 0.59011931]
 [-0.19130441 -0.12465107 1.40619987 -0.61049826 -0.39827838 -1.25752483 -0.91058091 0.36020845 -0.10908816 1.45316786]
 [ 0.47408008 -0.28463786 -1.92910625 -0.50288128 -0.06007105 -0.12408027 -0.84164768 -0.42411635 0.69954835 -0.41664136]
 [ 0.42336169 0.23625584 1.11511232 -1.08894244 -0.79186067 -1.71206423 -0.02372556 -0.71933255 -1.33979181 -0.41698675]
 [-0.06578197 1.04509307 0.1279905 1.03185255 1.15403322 -0.18110707 -0.60340346 -0.33581049 0.02637558 -1.06997906]
 [-1.84514777 1.19496964 -1.70550266 1.30863094 -1.48711603 1.55044598 0.64066525 0.39086305 0.15076543 1.42276444]
 [-1.23244051 -0.03354092 0.84729912 0.15254869 -0.33402971 -0.59486921 -0.28056973 -1.72189462 -0.0156615 -1.22688771]]
[ 0.58518743 1.44373927 0.54993558 0.01054313 0.59017053 -0.35133822 -0.42014888 -0.3079049 0.94373013 1.35954942]
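The missing row in the output above (9 rows read back from 10 written) comes from the header handling. The sketch below, in Python 3 syntax, compares the default behaviour with header=None; it writes to an in-memory buffer rather than date1.csv, which is an assumption made only to keep the example self-contained.

```python
import io

import numpy as np
import pandas as pd

x = np.random.randn(10, 10)

# Write the array as comma-separated text into an in-memory buffer.
buffer = io.StringIO()
np.savetxt(buffer, x, delimiter=",")

# Default: the first row is consumed as column labels.
buffer.seek(0)
with_header = pd.read_csv(buffer).values

# header=None: every row is treated as data.
buffer.seek(0)
without_header = pd.read_csv(buffer, header=None).values

print(with_header.shape)
print(without_header.shape)
```

With header=None the array read back has the same shape as the one written, so no row is lost.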
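III. Numerical precision
Every system has finite numerical precision. For Python floats (double precision), the relative precision, known as machine epsilon, is about 2.2204 × 10^−16. This is the gap between 1 and the next larger representable number, so a number that differs from 1 by well under this amount is treated as equal to 1. The smallest and largest representable values are approximately −1.7976 × 10^308 and 1.7976 × 10^308.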
import numpy as np
x1 = 1
eps = np.finfo(float).eps
# Adding a tenth of eps to 1 is rounded away
x2 = x1 + eps/10
x1 == x2
Out[4]: True
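A slightly fuller sketch of the same idea, in Python 3 syntax, shows that the precision is relative rather than absolute. The variable names are illustrative; np.finfo is the only library call used.

```python
import numpy as np

info = np.finfo(float)
eps = info.eps

# eps is the gap between 1.0 and the next larger double,
# so adding a full eps does change the value.
one_plus_eps_differs = (1.0 + eps) != 1.0

# A perturbation well below eps is rounded away.
one_plus_small_equal = (1.0 + eps / 10) == 1.0

# The precision is relative: near 1e16 the spacing between doubles
# is 2.0, so adding 1.0 is lost as well.
large_plus_one_equal = (1e16 + 1.0) == 1e16

print(eps, info.min, info.max)
print(one_plus_eps_differs, one_plus_small_equal, large_plus_one_equal)
```

For this reason, comparing floats with == is fragile; a tolerance scaled to the magnitude of the values is usually the safer choice.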