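If plotly provides powerful interactive plotting tools, then I think plotnine is an excellent choice for static plots (at least judging by its current state and direction). So for Python data visualization, I feel that mastering three tools should be enough: Matplotlib, Plotly, and plotnine. This document briefly collects some basic plotnine usage for my own study and reference.
The content below is mainly based on http://pythonplot.com/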
    import numpy as np
    import pandas as pd
    # The top-level plotnine package exports ggplot, aes, and the geoms, scales, facets, and labels used below.
    from plotnine import *
    # Sample datasets used in the examples.
    from plotnine.data import mpg, diamonds
The first five rows of the mpg dataset:
      manufacturer model  displ  year  cyl       trans drv  cty  hwy fl    class
    0         audi    a4    1.8  1999    4    auto(l5)   f   18   29  p  compact
    1         audi    a4    1.8  1999    4  manual(m5)   f   21   29  p  compact
    2         audi    a4    2.0  2008    4  manual(m6)   f   20   31  p  compact
    3         audi    a4    2.0  2008    4    auto(av)   f   21   30  p  compact
    4         audi    a4    2.8  1999    6    auto(l5)   f   16   26  p  compact
Bar Chart
    (ggplot(mpg) +
        aes(x='manufacturer') +
        geom_bar(size=20) +
        coord_flip() +
        ggtitle('Number of Cars by Make')
    )
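To make clear what the bar chart above is drawing: geom_bar defaults to a count statistic, so each bar's length is the number of rows for that manufacturer. The sketch below reproduces that count with the standard library only. The manufacturers list is made-up sample data, not the real mpg column.

```python
from collections import Counter

# Hypothetical manufacturer column (illustration only, not the real mpg data).
manufacturers = ["audi", "audi", "ford", "toyota", "ford", "audi"]

# One count per category: this is the height (or length, after coord_flip)
# of each bar in a count-based bar chart.
counts = Counter(manufacturers)

for make, n in counts.most_common():
    print(f"{make}: {n}")
```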
Histogram
    (ggplot(mpg) +
        aes(x='cty') +
        geom_histogram(binwidth=2))
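The binwidth=2 argument groups city mileage into bins two units wide. As a rough sketch of that idea, the helper below (bin_counts, a hypothetical name) counts values in half-open bins starting at an assumed origin of 0. plotnine chooses its own bin boundaries, so the exact edges in the plot may differ from these.

```python
import math
from collections import Counter

def bin_counts(values, binwidth, origin=0.0):
    """Count values in half-open bins [origin + k*w, origin + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = math.floor((v - origin) / binwidth)  # index of the bin containing v
        counts[origin + k * binwidth] += 1       # key is the bin's left edge
    return dict(sorted(counts.items()))

# City mpg values from the five mpg rows listed at the top of this document.
cty = [18, 21, 20, 21, 16]
hist = bin_counts(cty, binwidth=2)
print(hist)
```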
Scatter Plot
    (ggplot(mpg) +
        aes(x='displ', y='hwy') +
        geom_point() +
        ggtitle('Engine Displacement in Liters vs Highway MPG') +
        xlab('Engine Displacement in Liters') +
        ylab('Highway MPG'))
Time Series
    def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
        """Simulate one geometric Brownian motion path with N points over horizon T."""
        dt = float(T) / N
        t = np.linspace(0, T, N)
        W = np.random.standard_normal(size=N)
        W = np.cumsum(W) * np.sqrt(dt)  # standard Brownian motion
        X = (mu - 0.5 * sigma**2) * t + sigma * W
        S = S0 * np.exp(X)  # geometric Brownian motion
        return S

    dates = pd.date_range('2012-01-01', '2019-01-02')
    T = (dates.max() - dates.min()).days / 365
    N = dates.size
    start_price = 100
    y = geometric_brownian_motion(T, N, sigma=0.1, S0=start_price)
    # Store dates and prices as named columns so aes() can refer to them directly.
    ts = pd.DataFrame({'date': dates, 'value': y})

    (ggplot(ts) +
        aes('date', 'value') +
        geom_line()
    )
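Because geometric_brownian_motion draws from NumPy's global random state, the time series looks different on every run. A minimal sketch of a reproducible variant follows, assuming a seeded NumPy Generator is acceptable. The name gbm_path and the rng parameter are hypothetical and not part of the original code; the formula is unchanged.

```python
import numpy as np

def gbm_path(T=1.0, N=100, mu=0.1, sigma=0.01, S0=20.0, rng=None):
    """Same formula as geometric_brownian_motion, with an explicit generator."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0, T, N)
    # Cumulative sum of scaled normal increments approximates Brownian motion.
    W = np.cumsum(rng.standard_normal(N)) * np.sqrt(dt)
    return S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

# Two generators with the same seed produce the same path.
a = gbm_path(sigma=0.1, S0=100, rng=np.random.default_rng(42))
b = gbm_path(sigma=0.1, S0=100, rng=np.random.default_rng(42))

# With sigma=0 the noise term vanishes and the path is pure exponential drift.
flat = gbm_path(sigma=0.0, S0=100)
```

Passing the generator in, rather than seeding globally, keeps the randomness local to the call that needs it.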
Scatter Plot with Points Colored by Class
    (ggplot(mpg) +
        aes(x='displ', y='hwy', color='class') +
        geom_point() +
        ggtitle('Engine Displacement in Liters vs Highway MPG') +
        xlab('Engine Displacement in Liters') +
        ylab('Highway MPG'))
Scatter Plot with Points Sized by Continuous Value
    (ggplot(mpg) +
        aes(x='cty', y='hwy', size='cyl') +
        geom_point(alpha=0.5))
Scatter Plot Faceted on One Variable
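    # Copy the 'class' column to 'c' first: 'class' is a Python keyword,
    # which makes it awkward to use inside the facet formula.
    (ggplot(mpg.assign(c=mpg['class'])) +
        aes(x='displ', y='hwy') +
        geom_point() +
        facet_wrap('~ c', nrow=2))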
Scatter Plot Faceted on Two Variables
    (ggplot(mpg) +
        aes(x='displ', y='hwy') +
        geom_point() +
        facet_grid('drv ~ cyl'))
Scatter Plot and Regression Line with 95% Confidence Interval Layered
    (ggplot(mpg) +
        aes('displ', 'hwy') +
        geom_point() +
        geom_smooth(method='lm'))
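With method='lm', the smoothed line is a linear model fit. For a single predictor this amounts to an ordinary least-squares line, which the sketch below computes by hand. The function name least_squares_line and the data are made up for illustration, and the 95% confidence band is not computed here.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical displacement / highway mpg pairs lying exactly on hwy = 40 - 5 * displ.
displ = [2.0, 3.0, 4.0, 5.0]
hwy = [30.0, 25.0, 20.0, 15.0]
slope, intercept = least_squares_line(displ, hwy)
print(slope, intercept)
```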
Smoothed Line Plot and Scatter Plot Layered
    # method='loess' requires the scikit-misc package to be installed.
    (ggplot(data=mpg, mapping=aes(x='displ', y='hwy')) +
        geom_point(mapping=aes(color='class')) +
        geom_smooth(data=mpg[mpg['class'] == 'subcompact'],
                    se=False,
                    method='loess'))
Stacked Bar Chart
    (ggplot(diamonds) +
        aes(x='cut', fill='clarity') +
        geom_bar())
Dodged Bar Chart
    (ggplot(diamonds) +
        aes(x='cut', fill='clarity') +
        geom_bar(position='dodge'))
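The stacked and dodged charts show the same counts and differ only in how the bars are positioned. A small sketch of the underlying numbers, using made-up (cut, clarity) pairs rather than the real diamonds data: each pair count is the height of one dodged bar, and the per-cut total is the full height of the corresponding stacked bar.

```python
from collections import Counter

# Hypothetical (cut, clarity) rows for illustration only.
rows = [
    ("Ideal", "VS1"),
    ("Ideal", "SI1"),
    ("Ideal", "VS1"),
    ("Good", "SI1"),
    ("Good", "VS1"),
]

# One count per (cut, clarity) pair: the height of each dodged bar,
# and of each colored segment in a stacked bar.
pair_counts = Counter(rows)

# One count per cut: the total height of each stacked bar.
stack_totals = Counter(cut for cut, _ in rows)

print(pair_counts)
print(stack_totals)
```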
Stacked KDE Plot
    (ggplot(diamonds) +
        aes('depth', fill='cut', color='cut') +
        geom_density(alpha=0.1) +
        xlim(50, 80))
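References
I have only listed the simplest usage of the most common chart types here. Like ggplot2, plotnine can produce rich and varied plots. If you are interested, the links below go further:
https://github.com/has2k1/plotnine-examples examples provided by the plotnine developers
https://plotnine.readthedocs.io/en/stable/ the official plotnine documentation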