Basic plotnine Plotting Commands (plotnine Cheat Sheet)

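If plotly offers powerful tools for interactive plotting, then I think plotnine is an excellent choice for static plotting (at least judging by its current state and where it is heading, that seems a safe bet). So I feel that three tools are enough for data visualization in Python: Matplotlib, Plotly, and plotnine. This document briefly collects some basic plotnine usage for my own study and reference.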

The content below is mainly based on http://pythonplot.com/

import pandas as pd
import numpy as np

from plotnine import *
from plotnine.data import mpg, diamonds
mpg.head()
  manufacturer model  displ  year  cyl       trans drv  cty  hwy fl    class
0         audi    a4    1.8  1999    4    auto(l5)   f   18   29  p  compact
1         audi    a4    1.8  1999    4  manual(m5)   f   21   29  p  compact
2         audi    a4    2.0  2008    4  manual(m6)   f   20   31  p  compact
3         audi    a4    2.0  2008    4    auto(av)   f   21   30  p  compact
4         audi    a4    2.8  1999    6    auto(l5)   f   16   26  p  compact

Bar Chart

(ggplot(mpg) +
 aes(x='manufacturer') +
 geom_bar(size=20) +
 coord_flip() +
 ggtitle('Number of Cars by Make')
)


Histogram

(ggplot(mpg) +
 aes(x='cty') +
 geom_histogram(binwidth=2))


Scatter Plot

(ggplot(mpg) +
 aes(x='displ', y='hwy') +
 geom_point() +
 ggtitle('Engine Displacement in Liters vs Highway MPG') +
 xlab('Engine Displacement in Liters') +
 ylab('Highway MPG'))


Time Series

def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
    """Simulate one geometric Brownian motion path with N points on [0, T]."""
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)  # standard Brownian motion
    X = (mu - 0.5 * sigma**2) * t + sigma * W
    S = S0 * np.exp(X)  # geometric Brownian motion
    return S

dates = pd.date_range('2012-01-01', '2019-01-02')
T = (dates.max() - dates.min()).days / 365
N = dates.size
start_price = 100

# Keep dates and prices as named columns so aes() can refer to them directly.
ts = pd.DataFrame({
    'date': dates,
    'price': geometric_brownian_motion(T, N, sigma=0.1, S0=start_price),
})

(ggplot(ts) +
 aes(x='date', y='price') +
 geom_line()
)
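
For reference, the simulation in geometric_brownian_motion can be written out as formulas. This is only my reading of the code above, not an additional model. The symbols Z_i (the standard normal draws) and t_k (the k-th point of np.linspace(0, T, N)) are notation introduced here.

```latex
\Delta t = \frac{T}{N}, \qquad
W_{t_k} = \sqrt{\Delta t}\,\sum_{i=1}^{k} Z_i, \quad Z_i \sim \mathcal{N}(0, 1), \qquad
S_{t_k} = S_0 \exp\!\Big( \big(\mu - \tfrac{1}{2}\sigma^2\big)\, t_k + \sigma W_{t_k} \Big)
```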


Scatter Plot Colored by Class

(ggplot(mpg) +
 aes(x='displ', y='hwy', color='class') +
 geom_point() +
 ggtitle('Engine Displacement in Liters vs Highway MPG') +
 xlab('Engine Displacement in Liters') +
 ylab('Highway MPG'))


Scatter Plot with Points Sized by Continuous Value

(ggplot(mpg) +
 aes(x='cty', y='hwy', size='cyl') +
 geom_point(alpha=0.5))


Scatter Plot Faceted on One Variable

# Copy the 'class' column to 'c' so the facet formula does not use the name 'class'.
(ggplot(mpg.assign(c=mpg['class'])) +
 aes(x='displ', y='hwy') +
 geom_point() +
 facet_wrap(' ~ c', nrow=2))


Scatter Plot Faceted on Two Variables

(ggplot(mpg) +
 aes(x='displ', y='hwy') +
 geom_point() +
 facet_grid('drv~cyl'))


Scatter Plot and Regression Line with 95% Confidence Interval Layered

(ggplot(mpg) +
 aes('displ', 'hwy') +
 geom_point() +
 geom_smooth(method='lm'))
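
As a rough illustration of what the straight line from geom_smooth(method='lm') represents, the sketch below fits an ordinary least-squares line with NumPy on a few made-up points. It is only a stand-in under stated assumptions: plotnine does its own fitting internally and also draws the confidence band, which this sketch does not compute.

```python
import numpy as np

# Hypothetical, noise-free data on the line y = 2x + 1 (not taken from mpg).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Values on the fitted line at each x, i.e. the points a regression line passes through.
fitted = slope * x + intercept
```

With real data such as displ and hwy, the points scatter around the line instead of lying exactly on it.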


Smoothed Line Plot and Scatter Plot Layered

# method='loess' requires the scikit-misc package to be installed.
(ggplot(data=mpg,
        mapping=aes(x='displ', y='hwy')) +
 geom_point(mapping=aes(color='class')) +
 geom_smooth(data=mpg[mpg['class'] == 'subcompact'],
             se=False,
             method='loess'
             ))


Stacked Bar Chart

(ggplot(diamonds) +
 aes(x='cut', fill='clarity') +
 geom_bar())


Dodged Bar Chart

(ggplot(diamonds) +
 aes(x='cut', fill='clarity') +
 geom_bar(position='dodge'))


Stacked KDE Plot

(ggplot(diamonds) +
 aes('depth', fill='cut', color='cut') +
 geom_density(alpha=0.1) +
 xlim(50, 80))


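References

I have only listed the simplest usage of the most common chart types here. Like ggplot2, plotnine can produce rich and varied plots. If you are interested, the following links offer more:

  1. https://github.com/has2k1/plotnine-examples (examples provided by the plotnine developer)
  2. https://plotnine.readthedocs.io/en/stable/ (the official plotnine documentation)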