Inconsistent binning when plotting multiple histograms split over a factor #644

jwimberley · 2018-06-12T13:16:41Z

Some minimal reproducible code:

import numpy as np
import pandas as pd
from ggplot import *

N = 1000
# Assign each sample to one of two groups with a fair coin flip
coin = np.random.binomial(size=N, n=1, p=0.5)
N1 = np.sum(coin)
N0 = np.sum(1 - coin)
values = np.zeros(N)
# Group A: mean 0, sd 1; group B: mean 1, sd 2
values[coin == 0] = np.random.normal(size=N0, loc=0, scale=1)
values[coin == 1] = np.random.normal(size=N1, loc=1, scale=2)
# List comprehension instead of map(), which returns an iterator on Python 3
categories = ['A' if x == 0 else 'B' for x in coin]
dat = pd.DataFrame({'Value': values,
                    'Category': categories})
ggplot(dat, aes(x='Value', fill='Category')) + geom_histogram(alpha=0.5) + theme_bw()

However, the resulting plot uses a different set of bins for each of the two histograms, which makes it difficult to compare counts between categories. ggplot2 in R, by contrast, bins both groups identically.
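To make the expected behavior concrete, here is a minimal sketch of consistent binning using plain NumPy rather than ggplot. It is not how either library implements it; the fixed seed and `bins=30` are arbitrary choices for illustration. The idea is to derive one set of bin edges from the pooled data and reuse it for every category:

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary fixed seed so the sketch is repeatable
N = 1000
coin = rng.binomial(size=N, n=1, p=0.5)
# Same two-group mixture as in the report above
values = np.where(coin == 0,
                  rng.normal(loc=0, scale=1, size=N),
                  rng.normal(loc=1, scale=2, size=N))

# Derive a single set of bin edges from the pooled data
# (bins=30 is an illustrative choice, not taken from the report)
_, shared_edges = np.histogram(values, bins=30)

# Reuse those edges for each category so counts line up bin by bin
counts_a, edges_a = np.histogram(values[coin == 0], bins=shared_edges)
counts_b, edges_b = np.histogram(values[coin == 1], bins=shared_edges)
```

With shared edges, `counts_a[i]` and `counts_b[i]` refer to the same interval of `Value`, which is what makes the overlaid histograms directly comparable.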
