{smcl}
{* *! Help file for hist_overlay.ado version 1.4 23Apr2024 UNDER CONSTRUCTION}{...}
{vieweralsosee "[R] histogram" "mansection R histogram"}{...}
{vieweralsosee "[G-2] twoway_histogram" "mansection G-2 graphtwowayhistogram"}{...}
{vieweralsosee "" "--"}{...}
{vieweralsosee "help histogram" "help histogram"}{...}
{vieweralsosee "help twoway histogram" "help twoway_histogram"}{...}
{vieweralsosee "help twoway__histogram_gen" "help twoway__histogram_gen"}{...}
{vieweralsosee "help return" "help return"}{...}
{vieweralsosee "help marker labels" "help marker_label_options"}{...}
{vieweralsosee "help niceloglabels" "help niceloglabels"}{...}
{vieweralsosee "search niceloglabels" "search niceloglabels"}{...}
{viewerjumpto "Syntax" "hist_overlay##syntax"}{...}
{viewerjumpto "Options" "hist_overlay##options"}{...}
{viewerjumpto "Description" "hist_overlay##description"}{...}
{viewerjumpto "Examples" "hist_overlay##examples"}{...}
{viewerjumpto "Returned results" "hist_overlay##returned"}{...}
{viewerjumpto "Requirements" "hist_overlay##requirements"}{...}
{viewerjumpto "Author" "hist_overlay##author"}{...}
{title:Title}
{p2colset 5 22 26 2}{...}
{p2col :{cmd:hist_overlay} {hline 2}}Overlay one histogram on another,
so both are visible{p_end}
{p2colreset}{...}
{marker syntax}{...}
{title:Syntax}
{p 8 17 2}
{cmd:hist_overlay}
{it:xvar}
{ifin}
{cmd:,}
{opt over(ovar)}
[
{it:optional_options}
]
{marker options}{...}
{title:Options}
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Required -option-}
{synopt:{opt over(ovar)}}The two overlaid histograms are constructed for the
two discrete values of this variable.{p_end}
{syntab:Official {help histogram} -options-}
{synopt :{opt ti:tle(tinfo)}}overall title{p_end}
{synopt :{opt sub:title(tinfo)}}subtitle of title{p_end}
{synopt :{opt freq:uency}}draw as frequencies (default). With two overlaid histograms,
the {opt freq:uency} option produces the least ambiguous graph.{p_end}
{synopt :{opt frac:tion}}draw as fractions. The fractions add to 1.0 for
each of the two histograms, separately. {p_end}
{synopt :{opt den:sity}}draw as density. The density integrates to 1.0 for
each of the two histograms, separately.{p_end}
{synopt :{opt addl:abels}}add height labels to bars{p_end}
{synopt :{opt addlabop:ts(label_options)}}affect rendition of labels.
See {help marker_label_options:marker label options}{p_end}
{synopt :{opt start(#)}}set lower limit of leftmost bin to {it:#}, any real value.{p_end}
{synopt :{opt w:idth(#)}}set width of bins to {it:#}, a positive real value. Applies to both histograms.{p_end}
{synopt :{opt bin(#)}}set number of bins to {it:#}, a positive integer.
Because this number applies to both histograms,
typical use is to omit this option, allowing the number of bins to be determined by the options
{opt start(#)} and {opt width(#)}.{p_end}
{synopt :{opt barw:idth(#)}}set width of bars to {it:#}, a positive real value.
The width of the bars may be less than or, rarely, greater than the width of the bins.
By default the {opt barw:idth(#)} equals the bin width. Applies to both histograms.{p_end}
{syntab:hist_overlay -options-}
{synopt:{opt color1(colorstyle)}}this option and its pair, {opt color2(colorstyle)},
give the user optional control over the color and, in Stata 15 or later, the opacity
of the two histograms. See {help colorstyle}.
The default colors in Stata 15 or later for the two values of the {opt over(ovar)} variable
are navy%50 and red%50, respectively. Where the two histograms overlap, the color is purple.{p_end}
{synopt:{opt xline1(real, added_line_options)}}this option and its pair, {opt xline2(real)},
allow the user to specify that a vertical line be constructed using the same color
used to display one or the other of the two histograms.
The available {it:added line options} are those that modify the rendition of the line
and are documented {help added_line_options:here}.
The default colors in Stata 15 for the two values of the {opt over(ovar)} variable
are navy and red respectively.
These options can be combined with the official Stata
option {help added_line_options:xline}.{p_end}
{synopt:{opt meanlines}}Draw a vertical line at the mean of each of the two distributions.
This option is a shortcut for computing the mean of each distribution beforehand
and then specifying the options {opt xline1(`mean1')} {opt xline2(`mean2')}.{p_end}
{synopt:{opth meanopts(added_line_options)}}The {it:added line options} applied to both
mean lines.{p_end}
{synopt:{opt normal:den}}Superimpose a normal distribution on each of the two histograms.
Emulates {help histogram}'s option {opt normal}.{p_end}
{synopt:{opth normopts(added line options)}}Options applied to both of the normal distribution curves{p_end}
{synopt:{opt normscale(#)}}Scaling factor to adjust the height of the two normal distribution curves.
The default value is 0.8.{p_end}
{synopt :{opt xlog}}Construct the histograms on the {help log10} transformation of {it:xvar}.
Requires Nick Cox's {help hist_overlay##niceloglabels:program}
{stata `"view net describe niceloglabels, from(http://fmwww.bc.edu/RePEc/bocode/n)"':niceloglabels}.
{p_end}
{synopt :{opt pow:ers}}Display x-axis labels as powers of 10. Only useful with
the option {opt xlog}.
Requires Nick Cox's program
{stata `"view net describe niceloglabels, from(http://fmwww.bc.edu/RePEc/bocode/n)"':niceloglabels}.
{p_end}
{synopt :{opt plusone}}Add 1.0 to {it:xvar} before transforming it with the {help log10}
transformation. Only useful with the option {opt xlog}.{p_end}
{synopt :{opt v:erbose}}Provides additional output on the construction of the histogram bins.{p_end}
{synopt :{opt *}}All other options are passed to {help histogram}{p_end}
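{pstd}
As a rough sketch of the equivalence described under {opt meanlines}
(not taken from the example do-file, and assuming the {help sysuse:auto} data
with {cmd:foreign} as the {opt over()} variable), the two means can be computed by hand
and passed to {opt xline1()} and {opt xline2()}. The local macro names
{cmd:mean1} and {cmd:mean2} are illustrative only.{p_end}
{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. summarize mpg if foreign==0, meanonly}{p_end}
{phang2}{cmd:. local mean1 = r(mean)}{p_end}
{phang2}{cmd:. summarize mpg if foreign==1, meanonly}{p_end}
{phang2}{cmd:. local mean2 = r(mean)}{p_end}
{phang2}{cmd:. hist_overlay mpg, over(foreign) xline1(`mean1') xline2(`mean2')}{p_end}
{pstd}
If the option descriptions above hold, the last command should place its two lines where
{cmd:hist_overlay mpg, over(foreign) meanlines} would place them.{p_end}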
{marker description}{...}
{title:Description}
{pstd}
While there are many situations in which two overlaid or superimposed histograms
are a useful visualization tool, a prominent use is in impact evaluation or event studies.
In such studies, the plausibility of the estimated impact of the treatment or the event
is substantially enhanced if the observations receiving the treatment
can be shown to be comparable on observed covariates
(and/or on a {help teffects psmatch:propensity score})
to the observations that do not receive it.
The most common tool for analyzing the degree of comparability is a
"balance table", which compares the means (and possibly the variances) of key covariates
between the control and treatment groups. Stata's command {help tebalance summarize}
performs this task, but at time of writing requires that the user estimate
the treatment model before producing the balance table.
In principle, the balance table should be examined before choosing and executing the evaluation method.
(User-contributed programs fill this gap and provide additional statistics to compare the two groups.
See my program
{stata `"view net describe baltable, from("C:\Users\Mead\Documents\CGD_Files\CGD_Stata_Site\MO\Misc")"':baltable}
and the other community-contributed programs cited in the
{help baltable##thanks:help file}.){p_end}
{pstd}
Instead of, or in addition to, comparing the summary statistics of covariates
in the treatment and control observations, one can compare the entire distribution
of a covariate in one group to its distribution in the other group. The analyst may want to exclude
from analysis any observations in one or the other group
that are not "matched" by observations in the other group.
That is, one may want to exclude from the analysis observations on which the values
of key exogenous or pre-determined covariates lie outside the domain of overlap or of
"common support". Stata's commands {help tebalance density} and {help tebalance box}
provide visualizations of overlap or common support, but require the prior execution
of the impact evaluation or event study estimation.{p_end}
{pstd}
The UCLA Statistical Consulting Group's
{browse "https://stats.oarc.ucla.edu/stata/faq/how-can-i-overlay-two-histograms/":website}
(accessed 1 May 2024) demonstrates a simple approach to overlaying two histograms using
{help twoway:twoway (histogram x if z==0) (histogram x if z==1)}
such that both are visible and can be distinguished from one another.{p_end}
{pstd}
This program, {cmd:hist_overlay}, produces a result similar to that of UCLA's code,
but automates several features the user could otherwise implement manually.
{cmd:hist_overlay}'s options {opt color1()} and {opt color2()} designate the
colors assigned to the two histograms. Option {opt meanlines} constructs
a color-coordinated vertical line for the mean of each distribution.
Option {opt normal:den} overlays a normal density function on each distribution.
Option {opt xlog} performs these functions on the log transform
of {it:xvar}. Option {opt powers} converts the x-axis labels to powers of 10.
Option {opt plusone}, if used with option {opt xlog}, adds 1.0 to {it:xvar}
before the log-transformation (in order to retain zero values).{p_end}
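{pstd}
A minimal sketch of the log-scale options follows. It assumes the {help sysuse:auto}
data are in memory and that {cmd:niceloglabels} is installed from SSC.
It is an illustration, not part of the example do-file.{p_end}
{phang2}{cmd:. ssc install niceloglabels}{p_end}
{phang2}{cmd:. hist_overlay displacement, over(foreign) xlog powers}{p_end}
{pstd}
For a variable that contains zeros, {opt plusone} would be added to the same call,
for example {cmd:hist_overlay} {it:xvar}{cmd:, over(}{it:ovar}{cmd:) xlog plusone},
where {it:xvar} and {it:ovar} stand for the user's own variables.{p_end}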
{pstd}
For further analysis of the control and treatment observations within
individual histogram bins, {cmd:hist_overlay} returns in {cmd:r(bindata)}
the number of observations in each bin for each group. The option
{opt verbose} displays intermediate results,
including the matrix {cmd:bindata}, in the Results window and log file.{p_end}
{pstd}
The trick to producing a readable graph of two overlaid histograms
is in the choice of {opth color1(colorstyle)} and {opth color2(colorstyle)}.
With {cmd:hist_overlay}'s default choices of
{opt color1(navy%50)} and {opt color2(red%50)}, the area of overlap is colored purple.
With UCLA's choices of
{opt color1(red%30)} and {opt color2(green%30)}, the overlap is colored brown.
The user should experiment to discover the two colors and their overlap color
that best match the color {help scheme} of their graph settings.{p_end}
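{pstd}
As an illustration, assuming the {help sysuse:auto} data are in memory, the red and green
pair mentioned above could be tried as follows. The graph name {cmd:ucla_colors} is
hypothetical, and the {it:%#} opacity suffix requires Stata 15 or later.{p_end}
{phang2}{cmd:. hist_overlay mpg, over(foreign) color1(red%30) color2(green%30) name(ucla_colors, replace)}{p_end}
{pstd}
Rerunning the command with other {help colorstyle} pairs and opacities is a quick way
to compare the resulting overlap colors.{p_end}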
{marker examples}{...}
{title:Examples}
{pstd}
The DO-file, {cmd:hist_overlay_example.do},
available from CGD's Stata repository {net "describe http://digital.cgdev.org/doc/stata/MO/Misc/hist_overlay":here},
contains the code to run the following
examples. First, clear memory and load Stata's
{help sysuse:auto} example data.
{com}. clear
. sysuse auto
{txt}(1978 Automobile Data)
{pstd}
{txt}Using Stata's official {help histogram} command, construct a histogram of the variable mpg.{p_end}
{com}.
. * Without options, the frequency histogram of the variable mpg is:
. histogram mpg, frequency name(hist1_0, replace)
{txt}(bin={res}8{txt}, start={res}12{txt}, width={res}3.625{txt})
{res}{txt}
{com}.
. * The data contains a dichotomous variable: -foreign-
. tab foreign, sum(mpg)
{txt}{c |} Summary of Mileage (mpg)
Car origin {c |} Mean Std. dev. Freq.
{hline 12}{c +}{hline 36}
Domestic {c |} {res} 19.826923 4.7432972 52
{txt} Foreign {c |} {res} 24.772727 6.6111869 22
{txt}{hline 12}{c +}{hline 36}
Total {c |} {res} 21.297297 5.7855032 74
{txt}
{com}.
. * Overlay the histograms of mpg for domestic and foreign cars
. * (Note that the frequency option is the default for -hist_overlay-.)
. hist_overlay mpg, over(foreign) name(hist2_0,replace)
{res}{txt}
{com}.
. * Compare the results from -histogram- and -hist_overlay- side by side
. graph combine hist1_0 hist2_0, ycommon name(no_options, replace)
{stata hist_overlay_viewgph no_options:click to see the combined graph}
{res}{txt}
{com}.
. * To make it easier to compare the two depictions of the data,
. * specify common starting points, widths and styles for the bars
. * and similarly formatted legends.
. histogram mpg, frequency start(10) width(2.5) fcolor(none) lcolor(black) ///
> legend(on order(1 "All cars") rows(1)) name(hist1_1, replace)
{txt}(bin={res}13{txt}, start={res}10{txt}, width={res}2.5{txt})
{res}{txt}
{com}.
. hist_overlay mpg, over(foreign) start(10) width(2.5) name(hist2_1, replace)
{res}{txt}
{com}.
. * Compare them side by side
. * The histograms now resemble one another.
. gr combine hist1_1 hist2_1, ycommon name(no_lbls, replace)
{stata hist_overlay_viewgph no_lbls:click to see the combined graph}
{res}{txt}
{com}.
. * Add labels to make it easier to read the height of the bars
. histogram mpg, frequency start(10) width(2.5) fcolor(none) lcolor(black) ///
> legend(on order(1 "All cars") rows(1)) addlabels name(hist1_2, replace)
{txt}(bin={res}13{txt}, start={res}10{txt}, width={res}2.5{txt})
{res}{txt}
{com}.
. hist_overlay mpg, over(foreign) frequency start(10) width(2.5) addlabels name(hist2_2, replace)
{res}{txt}
{com}.
. * Compare them side by side
. * Note that the heights of the two overlaid bars on the right
. * sum to the height of the corresponding bar on the left.
. gr combine hist1_2 hist2_2, ycommon name(withlbls, replace)
{stata hist_overlay_viewgph withlbls:click to see the combined graph}
{res}{txt}
{com}.
. * It's apparent from the overlaid histogram that foreign cars
. * achieve higher mileage. To emphasize this, add a vertical line
. * at the means of the distributions
.
. * Compute the overall mean
. sum mpg
{txt} Variable {c |} Obs Mean Std. dev. Min Max
{hline 13}{c +}{hline 57}
{space 9}mpg {c |}{res} 74 21.2973 5.785503 12 41
{txt}
{com}. local mpgall = `r(mean)'
{txt}
{com}. local mpgalltxt : di %9.1f `mpgall'
{res}{txt}
{com}.
. * Compute mean for domestic cars
. sum mpg if foreign==0
{txt} Variable {c |} Obs Mean Std. dev. Min Max
{hline 13}{c +}{hline 57}
{space 9}mpg {c |}{res} 52 19.82692 4.743297 12 34
{txt}
{com}. local mpgdom = `r(mean)'
{txt}
{com}. local mpgdomtxt : di %4.1f `mpgdom'
{res}{txt}
{com}.
. * Compute mean for foreign cars
. sum mpg if foreign==1, meanonly
{txt}
{com}. local mpgfor = `r(mean)'
{txt}
{com}. local mpgfortxt : di %4.1f `mpgfor'
{res}{txt}
{com}.
. * Add a vertical line to mark the overall mean and label the line
. * Superimpose a normal distribution.
. histogram mpg, frequency start(10) width(2.5) fcolor(none) lcolor(black) ///
> legend(on order(1 "All cars") rows(1)) addlabels ///
> normal xline(`mpgall',lcolor(black) lwidth(thick)) ///
> text(18 `mpgall' "Overall" "`mpgalltxt' mpg", orientation(vertical)) ///
> yscale(range(0 20)) name(hist1_3, replace)
{txt}(bin={res}13{txt}, start={res}10{txt}, width={res}2.5{txt})
{res}{txt}
{com}.
. * Add a vertical line to mark each of the two means and label them.
. * Use options -meanlines- and -meanopts- to place the vertical lines.
. * Superimpose a normal distribution on each histogram using options
. * -normalden- and -normopts-.
. hist_overlay mpg, over(foreign) start(10) width(2.5) addlabels ///
> meanlines meanopts(lwidth(thick)) normalden normopts(lwidth(thick)) ///
> text(18 `mpgdom' "Domestic" "`mpgdomtxt' mpg", orientation(vertical)) ///
> text(18 `mpgfor' "Foreign" "`mpgfortxt' mpg", orientation(vertical)) ///
> yscale(range(0 20)) name(hist2_3, replace)
{res}{txt}
{com}.
. * Compare the two histograms, this time on top of one another
. gr combine hist1_3 hist2_3, ycommon col(1) name(withlines, replace)
{stata hist_overlay_viewgph withlines:click to see the combined graph}
{res}{txt}
{com}.
. * For variables with right-skewed distributions, like -displacement-,
. * the option -xlog- plots the two histograms with respect to the log
. * of the variable being analyzed. The bars have equal widths on the log scale.
. * Options -meanlines- and -normalden- also use the log scale.
. * Stata's -histogram- command does not have a comparable option.
. hist_overlay displacement, over(foreign) addlabels xlog ///
> meanlines meanopts(lwidth(thick)) normalden normopts(lwidth(thick)) ///
> name(withxlog, replace)
{stata hist_overlay_viewgph withxlog:click to see the combined graph}
{res}{txt}
{marker returned}
{title:Returned results}
{pstd}{txt}
The program returns the following results:{p_end}
{txt}macros:
r(start) Starting value of the first bar
r(width) Width of the bins
r(barwidth) Width of the bars (by default equal to width of the bins)
r(mean1) Mean of observations in Group 0 as defined by {it:ovar} (With option {opt meanlines})
r(mean2) Mean of observations in Group 1 as defined by {it:ovar} (With option {opt meanlines})
r(normscale) Scaling factor for option {opt normal:den}
r(maxbins) Maximum of the number of bins in the two histograms. Number of rows in r(bindata)
matrices:
r(bindata) Data on the heights and centers of both sets of bars
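{pstd}
A brief sketch of how these results might be inspected after a call, assuming the
{help sysuse:auto} data are in memory. The matrix name {cmd:B} is illustrative only.
Copying {cmd:r(bindata)} into a named matrix keeps it available after later
commands overwrite the r() results.{p_end}
{phang2}{cmd:. hist_overlay mpg, over(foreign) start(10) width(2.5) meanlines}{p_end}
{phang2}{cmd:. return list}{p_end}
{phang2}{cmd:. matrix B = r(bindata)}{p_end}
{phang2}{cmd:. matrix list B}{p_end}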
{marker requirements}
{title:Requirements}
{pstd}
Options {opt xlog} and {opt powers} require Nick Cox's program
{stata `"view net describe niceloglabels, from(http://fmwww.bc.edu/RePEc/bocode/n)"':niceloglabels}.
See: {p_end}
{marker niceloglabels}
{phang}
Cox, N. J. (2018) "Speaking Stata: Logarithmic binning and labeling", Stata Journal,
{browse "https://www.stata-journal.com/article.html?article=gr0072":18:1}
{p_end}
{marker author}{...}
{title:Author}
{phang}{browse "http://www.cgdev.org/expert/mead-over/":Mead Over}, Senior Fellow, Center for Global Development{p_end}
{phang}Email: {browse "mailto:mover@cgdev.org":MOver@CGDev.Org} if you observe any problems. {p_end}
{* Version history of this help file}
{* Ver 0.0 30Jan2019 for hist_overlay.ado ver 1.0 of 11Jan2019}
{* Ver 1.2 20Apr2019 Document -addlabels, -addlabopts-, corrects typos.}
{* Ver 1.3 26June2020 Edit the examples section and fix typos.}
{* Ver 1.31 22May2022 Add link to UCLA's help page on overlaying histograms }
{* Ver 1.32 17Apr2024 Correct name of option -addlabopts()-}
{* Ver 1.33 22Apr2024 Restore the option -barwidth()-}
{* Ver 1.40 23Apr2024 UNDER CONSTRUCTION. Must document additional options.}